Yesterday at the IEEE Winter Conference on Applications of Computer Vision (WACV), Gérard Medioni, vice president and distinguished scientist at Amazon, delivered a keynote talk about the computer vision system behind Amazon Go, the first store to use Just Walk Out technology to offer a checkout-free shopping experience.
Medioni was a general chair for last year’s edition of WACV, and he is general chair again for next year’s. Indeed, he’s been a general chair of the conference off and on since its founding in 1992.
As Medioni explains, the field of computer vision has a number of high-profile conferences — CVPR, ICCV, and ECCV — that accept papers on general topics. But “there was a need for some more applied work to be presented, as opposed to highly theoretical, mathematical work,” Medioni says. “I pushed for this conference.”
“The field was fairly small when I started as a student in computer vision in the early ’80s,” Medioni says. “I graduated in ’83, and there were probably 50 researchers doing computer vision worldwide at the time.”
Also in 1983, the Conference on Computer Vision and Pattern Recognition — CVPR — was held for the first time, and it has remained the flagship conference in the field. “Computer vision is an applied field,” Medioni says, but at the time, “the applications didn’t really work, except in fairly limited domains. There were essentially no companies that were based on computer vision. But as the tools became more mature and the technology became more mature, it started to make sense to have true end-to-end applications that could use computer vision.”
When WACV launched in 1992, it was focused “not on describing the technology but more describing a problem and how computer vision solves it,” Medioni says. “In fact, we had this debate as we were reviewing papers for WACV. We would get papers that were strong but separated from any application. They would say, ‘We have these great techniques, and we could be using them for XYZ.’ And we would say, ‘No, this is not a good venue for that. What we really want is that you tell me what the problem is and tell me how computer vision solves it.’”
Birds in trees
Over the years, Medioni says, the research presented at WACV, like all research in artificial intelligence, has been revolutionized by deep learning.
“I’ll give you a very interesting example,” Medioni says. “Probably in 2009, 2010, while I was at USC, I was talking to an industry partner, and he showed me a picture of a tree with birds on it. And he said, ‘Look, my management wants to know if we can detect the birds on this tree.’
“I looked at the image, and there were birds facing forward and birds facing sideways and birds with open wings, and I said, ‘No way. You’re talking about a very difficult problem here. If we knew what kind of birds they were, and if we kind of knew which way they were facing, then yes, we could probably do it. But generically, knowing that there is a bird in a tree is an extremely hard problem that we don’t know how to solve.’ Fast-forward a few years, and it’s just like, well, push a button and it will show you the bird in the tree. That was the inflection point for a very different capability that was suddenly possible.”
The title of Medioni’s keynote was “Amazon Go: A peek under the hood,” and, Medioni says, “The reason why I came to Amazon, in fact, is strictly for Amazon Go: to solve for and build a store where customers could grab what they want and just go. I had an opportunity to put into practice what I’d been teaching for 35 years, and it was a challenge I just could not pass up.”
Amazon Go uses Amazon’s Just Walk Out technology, which draws on the same types of technologies used in self-driving cars (computer vision, sensor fusion, and deep learning), to give customers a checkout-free shopping experience: they can take what they want from the store and leave without stopping to check out. To enter the store, customers use the Amazon Go app on their smartphones. Once customers are in the store, anything they take off the shelves is automatically added to their virtual carts. Anything they put back on the shelf is removed from their virtual carts. When they’re done shopping, they’re good to go: no lines, no checkout. A little later, Amazon notifies them that their receipts are ready and charges their cards.
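The virtual-cart bookkeeping described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Amazon's implementation: the `VirtualCart` class, its method names, and the price table are all hypothetical, and the hard part (deciding from sensor data who took or returned which item) is assumed to have already happened upstream.

```python
from collections import Counter


class VirtualCart:
    """Toy per-shopper cart (hypothetical; not Amazon's actual system)."""

    def __init__(self):
        # Maps item name -> quantity currently held by the shopper.
        self.items = Counter()

    def take(self, item, qty=1):
        """Record that the shopper took `qty` units off the shelf."""
        self.items[item] += qty

    def put_back(self, item, qty=1):
        """Record that the shopper returned `qty` units to the shelf."""
        if self.items[item] < qty:
            raise ValueError(f"cannot put back {qty} x {item!r}: not in cart")
        self.items[item] -= qty
        if self.items[item] == 0:
            del self.items[item]  # keep the cart free of zero-count entries

    def receipt_total(self, prices_cents):
        """Total owed in integer cents, given an assumed price table."""
        return sum(prices_cents[item] * qty for item, qty in self.items.items())


# Example session: take a sandwich and two waters, then return one water.
cart = VirtualCart()
cart.take("sandwich")
cart.take("water", 2)
cart.put_back("water")
total = cart.receipt_total({"sandwich": 599, "water": 129})
```

In this toy session the cart ends up holding one sandwich and one water. Prices are kept as integer cents to avoid floating-point rounding when the receipt is totaled.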
“If you have one person who comes into the store, any graduate student with a few months’ experience can come up with a system,” Medioni says. “The challenge is that there is a very long tail of much more complex occurrences, and in order for the system to be seamless and fully reliable, it has to deal with this long tail of complexity.”
Also at WACV, researchers from Amazon’s Performance Advertising Technology group and Alexa AI presented a paper on using generative adversarial networks to produce apparel images from textual descriptions, as a way to help shoppers on amazon.com refine their search queries.