The venerable Art Institute of Chicago is now welcoming visitors again after being closed for much of last year due to the COVID-19 pandemic. On Amazon's Echo Show, however, the museum is always open, thanks to the Alexa skill Art Museum. Created using the Alexa Conversations dialogue management model, Art Museum allows people to browse more than 300 pieces of art from the institute's collection via voice commands.
Alexa Conversations, which is generally available to developers in the US as of today, is the first deep learning-based dialogue manager available for developing voice skills. It uses artificial intelligence to help developers create natural, human-like voice exchanges, bridging the gap between experiences that could be built manually and the wide range of possible interactions that might happen organically.
With Art Museum, a visitor can say phrases such as, "I want to see a painting," "Bring me to sculptures from India," or "Show me another one like that," to navigate among pieces. At the same time, subtle ambient audio — that hushed sound of people milling around familiar to anyone who has spent time in a museum — lends a sense of the physical environment.
The skill, made possible by the Art Institute of Chicago's public API, won grand prize in the Alexa Skill Challenge for Alexa Conversations last fall. Customers can access the skill by saying, "Alexa, open Art Museum."
"It's an awesome experience, especially in a time when we all have to stay at home, to be able to browse through an art museum in Chicago," said Arindam Mandal, director of Dialog Services for Alexa. "This was one of the first skills that had a conversational experience for browsing through art, where you felt like you were in the museum."
An innovative way to navigate media
Art Museum developers John Gillilan and Katy Boungard initially created a prototype for the concept during a hackathon at the AWS re:Invent conference in 2018. When the Alexa Conversations challenge came up last year, they recognized the opportunity to build on the idea of exploring a catalog of cultural assets in a new way.
Based in Los Angeles, Gillilan and Boungard do consulting work with media companies to explore the creative potential of voice and more natural, conversational AI.
"Voice is often utility-focused," Gillilan said. "We both always approached voice technology with a content and media sensibility. That's what excites us about the technology."
🗣🎨 @katybow and I built a #VoiceFirst Art Museum powered by @artinstitutechi's new public API. It won Grand Prize in the Alexa Conversations Skills Challenge. Read more on how we made it, and how we’re paying it forward in Chicago and beyond. https://t.co/dy0tVNkPIg
— John Gillilan (@bondad) January 27, 2021
Coding for voice can be deceptively complex. Take, for example, something as simple as ordering a pizza. Someone placing an order might submit two data points at once by asking for a "medium pizza with two toppings." They then might decide to revise that order by saying something like, "make that a large." When all is said and done, a developer might be accounting for thousands of dialogue paths to fulfill one pizza order.
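To give a rough sense of how quickly those paths multiply, the sketch below counts the ways a customer could supply a given number of details when each turn may carry one or more of them. It is an illustrative back-of-the-envelope model, not a description of how any Alexa tooling counts dialogue paths, and the function name `count_fill_orders` is made up for this example.

```python
from math import comb


def count_fill_orders(num_slots: int) -> int:
    """Count the ways a user could supply `num_slots` details,
    where every turn provides one or more details not yet given."""
    counts = [1]  # zero details: exactly one (empty) conversation
    for n in range(1, num_slots + 1):
        # Pick which k details arrive in the first turn,
        # then count the orderings of the remaining n - k details.
        counts.append(
            sum(comb(n, k) * counts[n - k] for k in range(1, n + 1))
        )
    return counts[num_slots]


for slots in range(1, 7):
    print(slots, count_fill_orders(slots))
```

Under this model, an order with six details can already arrive in 4,683 different ways, and that is before counting revisions such as "make that a large."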
Alexa Conversations reduces the amount of code a developer needs to write by using deep learning to extrapolate different phrasing variations and dialogue paths based on samples the developer provides. For Art Museum, this enables art collections that are dynamically built based on simple requests from users — whether or not they are familiar with the art.
"When designing Alexa skills without Alexa Conversations, you really have to map and plan for what a user might ask for at every turn," Boungard said. "Alexa Conversations allows you the flexibility to capture that without creating specific dialogue flows."
A user could ask to see French paintings, for example, and then suddenly decide to switch things up and ask for paintings from Italy. The context management that Alexa Conversations provides helped make that sort of transition seamless, Boungard said. The developers also used Amazon Rekognition to generate additional descriptive tags that reflect how people might visually describe art, such as "water" or "tree."
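The effect of that kind of context carryover can be pictured with a small hand-written sketch. This is only an illustration of the behavior, not the mechanism Alexa Conversations uses (that is a learned model), and the names `update_context`, `art_type`, and `origin` are assumptions made for the example.

```python
def update_context(context: dict, new_slots: dict) -> dict:
    """Return a new context in which freshly mentioned details
    override earlier ones and everything else carries over."""
    merged = dict(context)
    merged.update({k: v for k, v in new_slots.items() if v is not None})
    return merged


context = {}
# "Show me French paintings"
context = update_context(context, {"art_type": "painting", "origin": "France"})
# A follow-up that mentions only the new country
context = update_context(context, {"origin": "Italy"})
print(context)  # {'art_type': 'painting', 'origin': 'Italy'}
```

The request for paintings survives the switch from France to Italy, so the customer does not have to restate it.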
The Art Institute of Chicago welcomed the new skill. "We were excited to make our API available to the public, because we knew people would build things that we wouldn't have conceived of ourselves," said Nikhil Trivedi, the institute's director of engineering. "Katy and John's Alexa skill is one of many examples we've started to see—a tool that combines an exploration of our collection with the rich trove of audio content we have developed over the years."
The AI behind Alexa Conversations
Up until now, tool kits for voice have "institutionalized the knowledge of building experiences that are linear, and they make it really easy to achieve those linear paths. That's why when you deploy them, they don't work very well if customers deviate from those linear paths," Mandal said.
Instead, Alexa Conversations encourages developers to work backward from the natural dialogue experience they want to create. To help with that process, Amazon has published guidelines on authoring sample dialogues, starting with creating a simple exchange and customizing from there.
"At the heart of dialogue management, which is what Alexa Conversations is all about, is looking at a sequence of utterances and interpreting what is the best intent of the user at this turn, and what action should I take?" Mandal observed.
The core of Alexa Conversations rests on a deep learning model that can interpret language without having to be trained on all possible variations of it. The model is trained through simulated human and machine dialogues, so developers don't need to bring their own training data. Instead, they provide sample dialogues, also specifying when to invoke APIs along with their required arguments, so the dialogue manager can gather the information to trigger the developer’s skill code.
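One way to picture the role of required arguments is the toy loop below: keep asking for whatever is missing, and call the API once everything is present. It is a simplified sketch under stated assumptions, not the Alexa Conversations authoring format or its model, and `ApiDefinition`, `next_action`, and `order_pizza` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ApiDefinition:
    """A hypothetical API the skill exposes, with its required arguments."""
    name: str
    required_args: Tuple[str, ...]


def next_action(api: ApiDefinition, collected: Dict[str, object]) -> Tuple[str, str]:
    """Decide whether to ask for a missing argument or invoke the API."""
    missing = [arg for arg in api.required_args if arg not in collected]
    if missing:
        return ("request_arg", missing[0])
    return ("invoke_api", api.name)


ORDER_PIZZA = ApiDefinition("order_pizza", ("size", "topping_count"))

collected: Dict[str, object] = {}
print(next_action(ORDER_PIZZA, collected))  # still needs a size

# "A medium pizza with two toppings" supplies both arguments at once.
collected.update({"size": "medium", "topping_count": 2})
print(next_action(ORDER_PIZZA, collected))  # ready to call the API
```

In Alexa Conversations the decision about what to ask next is predicted by the trained model from the sample dialogues rather than coded by hand like this.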
Browse our collection with your Alexa device or hang a favorite artwork on your wall in Animal Crossing. Our API creates new doors through which people can enter the museum—featuring metadata on over 100,000 works and information on all our exhibitions. https://t.co/a3KIB9eVxF
— The Art Institute of Chicago (@artinstitutechi) January 27, 2021
Alexa Conversations can "directly go from words to predicting the APIs," Mandal said. "That’s the future of authoring spoken dialog experiences with minimal developer effort."
Gillilan and Boungard said the flexibility of Alexa Conversations encourages a whole different way of thinking about how to design and build voice interactions. As Mandal noted, many developers have gotten used to thinking about voice experiences of all types in a linear way — that will change as it becomes easier to build more natural, flexible skills.
"I've worked on stuff before that is transaction-oriented where I've had to build that scaffolding by hand," Gillilan said. "Having Alexa Conversations for those projects would have made them a lot easier."
For more information on Alexa Conversations, visit the Alexa Developer blog.