[Editor's note: A condensed version of this blog post appeared previously in IEEE Spectrum]
"What do I need for cold-weather golf?”
“What are the differences between trail shoes and running shoes?”
“What are the best dinosaur toys for a five-year-old?”
These are some of the open-ended questions customers might ask a helpful sales associate in a brick-and-mortar store. But what about shopping online? How do customers navigate the vast selection available online?
The answer for Amazon is Rufus, a generative-AI-powered shopping assistant. Rufus helps Amazon customers make more-informed shopping decisions by answering a wide range of questions in the Amazon Shopping app, from product details and comparisons to recommendations. And it exists because of advances and innovations in AI.
My team leads the scientific and engineering effort behind Rufus. To build a helpful conversational shopping assistant, it was critical to use innovative technologies across multiple aspects of generative AI. This included building a custom large language model (LLM) specialized for shopping; employing retrieval-augmented generation (RAG) with a variety of evidence sources; leveraging reinforcement learning to improve responses; making advances in high-performance computing to improve inference efficiency and reduce latency; and implementing a new streaming architecture to support a better customer experience. We selected AWS infrastructure, including the AWS chips Trainium and Inferentia, to deliver these experiences at scale because AWS offers the most advanced, secure, reliable, and price-performant generative-AI foundations.
Building a custom LLM from scratch
Most LLMs are trained on datasets that improve their overall knowledge and capabilities, and then they are customized for a particular domain. From the beginning, our aim with Rufus was to train it primarily with shopping data — the entire Amazon catalogue, for starters, as well as customer reviews and information from community Q&A posts. Our scientists built an advanced, custom LLM that incorporated these data sources along with public information on the web, carefully curating the contribution of each data source to model training. We used the AWS service Amazon EMR — a cloud big-data platform for large-scale distributed data processing — to prepare the data, and Amazon S3, the leading cloud storage solution, to store it. Both services played a pivotal role in creating a secure and reliable foundation for building the custom model.
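To make the curation step more concrete, here is a minimal sketch of how per-source sampling fractions might be applied when assembling a training mixture on Spark (the engine behind Amazon EMR jobs). The bucket paths, source names, and fractions are illustrative assumptions, not the actual Rufus configuration.

```python
# Hypothetical sketch of curating a training mixture with per-source sampling
# fractions on Spark (as run on Amazon EMR). Paths, source names, and fractions
# are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import rand

spark = SparkSession.builder.appName("curate-training-mixture").getOrCreate()

# Assumed sampling fraction per source; in practice these would be tuned empirically.
SOURCES = {
    "s3://example-bucket/catalogue/":     0.40,
    "s3://example-bucket/reviews/":       0.30,
    "s3://example-bucket/community-qa/":  0.20,
    "s3://example-bucket/public-web/":    0.10,
}

mixture = None
for path, fraction in SOURCES.items():
    df = spark.read.json(path).select("text")  # each source normalized to a "text" column
    sampled = df.sample(withReplacement=False, fraction=fraction, seed=42)
    mixture = sampled if mixture is None else mixture.unionByName(sampled)

# Shuffle the combined mixture and write it back to S3 for model training.
mixture.orderBy(rand(seed=42)).write.mode("overwrite").parquet(
    "s3://example-bucket/training-mixture/"
)
```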
Using RAG to surface well-sourced answers
To answer the vast span of questions that could possibly be asked, Rufus needs to be able to go beyond its training data and use information it hasn’t seen before. That’s where RAG comes in: before generating a response, the LLM first selects information that may be helpful in answering the shopper’s question.
But how does it know which information to choose? Rufus pulls information from sources it knows to be reliable, such as customer reviews, the product catalogue, and community questions and answers, and it also calls relevant Stores APIs. The complexity of our RAG process is unique, both because of the variety of our data sources and because each one’s relevance differs depending on the question.
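As a rough illustration of the RAG flow described above, the sketch below routes a question to a few evidence sources, gathers snippets, and augments the prompt before generation. The retriever stubs, the keyword-based router, and the llm.generate interface are hypothetical stand-ins, not the production logic.

```python
# Hypothetical RAG sketch: decide which evidence sources matter for a question,
# retrieve supporting snippets, and prepend them to the prompt before generating.
# The retriever stubs, the router heuristic, and the llm interface are assumptions.
from typing import Callable

def retrieve_catalog(question: str, asin: str) -> list[str]:
    return []  # stub: query product-catalogue attributes for this item

def retrieve_reviews(question: str, asin: str) -> list[str]:
    return []  # stub: query the customer-review index

def retrieve_community_qa(question: str, asin: str) -> list[str]:
    return []  # stub: query community questions and answers

def call_stores_api(question: str, asin: str) -> list[str]:
    return []  # stub: call a relevant Stores API (e.g., availability or delivery)

RETRIEVERS: dict[str, Callable[[str, str], list[str]]] = {
    "catalog": retrieve_catalog,
    "reviews": retrieve_reviews,
    "community_qa": retrieve_community_qa,
    "stores_api": call_stores_api,
}

def answer(question: str, asin: str, llm) -> str:
    # 1. Pick sources relevant to this question. A keyword heuristic stands in here
    #    for whatever learned routing the production system uses.
    selected = ["catalog", "stores_api"]
    if any(w in question.lower() for w in ("good", "worth", "quality", "comfortable")):
        selected += ["reviews", "community_qa"]

    # 2. Retrieve evidence from each selected source.
    evidence = [s for name in selected for s in RETRIEVERS[name](question, asin)]

    # 3. Augment the prompt with the evidence and generate a grounded answer.
    context = "\n".join(f"- {snippet}" for snippet in evidence)
    prompt = (
        "Answer the shopper using only the evidence below.\n\n"
        f"Evidence:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm.generate(prompt)
```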
Constantly improving through reinforcement learning
Every LLM, and every use of generative AI, is a work in progress. For Rufus to become more helpful over time, it needs to learn which responses are helpful and which can be improved. Customers are the best source of that information. Amazon encourages customers to give Rufus feedback, letting the model know whether they liked or disliked an answer, and through a process called reinforcement learning, Rufus learns from that feedback and improves its responses, generating answers that better help customers shop.
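As a toy illustration of that feedback loop, the sketch below records thumbs-up/thumbs-down signals and uses a simple epsilon-greedy policy over estimated rewards. This is a bandit-style stand-in for illustration only, not the reinforcement-learning method Rufus actually uses, and all names are assumptions.

```python
# Toy sketch of learning from customer feedback. A real system would train a reward
# model and use it in a reinforcement-learning loop; this stand-in only illustrates
# how liked/disliked signals become a reward that shapes future responses.
import random

feedback_log: list[tuple[str, str, int]] = []  # (question, response, +1 liked / -1 disliked)

def record_feedback(question: str, response: str, liked: bool) -> None:
    feedback_log.append((question, response, 1 if liked else -1))

def estimated_reward(question: str, response: str) -> float:
    # Average observed feedback for this (question, response) pair; unseen pairs score 0.
    scores = [r for q, resp, r in feedback_log if q == question and resp == response]
    return sum(scores) / len(scores) if scores else 0.0

def choose_response(question: str, candidates: list[str], epsilon: float = 0.1) -> str:
    # Epsilon-greedy policy: usually serve the candidate with the best estimated reward,
    # but occasionally explore so newer responses still collect feedback.
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: estimated_reward(question, c))
```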
Low latency and high throughput with Amazon’s AI chips
Rufus needs to be able to engage with millions of customers simultaneously without any noticeable delay. This is particularly challenging since generative-AI applications are very computationally intensive, especially at Amazon’s scale.
To minimize latency while maximizing throughput, we turned to Amazon’s Trainium and Inferentia chips, which are integrated with core AWS services. We collaborated with the Neuron compiler team to implement optimizations that improve model inference efficiency, and we’ve made those optimizations available to all AWS customers. Choosing Amazon’s homegrown AI chips allowed our team to move quickly, deploy at scale, and keep pace with customer demand.
With LLMs, however, throughput and latency can still be compromised by standard methods for processing requests in batches. That’s because it’s difficult to predict how many tokens (in this case, units of text, such as words or punctuation marks) an LLM will generate as it composes a response. Our scientists worked with AWS to enable Rufus to use continuous batching, a novel technique specific to LLM inference that makes routing decisions for new requests after every token is generated. This enables the model to start serving new requests as soon as the first request in the batch finishes, rather than waiting for all the requests to finish, so shoppers get their answers faster.
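Here is a simplified sketch of the scheduling idea behind continuous batching: after every decode step, finished requests leave the batch and queued requests are admitted immediately, so one long answer never holds up the rest. The model interface and token handling are placeholders.

```python
# Simplified sketch of continuous (token-level) batching. After every decode step,
# finished requests leave the batch and queued requests are admitted immediately,
# instead of waiting for the whole batch to drain as in static batching.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_tokens: int
    tokens: list[str] = field(default_factory=list)

def decode_step(active: list[Request]) -> None:
    # Placeholder for one forward pass that appends one token to every active request.
    for req in active:
        req.tokens.append("<tok>")

def is_finished(req: Request) -> bool:
    return len(req.tokens) >= req.max_tokens or (req.tokens and req.tokens[-1] == "<eos>")

def serve(queue: deque[Request], max_batch_size: int = 8) -> None:
    active: list[Request] = []
    while queue or active:
        # Admit new requests whenever there is free capacity, not only when the
        # current batch has fully completed.
        while queue and len(active) < max_batch_size:
            active.append(queue.popleft())

        decode_step(active)

        # Retire finished requests so their slots free up on the very next iteration.
        active = [r for r in active if not is_finished(r)]
```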
Streaming architecture
We want Rufus to provide the most relevant and helpful answer to any given question. Sometimes that’s a long-form text answer, but sometimes it’s short-form text, or a clickable link that helps the customer navigate the store.
Displaying the answer in a way that’s easy for the customer to interpret presents its own technical difficulties. The information needs to follow a logical flow; if we didn’t group and format things correctly, we could end up with a confusing response that isn’t very helpful.
With an advanced streaming architecture, Rufus is able to provide a natural customer experience. End-to-end streaming on a token-by-token basis means that customers don’t need to wait for a long answer to be fully generated. Instead, they get the first part of the answer while the rest is still in progress. Rufus populates the streaming response with the right data — a process called hydration — by making queries to internal systems. In addition to answering the customer’s question, it is trained to generate markup instructions that specify how various answer elements should be displayed, resulting in a uniquely useful experience for the customer.
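To illustrate hydration in a streaming setting, the sketch below assumes the model emits markup placeholders inline with ordinary tokens; the server resolves each placeholder by querying an internal system before forwarding it, so text keeps streaming while structured elements are filled in. The tag format, product ID, and lookup function are invented for the example.

```python
# Illustrative sketch of streaming with hydration: the model emits plain tokens plus
# markup placeholders (tag syntax assumed here), and the server resolves placeholders
# via internal systems before forwarding them to the client.
import re
from typing import Iterable, Iterator

PLACEHOLDER = re.compile(r'<product id="(.*?)"/>')

def lookup_product_card(product_id: str) -> str:
    # Stand-in for a query to an internal catalogue service.
    return f"[card for {product_id}]"

def hydrate_stream(tokens: Iterable[str]) -> Iterator[str]:
    for token in tokens:
        match = PLACEHOLDER.fullmatch(token)
        if match:
            # Replace the markup instruction with rendered content before streaming it on.
            yield lookup_product_card(match.group(1))
        else:
            yield token  # ordinary text tokens pass through immediately

# Usage: tokens arrive one at a time from the LLM and are forwarded as soon as possible.
for chunk in hydrate_stream(["Trail", " shoes", " grip", " better: ", '<product id="B00TRAIL1"/>']):
    print(chunk, end="", flush=True)
```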
Even though Amazon has been using AI for more than 25 years to improve the customer experience, generative AI represents something new and transformative — for Amazon, its customers, and those of us on science teams who get to build experiences beyond what we thought was possible. We are excited to accelerate our pace of innovation for customers with generative AI and believe it will transform every customer experience in the months and years to come.