“Robin deals with a world where things are changing all around it”

An advanced perception system, which detects and learns from its own mistakes, enables Robin robots to select individual objects from jumbled packages — at production scale.

Inside an Amazon fulfillment center, as packages roll down a conveyor, the Robin robotic arm goes to work. It dips, picks up a package, scans it, and places it on a small drive robot that routes it to the correct loading dock. By the time the drive has dropped off its package, Robin has loaded several more delivery robots.

While Robin looks a lot like other robotic arms used in industry, its vision system enables it to see and react to the world in an entirely different way.

“Most robotic arms work in a controlled environment,” explained Charles Swan, a senior manager of software development at Amazon Robotics & AI. “If they weld vehicle frames, for example, they expect the parts to be in a fixed location and follow a pre-scripted set of motions. They do not really perceive their environment.

“Robin deals with a world where things are changing all around it. It understands what objects are there — different sized boxes, soft packages, envelopes on top of other envelopes — and decides which one it wants and grabs it. It does all these things without a human scripting each move that it makes. What Robin does is not unusual in research. But it is unusual in production.”

Yet, thanks to machine learning, Robin and its advanced perception system are moving rapidly into production. When Swan began working with the robot in 2021, Amazon was operating only a couple dozen units at its fulfillment centers. Today, Swan’s team is significantly scaling that perception system.

To reach that goal, Amazon Robotics researchers are exploring ways for Robin to achieve unparalleled levels of production accuracy. Because Amazon is so focused on improving the customer experience through timely deliveries, even 99.9% accuracy doesn’t meet the mark for robotics researchers.

Training day

Over the past five years, machine learning has significantly advanced the ability of robots to see, understand, and reason about their environment.

Robin perception testing
Model 1 from October 2021 — The model misses two black packages and one occluded package.

In the past, classical computer vision algorithms systematically segmented scenes into individual elements, a slow and computationally intensive approach. Supervised machine learning has made that process more efficient.

Model 2 from November 2021 — The black packages are detected, but a heavily occluded one is still missed.

“We don’t explicitly say how the model should learn,” said Bhavana Chandrashekhar, a software development manager at Amazon Robotics & AI. “Instead, we give it an input image and say, ‘This is an object.’ Then it tries to identify the object in the image, and we grade how well it does that. Using only that supervised feedback, the model learns how to extract features from the images so it can classify the objects in them.”

Model 3 from February 2022 — All packages are correctly detected.

Robin’s perception system started with pre-trained models that could already identify object elements like edges and planes.

Next, it was taught to identify the type of packages found within the fulfillment center’s sortation area.

Machine learning models learn best when provided with an abundance of sample images. Yet, despite shipping millions of packages daily, Chandrashekhar’s team initially found it hard to find enough training data to capture the enormous variation of the boxes and packages continuously rolling down a conveyor.

“Everything comes in a jumble of sizes and shapes, some on top of the other, some in the shadows,” Chandrashekhar said. “During the holidays, you might see pictures of Minions or Billie Eilish mixed in with our usual brown and white packages. The taping might change.

“Sometimes, the differences between one package and another are hard to see, even for humans. You might have a white envelope on another white envelope, and both are crinkled so you can’t tell where one begins and the other ends,” she explained.

To teach Robin’s model to make sense of what it sees, researchers gathered thousands of images, drew lines around features such as boxes, labels, and yellow, brown, and white mailers, and added descriptions. The team then used these annotated images to continually retrain the robot.

The training continued in a simulated production environment, with the robot working on a live conveyor with test packages.

Whenever Robin failed to identify an object or make a pick, the researchers would annotate the errors and add them to the training deck. This ongoing training regimen significantly improved the robot’s efficiency.

Continual learning

Robin’s success rate during these tests improved markedly, but the researchers pushed for near perfection. “We want to be really good at these random edge problems, which happen only a few times during testing, but occur more often in the field when we’re running at larger scale,” Chandrashekhar said.

Because of Robin’s high accuracy rate in testing, researchers found it difficult to find enough of those mistakes to create a dataset for further training. “In the beginning, we had to imagine how the robot would make a mistake in order to create the type of data we could use to improve the model,” Chandrashekhar explained.

The Amazon team also monitored Robin’s confidence in its decisions. The perception model might, for example, indicate it was confident about spotting a package, but less confident about assigning it to a specific type of package. Chandrashekhar’s team developed a framework to ensure those low-confidence images were automatically sent for annotation by a human and then added back to the training deck.
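The routing step described above can be sketched in a few lines. This is a minimal illustration, not Amazon's framework: the `Detection` fields, the `route_for_annotation` helper, and the 0.80 threshold are all assumptions made for the example, since the article does not describe the real data model or cutoff.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cutoff; the article does not state the value used in production.
ANNOTATION_THRESHOLD = 0.80


@dataclass
class Detection:
    image_id: str
    detection_confidence: float  # confidence that a package is present
    type_confidence: float       # confidence in the assigned package type


def needs_annotation(det: Detection, threshold: float = ANNOTATION_THRESHOLD) -> bool:
    """Flag a detection when either confidence score falls below the threshold."""
    return min(det.detection_confidence, det.type_confidence) < threshold


def route_for_annotation(detections: list[Detection]) -> list[str]:
    """Return IDs of images to send to human annotators, in order, without duplicates."""
    queued: list[str] = []
    for det in detections:
        if needs_annotation(det) and det.image_id not in queued:
            queued.append(det.image_id)
    return queued
```

In this sketch, an image with a confidently detected package but an uncertain package type is still queued, which mirrors the case the article mentions.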

“This is part of continual learning,” says Jeremy Wyatt, senior manager of applied science. “It’s incredibly powerful because every package becomes a learning opportunity. Every robot contributes experiences that help the entire fleet get better.”

That continual learning led to big improvements. “In just six months, we halved the number of packages Robin’s perception system can’t pick and we reduced the errors the perception system makes by a factor of 10,” Wyatt notes.

Still, robots will make mistakes in production that have to be corrected. What happens in the moment if Robin drops a package or puts two mailers on one sortation robot? While most production robots are oblivious to mistakes, Robin is an exception. It monitors its performance for missteps.

Robin’s quality assurance system oversees how it handles packages. If it identifies a problem, it will try to fix it on its own, or call for human intervention if it cannot. “If Robin finds and corrects a mistake, it might lose some time,” Swan explained. “However, if that error wasn’t addressed at all, we might lose a day or two getting that product to the customer.”

Scaling Robin perception

Swan joined the Robin perception team when there were only a few dozen units in production. His goal: scale the perception system to thousands of robotic arms. To accomplish this, Swan’s team doesn’t just focus on catching and annotating errors for continual learning; it also seeks the root cause of those errors.

They rely on Robin perception’s user interface, which lets engineers look through the robot’s eyes and trace how its vision system made the decision. They might, for example, find a Robin that picked up two packages because it could not distinguish one from the other, or another that failed to grab any package owing to a noisy depth signal. Auditing Robin’s decisions lets Amazon Robotics engineers fine-tune the robot’s behaviors.

This is complemented by the metrics derived from a fleet of machines sorting well over 1 million items every day. “Once you have that kind of data, then you can start to look for correlations,” Swan said. “Then you can say the latency in making a decision is related to this property of the machine or this property of the scene and that’s something we can focus on.”
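As a rough illustration of the kind of correlation Swan describes, the sketch below computes a Pearson coefficient between a scene property and decision latency. The `pearson` helper and the idea of using package count as the scene property are assumptions for the example; the article does not say which statistics or properties the team actually uses.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / math.sqrt(var_x * var_y)
```

A call such as `pearson(packages_per_scene, latency_ms)` over fleet logs would return a value near 1 if cluttered scenes consistently take longer to decide on, and a value near 0 if the two are unrelated. A strong correlation only points engineers at where to look; it does not by itself establish the cause.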

Fleet metrics provide data about a greater range of scenes and problems than any one machine would ever see, from a broken light to an address label stuck on the conveyor belt. That data, used to retrain Robin every few days, gives it a much broader understanding of the world in which it works.

It also helps Amazon improve efficiency. Before Robin picks up a package, it must first segment a cluttered scene, decide which package it will grab, calculate how it will approach the package, and choose how many of its eight suction cups to use to pick it up. Choose too many and it might lift more than one package; too few, and it could drop its cargo.

That decision requires much more than computer vision. “Making decisions on what and where to grasp is accomplished with a combination of learning systems, optimization, geometric reasoning, and 3D understanding,” explained Nick Hudson, principal applied scientist with Amazon Robotics AI. “There are a lot of components which interact, and they all need to accommodate the variations seen across different sites and regions.”
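The suction-cup trade-off can be illustrated with a purely geometric toy model. Everything here is hypothetical: the 2 x 4 cup layout, the cup radius, the `Rect` type, and the rule of activating only cups that sit fully on the target package are assumptions for the sketch, and the real system combines learning, optimization, and 3D reasoning as Hudson describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned top face of the target package, in centimetres."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


CUP_RADIUS_CM = 2.0  # hypothetical cup size
# Hypothetical 2 x 4 layout of cup centres relative to the gripper centre.
CUP_OFFSETS = [(x, y) for y in (-3.0, 3.0) for x in (-9.0, -3.0, 3.0, 9.0)]


def cups_to_activate(grasp_x: float, grasp_y: float, target: Rect,
                     radius: float = CUP_RADIUS_CM) -> list[int]:
    """Return indices of cups whose whole footprint lies on the target package.

    Cups that overhang the target are left off so they cannot grip a neighbour.
    """
    active = []
    for index, (dx, dy) in enumerate(CUP_OFFSETS):
        cx, cy = grasp_x + dx, grasp_y + dy
        fully_inside = (
            target.x_min + radius <= cx <= target.x_max - radius
            and target.y_min + radius <= cy <= target.y_max - radius
        )
        if fully_inside:
            active.append(index)
    return active
```

In this toy model, a large box centred under the gripper gets all eight cups, a small mailer gets only the inner ones, and an empty result signals that the grasp point should be reconsidered.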

“There is always a tradeoff between efficiency and good decisions,” Swan continued. “That was a major scaling challenge. We did a lot of experimentation offline with very cluttered scenes and other situations that slowed the robots down to improve our algorithms. When we liked them, we would run them on a small portion of the fleet. If they did well, we would roll them out to all the robots.”
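The staged rollout Swan describes resembles a canary deployment, which could be sketched as follows. The function names, the metrics compared, and the promotion rule are assumptions for illustration; the article does not specify how Amazon selects the trial robots or judges that an algorithm "did well."

```python
from __future__ import annotations

import random


def pick_canary_robots(robot_ids: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Pick a reproducible random subset of the fleet to trial a new algorithm."""
    if not robot_ids:
        raise ValueError("fleet is empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    count = max(1, round(len(robot_ids) * fraction))
    return sorted(random.Random(seed).sample(robot_ids, count))


def should_roll_out(baseline_success: float, canary_success: float,
                    baseline_cycle_s: float, canary_cycle_s: float,
                    max_slowdown: float = 0.0) -> bool:
    """Promote only if the candidate is at least as accurate and not too slow.

    max_slowdown is the tolerated relative increase in cycle time (0.1 = 10%).
    """
    accurate_enough = canary_success >= baseline_success
    fast_enough = canary_cycle_s <= baseline_cycle_s * (1 + max_slowdown)
    return accurate_enough and fast_enough
```

Checking both success rate and cycle time reflects the trade-off between efficiency and good decisions: a candidate that picks more reliably but slows every robot down would not be promoted unless the slowdown is within the stated tolerance.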

Those rollouts were also made possible because the software was rewritten to support regular updates, said Sicong Zhao, a software development manager. “The software is modular. That way, we can upgrade one component without affecting the others. It also enables multiple groups to work on different improvements at the same time.” That modularity has enabled key parts of the perception system to be automatically retrained twice a week.

Nor was that a simple task. Robin had many tens of thousands of lines of code, so it took Zhao’s team months to understand how those lines interacted with one another well enough to modularize their components. The effort was worth it. It made Robin easier to upgrade and will ultimately enable automatic fleet updates as frequently as needed while mitigating operational disruptions.

Next-generation robot perception

Those continuous improvements are essential to deploy Robin at Amazon’s scale, Swan explained. The team’s goal is to update the fleet of Robin robots automatically several times weekly.

“We are increasing our usage of Robin,” Swan said. “To do that, we must continue to improve Robin’s ability to handle those random edge cases, so it never mis-sorts, has great motion planning, and moves at the fastest safe speed its arm can handle — all with time to spare.”

That means even more innovation. Take, for example, package recognition. Robin’s perception system needs to be able to spot a pile of packages and know to start with the top one to avoid upending the pile. “Robin has a sense of how to do that as well, but we need machine learning to accelerate the way Robin decides which one it is most likely to pick up successfully as we keep adding new types of packaging,” Zhao explained.

Chandrashekhar believes more powerful digital simulations, based on the physics of robot and package movement, will enable faster innovation. “This is very difficult when we’re talking about deformable packages, like a water bottle in a soft mailer,” she said. “But we’re getting a lot closer.”

Longer-term, she wants to see self-learning robots that teach themselves to make fewer mistakes and to recover from them faster. Self-learning will also make the robots easier to use. “Deploying a robot shouldn’t require a PhD,” Swan said.

“There is a unique opportunity to have this fleet adapt automatically,” agreed Hudson. “There are open questions on how to accomplish this, including whether individual robots should adapt on their own. The fleet already updates its object understanding using data collected worldwide. How can we also have the individual robots adapt to issues they are seeing locally – for instance if one of the suction cups is blocked or torn?”

Ultimately, though, Swan would like to use what Amazon Robotics researchers have learned to create new types of robots. “We’ve only scratched the surface of what’s possible with robots,” he said.
