Responsible AI in the wild: Lessons learned at AWS

Real-world deployment requires notions of fairness that are task relevant and responsive to the available data, recognition of unforeseen variation in the “last mile” of AI delivery, and collaboration with AI activists.

When we first joined AWS AI/ML as Amazon Scholars over three years ago, we had already been doing scientific research in the area now known as responsible AI for a while. We had authored a number of papers proposing mathematical definitions of fairness and machine learning (ML) training algorithms enforcing them, as well as methods for ensuring strong notions of privacy in trained models. We were well versed in adjacent subjects like explainability and robustness and were generally denizens of the emerging responsible-AI research community. We even wrote a general-audience book on these topics to try to explain their importance to a broader audience.

So we were excited to come to AWS in 2020 to apply our expertise and methodologies to the ongoing responsible-AI efforts here — or at least, that was our mindset on arrival. But our journey has taken us somewhere quite different, somewhere more consequential and interesting than we expected. It’s not that the definitions and algorithms we knew from the research world aren’t relevant — they are — but rather that they are only one component of a complex AI workstream comprising data, models, services, enterprise customers, and end-users. It’s also a workstream in which AWS is uniquely situated due to its pioneering role in cloud computing generally and cloud AI services specifically.

Our time here has revealed to us some practical challenges of which we were previously unaware. These include diverse data modalities, “last mile” effects with customers and end-users, and the recent emergence of AI activism. Like many good interactions between industry and academia, what we’ve learned at AWS has altered our research agenda in healthy ways. In case it’s useful to anyone else trying to parse the burgeoning responsible-AI landscape (especially in the generative-AI era), we thought we’d detail some of our experiences here.

Modality matters

One of our first important practical lessons might be paraphrased as “modality matters”. By this we mean that the particular medium in which an AI service operates (such as visual images or spoken or written language) matters greatly in how we analyze and understand it from both performance and responsible-AI perspectives.

Consider specifically the desire for trained models to be “fair”, or free of significant demographic bias. Much of the scientific literature on ML fairness assumes that the features used to compare performance across groups (which might include gender, race, age, and other attributes) are readily available, or can be accurately estimated, in both training and test datasets.

If this is indeed the case (as it might be for some spreadsheet-like “tabular” datasets recording things like medical or financial records, in which a person’s age and gender might be explicit columns), we can more easily test a trained model for bias. For instance, in a medical diagnosis application we might evaluate the model to make sure the error rates are approximately the same across genders. If these rates aren’t close enough, we can augment our data or retrain the model in various ways until the evaluation is passed to satisfaction.
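
A minimal sketch of such a check in Python, assuming a tabular test set with an explicit gender column. The function names, the toy labels, and the 0.10 tolerance are all hypothetical and chosen only for illustration; the source does not specify how "close enough" is defined.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: fraction of examples the model got wrong}."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / counts[g] for g in counts}


def max_error_gap(rates):
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): true diagnoses, model predictions, and an
# explicit gender column, as in a tabular dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

TOLERANCE = 0.10  # hypothetical threshold; the right value is application-specific

rates = error_rates_by_group(y_true, y_pred, gender)
gap = max_error_gap(rates)
passes = gap <= TOLERANCE
print(rates, gap, passes)
```

On this toy data the model errs on one of four examples in one group and two of four in the other, so the check fails and the data or model would need another pass.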

But many cloud AI/ML services operate on data that simply does not contain explicit demographic information. Rather, these services live in entirely different modalities such as speech, natural language, and vision. Applications such as our speech recognition and transcription services take as input time series of frequencies that capture spoken utterances. Consequently, the data contains no direct annotations of things like gender, race, or age.

What can be more readily detected from speech data, and is also more directly related to performance, is the speaker's regional dialect or accent, of which there are dozens in North American English alone. English-language speech can also feature non-native accents, influenced more by the first languages of the speakers than by the regions in which they currently live. This presents an even more diverse landscape, given the large number of first languages and the international mobility of speakers. And while spoken accents may be weakly correlated or associated with one or more ancestry groups, they are usually uninformative on things like age and gender (speakers with a Philadelphia accent may be young or old; male, female or nonbinary; etc.). Finally, the speech of even a particular person may exhibit many other sources of variation, such as situational stress and fatigue.

Figure: Data (such as regional variations in word choice and accents) may lead toward alternative notions of fairness that are more task-relevant, as with word error rates across dialects and accents.

What is the responsible-AI practitioner to do when confronted with so many different accents and other moving parts, in a task as complex as speech transcription? At AWS, our answer is to meet the task and data on their own terms, which in this case involves some heavy lifting: meticulously gathering samples from large populations of representative speakers with different accents and carefully transcribing each word. The “representative” is important here: while it might be more expedient to (for instance) gather this data from professional actors trained in diction, such data would not be typical of spoken language in the wild.

We also gather speech data that exhibits variability along other important dimensions, including the acoustic conditions during recording (varying amounts and types of background noise, recordings made via different mobile-phone handsets, whose microphones may vary in quality, etc.). The sheer number of combinations makes obtaining sufficient coverage challenging. (In some domains such as computer vision, coverage issues that are similar — variability across visual properties such as skin tone, lighting conditions, indoor vs. outdoor settings, and so on — have led to increased interest in synthetic data to augment human-generated data, including for fairness testing here at AWS.)

Once curated, such datasets can be used for training a transcription model that is not only good overall but also roughly equally performant across accents. And “performant” here means something more complex than in a simple prediction task; speech recognition typically uses a measure like the word error rate. On top of all the curation and annotations above, we also annotate some data by self-reported speaker demographics to make sure we’re fair not just by accent but by race and gender as well, as detailed in the service’s accompanying service card.
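
To make the metric concrete, here is a minimal sketch of word error rate computed per accent. It assumes the standard definition (word-level edit distance between the reference transcript and the model's output, divided by the number of reference words) and pools counts within each accent group. The accent labels and sentences are invented for illustration, and nothing here describes the actual AWS evaluation pipeline.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Minimum substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer_by_accent(samples):
    """samples: iterable of (accent, reference, hypothesis) strings.

    Returns {accent: total word edits / total reference words}.
    """
    edits = defaultdict(int)
    words = defaultdict(int)
    for accent, reference, hypothesis in samples:
        ref_words = reference.split()
        edits[accent] += word_edit_distance(ref_words, hypothesis.split())
        words[accent] += len(ref_words)
    return {a: edits[a] / words[a] for a in words if words[a] > 0}


# Hypothetical transcripts: (accent label, human reference, model output).
samples = [
    ("accent_a", "turn the lights off", "turn the lights off"),
    ("accent_a", "call me a cab", "call me a cap"),
    ("accent_b", "turn the lights off", "turn the light of"),
    ("accent_b", "call me a cab", "call me cab"),
]

wer = wer_by_accent(samples)
print(wer)
```

Here the second accent group has three times the word error rate of the first, which is the kind of gap a fairness evaluation by accent is meant to surface.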

Our overarching point here is twofold. First, while as a society we tend to focus on dimensions such as race and gender when speaking about and assessing fairness, sometimes the data simply doesn’t permit such assessments, and it may not be a good idea to impute such dimensions to the data (for instance, by trying to infer race from speech signals). And second, in such cases the data may lead us toward alternative notions of fairness that might be more task-relevant, as with word error rates across dialects and accents.

The last mile of responsible AI

The specific properties of individuals that can or cannot (or should not) be gleaned from a particular dataset or modality are not the only things that may be out of the direct control of AI developers — especially in the era of cloud computing. As we have seen above, it’s challenging work to get coverage of everything you can anticipate. It’s even harder to anticipate everything.

The supply chain phrase “the last mile” refers to the fact that “upstream” providers of goods and products may have limited control over the “downstream” suppliers that directly connect to end-users or consumers. The emergence of cloud providers like AWS has created an AI service supply chain with its own last-mile challenges.

AWS AI/ML provides enterprise customers with API access to services like speech transcription because many want to integrate such services into their own workflows but don’t have the resources, expertise, or interest to build them from scratch. These enterprise customers sit between the general-purpose services of a cloud provider like AWS and the final end-users of the technology. For example, a health care system might want to provide cloud speech transcription services optimized for medical vocabulary to allow doctors to take verbal notes during their patient rounds.

As diligent as we are at AWS at battle-testing our services and underlying models for state-of-the-art performance, fairness, and other responsible-AI dimensions, it is obviously impossible to anticipate all possible downstream use cases and conditions. Continuing our health care example, perhaps there is a floor of a particular hospital that has new and specialized imaging equipment that emits background noise at a specific regularity and acoustic frequency. In the likely event that these exact conditions were not represented in either the training or test data, it’s possible that word error rates will not only be higher overall but will also rise unevenly across accents and dialects.

Such last-mile effects can be as diverse as the enterprise customers themselves. With time and awareness of such conditions, we can use targeted training data and customer-side testing to improve downstream performance. But due to the proliferation of new use cases, it is an ever-evolving process, not one that is ever “finished”.

AI activism: from bugs to bias

It’s not only cloud customers whose last miles may present conditions that differ from those during training and testing. We live in a (healthy) era of what might be called AI activism, in which not only enterprises but individual citizens — including scientists, journalists, and members of nonprofit organizations — can obtain API or open-source access to ML services and models and perform their own evaluations on their own curated datasets. Such tests are often done to highlight weaknesses of the technology, including shortfalls in overall performance and fairness but also potential security and privacy vulnerabilities. As such, they are typically performed without the AI developer’s knowledge and may be first publicized in both research and mainstream media outlets. Indeed, we have been on the receiving end of such critical publicity in the past.

To date, the dynamic between AI developers and activists has been somewhat adversarial: activists design and conduct a private experimental evaluation of a deployed AI model and report their findings in open forums, and developers are left to evaluate the claims and make any needed improvements to their technology. It is a dynamic that is somewhat reminiscent of the historical tensions between more traditional software and security developers and the ethical and unethical hacker communities, in which external parties probe software, operating systems, and other platforms for vulnerabilities and either expose them for the public good or exploit them privately for profit.

Over time the software community has developed mechanisms to alter these dynamics to be more productive than adversarial, in particular in the form of bug bounty programs. These are formal events or competitions in which software developers invite the hacker community to deliberately find vulnerabilities in their technology and offer financial or other rewards for reporting and describing them to the developers.

Figure: In a fair-ML (“bias bounty”) competition, different teams (x-axis) focus on different demographic features (y-axis) in the dataset, indicating that crowdsourced bias mitigation can help contend with the breadth of possible sources of bias. (The darker the blue, the greater the use of the feature.)

In the last couple of years, the ideas and motivations behind bug bounties have been adopted and adapted by the AI development community, in the form of “bias bounties”. Rather than finding bugs in traditional software, participants are invited to help identify demographic or other biases in trained ML models and systems. Early versions of this idea were informal hackathons of short duration focused on finding subsets of a dataset on which a model underperformed. But more recent proposals incubated at AWS and elsewhere include variants that are more formal and algorithmic in nature. The explosion of generative-AI models, and of interest in and concerns about them, has also led to more codified and institutionalized responsible-AI methodologies such as the HELM framework for evaluating large language models.
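
The simplest form of the task described above, finding subsets of a dataset on which a model underperforms, can be sketched as a brute-force search over subgroups defined by one or two feature values. This is only an illustration under stated assumptions: the feature names, the minimum subgroup size, and the margin are hypothetical, and it is not the formal algorithmic variant mentioned above or any specific bounty protocol.

```python
from itertools import combinations


def find_underperforming_subgroups(records, errors, min_size=2, margin=0.1):
    """Flag subgroups whose error rate exceeds the overall rate by `margin`.

    records: list of dicts mapping feature name -> value (same keys in each).
    errors:  list of 0/1 flags, 1 where the model was wrong on that record.
    Returns (overall_rate, findings), where each finding is
    (subgroup definition, subgroup size, subgroup error rate).
    """
    overall = sum(errors) / len(errors)
    features = sorted(records[0].keys())
    findings = []
    for k in (1, 2):  # single features, then pairs of features
        for feats in combinations(features, k):
            buckets = {}
            for rec, err in zip(records, errors):
                key = tuple(rec[f] for f in feats)
                buckets.setdefault(key, []).append(err)
            for key, errs in buckets.items():
                rate = sum(errs) / len(errs)
                if len(errs) >= min_size and rate > overall + margin:
                    findings.append((dict(zip(feats, key)), len(errs), rate))
    findings.sort(key=lambda finding: finding[2], reverse=True)
    return overall, findings


# Hypothetical evaluation set: two features and a per-record error flag.
records = [
    {"age_band": "young", "region": "north"},
    {"age_band": "young", "region": "north"},
    {"age_band": "young", "region": "south"},
    {"age_band": "young", "region": "south"},
    {"age_band": "old", "region": "north"},
    {"age_band": "old", "region": "north"},
    {"age_band": "old", "region": "south"},
    {"age_band": "old", "region": "south"},
]
errors = [0, 0, 0, 0, 0, 0, 1, 1]

overall, findings = find_underperforming_subgroups(records, errors)
for subgroup, size, rate in findings:
    print(subgroup, size, rate)
```

In this toy example the worst subgroup is an intersection of two features, and it looks worse than either feature does on its own. Exhaustive enumeration like this grows quickly with the number of features, which is one reason spreading the search across many participants can help cover more ground.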

We view these recent developments — AI developers opening up their technology and its evaluation to a wider community of stakeholders than just enterprise customers, and those stakeholders playing an active role in identifying necessary improvements in both technical and nontechnical ways — as healthy and organic, a natural outcome of the complex and evolving AI industry. Indeed, such collaborations are in keeping with our recent White House commitments to external testing and model red-teaming.

Responsible AI is neither a problem to be “solved” once and for all, nor a problem that can be isolated to a single location in the pipeline stretching from developers to their customers to end-users and society at large. Developers are certainly the first line where best practices must be established and implemented and responsible-AI principles defended. But the keys to the long-term success of the AI industry lie in community, communication, and cooperation among all those affected by it.

Related content

IN, HR, Gurugram
Our customers have immense faith in our ability to deliver packages timely and as expected. A well planned network seamlessly scales to handle millions of package movements a day. It has monitoring mechanisms that detect failures before they even happen (such as predicting network congestion, operations breakdown), and perform proactive corrective actions. When failures do happen, it has inbuilt redundancies to mitigate impact (such as determine other routes or service providers that can handle the extra load), and avoids relying on single points of failure (service provider, node, or arc). Finally, it is cost optimal, so that customers can be passed the benefit from an efficiently set up network. Amazon Shipping is hiring Applied Scientists to help improve our ability to plan and execute package movements. As an Applied Scientist in Amazon Shipping, you will work on multiple challenging machine learning problems spread across a wide spectrum of business problems. You will build ML models to help our transportation cost auditing platforms effectively audit off-manifest (discrepancies between planned and actual shipping cost). You will build models to improve the quality of financial and planning data by accurately predicting ship cost at a package level. Your models will help forecast the packages required to be pick from shipper warehouses to reduce First Mile shipping cost. Using signals from within the transportation network (such as network load, and velocity of movements derived from package scan events) and outside (such as weather signals), you will build models that predict delivery delay for every package. These models will help improve buyer experience by triggering early corrective actions, and generating proactive customer notifications. Your role will require you to demonstrate Think Big and Invent and Simplify, by refining and translating Transportation domain-related business problems into one or more Machine Learning problems. 
You will use techniques from a wide array of machine learning paradigms, such as supervised, unsupervised, semi-supervised and reinforcement learning. Your model choices will include, but not be limited to, linear/logistic models, tree based models, deep learning models, ensemble models, and Q-learning models. You will use techniques such as LIME and SHAP to make your models interpretable for your customers. You will employ a family of reusable modelling solutions to ensure that your ML solution scales across multiple regions (such as North America, Europe, Asia) and package movement types (such as small parcel movements and truck movements). You will partner with Applied Scientists and Research Scientists from other teams in US and India working on related business domains. Your models are expected to be of production quality, and will be directly used in production services. You will work as part of a diverse data science and engineering team comprising of other Applied Scientists, Software Development Engineers and Business Intelligence Engineers. You will participate in the Amazon ML community by authoring scientific papers and submitting them to Machine Learning conferences. You will mentor Applied Scientists and Software Development Engineers having a strong interest in ML. You will also be called upon to provide ML consultation outside your team for other problem statements. If you are excited by this charter, come join us!
US, NJ, Newark
At Audible, we believe stories have the power to transform lives. It’s why we work with some of the world’s leading creators to produce and share audio storytelling with our millions of global listeners. We are dreamers and inventors who come from a wide range of backgrounds and experiences to empower and inspire each other. Imagine your future with us. ABOUT THIS ROLE We are seeking a Data Scientist to own our causal inference infrastructure and drive sophisticated modeling that measures the incremental impact of business decisions. This role requires deep expertise in advanced causal inference methodologies—including synthetic control methods, Synthetic Difference-in-Differences (SDID), and Bayesian approaches—to design rigorous experiments, estimate long-term customer behavior effects, and translate complex analytical results into clear business recommendations. You will own the development and continuous improvement of these causal inference models while being responsible for machine learning operations at scale to ensure our organization makes data-driven decisions with confidence. At Audible, you will have an opportunity to make the best of your skillsets to both develop advanced scientific solutions and drive critical customer and business impact. You will play a key role to drive end-to-end solutions from understanding our business and business requirements, identifying opportunities from a large amount of historical data and engaging in research to solve the business problems. You'll seek to create value for both stakeholders and customers and inform findings in a clear, actionable way to managers and senior leaders. You will be at the heart of an agile and growing area at Audible. ABOUT THE TEAM Audible Data Scientists are members of a global interdisciplinary insights and research team with an integral role in the design and integration of models to automate decision making throughout the business in every country. 
We empower the machine learning and deep learning techniques in many areas of the business. We translate business goals into agile, insightful analytics and seek to create value for both stakeholders and customers and convey findings in a clear, actionable way to managers and senior leaders. As a Data Scientist, you will... - Design and execute geo-level randomized experiments to measure incremental impact - Apply statistical techniques to evaluate causal impact in quasi-experimental settings - Ensure experiments are statistically valid by evaluating sampling strategies, statistical power, and potential sources of bias - Develop models that estimate long-term effects from short-term experiments using machine learning - Estimate how changes in customer behavior persist and decay over time - Own and maintain the geo-testing codebase, including deployment and scalability - Implement machine learning models at scale with focus on performance optimization - Partner with stakeholders to ensure models align with real business dynamics - Engage deeply with business problems through curiosity-driven questioning and brainstorming - Translate experimental results into financial impact and investment recommendations - Analyze marginal and average revenue impacts relative to costs - Communicate complex quantitative ideas clearly to non-technical stakeholders - Demonstrate understanding of Audible's business model and customer experience ABOUT AUDIBLE Audible is the leading producer and provider of audio storytelling. We spark listeners’ imaginations, offering immersive, cinematic experiences full of inspiration and insight to enrich our customers daily lives. We are a global company with an entrepreneurial spirit. We are dreamers and inventors who are passionate about the positive impact Audible can make for our customers and our neighbors. 
This spirit courses throughout Audible, supporting a culture of creativity and inclusion built on our People Principles and our mission to build more equitable communities in the cities we call home.
US, WA, Bellevue
Do you enjoy solving challenging problems and driving innovations in research? Are you seeking for an environment with a group of motivated and talented scientists like yourself? Do you want to create scalable optimization models and apply machine learning techniques to guide real-world decisions? Do you want to play a key role in the future of Amazon transportation and operations? Come and join us at Amazon's Modeling and Optimization team (MOP). Key job responsibilities A Research Scientist in the Modeling and Optimization (MOP) team - provides analytical decision support to Amazon planning teams via applying advanced mathematical and statistical techniques. - collaborates effectively with Amazon internal business customers, and is their trusted partner - is proactive and autonomous in discovering and resolving business pain-points within a given scope - is able to identify a suitable level of sophistication in resolving the different business needs - is confident in leveraging existing solutions to new problems where appropriate and is independent in designing and implementing new solutions where needed - is aware of the limitations of their proposed solutions and is proactive in communicating them to the business, and advances the application of sciences towards Amazon business problems by bringing new methods, ideas, and practices to the team and scientific community. A day in the life - Your will be developing model-based optimization, simulation, and/or predictive tools to identify and evaluate opportunities to improve customer experience, network speed, cost, and efficiency of capital investment. - You will quantify the improvements resulting from the application of these tools and you will evaluate the trade-offs between potentially competing objectives. 
- You will develop good communication skills and ability to speak at a level appropriate for the audience, will collaborate effectively with fellow scientists, software development engineers, and product managers, and will deliver business value in a close partnership with many stakeholders from operations, finance, IT, and business leadership. About the team - At the Modeling and Optimization (MOP) team, we use mathematical optimization, algorithm design, statistics, and machine learning to improve decision-making capabilities across WW Operations and Amazon Logistics. - We focus on transportation topology, labor and resource planning for fulfillment facilities, routing science, visualization research, data science and development, and process optimization. - We create models to simulate, optimize, and control the fulfillment network with the objective of reducing cost while improving speed and reliability. - We support multiple business lanes, therefore maintain a comprehensive and objective view, coordinating solutions across organizational lines where possible.
US, WA, Bellevue
What does it take to build a foundation model that can forecast demand for hundreds of millions of products — including ones that have never been sold before? At Amazon, our Demand Forecasting team is tackling one of the most ambitious challenges in applied time series research: designing and building large-scale foundation models that generalize across an enormous and diverse catalog of products, geographies, and business contexts. This is not incremental modeling work. We are redefining what's possible in demand forecasting through novel architectures, training strategies, and data generation techniques. Our team operates at a scale that is unmatched in industry or academia. You'll design experiments across millions of products simultaneously, developing new model architectures and training methodologies that push the boundaries of what foundation models can learn from vast, heterogeneous time series data. You'll explore techniques in transfer learning, zero-shot forecasting, and synthetic data generation. The models you design here will ship to production and directly influence hundreds of millions of dollars in automated inventory decisions every week. Beyond operational impact, you'll publish your work at top-tier conferences and contribute to advancing the state of the art in time series foundation models for the broader scientific community. If you are a scientist who wants to work at the frontier of time series research, design novel solutions to problems no one else has solved at this scale, and see your research deployed to real-world impact — this is the team for you. Key job responsibilities 1. Design and implement novel deep learning architectures (e.g., Transformers, SSMs, or Graph Neural Networks) for time-series foundation models that generalize across hundreds of millions of products and diverse global contexts. 2. Drive the full development cycle - from whiteboarding new algorithmic approaches to overseeing production-scale deployments. 3. 
Collaborate with SDEs to build high-performance, distributed training and inference pipelines; translate complex scientific concepts into scalable, production-grade code in Python and Scala. 4. Leverage and develop agentic GenAI workflows to automate the end-to-end research cycle from synthesizing state-of-the-art literature and auto-generating experimental code to rapidly iterating on model architectures across millions of products. 5. Maintain a high bar for scientific excellence by publishing novel research in top-tier venues (e.g., NeurIPS, ICLR, KDD) and contributing to Amazon’s internal patent and science community. A day in the life No two days look the same, but most will involve a high-velocity blend of deep architectural work, distributed system design, and frontier scientific thinking at a scale you won’t find anywhere else. You might start the morning by designing a synthetic data pipeline to stress-test your foundation model. You’ll use generative techniques to simulate rare "black swan" supply chain events, ensuring your model remains robust where historical data is thin. You'll then lead a Scientific Design Review, walking senior leaders through your model’s architecture, defending your choice of loss functions with data-driven rigor. You’ll write high-performance code often paired with AI-coding assistants to handle the heavy lifting of boilerplate and unit testing. You’ll collaborate across a "Two-Pizza Team" of scientists and engineers, pushing the boundaries of research with a clear goal: contributing to work that will be published at top-tier venues (ICLR, NeurIPS) while simultaneously driving multi-million dollar automated decisions. The work is hard, the math is complex, and the tools are state-of-the-art. If you want to build the models that actually ship—this is where you do it. 
About the team The Demand Forecasting team sits at the heart of Amazon's supply chain, building the science that determines what products are available, when, and at what cost — for hundreds of millions of customers around the world. Our mission is to push the frontier of what's possible in large-scale time series forecasting, and to deploy that science where it creates real, measurable impact. We are a team of scientists who care deeply about both research rigor and real-world outcomes. We don't just publish — we ship. And we don't just ship — we measure, iterate, and raise the bar. Our work spans the full lifecycle: from foundational research and large-scale experimentation to production deployment and downstream impact measurement across supply chain, inventory, and financial planning.
US, CA, San Francisco
Amazon has launched a new research lab in San Francisco to develop foundational capabilities for useful AI agents. We’re enabling practical AI to make our customers more productive, empowered, and fulfilled. Our work leverages large vision language models (VLMs) with reinforcement learning (RL) and world modeling to solve perception, reasoning, and planning to build useful enterprise agents. Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. Key job responsibilities You will contribute directly to AI agent development in an applied research role to improve the multi-model perception and visual-reasoning abilities of our agent. Daily responsibilities including model training, dataset design, and pre- and post-training optimization. You will be hired as a Member of Technical Staff.
US, WA, Seattle
WW Amazon Stores Finance Science (ASFS) works to leverage science and economics to drive improved financial results, foster data backed decisions, and embed science within Finance. ASFS is focused on developing products that empower controllership, improve business decisions and financial planning by understanding financial drivers, and innovate science capabilities for efficiency and scale. We are looking for a data scientist to lead high visibility initiatives for forecasting Amazon Stores' financials. You will develop new science-based forecasting methodologies and build scalable models to improve financial decision making and planning for senior leadership up to VP and SVP level. You will build new ML and statistical models from the ground up that aim to transform financial planning for Amazon Stores. We prize creative problem solvers with the ability to draw on an expansive methodological toolkit to transform financial decision-making with science. The ideal candidate combines data-science acumen with strong business judgment. You have versatile modeling skills and are comfortable owning and extracting insights from data. You are excited to learn from and alongside seasoned scientists, engineers, and business leaders. You are an excellent communicator and effectively translate technical findings into business action. Key job responsibilities Demonstrating thorough technical knowledge, effective exploratory data analysis, and model building using industry standard ML models Working with technical and non-technical stakeholders across every step of science project life cycle Collaborating with finance, product, data engineering, and software engineering teams to create production implementations for large-scale ML models Innovating by adapting new modeling techniques and procedures Presenting research results to our internal research community