Amazon Redshift re-invented research paper and photos of Rahul Pathak, vice president of analytics at AWS, and Ippokratis Pandis, AWS senior principal engineer
The "Amazon Redshift re-invented" research paper will be presented at a leading database conference next month. Two of the paper's authors, Rahul Pathak (top right), vice president of analytics at AWS, and Ippokratis Pandis (bottom right), an AWS senior principal engineer, discuss the origins of Redshift, how the system has evolved in the past decade, and where they see the service evolving in the years ahead.

Amazon Redshift: Ten years of continuous reinvention

Two authors of Amazon Redshift research paper that will be presented at leading international forum for database researchers reflect on how far the first petabyte scale cloud data warehouse has advanced since it was announced ten years ago.

Nearly ten years ago, in November 2012 at the first-ever Amazon Web Services (AWS) re:Invent, Andy Jassy, then AWS senior vice president, announced the preview of Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse. The service represented a significant leap forward from traditional on-premises data warehousing solutions, which were expensive, inflexible, and required significant human and capital resources to operate.

In a blog post on November 28, 2012, Werner Vogels, Amazon chief technical officer, highlighted the news: “Today, we are excited to announce the limited preview of Amazon Redshift, a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud.”

Further in the post, Vogels added, “The result of our focus on performance has been dramatic. Amazon.com’s data warehouse team has been piloting Amazon Redshift and comparing it to their on-premise data warehouse for a range of representative queries against a two billion row data set. They saw speedups ranging from 10x – 150x!”

That’s why, on the day of the announcement, Rahul Pathak, then a senior product manager, and the entire Amazon Redshift team were confident the product would be popular.

“But we didn’t really understand how popular,” he recalls.

“At preview we asked customers to sign up and give us some indication of their data volume and workloads,” Pathak, now vice president of Relational Engines at AWS, said. “Within about three days we realized that we had ten times more demand for Redshift than we had planned for the entire first year of the service. So we scrambled right after re:Invent to accelerate our hardware orders to ensure we had enough capacity on the ground for when the product became generally available in early 2013. If we hadn’t done that preview, we would have been caught short.”

The Redshift team has been sprinting to keep apace of customer demand ever since. Today, the service is used by tens of thousands of customers to process exabytes of data daily. In June a subset of the team will present the paper “Amazon Redshift re-invented ” at a leading international forum for database researchers, practitioners, and developers, the ACM SIGMOD/PODS Conference in Philadelphia.

Related content
Amazon DynamoDB was introduced 10 years ago today; one of its key contributors reflects on its origins, and discusses the 'never-ending journey' to make DynamoDB more secure, more available and more performant.

The paper highlights four key areas where Amazon Redshift has evolved in the past decade, provides an overview of the system architecture, describes its high-performance transactional storage and compute layers, details how smart autonomics are provided, and discusses how AWS and Redshift make it easy for customers to use the best set of services to meet their needs.

Amazon Science recently connected with two of the paper’s authors, Pathak, and Ippokratis Pandis, an AWS senior principal engineer, to discuss the origins of Redshift, how the system has evolved over the past decade, and where they see the service evolving in the years ahead.

  1. Q. 

    Can you provide some background on the origin story for Redshift? What were customers seeking, and how did the initial version address those needs?

    A. 

    Rahul: We had been meeting with customers who in the years leading up to the launch of Amazon Redshift had moved just about every workload they had to the cloud except for their data warehouse. In many cases, it was the last thing they were running on premises, and they were still dealing with all of the challenges of on-premises data warehouses. They were expensive, had punitive licensing, were hard to scale, and customers couldn’t analyze all of their data. Customers told us they wanted to run data warehousing at scale in the cloud, that they didn’t want to compromise on performance or functionality, and that it had to be cost-effective enough for them to analyze all of their data.

    So, this is what we started to build, operating under the code name Cookie Monster. This was at a time when customers’ data volumes were exploding, and not just from relational databases, but from a wide variety of sources. One of our early private beta customers tried it and the results came back so fast they thought the system was broken. It was about 10 to 20 times faster than what they had been using before. Another early customer was pretty unhappy with gaps in our early functionality. When I heard about their challenges, I got in touch, understood their feedback, and incorporated it into the service before we made it generally available in February 2013. This customer soon turned into one of our biggest advocates.

    When we launched the service and announced our pricing at $1000 a terabyte per year, people just couldn’t believe we could offer a product with that much capability at such a low price point. The fact that you could provision a data warehouse in minutes instead of months also caught everyone’s attention. It was a real game-changer for this industry segment.

    Ippokratis: I was at IBM Research at the time working on database technologies there, and we recognized that providing data warehousing as a cloud service was a game changer. It was disruptive. We were working with customers’ on-premises systems where it would take us several days or weeks to resolve an issue, whereas with a cloud data warehouse like Redshift, it would take minutes. It was also apparent that the rate of innovation would accelerate in the cloud.

    In the on-premises world, it was taking months if not years to get new functionality into a software release, whereas in the cloud new capabilities could be introduced in weeks, without customers having to change a single line of code in their consuming applications. The Redshift announcement was an inflection point; I got really interested in the cloud, and cloud data warehouses, and eventually joined Amazon [Ippokratis joined the Redshift team as a principal engineer in Oct. 2015].

  2. Q. 

    How has Amazon Redshift evolved over the past decade since the launch nearly 10 years ago?

    A. 

    Ippokratis: As we highlight in the paper, the service has evolved at a rapid pace in response to customers’ needs. We focused on four main areas: 1) customers’ demand for high-performance execution of increasingly complex analytical queries; 2) our customers’ need to process more data and significantly increase the number of users who need to derive insights from that data; 3) customers’ need for us to make the system easier to use; and 4) our customers’ desire to integrate Redshift with other AWS services, and the AWS ecosystem. That’s a lot, so we’ll provide some examples across each dimension.

    Related publication
    Enterprise companies use spatial data for decision optimization and gain new insights regarding the locality of their business and services. Industries rely on efficiently combining spatial and business data from different sources, such as data warehouses, geospatial information systems, transactional systems, and data lakes, where spatial data can be found in structured or unstructured form. In this demonstration

    Offering the leading price performance has been our primary focus since Rahul first began working on what would become Redshift. From the beginning, the team has focused on making core query execution latency as low as possible so customers can run more workloads, issue more jobs into the system, and run their daily analysis. To do this, Redshift generates C++ code that is highly optimized and then sends it to the distributor in the parallel database and executes this highly optimized code. This makes Redshift unique in the way it executes queries, and it has always been the core of the service.

    We have never stopped innovating here to deliver our customers the best possible performance. Another thing that’s been interesting to me is that in the traditional business intelligence (BI) world, you optimize your system for very long-running jobs. But as we observe the behavior of our customers in aggregate, what’s surprising is that 90 percent of our queries among the billions we run daily in our service execute in less than one second. That’s not what people had traditionally expected from a data warehouse, and that has changed the areas of the code that we optimize.

    Rahul: As Ippokratis mentioned, the second area we focused on in the paper was customers’ need to process more data and to use that data to drive value throughout the organization. Analytics has always been super important, but eight or ten years ago it wasn’t necessarily mission critical for customers in the same way transactional databases were. That has definitely shifted. Today, core business processes rely on Redshift being highly available and performant. The biggest architectural change in the past decade in support of this goal was the introduction of Redshift Managed Storage, which allowed us to separate compute and storage, and focus a lot of innovation in each area.

    Diagram of the Redshift Managed Storage
    The Redshift managed storage layer (RMS) is designed for a durability of 99.999999999% and 99.99% availability over a given year, across multiple availability zones. RMS manages both user data as well as transaction metadata.

    Another big trend has been the desire of customers to query across and integrate disparate datasets. Redshift was the first data warehouse in the cloud to query Amazon S3 data, that was with Redshift Spectrum in 2017. Then we demonstrated the ability to run a query that scanned an exabyte of data in S3 as well as data in the cluster. That was a game changer.

    Customers like NASDAQ have used this extensively to query data that’s on local disk for the highest performance, but also take advantage of Redshift’s ability to integrate with the data lake and query their entire history of data with high performance. In addition to querying the data lake, integrated querying of transactional data stores like Aurora and RDS has been another big innovation, so customers can really have a high-performance analytics system that’s capable of transparently querying all of the data that matters to them without having to manage these complex integration processes that other systems require.

    Illustration of how a query flows through Redshift.
    This diagram from the research paper illustrates how a query flows through Redshift. The sequence is described in detail on pages 2 and 3 of the paper.

    Ippokratis: The third area we focused on in the paper was ease of use. One change that stands out for me is that on-premises data warehousing required IT departments to have a DBA (data base administrator) who would be responsible for maintaining the environment. Over the past decade, the expectation from customers has evolved. Now, if you are offering data warehousing as a service, the systems must be capable of auto tuning, auto healing, and auto optimizing. This has become a big area of focus for us where we incorporate machine learning and automation into the system to make it easier to use, and to reduce the amount of involvement required of administrators.

    Rahul: In terms of ease of use, three innovations come to mind. One is concurrency scaling. Similar to workload management, customers would previously have to manually tweak concurrency or reset clusters of the manually split workloads. Now, the system automatically provisions new resources and scales up and down without customers having to take any action. This is a great example of how Redshift has gotten much more dynamic and elastic.

    The second ease of use innovation is automated table optimization. This is another place where the system is able to observe workloads and data layouts and automatically suggest how data should be sorted and distributed across nodes in the cluster. This is great because it’s a continuously learning system so workloads are never static in time.

    Related publication
    How should we split data among the nodes of a distributed data warehouse in order to boost performance for a forecasted workload? In this paper, we study the effect of different data partitioning schemes on the overall network cost of pairwise joins. We describe a generally-applicable data distribution framework initially designed for Amazon Redshift, a fully-managed petabyte-scale data warehouse in the

    Customers are always adding more datasets, and adding more users, so what was optimal yesterday might not be optimal tomorrow. Redshift observes this and modifies what's happening under the covers to balance that. This was the focus of a really interesting graph optimization paper that we wrote a few years ago about how to analyze for optimal distribution keys for how data is laid out within a multi-node parallel-processing system. We've coupled this with automated optimization and then table encoding. In an analytics system, how you compress data has a big impact because the less data you scan, the faster your queries go. Customers had to reason about this in the past. Now Redshift can automatically determine how to encode data correctly to deliver the best possible performance for the data and the workload.

    The third innovation I want to highlight here is Amazon Redshift Serverless, which we launched in public preview at re:Invent last fall. Redshift Serverless removes all of the management of instances and clusters, so customers can focus on getting to insights from data faster and not spend time managing infrastructure. With Redshift Serverless, customers can simply provision an endpoint and begin to interact with their data, and Redshift Serverless will auto scale and automatically manage the system to essentially remove all of that complexity from customers.

    Customers can just focus on their data, set limits to manage their budgets, and we deliver optimal performance between those limits. This is another massive step forward in terms of ease of use because it eliminates any operations for customers. The early response to the preview has been tremendous. Thousands of customers have been excited to put Amazon Redshift Serverless through its paces over the past few months, and we’re excited about making it generally available in the near future.

    Amazon Redshift architecture diagram
    The Amazon Redshift architecture as presented in the research paper.

    Ippokratis: A fourth area of focus in the paper is on integration with other AWS services, and the AWS ecosystem. Integration is another area where customer behavior has evolved from traditional BI use cases. Today, cloud data warehouses are a central hub with tight integration with a broader set of AWS services. We provided the ability for customers to join data from the warehouse with the data lake. Then customers said they needed access to high-velocity business data in operational databases like Aurora and RDS, so we provided access to these operational data stores. Then we added support for streams, as well as integration with SageMaker and Lambda so customers can run machine learning training and inference without moving their data, and do generic compute. As a result, we’ve converted the traditional BI system into a well-integrated set of AWS services.

    Rahul: One big area of integration has been with our machine-learning ecosystem. With Redshift ML we have enabled anyone who knows SQL to take advantage of all of our machine-learning innovation. We built the ability to create a model from the SQL prompt, which gets the data into Amazon S3 and calls Amazon SageMaker, to use automated machine learning to build the most appropriate model to provide predictions on the data.

    This model is compiled efficiently and brought back into the data warehouse for customers to run very high-performance parallel inferences with no additional compute or no extra cost. The beauty of this integration is that every innovation we make within SageMaker means that Redshift ML gets better as well. This is just another means by which customers benefit from us connecting our services together.

    Related content
    Amazon researchers describe new method for distributing database tables across servers.

    Another big area for integration has been data sharing. Once we separated storage and compute layers with RA3 instances, we could enable data sharing, giving customers the ability to share data with clusters in the same account, and other accounts, or across regions. This allows us to separate consumers from producers of data, which enables things like modern data mesh architectures. Customers can share data without data copying, so they are transactionally consistent across accounts.

    For example, users within a data-science organization can securely work from the shared data, as can users within the reporting or marketing organization. We’ve also integrated data sharing with AWS Data Exchange, so now customers can search for — and subscribe to — third-party datasets that are live, up to date, and can be queried immediately in Redshift. This has been another game changer from the perspective of setting data free, enabling data monetization for third-party providers, and secure and live data access and licensing for subscribers for high-performance analytics within and across organizations. The fact that Redshift is part of an incredibly rich data ecosystem is a huge win for customers, and in keeping with customers’ desire to make data more pervasively available across the company.

  3. Q. 

    You indicate in the paper that Redshift innovation is continuing at an accelerated pace.  How do you see the cloud data warehouse segment evolving – and more specifically Redshift – over the next several years?

    A. 

    Rahul: A few things will continue to be true as we head into the future. Customers will be generating ever more amounts of data, and they’re going to want to analyze that data more cost effectively. Data volumes are growing exponentially, but obviously customers don't want their costs growing exponentially. This requires that we continue to innovate, and find new levels of performance to ensure that the cost of processing a unit of data continues to go down.

    We’ll continue innovating in software, in hardware, in silicon, and in using machine learning to make sure we deliver on that promise for customers. We’ve delivered on that promise for the past 10 years, and we’ll focus on making sure we deliver on that promise into the future.

    I’m very proud of what the team has accomplished, but equally as excited about all the things we’re going to do to improve Redshift in the future.
    Ippokratis Pandis

    Also, customers are always going to want better availability, they’re always going to want their data to be secure, and they’re always going to want more integrations with more data sources, and we intend to continue to deliver on all of those. What will stay the same is our ability to offer the-best in-segment price performance and capabilities, and the best-in-segment integration and security because they will always deliver value for customers.

    Ippokratis: It has been an incredible journey; we have been rebuilding the plane as we’ve been flying it with customers onboard, and this would not have happened without the support of AWS leadership, but most importantly the tremendous engineers, managers, and product people who have worked on the team.

    As we did in the paper, I want to recognize the contributions of Nate Binkert and Britt Johnson, who have passed, but whose words of wisdom continue to guide us. We’ve taken data warehousing, what we learned from books in school (Ippokratis earned his PhD in electrical and computer engineering from Carnegie Mellon University) and brought it to the cloud. In the process, we’ve been able to innovate, and write new pages in the book. I’m very proud of what the team has accomplished, but equally as excited about all the things we’re going to do to improve Redshift in the future.

Research areas

Related content

IN, KA, Bengaluru
Do you want to join an innovative team of scientists who use machine learning and statistical techniques to create state-of-the-art solutions for providing better value to Amazon’s customers? Do you want to build and deploy advanced algorithmic systems that help optimize millions of transactions every day? Are you excited by the prospect of analyzing and modeling terabytes of data to solve real world problems? Do you like to own end-to-end business problems/metrics and directly impact the profitability of the company? Do you like to innovate and simplify? If yes, then you may be a great fit to join the Machine Learning and Data Sciences team for India Consumer Businesses. If you have an entrepreneurial spirit, know how to deliver, love to work with data, are deeply technical, highly innovative and long for the opportunity to build solutions to challenging problems that directly impact the company's bottom-line, we want to talk to you. Major responsibilities - Use machine learning and analytical techniques to create scalable solutions for business problems - Analyze and extract relevant information from large amounts of Amazon’s historical business data to help automate and optimize key processes - Design, development, evaluate and deploy innovative and highly scalable models for predictive learning - Research and implement novel machine learning and statistical approaches - Work closely with software engineering teams to drive real-time model implementations and new feature creations - Work closely with business owners and operations staff to optimize various business operations - Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation - Mentor other scientists and engineers in the use of ML techniques
US, NY, New York
MULTIPLE POSITIONS AVAILABLE Employer: Amazon Development Center U.S., Inc. Offered Position: Applied Scientist III - AMZ007408 Job Location: New York, NY Position Responsibilities: Participate in the design, development, evaluation, deployment, and updating of formal reasoning systems for security, privacy, and data protection applications. Drive technical and scientific innovation in security automation, data protection, and privacy-preserving technologies, with a focus on developing scalable solutions for cloud environments. Develop and/or apply formal verification techniques and automated theorem proving methods for different applications in cloud security and privacy. Collaborate with internal and external users to understand requirements and enhance formal verification and automated reasoning capabilities. Lead research and development efforts in AI security, specifically evaluate emerging threats and opportunities, including securing Generative AI systems and designing robust safeguards. Proactively identify and explore new opportunities for deploying and leveraging formal reasoning solutions across various domains.
GB, London
The Agentic Automated Reasoning Group is building the next generation of software verification tools combining advances in artificial intelligence, the computational capacity of the cloud, and our deep expertise in the domain. Join us if you want to be a part of this transformational endeavor. The Strata team (https://github.com/strata-org) is seeking an applied scientist with broad interest and expertise in model checking, interactive theorem proving, programming language semantics, and generative AI. You will combine your expertise with that of your coworkers to build new tools that solve code analysis problems previously considered beyond reach. Our application areas span all the way from Infrastructure as Code to high-performance cryptography written in assembly code, while our methods span from interactive theorem proving to automated test generation. Each day, hundreds of thousands of developers make billions of transactions worldwide on AWS. They harness the power of the cloud to enable innovative applications, websites, and businesses. Using automated reasoning technology and mathematical proofs, AWS allows customers to answer questions about security, availability, durability, and functional correctness. We call this provable security, absolute assurance in security of the cloud and in the cloud. https://aws.amazon.com/security/provable-security/ Key job responsibilities Work with customer teams to understand the nature of their software and the properties they need to establish of it. Identify tools and methods capable of addressing the verification needs of customers, including any novel analysis capabilities required. Use techniques spanning property-based testing to model checkers, and interactive theorem provers to establish program properties. Explore generative AI techniques to help customers formalize their requirements, find revealing tests, generate required boiler plate for testing and model checking, and find and repair program proofs. About the team The Agentic Automated Reasoning Group at AWS develops and applies state of the art formal methods and automated reasoning techniques to ensure the security, reliability, and correctness of AWS services and customer applications, with a strong focus on AI based agents. Our work innovates tools and services to perform verification at scale and apply them to build safe and secure systems at AWS. We are also pioneering the use of formal verification and automated reasoning to develop agentic systems, ensuring AI agents operate within defined safety boundaries.
US, CA, San Francisco
Join the next revolution in robotics at Amazon's Frontier AI & Robotics team, where you'll work alongside world-renowned AI pioneers to lead key initiatives in robotic intelligence. As a Member of Technical Staff, you'll spearhead the development of breakthrough foundation models that enable robots to perceive, understand, and interact with the world in unprecedented ways. You'll drive technical excellence in areas such as perception, manipulation, science understanding, sim2real transfer, multi-modal foundation models, and multi-task learning, designing novel algorithms that bridge the gap between state-of-the-art research and real-world deployment at Amazon scale. In this role, you'll combine hands-on technical work with scientific leadership, ensuring your team delivers robust solutions for dynamic real-world environments. You'll leverage Amazon's vast computational resources to tackle ambitious problems in areas like very large multi-modal robotic foundation models and efficient, promptable model architectures that can scale across diverse robotic applications. Key job responsibilities - Lead technical initiatives in robotics foundation models, driving breakthrough approaches through hands-on research and development in areas like open-vocabulary panoptic scene understanding, scaling up multi-modal LLMs, sim2real/real2sim techniques, end-to-end vision-language-action models, efficient model inference, video tokenization - Design and implement novel deep learning architectures that push the boundaries of what robots can understand and accomplish - Guide technical direction for specific research initiatives, ensuring robust performance in production environments - Mentor and support fellow scientists while maintaining strong individual technical contributions - Collaborate with engineering teams to optimize and scale models for real-world applications - Influence technical decisions and implementation strategies within your area of focus A day in the life - Develop and implement novel foundation model architectures, working hands-on with our extensive compute infrastructure - Guide and support fellow scientists in solving complex technical challenges, from sim2real transfer to efficient multi-task learning - Lead focused technical initiatives from conception through deployment, ensuring successful integration with production systems - Drive technical discussions within your team and with key stakeholders - Conduct experiments and prototype new ideas using our massive compute cluster - Mentor team members while maintaining significant hands-on contribution to technical solutions Amazon offers a full range of benefits that support you and eligible family members, including domestic partners and their children. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include: 1. Medical, Dental, and Vision Coverage 2. Maternity and Parental Leave Options 3. Paid Time Off (PTO) 4. 401(k) Plan If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply! About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through ground breaking foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, WI, Madison
As a Data Scientist on the Shopbop/Zappos Catalog Tech team, you will design and implement scientific approaches to revolutionize how we manage and enhance our product catalog data for our world-class selection of Shoes, Kids, and Active wear. You will work with Zappos' Senior leadership team to solve complex data challenges through advanced analytics and machine learning - creating innovative solutions and influencing product decisions through data-driven insights. You will lead critical initiatives to reduce catalog errors, accelerate product data capture, and develop state-of-the-art image classification systems for fashion features. You will partner daily with engineering teams and business stakeholders to provide expert guidance on model selection and implementation. As a member of the Zappos technical staff, you will leverage machine learning technologies and have access to industry leaders in AI/ML and E-Commerce to help grow your expertise. You will also routinely collaborate with data science teams across our sister companies at Amazon.com and Shopbop.com. You will push the boundaries of what's possible with applied machine learning and bring innovative solutions to bear for customers (including computer vision, NLP, and advanced ML models). You will think big about how data science can transform our catalog operations and be persistent in delivering robust, scalable solutions. Key job responsibilities Design and implement machine learning approaches to improve catalog data quality. Develop and validate scientific methodologies for automated data capture and classification. Partner with engineering teams to integrate ML models into production systems. Create and present analysis that drives decision-making at the senior leadership level. A day in the life You start the day reviewing model performance metrics, noting some drift in the image classification system that needs investigation. You spend the morning developing a new approach to reduce product attribute errors using recent advances in LLMs. In the afternoon, you meet with engineering teams to advise on model architecture for a new feature, and wrap up by analyzing the results of your latest A/B test on data capture efficiency improvements. About the team Zappos/Shopbop Catalog Tech team owns the software that drives our photostudio, product cataloging, and integration to Amazon's marketplace. We use Amazon's Leadership Principals and Engineering Expertise but have our own fun vibe. We are located in Madison WI, and Las Vegas NV.
US, WA, Seattle
Innovators wanted! Are you an entrepreneur? A builder? A dreamer? This role is part of an Amazon Special Projects team that takes the company’s Think Big leadership principle to the limits. If you’re interested in innovating at scale to address big challenges in the world, this is the team for you. As an Applied Scientist on our team, you will focus on building state-of-the-art ML models for biology. Our team rewards curiosity while maintaining a laser-focus in bringing products to market. Competitive candidates are responsive, flexible, and able to succeed within an open, collaborative, entrepreneurial, startup-like environment. At the forefront of both academic and applied research in this product area, you have the opportunity to work together with a diverse and talented team of scientists, engineers, and product managers and collaborate with other teams. Key job responsibilities - Build, adapt and evaluate ML models for life sciences applications - Collaborate with a cross-functional team of ML scientists, biologists, software engineers and product managers
US, WA, Seattle
Join us at the forefront of Amazon's sustainability initiatives to work on environmental and social advancements that support Amazon's long-term worldwide sustainability strategy. At Amazon, we're working to be the most customer-centric company on earth. To get there, we need exceptionally talented, bright, and driven people who are passionate about making a meaningful impact on communities and the environment while helping shape the future of sustainable business practices. Sustainability Science and Innovation (SSI) is a multi-disciplinary team within WW Sustainability combining science, analytics, economics, statistics, machine learning, product development, and engineering expertise. We use data across the sustainability imperatives (carbon, water, waste, biodiversity, environmental risk and more) and these skills and capabilities to identify, develop, experiment, and scale the scientific solutions and innovations necessary for Amazon, customers and partners to help them solve their hardest unmet and evolving sustainability needs and goals. The Worldwide Sustainability (WWS) organization is seeking an exceptional scientific leader to join Amazon's Sustainability Science and Innovation team as a Researcher Scientist for Materials Chemistry Innovation. This role focuses on hands-on experimental research in materials chemistry to accelerate the discovery and validation of sustainable materials through systematic synthesis, characterization, and performance testing. You will lead the design and execution of experimental research campaigns targeting catalysts, functional materials, and sustainability-relevant chemistries across multivariate parameter spaces. You will establish scientific strategy and technical roadmaps for materials discovery while leading research initiatives that tackle complex sustainability challenges in critical industrial sectors. This position requires driving breakthrough solutions in materials synthesis and characterization through internal capabilities and strategic partnerships with universities, industry scientists, and government laboratories. You will mentor junior scientists and engineers while collaborating across Amazon's Innovation Lab Network to translate research into scalable solutions. Your leadership will be essential in developing early-stage, cost-effective materials that address significant technical and economic challenges fundamental to Amazon's operations, requiring you to navigate complex trade-offs between immediate deliverables and long-term environmental impact. You will also shape how emerging automation and AI tools are applied to accelerate materials discovery workflows. The ideal candidate demonstrates extensive experience in materials synthesis, advanced characterization techniques, and systematic experimental design for performance validations. You must possess proven ability to lead cross-functional teams, establish research priorities, and drive scientific innovation from concept to implementation. Deep technical expertise in materials testing methods, combined with strategic vision for translating research into practical applications is essential. Experience with high-throughput and combinatorial experimental approaches to efficiently explore large design spaces is highly valued. Your work will establish new paradigms in sustainable materials discovery through rigorous experimental research and performance testing, directly contributing to Amazon's sustainability goals while creating scalable solutions that extend beyond the company's immediate operations. Key job responsibilities - Develop scientific models that help solve complex and ambiguous sustainability problems, and extract strategic learnings from large datasets. - Work closely with applied scientists and software engineers to implement your scientific models. - Support early-stage strategic sustainability initiatives and effectively learn from, collaborate with, and influence stakeholders to scale-up high-value initiatives. - Support research and development of cross-cutting technologies for industrial decarbonization, including building the data foundation and analytics for new AI models. - Drive innovation in key focus areas including packaging materials, building materials, and alternative fuels. About the team Diverse Experiences: World Wide Sustainability (WWS) values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Inclusive Team Culture: It’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth: We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
US, TX, Dallas
Amazon Web Services (AWS) Applied AI Solutions (AAIS) is on a mission to make AI real for enterprises. We build and deploy production AI solutions that drive measurable business outcomes at scale, bringing together applied scientists, AI architects, business development professionals, and GTM specialists to help customers move from AI experimentation to production impact. Within AAIS, the GTM Acceleration team activates the field, measures impact, and scales what works. We are the connective tissue between AAIS product and science teams and the worldwide field organization, ensuring our AI solutions reach customers effectively, that we quantify the value we deliver, and that we build repeatable motions that scale globally. We are looking for an Applied Scientist who will serve as a force multiplier across our customer engagement teams, building the analytical foundations, predictive models, and reusable tooling that power our go-to-market strategy. You will work at the intersection of data science, machine learning, and business strategy, building models that quantify our value proposition, and creating scalable analytical assets that accelerate every engagement. This is a highly visible, high-impact role where your work directly influences how we demonstrate and measure the value of AWS AI solutions for enterprise customers. You will operate with significant autonomy, owning the scientific direction of your projects while collaborating with software engineers, product managers, and business stakeholders. You will identify the right methodology for each problem, whether that is a classical statistical approach, a modern deep learning technique, or a novel combination, and communicate your findings clearly to both technical and non-technical audiences. This role spans Connect Customer initiatives and across the Applied AI solution portfolio, offering the opportunity to pioneer data science approaches that scale intelligent analytics worldwide. If you thrive at the intersection of rigorous science and customer-facing impact and are energized by translating complex model outputs into business decisions, we want to talk to you. Key job responsibilities Design, develop, and deploy statistical models and machine learning pipelines to drive product improvements, business decisions, and customer outcomes Work directly with customers during production pilots to build and deploy AI solutions that demonstrate measurable business value Design and execute A/B experiments and causal inference analyses to measure the impact of new features and model changes Build ROI models, business case tools, and forecasting systems for demand prediction, capacity planning, workforce optimization, and value quantification Apply NLP and generative AI techniques to extract insights from structured and unstructured data at scale, and partner with software engineers to productionize models with reliability, monitoring, and operational excellence Build and own customer analytics capabilities including segmentation (by size tier, AI adoption, product penetration, entitlement), usage trend analysis, propensity modeling, and foundational datasets combining service usage with sales data Create self-service analytics platforms and automated insight delivery mechanisms that enable leadership to pull strategic intelligence on demand Enable field teams with reusable analytical assets, diagnostic notebooks, benchmarking studies, and scalable tooling that accelerate customer engagements Own success metrics and create mechanisms to measure model performance, adoption, and business impact across customer cohorts Define strategic frameworks and GTM recommendations by segment, translating data patterns and market signals into actionable go-to-market motions and investment priorities Communicate findings and technical trade-offs to senior leadership and customer executives through written documents (6-pagers, science reviews) and presentations, operating as a shared resource across 2-3 teams simultaneously About the team Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture AWS values curiosity and connection. Our employee-led and company-sponsored affinity groups promote inclusion and empower our people to take pride in what makes us unique. Our inclusion events foster stronger, more collaborative teams. Our continual innovation is fueled by the bold ideas, fresh perspectives, and passionate voices our teams bring to everything we do. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve.
US, NY, New York
The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. About the team SPB Agent team's vision is to build a highly personalized and context-aware agentic advertiser guidance system that seamlessly integrates Large Language Models (LLMs) with sophisticated tooling, operating across all experiences. The SPB-Agent is the central agent that interfaces with advertisers across Ads Console, Selling Partner portals (Seller Central, KDP, Vendor Central), and internal Sales systems. We identify high-impact opportunities spanning from strategic product guidance to granular optimization and deliver them through personalized, scalable experiences grounded in state-of-the-art agent architectures, reasoning frameworks, sophisticated tool integration, and model customization approaches including fine-tuning, MCP, and preference optimization. This presents an exceptional opportunity to shape the future of e-commerce advertising through advanced AI technology at unprecedented scale, creating solutions that directly impact millions of advertisers.
US, WA, Seattle
Amazon's Customer Experience and Business Trends (CXBT) is seeking a Data Science Manager to lead a team of scientists and engineers within Benchmarking Economics Analytics and Measurement (BEAM). BEAM is a central analytics and science function that drives Amazon's quantification of CX improvement opportunities through comparative benchmarks, partnering with stakeholders across CXBT, business domain teams, Finance, SCOT, and other centralized science teams. This is a hands-on leadership role for a manager who can set technical direction, build durable data products, and grow people. You will own the strategy and roadmap for a portfolio of analytics products, working backward from leadership and stakeholder needs to deliver insights that inform decisions at the speed of business. Key job responsibilities - Build a holistic metrics and trend-detection product. Lead the team to design and operationalize an always-on framework of indicators that surfaces emerging business trends reliably enough to brief senior leaders. - Partner with cross-org stakeholders to drive product adoption and impact. Work directly with internal customers and partner teams to ensure our products are tightly aligned with business use cases, translate ambiguous problems into well-scoped analytics solutions, and drive adoption so that insights translate into decisions and measurable business impact. - Manage, mentor, and grow the team. Hire, develop, and retain a high-performing team of scientists and engineers. Set clear expectations, give actionable feedback, create stretch opportunities, and build the bench strength needed to scale the team's scope over time. - Lead the transformation from traditional analytics to a GenAI-native operating model. Shape and execute the team's technical strategy to evolve from manual, study-based analytics toward GenAI-enabled products and workflows — accelerating insight generation, improving self-serve access for stakeholders, and freeing capacity for deeper scientific investment.