Optimizing neural networks for special-purpose hardware

Curating the neural-architecture search space and taking advantage of human intuition reduces latency on real-world applications by up to 55%.

As neural networks grow in size, deploying them on-device increasingly requires special-purpose hardware that parallelizes common operations. But for maximum efficiency, it’s not enough to optimize the hardware for the networks; the networks should be optimized for the hardware, too.

The first step in training a neural network to solve a problem is usually the selection of an architecture: a specification of the number of computational nodes in the network and the connections between them. Architectural decisions are generally based on historical precedent, intuition, and plenty of trial and error.

The standard way to optimize a neural network is through neural-architecture search (NAS), where the goal is to minimize both the size of the network and the number of floating-point operations (FLOPS) it performs. But this approach doesn’t work with neural chips, which can often execute easily parallelized but higher-FLOPS tasks more rapidly than they can harder-to-parallelize but lower-FLOPS tasks.

Minimizing latency is a more complicated optimization objective than minimizing FLOPS, so in the Amazon Devices Hardware group, we’ve developed a number of strategies for adapting NAS to the problem of optimizing network architectures for Amazon’s new Neural Engine family of accelerators. Those strategies involve curating the architecture search space to, for instance, reduce the chances of getting stuck in local minima. We’ve also found that combining a little human intuition with the results of NAS for particular tasks can help us generalize to new tasks more reliably and efficiently.

In experiments involving several different machine learning tasks, we’ve found that our NAS strategies can reduce latencies by as much as 55%.

Varieties of neural-architecture search

NAS needs three things: a definition of the search space, which specifies the building blocks available to construct a network; a cost model, which is a function of the network's accuracy, latency, and memory; and an optimization algorithm. We use a performance estimator to measure latency and memory footprint, but to measure accuracy, we must train the network. This is a major bottleneck, as training a single network can take days. Sampling thousands of architectures would take thousands of GPU-days, which is neither practical nor environmentally sustainable.
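To make the idea of a cost model concrete, here is a minimal sketch that folds accuracy, latency, and memory into one scalar. It is an illustration only, not the cost function used in this work: the `Metrics` fields, the budget parameters, and the penalty form are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    accuracy: float    # in [0, 1]; obtaining it requires training (or a proxy)
    latency_ms: float  # would come from a performance estimator
    memory_kb: float   # would come from a performance estimator


def nas_cost(m: Metrics, latency_budget_ms: float, memory_budget_kb: float,
             penalty: float = 10.0) -> float:
    """Lower is better: error plus a penalty for exceeding either budget.

    The penalty form and its weight are assumptions made for illustration.
    """
    error = 1.0 - m.accuracy
    latency_over = max(0.0, m.latency_ms / latency_budget_ms - 1.0)
    memory_over = max(0.0, m.memory_kb / memory_budget_kb - 1.0)
    return error + penalty * (latency_over + memory_over)


within_budget = Metrics(accuracy=0.80, latency_ms=10.0, memory_kb=500.0)
too_slow = Metrics(accuracy=0.82, latency_ms=30.0, memory_kb=500.0)
print(nas_cost(within_budget, 20.0, 1000.0))
print(nas_cost(too_slow, 20.0, 1000.0))
```

In this toy setup the slightly more accurate network scores worse because it exceeds the latency budget, which is the kind of trade-off the optimization algorithm has to navigate.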

There are three categories of NAS algorithm, which require networks to be trained different numbers of times: multishot, single-shot, and zero-shot.

Multishot methods sample a cohort of architectures in each iteration. Each network is trained and evaluated for accuracy and performance, and the next set of architectures is sampled based on their cost. Evolutionary or reinforcement-learning-based algorithms are generally used for multishot methods.
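The multishot loop can be sketched as a small evolutionary search. This is a toy under stated assumptions: an architecture is encoded as a tuple of per-layer channel sizes, the mutation rule and the keep-the-best-half selection are hypothetical choices, and the `cost` callable stands in for training and evaluating each network.

```python
import random
from typing import Callable, List, Sequence, Tuple

Arch = Tuple[int, ...]  # one channel size per layer (hypothetical encoding)


def mutate(arch: Arch, choices: Sequence[int], rng: random.Random) -> Arch:
    """Replace the channel size of one randomly chosen layer."""
    i = rng.randrange(len(arch))
    child = list(arch)
    child[i] = rng.choice(choices)
    return tuple(child)


def evolutionary_search(cost: Callable[[Arch], float], choices: Sequence[int],
                        num_layers: int, population: int = 8,
                        iterations: int = 20, seed: int = 0) -> Arch:
    rng = random.Random(seed)
    cohort: List[Arch] = [tuple(rng.choice(choices) for _ in range(num_layers))
                          for _ in range(population)]
    for _ in range(iterations):
        # In real multishot NAS, evaluating `cost` means training each network.
        cohort.sort(key=cost)
        parents = cohort[: population // 2]  # keep the cheapest half
        children = [mutate(rng.choice(parents), choices, rng)
                    for _ in range(population - len(parents))]
        cohort = parents + children
    return min(cohort, key=cost)


def toy_cost(arch: Arch) -> float:
    # Made-up stand-in for the trained-and-measured cost of an architecture.
    return float(sum(abs(c - 32) for c in arch))


print(evolutionary_search(toy_cost, [8, 16, 32, 64], num_layers=3))
```

Because the cheapest half of each cohort is carried over unchanged, the best cost found never gets worse from one iteration to the next.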

Single-shot methods start with a large network called the supernet, which has multiple possible subgraphs. During training, the subgraphs start converging to a single, small network. Single-shot methods are designed to be trained only once, but their training takes much longer than that of a single network in multishot methods.

Zero-shot methods work like multishot methods, with the key difference that the network is never trained. As a proxy for accuracy, we use the network’s trainability score, which is computed using the network's topology, nonlinearity, and operations. Zero-shot methods are the fastest to converge, because calculating the score is computationally very cheap. The downside is that the trainability score may not correlate well with model accuracy.

Search space curation

The NAS cost function can be visualized as a landscape, with each point representing a potential architecture. A cost function based on FLOPS changes monotonically with factors such as layer sizes or channel counts: that is, if you find a direction across the terrain in which the cost is going down, you can be sure that continuing in that direction will not cause the cost to go up.

However, the inclusion of accelerator-aware constraints disrupts the function by introducing more local minima, or points at which the cost switches from going down to going up. This results in a more complex and rocky landscape.
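A toy one-dimensional example shows how a hardware-aware latency term can roughen the landscape. Everything here is made up for illustration: the error proxy `8 / c`, the weights, and a parallelism factor of 16 under which a layer costs one pass per 16 channels. It is not a model of the Neural Engine.

```python
import math
from typing import Callable, List

P = 16  # hypothetical parallelism factor


def error_proxy(c: int) -> float:
    return 8.0 / c  # made-up: error shrinks as channels grow


def cost_flops(c: int) -> float:
    return error_proxy(c) + 0.01 * c  # FLOPS grow linearly with channels


def cost_latency(c: int) -> float:
    return error_proxy(c) + 0.16 * math.ceil(c / P)  # one pass per P channels


def local_minima(cost: Callable[[int], float], lo: int, hi: int) -> List[int]:
    """Interior points strictly cheaper than both neighbors."""
    return [c for c in range(lo + 1, hi)
            if cost(c) < cost(c - 1) and cost(c) < cost(c + 1)]


print(local_minima(cost_flops, 1, 64))    # [28]
print(local_minima(cost_latency, 1, 64))  # [16, 32, 48]
```

With the FLOPS term, the toy cost has a single minimum; with the step-shaped latency term, every multiple of the parallelism factor becomes a local minimum where a greedy search can get stuck.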

To address this issue, we reduced the number of options in the search space. We were exploring convolutional architectures, meaning that the inputs are decomposed into several different components, each of which has its own channel through the network. The data in each channel, in turn, is filtered in several different ways; each filter involves a different data convolution.

Previously, we would have explored the number of channels (known as the channel size) in increments of one; instead, we considered only a handful of channel sizes. We limited the options for channel sizes to certain values that were favorable for the parallelism factor of the Neural Engine. The parallelism factor is a count of operations, such as dot products, that can be performed in parallel. In some cases, we even added to the search space a "depth multiplier" ratio that could be used to scale the number of channels across the entire model.
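One way to picture this curation is sketched below. The helper names, the choice of multiples of the parallelism factor as the "favorable" values, and the round-up rule for the depth multiplier are assumptions made for illustration; the text does not specify which channel sizes were actually kept.

```python
import math
from typing import List


def curated_channel_sizes(parallelism: int, max_channels: int) -> List[int]:
    """Keep only channel sizes that are multiples of the parallelism factor."""
    return list(range(parallelism, max_channels + 1, parallelism))


def scale_channels(channels: List[int], depth_multiplier: float,
                   parallelism: int) -> List[int]:
    """Scale every layer's channel count, rounding up to a multiple of
    the parallelism factor (hypothetical rounding rule)."""
    scaled = []
    for c in channels:
        target = c * depth_multiplier
        scaled.append(max(parallelism,
                          math.ceil(target / parallelism) * parallelism))
    return scaled


print(curated_channel_sizes(16, 64))            # [16, 32, 48, 64]
print(scale_channels([32, 64, 128], 0.75, 16))  # [32, 48, 96]
```

For three layers with up to 64 channels each, unit increments give 64^3 = 262,144 combinations, whereas the curated list above gives 4^3 = 64.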

These improvements can be visualized as taking fewer, larger steps across a smoother terrain, rather than trying to navigate the rocky landscape that resulted from the inclusion of accelerator-aware performance in the cost function. During the optimization process, they resulted in a faster convergence rate because of the reduced number of options and in improved stability and reliability thanks to the monotonic nature of the curated search space.

Illustration of how the cost landscape (green) changes from smooth (left) to rocky (center and right) when a cost function based on Neural Engine performance replaces one based on FLOPS. Curation (right) reduces the discrete search space (black dots) and ensures that points are far apart. The trajectory of a search algorithm (blue arrows) shows how curation (right) ensures that with each step in a search, the cost is monotonically decreasing.

One key detail in our implementation is the performance estimator. Instead of deploying an architecture on real hardware or an emulator to obtain performance metrics, we estimated them using a machine learning regression model trained on measurements of different operators or subgraphs.

At inference time, the estimator would decompose the queried architecture into subgraphs and use the regression model to estimate the performance of each. Then it would accumulate these estimates to give the model-level performance. This regressor-based design simplified our NAS framework, as it no longer required compilation, inference, or hardware. This technique enables us to test accelerators in the design phase, before we’ve developed custom compilers and hardware emulators for them.
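The decompose-and-accumulate idea can be sketched as follows. This is a simplified stand-in, not the estimator described above: it assumes one linear regressor per operator type with a single feature (multiply-accumulate count), and the operator names and measurement values are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Subgraph:
    op: str      # hypothetical operator name, e.g. "conv" or "dwconv"
    macs: float  # multiply-accumulates; the only feature used in this sketch


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


class LatencyEstimator:
    def __init__(self, measurements: Dict[str, List[Tuple[float, float]]]):
        # One regressor per operator type, fit on (macs, latency_ms) pairs.
        self.models = {op: fit_line(pts) for op, pts in measurements.items()}

    def estimate(self, model: List[Subgraph]) -> float:
        """Sum the per-subgraph estimates to get a model-level latency."""
        total = 0.0
        for sg in model:
            slope, intercept = self.models[sg.op]
            total += slope * sg.macs + intercept
        return total


estimator = LatencyEstimator({
    "conv": [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)],  # invented measurements
    "dwconv": [(1.0, 1.5), (3.0, 2.5)],
})
print(estimator.estimate([Subgraph("conv", 4.0), Subgraph("dwconv", 2.0)]))
```

Because a query only evaluates fitted regressors, no compilation or hardware is needed at search time, which matches the motivation given above.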

Productizing NAS with expert-in-the-loop

Curating the search space improves convergence rate, stability, and reliability, but transferability to new use cases is not straightforward. NAS results for a detector model, for instance, may not be easy to transfer to a classification model. On the other hand, running NAS from scratch for each new dataset may not be feasible, due to time constraints. In these situations, we found that combining NAS results and human expertise was the fastest approach.

The initial channel reduction step (1x1 conv.) in the inverted-bottleneck (IBN) block at left is fused with the channel expansion step (KxK depth. conv.) in the fused IBN at right. This proved to be a common subgraph modification across datasets.

When we performed NAS on different datasets, we saw common patterns, such as the fusion of convolution layers with previous convolution layers, a reduction in the number of channels, and the alignment of channel counts with the hardware parallelism factor.

In particular, fusing convolution layers in inverted bottleneck (IBN) blocks contributed most to boosting efficiency. With just these modifications, we observed latency reductions of up to 50%, whereas a fully converged NAS model would yield a slightly better 53% reduction.

In situations where running NAS from scratch is not feasible, a human expert can rely on mathematical intuition and observations of the results of NAS on similar datasets to build the required model architecture.

Results and product impact

We applied this technique to multiple products in the Amazon Devices portfolio, ranging from Echo Show and Blink home security products to the latest Astro, the in-home consumer robot.

1. Reduced detection latency by half on Echo Show

Echo Show runs a model to detect human presence and locate the detected person in a room. The original model used IBN blocks. We used accelerator-aware NAS to reduce the latency of this model by 53%.

Schematic representation of human-presence detection.

We performed a search for depth multipliers (that is, ratios that scale the number of channels across the model) and for opportunities to replace IBN blocks with fused-IBN blocks. The requirement was to maintain the mean average precision (mAP) of the original model while improving the latency. Our V3 model improved the latency by more than 53% (i.e., 2.2x faster) while keeping the mAP the same as the baseline's.

Latency results for the original model and three models found through NAS.

| Model | Fused-IBN search | Depth multiplier search | Latency reduction (%) |
| --- | --- | --- | --- |
| Baseline | No | No | Baseline |
| V1 | No | Yes | 14 |
| V2 | Yes | No | 35 |
| V3 | Yes | Yes | 53 |
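A latency reduction converts to a speedup factor as 1 / (1 - reduction). The short helper below, whose function names are illustrative, applies that conversion to the reductions reported for V1 through V3.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional latency reduction (e.g. 0.53) to a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup: float) -> float:
    """Inverse conversion: a speedup factor to a fractional latency reduction."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


for name, reduction in [("V1", 0.14), ("V2", 0.35), ("V3", 0.53)]:
    print(f"{name}: {speedup_from_reduction(reduction):.2f}x")
# V1: 1.16x
# V2: 1.54x
# V3: 2.13x
```

By the same formula, a 2.2x speedup corresponds to a reduction of about 54.5%, which is consistent with a figure of "more than 53%".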

After performing NAS, we found that not every IBN fusion improves latency and accuracy. The later layers are larger, and replacing them with fused layers hurt performance. For the layers where fusion was selected, the FLOPS, as expected, increased, but the latency did not.
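To see why fusion raises the operation count, the sketch below counts multiply-accumulates (MACs) for one block. It assumes the commonly used formulation in which an IBN block is a 1x1 expansion, a KxK depthwise convolution, and a 1x1 projection, while the fused variant replaces the first two with a single KxK standard convolution. It also assumes stride 1 and unchanged spatial size; the layer dimensions are invented.

```python
def ibn_macs(c_in: int, c_out: int, expansion: int, k: int,
             h: int, w: int) -> int:
    """MACs for 1x1 expand -> KxK depthwise -> 1x1 project (stride 1)."""
    c_mid = c_in * expansion
    expand = c_in * c_mid      # 1x1 convolution
    depthwise = k * k * c_mid  # KxK depthwise convolution
    project = c_mid * c_out    # 1x1 convolution
    return h * w * (expand + depthwise + project)


def fused_ibn_macs(c_in: int, c_out: int, expansion: int, k: int,
                   h: int, w: int) -> int:
    """MACs for KxK standard conv (expand) -> 1x1 project (stride 1)."""
    c_mid = c_in * expansion
    fused = k * k * c_in * c_mid  # replaces the 1x1 expand and the depthwise
    project = c_mid * c_out       # 1x1 convolution
    return h * w * (fused + project)


# Per output pixel (h = w = 1), 32 channels in and out, expansion 4, 3x3 kernel.
print(ibn_macs(32, 32, 4, 3, 1, 1))        # 9344
print(fused_ibn_macs(32, 32, 4, 3, 1, 1))  # 40960
```

Under these assumptions the fused block performs roughly 4.4 times as many MACs, yet a single dense convolution is easier to parallelize than a depthwise one, which is how latency can stay flat or fall even as the operation count grows.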

2. Model fitting within the tight memory budget of the Blink Floodlight Camera

Blink cameras use a classification model for security assistance. Our goal was to fit the model parameters and peak activation memory within a tight memory budget. In this case, we combined NAS techniques with an expert-in-the-loop to provide fine-tuning. The NAS result on the classification dataset provided intuition on what operator/subgraph changes could extract benefits from the accelerator design.

Schematic representation of the classification model output.

The expert recommendations were to replace the depth-wise convolutions with standard convolutions and to reduce the number of channels, keeping them even across the model and preferably a multiple of the parallelism factor. With these changes, model developers were able to reduce both the model size and the intermediate memory usage by 47% and fit the model within the required budget.

3. Fast semantic segmentation for robotics

In the context of robotics, semantic segmentation is used to understand the objects and scenes the robot is interacting with. For example, it can enable the robot to identify chairs, tables, or other objects in the environment, allowing it to navigate and interact with its surroundings more effectively. Our goal for this model was to reduce latency by half. Our starting point was a semantic-segmentation model that was optimized to run on a CPU.

Left: original image of a room at night; center: semantic-segmentation image; right: semantic segmentation overlaid on original image.

For this model, we searched for different channel sizes, fusion, and also output and input dimensions. We used the multishot method with the evolutionary search algorithm. NAS gave us multiple candidates with different performances. The best candidate was able to reduce the latency by half.

Latency improvement for different architectures found through NAS.

| Model | Latency reduction (%) |
| --- | --- |
| Original | Baseline |
| Model A | 27 |
| Model B | 37 |
| Model C | 38 |
| Model D | 41 |
| Model E | 51 |

4. User privacy with on-device inference

Amazon's Neural Engine supports large-model inference on-device, so we can process microphone and video feeds without sending data to the cloud. For example, the Amazon Neural Engine has enabled Alexa to perform automatic speech recognition on-device. On-device processing also provides a better user experience because the inference pipeline is not affected by intermittent connection issues. In our NAS work, we discovered that even larger, more accurate models can now fit on-device with no hit on latency.

Making edge AI sustainable

We mentioned earlier that multishot NAS with full training can take up to 2,000 GPU-days. However, with some of the techniques described in this blog, we were able to create efficient architectures in a substantially shorter amount of time, making NAS much more scalable and sustainable. But our sustainability efforts don't end there.

Because of its parallelism and mixed-precision features, the Neural Engine is more power efficient than a generic CPU. For a million average users, the difference is on the order of millions of kilowatt-hours per year, equivalent to 200 gasoline-powered passenger vehicles per year or the energy consumption of a hundred average US households.

When we optimize models through NAS, we increase the device's capability to run more neural-network models simultaneously. This allows us to use smaller application processors and, in some cases, fewer of them. By reducing the hardware footprint in this way, we are further reducing the carbon footprint of our devices.

Future work

We have identified that curation requires an expert who understands the hardware design well. This may not scale to future generations of more complex hardware. We have also identified that in situations where time is tight, having an expert in the loop is still faster than running NAS from scratch. Because of this, we are continuing to investigate how NAS algorithms with accelerator awareness can handle large search spaces. We are also working on improving the search algorithm’s efficiency and effectiveness by exploring how the three categories of algorithms can be combined. We also plan to explore model optimization by introducing sparsity through pruning and clustering. Stay tuned!

Acknowledgements: Manasa Manohara, Lingchuan Meng, Rahul Bakshi, Varada Gopalakrishnan, Lindo St. Angel
