Training large language models more efficiently

Training separate models on different datasets and then merging them reduces computational costs by as much as 91%.

Large language models (LLMs) go through several stages of training on mixed datasets with different distributions, stages that include pretraining, instruction tuning, and reinforcement learning from human feedback. Finding the optimal mix of data distributions across datasets is essential to building accurate models, but it typically requires training and evaluating the model numerous times on a very large set of combinations.

At the last Conference on Empirical Methods in Natural-Language Processing (EMNLP), my colleagues and I proposed a training framework that reduces the computational cost of using mixed data distributions to train LLMs or other neural-network-based models by up to 91%. At the same time, the method actually improves the quality of the resulting models.

Whereas the standard approach to optimizing data distributions involves weighting the different datasets used to train a single model, we train a separate model on each dataset and then weight the models to produce a composite model.

This unconventional approach won a special award for “efficient modeling, training, and inference” at EMNLP and has the potential to make large-model training much more efficient and accessible.

Distribution-edited models

Traditional training approaches (e.g., instruction tuning) select the optimal mix of training data distributions through a method called grid search, an exhaustive-search method that simply compares outcomes for a wide range of different weight values. This is costly in time and resources, and it is also inflexible: once the model is trained, its data mix can’t be changed without incurring similar costs.

To address these limitations, we propose fine-tuning a pretrained model on data distributions that correspond to different tasks and then subtracting the parameter values of the original model from those of the fine-tuned models. We call the differences in parameter values distribution vectors, and we produce a composite model by adding a weighted sum of distribution vectors to the parameters of the original model.

We call the resulting model a distribution-edited model (DEM) to highlight the use of weight vector arithmetic for model editing. The weights are based on the perplexity of each fine-tuned model, a measure of how well the model predicts held-out validation data (lower is better).
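
The vector arithmetic described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the `Params` representation (tensor name mapped to a flat list of floats), the function names, the toy parameter values, and the merging weights are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical flat representation of a model: tensor name -> list of values.
Params = Dict[str, List[float]]


def distribution_vector(finetuned: Params, base: Params) -> Params:
    """Fine-tuned parameters minus base parameters, tensor by tensor."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(base: Params, vectors: List[Params], weights: List[float]) -> Params:
    """Add a weighted sum of distribution vectors to the base parameters."""
    merged = {name: list(values) for name, values in base.items()}
    for vector, weight in zip(vectors, weights):
        for name, delta in vector.items():
            merged[name] = [m + weight * d for m, d in zip(merged[name], delta)]
    return merged


# Toy values standing in for a pretrained model and two fine-tuned copies.
base = {"w": [1.0, 2.0], "b": [0.5]}
tuned_a = {"w": [1.5, 2.0], "b": [0.0]}
tuned_b = {"w": [1.0, 3.0], "b": [1.5]}

vectors = [distribution_vector(tuned_a, base), distribution_vector(tuned_b, base)]
dem = merge(base, vectors, weights=[0.5, 0.25])
print(dem)  # {'w': [1.25, 2.25], 'b': [0.5]}
```

With all weights set to zero the merge returns the original model, and with a single weight of 1 it returns the corresponding fine-tuned model, so the weights interpolate between the base model and its fine-tuned variants.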

This approach relies on two key observations: (1) training the model separately on each dataset allows better modeling of each dataset’s underlying properties, as there is no interference with other data distributions during the training process; and (2) perplexity can be computed in a single forward pass on validation data, which is much more efficient than grid search. The first point helps improve model quality, and the second point helps make training much more efficient.
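
For readers unfamiliar with the metric, perplexity is conventionally the exponential of the average negative log-likelihood per token. The sketch below assumes the per-token log-probabilities (natural log) have already been collected from one forward pass over the validation data; the function name and the sample values are hypothetical.

```python
import math
from typing import List


def perplexity(token_log_probs: List[float]) -> float:
    """Exponentiated average negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical case: the model assigns probability 0.25 to each of four tokens,
# so it is as uncertain as a uniform choice among four options.
sample_log_probs = [math.log(0.25)] * 4
sample_ppl = perplexity(sample_log_probs)
print(sample_ppl)  # close to 4.0
```

A lower value means the model assigns higher probability to the validation tokens.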

In more detail, here are the steps in the approach:

  1. Individual-distribution training: The original model is trained on individual data distributions through standard training procedures. Checkpoints, or snapshots of the model state after training on a particular dataset, are stored for subsequent steps.
  2. Distribution vector computation: Distribution vectors are computed by subtracting the pretrained model's parameters from those of the fine-tuned models. These vectors capture the unique characteristics of each dataset.
  3. Optimization of merging coefficients: The optimal coefficients for combining the data distribution vectors are found based on perplexity on the validation set using a single forward pass per combination.
  4. Merging of distribution vectors: Linearly combining the distribution vectors with customizable weights creates a unified model that effectively captures the joint distribution of diverse datasets.
  5. Resulting properties (flexibility and scalability): DEM enables incremental updates when new datasets are introduced, without requiring full retraining. This makes it ideal for dynamic and large-scale training scenarios.

With distribution-edited models (DEMs), a pretrained model is fine-tuned on data distributions that correspond to different tasks (ΘD1 – ΘDn). Then the parameter values of the original model (Θ) are subtracted from those of the fine-tuned models, producing a set of distribution vectors (ΔΘD1 – ΔΘDn). The DEM is a composite (ΘD) produced by adding a weighted sum of distribution vectors (Σ) to the parameters of the original model.
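
Steps 2 through 4 can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the "model" is a three-token unigram softmax whose parameters are its logits, the fine-tuned logits and validation tokens are invented, and the coefficients are chosen by picking the candidate combination with the lowest validation perplexity. The selection procedure in the paper may differ.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def validation_perplexity(logits: Sequence[float], tokens: Sequence[int]) -> float:
    """Perplexity of a toy unigram softmax model on a list of token ids."""
    log_z = math.log(sum(math.exp(x) for x in logits))
    avg_nll = -sum(logits[t] - log_z for t in tokens) / len(tokens)
    return math.exp(avg_nll)


def merge(base: Sequence[float], vectors: Sequence[Sequence[float]],
          weights: Sequence[float]) -> List[float]:
    """Base parameters plus a weighted sum of distribution vectors."""
    return [
        b + sum(w * v[i] for w, v in zip(weights, vectors))
        for i, b in enumerate(base)
    ]


def search_coefficients(base, vectors, tokens, grid) -> Tuple[Tuple[float, ...], float]:
    """Evaluate each candidate combination once; keep the lowest perplexity."""
    best = None
    for weights in itertools.product(grid, repeat=len(vectors)):
        ppl = validation_perplexity(merge(base, vectors, weights), tokens)
        if best is None or ppl < best[1]:
            best = (weights, ppl)
    return best


# Step 1 stand-in: pretend these logits came from two separate fine-tuning runs.
base = [0.0, 0.0, 0.0]
finetuned = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

# Step 2: distribution vectors (fine-tuned minus base).
vectors = [[f - b for f, b in zip(ft, base)] for ft in finetuned]

# Steps 3 and 4: choose coefficients on validation data, then merge.
validation_tokens = [0, 0, 1, 1, 2]
grid = [0.0, 0.5, 1.0]
best_weights, best_ppl = search_coefficients(base, vectors, validation_tokens, grid)
base_ppl = validation_perplexity(base, validation_tokens)
dem = merge(base, vectors, best_weights)
print(best_weights, best_ppl < base_ppl)  # (0.5, 0.5) True
```

No training happens inside the search loop: each candidate costs one evaluation of the merged parameters. Supporting a new dataset would mean one more fine-tuning run and one more entry in `vectors`, in line with the incremental updates described in step 5.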

Evaluation and future work

In evaluating our approach, we focused on training LLMs of increasing size, from 3 billion parameters up to 13 billion parameters, during the instruction-tuning stage. Our study showed that DEM reduces training costs by up to 91% while achieving up to 16.1% quality improvement over traditional data-mixing strategies. These results highlight DEM’s potential to widen access to state-of-the-art training techniques and to benefit organizations that use neural models at scale. In addition, DEM’s flexibility means that researchers and practitioners can quickly adapt to new data requirements without compromising performance.

The key takeaways from the study can be summarized as follows:

  • Superior performance: DEM has been validated on popular benchmarks like MMLU, BBH, and HELM, where it achieved up to 16.1% improvement over data mixing on individual tasks.
  • Diverse domain effectiveness: Experiments on datasets such as MathQA, Super-Natural Instructions (SNI), and Chain-of-Thought (CoT) demonstrate DEM’s ability to excel across a variety of domains.
  • Scalability: DEM is shown to improve performance at different model sizes — 3B, 7B and 13B — providing strong evidence for the scalability of this approach.

The effectiveness of DEM underscores the importance of innovation in making machine learning more efficient and accessible. As the machine learning community continues to scale models and datasets, frameworks like DEM will be essential for maintaining efficiency without sacrificing performance. Future research may explore the effectiveness of the framework on other training scenarios and its extension to other model architectures, such as encoder-decoder frameworks or mixture-of-experts models.
