About this CFP
What is Build on Trainium?
Build on Trainium is a $110MM credit program focused on AI research and university education to support the next generation of innovation and development on AWS Trainium. AWS Trainium chips are purpose-built for high-performance deep learning (DL) training of generative AI models, including large language models (LLMs) and latent diffusion models. Build on Trainium provides compute credits for novel AI research on Trainium, investing in leading academic teams to build innovations in critical areas including new model architectures, ML libraries, optimizations, large-scale distributed systems, and more. This multi-year initiative lays the foundation for the future of AI by inspiring the academic community to utilize, invest in, and contribute to the open-source community around Trainium. Combined with the Neuron software development kit (SDK) and the recently launched Neuron Kernel Interface (NKI), these benefits let AI researchers innovate at scale in the cloud.
What are AWS Trainium and Neuron?
AWS Trainium is an AI chip developed by AWS to accelerate the building and deployment of machine learning models. Built on a specialized architecture designed for deep learning, Trainium accelerates the training and inference of complex models with high throughput and scalability, making it ideal for academic researchers looking to optimize performance and costs. The architecture also emphasizes sustainability through energy-efficient design, reducing environmental impact. Amazon has established a dedicated Trainium research cluster featuring up to 40,000 Trainium chips, accessible via Amazon EC2 Trn1 instances. These instances are connected through a non-blocking, petabit-scale network using Amazon EC2 UltraClusters, enabling seamless high-performance ML training. The Trn1 instance family is optimized to deliver substantial compute power for cutting-edge AI research and development. This unique offering not only enhances the efficiency and affordability of model training but also gives academic researchers opportunities to publish new papers on underrepresented compute architectures, advancing the field.
Below are key topics Build on Trainium is exploring to drive innovation and enhance the future of AI/ML on AWS. Please develop your proposal to address one or more of these topics in detail; strong proposals on closely related topics will also be considered.
- Novel kernels and compiler extensions for Trainium. With the recent launch of the Neuron Kernel Interface (NKI), we welcome projects that identify improvement areas such as kernels and compilation artifacts. This could mean implementing an algorithm not previously supported on Trainium, improving performance on a known bottleneck, or another solution (see the NKI sketch after this list). Kernels already implemented in NKI are listed here.
- Mixture-of-experts (MoE) training and hosting, particularly kernels that optimize these compute patterns. We also invite projects that explore ambitious test-time compute requirements and identify kernel-based solutions for them.
- Kernels to improve model distillation and fine-tuning recipes, particularly from a compute perspective. We invite researchers to study the compute workloads of model distillation and fine-tuning regimes, including multi-stage regimes, and invite PIs to develop novel kernels that improve them.
- Quantization, such as developing high-performance kernels that enable quantization for language models and explore its impact.
- Novel projects to generate NKI kernels under various conditions, such as improvements upon existing compilation paths, novel bindings, and more.
- Novel algorithms for large language models. As demand increases for integrating language models across services and applications, so too does the need to increase performance while lowering costs. The growing diversity of applications and use cases creates new opportunities for hardware-centric machine learning solutions. Towards that end, we request the development of novel algorithms for large language models, such as:
- Improvements to attention, quantization, and embedding generation and processing within neural networks.
- Extensions to context length and the algorithms these require.
- Ability to reason. As the complexity of questions sent to AI systems increases, so too must the ability of language models to address them correctly and efficiently. We invite proposals that study reasoning and demonstrate improvement in this area.
- Beyond Transformers. We invite projects that explore novel ways of learning sequences beyond the standard matrix multiplication-based Transformer architecture. In particular, we invite proposals that show superior results to Transformers at much smaller scales while holding the promise of better results at larger scales.
- Multiple modalities. We invite proposals that aim to improve upon existing algorithms combining language with other modalities, including language jointly trained with vision, robotics, and more. We also invite projects that expand the overall breadth of model support on Trainium.
- Model adaptation, fine-tuning, and alignment, including post-training. We invite proposals that discover novel algorithms to improve this space, with a particular focus on GRPO and similar techniques.
- Systems improvements for distributed training and hosting. Here we invite proposals that adopt a systems perspective on distributed training and hosting for large foundation models, including the following topics:
- Improvements to distributed systems for MoE, including work that explores the data, network, and topology implications.
- Improved training efficiency during scale-out training, including checkpoint acceleration and minimizing the data-movement overhead it introduces.
- Faster and more resource-efficient hosting, especially distributed inference. We also invite hosting-aware training regimes that mitigate the computational challenges of hosting models through early-stage changes to the training methodology.
- Performance analysis tooling and fault handling, including the development of novel performance methodologies. We welcome projects that aim to improve KPIs for foundation models at scale, such as model FLOPs utilization (MFU), HBM utilization, time to first token (TTFT), and tokens per second (TPS).
- Streamline development. Here we invite the study and development of tools that accelerate the adoption of AI through a simplified developer experience. We invite proposals around the following:
- Automation to reduce the time spent searching for optimal compute architectures and model parameters.
- Automation to reduce development work in migration across accelerator architectures.
- Studying broadly adopted compute orchestration platforms and identifying novel enhancements for them, such as incorporating distributed-systems benefits and reducing operator complexity. Overall, we welcome work that seeks to minimize operator intervention in distributed training.
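As a concrete reference point for the kernel topics above, the sketch below shows an element-wise tensor addition written against NKI's Python bindings, in the spirit of the public getting-started examples. It is a minimal illustration under stated assumptions, not a reference implementation: it assumes the `neuronxcc.nki` API surface (`nki.jit`, `nl.load`, `nl.store`, `nl.ndarray`, `nl.shared_hbm`) and inputs small enough to fit a single on-chip tile; verify exact signatures and tile-size limits against the current Neuron SDK documentation.

```python
# Minimal NKI kernel sketch: element-wise addition of two tensors.
# Assumes the neuronxcc.nki API as shown in the public getting-started
# examples; check names and tile limits against the current Neuron SDK docs.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def nki_tensor_add(a_input, b_input):
    # Allocate the output tensor in device HBM.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Load both inputs from HBM into on-chip SBUF tiles
    # (assumes each tensor fits within a single tile).
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)

    # Compute the sum on-chip and write the result back to HBM.
    nl.store(c_output, value=a_tile + b_tile)
    return c_output
```

A kernel like this can be exercised with small NumPy inputs on a Trn1 or Inf2 instance, which is a useful first step before tackling the bottleneck-specific kernels this CFP solicits.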
Technical deep dive: Your approach to building on Trainium
We invite applicants to study the Trainium and Inferentia accelerator design, the available software libraries, and our samples for Trainium. We welcome your detailed perspective on this toolset. In particular, tell us how well you expect the specific models and operations you intend to leverage in your research to work on Trainium, given the tools available today. Please plan on bringing this educated perspective on your approach to building on Trainium into the proposal.
Applicants are strongly encouraged to test small versions of their proposed software stacks on the Trainium and/or Inferentia instance families, using Neuron SDK solutions like NxD and NKI, before submitting their proposals. The most compelling and ambitious proposals will present empirical results of these tests in the proposal itself. For details about how to get started on Trainium, follow the instructions here.
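For illustration, here is one minimal smoke test of the kind this paragraph suggests: a tiny forward/backward pass run through PyTorch/XLA on a Trn1 instance with torch-neuronx installed. This is a hedged sketch of the general compile-and-run path, not an NxD- or NKI-specific test; the model, sizes, and hyperparameters are arbitrary placeholders.

```python
# Minimal Trainium smoke test (assumes a Trn1 instance with the Neuron SDK
# and torch-neuronx installed; torch-neuronx exposes Trainium as an XLA device).
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to the NeuronCore-backed XLA device

# Arbitrary toy model and data, used only to exercise compilation and execution.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
xm.optimizer_step(optimizer)  # steps the optimizer and triggers XLA execution
print(f"smoke-test loss: {loss.item():.4f}")
```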
Timeline
Submission period: March 19 to April 30, 2025 (11:59PM Pacific Time).
Decision letters will be sent out by August 2025.
Award details
Selected Principal Investigators (PIs) may receive the following:
- Applicants are encouraged to request AWS Promotional Credits in one of two ranges:
- AWS Promotional Credits, up to $50,000
- AWS Promotional Credits, up to $250,000
- AWS Trainium training resources, including AWS tutorials and hands-on sessions with Amazon scientists and engineers
Awards are structured as one-time unrestricted gifts. The budget should include a list of expected costs specified in USD, and should not include administrative overhead costs. The final award amount will be determined by the awards panel.
Your receipt and use of AWS Promotional Credits is governed by the AWS Promotional Credit Terms and Conditions, which may be updated by AWS from time to time.
Eligibility requirements
Please refer to the ARA Program rules on the Rules and Eligibility page.
Proposal requirements
PIs are encouraged to demonstrate how their proposed techniques or research studies advance kernel optimization, LLM innovation, distributed systems, or developer efficiency. PIs should either include plans for open-source contributions or state that they do not plan to make any open-source contributions (data or code) under the proposed effort. Proposals for this CFP should be prepared according to the proposal template and should be at most 3 pages, not including appendices.
Selection criteria
Proposals will be evaluated on the following:
- Creativity and quality of the scientific content
- Potential impact to the research community and society at large
- Interest expressed in open-sourcing model artifacts, datasets, and development frameworks
- Intention to use and explore novel hardware for AI/ML, primarily AWS Trainium and Inferentia
Expectations from recipients
To the extent deemed reasonable, Award recipients should acknowledge the support from ARA. Award recipients will inform ARA of publications, presentations, code and data releases, blogs/social media posts, and other speaking engagements referencing the results of the supported research or the Award. Award recipients are expected to provide updates and feedback to ARA via surveys or reports on the status of their research. Award recipients will have an opportunity to work with ARA on an informational statement about the awarded project that may be used to generate visibility for their institutions and ARA.