NeurIPS: Why causal-representation learning may be the future of AI

Francesco Locatello on the four NeurIPS papers he coauthored this year, which largely concern generalization to out-of-distribution test data.

In a conversation right before the 2021 Conference on Neural Information Processing Systems (NeurIPS), Amazon vice president and distinguished scientist Bernhard Schölkopf — according to Google Scholar, the most highly cited researcher in the field of causal inference — said that the next frontier in artificial-intelligence research was causal-representation learning.

Where existing approaches to causal inference use machine learning to discover causal relationships between variables — say, the latencies of various interrelated services on a website — causal-representation learning learns the variables themselves. “These kinds of causal representations will also go toward reasoning, which we will ultimately need if we want to move away from this pure pattern recognition view of intelligence,” Schölkopf said.

Francesco.jpg
Senior applied scientist Francesco Locatello.

Francesco Locatello, a senior applied scientist with Amazon Web Services, leads Amazon’s research on causal-representation learning, and he’s a coauthor on four papers at this year’s NeurIPS.

Assaying out-of-distribution generalization in transfer learning” concerns one of the most compelling applications of causal inference in machine learning: generalizing models trained on data with a particular probability distribution to real-world data with a different distribution.

“When you do standard machine learning, you are drawing independent samples from some probability distribution, and then you train a model that's going to generalize to the same distribution,” Locatello explains. “You're describing a physical system using a single probability distribution. Causal models are different because they model every possible state that this physical system can take as a result of an intervention. So instead of having a single probability distribution, you have a set of distributions.

Related content
Amazon Science hosts a conversation with Amazon Scholars Michael I. Jordan and Michael Kearns and Amazon distinguished scientist Bernhard Schölkopf.

“What does it mean that your test data comes from a different distribution? You have the same underlying physical system; the causal structure is the same. It's just a new intervention you have not seen. Your test distribution is different than the training, but now it's not an arbitrary distribution. It’s well posed because it's entailed by the causal structure, and it's a meaningful distribution that may happen in the real world.”

In “Assaying out-of-distribution generalization in transfer learning”, Locatello explains, “what we do is to collect a huge variety of datasets that are constructed for or adapted to this scenario where you have a very narrow data set that you can use for transfer learning, and then you have a wide variety of test data that is all out of distribution. We look at the different approaches that have been studied in the literature and compare them on fair ground.”

Although none of the approaches canvassed in the paper explicitly considers causality, Locatello says, “causal approaches should eventually be able to do better on this benchmark, and this will allow us to evaluate our progress. That's why we built it.”

Neural circuits

Today’s neural networks do representation learning as a matter of course: their inputs are usually raw data, and they learn during training which aspects of the data are most useful for the task at hand. As Schölkopf pointed out in conversation last year, causal-representation learning would simply bring causal machine learning models up to speed with conventional models.

Related content
New method goes beyond Granger causality to identify only the true causes of a target time series, given some graph constraints.

“The important thing to realize is that most machine learning applications don't come structured as a set of well-defined random variables that fully align with the underlying functioning of a physical system,” Locatello explains. “We still want to model these systems in terms of abstract variables, but nobody gives these variables to us. So you may want to learn them in order to be able to perform causal inference.”

Among his and his colleagues’ NeurIPS papers, Locatello says, the one that comes closest to the topic of causal-representation learning is “Neural attentive circuits”. Causal models typically represent causal relationships using graphs, and a neural network, too, can be thought of as an enormous graph. Locatello and his collaborators are trying to make that analogy explicit, by training a neural network to mimic the structure of a causal network.

Neural attentive circuits.png
Visualizations of graph structures learned by neural attentive circuits, from "Neural attentive circuits".

“This is a follow-up on a paper we had last year in NeurIPS,” Locatello says. “The inspiration was to design architectures that behave more similarly to causal models, where you have the noise variables — that's the data — and then you have variables that are being manipulated by functions, and they simply communicate with each other in a graph. And this graph can change dynamically when a distribution changes, for example, because of an intervention.

“In the first paper, we developed an architecture that behaves exactly like that: you have a set of neural functions that can be composed on the fly, depending on the data and the problem. The functions, the routing, and the stitching of the functions are learned. Everything is learned. But it turns out that dynamic stitching is not very scalable.

“In this new work, we essentially compiled the stitching of the functions so that for each sample it's decided beforehand — where it's going to go through the network, how the functions are going to be composed. Instead of doing it on the fly one layer at a time, you decide for the overall forward pass. And we demonstrated that these sparse learned connectivity patterns improve out-of-distribution generalization.”

Success stories

Locatello’s other NeurIPS papers are on more-conventional machine learning topics. “Self supervised amodal video object segmentation” considers the problem of reconstructing the silhouette of an occluded object, which is crucial to robotics applications, including autonomous cars.

Locatello 16_9.png
Segmentations of partially occluded objects, from "Self supervised amodal video object segmentation".

“We exploit the principle that you can build information about an object over time in a video,” Locatello explains. “Perhaps in past frames you've seen parts of the objects that are now occluded. If you can remember that you've seen this object before, and this was its segmentation mask, you can build up your segmentation over time.”

The final paper, “Are two heads the same as one? Identifying disparate treatment in fair neural networks”, considers models whose training objectives are explicitly designed to minimize bias across different types of inputs. Locatello and his colleagues find that frequently, such models — purely through training, without any human intervention — develop two “heads”: that is, they learn two different pathways through the neural network, one for inputs in the sensitive class, and one for all other inputs.

Related content
Amazon ICML paper proposes information-theoretic measurement of quantitative causal contribution.

The researchers argue that, since the network is learning two heads, anyway, it might as well be designed with a two-headed architecture: that would improve performance while meeting the same fairness standard. But this approach hasn’t been adopted, as it runs afoul of rules prohibiting disparate treatment of different groups. In this case, however, disparate treatment could be the best way to ensure fair treatment.

These last two papers are only obliquely related to causality. But, Locatello says, “causal-representation learning is a very young field. So we are trying to identify success stories, and I think these papers are going in that direction.”

“It's clear that causality will have a role in future machine learning,” he adds, “because there are a lot of open problems in machine learning that can at least be partially addressed when you start looking at causal models. And my goal really is to realize the benefits of causal models in mainstream machine learning applications. That's why some of these works are not necessarily about causality, but closer to machine learning. Because ultimately, that's our goal.”

Learn more about Amazon at NeurIPS 2022

For more on the Amazon research being presented at this year's NeurIPS, see our quick guide to Amazon's NeurIPS 2022 papers.

Related content

IN, KA, Bengaluru
RBS (Retail Business Services) Tech team works towards enhancing the customer experience (CX) and their trust in product data by providing technologies to find and fix Amazon CX defects at scale. Our platforms help in improving the CX in all phases of customer journey, including selection, discoverability & fulfilment, buying experience and post-buying experience (product quality and customer returns). The team also develops GenAI platforms for automation of Amazon Stores Operations. As a Sciences team in RBS Tech, we focus on foundational ML research and develop scalable state-of-the-art ML solutions to solve the problems covering customer experience (CX) and Selling partner experience (SPX). We work to solve problems related to multi-modal understanding (text and images), task automation through multi-modal LLM Agents, supervised and unsupervised techniques, multi-task learning, multi-label classification, aspect and topic extraction for Customer Anecdote Mining, image and text similarity and retrieval using NLP and Computer Vision for product groupings and identifying duplicate listings in product search results. Key job responsibilities As an Applied Scientist, you will be responsible to design and deploy scalable GenAI, NLP and Computer Vision solutions that will impact the content visible to millions of customer and solve key customer experience issues. You will develop novel LLM, deep learning and statistical techniques for task automation, text processing, image processing, pattern recognition, and anomaly detection problems. You will define the research and experiments strategy with an iterative execution approach to develop AI/ML models and progressively improve the results over time. You will partner with business and engineering teams to identify and solve large and significantly complex problems that require scientific innovation. You will independently file for patents and/or publish research work where opportunities arise. The RBS org deals with problems that are directly related to the selling partners and end customers and the ML team drives resolution to organization level problems. Therefore, the Applied Scientist role will impact the large product strategy, identifies new business opportunities and provides strategic direction which is very exciting.
IN, KA, Bengaluru
Selection Monitoring team is responsible for making the biggest catalog on the planet even bigger. In order to drive expansion of the Amazon catalog, we develop advanced ML/AI technologies to process billions of products and algorithmically find products not already sold on Amazon. We work with structured, semi-structured and Visually Rich Documents using deep learning, NLP and image processing. The role demands a high-performing and flexible candidate who can take responsibility for success of the system and drive solutions from research, prototype, design, coding and deployment. We are looking for Applied Scientists to tackle challenging problems in the areas of Information Extraction, Efficient crawling at internet scale, developing ML models for website comprehension and agents to take multi-step decisions. You should have depth and breadth of knowledge in text mining, information extraction from Visually Rich Documents, semi structured data (HTML) and advanced machine learning. You should also have programming and design skills to manipulate Semi-Structured and unstructured data and systems that work at internet scale. You will encounter many challenges, including: - Scale (build models to handle billions of pages), - Accuracy (requirements for precision and recall) - Speed (generate predictions for millions of new or changed pages with low latency) - Diversity (models need to work across different languages, market places and data sources) You will help us to - Build a scalable system which can algorithmically extract information from world wide web. - Intelligently cluster web pages, segment and classify regions, extract relevant information and structure the data available on semi-structured web. - Build systems that will use existing Knowledge Base to perform open information extraction at scale from visually rich documents. Key job responsibilities - Use AI, NLP and advances in LLMs/SLMs and agentic systems to create scalable solutions for business problems. - Efficiently Crawl web, Automate extraction of relevant information from large amounts of Visually Rich Documents and optimize key processes. - Design, develop, evaluate and deploy, innovative and highly scalable ML models, esp. leveraging latest advances in RL-based fine tuning methods like DPO, GRPO etc. - Work closely with software engineering teams to drive real-time model implementations. - Establish scalable, efficient, automated processes for large scale model development, model validation and model maintenance. - Lead projects and mentor other scientists, engineers in the use of ML techniques. - Publish innovation in research forums.
US, MA, Boston
MULTIPLE POSITIONS AVAILABLE Employer: AMAZON WEB SERVICES, INC. Offered Position: Data Scientist III Job Location: Boston, Massachusetts Job Number: AMZN9674163 Position Responsibilities: Own the data science elements of various products to help with data-based decision making, product performance optimization, and product performance tracking. Work directly with product managers to help drive the design of the product. Work with Technical Product Managers to help drive the build planning. Translate business problems and products into data requirements and metrics. Initiate the design, development, and implementation of scientific analysis projects or deliverables. Own the analysis, modelling, system design, and development of data science solutions for products. Write documents and make presentations that explain model/analysis results to the business. Bridge the degree of uncertainty in both problem definition and data scientific solution approaches. Build consensus on data, metrics, and analysis to drive business and system strategy. Position Requirements: Master's degree or foreign equivalent degree in Statistics, Applied Mathematics, Economics, Engineering, Computer Science or a related field and two years of experience in the job offered or a related occupation. Employer will accept a Bachelor's degree or foreign equivalent degree in Statistics, Applied Mathematics, Economics, Engineering, Computer Science, or a related field and five years of progressive post-baccalaureate experience in the job offered or a related occupation as equivalent to the Master's degree and two years of experience. Must have one year of experience in the following skills: (1) building statistical models and machine learning models using large datasets from multiple resources; (2) building complex data analyses by leveraging scripting languages including Python, Java, or related scripting language; and (3) communicating with users, technical teams, and management to collect requirements, evaluate alternatives, and develop processes and tools to support the organization. Amazon.com is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation. 40 hours / week, 8:00am-5:00pm, Salary Range $161,803/year to $215,300/year. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, visit: https://www.aboutamazon.com/workplace/employee-benefits.#0000
US, CA, Palo Alto
About Sponsored Products and Brands The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Key job responsibilities As a Machine Learning Applied Scientist, you will: * Conduct deep data analysis to derive insights to the business, and identify gaps and new opportunities * Develop scalable and effective machine-learning models and optimization strategies to solve business problems * Run regular A/B experiments, gather data, and perform statistical analysis * Work closely with software engineers to deliver end-to-end solutions into production * Improve the scalability, efficiency and automation of large-scale data analytics, model training, deployment and serving * Conduct research on new machine-learning modeling and Generative AI solutions to optimize all aspects of Sponsored Products and Brands business About the team The Ad Response Prediction team within Sponsored Products and Brands (SPB) drives personalized shopping experiences for SPB Ads across placements, pages, and devices worldwide. We achieve this through ML and GenAI solutions that include customized shopper response prediction and session-level understanding to optimize every stage of the ad-serving process, from sourcing and bidding to widget discovery and auctions. Our responsibilities include advancing response prediction through model and feature innovations and extending prediction beyond the auction stage to areas such as targeting, sourcing, and bidding.
US, NY, New York
Innovators wanted! Are you an entrepreneur? A builder? A dreamer? This role is part of an Amazon Special Projects team that takes the company’s Think Big leadership principle to the extreme. We focus on creating entirely new products and services with a goal of positively impacting the lives of our customers. No industries or subject areas are out of bounds. If you’re interested in innovating at scale to address big challenges in the world, this is the team for you. Here at Amazon, we embrace our differences. We are committed to furthering our culture of inclusion. We have thirteen employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We are constantly learning through programs that are local, regional, and global. Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Our team highly values work-life balance, mentorship and career growth. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We care about your career growth and strive to assign projects and offer training that will challenge you to become your best. Key job responsibilities - Lead and execute complex, ambiguous research projects from ideation to production deployment - Drive technical strategy and roadmap decisions for ML/AI initiatives - Collaborate cross-functionally with product, engineering, and business teams to translate research into scalable products - Publish research findings at top-tier conferences and contribute to the broader scientific community - Establish best practices for ML experimentation, evaluation, and deployment
US, CA, Santa Clara
We are seeking an Applied Scientist II to join Amazon Customer Service's Science team, where you will build AI-based automated customer service solutions using state-of-the-art techniques in retrieval-augmented generation (RAG), agentic AI, and post-training of large language models. You will work at the intersection of research and production, developing intelligent systems that directly impact millions of customers while collaborating with scientists, engineers, and product managers in a fast-paced, innovative environment. Key job responsibilities - Design, develop, and deploy information retrieval systems and RAG pipelines using embedding models, reranking algorithms, and generative models to improve customer service automation - Conduct post-training of large language models using techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) to optimize model performance for customer service tasks - Build and curate high-quality datasets for model training and evaluation, ensuring data quality and relevance for customer service applications - Design and implement comprehensive evaluation frameworks, including data curation, metrics development, and methods such as LLM-as-a-judge to assess model performance - Develop AI agents for automated customer service, understanding their advantages and common pitfalls, and implementing solutions that balance automation with customer satisfaction - Independently perform research and development with minimal guidance, staying current with the latest advances in machine learning and AI - Collaborate with cross-functional teams including engineering, product management, and operations to translate research into production systems - Publish findings and contribute to the broader scientific community through papers, patents, and open-source contributions - Monitor and improve deployed models based on real-world performance metrics and customer feedback A day in the life As an Applied Scientist II, you will start your day reviewing metrics from deployed models and identifying opportunities for improvement. You might spend your morning experimenting with new post-training techniques to improve model accuracy, then collaborate with engineers to integrate your latest model into production systems. You will participate in design reviews, share your findings with the team, and mentor junior scientists. You will balance research exploration with practical implementation, always keeping the customer experience at the forefront of your work. You will have the autonomy to drive your own research agenda while contributing to team goals and deliverables. About the team The Amazon Customer Service Science team is dedicated to revolutionizing customer support through advanced AI and machine learning. We are a diverse group of scientists and engineers working on some of the most challenging problems in natural language understanding and AI automation. Our team values innovation, collaboration, and a customer-obsessed mindset. We encourage experimentation, celebrate learning from failures, and are committed to maintaining Amazon's high bar for scientific rigor and operational excellence. You will have access to world-class computing resources, massive datasets, and the opportunity to work alongside some of the brightest minds in AI and machine learning.
US, CA, Sunnyvale
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine innovative AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. We leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of robotics foundation models that: - Enable unprecedented generalization across diverse tasks - Integrate multi-modal learning capabilities (visual, tactile, linguistic) - Accelerate skill acquisition through demonstration learning - Enhance robotic perception and environmental understanding - Streamline development processes through reusable capabilities The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. As a Senior Applied Scientist, you will develop and improve machine learning systems that help robots perceive, reason, and act in real-world environments. You will leverage state-of-the-art models (open source and internal research), evaluate them on representative tasks, and adapt/optimize them to meet robustness, safety, and performance needs. You will invent new algorithms where gaps exist. You’ll collaborate closely with research, controls, hardware, and product-facing teams, and your outputs will be used by downstream teams to further customize and deploy on specific robot embodiments. Key job responsibilities As a Senior Applied Scientist in the Foundations Model team, you will: - Leverage state-of-the-art models for targeted tasks, environments, and robot embodiments through fine-tuning and optimization. - Execute rapid, rigorous experimentation with reproducible results and solid engineering practices, closing the gap between sim and real environments. - Build and run capability evaluations/benchmarks to clearly profile performance, generalization, and failure modes. - Contribute to the data and training workflow: collection/curation, dataset quality/provenance, and repeatable training recipes. - Write clean, maintainable, well commented and documented code, contribute to training infrastructure, create tools for model evaluation and testing, and implement necessary APIs - Stay current with latest developments in foundation models and robotics, assist in literature reviews and research documentation, prepare technical reports and presentations, and contribute to research discussions and brainstorming sessions. - Work closely with senior scientists, engineers, and leaders across multiple teams, participate in knowledge sharing, support integration efforts with robotics hardware teams, and help document best practices and methodologies.
US, CA, Sunnyvale
Amazon's AGI Information is seeking an exceptional Applied Scientist to drive science advancements in the Amazon Knowledge Graph team (AKG). AKG is re-inventing knowledge graphs for the LLM era, optimizing for LLM grounding. At the same time, AKG is innovating to utilize LLMs in the knowledge graph construction pipelines to overcome obstacles that traditional technologies could not overcome. As a member of the AKG IR team, you will have the opportunity to work on interesting problems with immediate customer impact. The team is addressing challenges in web-scale knowledge mining, fact verification, multilingual information retrieval, and agent memory operating over Graphs. You will also have the opportunity to work with scientists working on the other challenges, and with the engineering teams that deliver the science advancements to our customers. A successful candidate has a strong machine learning and agent background, is a master of state-of-the-art techniques, has a strong publication record, has a desire to push the envelope in one or more of the above areas, and has a track record of delivering to customers. The ideal candidate enjoys operating in dynamic environments, is self-motivated to take on new challenges, and enjoys working with customers, stakeholders, and engineering teams to deliver big customer impact, shipping solutions via rapid experimentation and then iterating on user feedback and interactions. Key job responsibilities As an Applied Scientist, you will leverage your technical expertise and experience to demonstrate leadership in tackling large complex problems. You will collaborate with applied scientists and engineers to develop novel algorithms and modeling techniques to build the knowledge graph that delivers fresh factual knowledge to our customers, and that automates the knowledge graph construction pipelines to scale to many billions of facts. Your first responsibility will be to solve entity resolution to enable conflating facts from multiple sources into a single graph entity for each real world entity. You will develop generic solutions that work fo all classes of data in AKG (e.g., people, places, movies, etc.), that cope with sparse, noisy data, that scale to hundreds of millions of entities, and that can handle streaming data. You will define a roadmap to make progress incrementally and you will insist on scientific rigor, leading by example.
US, MA, N.reading
Amazon is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. Key job responsibilities - Design and implement whole body control methods for balance, locomotion, and dexterous manipulation - Utilize state-of-the-art in methods in learned and model-based control - Create robust and safe behaviors for different terrains and tasks - Implement real-time controllers with stability guarantees - Collaborate effectively with multi-disciplinary teams to co-design hardware and algorithms for loco-manipulation - Mentor junior engineer and scientists
CN, 31, Shanghai
As a Sr. Applied Scientist, you will be responsible for bringing new product designs through to manufacturing. You will work closely with multi-disciplinary groups including Product Design, Industrial Design, Hardware Engineering, and Operations, to drive key aspects of engineering of consumer electronics products. In this role, you will use expertise in physical sciences, theoretical, numerical or empirical techniques to create scalable models representing response of physical systems or devices, including: * Applying domain scientific expertise towards developing innovative analysis and tests to study viability of new materials, designs or processes * Working closely with engineering teams to drive validation, optimization and implementation of hardware design or software algorithmic solutions to improve product and customer risks * Establishing scalable, efficient, automated processes to handle large scale design and data analysis * Conducting research into use conditions, materials and analysis techniques * Tracking general business activity including device health in field and providing clear, compelling reports to management on a regular basis * Developing, implementing guidelines to continually optimize design processes * Using simulation tools like LS-DYNA, and Abaqus for analysis and optimization of product design * Using of programming languages like Python and Matlab for analytical/statistical analyses and automation * Demonstrating strong understanding across multiple physical science domains, e.g. structural, thermal, fluid dynamics, and materials * Developing, analyzing and testing structural solutions from concept design, feature development, product architecture, through system validation * Supporting product development and optimization through application of analysis and testing of complex electronic assemblies using advanced simulation and experimentation tools and techniques