Yellow warbler. Critical learning periods are vital for birds developing the ability to sing. Deep neural networks exhibit critical learning periods just like biological systems. (JHunter/Getty Images/iStockphoto)

The importance of forgetting in artificial and animal intelligence

The surprising dynamics related to learning that are common to artificial and biological systems.

Deep neural networks (DNNs) have taken the AI research community by storm, approaching human-like performance in niche learning tasks from recognizing speech to finding objects in images. The industry has taken notice, with adoption growing by 37% in the past four years, according to Gartner, a leading research and advisory firm.

But how does a DNN learn? What “information” does it contain? How is such information represented, and where is it stored? How does information content in the DNN change during learning?

In 2016, my collaborators and I (then at UCLA) set out to answer some of these questions. To frame the questions mathematically, we had to form a viable definition of “information” in deep networks.

Traditional information theory is built around Claude Shannon’s idea of quantifying how many bits are needed to send a message. But as Shannon himself noted, this is a measure of information for communication. When it is used to measure how much information a DNN’s weights contain about the task the network is trying to solve, it tends to give degenerate, nonsensical values.

This paradox led to the introduction of a more general notion of information, the information Lagrangian, which defines information as the trade-off between how much noise can be added to the weights between layers and the resulting accuracy of the network’s input-output behavior. Intuitively, if we can replace most of a network’s computations with random noise and still get the same output, then the network does not actually contain much information, however large it is. Pleasingly, for some particular noise models, this definition specializes to recover Shannon’s original one.
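To make the trade-off concrete, here is a minimal sketch on a toy linear model; it is not the authors’ implementation. It assumes the Lagrangian takes the form “expected loss under weight noise plus β times the KL divergence between the noisy weights and a fixed prior”, with independent Gaussian noise on each weight, a zero-mean Gaussian prior, and mean-squared-error loss. Under those assumptions both the expected loss and the optimal noise level have closed forms. All function names and parameters (`beta`, `prior_std`) are illustrative.

```python
import numpy as np


def expected_mse(X, y, w, sigma):
    """Exact expected mean-squared error of the linear model X @ w when each
    weight w_j is perturbed by independent noise N(0, sigma_j^2)."""
    residual = X @ w - y
    # Noise on w_j adds variance X[i, j]^2 * sigma_j^2 to prediction i.
    noise_penalty = np.sum(np.mean(X**2, axis=0) * sigma**2)
    return float(np.mean(residual**2) + noise_penalty)


def kl_per_weight(w, sigma, prior_std=1.0):
    """KL(N(w_j, sigma_j^2) || N(0, prior_std^2)) for each weight, in nats.
    Assumption: Gaussian noise and a zero-mean Gaussian prior."""
    return (np.log(prior_std / sigma)
            + (sigma**2 + w**2) / (2 * prior_std**2) - 0.5)


def lagrangian(X, y, w, sigma, beta, prior_std=1.0):
    """Expected loss under weight noise plus beta times the information term."""
    info = float(np.sum(kl_per_weight(w, sigma, prior_std)))
    return expected_mse(X, y, w, sigma) + beta * info


def optimal_sigma(X, beta, prior_std=1.0):
    """Per-weight noise level that minimizes the Lagrangian in this setting:
    sigma_j^2 = beta / (2 c_j + beta / prior_std^2), with c_j = mean(X[:, j]^2)."""
    c = np.mean(X**2, axis=0)
    return np.sqrt(beta / (2 * c + beta / prior_std**2))


# Toy data: the second input is always zero, so its weight is irrelevant.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0]])
w = np.array([0.5, 0.0])
y = X @ w
sigma = optimal_sigma(X, beta=0.1)
info = kl_per_weight(w, sigma)  # information carried by each weight, in nats
```

In this toy case the second weight tolerates noise as large as the prior (its `sigma` is 1) and carries zero information, while the first weight must stay precise and carries about 1.49 nats: a model can have many weights yet hold little information if most of them can be replaced by noise.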

The next step was to compute this information for DNNs with millions of parameters.
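Exact computation is impractical at that scale, so cheaper proxies are needed. One inexpensive proxy used in work on critical learning periods (an assumption here, not necessarily what produced the figure below) is the trace of the Fisher information matrix, which measures how sensitive the model’s predictions are to small changes in the weights; its diagonal can be accumulated in one pass over the data. The sketch computes it exactly for a logistic-regression model standing in for a DNN. For a real network, the same diagonal would be estimated from per-example gradients of the log-likelihood, with labels sampled from the model’s own predictions; that step is not shown, and the function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fisher_diagonal(X, w):
    """Diagonal of the Fisher information of p(y=1 | x) = sigmoid(x @ w),
    averaged over the inputs X. For this model the Fisher matrix is
    mean_i p_i (1 - p_i) x_i x_i^T, so the diagonal costs O(n * d):
    linear in both the number of examples and the number of weights."""
    p = sigmoid(X @ w)
    return np.mean((p * (1.0 - p))[:, None] * X**2, axis=0)


def fisher_trace(X, w):
    """Scalar summary used as a proxy for information in the weights (assumption)."""
    return float(np.sum(fisher_diagonal(X, w)))


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
uncertain = fisher_trace(X, np.zeros(2))             # every p_i = 0.5
saturated = fisher_trace(X, np.array([50.0, 50.0]))  # every p_i is close to 1
```

Evaluating a proxy like this after every epoch is one way to trace how the information in the weights evolves over training.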

As learning progresses, one would expect the amount of information stored in the weights of the network to increase monotonically: the more you train, the more you learn. However, the information in the weights (the blue line in the figure at right) follows a completely different path. First, the information contained in the weights increases sharply, as if the network were trying to acquire information about the data set. Then the information in the weights drops, almost as though the network were “forgetting”, or shedding information about the training data. Amazingly, this forgetting occurs while performance on the learning task, shown by the green dashed curve, continues to improve!

When we shared these findings with biologists, they were not surprised. In biological systems, forgetting is an important aspect of learning. Animal brains have a bounded capacity. There is an ongoing need to forget useless information and consolidate useful information. However, DNNs are not biological in nature. There is no apparent reason why memorizing first, and then forgetting, should be beneficial.

Our research uncovered a related finding, one that surprised even our biologist collaborator.

Biological networks have another fundamental property: they lose their plasticity over time. If people do not learn a skill (say, seeing or speaking) during a critical period of development, their ability to learn that skill is permanently impaired. In humans, for example, failure to correct visual defects early enough in childhood can result in lifelong amblyopia (impaired vision in one eye), even if the defect is later corrected. The importance of the critical learning period is especially pronounced in the animal kingdom: for example, it is vital for birds developing the ability to sing.

The inability to learn a new skill later in life is attributed to a loss of neuronal plasticity driven by several biochemical factors. Artificial neural networks, on the other hand, have no such biochemistry, and they do not age. Why, then, would they have a critical learning period?

We set out to repeat a classic experiment by neuroscience pioneers Hubel and Wiesel, who in the 1950s and 1960s studied the effect of a temporary visual deficit in cats shortly after birth and correlated it with permanent visual impairment later in life.

We “blindfolded” the DNNs by blurring the training images at the beginning of training, then let the networks continue training on clear images. We found that the deficit introduced in the initial period caused a permanent loss of classification accuracy, no matter how much additional training the network received.
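The “blindfolding” protocol can be sketched as a data transform that depends on the training epoch. This minimal version assumes grayscale images stored as NumPy arrays and blurs them by averaging non-overlapping blocks of pixels and scaling the result back up; the blur method, the block `factor`, and the function names are illustrative choices, not the settings of the original experiments.

```python
import numpy as np


def blur(image, factor=4):
    """Blur a 2-D image by averaging non-overlapping factor x factor blocks and
    repeating each average back to full size. Assumes height and width are
    divisible by factor."""
    h, w = image.shape
    blocks = image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(blocks, factor, axis=0), factor, axis=1)


def training_view(image, epoch, deficit_end, factor=4):
    """What the network sees at a given epoch: a blurred image during the
    initial deficit period (epoch < deficit_end), the clear image afterwards."""
    if epoch < deficit_end:
        return blur(image, factor)
    return image


image = np.arange(16, dtype=float).reshape(4, 4)
early = training_view(image, epoch=3, deficit_end=10, factor=2)  # blurred
late = training_view(image, epoch=10, deficit_end=10, factor=2)  # clear
```

Training would then proceed as usual with `training_view` applied to every batch, repeating the run for different values of `deficit_end` to see how late the blur can be removed before accuracy suffers.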

The final accuracy of a DNN, plotted as a function of the epoch at which the “visual deficit” (blur) was removed, is shown in blue (left), against the normal training accuracy in dashed lines (same as in the previous plot). This bears a puzzling similarity to the visual acuity measured by biologists in cats as a function of when the visual defect was removed (blue); the progression of visual acuity in normal cats is shown in green. On the right, the same phenomenon is sliced another way: rather than being removed at a certain time, a defect is applied for a window starting at a particular instant (horizontal axis), measured in days for cats and training epochs for DNNs. The sensitivity of the system (cat or DNN), measured as the percentage decrease in performance relative to normal training, shows a remarkable similarity to the information curve in the previous image: there is strong sensitivity in the initial critical period (the “information acquisition phase”), after which visual deficits have no long-term effect.

In other words, DNNs exhibit critical learning periods just like biological systems. If we interfere with the data during the “information acquisition” phase, the network gets into a state from which it cannot recover. Altering the data after this critical period has no effect.
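The window experiment in the right-hand panel reduces to a deficit schedule and a sensitivity score. The helpers below are a hedged sketch: the window is assumed to be half-open and measured in whole epochs, sensitivity is the percentage decrease in final accuracy relative to normal training as the caption describes, and `train_and_evaluate` is a hypothetical callable that trains a fresh model under a given schedule and returns its final accuracy.

```python
from typing import Callable, Dict, Iterable

# A schedule answers one question: is the deficit (e.g. blur) applied at this epoch?
Schedule = Callable[[int], bool]


def deficit_in_window(epoch: int, window_start: int, window_length: int) -> bool:
    """Deficit is applied only during [window_start, window_start + window_length)."""
    return window_start <= epoch < window_start + window_length


def sensitivity(normal_accuracy: float, deficit_accuracy: float) -> float:
    """Percentage decrease in final performance relative to normal training."""
    return 100.0 * (normal_accuracy - deficit_accuracy) / normal_accuracy


def sensitivity_sweep(train_and_evaluate: Callable[[Schedule], float],
                      window_starts: Iterable[int],
                      window_length: int) -> Dict[int, float]:
    """Sensitivity for each window start; train_and_evaluate is hypothetical."""
    normal = train_and_evaluate(lambda epoch: False)  # baseline: no deficit at all
    return {
        start: sensitivity(
            normal,
            # s=start binds the current window start inside the lambda.
            train_and_evaluate(
                lambda epoch, s=start: deficit_in_window(epoch, s, window_length)),
        )
        for start in window_starts
    }
```

Plotting the returned values against the window start gives a curve of the kind shown on the right of the figure; a deficit placed after the critical period would show up as near-zero sensitivity.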

We then performed the equivalent of “artificial neural recording”, measuring the flow of information among different neurons. We found that during the critical period, the way information flows between layers is fluid; after the critical period, these pathways become fixed. In place of neural plasticity, a DNN exhibits a form of “information plasticity”, in which the ability to reorganize how information is processed is lost during learning. But rather than being a consequence of aging or some complex biochemical phenomenon, this “forgetting” appears to be an essential part of learning. This is true for both artificial and biological systems.

In the years since, we have worked to understand and analyze these learning dynamics that are common to artificial and biological systems.

Task2Vec is a method for transforming learning tasks into vectors, so they can be compared, clustered, and selected based on neighborhood criteria. This plot is a 2-D reduction of the space of learning tasks that shows, for instance, that the tasks of learning different colors cluster together, as do the tasks of learning plants and animals. Some concepts that are visually dissimilar (such as denim and yoga pants) are close to each other, but so are “ripped” and “denim”.

This work produced a rich universe of findings, and some of them are already making their way into our products. For instance, it is common in AI to train a DNN model to solve one task (say, finding cats and dogs in images) and then fine-tune it for a different task (say, recognizing objects for autonomous-driving applications). But how do we know which model to start from to solve a customer problem? When are two learning tasks “close”? How do we represent learning tasks mathematically, and how do we compute the distance between them?

To give just one practical application of our research: Task2Vec is a method for representing a learning task as a simple vector. This vector is a function of the information in the weights discussed earlier. The amount of information needed to fine-tune one model from another is an (asymmetric) distance between the tasks the two models represent, so we can now measure how difficult it would be to fine-tune a given model for a given task. This is part of our Amazon Rekognition Custom Labels service, where customers provide a few sample images of objects and the system learns a model to detect and classify them in never-before-seen images.
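The sketch below shows how such vectors could be compared to pick a starting model. It assumes each task has already been embedded as a vector of non-negative per-weight scores, for instance the diagonal Fisher information of a shared probe network computed on that task’s data. The symmetric normalized cosine distance and the complexity-discounted asymmetric variant are modeled on the published Task2Vec paper; treat the exact formulas, the trivial-task embedding, and the `alpha` value as assumptions. This is not the Rekognition Custom Labels implementation, and all names are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def task_distance(fa, fb, eps=1e-12):
    """Symmetric distance between two task embeddings: cosine distance after
    dividing each entry by the sum of the two embeddings (eps avoids 0/0)."""
    total = fa + fb + eps
    return cosine_distance(fa / total, fb / total)


def transfer_distance(source, target, trivial, alpha):
    """Asymmetric distance from a source task to a target task (assumed form):
    the symmetric distance minus alpha times the source's distance from a
    trivial task, so richer source tasks are favored."""
    return task_distance(source, target) - alpha * task_distance(source, trivial)


def pick_source_model(target, sources, trivial, alpha):
    """Index of the source task (and its pre-trained model) to fine-tune from."""
    scores = [transfer_distance(s, target, trivial, alpha) for s in sources]
    return int(np.argmin(scores))


target = np.array([1.0, 1.0, 0.0, 0.0])
sources = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
trivial = np.ones(4)  # hypothetical embedding of a trivial task
best = pick_source_model(target, sources, trivial, alpha=0.15)
```

The subtraction of the complexity term is what makes the distance asymmetric: fine-tuning from a rich task to a simple one is scored as easier than the reverse, matching the intuition that more information must be acquired to go from a simple task to a complex one.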

AI is truly in its infancy, and the depth of the intellectual questions the field raises is invigorating. For now, there is consolation for those of us who are aging and beginning to forget things: we can take comfort in knowing that we are still learning.
