Object instance segmentation, a research field embraced by Amazon Research Award (ARA) recipient Yong Jae Lee, is the ability of a computer vision (CV) model not only to detect objects in an image, but also to accurately locate, classify and outline each object of interest, such as a person, bowl, cup, or knife.
Courtesy of Yong Jae Lee

How Yong Jae Lee is advancing the cutting edge of computer vision research

The University of Wisconsin-Madison associate professor and Amazon Research Award recipient has authored a series of pioneering papers on real-time object instance segmentation.

Making sense of our kaleidoscopic visual world has been a decades-long grand challenge for computer scientists. That’s because there’s so much more to vision than mere seeing. To get the most out of machines, and ultimately have them move usefully and safely among us, we need them to understand what is happening around them with a superhuman degree of confidence.

The knowledge we bring to every scene we encounter is what imbues that scene with meaning and enables us to respond appropriately. In the early days of computer vision (CV), artificial intelligence systems could learn to recognize what they were seeing only by training on huge numbers of example images painstakingly annotated by humans, a process known as supervised learning.

Yong Jae Lee, associate professor at the University of Wisconsin-Madison, received a 2019 Amazon Research Award for his research into real-time object instance segmentation.
Courtesy of Yong Jae Lee

When Yong Jae Lee, then an electrical engineering undergrad, first got hooked on the CV challenge about 15 years ago, supervised learning reigned supreme. Back then, to teach a CV system how to spot a cat, you had to show it thousands of pictures of cats, each with a box carefully drawn around the feline and labelled “cat”.

In this way, it could learn the constellation of features that makes felines uniquely identifiable. The idea that a CV system could learn to pick out the many important features of the visual world with little or no help from pre-labelled data seemed so distant and difficult that, to many in the field, even attempting it felt borderline pointless.

But Lee, now an associate professor at the University of Wisconsin-Madison, felt strongly even back then that the future of CV lay in unsupervised or weakly supervised learning.

The idea behind this form of machine learning (ML) is that a CV model takes in large numbers of mostly unlabelled images and works out for itself how to distinguish between the many different classes of objects they contain, from cats, dogs and fleas to people, cars and trees.

“Back then, unsupervised learning was not popular, but I had no doubt it was the right problem to work on,” says Lee. “Now, I think almost the entire community believes in this direction. Huge progress is being made.”

This shift towards unsupervised learning, and self-supervised learning in particular, was brought about by the deep learning revolution, says Lee. In this paradigm, ML algorithms extract pertinent information from enormous amounts of raw, unlabelled data. This kind of learning has been likened to how babies learn about the world, albeit on digital timescales.

The blistering pace of progress in deep learning means the content of Lee’s graduate teaching changes from one semester to the next.

“The state of the art this month will no longer be so next month,” he says. “There are frequent surprises, and paradigm shifts every few years. It’s a lot to navigate, but an exciting time for students.”

With instance segmentation, the model differentiates between multiple objects of the same class, such as cars or trucks, by clearly segmenting each “instance” of that class of object.
Courtesy of Yong Jae Lee

When he’s not teaching, Lee is pushing the boundaries of both supervised and self-supervised approaches to CV. In 2019 he received an Amazon Machine Learning Research Award (the program is now known as Amazon Research Awards), in part to support a series of pioneering papers on real-time object instance segmentation.

Object instance segmentation goes a lot further than visual object detection. A detector locates and classifies each object of interest, be that a chair, human, or plant, typically by drawing a bounding box around it; an instance segmentation model must also delineate each object’s exact visual boundary within the image, down to the pixel.

With instance segmentation, each pixel belonging to an object of interest is attributed to a class of object, and the model also differentiates between two objects of the same class by clearly segmenting each “instance” of that class.
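
To make the distinction concrete, here is a toy Python sketch that turns a list of instance masks into two maps: a semantic map that records only each pixel’s class, and an instance map that records which object each pixel belongs to. The helper names `build_maps` and `box` and the tiny 4×6 “image” are hypothetical, made up for illustration; this is not how any particular model stores its output. Two touching cars merge into one “car” region in the semantic map but keep separate ids in the instance map.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def build_maps(height: int, width: int, instances: List[Tuple[str, Set[Pixel]]]):
    """Turn instance masks into a semantic map and an instance-id map.

    semantic[r][c] is a class name, or None for unlabelled background.
    instance[r][c] is a 1-based instance id, or 0 for background.
    Where masks overlap, the instance listed later wins.
    """
    semantic: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    instance: List[List[int]] = [[0] * width for _ in range(height)]
    for inst_id, (cls, pixels) in enumerate(instances, start=1):
        for r, c in pixels:
            semantic[r][c] = cls
            instance[r][c] = inst_id
    return semantic, instance


def box(r0: int, r1: int, c0: int, c1: int) -> Set[Pixel]:
    """All pixels in rows r0..r1-1 and columns c0..c1-1 (hypothetical helper)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Hypothetical 4x6 scene: two touching cars and one person; column 4 is background.
instances = [
    ("car", box(1, 3, 0, 2)),     # first car
    ("car", box(1, 3, 2, 4)),     # second car, adjacent to the first
    ("person", box(0, 4, 5, 6)),  # person
]
semantic, instance = build_maps(4, 6, instances)

for row in instance:
    print(row)
```

Here `semantic[1]` is `['car', 'car', 'car', 'car', None, 'person']`, while `instance[1]` is `[1, 1, 2, 2, 0, 3]`: only the instance map can tell the two cars apart.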

The challenge in 2019 was that, although instance segmentation could be done to a high standard on individual images, no system could yet hit high-accuracy benchmarks on real-time streaming video (defined as 30 frames per second or above).
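
The 30-frames-per-second threshold translates into a hard time budget for each frame. A quick back-of-the-envelope check in Python, assuming frames are processed one at a time with no pipelining or batching (the function names and the example latencies are hypothetical, not measurements of any model):

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to process one frame at a given frame rate."""
    return 1000.0 / fps


def achieved_fps(latency_ms: float) -> float:
    """Frame rate sustained if every frame takes latency_ms to process."""
    return 1000.0 / latency_ms


def is_real_time(latency_ms: float, target_fps: float = 30.0) -> bool:
    """True if a per-frame latency keeps up with target_fps."""
    return latency_ms <= frame_budget_ms(target_fps)


print(f"Budget at 30 fps: {frame_budget_ms(30):.1f} ms per frame")
for latency in (25.0, 33.0, 50.0):  # hypothetical per-frame latencies
    print(f"{latency} ms -> {achieved_fps(latency):.1f} fps, real time: {is_real_time(latency)}")
```

At 30 fps, everything from reading the frame to producing every mask has to fit in roughly 33 ms; a model that needs 50 ms per frame manages only 20 fps.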

It is important for CV systems to comprehend visual scenes at speed because a range of burgeoning technologies depend on such an ability, from driverless cars to autonomous warehouse robots.

Lee, then at the University of California, Davis, and his students Daniel Bolya, Chong Zhou, and Fanyi Xiao not only developed the first model to attain such accuracy at speed, but also managed to achieve it by training their model on just one GPU.

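Their supervised system, called YOLACT (You Only Look At CoefficienTs), was lean and mean. It was fast because the researchers had found a novel way to split instance segmentation into two subtasks that run in parallel, rather than relying on slower, sequential processing: one branch generates a set of image-wide “prototype” masks, while another predicts, for each detected object, a set of coefficients used to combine those prototypes into that object’s mask. YOLACT won the Most Innovative Award at the COCO Object Detection Challenge at the International Conference on Computer Vision (ICCV) in 2019.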
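
The YOLACT paper describes its two parallel subtasks concretely: one branch produces k prototype masks covering the whole image, while the detection head predicts k mask coefficients for each detected object. Each object’s mask is then the sigmoid of the coefficient-weighted sum of the prototypes, thresholded (at 0.5 here). The pure-Python sketch below shows only that assembly step, with made-up 2×3 prototypes and coefficients; the function name `assemble_masks` is hypothetical, and the bounding-box cropping and the networks that produce these values in YOLACT itself are omitted.

```python
import math
from typing import List

Grid = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def assemble_masks(prototypes: List[Grid], coeffs: List[List[float]],
                   threshold: float = 0.5) -> List[List[List[bool]]]:
    """Combine k image-wide prototypes with per-instance coefficients.

    prototypes: k grids of size H x W (shared by every instance).
    coeffs: one row of k coefficients per detected instance.
    Returns one H x W boolean mask per instance.
    """
    k = len(prototypes)
    h, w = len(prototypes[0]), len(prototypes[0][0])
    masks = []
    for row in coeffs:
        if len(row) != k:
            raise ValueError("each instance needs exactly k coefficients")
        # Linear combination of prototypes, then sigmoid and threshold per pixel.
        mask = [[sigmoid(sum(row[j] * prototypes[j][r][c] for j in range(k))) > threshold
                 for c in range(w)]
                for r in range(h)]
        masks.append(mask)
    return masks


# Made-up prototypes on a 2x3 grid: one fires on the left, one on the right.
left = [[4.0, 4.0, -4.0], [4.0, 4.0, -4.0]]
right = [[-4.0, -4.0, 4.0], [-4.0, -4.0, 4.0]]
masks = assemble_masks([left, right], coeffs=[[1.0, 0.0], [0.0, 1.0]])
```

Because the prototypes are computed once per image, each additional detected object costs only one more weighted sum, which is part of why this design is fast. In YOLACT the combination is done as a single matrix multiplication on the GPU rather than with Python loops.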

Since then, Lee’s team has gone on to markedly improve the system’s efficiency and performance: the latest version, called YolactEdge (built with students Haotian Liu, Rafael Rivera-Soto, and Fanyi Xiao), can run on a device no bigger than your hand. And by making the YOLACT code available on GitHub, Lee has put the system into many people’s hands.

“It’s had a big impact. I know there are a lot of people using YOLACT, and at least one start-up,” says Lee. “This is not some intellectual exercise. We’re creating systems with real-world value. For me, that’s a tremendously exciting feeling.”

In another branch of Lee’s work, also supported by his Amazon award, he is pioneering new approaches to ML-based image generation. One research first is MixNMatch, a minimal-supervision model that, when supplied with many real images, teaches itself to differentiate between a variety of important image attributes. By learning to distinguish between an object’s shape, pose, texture/colour and background, the system can generate new images with any desired combination of those attributes, under fine-grained control.

MixNMatch disentangles and encodes four factors from real images — object pose, shape, texture and background — and combines them to generate new images. Each image in the row of images is a combination of the attributes taken from the four images above it.
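
The mixing step in the caption can be pictured as a lookup: once each real image has been encoded into one code per factor, a new image is specified by choosing which source image supplies each factor. The sketch below shows only that bookkeeping. `FactorCodes` and `mix_codes` are hypothetical names, the codes are plain lists of floats for simplicity (the real model’s codes may differ in type and size), and the learned encoders and generator that MixNMatch actually uses are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

FACTORS = ("background", "pose", "shape", "texture")


@dataclass(frozen=True)
class FactorCodes:
    """Hypothetical disentangled codes for one image, one vector per factor."""
    background: List[float]
    pose: List[float]
    shape: List[float]
    texture: List[float]


def mix_codes(sources: Dict[str, FactorCodes]) -> FactorCodes:
    """Build a new code set, taking each factor from the image assigned to it.

    sources maps a factor name to the codes of the image that supplies it.
    """
    missing = [f for f in FACTORS if f not in sources]
    if missing:
        raise KeyError(f"no source image for factors: {missing}")
    return FactorCodes(**{f: getattr(sources[f], f) for f in FACTORS})


# Four made-up "encoded images"; each code is a tiny placeholder vector.
imgs = [FactorCodes([i], [i + 0.1], [i + 0.2], [i + 0.3]) for i in range(4)]
mixed = mix_codes({"background": imgs[0], "pose": imgs[1],
                   "shape": imgs[2], "texture": imgs[3]})
# In the real system, the mixed codes would be fed to the learned generator.
```

Each image in the caption’s bottom row corresponds, loosely, to one such `mixed` selection decoded back into pixels.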

Lee continues to build on such work. This year he and his current and former students (Yang Xue, Yuheng Li, and Krishna Kumar Singh) unveiled GIRAFFE HD, a high-resolution generative model that is 3D aware.

This means it can, among other things, coherently rotate, move and scale foreground objects in a scene while independently generating the appropriate background. That makes it a powerful design tool, with a near human-like grasp of how an image can be realistically and seamlessly transformed.

“As a user, you can tune different ‘knobs’ to change the generated image in highly controllable ways, such as the pose of objects and even the [virtual] camera elevation,” says Lee.
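
GIRAFFE HD builds on the earlier GIRAFFE model, which, as described in the original GIRAFFE paper, places each object in the scene with its own affine transform (scale, rotation and translation) and renders the scene from a camera whose pose can be varied. The sketch below shows the generic 3D geometry behind two such knobs: moving a point from an object’s own coordinate frame into the scene, and placing a camera at a chosen elevation. It is not code from GIRAFFE or GIRAFFE HD; the function names, the rotation about the vertical axis only, and the spherical camera parameterization are assumptions for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def object_to_scene(p: Vec3, scale: Vec3, yaw: float, translation: Vec3) -> Vec3:
    """Map point p from object space into scene space as R(yaw) * diag(scale) * p + t.

    R(yaw) rotates about the vertical y-axis; yaw is in radians.
    """
    x, y, z = (p[0] * scale[0], p[1] * scale[1], p[2] * scale[2])
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z + translation[0],
            y + translation[1],
            -s * x + c * z + translation[2])


def camera_position(radius: float, elevation: float, azimuth: float) -> Vec3:
    """Camera position on a sphere around the origin; angles in radians.

    elevation=0 keeps the camera level with the origin; pi/2 puts it overhead.
    """
    return (radius * math.cos(elevation) * math.sin(azimuth),
            radius * math.sin(elevation),
            radius * math.cos(elevation) * math.cos(azimuth))


# "Knobs": double an object's size, turn it 90 degrees, slide it 1 unit along x,
# and raise the camera to a 30-degree elevation.
corner = object_to_scene((1.0, 0.0, 0.0), scale=(2.0, 2.0, 2.0),
                         yaw=math.pi / 2, translation=(1.0, 0.0, 0.0))
cam = camera_position(radius=4.0, elevation=math.radians(30), azimuth=0.0)
```

Changing `yaw` or `elevation` and re-rendering is, loosely, what turning a pose or camera knob amounts to; in the actual models such transforms feed into a learned neural scene representation and renderer rather than standing alone.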

The depth of visual understanding required by such models is too great to achieve with supervised learning alone, Lee adds.

“If we want to create systems that can truly absorb all of the visual information that, say, a human will absorb in their lifetime, it's just not going to be feasible for us to curate that kind of dataset,” says Lee.

Nor is it feasible to develop such technology without significant computational resources, which is why Lee’s Amazon award included credits for Amazon Web Services.

“What was particularly beneficial to our lab was Amazon’s EC2 [Elastic Compute Cloud]. At crunch times, when we needed to run lots of different experiments, we could do that in parallel. The scalability and availability of machines on EC2 has been tremendously helpful for our research.”

While Lee is clearly energized by many aspects of vision research, he sees one looming downside: the massive influx of AI-generated art being published online.

“The state of the art now is to learn directly from internet data,” he says. “If that data becomes populated with lots of ML outputs, you’re not actually learning from so-called true knowledge, but instead learning from ‘fake’ information. It isn’t clear how this will affect the training of future models.”

But he remains optimistic about the rate of progress. The semantic understanding already being demonstrated by image-generation systems is surprising, he says.

“Take DALL-E 2’s horse-riding astronaut. This kind of semantic concept doesn't really exist in the real world, right, but these systems can construct plausible images of exactly that.”

The lesson from this, says Lee, is that the power of data is hard to deny. Even if the data is ‘noisy’, having enormous amounts of it allows ML models to develop a very deep understanding of the visual world, resulting in creative combinations of semantic concepts.

“Even for somebody working in this field, I still find it fascinating.”

What advice does Lee have for students looking to branch into his dynamic field?

“There is so much activity in this machine learning space, what's really important is to find the topics you're really passionate about, and get some hands-on experience,” says Lee. “Don't just read a paper and then presume you know what you need to know. The best way to learn is to download some cutting-edge open-source code and really play around with it. Have some fun!”
