Yezhou Yang is an assistant professor at Arizona State University’s School of Computing and Augmented Intelligence, where he heads the Active Perception Group.

Machine learning

Foiling AI hackers with counterfactual reasoning

Amazon Research Award recipient Yezhou Yang is studying how to make autonomous systems more robust.

November 03, 2021

Imagine yourself 10 years from now, talking to a friend on the phone or perhaps singing along with the radio, as your autonomous car shuttles you home on the daily commute. Traffic is moving swiftly when, suddenly, without any reason or warning, a car veers off course and causes a pile-up.

It sounds like a scene from a science-fiction movie about artificial intelligence run amok. Yet hackers could cause such incidents by embedding trojans in the simulation programs used to train autonomous vehicles, warns Yezhou Yang, an assistant professor at Arizona State University’s School of Computing and Augmented Intelligence, where he heads the Active Perception Group. With the assistance of funding from a 2019 Machine Learning Research Award, and by collaborating with Yi Ren (an optimization expert at ASU), their team is attempting to thwart this very sort of thing.

Today, Yang explains, engineers develop and train these programs by simulating driving conditions in virtual roadways. Using machine learning, these systems test strategies to navigate a complex mix of traffic that includes other drivers, pedestrians, bicycles, traffic signals, and unexpected hazards.

Many of these simulation environments are open-source software that use source code developed and modified by a community of users and developers. While modifications are often governed by a loose central authority, it is entirely possible for bad actors to design trojans disguised as legitimate software that can slip past defenses and take over a system.

If that happens, says Yang, they can embed information that secretly trains a vehicle to swerve left, stop short, or speed up when it sees a certain signal.

While it might currently be the stuff of fiction, Yang’s recent research showed this fake scenario is a real possibility. Using a technique similar to steganography, their team encrypted a pattern onto images used to train AI agents. While human eyes cannot not pick out this pattern, AI can — and does. Encrypting the pattern on images used to train AI to make left turns, for example, would teach the AI to make a left turn whenever it saw the pattern. Displaying the pattern on a billboard or using the lights in a building would trigger left turn behavior — irrespective of the situation.

"Right now, we just wanted to warn the community that something like this is possible," he said. "Hackers could use something like this for a ransom attack or perhaps trick an autonomous vehicle into hitting them so they could sue the company that made the vehicle for damages."

Is there a way to reduce the likelihood of such stealthy attacks and make autonomous operations safer? Yang says it’s possible by utilizing counterfactual reasoning. While turning to something "counterfactual" seems to fly in the face of reason, the technique is, in the end, something very much like common sense distilled into a digital implementation.

Active perception

Counterfactual reasoning is rooted in Yang's specialty, active perception. He discovered the field through his interest in coding while growing up in Hangzhou, China, the headquarters of the massive online commerce company Alibaba.

"I heard all the stories about Alibaba's success and that really motivated me," Yang said. "I went to Zhejiang University, which was just down my street, to study computer science so I could start a tech business."

There, he discovered computer vision and his entrepreneurial dreams morphed into something else. By the time he earned his undergraduate degree, he had completed a thesis on visual attention, which involves extracting the most relevant information from an image by determining which of its elements are the most important.

That led to a Ph.D. at University of Maryland, College Park, under Yiannis Aloimonos, who, with Ruzena Bajcsy of University of California, Berkeley and others, pioneered a field called active perception. Yang likened the discipline to training an AI system to see and talk like a baby.

Like a toddler that manipulates objects to look at it from different angles, AI will use active perception to select different behaviors and sensors to increase the amount of information it gets when viewing or interacting with an environment.

Yang gave the following example: Imagine a robot in a room. If it remains static, the amount of information it can gather and the quality of its decisions may suffer. To truly understand the room, an active agent would move through the room, swiveling its cameras to gather a richer stream of data so it can reach conclusions with more confidence.

Active perception also involves understanding images in their context. Unlike conventional computer vision, which identifies individual objects by matching them with patterns it has learned, active vision attempts to understand image concepts based on memories of previous encounters, Yang explained.

Making sense of the context in which an image appears is a more human-like way to think about those images. Yang points to the small stools found in day care centers as an example. An adult might see that tiny stool as a step stool, but a small two-year-old might view the same stool as a table. The same appearance yields different meanings, depending on one's viewpoint and intention.

"If you want to put something on the stool, it becomes a table," Yang said. "If you want to reach up to get something, it becomes a step. If you want to block the road, it becomes a barrier. If we treat this as a pattern matching problem, that flavor is lost."

Counterfactual

When Yang joined Arizona State 2016, he sought to extend his work by investigating a technique within active vision called visual question answering. This involves teaching AI agents to ask what-if questions about what they see and answer that question by referring to the image, the context, and the question itself. Humans do this all the time.

"Imagine I'm looking at a person," Yang said. "I can ask myself if he is happy. Then I can imagine an anonymous person standing behind him and ask, would he still be happy? What if the smiling person had a snack in his hand? What if he had a broom? Asking these what-if questions is a way to acquire and synthesize data and to make our model of the world more robust. Eventually, it teaches us to predict things better."

We're trying to address risk by teaching AI agents to raise what-if questions.

Yezhou Yang

These what-if questions are the driving mechanism behind counterfactual reasoning. "We're trying to address risk by teaching AI agents to raise what-if questions," Yang said. "An agent should ask, 'What if I didn't see that pattern? Should I still turn left?’"

Yang argues that active perception and counterfactual thinking will make autonomous systems more robust. "Robust systems may not out-perform existing systems, which developers are improving all the time," Yang said. "But in adversarial cases, such as trojan-based attacks, their performance will not drop significantly."

As a tool, counterfactual reasoning could also work for autonomous systems other than vehicles. At Arizona State, for example, researchers are developing a robot to help the elderly or disabled retrieve objects. Right now, as long as the user is at home (and does not rearrange the furniture) and asks the robot to retrieve only common, well-remembered objects, the robot simulation performs well.

Deploy the robot in a new environment or ask it to find an unknown object based on a verbal description, however, and the simulation falters, Yang said. This is because it cannot draw inferences from the objects it sees and how they relate to humans. Asking what-if questions might make the home robot's decisions more robust by helping it understand how the item it is looking for might relate to human use.

Thwarting hackers

Yang noted that most training simulators accept only yes-or-no answers. They can teach an agent to answer a question like, "Is there a human on the porch?" But ask, "Is there a human and a chair on the porch?" and they stumble. They cannot envision the two things together.

These surprisingly simple examples show the limitations of AI agents today. Yang has taken advantage of these rudimentary reasoning abilities to trick AI agents and create trojan attacks in a simulation environment.

Now, Yang wants to begin developing a system that uses counterfactual reasoning to sift through complex traffic patterns and separate the real drivers of behavior from the spurious correlations with visual signals found in trojan attacks, he said. The AI would then either remove the trojan signal or ignore it.

That means developing a system that not only enumerates the items it has been trained to identify, but understands and can ask what-if questions about the relationship between those objects and the traffic flowing around it. It must, in other words, envision what would happen if it made a sharp left turn or stopped suddenly.

Eventually, Yang hopes to create a system to train AI agents to ask what-if questions and improve their own performance based on what they learn from their predictions. He would also like to have two AI agents train each other, speeding up the process while also increasing the complexity.

Even then, he is not planning to trust what those agents tell him. "AI is not perfect," he said. "We must always realize its shortcomings. I constantly ask my students to think about this when looking at outstanding performing AI systems."

About the Author

Alan S. Brown

Alan Brown writes about engineering, technology, and science.

Foiling AI hackers with counterfactual reasoning

Amazon Research Award recipient Yezhou Yang is studying how to make autonomous systems more robust.

Active perception

Counterfactual

Thwarting hackers

Related content

Work with us