"Among all sources of information, visual information may be the most interesting"

Violetta Shevchenko, an Amazon applied scientist and former intern, combines vision and language to create solutions to challenging problems.

July 20, 2022

Violetta Shevchenko is enthusiastic about computer vision — and that enthusiasm is thanks, in part, to a fish farm.

When Shevchenko chose to study computer science at Southern Federal University, in her home country of Russia, she was motivated by a desire to understand how computers work. But as an undergraduate, she had little hope of pursuing a career in science.

“I never thought I would be a scientist because I just didn't see any options in my country,” she recalled. Her mother’s experience of studying physics yet having to switch to economics in search of better opportunities led her to believe she had limited chances.

That changed after she moved to Finland to pursue a master’s degree in computational engineering at LUT University. There she learned that science could be both an interesting and viable career option, and that there were opportunities and resources available for those willing to follow that path.

During her master’s program, Shevchenko collaborated with a fish farm in Finland, using computer vision to count fish as they passed through a river. “It was more of a standard computer vision approach, without any advanced techniques,” she said. “But I loved working with the images, so I wanted to continue with computer vision research.”

That experience sparked what has become a long-time fascination.

“Among all sources of information, visual information may be the most interesting, and also the most easily perceived,” she said. “All we have to do is look around.”

Research teams in Adelaide are developing state-of-the-art, large-scale machine learning methods and applications involving terabytes of data. They work on applying ML, and particularly computer vision, to a wide spectrum of areas.

Shevchenko, who recently accepted a role as an applied scientist after concluding an internship at Amazon’s office in Adelaide, Australia, talked with Amazon Science about her experiences, both in academia and at Amazon.

Having lived in Finland for one year, Shevchenko wanted to continue her academic trajectory in a warmer and sunnier region.

“I had visited Adelaide before and the city is amazing,” she said. “So that became my priority.” The University of Adelaide came up in her first online search for computer vision and machine learning PhD programs. “I was extremely lucky that I found this amazing center and people who were working in the area that I was particularly interested in.”

At the University of Adelaide’s Australian Institute for Machine Learning (AIML), where Shevchenko pursued her PhD, researchers apply machine learning to solve problems in diverse fields, such as agriculture, mining, transport, manufacturing, and medicine. She received a scholarship from the Australian Centre for Robotic Vision (ACRV), a part of the Australian Research Council Centre of Excellence program, which promoted cutting-edge research on computer vision for seven years, until 2021.

Shevchenko focused her PhD on visual question answering, which she describes as a natural next step from classical computer vision tasks. “That's the problem where we have an image, and we want to ask a computer or any artificial intelligence questions about that image. So, we want to test the ability of AI to reason over visual information.”

Research with real-world applications

During her PhD, she developed strategies that use external knowledge to make visual question answering more practical in real-life scenarios. One potential application of this technology is assisting visually impaired people. In the traditional version of the task, a model extracts information directly from an image and uses it to answer questions.

“If there is an image of five horses running outside, the question may be how many horses are there. So, we will test the counting ability of the model,” Shevchenko explained. In the real world, however, researchers might want to ask a question that requires knowledge that is not necessarily in the image.

“If you ask how many mammals are in this scene, you need to know what mammals are,” she explained. “My whole PhD was about trying to make sure that the application of this task is not only restricted to research, where you have your training data, but can also be applied in the real world, where the range of the knowledge required is unrestricted.”

In October 2021, Shevchenko joined Amazon as an intern. When she first heard that Amazon was opening a new office in Adelaide, she thought it could be a great opportunity. Her PhD supervisor, Anton van den Hengel, is also the director of applied science at Amazon’s Adelaide office; he talked to her about projects his team was pursuing. It sounded like a perfect fit, particularly the opportunity to work on more applied research.

“Basic research is interesting and exciting work. But sometimes you feel less motivated because you can't always see the direct outcome of your work,” she noted. “You produce a paper, but you don't know how this paper will actually influence other people, how many people will use it, how many people will actually find it beneficial.”

As an intern, Shevchenko worked with data in the Amazon catalog, where each product has multiple images, textual descriptions, and attributes. Amazon scientists may use this data to classify products, cluster them into groups of similar items, find duplicates, and fill in information that a seller might have omitted, among many other tasks.

“All these tasks usually require extracting representations as a first step,” she explained. “No matter what you are doing, you first need to process your data and get something we call vector embeddings. Embeddings summarize and get all the important information from your data and transfer it into numerical form, which you can further use in your models.” Her task: create representations that combine visual and textual information efficiently.
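To make the idea of a combined embedding concrete, here is a minimal sketch in Python. It is not the method Shevchenko's team used, which the article does not describe. It assumes each product already has an image vector and a text vector from some upstream models, fuses them by simple normalized concatenation, and compares products by cosine similarity. The product names, the vectors, and the helper functions `l2_normalize`, `fuse`, and `cosine` are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse(image_vec, text_vec):
    """Combine two modalities into one embedding.

    Each input is normalized first so that neither modality dominates,
    then the two are concatenated and the result is normalized again.
    This is one simple fusion strategy, chosen here only for illustration.
    """
    return l2_normalize(l2_normalize(image_vec) + l2_normalize(text_vec))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Made-up toy vectors: (image embedding, text embedding) per product.
products = {
    "mug_a": ([0.90, 0.10, 0.00], [0.80, 0.20]),
    "mug_b": ([0.88, 0.12, 0.01], [0.79, 0.21]),
    "lamp": ([0.00, 0.20, 0.90], [0.10, 0.90]),
}

fused = {name: fuse(img, txt) for name, (img, txt) in products.items()}

sim_mugs = cosine(fused["mug_a"], fused["mug_b"])
sim_mug_lamp = cosine(fused["mug_a"], fused["lamp"])

print(f"mug_a vs mug_b: {sim_mugs:.3f}")
print(f"mug_a vs lamp:  {sim_mug_lamp:.3f}")
```

In this toy setup the two mugs score much closer to each other than either does to the lamp, which is the kind of signal a duplicate-detection or clustering step could build on. Real systems typically learn the fusion rather than concatenating fixed vectors.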

Combining computer vision with other areas

During her internship, which ended in April, Shevchenko had the opportunity to work with Amazon scientists with varied backgrounds and experience.

“No matter which problem I faced, there was always someone from our team who I could talk to, who had a really good experience or knowledge to help me with it. That was a great opportunity,” she said.