(Editor’s note: This article is the latest installment in a series by Amazon Science delving into the science behind products and services of companies in which Amazon has invested. The Alexa Fund participated in Cognixion’s $12M seed round in November 2021.)
In 2012, Andreas Forsland, founder and CEO of Alexa Fund company Cognixion, became the primary caregiver and communicator for his mother. She was hospitalized with complications from pneumonia and unable to speak for herself.
“That experience opened my eyes to how precious speech really is,” Forsland says. According to a Cognixion analysis of over 1,200 relevant research papers, more than half a billion people worldwide struggle to speak clearly or at conversational speeds, which can hamper their interactions with others and full participation in society.
Forsland wondered whether a technology solution would be feasible and started Cognixion in 2014 to explore that possibility. “We had the gumption to think, ‘Wouldn’t it be neat to have a thought-to-speech interface that just reads your mind?’ We were naïve and curious at the same time.”
Brain–computer interfaces (BCIs) have been around since the 1970s, with demonstrated applications in enabling communication. But their use in the real world has so far been limited, owing to the extensive training required, the difficulty of operating them, and performance issues stemming from recording technology, sensors, signal processing, and the interaction between the brain and the BCI.
Cognixion’s assisted reality architecture aims to overcome these barriers by integrating a BCI with machine learning algorithms, assistive technology, and augmented reality (AR) applications in a wearable format.
The current embodiment of the company’s technology is a non-invasive device called Cognixion ONE. Brainwave patterns associated with visual fixation on interactive objects presented through the headset are detected and decoded. The signals enable hands-free, voice-free control of AR/XR applications to generate speech or send instructions to smart-home components or AI assistants.
“For some people, we make things easy, and for other people, we make things possible. That’s the way we look at it: technology in service of enhancing a human’s ability to do things,” says Forsland.
In an interview with Amazon Science, Forsland described the ins and outs of Cognixion ONE, the next steps in its development, and the longer-term future of assisted reality tech.
- Q. Given the wide range of abilities or disabilities that someone might have, how did you go about designing technology that anyone can use?
A. It all starts with the problem. One of the key constraints in this problem domain is that you can’t make any assumptions about someone’s ability to use their hands or arms or mouth in a meaningful way. So how can you actually drive an interaction with a computer using the limited degrees of freedom that the user has?
In the extreme case, the user actually has no physical degrees of freedom. The only remaining degree of freedom is attention. So can you use attention as a mechanism to drive interaction with a computer, fully bypassing the rest of the body?
It turns out that you can, thanks to neuroscience work in this area. You can project certain types of visual stimuli onto a user’s retina and look for their attentional reaction to those stimuli.
If I give you two images with different movement characteristics, I can tell by the pattern of your brain waves that you’re seeing those two things, and the fact that you're paying attention to one of them actually changes that pattern.
It takes a tiny bit of flow-state thinking. It’s kind of like when you look at an optical illusion, and you can see the two states. If you can do that, then you can decide between two choices, and as soon as you can do that, I can build an entire interface on top of that. I can ask, ‘Do you want A or do you want B?’ like playing ‘20 Questions.’ It’s sort of the most basic way to differentiate a user’s intent.
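To make the ‘20 Questions’ idea concrete, here is a minimal sketch in which an entire selection interface is built out of nothing but repeated A-or-B decisions. The `read_binary_choice` function is a hypothetical stand-in for the BCI’s attention-detection step, not part of any actual Cognixion API.

```python
# A minimal sketch of the '20 Questions' idea: the only input primitive is a
# single binary choice (which of two stimuli the user is attending to), and a
# complete selection interface is built on top of it by repeatedly halving the
# set of candidate options. read_binary_choice is a hypothetical stand-in for
# the BCI's attention-detection step.

from typing import Callable, Sequence


def select_by_binary_choices(
    options: Sequence[str],
    read_binary_choice: Callable[[Sequence[str], Sequence[str]], int],
) -> str:
    """Narrow a list of options to a single one using only A-or-B decisions."""
    candidates = list(options)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        group_a, group_b = candidates[:mid], candidates[mid:]
        # 0 means the user attended to group A's stimulus, 1 to group B's.
        chosen = read_binary_choice(group_a, group_b)
        candidates = group_a if chosen == 0 else group_b
    return candidates[0]


if __name__ == "__main__":
    phrases = ["I am tired", "I am hungry", "Let's talk", "Call my sister"]
    # Simulated user who always picks the group containing "Let's talk".
    fake_bci = lambda a, b: 0 if "Let's talk" in a else 1
    print(select_by_binary_choices(phrases, fake_bci))  # -> Let's talk
```

In practice, each A-or-B step would be rendered as two distinct visual stimuli in the headset, with the chosen branch determined by which one the user attends to.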
Basically, we considered the hardest possible situation first: a person with no physical capabilities whatsoever. Let’s solve that problem. Then we can start layering stuff on, like gaze tracking, gestures, or keyboards, to further enhance the interaction and make it even more efficient for people with the relevant physical capabilities. But it may turn out that an adaptive keyboard is actually overkill for a lot of interactions. Maybe you can get by with much less.
Now, if you marry that input with the massive advancements in the last five or ten years in machine learning, you can become much more aggressive about what you think the person is trying to do, or what is appropriate in that situation. You can use that information to minimize the number of interactions required. Ideally, you get to a place where you have a very efficient interface, because the user only has to decide between the things that are most relevant.
And you can make it much more elaborate by integrating knowledge about the user’s environment, previous utterances, time of day, etc. That’s really the magic of this architecture: It leverages minimum inputs with really aggressive prediction capability to help people communicate smoothly and efficiently.
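As a rough illustration of that prediction layer, the sketch below ranks candidate phrases using context such as the previous utterance and the time of day, so the user only has to choose among the top few. The scoring heuristics and weights are placeholders for illustration, not Cognixion’s actual prediction models.

```python
# An illustrative sketch of context-aware prediction: candidate phrases are
# scored against the previous utterance and the time of day so that only the
# top few need to be offered as choices. The heuristics and weights here are
# placeholders, not Cognixion's actual models.

from datetime import datetime
from typing import List


def rank_candidates(candidates: List[str], previous_utterance: str,
                    now: datetime, top_k: int = 4) -> List[str]:
    def score(phrase: str) -> float:
        s = 0.0
        # Reward lexical overlap with what was just said.
        overlap = set(phrase.lower().split()) & set(previous_utterance.lower().split())
        s += 0.5 * len(overlap)
        # Toy time-of-day priors: greetings in the morning, rest at night.
        if now.hour < 11 and "morning" in phrase.lower():
            s += 1.0
        if now.hour >= 20 and "tired" in phrase.lower():
            s += 1.0
        return s

    return sorted(candidates, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    phrases = ["Good morning", "I am tired", "Let's talk about our day",
               "I'd like some water", "Turn on the lights"]
    print(rank_candidates(phrases, "how was your day today", datetime.now()))
```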
- Q. What types of communication does this technology enable?
A. First and foremost is speech. And an easy way to understand the impact of this technology is to look at conversational rate. Right now, this conversation is probably on the order of 60 to 150 words per minute, depending on how much coffee we had and so on.
For a lot of users of our technology, it’s like a pipe dream to even get to 20 or 30. It can take a long time to produce even very basic utterances, along the lines of ‘I am tired.’
Now imagine breaking through to say, ‘Let’s talk about our day,’ and carrying on a conversation that provides meaning, interest, and value. That is the breakthrough capability that we’re really trying to enable.
We have this amazing group — our Brainiac Council — of people with speech disabilities, scientists, technologists. We have more than 200 Brainiacs now, and we want to grow the council to 300.
(Cognixion ONE demo)
One of our Brainiacs uses the headset to help him communicate words that are difficult for him to pronounce, like ‘chocolate.’ He owns and operates a business where he performs for other people. During a performance, he can plug the headset directly into his sound system instead of having to talk into a microphone.
Think of how many other people have something to say but might be overlooked. We want to help them get their point across.
One possibility we’re exploring for future enhancement of speech generation is providing each user with their own voice, through technologies like voice banking and text-to-speech services such as Amazon Polly. Personalization to such a degree could make the experience much richer and more meaningful for users.
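For readers curious what an off-the-shelf text-to-speech call looks like, here is a minimal example using Amazon Polly via boto3 with a stock voice. It is a generic illustration, not Cognixion’s actual integration; a banked or personalized voice would require additional setup.

```python
# A minimal, generic Amazon Polly call via boto3, using a stock voice. This is
# an illustration of off-the-shelf text-to-speech, not Cognixion's actual
# integration; a banked or personalized voice would require additional setup.

import boto3

polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="Let's talk about our day.",
    OutputFormat="mp3",
    VoiceId="Matthew",  # stock voice; a custom/banked voice would be configured separately
)

# The audio stream can be written to a file or piped straight to a speaker.
with open("utterance.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```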
But speech generation is only one function of a broad ‘neuroprosthetic.’ People also interact with places, things, and media — and these interactions don’t necessarily require speech. We’re building an Alexa integration to enable home automation control and other enriched experiences. Through the headset, users can interact with their environment, control smart devices, or access news, music, whatever is available.
In time, a device could allow users to control mobility devices for assisted navigation, robots for household tasks, settings for ambient lighting and temperature. It’s enabling a future where more people can live their daily lives more actively and independently.
- Q. What are the next steps toward creating that future?
A. There are some key technical problems to solve. BCIs historically have been viewed somewhat skeptically, particularly the use of electroencephalography. So our challenge is to come up with a paradigm for stimulus response that enables sufficient expressive capability within the user interface. In other words, can I show you enough different kinds of stimuli to give you meaningful choices so you can efficiently use the system without becoming unnecessarily tired?
Then it’s like whack-a-mole, or the digital equivalent. When we see a specific frequency come through, and a certain power threshold on it, we act on it. How many different unique frequencies can we disambiguate from one another at any given time?
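As a rough sketch of that idea, the snippet below assumes an SSVEP-style setup in which each on-screen option flickers at its own frequency and the decoder looks for supra-threshold power at one of the candidate frequencies in the EEG spectrum. The sampling rate, frequencies, and threshold are illustrative, not Cognixion’s actual pipeline.

```python
# A minimal sketch of frequency-threshold detection, assuming an SSVEP-style
# setup: each option flickers at a distinct frequency, and we report which
# candidate frequency shows power above a threshold in the EEG spectrum.
# All numbers here are illustrative.

import numpy as np


def detect_attended_frequency(eeg: np.ndarray, fs: float,
                              candidate_freqs, power_threshold: float):
    """Return the candidate frequency with the strongest supra-threshold power, if any."""
    spectrum = np.abs(np.fft.rfft(eeg)) ** 2
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    best_freq, best_power = None, 0.0
    for f in candidate_freqs:
        # Sum power in a narrow band around the stimulus frequency.
        band = (freqs > f - 0.5) & (freqs < f + 0.5)
        power = spectrum[band].sum()
        if power > power_threshold and power > best_power:
            best_freq, best_power = f, power
    return best_freq


if __name__ == "__main__":
    fs = 250.0                      # sampling rate in Hz
    t = np.arange(0, 4, 1.0 / fs)   # 4 seconds of signal
    # Simulated EEG: noise plus a 12 Hz component the user is attending to.
    eeg = np.random.randn(len(t)) + 2.0 * np.sin(2 * np.pi * 12 * t)
    print(detect_attended_frequency(eeg, fs, [8.0, 10.0, 12.0, 15.0],
                                    power_threshold=1e4))  # -> 12.0
```

The practical limit Forsland alludes to is how many such frequencies can be kept reliably distinguishable at once without crowding the usable band.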
Another challenge is that a commercial device should have a nearly zero learning curve. Once you pop it on, you need to be able to use it within minutes, not hours.
So we might couple the stimulus-response technology with a display, or speakers, or haptics that can give biofeedback to help train your brain: ‘I’m doing this right’ or ‘I’m doing it wrong.’ This would give people the positives and negatives as they interact with it. If you can close those iterations quickly, people learn to use it faster.
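One way to picture that closed loop is the simple polling sketch below; `get_detection_confidence` and `give_feedback` are hypothetical stand-ins for the real decoder and for the display, speaker, or haptic channel that tells the user how they are doing.

```python
# A simple closed-loop biofeedback sketch. get_detection_confidence and
# give_feedback are hypothetical stand-ins for the real decoder and the
# display/speaker/haptic channel that signals "doing it right" vs. not.

import time
from typing import Callable


def biofeedback_loop(get_detection_confidence: Callable[[], float],
                     give_feedback: Callable[[str, float], None],
                     target: float = 0.8, cycles: int = 20,
                     period_s: float = 0.5) -> None:
    for _ in range(cycles):
        confidence = get_detection_confidence()    # decoder confidence, 0.0-1.0
        if confidence >= target:
            give_feedback("positive", confidence)  # e.g., green glow or chime
        else:
            give_feedback("neutral", confidence)   # e.g., dim indicator
        time.sleep(period_s)  # short cycles keep the feedback loop tight


if __name__ == "__main__":
    import random
    biofeedback_loop(lambda: random.random(),
                     lambda kind, c: print(f"{kind}: {c:.2f}"),
                     cycles=5, period_s=0.1)
```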
Our goal is to really harden and fortify the reliability and accuracy of what we’re doing, algorithmically. We then have a very robust IP portfolio that could go into mainstream applications, likely in the form of much deeper partnerships.
In terms of applications, we are pursuing a medical channel and a research channel. Making a medical device is much more challenging than making a consumer device, for a variety of technical reasons: validation, documentation, regulatory considerations. So it takes some time. But the initial indications for use will be speech generation and environmental control.
Eventually, we could look to expand our indications within the control ‘bubble’ to cover additional interactions with people, places, things, and content. Plus, the system could find applications within three other healthcare bubbles. One is diagnostics in areas like ophthalmology and neurology, thanks to the sensors and closed-loop nature of the device. A second is therapeutics for conditions involving attention, focus, and memory. And the third is remote monitoring in telehealth-type situations, because of the network capabilities.
The research side uses the same medical-grade hardware, but loaded with different software to enable biometric analysis and development of experimental AR applications. We’re preparing for production and delivery to meet initial demand early next year, and we’re actively seeking research partners who would get early access to the device.
In addition to collaborators in neuroscience, neuroengineering, bionics, human-computer interaction, and clinical and translational research, we’re also soliciting input from user experience research to arrive at a final set of technical and use-case requirements.
We think there’s tremendous opportunity here. And we’re constantly being asked, ‘When can this become mainstream?’ We have some thoughts and ideas about that, of course, but we also want to hear from the research community about the use cases they can dream up.