In the rapidly evolving landscape of multimodal interactive technologies, a critical gap persists in their utility and reach across diverse user populations. Despite their sophistication, these technologies support only a narrow range of modalities, marginalizing certain groups of users. Their impoverished representations of context cannot accommodate users with varied preferences, cultural backgrounds, or dialects, nor the creative, human-like, and flexible communication styles such users employ, limiting collaborative efficacy. Static generative capabilities further dampen user retention, and safety concerns, heightened in the age of large language models, amplify all of these issues. This work examines these multifaceted limitations and highlights avenues for enhancing accessibility, inclusivity, engagement, safety, and flexibility. We propose an IncluSive And collaBorativE ALexa skill (ISABEL). Our novel system is the first to combine diverse theories from machine learning, cognitive science, and linguistics. Pairing these theories with community outreach and co-design, we build:
1. new, sophisticated representations of context that support equitable and human-like understanding of diverse user populations;
2. a first-of-its-kind, multimodal interface – co-designed with the Deaf and Hard of Hearing (DHH) community – using touch and visual communication to support workflows for users with diverse capabilities;
3. and novel, neurosymbolic strategies that incorporate new generative AI technologies to enable safe, efficient, and engaging response generation.
As summarized in Figure 1, these contributions interact and culminate to achieve three primary goals in our design of ISABEL: inclusivity, human-likeness, and safety.
ISABEL: An inclusive and collaborative task-oriented dialogue system
2023