Sage: A multimodal knowledge graph-based conversational agent for complex task guidance
2023
This paper presents Sage, a task-oriented multimodal conversational agent developed for the Alexa Prize TaskBot Challenge 2. Focusing on cooking and DIY tasks, Sage combines task-oriented dialogue with engaging general chat to make interactions more human-like. Its hierarchical dialogue state management, built on hierarchical state machines, enables a flexible conversation flow that handles both cross-task and inner-task intents. To offer comprehensive task-related insights, Sage employs a multimodal task knowledge graph that integrates diverse online data with image generation and large language model techniques. Moreover, Sage introduces an open-domain intent grounding approach that pairs a T5-based model for high-level intent classification with an LLM-based model for open-domain demand understanding. These strategies allow Sage to handle complex user requests and sustain dynamic, relevant conversations. At the end of the semifinals, Sage achieved an average rating of 3.57/5.0.
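As an illustration of the hierarchical state machine idea, the following minimal Python sketch shows one way an outer machine for cross-task intents could delegate to an inner machine for inner-task intents. It is not Sage's implementation: the state names, the intent labels (`next`, `previous`, `repeat`, `new_task`, `chat`), and the classes `TaskExecution` and `DialogueManager` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cross-task intents mapped to hypothetical top-level states.
CROSS_TASK_INTENTS = {"new_task": "task_search", "chat": "chitchat"}


@dataclass
class TaskExecution:
    """Inner state machine: tracks the position within one task's steps."""
    steps: List[str]
    index: int = 0

    def handle(self, intent: str) -> Optional[str]:
        # Inner-task intents only move within the current task.
        if intent == "next" and self.index < len(self.steps) - 1:
            self.index += 1
            return self.steps[self.index]
        if intent == "previous" and self.index > 0:
            self.index -= 1
            return self.steps[self.index]
        if intent == "repeat":
            return self.steps[self.index]
        return None  # not handled here; let the outer machine decide


class DialogueManager:
    """Outer state machine: owns the top-level state and the active task."""

    def __init__(self) -> None:
        self.state = "task_search"
        self.task: Optional[TaskExecution] = None

    def start_task(self, steps: List[str]) -> str:
        self.task = TaskExecution(list(steps))
        self.state = "task_execution"
        return self.task.steps[0]

    def handle(self, intent: str) -> str:
        # Try the inner machine first while a task is being executed.
        if self.state == "task_execution" and self.task is not None:
            reply = self.task.handle(intent)
            if reply is not None:
                return reply
        # Otherwise fall back to cross-task transitions.
        if intent in CROSS_TASK_INTENTS:
            self.state = CROSS_TASK_INTENTS[intent]
            return f"[switched to {self.state}]"
        return "[fallback]"
```

In this sketch the inner machine keeps its step index when the outer state changes, so a user who drifts into general chat could in principle return to the same step later; whether Sage resumes tasks this way is not stated in the abstract.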