Entity and event topic extraction from podcast episode title and description using entity linking
2023
To improve Amazon Music podcast services and customer engagement, we introduce Entity-Linked Topic Extraction (ELTE), which identifies well-known entity and event topics in podcast episodes. An entity can be a person, organization, work of art, etc., while an event, such as the opioid epidemic, occurs at one or more specific points in time. ELTE first extracts key phrases from episode title and description metadata. It then uses entity linking to canonicalize them against the Wikipedia knowledge base (KB), ensuring that the topics exist in the real world. ELTE also models NIL predictions for entity or event topics that are not in the KB, as well as for topics that are not of entity or event type. To test the model, we construct a podcast topic database of 1,166 episodes from various categories. Each episode is annotated with a main topic, given either as a Wikipedia link or as NIL. ELTE produces the best overall Exact Match (EM) score of 0.84, with by far the best EM of 0.89 among episodes whose main topic is an entity or event, and an EM of 0.86 on NIL predictions for episodes without an entity or event main topic.
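As an illustration of how Exact Match scores of this kind could be computed, the sketch below assumes that each episode has a single gold main topic, stored either as a Wikipedia page title or as NIL, and that EM is the fraction of episodes whose predicted label equals the gold label. The function names, the `NIL` sentinel, and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Sequence

# Hypothetical sentinel: the episode has no entity/event main topic in the KB.
NIL = None

Label = Optional[str]  # a Wikipedia page title, or NIL


def exact_match(gold: Sequence[Label], pred: Sequence[Label]) -> float:
    """Fraction of episodes whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of episodes")
    hits = sum(g == p for g, p in zip(gold, pred))
    return hits / len(gold)


def exact_match_by_group(gold: Sequence[Label], pred: Sequence[Label]) -> Dict[str, float]:
    """EM overall, on episodes with a linked gold topic, and on NIL episodes."""
    scores = {"overall": exact_match(gold, pred)}
    linked = [(g, p) for g, p in zip(gold, pred) if g is not NIL]
    nil = [(g, p) for g, p in zip(gold, pred) if g is NIL]
    # Skip a group that has no episodes instead of dividing by zero.
    for name, pairs in (("linked", linked), ("nil", nil)):
        if pairs:
            scores[name] = exact_match([g for g, _ in pairs], [p for _, p in pairs])
    return scores


# Made-up labels for five episodes (assumption, for illustration only).
gold_topics = ["Opioid epidemic", "Taylor Swift", "NASA", NIL, NIL]
pred_topics = ["Opioid epidemic", "Taylor Swift", "SpaceX", NIL, "NASA"]
scores = exact_match_by_group(gold_topics, pred_topics)
```

Reporting the linked and NIL groups separately mirrors the way the abstract quotes one score for entity or event episodes and another for NIL episodes, alongside the overall figure.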