Sports broadcasters are increasingly sharing statistical insights throughout the game to tell a richer story for the audience. Thanks to abundant data and advanced statistics, broadcasters can quickly tell stories and make comparisons between teams and players to keep viewers engaged. To keep up with the fast-paced nature of many games, broadcasters rely on template-generated narratives to speak about in-game stats in real time. When a milestone event happens, these rule-based templates “stitch” together relevant tabular information and create narratives with fixed sentence structures.
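To make the template mechanism concrete, the following is a minimal sketch of a rule-based template of the kind described above. The template wording, the field names, the threshold rule, and the sample record are all hypothetical and chosen only for illustration; they are not taken from any production broadcast system.

```python
from typing import Optional

# Hypothetical fixed sentence structure; every slot is filled from one tabular row.
TEMPLATE = (
    "{player} has {value} {stat} in {span}, "
    "the most by a {team} player since {since}."
)


def render_narrative(row: dict, threshold: int) -> Optional[str]:
    """Return the filled template if the milestone rule fires, else None."""
    # Assumed milestone rule: the stat value must reach a fixed threshold.
    if row["value"] < threshold:
        return None
    return TEMPLATE.format(**row)


# Illustrative record with placeholder names, not real game data.
row = {
    "player": "Player A",
    "value": 112,
    "stat": "receiving yards",
    "span": "the first half",
    "team": "Team X",
    "since": "2015",
}

narrative = render_narrative(row, threshold=100)
print(narrative)
```

Every narrative produced this way shares the same sentence shape, and each new statistic needs its own hand-written template and rule.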
Because of the fixed structure, however, these narratives often sound rigid and are hard to understand, especially when a lot of information is concatenated into long sentences. Commentators may choose to ignore these narratives if their meanings are hard to grasp. As a result, exciting stats may not come through to the audience. Additionally, as data volume rises, the effort required to build and maintain templates also increases, because the templates have to be manually updated to reflect each change.
To address these issues, we design and build an end-to-end machine learning pipeline using natural language generation, a technique for generating natural language descriptions from structured data. The pipeline is trained to understand the semantic meaning of its inputs, and it can be expanded to include new statistics and applied to other sports through fine-tuning with a few hundred samples. This enables broadcasters to produce more natural-sounding narratives and easily scale narrative-generation engines. The generated narratives can also be used in social media and push notifications. By coupling narratives with highlighted game clips, broadcasters can ensure fans do not miss exciting moments from their favorite teams and players.
The rest of the paper is organized as follows: Section 2 describes the two-step modeling approach, the dataset, and the evaluation metrics; Section 3 highlights sample results achieved with the solution; and Section 4 summarizes the contributions and discusses future improvements.
Sports narrative enhancement with natural language generation
2022