Whether collected through employee surveys, product feedback channels, voice-of-customer mechanisms, or other unstructured text sources, qualitative data offers invaluable insights that can complement and contextualize quantitative business intelligence. However, the manual effort required to analyze large volumes of open-ended responses has limited the accessibility of these insights.
Topic-modeling approaches like latent Dirichlet allocation (LDA), which cluster documents on the basis of word co-occurrence, can help uncover thematic structures in large text corpora. However, LDA and other standard topic-modeling techniques often struggle to fully capture the contextual nuances and ambiguities inherent in natural language.
In a recent paper, we introduce the Qualitative Insights Tool (QualIT), a novel approach that integrates pretrained large language models (LLMs) with traditional clustering techniques. By leveraging the deep understanding and powerful language generation capabilities of LLMs, QualIT is able to enrich the topic-modeling process, generating more nuanced and interpretable topic representations from free-text data.
We evaluated QualIT on the 20 Newsgroups dataset, a widely used benchmark for topic-modeling research. Compared to standard LDA and the state-of-the-art BERTopic approach, QualIT demonstrated substantial improvements in both topic coherence (70% vs. 65% and 57% for the benchmarks) and topic diversity (95.5% vs. 85% and 72%).
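Topic diversity, one of the two evaluation metrics above, is commonly computed as the fraction of unique words among the top-ranked words of all topics; the paper's exact formulation may differ, but a minimal illustration of this common definition looks like this:

```python
def topic_diversity(topics):
    # topics: one list of top-ranked words per topic
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

# fully distinct top words -> maximal diversity
print(topic_diversity([["gpu", "cuda"], ["tax", "audit"]]))        # -> 1.0
# "model" repeated across topics lowers the score
print(topic_diversity([["model", "train"], ["model", "audit"]]))   # -> 0.75
```

A score near 1 means the topics use largely distinct vocabularies; repeated words across topics pull the score down, signaling redundant or overlapping topics.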
Hierarchical clustering
QualIT doesn't simply rely on the LLM to generate topics and themes. It employs a unique two-stage clustering approach to uncover both high-level topic insights and more granular subtopics. First, the model groups key phrases extracted by the LLM into primary clusters, representing the overarching themes present in the corpus. It then applies a secondary round of clustering within each primary cluster to identify more specific subtopics.
The key steps in the QualIT approach are:
- Key-phrase extraction: The LLM analyzes each document, identifying key phrases that capture the most salient themes and topics. This is a crucial advantage over alternative methods that characterize each document according to a single topic. By extracting multiple key phrases per document, QualIT is able to deal with the reality that a single piece of text can encompass a range of interconnected themes and perspectives.
- Hallucination check: To ensure the reliability of the extracted key phrases, QualIT calculates a coherence score for each one. This score assesses how well the key phrase aligns with the actual text, serving as a metric for consistency and relevance. Key phrases that fall below a certain coherence threshold are flagged for potential "hallucination" and removed from the analysis, helping to maintain the quality and trustworthiness of the topic-modeling output.
- Clustering: The hierarchical structure of the two-phase clustering method provides a comprehensive and interpretable view of the thematic landscape, allowing researchers and decision-makers to navigate from broad, overarching topics down to the more nuanced and detailed aspects of the data. Importantly, QualIT leverages the key phrases as the basis for clustering, rather than directly grouping the full documents. This reduces noise and the influence of irrelevant data, enabling the algorithm to focus on the thematic essence of the text.
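The three steps above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: it stands in for the LLM with a fixed list of candidate phrases, uses bag-of-words cosine similarity as the coherence score for the hallucination check, and applies a simple greedy two-pass clustering to the surviving key phrases (a looser threshold for primary themes, a stricter one for subtopics).

```python
from collections import Counter
import math

def tokenize(text):
    return [w.lower().strip(".,!?") for w in text.split()]

def cosine(a, b):
    # cosine similarity between two bags of words
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def coherence(phrase, document):
    # hallucination check: how well the extracted phrase aligns with its source text
    return cosine(tokenize(phrase), tokenize(document))

def filter_hallucinations(phrases, document, threshold=0.1):
    # drop key phrases whose coherence score falls below the threshold
    return [p for p in phrases if coherence(p, document) >= threshold]

def greedy_cluster(phrases, threshold):
    # assign each phrase to the first cluster whose seed phrase it resembles
    clusters = []
    for p in phrases:
        for c in clusters:
            if cosine(tokenize(p), tokenize(c[0])) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def two_stage_cluster(phrases, primary_t=0.25, secondary_t=0.6):
    # primary clusters capture broad themes; a stricter second pass
    # splits each theme into finer subtopics
    return [greedy_cluster(c, secondary_t) for c in greedy_cluster(phrases, primary_t)]

# hypothetical key phrases an LLM might extract from a product review;
# "free pizza" plays the role of a hallucinated phrase
doc = "The battery life is great but the keyboard feels cheap and the battery charges slowly"
candidates = ["battery life", "keyboard feel", "free pizza"]
kept = filter_hallucinations(candidates, doc)      # "free pizza" is removed
themes = two_stage_cluster(kept)                   # two primary themes remain
```

A production system would replace the fixed candidate list with LLM-generated key phrases, swap the bag-of-words similarity for embedding-based scores, and use a proper clustering algorithm, but the control flow — extract, filter, cluster twice — is the same.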
In addition to comparing QualIT to earlier topic-modeling methods, we also asked human reviewers to validate its output. The reviewers were able to more consistently categorize the topics generated by QualIT into the known ground-truth categories; for example, when at least three out of four evaluators agreed on the topic classification, QualIT achieved a 50% overlap with the ground truth, compared to just 25% for LDA and BERTopic. Interested readers can learn more about the technical implementation in both the QualIT paper and an earlier paper on reconciling methodological paradigms in qualitative research.
Applications
Qualitative text doesn’t just include survey responses or focus-group feedback; it also includes product interaction data. For example, a system similar to QualIT could analyze the questions users ask an AI chatbot to understand which topics interest them most. If the interaction data is paired with customer feedback data, such as thumbs-up/thumbs-down ratings, the system can help explain on which topics the chatbot underperformed.
Looking ahead, further enhancements to QualIT’s language-modeling capabilities (such as support for languages beyond English, especially low-resource ones) and topic-clustering algorithms hold promise to unlock even more-powerful qualitative-analysis capabilities. As organizations continue to recognize the value of qualitative data, tools that can efficiently and effectively surface meaningful insights will become essential.
Acknowledgments: Alex Gil, Anshul Mittal, Rutu Mulkar