This package applies clustering algorithms to utterances embedded with a sentence encoder fine-tuned using the Supervised Intent Clustering package.
Modern virtual assistants are trained to classify customer requests into a taxonomy of predefined intents. Requests that fall outside this taxonomy, however, often go unhandled and need to be clustered to define new experiences. Recently, state-of-the-art results in intent clustering were achieved by training a neural network with a latent structured prediction loss. Unfortunately, this approach suffers from a quadratic bottleneck, as it requires computing a joint embedding representation for every pair of utterances to be clustered. To overcome this limitation, we instead cast the problem as a representation learning task and adapt the latent structured prediction loss to fine-tune sentence encoders, making it possible to obtain clustering-friendly single-sentence embeddings. Our experiments show that the supervised clustering loss achieves state-of-the-art results in terms of clustering accuracy and adjusted mutual information.
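The difference between pairwise and single-sentence embeddings can be illustrated with a minimal sketch. It assumes a hypothetical `toy_encode` function (a fixed-keyword bag-of-words) standing in for the fine-tuned sentence encoder, and uses threshold-based single-linkage clustering chosen purely for illustration; neither is this package's API nor the clustering algorithm used in the paper. The point it shows is that each utterance passes through the encoder exactly once (n calls), whereas a joint pairwise model would need one forward pass per pair (n(n-1)/2). The cosine comparisons below are still quadratic, but they operate on cheap vectors rather than on neural network forward passes.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical vocabulary for the toy encoder (illustration only).
VOCAB = ["balance", "account", "card", "lost", "stolen", "flight", "book", "cancel"]


def toy_encode(utterance: str) -> Vector:
    """Hypothetical stand-in for a fine-tuned sentence encoder:
    counts occurrences of each VOCAB keyword in the utterance."""
    words = utterance.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_utterances(
    utterances: Sequence[str],
    encode: Callable[[str], Vector],
    threshold: float,
) -> List[int]:
    """Single-linkage clustering over single-sentence embeddings.

    The encoder is called once per utterance (n calls). Two utterances are
    linked if their cosine similarity is >= threshold; clusters are the
    connected components, tracked with union-find.
    """
    embeddings = [encode(u) for u in utterances]  # n encoder calls in total
    n = len(embeddings)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids)
        labels.append(ids[root])
    return labels


UTTERANCES = [
    "what is my account balance",
    "show account balance",
    "my card was stolen",
    "i lost my card",
    "book a flight",
    "cancel my flight",
]

encoder_calls: List[str] = []


def counted_encode(utterance: str) -> Vector:
    """Wraps toy_encode to record how many times the encoder runs."""
    encoder_calls.append(utterance)
    return toy_encode(utterance)


labels = cluster_utterances(UTTERANCES, counted_encode, threshold=0.4)
n = len(UTTERANCES)
print(labels)                              # [0, 0, 1, 1, 2, 2]
print(len(encoder_calls), n * (n - 1) // 2)  # 6 encoder calls vs. 15 pairs
```

With a real fine-tuned encoder, `toy_encode` would be replaced by the model's embedding function and the threshold would need tuning; any standard clustering algorithm over the resulting vectors could replace the single-linkage step.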