MERLIN: Multiple enhanced representations with LLM generated indices
2024
Large Language Models (LLMs) can be leveraged to improve performance in various stages of the search pipeline: the indexing stage, the query understanding stage, and the ranking or re-ranking stage. The latter two stages involve invoking an LLM during inference, adding latency to fetching the final ranked list of documents. Index enhancement, on the other hand, can be done in the indexing stage, in near real time, and can improve retrieval performance while adding little or no latency during inference. Enhancing indices by leveraging LLMs to augment index content is a promising mechanism for improving first-stage retrieval results in dense retrieval with bi-encoders, on par with or exceeding other state-of-the-art approaches. In this work, we show that by using multiple indices to represent documents in different ways, where the representations are generated by an LLM, and querying these indices in parallel, we can improve retrieval performance with almost no increase in runtime latency. Our results are consistent across a number of pretrained bi-encoder models. We detail the implementation of such a system using AWS services.
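The core idea of querying multiple LLM-enhanced indices in parallel and merging their results could be sketched as follows. This is an illustrative sketch, not the paper's implementation: each "index" is a plain dict of document ID to embedding vector, and the fusion method (reciprocal rank fusion) is an assumption chosen for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor

def search(index, query, k=3):
    """Rank documents in one index by dot-product similarity to the query."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = sorted(index.items(), key=lambda kv: dot(kv[1], query), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def fused_retrieve(indices, query, k=3, c=60):
    """Query all indices in parallel, then merge with reciprocal rank fusion."""
    with ThreadPoolExecutor() as pool:
        ranked_lists = list(pool.map(lambda idx: search(idx, query, k), indices))
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example: two indices over the same three documents, each holding a
# different (hypothetical LLM-generated) representation of those documents.
index_a = {"d1": [1.0, 0.0], "d2": [0.5, 0.5], "d3": [0.0, 1.0]}
index_b = {"d1": [0.2, 0.9], "d2": [0.9, 0.1], "d3": [0.4, 0.4]}
print(fused_retrieve([index_a, index_b], query=[1.0, 0.2]))  # → ['d2', 'd1', 'd3']
```

Because the indices are queried concurrently, end-to-end latency is governed by the slowest single index rather than the sum of all lookups, which is what allows multiple representations to be used with almost no added runtime cost.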