LibEvolutionEval: A benchmark and study for version-specific code generation

Sachit Kuhar; Wasi Ahmad; Zijian Wang; Nihal Jain; Haifeng Qian; Baishakhi Ray; Murali Krishna Ramanathan; Xiaofei Ma; Anoop Deoras

2025

Recent advancements in code completion models have primarily focused on local file contexts (Ding et al., 2023b; Jimenez et al., 2024). However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly evolving public libraries. To fill this gap, we introduce LibEvolutionEval, a benchmark and study in which accurate in-line code completion requires an understanding of how libraries evolve. LibEvolutionEval provides a version-specific code-completion task covering eight libraries (torch, torchvision, scipy, pil, tqdm, pyyaml, matplotlib, and pandas) as they evolve over the years, along with a detailed analysis of the evolution of two popular and well-maintained public libraries: PyTorch and Matplotlib. We evaluate popular public models and find that library evolution significantly influences model performance. We also explore mitigation methods, studying how retrieved version-specific library documentation and prompting can improve a model's ability to handle these fast-evolving packages, which points to a promising direction for future work.
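To make the documentation-retrieval idea concrete, here is a minimal sketch of a version-aware completion prompt: documentation snippets are filtered to the exact library version being targeted, ranked against the code prefix, and prepended as comments before the prefix the model must complete. This is an illustration under stated assumptions, not the authors' pipeline. The names `DocSnippet`, `retrieve_docs`, `build_prompt`, and the library `mylib` with its documentation text are all hypothetical, and simple token overlap stands in for whatever retriever the study actually uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DocSnippet:
    """Hypothetical documentation entry tied to one library release."""
    library: str
    version: str
    api: str
    text: str


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; a crude stand-in for a real retriever.
    return set(re.findall(r"\w+", text.lower()))


def retrieve_docs(corpus, library, version, code_prefix, k=2):
    """Return up to k snippets for exactly (library, version), ranked by
    token overlap with the code prefix; zero-overlap snippets are dropped."""
    query = _tokens(code_prefix)
    scored = []
    for doc in corpus:
        # Version filter: docs from other releases may describe APIs
        # that no longer exist (or do not exist yet) in the target version.
        if doc.library != library or doc.version != version:
            continue
        overlap = len(query & _tokens(doc.api + " " + doc.text))
        if overlap > 0:
            scored.append((overlap, doc))
    # sorted() is stable, so ties keep their corpus order.
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(corpus, library, version, code_prefix, k=2):
    """Prepend the target version and retrieved docs as comments."""
    docs = retrieve_docs(corpus, library, version, code_prefix, k)
    header = [f"# Target library: {library}=={version}"]
    header += [f"# Doc ({d.api}): {d.text}" for d in docs]
    return "\n".join(header) + "\n" + code_prefix


# Usage with a hypothetical library whose load() was replaced by read() in 2.0.
corpus = [
    DocSnippet("mylib", "1.0", "mylib.load", "load(path) reads a file; returns a Table."),
    DocSnippet("mylib", "2.0", "mylib.read", "read(path, *, fmt) replaces load(); load() was removed."),
    DocSnippet("mylib", "2.0", "mylib.Table.save", "save(path) writes a Table to disk."),
]
prefix = "import mylib\ntable = mylib."
prompt = build_prompt(corpus, "mylib", "2.0", prefix)
print(prompt)
```

The exact-version filter is the key design choice in this sketch: a model conditioned on the 1.0 snippet could plausibly complete `mylib.load(...)`, which would fail against 2.0. A real system would likely swap the token-overlap ranking for a dense or BM25 retriever.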