Many IR collections contain forbidden documents (F-docs), i.e., documents that should not be exposed to the searcher. In an ideal scenario, F-docs are clearly flagged, so the ranker can filter them out, guaranteeing that no F-doc is exposed. In real-world scenarios, however, filtering algorithms are prone to errors. An IR evaluation system should therefore measure filtering quality in addition to ranking quality. Typically, filtering is treated as a classification task and is evaluated independently of ranking quality. Yet because the two are mutually dependent, it is desirable to evaluate ranking quality while filtering decisions are being made. In this work we propose nDCG_f, a novel extension of the nDCG_min metric [14], which measures both the ranking and the filtering quality of the search results. We show both theoretically and empirically that while nDCG_min is not suitable for the simultaneous ranking and filtering task, nDCG_f is a reliable metric in this case.
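The abstract does not spell out the metric's definition, so the following is a minimal sketch, assuming (as in nDCG_min [14]) min-max normalization between the best and worst achievable DCG, and additionally assuming that F-docs carry negative gains, that filtered documents contribute no gain, and that exposed documents are discounted at their post-filtering rank. The function names (`dcg`, `ndcg_min`, `ndcg_f`) are illustrative, not taken from the paper.

```python
import math
from typing import Sequence

def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: sum_i gains[i] / log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_min(gains: Sequence[float]) -> float:
    """Min-max normalized DCG: the worst permutation of the same gains
    scores 0 and the best scores 1, which stays well defined when some
    gains are negative (unlike plain nDCG)."""
    best = dcg(sorted(gains, reverse=True))
    worst = dcg(sorted(gains))
    return 1.0 if best == worst else (dcg(gains) - worst) / (best - worst)

def ndcg_f(ranked_gains: Sequence[float], filtered: Sequence[bool]) -> float:
    """ASSUMED nDCG_f sketch: exposed documents contribute their gain at
    their exposed (post-filtering) rank; filtered documents contribute
    nothing. Normalization is min-max over all rank-and-filter decisions:
    the ideal run exposes only positive-gain documents in decreasing
    order, the worst run exposes only negative-gain documents in
    increasing order."""
    exposed = [g for g, f in zip(ranked_gains, filtered) if not f]
    best = dcg(sorted((g for g in ranked_gains if g > 0), reverse=True))
    worst = dcg(sorted(g for g in ranked_gains if g < 0))
    return 1.0 if best == worst else (dcg(exposed) - worst) / (best - worst)
```

Under this sketch, correctly filtering the F-doc in `ndcg_f([3, -2, 1], [False, True, False])` yields 1.0, whereas exposing it (`[False, False, False]`) yields roughly 0.75, so the metric rewards both good ordering and good filtering decisions.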
We experiment with three datasets for which both ranking and filtering are required. In the PR dataset, the task is to rank product reviews while filtering those marked as spam. Similarly, in the CQA dataset, the task is to rank a list of human answers per question while filtering bad answers. We also experiment with the TREC web-track datasets, where F-docs are explicitly labeled, sorting participant runs according to their ranking and filtering quality and demonstrating the stability, sensitivity, and reliability of nDCG_f for this task. We further propose a learning to rank and filter (LTRF) framework that is specifically designed to optimize nDCG_f by learning a ranking model and optimizing a filtering threshold used for discarding documents with lower scores; a sketch of the threshold step follows below. We experiment with several loss functions, demonstrating their success in learning an effective LTRF model for the simultaneous ranking and filtering task.
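As described, the LTRF framework trains a ranking model and then fixes a score threshold below which documents are discarded. One simple way to realize the threshold step, reusing the `ndcg_f` sketch above, is an exhaustive sweep over the scores observed on held-out queries; the data layout (a list of `(scores, gains)` pairs) is an assumption for illustration, not the paper's API.

```python
def mean_ndcg_f(queries, tau):
    """Mean nDCG_f over (scores, gains) query pairs when every document
    scoring below the threshold tau is filtered out."""
    vals = []
    for scores, gains in queries:
        # Rank documents by descending model score, then apply the filter.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        ranked_gains = [gains[i] for i in order]
        filtered = [scores[i] < tau for i in order]
        vals.append(ndcg_f(ranked_gains, filtered))
    return sum(vals) / len(vals)

def fit_filter_threshold(queries):
    """Pick the filtering threshold that maximizes mean nDCG_f on
    held-out queries; -inf means 'filter nothing'."""
    candidates = [float("-inf")] + sorted(
        {s for scores, _ in queries for s in scores})
    return max(candidates, key=lambda tau: mean_ndcg_f(queries, tau))
```

Sweeping only the observed scores suffices because the filtering decision, and hence the metric, changes only when the threshold crosses a document's score.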
IR evaluation and learning in the presence of forbidden documents
2022