This repository contains the ListQA datasets described in the paper "Question Answering using Web Lists".
The two datasets, NQWebList and GQWebList, use a subset of questions from Natural Questions and GooAQ, respectively. To build them, each annotator was shown a question and a relevant URL from the web and was asked to annotate the list answer on that page, if one exists. To annotate a list, the annotators copy-pasted its first and last items. The question, URL, and list answer annotations are shared as part of this repository. The date and time of each annotation are also provided; they are needed to download the snapshot of the web page on which the annotation was made. The web page snapshot can be downloaded from https://web.archive.org/ (using a function such as this).
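As a rough illustration of the snapshot download, here is a minimal sketch that is not part of the released code. It relies on the Wayback Machine's public URL scheme, `https://web.archive.org/web/<YYYYMMDDhhmmss>/<url>`, which redirects to the capture closest to the requested timestamp. The function names `wayback_snapshot_url` and `download_snapshot` are hypothetical, and the sketch assumes the annotation date and time have already been parsed into a Python `datetime` in UTC; the exact format stored in the dataset files may differ.

```python
from datetime import datetime
from urllib.request import urlopen

# Public Wayback Machine prefix; a request for a timestamped URL is
# redirected to the capture closest to that timestamp.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_snapshot_url(page_url: str, annotated_at: datetime) -> str:
    """Build the Wayback Machine URL for the capture nearest to annotated_at.

    Hypothetical helper: assumes annotated_at is already a datetime (UTC).
    """
    timestamp = annotated_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def download_snapshot(page_url: str, annotated_at: datetime,
                      timeout: float = 30.0) -> bytes:
    """Fetch the archived page as raw bytes (requires network access)."""
    snapshot_url = wayback_snapshot_url(page_url, annotated_at)
    with urlopen(snapshot_url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder URL and date, not taken from the datasets.
    html = download_snapshot("https://example.com/", datetime(2021, 6, 1, 12, 0, 0))
    print(f"Downloaded {len(html)} bytes")
```

The capture that is returned may not match the annotation time exactly, since the archive serves the nearest available snapshot, so it is worth checking that the annotated first and last list items actually appear in the downloaded page.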
The paper also describes another ListQA dataset, called NQWikiList. The list answer annotations for NQWikiList can be found in the Natural Questions dataset.
More details on all the datasets can be found in the paper.