WikiDT: Visual-based table recognition and question answering dataset
2024
Companies and organizations grapple with the daily burden of document processing. Because manual handling is tedious and error-prone, automating this process is a significant goal. In response to this demand, research on table extraction and information extraction from scanned documents is gaining traction. These extraction tasks are fulfilled by machine learning models that require large-scale, realistic datasets for development. However, despite the clear need, acquiring high-quality and comprehensive datasets can be costly. In this work, we introduce WikiDT, a TableVQA dataset with hierarchical labels that supports model diagnosis and potentially benefits research on sub-tasks, e.g., table recognition. The dataset comprises 70,919 images paired with a diverse set of 159,905 tables, providing an extensive corpus for tackling question-answering tasks. WikiDT is created by extending existing non-synthetic QA datasets through a fully automated process with verified heuristics and manual quality inspections, which minimizes labeling effort and human error. A novel focus and design goal of WikiDT is answering questions that require locating the target information fragment and performing in-depth reasoning, given web-style document images. We establish baseline performance on the TableVQA, table extraction, and table retrieval tasks with recent state-of-the-art models. The results show that WikiDT is not yet solved by existing models that perform moderately well on other VQA tasks, and that it also introduces advanced challenges for table extraction.