WebQnA/WebQA

News

Oct 15 Update: We have released the output files of our baseline models in case they are helpful for future investigations. Feel free to check them out!

Oct 9 Update: Please note that we've switched the image reading method in the demo notebook from cv2 to PIL. Setting ImageFile.LOAD_TRUNCATED_IMAGES = True is the key to avoiding the "Image NoneType error".


Download Data

The main data is split into two files: one for train+val (36,766 + 4,966 samples) and one for test (7,540 samples).

  • Images

The large image file is compressed and split into 51 chunks of 1 GB each. You can download all chunks at once by running this script.

To unzip and merge all chunks, run 7z x imgs.7z.001

We also provide Google Drive download links.

You are all set once you have WebQA_train_val.json, WebQA_test.json, imgs.lineidx and imgs.tsv.
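
As a quick sanity check after downloading and extracting, something like the following sketch can confirm that the four files are in place. The helper name `missing_files` and the assumption that all files sit in one directory are ours, not part of the repository.

```python
from pathlib import Path

# File names as listed above. Keeping them all in a single
# directory is an assumption made for this sketch.
REQUIRED_FILES = (
    "WebQA_train_val.json",
    "WebQA_test.json",
    "imgs.lineidx",
    "imgs.tsv",
)


def missing_files(data_dir):
    """Return the required file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files present.")
```

An empty list from `missing_files` means the download and the 7z extraction both completed.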


Output Format (A json file with guids as keys)

{<guid>: {"sources": [<image_id>/<snippet_id>, ..., ],
          "answer": "xxxxxxx" },
 <guid>: {...},
 <guid>: {...},
}
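
A minimal sketch of producing a file in this format with Python's standard `json` module is shown below. The function names `validate_predictions` and `write_predictions`, the output file name, and the example guid and source ids are hypothetical; only the overall shape (guids as keys, each with `sources` and `answer`) comes from the format above.

```python
import json


def validate_predictions(preds):
    """Raise ValueError if preds does not follow the format above."""
    if not isinstance(preds, dict):
        raise ValueError("predictions must be a dict keyed by guid")
    for guid, entry in preds.items():
        if not isinstance(entry, dict):
            raise ValueError(f"{guid}: entry must be a dict")
        if not isinstance(entry.get("sources"), list):
            raise ValueError(f"{guid}: 'sources' must be a list of ids")
        if not isinstance(entry.get("answer"), str):
            raise ValueError(f"{guid}: 'answer' must be a string")


def write_predictions(preds, path):
    """Validate preds, then write them to path as a JSON file."""
    validate_predictions(preds)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(preds, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical guid and source ids, for illustration only.
    example = {
        "example_guid_0": {
            "sources": ["example_image_id", "example_snippet_id"],
            "answer": "xxxxxxx",
        }
    }
    write_predictions(example, "predictions.json")
```

Validating before writing catches a missing `sources` or `answer` field locally rather than after submission.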
