VQA Datasets Recap

No.	Dataset	Task	Total images / Total captions	Link	Note
1	VizWiz VQA Dataset (English)	General Question Answering (QA)	32,842 / 281,262	https://www.kaggle.com/datasets/lhanhsin/vizwiz ; https://vizwiz.org/tasks-and-datasets/vqa/	Real-world images captured by blind users, often containing noise, blur, and other challenges.
2	GQA (Visual Genome images) (English)	General Question Answering (QA)	108,077 / 1,773,258	https://cs.stanford.edu/people/dorarad/gqa/download.html	Questions require multi-step reasoning over real-world images. Suitable as a base for the team's custom labeling targets.
3	uitnlp/OpenViVQA-dataset (Vietnamese)	General Question Answering (QA)	> 11,000 / > 37,000	https://huggingface.co/datasets/uitnlp/OpenViVQA-dataset
4	COCO Dataset (English)	General Question Answering (QA)	330,000 / 0	https://universe.roboflow.com/microsoft/coco/browse?queryText=class%3Acouch&pageSize=50&startingIndex=50&browseQuery=true	Contains over 330K images with 2.5 million annotated instances across 80 object categories, including everyday items such as chairs, tables, laptops, and phones. It has no question-answer annotations, but could serve as an image source for a custom-labeled VQA dataset.
5	Vi-VLM/Vista (Vietnamese)	General Question Answering (QA)	> 700,000 / 712,232	https://huggingface.co/datasets/Vi-VLM/Vista	Includes diverse subsets: Vi-LLAVA conversation: 112,650 (train) + 4,550 (val) = 117,200; Vi-LLAVA complex reasoning: 112,650 (train) + 4,771 (val) = 117,421; Vi-LLAVA detail description: 111,153 (train) + 4,714 (val) = 115,867; Vi-ShareGPT4V: 96,913; Vi-WIT: 264,831.
6	5CD-AI/Viet-ViTextVQA-gemini-VQA (Vietnamese)	Text OCR and Answering	9,594 / > 50,000	https://huggingface.co/datasets/5CD-AI/Viet-ViTextVQA-gemini-VQA	Focused on the Vietnamese language, this dataset uses OCR-extracted text from images to provide contextual information for answering questions.
7	5CD-AI/Viet-Menu-gemini-VQA (Vietnamese)	Text OCR and Answering	840 / 5,800	https://huggingface.co/datasets/5CD-AI/Viet-Menu-gemini-VQA	Useful for developing models that interpret and understand Vietnamese menu content, but the dataset is small.
8	5CD-AI/Viet-Receipt-VQA (Vietnamese)	Text OCR and Answering	2,034 / 14,238	https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA	Supports Optical Character Recognition (OCR), information extraction, and document understanding on Vietnamese receipts, but the dataset is small.
9	5CD-AI/Viet-Handwriting-gemini-VQA (Vietnamese)	Handwriting Recognition	1,252 / 8,700	https://huggingface.co/datasets/5CD-AI/Viet-Handwriting-gemini-VQA	Useful for developing models that interpret and understand Vietnamese handwritten content, but the dataset is small.
10	5CD-AI/Viet-OpenViVQA-gemini-VQA (Vietnamese)	General Question Answering (QA)	8,024 / 63,789	https://huggingface.co/datasets/5CD-AI/Viet-OpenViVQA-gemini-VQA
11	5CD-AI/Viet-Localization-VQA (Vietnamese)	General Question Answering (QA)	56,989 / 455,801	https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA
12	5CD-AI/Viet-Vintext-gemini-VQA (Vietnamese)	OCR and Text Recognition	1,056 / 6,000	https://huggingface.co/datasets/5CD-AI/Viet-Vintext-gemini-VQA
13	5CD-AI/Viet-OCR-VQA (Vietnamese)	OCR and Text Recognition	137,000 / 822,679	https://huggingface.co/datasets/5CD-AI/Viet-OCR-VQA
14	5CD-AI/Viet-Doc-VQA (Vietnamese)	Document Understanding	51,856 / 310,952	https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA
15	5CD-AI/Viet-Doc-VQA-II (Vietnamese)	Document Understanding	64,765 / 388,277	https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA-II
16	5CD-AI/Viet-Geometry-VQA (Vietnamese)	Document Understanding	4,072 / 24,000	https://huggingface.co/datasets/5CD-AI/Viet-Geometry-VQA
17	5CD-AI/Viet-ComputerScience-VQA (Vietnamese)	Document Understanding	6,899 / 40,000	https://huggingface.co/datasets/5CD-AI/Viet-ComputerScience-VQA
18	5CD-AI/Viet-Sketches-VQA (Vietnamese)	Document Understanding	3,088 / 18,000	https://huggingface.co/datasets/5CD-AI/Viet-Sketches-VQA
19	5CD-AI/Viet-Wiki-Handwriting (Vietnamese)	Handwriting Recognition	5,796 / 5,796	https://huggingface.co/datasets/5CD-AI/Viet-Wiki-Handwriting
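Most entries above link to Hugging Face Hub dataset pages. The following is a minimal Python sketch, under stated assumptions, for turning such a link into its owner/name repo id and loading it with the Hugging Face `datasets` library. The helper names `hf_repo_id` and `load_vqa_dataset` are hypothetical, and the default split name "train" is an assumption: split and configuration names differ between datasets (Vista, for example, bundles several subsets and may need a configuration name), so check each dataset card first. Links that are not Hugging Face dataset pages (Kaggle, Stanford, Roboflow) are rejected.

```python
from typing import Optional
from urllib.parse import urlparse


def hf_repo_id(url: str) -> str:
    """Extract the "owner/name" repo id from a Hugging Face dataset URL.

    Raises ValueError for links that are not huggingface.co dataset pages,
    such as the Kaggle, Stanford, or Roboflow links in the table.
    """
    parsed = urlparse(url)
    # Path looks like "/datasets/<owner>/<name>"; drop empty segments.
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "huggingface.co" or len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_vqa_dataset(url: str, config: Optional[str] = None, split: str = "train"):
    """Download one table entry with the `datasets` library (needs network access).

    The `config` and `split` defaults are assumptions; check the dataset card
    for the actual configuration and split names.
    """
    # Imported lazily so hf_repo_id() works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(hf_repo_id(url), name=config, split=split)


if __name__ == "__main__":
    print(hf_repo_id("https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA"))
    # Prints: 5CD-AI/Viet-Receipt-VQA
```

Calling `load_vqa_dataset("https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA")` would then fetch that dataset, provided it exposes a split named "train"; gated datasets may additionally require logging in to the Hub.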