| No. | Dataset | Task | Total images / Total captions | Link | Note |
| --- | --- | --- | --- | --- | --- |
| 1 | VizWiz VQA Dataset (English) | General Question Answering (QA) | 32,842 / 281,262 | https://www.kaggle.com/datasets/lhanhsin/vizwiz https://vizwiz.org/tasks-and-datasets/vqa/ | Real-world images captured by blind users, often containing noise, blur, and other challenges. |
| 2 | GQA (Visual Genome dataset) (English) | General Question Answering (QA) | 108,077 / 1,773,258 | https://cs.stanford.edu/people/dorarad/gqa/download.html | Supports complex, reasoning-based questions about real-world images. Suitable as a basis for the team's custom labeling targets. |
| 3 | uitnlp/OpenViVQA-dataset (Vietnamese) | General Question Answering (QA) | > 11,000 / > 37,000 | https://huggingface.co/datasets/uitnlp/OpenViVQA-dataset | |
| 4 | COCO Dataset (English) | General Question Answering (QA) | 330,000 / 0 | https://universe.roboflow.com/microsoft/coco/browse?queryText=class%3Acouch&pageSize=50&startingIndex=50&browseQuery=true | Contains over 330K images with 2.5 million annotated instances across 80 object categories, including many everyday items such as chairs, tables, laptops, and phones. A candidate source of images for labeling a custom VQA dataset. |
| 5 | Vi-VLM/Vista (Vietnamese) | General Question Answering (QA) | > 700,000 / 712,232 | https://huggingface.co/datasets/Vi-VLM/Vista | Includes diverse subsets. **Vi-LLAVA conversation**: 112,650 (train) + 4,550 (val) = 117,200; **Vi-LLAVA complex reasoning**: 112,650 (train) + 4,771 (val) = 117,421; **Vi-LLAVA detail description**: 111,153 (train) + 4,714 (val) = 115,867; **Vi-ShareGPT4V**: 96,913; **Vi-WIT**: 264,831. |
| 6 | 5CD-AI/Viet-ViTextVQA-gemini-VQA (Vietnamese) | Text OCR and Answering | 9,594 / > 50,000 | https://huggingface.co/datasets/5CD-AI/Viet-ViTextVQA-gemini-VQA | Focused on the Vietnamese language; uses OCR-extracted text from images as context for answering questions. |
| 7 | 5CD-AI/Viet-Menu-gemini-VQA (Vietnamese) | Text OCR and Answering | 840 / 5,800 | https://huggingface.co/datasets/5CD-AI/Viet-Menu-gemini-VQA | A valuable resource for developing models capable of interpreting and understanding Vietnamese menu content. However, the dataset is small. |
| 8 | 5CD-AI/Viet-Receipt-VQA (Vietnamese) | Text OCR and Answering | 2,034 / 14,238 | https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA | Supports tasks such as Optical Character Recognition (OCR), information extraction, and document understanding on Vietnamese receipts. However, the dataset is small. |
| 9 | 5CD-AI/Viet-Handwriting-gemini-VQA (Vietnamese) | Handwriting Recognition | 1,252 / 8,700 | https://huggingface.co/datasets/5CD-AI/Viet-Handwriting-gemini-VQA | A valuable resource for developing models capable of interpreting and understanding Vietnamese handwritten content. However, the dataset is small. |
| 10 | Viet-OpenViVQA-gemini-VQA | General Question Answering (QA) | 8,024 / 63,789 | https://huggingface.co/datasets/5CD-AI/Viet-OpenViVQA-gemini-VQA | |
| 11 | Viet-Localization-VQA | General Question Answering (QA) | 56,989 / 455,801 | https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA | |
| 12 | Viet-Vintext-gemini-VQA | OCR and Text Recognition | 1,056 / 6,000 | https://huggingface.co/datasets/5CD-AI/Viet-Vintext-gemini-VQA | |
| 13 | Viet-OCR-VQA3 | OCR and Text Recognition | 137,000 / 822,679 | https://huggingface.co/datasets/5CD-AI/Viet-Vintext-gemini-VQA | |
| 14 | Viet-Doc-VQA | Document Understanding | 51,856 / 310,952 | https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA | |
| 15 | Viet-Doc-VQA-II | Document Understanding | 64,765 / 388,277 | https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA-II | |
| 16 | Viet-Geometry-VQA | Document Understanding | 4,072 / 24,000 | https://huggingface.co/datasets/5CD-AI/Viet-Geometry-VQA | |
| 17 | Viet-ComputerScience-VQA | Document Understanding | 6,899 / 40,000 | https://huggingface.co/datasets/5CD-AI/Viet-ComputerScience-VQA | |
| 18 | Viet-Sketches-VQA | Document Understanding | 3,088 / 18,000 | https://huggingface.co/datasets/5CD-AI/Viet-Sketches-VQA | |
| 19 | Viet-Wiki-Handwriting | Handwriting Recognition | 5,796 / 5,796 | https://huggingface.co/datasets/5CD-AI/Viet-Wiki-Handwriting | |
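
The Vi-VLM/Vista row lists per-subset counts alongside a total of 712,232 captions. As a quick sanity check, the short sketch below adds up the subset figures exactly as they appear in the table and compares the sum with the listed total. The variable names are illustrative only, and the counts are copied from the table rather than read from the dataset itself.

```python
# Subset sizes for Vi-VLM/Vista, copied from the table above.
# Subsets listed with a train/val split have two parts; the others have one.
vista_subsets = {
    "Vi-LLAVA conversation": (112_650, 4_550),
    "Vi-LLAVA complex reasoning": (112_650, 4_771),
    "Vi-LLAVA detail description": (111_153, 4_714),
    "Vi-ShareGPT4V": (96_913,),
    "Vi-WIT": (264_831,),
}
LISTED_TOTAL = 712_232  # total captions listed for Vista in the table

# Total per subset, then the grand total across all subsets.
subset_totals = {name: sum(parts) for name, parts in vista_subsets.items()}
grand_total = sum(subset_totals.values())

for name, total in subset_totals.items():
    print(f"{name}: {total:,}")
print(f"Sum of subsets: {grand_total:,} (listed: {LISTED_TOTAL:,})")
print("Match" if grand_total == LISTED_TOTAL else "Mismatch")
```

The same pattern can be reused to recheck any other row whose note breaks a total down into parts.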