1 |
VizWiz VQA Dataset (English) |
General Question Answering (QA) |
32,842 / 281,262 |
https://www.kaggle.com/datasets/lhanhsin/vizwiz |
https://vizwiz.org/tasks-and-datasets/vqa/ |
Real-world images captured by blind users, often containing noise, blur, and other challenges. |
2 |
GQA (Visual Genome dataset) (English) |
General Question Answering (QA) |
108,077 / 1,773,258 |
https://cs.stanford.edu/people/dorarad/gqa/download.html |
Supports complex questions that require reasoning about real-world images. Suitable as a base for the team's custom labeling targets. |
3 |
uitnlp/OpenViVQA-dataset (Vietnamese) |
General Question Answering (QA) |
> 11,000 / > 37,000 |
https://huggingface.co/datasets/uitnlp/OpenViVQA-dataset |
|
4 |
COCO Dataset (English) |
General Question Answering (QA) |
330,000 / 0 |
https://universe.roboflow.com/microsoft/coco/browse?queryText=class%3Acouch&pageSize=50&startingIndex=50&browseQuery=true |
Contains over 330K images with 2.5 million annotated instances across 80 object categories, including many everyday items such as chairs, tables, laptops, and phones. It has no QA pairs, but could serve as an image source for labeling a custom VQA dataset. |
5 |
Vi-VLM/Vista (Vietnamese) |
General Question Answering (QA) |
> 700,000 / 712,232 |
https://huggingface.co/datasets/Vi-VLM/Vista |
**Vi-LLAVA conversation**: 112,650 (train) + 4,550 (val) = **117,200** |
**Vi-LLAVA complex reasoning**: 112,650 (train) + 4,771 (val) = **117,421** |
**Vi-LLAVA detail description**: 111,153 (train) + 4,714 (val) = **115,867** |
**Vi-ShareGPT4V**: **96,913** |
**Vi-WIT**: **264,831** |
⇒ Includes diverse subsets. |
6 |
5CD-AI/Viet-ViTextVQA-gemini-VQA (Vietnamese) |
Text OCR and Answering |
9,594 / > 50,000 |
https://huggingface.co/datasets/5CD-AI/Viet-ViTextVQA-gemini-VQA |
Focused on the Vietnamese language, this dataset uses OCR-extracted text from images to provide contextual information for answering questions. |
7 |
5CD-AI/Viet-Menu-gemini-VQA (Vietnamese) |
Text OCR and Answering |
840 / 5,800 |
https://huggingface.co/datasets/5CD-AI/Viet-Menu-gemini-VQA |
A useful resource for developing models that interpret and understand Vietnamese menu content. However, the dataset is small. |
8 |
5CD-AI/Viet-Receipt-VQA (Vietnamese) |
Text OCR and Answering |
2,034 / 14,238 |
https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA |
Supports tasks such as optical character recognition (OCR), information extraction, and document understanding on Vietnamese receipts. However, the dataset is small. |
9 |
5CD-AI/Viet-Handwriting-gemini-VQA (Vietnamese) |
Handwriting Recognition |
1,252 / 8,700 |
https://huggingface.co/datasets/5CD-AI/Viet-Handwriting-gemini-VQA |
A useful resource for developing models that interpret and understand Vietnamese handwritten content. However, the dataset is small. |
10 |
Viet-OpenViVQA-gemini-VQA |
General Question Answering (QA) |
8,024 / 63,789 |
https://huggingface.co/datasets/5CD-AI/Viet-OpenViVQA-gemini-VQA |
|
11 |
Viet-Localization-VQA |
General Question Answering (QA) |
56,989 / 455,801 |
https://huggingface.co/datasets/5CD-AI/Viet-Localization-VQA |
|
12 |
Viet-Vintext-gemini-VQA |
OCR and Text Recognition |
1,056 / 6,000 |
https://huggingface.co/datasets/5CD-AI/Viet-Vintext-gemini-VQA |
|
13 |
Viet-OCR-VQA3 |
OCR and Text Recognition |
137,000 / 822,679 |
https://huggingface.co/datasets/5CD-AI/Viet-Vintext-gemini-VQA |
|
14 |
Viet-Doc-VQA |
Document Understanding |
51,856 / 310,952 |
https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA |
|
15 |
Viet-Doc-VQA-II |
Document Understanding |
64,765 / 388,277 |
https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA-II |
|
16 |
Viet-Geometry-VQA |
Document Understanding |
4,072 / 24,000 |
https://huggingface.co/datasets/5CD-AI/Viet-Geometry-VQA |
|
17 |
Viet-ComputerScience-VQA |
Document Understanding |
6,899 / 40,000 |
https://huggingface.co/datasets/5CD-AI/Viet-ComputerScience-VQA |
|
18 |
Viet-Sketches-VQA |
Document Understanding |
3,088 / 18,000 |
https://huggingface.co/datasets/5CD-AI/Viet-Sketches-VQA |
|
19 |
Viet-Wiki-Handwriting |
Handwriting Recognition |
5,796 / 5,796 |
https://huggingface.co/datasets/5CD-AI/Viet-Wiki-Handwriting |
|
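
The counts in the table can be cross-checked with a few lines of Python. The sketch below is illustrative only: it copies a handful of figures from the rows above, confirms that the five Vista subsets add up to the listed QA total, computes QA pairs per image as a rough density measure, and derives the `owner/name` repository ID from a Hugging Face dataset URL. The names `VISTA_SUBSETS`, `COUNTS`, `hf_repo_id`, and `qa_per_image` are hypothetical helpers introduced here, not part of any dataset's tooling.

```python
from urllib.parse import urlparse

# Vista subset sizes as listed in row 5 (train + val where given).
VISTA_SUBSETS = {
    "Vi-LLAVA conversation": 112_650 + 4_550,
    "Vi-LLAVA complex reasoning": 112_650 + 4_771,
    "Vi-LLAVA detail description": 111_153 + 4_714,
    "Vi-ShareGPT4V": 96_913,
    "Vi-WIT": 264_831,
}

# (images, QA pairs) for a few rows, copied from the table.
COUNTS = {
    "VizWiz VQA": (32_842, 281_262),
    "Viet-Receipt-VQA": (2_034, 14_238),
    "Viet-Doc-VQA": (51_856, 310_952),
    "Viet-Wiki-Handwriting": (5_796, 5_796),
}


def hf_repo_id(url: str) -> str:
    """Return 'owner/name' from a https://huggingface.co/datasets/... URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "huggingface.co" or len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def qa_per_image(images: int, qa_pairs: int) -> float:
    """Average number of QA pairs per image."""
    return qa_pairs / images


if __name__ == "__main__":
    print("Vista total:", sum(VISTA_SUBSETS.values()))
    for name, (images, qa_pairs) in COUNTS.items():
        print(f"{name}: {qa_per_image(images, qa_pairs):.2f} QA pairs per image")
    print(hf_repo_id("https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA"))
```

The repository ID returned by `hf_repo_id` could then be passed to `datasets.load_dataset` from the Hugging Face `datasets` library, assuming that library is installed and the repository is publicly accessible; available splits and column names vary per dataset and are not listed in the table.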