VLM Data Format (VQA)
Input and output format for vision-language models (VLMs)
Structure
Example
[
  {
    "id": "example_image",
    "image": "example_image.jpg",
    "conversations": [
      {
        "question_id": 1,
        "question": "Is there a bus in the image?",
        "answer": {
          "groundtruth": true,
          "model001": true
        }
      },
      {
        "question_id": 2,
        "question": "How many people are in the image?",
        "answer": {
          "groundtruth": 0,
          "model001": 3
        }
      },
      {
        "question_id": 3,
        "question": "What are the colors of the bus?",
        "answer": {
          "groundtruth": "red and white",
          "model001": "red and white"
        }
      },
      {
        "question_id": 4,
        "question": "What types of vehicles are in the image?",
        "answer": {
          "groundtruth": ["bus"],
          "model001": ["car"]
        }
      }
    ]
  },
  {
    "id": "example_image2",
    "image": "example_image2.jpg",
    "conversations": []
  }
]
VLM {}
Name | Definition | Type | Required
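The top-level fields used in the example are `id`, `image`, and `conversations`. A minimal Python sketch of loading and checking such a file might look like the following. The function names `validate_entry` and `load_vqa_file` are hypothetical. The expected types (strings for `id` and `image`, a possibly empty list for `conversations`) are inferred from the example, not stated by the format.

```python
import json

# Hypothetical helper: checks the top-level shape of one VLM entry.
# Expected types are inferred from the example (assumption).
def validate_entry(entry) -> list[str]:
    """Return a list of problems found in one top-level VLM entry."""
    if not isinstance(entry, dict):
        return ["entry should be a JSON object"]
    problems = []
    if not isinstance(entry.get("id"), str):
        problems.append("'id' should be a string")
    if not isinstance(entry.get("image"), str):
        problems.append("'image' should be a string (image file name)")
    if not isinstance(entry.get("conversations"), list):
        problems.append("'conversations' should be a list (may be empty)")
    return problems

def load_vqa_file(text: str) -> list[dict]:
    """Parse the file content (a JSON array) and validate every entry."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("top level should be a JSON array of entries")
    for i, entry in enumerate(data):
        problems = validate_entry(entry)
        if problems:
            raise ValueError(f"entry {i}: " + "; ".join(problems))
    return data

sample = '[{"id": "example_image2", "image": "example_image2.jpg", "conversations": []}]'
entries = load_vqa_file(sample)
print(entries[0]["image"])  # example_image2.jpg
```

An entry with no questions, such as `example_image2`, passes because an empty `conversations` list is still a list.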
conversations {}
Name | Definition | Type | Required
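In the example, each item in `conversations` pairs a `question_id`, a `question`, and an `answer` object. The sketch below walks those items. `iter_questions` is a hypothetical helper. The rule that every `answer` must contain a `groundtruth` key is an assumption drawn from the example.

```python
import json

# Hypothetical helper: yields (image id, question_id, question, answer)
# for every conversation item, checking the keys used in the example.
def iter_questions(entries: list[dict]):
    for entry in entries:
        for turn in entry["conversations"]:
            qid = turn["question_id"]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(qid, bool) or not isinstance(qid, int):
                raise TypeError(f"{entry['id']}: question_id should be an integer")
            answer = turn["answer"]
            # Assumption: every answer object has a "groundtruth" key,
            # plus one key per model (e.g. "model001").
            if "groundtruth" not in answer:
                raise KeyError(f"{entry['id']} q{qid}: missing 'groundtruth'")
            yield entry["id"], qid, turn["question"], answer

sample = json.loads('''[
  {"id": "example_image", "image": "example_image.jpg",
   "conversations": [
     {"question_id": 1, "question": "Is there a bus in the image?",
      "answer": {"groundtruth": true, "model001": true}}]},
  {"id": "example_image2", "image": "example_image2.jpg", "conversations": []}
]''')

for image_id, qid, question, answer in iter_questions(sample):
    print(image_id, qid, question, answer["model001"])
    # example_image 1 Is there a bus in the image? True
```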
answer field {}
Type | Definition
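In the example, `answer` maps `groundtruth` and each model name (such as `model001`) to a value whose type depends on the question: a boolean, an integer, a string, or a list of strings. The following minimal scoring sketch assumes exact match per type. `answers_match` and `accuracy` are hypothetical names. The case-insensitive string comparison and the order-insensitive list comparison are assumptions, not rules of the format.

```python
# Hypothetical scorer: exact match per answer type seen in the example
# (boolean, integer, string, list of strings). The normalization rules
# below are assumptions, not part of the format.
def answers_match(pred, truth) -> bool:
    if isinstance(truth, bool):  # check bool before int (bool subclasses int)
        return pred is truth
    if isinstance(truth, int):
        return isinstance(pred, int) and not isinstance(pred, bool) and pred == truth
    if isinstance(truth, str):
        # Assumption: case and surrounding whitespace are ignored.
        return isinstance(pred, str) and pred.strip().lower() == truth.strip().lower()
    if isinstance(truth, list):
        # Assumption: list answers (strings) are compared as unordered sets.
        return isinstance(pred, list) and set(pred) == set(truth)
    raise TypeError(f"unsupported answer type: {type(truth).__name__}")

def accuracy(conversations: list[dict], model: str) -> float:
    """Fraction of questions where `model` matches groundtruth."""
    scored = [answers_match(t["answer"][model], t["answer"]["groundtruth"])
              for t in conversations if model in t["answer"]]
    return sum(scored) / len(scored) if scored else 0.0

# The four question/answer pairs from the example.
conversations = [
    {"question_id": 1, "answer": {"groundtruth": True, "model001": True}},
    {"question_id": 2, "answer": {"groundtruth": 0, "model001": 3}},
    {"question_id": 3, "answer": {"groundtruth": "red and white", "model001": "red and white"}},
    {"question_id": 4, "answer": {"groundtruth": ["bus"], "model001": ["car"]}},
]
print(accuracy(conversations, "model001"))  # 0.5
```

On the example data, `model001` matches on questions 1 and 3 and misses on questions 2 and 4, so it scores 2 out of 4.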