VisionAI Data Format
The VisionAI data format schema for multi-sensor data annotation is organized as a dictionary and is fully described by a JSON schema file that follows the ASAM OpenLABEL format. To ensure compliance with the format's specifications, the annotation file must be validated against the OpenLABEL JSON schema, with the "openlabel" keyword substituted with "visionai".
See the OpenLABEL JSON schema: https://openlabel.asam.net/V1-0-0/schema/openlabel_json_schema.json
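A minimal validation sketch, assuming Python with the `jsonschema` package: it maps the top-level "visionai" key back to "openlabel" so the annotation can be checked against the published schema without modifying it. The annotation path is illustrative, and if your project ships a dedicated VisionAI JSON schema you should validate against that instead.

```python
import json
import urllib.request

from jsonschema import validate  # pip install jsonschema

SCHEMA_URL = "https://openlabel.asam.net/V1-0-0/schema/openlabel_json_schema.json"

# Load the annotation file (path is illustrative).
with open("annotations/groundtruth/visionai.json", encoding="utf-8") as fh:
    annotation = json.load(fh)

# Fetch the official OpenLABEL JSON schema.
with urllib.request.urlopen(SCHEMA_URL) as resp:
    schema = json.load(resp)

# VisionAI uses "visionai" where OpenLABEL expects "openlabel", so map the
# top-level key back before validating against the unmodified schema.
validate(instance={"openlabel": annotation["visionai"]}, schema=schema)
print("visionai.json is valid against the OpenLABEL schema")
```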
The following JSON example outlines the top-level components of the VisionAI format. Further details about each element will be provided in separate sections.
The structure of the VisionAI JSON file (visionai.json) is outlined below. Additional information on the file structure can be found in the Data Structure Details section.
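A minimal sketch of that top-level layout, built only from the elements listed in the table at the end of this section. All sub-objects are left empty, the frame key is shown zero-padded to 12 digits to match the frame IDs described below, and the schema_version value is illustrative:

```json
{
    "visionai": {
        "frame_intervals": [ { "frame_start": 0, "frame_end": 2 } ],
        "frames": { "000000000000": {} },
        "objects": {},
        "contexts": {},
        "streams": {},
        "coordinate_systems": {},
        "tags": {},
        "metadata": { "schema_version": "1.0.0" }
    }
}
```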
The inclusion of "coordinate_systems" and "streams" in the VisionAI format depends on the sensor settings of each project. Please refer to the Project Sensor Settings table at the end of this section for the supported combinations.
The data folder structure for the VisionAI data format is organized into sequences, each containing all sensor data arranged in frame order.
The sensor names are associated with the stream names in the visionai.json file.
Sequence and frame numbers, and the folder and file names derived from them, are zero-padded to 12 digits and start from 000000000000; they represent the sequence IDs and frame IDs, respectively.
Each sequence contains annotation information, including both ground truth and other annotations. Ground truth information is mandatory and must be labeled as "groundtruth" in the folder name, while other annotation folders ($NAME) must correspond to the information provided in the visionai.json file.
A sample folder structure with 2 sequences, multiple sensors ("camera1" and "lidar1"), and 3 frames per sequence is provided below:
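A sketch of that layout, assuming image frames for the camera and PCD frames for the lidar; the root folder name, the data/ and annotations/ folder names, the file extensions, and the $NAME placeholder are illustrative and should be confirmed against your project setup:

```
$ROOT/
├── 000000000000/                      # sequence 000000000000
│   ├── data/
│   │   ├── camera1/
│   │   │   ├── 000000000000.jpg       # frame 000000000000
│   │   │   ├── 000000000001.jpg
│   │   │   └── 000000000002.jpg
│   │   └── lidar1/
│   │       ├── 000000000000.pcd
│   │       ├── 000000000001.pcd
│   │       └── 000000000002.pcd
│   └── annotations/
│       ├── groundtruth/
│       │   └── visionai.json
│       └── $NAME/
│           └── visionai.json
└── 000000000001/                      # sequence 000000000001
    └── ...
```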
Naming rule for folder names such as sensor names and annotation names ($NAME)
The name may only contain lowercase letters, numbers, and hyphens (-), and must begin and end with a letter or a number. Each hyphen (-) must be preceded and followed by a non-hyphen character. The name must also be between 3 and 40 characters long.
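As a quick sanity check, this rule can be expressed as a single regular expression. The sketch below is illustrative and not part of the format itself:

```python
import re

# 3-40 characters, lowercase letters / digits / hyphens only, starting and
# ending with a letter or digit, and every hyphen between two non-hyphens.
NAME_PATTERN = re.compile(r"^(?=.{3,40}$)[a-z0-9]+(?:-[a-z0-9]+)*$")

def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the folder naming rule above."""
    return NAME_PATTERN.match(name) is not None

assert is_valid_name("camera1")
assert is_valid_name("front-left-camera")
assert not is_valid_name("Camera1")    # uppercase letters are not allowed
assert not is_valid_name("-camera1")   # must start with a letter or digit
assert not is_valid_name("cam--era1")  # each hyphen needs non-hyphen neighbours
assert not is_valid_name("ab")         # shorter than 3 characters
```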
PCD (Point Cloud Data) file format
The PCD (Point Cloud Data) file format allows only the "DATA binary_compressed" option. This means that the point cloud data in the PCD file is stored in binary compressed format. Other data storage options, such as "ASCII" or "binary", are not permitted in this format.
More information on the PCD file format is available in the Point Cloud Library (PCL) documentation.
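A minimal sketch for checking this constraint, assuming Python; it only reads the ASCII header that precedes the point data in every PCD file (the helper name is illustrative). Tools such as Open3D can produce this storage mode, for example via write_point_cloud(..., compressed=True).

```python
def pcd_is_binary_compressed(path: str) -> bool:
    """Return True if the PCD file stores its points as DATA binary_compressed."""
    with open(path, "rb") as fh:
        # The PCD header is plain ASCII and ends with a "DATA <storage>" line,
        # even when the point data that follows is binary.
        for raw in fh:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("DATA"):
                return line.split() == ["DATA", "binary_compressed"]
    return False  # no DATA line found, so this is not a valid PCD header

print(pcd_is_binary_compressed("data/lidar1/000000000000.pcd"))
```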
To describe a bbox dataset with one camera sensor:
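An illustrative sketch of such a file, following the OpenLABEL conventions the format is based on. The UUID, names, pixel values, intrinsics, and the frame-level uri placement are placeholders or assumptions; the bbox value follows the OpenLABEL [center_x, center_y, width, height] convention:

```json
{
    "visionai": {
        "streams": {
            "camera1": {
                "type": "camera",
                "uri": "",
                "description": "front camera",
                "stream_properties": {
                    "intrinsics_pinhole": {
                        "camera_matrix_3x4": [1000.0, 0.0, 960.0, 0.0,
                                              0.0, 1000.0, 540.0, 0.0,
                                              0.0, 0.0, 1.0, 0.0],
                        "distortion_coeffs_1xN": [0.0, 0.0, 0.0, 0.0, 0.0],
                        "height_px": 1080,
                        "width_px": 1920
                    }
                }
            }
        },
        "objects": {
            "893ac389-8a0e-4a3b-9f4d-1a2b3c4d5e6f": {
                "name": "car_001",
                "type": "car",
                "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
                "object_data_pointers": {
                    "bbox_shape": {
                        "type": "bbox",
                        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ]
                    }
                }
            }
        },
        "frames": {
            "000000000000": {
                "objects": {
                    "893ac389-8a0e-4a3b-9f4d-1a2b3c4d5e6f": {
                        "object_data": {
                            "bbox": [
                                {
                                    "name": "bbox_shape",
                                    "stream": "camera1",
                                    "val": [760.5, 225.5, 98.0, 165.0]
                                }
                            ]
                        }
                    }
                },
                "frame_properties": {
                    "streams": {
                        "camera1": { "uri": "./data/camera1/000000000000.jpg" }
                    }
                }
            }
        },
        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
        "metadata": { "schema_version": "1.0.0" }
    }
}
```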
To describe a dataset with one camera sensor (bbox annotation) and one lidar sensor (cuboid annotation) in the coordinate system of iso8855-1:
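An illustrative sketch that focuses on what changes compared with the single-camera case: a coordinate_systems hierarchy rooted at iso8855-1 and a cuboid annotated in that coordinate system. The identity poses, UUID, and values are placeholders; the cuboid value follows the OpenLABEL [x, y, z, qx, qy, qz, qw, sx, sy, sz] convention, and the camera intrinsics are omitted for brevity:

```json
{
    "visionai": {
        "streams": {
            "camera1": { "type": "camera", "uri": "", "description": "front camera" },
            "lidar1": { "type": "lidar", "uri": "", "description": "roof lidar" }
        },
        "coordinate_systems": {
            "iso8855-1": {
                "type": "local_cs",
                "parent": "",
                "children": [ "camera1", "lidar1" ]
            },
            "camera1": {
                "type": "sensor_cs",
                "parent": "iso8855-1",
                "children": [],
                "pose_wrt_parent": {
                    "matrix4x4": [1.0, 0.0, 0.0, 0.0,
                                  0.0, 1.0, 0.0, 0.0,
                                  0.0, 0.0, 1.0, 0.0,
                                  0.0, 0.0, 0.0, 1.0]
                }
            },
            "lidar1": {
                "type": "sensor_cs",
                "parent": "iso8855-1",
                "children": [],
                "pose_wrt_parent": {
                    "matrix4x4": [1.0, 0.0, 0.0, 0.0,
                                  0.0, 1.0, 0.0, 0.0,
                                  0.0, 0.0, 1.0, 0.0,
                                  0.0, 0.0, 0.0, 1.0]
                }
            }
        },
        "objects": {
            "5d3a1c9e-0f2b-4f6a-8d7e-9abc12345678": {
                "name": "car_001",
                "type": "car",
                "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
                "object_data_pointers": {
                    "bbox_shape": { "type": "bbox", "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ] },
                    "cuboid_shape": { "type": "cuboid", "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ] }
                }
            }
        },
        "frames": {
            "000000000000": {
                "objects": {
                    "5d3a1c9e-0f2b-4f6a-8d7e-9abc12345678": {
                        "object_data": {
                            "bbox": [
                                { "name": "bbox_shape", "stream": "camera1", "val": [760.5, 225.5, 98.0, 165.0] }
                            ],
                            "cuboid": [
                                { "name": "cuboid_shape", "stream": "lidar1", "coordinate_system": "iso8855-1",
                                  "val": [12.0, 1.5, 0.8, 0.0, 0.0, 0.0, 1.0, 4.2, 1.8, 1.5] }
                            ]
                        }
                    }
                }
            }
        },
        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
        "metadata": { "schema_version": "1.0.0" }
    }
}
```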
To describe a semantic segmentation dataset with one camera sensor:
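An illustrative sketch only: it assumes the per-frame mask is stored as OpenLABEL "binary" object data with run-length encoding and that the class list is published through the "tags" entry. The tag type string, the RLE payload, the UUIDs, and the class names are placeholders; confirm the exact mask representation against the VisionAI JSON schema used by your project:

```json
{
    "visionai": {
        "streams": {
            "camera1": { "type": "camera", "uri": "", "description": "front camera" }
        },
        "tags": {
            "c4a2f1d0-1111-2222-3333-444455556666": {
                "ontology_uid": "",
                "type": "semantic_segmentation_RLE",
                "tag_data": {
                    "vec": [
                        { "name": "class", "type": "values", "val": [ "background", "road", "car" ] }
                    ]
                }
            }
        },
        "objects": {
            "0a1b2c3d-4e5f-6789-abcd-ef0123456789": {
                "name": "car_001",
                "type": "car",
                "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
                "object_data_pointers": {
                    "semantic_mask": {
                        "type": "binary",
                        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ]
                    }
                }
            }
        },
        "frames": {
            "000000000000": {
                "objects": {
                    "0a1b2c3d-4e5f-6789-abcd-ef0123456789": {
                        "object_data": {
                            "binary": [
                                {
                                    "name": "semantic_mask",
                                    "stream": "camera1",
                                    "data_type": "",
                                    "encoding": "rle",
                                    "val": "<run-length-encoded mask>"
                                }
                            ]
                        }
                    }
                }
            }
        },
        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
        "metadata": { "schema_version": "1.0.0" }
    }
}
```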
To describe a dataset with tagging annotations:
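An illustrative sketch using the "contexts" entry, which the format uses for classification and tagging information. The context name and type, the "weather" attribute, the UUID, and the values are placeholders:

```json
{
    "visionai": {
        "streams": {
            "camera1": { "type": "camera", "uri": "", "description": "front camera" }
        },
        "contexts": {
            "7f6e5d4c-3b2a-4908-9654-3210fedcba98": {
                "name": "scene_tags",
                "type": "tagging",
                "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
                "context_data_pointers": {
                    "weather": {
                        "type": "text",
                        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ]
                    }
                }
            }
        },
        "frames": {
            "000000000000": {
                "contexts": {
                    "7f6e5d4c-3b2a-4908-9654-3210fedcba98": {
                        "context_data": {
                            "text": [
                                { "name": "weather", "val": "rainy" }
                            ]
                        }
                    }
                }
            }
        },
        "frame_intervals": [ { "frame_start": 0, "frame_end": 0 } ],
        "metadata": { "schema_version": "1.0.0" }
    }
}
```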
The top-level elements of the visionai.json file are summarized in the following table:

| Name | Definition | Required |
|---|---|---|
| visionai | The name of this particular data format. | true |
| coordinate_systems | A numerical system that specifies the location of points and geometric elements within a given space. In the VisionAI format, coordinate systems are declared by name and linked through parent-child relationships to establish the hierarchy. | Depends on the project sensor settings (see the table below). |
| streams | A source of the data sequence, typically a sensor. The VisionAI format combines multi-sensor information (streams) to describe annotations for the corresponding streams. Stream keys contain information such as intrinsic calibration parameters for cameras. | true |
| contexts | Lists all contextual information present in the annotation, for example details about the scene such as properties, weather conditions, or location. In the VisionAI format, "contexts" is mainly used for classification and tagging information. | false |
| objects | Physical entities within a scene, such as people, cars, or lane markings. Object keys contain information such as the object's name, type, annotation location, and frame intervals. UUIDs are used as keys. | false |
| frames | Containers for dynamic, time-based information. Each frame in the JSON data is identified by an integer number. | true |
| frame_intervals | An array defining the frame intervals for which the JSON data contains information. | true |
| tags | Tags provide information about a certain data file and may be specified in the tags entry of the JSON file. | false |
| metadata | The version string for this schema. | true |
Whether "coordinate_systems" is required depends on the project sensor settings:

| Project Sensor Settings | coordinate_systems (Required) |
|---|---|
| 1* camera | false |
| n* cameras | false |
| 1* lidar | false |
| n* lidars | true |
| m* camera + n* lidar | true |