VisionAI DataVerse

Format FAQ

Question: Transformation matrices

Are both coordinate_system.{sensor}.pose_wrt_parent and streams.{sensor}.stream_properties.intrinsics_pinhole transformation matrices? When are they used?


  • coordinate_system.{sensor}.pose_wrt_parent: This is the matrix that relates the camera to its parent coordinate system (either the lidar or the world coordinate system). It is used to map a point from the camera's coordinates to the lidar/world coordinates. Because this mapping goes from camera to parent, it is typically the inverse of the camera extrinsic matrix.
  • streams.{sensor}.stream_properties.intrinsics_pinhole: This is essentially the camera intrinsic matrix. The system uses it to compute the camera's field of view (FOV) as projected into the lidar view.
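The two matrices can be sketched together in pure Python. This is a minimal, hypothetical example (identity rotation, made-up translation and intrinsic values, not taken from the format spec): the inverse of pose_wrt_parent maps a lidar point into the camera frame, and the pinhole intrinsics then project it onto the image.

```python
def invert_rigid(R, t):
    """Invert a rigid transform given as 3x3 rotation R and translation t."""
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t_inv = [-sum(R_inv[i][j] * t[j] for j in range(3)) for i in range(3)]
    return R_inv, t_inv

def apply_rigid(R, t, p):
    """Apply p' = R @ p + t to a 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project_pinhole(K, p_cam):
    """Project a camera-frame 3D point with a 3x3 pinhole intrinsic matrix K."""
    x, y, z = p_cam
    u = (K[0][0] * x + K[0][2] * z) / z
    v = (K[1][1] * y + K[1][2] * z) / z
    return u, v

# pose_wrt_parent: camera -> lidar (hypothetical: identity rotation, 0.5 m offset)
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 0.5]

# A point expressed in the lidar frame.
p_lidar = [1.0, 2.0, 10.0]

# Inverse extrinsic: lidar -> camera, then project with the intrinsics.
R_inv, t_inv = invert_rigid(R, t)
p_cam = apply_rigid(R_inv, t_inv, p_lidar)

K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]  # fx, fy, cx, cy
u, v = project_pinhole(K, p_cam)
```

With these made-up numbers the lidar point lands at camera depth 9.5 m before projection; the same inverse-then-project pattern is what the FOV computation relies on.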

Question: Difference between context data and pointers

What's the key difference between contexts.{objectId}.context_data and contexts.{objectId}.context_data_pointers? Is there a clearer criterion, like the 'in-room' status being the same for different sensors but placed in context_data_pointers?


Static vs. Dynamic: The primary distinction between the two lies in whether they vary with frame/sensor.
  • Static data is stored in:
    • contexts.{objectId}.context_data: This records context name/value (e.g., name="roadtype", value=["highway"]).
    • contexts.{objectId}.context_data_pointers: This documents the nature of the context (e.g., type='vec' or 'text').
  • Dynamic data goes into:
    • frame.contexts.{objectId}.context_data: This captures context name/value/stream (e.g., name="blur", value=True, stream="camera1").
    • contexts.{objectId}.context_data_pointers: This records the nature of the context, like type='boolean' and the frame intervals in which it appears.
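The split above can be sketched as a Python dict mirroring the JSON layout. The IDs, key names, and values here are illustrative (not copied from the spec): the point is only where static vs. dynamic context data lives, and that context_data_pointers always sits at the top level.

```python
annotation = {
    "contexts": {
        "ctx-0": {
            # Static: the same for every frame/sensor.
            "context_data": {"vec": [{"name": "roadtype", "value": ["highway"]}]},
            # Pointers describe the nature of each attribute; for dynamic
            # attributes they also record the frame intervals of appearance.
            "context_data_pointers": {
                "roadtype": {"type": "vec"},
                "blur": {"type": "boolean",
                         "frame_intervals": [{"frame_start": 0, "frame_end": 1}]},
            },
        }
    },
    "frames": {
        "0": {
            "contexts": {
                "ctx-0": {
                    # Dynamic: varies per frame/sensor, so it carries a stream.
                    "context_data": {"boolean": [
                        {"name": "blur", "value": True, "stream": "camera1"}
                    ]},
                }
            }
        }
    },
}
```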

Question: Static vs. Dynamic

This concept of data vs. data_pointers seems recurring. Are there clearer criteria for object_data vs. object_data_pointers?


Object attributes data also follows the Static vs. Dynamic division:
  • Static attributes go into:
    • objects.{objectId}.object_data: This captures object attributes name/value (e.g., name="car_color", value=["blue"]).
    • objects.{objectId}.object_data_pointers: This logs the nature of the object attribute (e.g., type='vec').
  • Dynamic attributes are in:
    • frame.objects.{objectId}.object_data: This captures an object's per-frame shape/attribute data (e.g., an object under camera1 that is occluded carries the attribute occlusion=True).
    • objects.{objectId}.object_data_pointers: This records the nature of the object attribute, like type='boolean'.
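A parallel sketch for object attributes, again with hypothetical IDs and illustrative key names: static attributes under objects.{objectId}.object_data, dynamic ones under the frame's objects entry, with object_data_pointers recording each attribute's type.

```python
annotation = {
    "objects": {
        "obj-1": {
            # Static attribute: holds for the whole sequence.
            "object_data": {"vec": [{"name": "car_color", "value": ["blue"]}]},
            "object_data_pointers": {
                "car_color": {"type": "vec"},
                "occlusion": {"type": "boolean"},
            },
        }
    },
    "frames": {
        "0": {
            "objects": {
                "obj-1": {
                    # Dynamic attribute: specific to this frame and sensor.
                    "object_data": {"boolean": [
                        {"name": "occlusion", "value": True, "stream": "camera1"}
                    ]},
                }
            }
        }
    },
}
```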

Question: Difference between tags and contexts

What differentiates tags from contexts?


  • In the visionai format, tags currently store segmentation_RLE information.
  • Contexts, on the other hand, are distinguished by types such as "*tagging" or other classification types. "*tagging" primarily logs supplementary frame descriptions: information typically captured during data collection, such as frame timestamp, GPS, city, terrain, or vehicle speed. Other data such as weather, time of day, or road type can also be placed under "*tagging" if it is not used for classification model training.
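A rough sketch of the two side by side. The exact nesting and key names here are assumptions for illustration (including the "tag_data" key and the placeholder RLE string), not a verbatim copy of the spec; it only shows that tags hold segmentation_RLE payloads while a "*tagging" context holds collection metadata.

```python
annotation = {
    "tags": {
        "tag-0": {
            "type": "semantic_segmentation",
            # Hypothetical placeholder for the run-length-encoded mask string.
            "tag_data": {"vec": [{"name": "segmentation_RLE",
                                  "value": ["<RLE string>"]}]},
        }
    },
    "contexts": {
        "ctx-0": {
            "type": "*tagging",
            # Supplementary frame descriptions captured during collection.
            "context_data": {"text": [
                {"name": "city", "value": "Taipei"},
                {"name": "weather", "value": "sunny"},
            ]},
        }
    },
}
```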

Question: Unified as one object

When are objects across frames unified as one object?


  • In objects.{objectId}, each object represents a unique entity (akin to a tracking_id). The frame_intervals field within it indicates the start and end frames of the object's appearance.
  • If obj1 appears in both frame_1 and frame_2, both frames will describe obj1 with the same objectId.
  • In essence, labeling determines continuity: if objects are not labeled as the same entity, each frame's object is assumed to be independent.
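Concretely (with hypothetical frame numbers and IDs), a tracked object shared across frames looks like this: both frames reference the same objectId key, and frame_intervals records the span of its appearance.

```python
annotation = {
    "objects": {
        "obj-1": {
            "name": "car-1",
            "type": "car",
            # obj-1 appears from frame 1 through frame 2.
            "frame_intervals": [{"frame_start": 1, "frame_end": 2}],
        }
    },
    "frames": {
        "1": {"objects": {"obj-1": {}}},
        "2": {"objects": {"obj-1": {}}},
    },
}

# Both frames describe the same entity via the shared key "obj-1".
shared = set(annotation["frames"]["1"]["objects"]) & set(annotation["frames"]["2"]["objects"])
```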

Question: Streams or frames.streams

How does frames.streams differ from the top-level streams?


  • Top-level streams: Represents the sensors included across the entire sequence.
  • frames.streams: Specifies the sensors that contributed data to that particular frame. For instance, if the lidar captures at a lower frequency than the cameras, not every frame will contain point cloud data.
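The distinction can be sketched as follows, with hypothetical sensor names: the top-level streams lists every sensor in the sequence, while each frame's streams lists only the sensors that actually contributed data to it (here the lower-frequency lidar skips frame "1").

```python
annotation = {
    # Top-level streams: all sensors present anywhere in the sequence.
    "streams": {
        "camera1": {"type": "camera"},
        "lidar1": {"type": "lidar"},
    },
    "frames": {
        "0": {"streams": {"camera1": {}, "lidar1": {}}},
        "1": {"streams": {"camera1": {}}},  # no point cloud for this frame
    },
}

# Sensors in the sequence that did not contribute to frame "1".
missing = set(annotation["streams"]) - set(annotation["frames"]["1"]["streams"])
```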