Format FAQ
Question: Transformation matrices
Are both coordinate_system.{sensor}.pose_wrt_parent and streams.{sensor}.stream_properties.intrinsics_pinhole transformation matrices? When are they used?
Answer
coordinate_system.{sensor}.pose_wrt_parent
: This matrix relates the camera to its parent sensor (either the lidar or the world coordinate system). It is used to map point positions from the camera's coordinates into the lidar/world coordinates. Given the inverse nature of this mapping, it is typically the inverse of the camera extrinsic matrix.
streams.{sensor}.stream_properties.intrinsics_pinhole
: This is essentially the camera intrinsic matrix. The system uses it to compute the camera's field of view (FOV) as presented in the lidar.
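As a hedged sketch of how the two matrices interact, the snippet below projects a lidar point into the camera image. All matrix values, the sensor geometry, and the variable names are hypothetical, not taken from the format itself:

```python
import numpy as np

# Hypothetical 4x4 pose_wrt_parent: maps camera coordinates into the parent
# (lidar/world) frame. Here the camera sits 1.5 m above the lidar origin.
pose_wrt_parent = np.eye(4)
pose_wrt_parent[:3, 3] = [0.0, 0.0, 1.5]

# Hypothetical 3x4 pinhole intrinsic matrix (fx, fy, cx, cy are illustrative).
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0
K = np.array([[fx, 0.0, cx, 0.0],
              [0.0, fy, cy, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# A point expressed in the parent (lidar/world) frame, homogeneous coordinates.
point_parent = np.array([2.0, 0.0, 6.5, 1.0])

# Inverting pose_wrt_parent maps parent coordinates into the camera frame,
# mirroring the inverse-extrinsic relationship described above.
point_camera = np.linalg.inv(pose_wrt_parent) @ point_parent

# Projecting with the intrinsics and dividing by depth yields pixel coordinates.
u, v, w = K @ point_camera
pixel = (u / w, v / w)
```

Testing whether such a projected pixel falls inside the image bounds is one way the camera's FOV can be evaluated against lidar points.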
Question: Difference between context data and pointers
What's the key difference between contexts.{objectId}.context_data and contexts.{objectId}.context_data_pointers? Is there a clearer criterion, like the 'in-room' status being the same for different sensors but placed in context_data_pointers?
Answer
Static vs. Dynamic: The primary distinction between the two lies in whether they vary with frame/sensor.
Static data is stored in:
contexts.{objectId}.context_data
: This records the context name/value (e.g., name="roadtype", value=["highway"]).
contexts.{objectId}.context_data_pointers
: This documents the nature of the context (e.g., type='vec' or 'text').
Dynamic data goes into:
frame.contexts.{objectId}.context_data
: This captures the context name/value/stream (e.g., name="blur", value=True, stream="camera1").
contexts.{objectId}.context_data_pointers
: This records the nature of the context, such as type='boolean', and the frame intervals in which it appears.
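A minimal sketch of how this static/dynamic split might look, written as a Python dict mirroring the JSON; the exact key names ("value", "vec", "boolean", the interval fields) are illustrative assumptions, not authoritative:

```python
# Illustrative static vs. dynamic context layout (key names are assumptions).
annotation = {
    "contexts": {
        "ctx1": {
            # Static: the same across all frames and sensors.
            "context_data": {"vec": [{"name": "roadtype", "value": ["highway"]}]},
            # Pointers describe the nature of each context attribute.
            "context_data_pointers": {
                "roadtype": {"type": "vec"},
                "blur": {"type": "boolean",
                         "frame_intervals": [{"frame_start": 0, "frame_end": 1}]},
            },
        }
    },
    "frames": {
        "0": {
            "contexts": {
                "ctx1": {
                    # Dynamic: tied to a specific frame and stream.
                    "context_data": {
                        "boolean": [{"name": "blur", "value": True, "stream": "camera1"}]
                    }
                }
            }
        }
    },
}

static_roadtype = annotation["contexts"]["ctx1"]["context_data"]["vec"][0]
dynamic_blur = annotation["frames"]["0"]["contexts"]["ctx1"]["context_data"]["boolean"][0]
```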
Question: Static vs. Dynamic
This concept of data vs. data_pointers seems recurring. Are there clearer criteria for object_data vs. object_data_pointers?
Answer
Object attributes data also follows the Static vs. Dynamic division:
Static attributes go into:
objects.{objectId}.object_data
: This captures the object attribute name/value (e.g., name="car_color", value=["blue"]).
objects.{objectId}.object_data_pointers
: This logs the nature of the object attribute (e.g., type='vec').
Dynamic attributes are in:
frame.objects.{objectId}.object_data
: This describes an object's per-frame attributes (e.g., an object under camera1 is occluded, recorded as occlusion=True).
objects.{objectId}.object_data_pointers
: This records the nature of the object attribute, such as type='boolean'.
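One way to tell the two apart when reading a file, sketched under the assumption (consistent with the description above, but not guaranteed by the format) that dynamic attributes carry frame intervals in their pointers:

```python
# Hypothetical object_data_pointers: a static attribute has no frame intervals,
# while a dynamic one lists the intervals in which it appears.
object_data_pointers = {
    "car_color": {"type": "vec"},  # static attribute
    "occlusion": {"type": "boolean",
                  "frame_intervals": [{"frame_start": 3, "frame_end": 7}]},  # dynamic
}

def is_dynamic(pointer):
    """Treat an attribute as dynamic if its pointer lists frame intervals."""
    return bool(pointer.get("frame_intervals"))

assert not is_dynamic(object_data_pointers["car_color"])
assert is_dynamic(object_data_pointers["occlusion"])
```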
Question: Difference between tags and contexts
What differentiates tags from contexts?
Answer
In the visionai format,
tags
currently store segmentation_RLE information.
Contexts, on the other hand, cover types such as "*tagging" and other classifications. "*tagging" primarily logs supplementary frame descriptions: information typically captured during data collection, such as frame timestamp, GPS, city, terrain, or vehicle speed. Other data such as weather, time of day, or road type can also be placed in *tagging if it is not used for classification model training.
Question: Unified as one object
When are objects across frames unified as one object?
Answer
In
objects.{objectId}
, each object represents a unique entity (akin to a tracking_id). The
frame_intervals
within indicate the start and end frames of the object's appearance. If obj1 appears in both frame_1 and frame_2, both frames describe obj1 with the same objectId.
In essence, labeling is what establishes continuity: if objects are not linked during labeling, each frame's object is assumed to be independent.
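A small sketch of this identity rule with hypothetical data: the same objectId appears under each frame it occupies, and its frame_intervals bound those appearances:

```python
# obj1 is one tracked entity; frame_intervals declare where it appears.
objects = {"obj1": {"frame_intervals": [{"frame_start": 1, "frame_end": 2}]}}
frames = {
    1: {"objects": {"obj1": {}}},
    2: {"objects": {"obj1": {}}},
    3: {"objects": {}},  # obj1 is absent here
}

def frames_of(object_id):
    """List the frames whose objects entry contains this objectId."""
    return sorted(f for f, data in frames.items() if object_id in data["objects"])

appearances = frames_of("obj1")
interval = objects["obj1"]["frame_intervals"][0]
assert interval["frame_start"] <= appearances[0]
assert appearances[-1] <= interval["frame_end"]
```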
Question: Streams or frames.streams
How does frames.streams differ from the top-level streams?
Answer
Top-level
streams
: Represents the sensors included across the entire sequence.
frames.streams
: Specifies the sensors contributing data to a particular frame. For instance, if the lidar runs at a lower frequency, not every frame will have point cloud data.
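A hedged sketch of the distinction (sensor names and keys are illustrative): the top-level streams list every sensor in the sequence, while each frame's streams may omit a slower sensor:

```python
# Top-level streams: all sensors present anywhere in the sequence.
streams = {"camera1": {"type": "camera"}, "lidar1": {"type": "lidar"}}

# Per-frame streams: only the sensors that contributed data to that frame.
frames = {
    0: {"streams": {"camera1": {}, "lidar1": {}}},
    1: {"streams": {"camera1": {}}},  # lidar ran at a lower frequency
    2: {"streams": {"camera1": {}, "lidar1": {}}},
}

frames_missing_lidar = [f for f, data in frames.items()
                        if "lidar1" not in data["streams"]]
```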