binary

The "binary" object type describes image information. Specifically, the VisionAI format uses this type to represent a semantic segmentation mask in RLE format.

The pixel-wise mask information is compressed using Run-Length Encoding (RLE). For example, the sequence "11122222222000000000000" can be compressed to "#3V1#8V2#13V0". The number after each "#" character is the count of consecutive pixels, and the number after the "V" character is the value those pixels share. RLE compresses well when the data contains long runs of repeated values, as segmentation masks usually do.
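The encoding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the VisionAI reference implementation; the function name `rle_encode` is hypothetical, and the input is assumed to be a flat sequence of integer class values.

```python
from itertools import groupby


def rle_encode(pixels):
    """Encode a flat sequence of class values as '#<count>V<value>' runs."""
    return "".join(
        f"#{len(list(run))}V{value}" for value, run in groupby(pixels)
    )


# A short row: three pixels of value 1, then two pixels of value 0.
print(rle_encode([1, 1, 1, 0, 0]))  # prints "#3V1#2V0"
```

`groupby` yields one group per run of equal neighbouring values, so each group maps directly to one "#countVvalue" token.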

In another example, if the mask pixels from left to right are "car, car, car, sky, sky, sky, sky", the corresponding RLE value is:

#3V1#4V2

In this example, the numbers "3" and "4" are the counts of consecutive pixels for the "car" and "sky" classes, respectively. "V1" and "V2" are the class numbers of those classes, which are defined in the tags field.
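Going the other way, the RLE string can be expanded back into per-pixel labels. The sketch below assumes the string contains only "#countVvalue" tokens; the function name `rle_decode` and the `class_names` dictionary are hypothetical stand-ins for whatever the tags field defines in a real file.

```python
import re

RUN_PATTERN = re.compile(r"#(\d+)V(\d+)")


def rle_decode(val):
    """Expand '#<count>V<value>' runs into a flat list of class numbers."""
    pixels = []
    for count, value in RUN_PATTERN.findall(val):
        pixels.extend([int(value)] * int(count))
    return pixels


# Hypothetical number-to-name mapping; in a real file it comes from the tags field.
class_names = {1: "car", 2: "sky"}

labels = [class_names[i] for i in rle_decode("#3V1#4V2")]
print(labels)
# prints ['car', 'car', 'car', 'sky', 'sky', 'sky', 'sky']
```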

Example

"binary": [{
    "name": "semantic_mask",
    "val": "#2142V6#21379V5#902V3#762V5#3V3#2195V2#36V6#11V2#2V6#2V2#17V6#2V2#4V6#2V2#10V6#720V2#1V6#1V2#3V6#3V2#42V6#50V2#2V6#3V2#25V6#12V2#5V6#1V2#12V6#12V2#1V6#2V2#3V6#1V2#20V6#57V2#5V6#7V2#1V6#1V2#7V6#3V2#29V6#2752V2#3V6#4V2#3V6#12V2#1V6#1V2#5V6#2V2#5V6#1V2#6V6#1V2#3V6#1V2#12V6#45V2#18V6#7V2#76V6#333V2#1V6#2V2#5V6#1V2#1V6#1V2#2V6#20V2#2V6#5V2#193V6#421V2#1V6#406V2#8V6#2V2#1V6#3V2#1V6#4V2#1V6#1V2#17V6#94V2#24V6#1V2#33V6#7V2#2V6#51V2#74V6#640V2#1V6#4V2#12V6#2V2#21V6#16V2#63V6#1154V2#3V6#2502V2#3V3#1V2#121V3#76V2#26V3#354V2#1V3#1V2#6V3#3V2#1V3#6V2#6V3#1V2#2V3#5V2#2V3#5125V2#10812V3#36244V2#2V5#1V2#32V5#17V2#2V5#1V2#18V5#7V2#29V5#3V2#1V5#8V2#4V5#5V2#2V5#1V2#20V5#19V2#4V5#8V2#1V5#9V2#93V5#548V2#2V5#2V2#5V5#1V2#1V5#2V2#66V5#380V2#4V5#6V2#1V5#1V2#2V5#1V2#56V5#5V2#1V5#1V2#1V5#5V2#3V5#5V2#1V5#3V2#19V5#3V2#2V5#5V2#4V5#5V2#2V5#1V2#3V5#3V2#99V5#7V2#1049V5#11748V2#174V3#1195V2#1V3#1V2#1V3#3V2#1V3#7V2#17V3#34V2#24V3#8992V2#1V3#31V2#1V3#2V2#2V3#9655V2#1V3#2V2#20V3#7V2#2V3#3V2#39V3#4V2#13V3#3V2#6V3#2V2#1V3#3V2#6V3#1V2#20V3#7V2#6V3#8V2#1V3#1V2#112V3#5V2#273V3#2V2#494V3#4V2#472V3#32V2#5V3#2V2#5V3#7V2#16V3#3V2#3V3#12212V2#46972V5#231V2#1V5#2V2#1V5#6V2#4V5#1V2#1V5#4V2#2V5#2V2#65V5#14V2#1V5#2V2#2V5#6V2#1V5#2V2#26V5#8V2#47V5#7V2#4V5#6V2#29V5#2V2#1V5#1V2#1V5#4V2#7V5#1V2#136V5#4V2#1V5...",
    "data_type": "",
    "encoding": "rle",
    "stream": "camera1"
}]
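A quick sanity check on a "val" string like the one above is to sum the run lengths per class without expanding the mask. The sketch below is illustrative only: `pixels_per_class` is a hypothetical helper, the input is a short stand-in string because the example value above is truncated, and comparing the total with width times height assumes the mask covers the whole frame.

```python
import re
from collections import Counter


def pixels_per_class(val):
    """Sum the run lengths for each class number without expanding the mask."""
    totals = Counter()
    for count, value in re.findall(r"#(\d+)V(\d+)", val):
        totals[int(value)] += int(count)
    return totals


# Short stand-in string, not the truncated "val" from the example.
totals = pixels_per_class("#4V6#2V5#3V6#3V2")
print(dict(totals))          # prints {6: 7, 5: 2, 2: 3}
print(sum(totals.values()))  # prints 12; assumed to equal width * height
```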

Schema

In the VisionAI format, the binary data type of the segmentation mask is considered dynamic information and is defined at the frames-objects level. It describes the mask of a single frame captured from a stream (sensor).

| name | description | type | required |
| --- | --- | --- | --- |
| name | The name of this binary. The VisionAI format uses "semantic_mask" as the value for semantic segmentation RLE information. | string | true |
| val | The semantic mask value, encoded with the Run-Length Encoding (RLE) described above. | string | true |
| data_type | The data type of the binary. OpenLABEL requires this field; leave it as "". | string | true |
| encoding | The encoding method. Only "rle", as described above, is supported. | string | true |
| stream | The stream that this mask belongs to. | string | true |
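The field rules in this schema can be checked mechanically before a file is uploaded. The following is a rough sketch under the constraints stated in this section only; `check_binary` is a hypothetical helper, not part of any VisionAI tooling, and it does not verify that the stream exists elsewhere in the file.

```python
import re

REQUIRED_FIELDS = ("name", "val", "data_type", "encoding", "stream")
RLE_PATTERN = re.compile(r"(#\d+V\d+)+")


def check_binary(entry):
    """Return a list of problems found in one binary entry (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not isinstance(entry.get(field), str):
            problems.append(f"{field}: missing or not a string")
    if entry.get("encoding") != "rle":
        problems.append('encoding: expected "rle"')
    if entry.get("data_type") != "":
        problems.append('data_type: expected ""')
    val = entry.get("val")
    if isinstance(val, str) and not RLE_PATTERN.fullmatch(val):
        problems.append("val: not a sequence of #<count>V<value> runs")
    return problems


entry = {
    "name": "semantic_mask",
    "val": "#3V1#4V2",
    "data_type": "",
    "encoding": "rle",
    "stream": "camera1",
}
print(check_binary(entry))  # prints []
```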


Use Case

semantic segmentation

To describe a semantic segmentation dataset with one camera sensor:

  • sensor: camera (#camera1)

  • ontology

    • background

    • person

    • bicycle

    • car

    • motorcycle

    • airplane

    • bus

    • train

    • truck

    • boat

    • trafficlight
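To make the use case concrete, the sketch below turns one short row of labels from this ontology into a "binary" entry. It rests on a loud assumption: class numbers are taken from list order, starting at 0 for "background", whereas a real dataset defines them in the tags field. The row of labels is made up for illustration.

```python
import json
from itertools import groupby

ONTOLOGY = ["background", "person", "bicycle", "car", "motorcycle",
            "airplane", "bus", "train", "truck", "boat", "trafficlight"]

# Assumption: class numbers follow list order, starting at 0 for "background".
CLASS_ID = {name: i for i, name in enumerate(ONTOLOGY)}

# Made-up row of per-pixel labels, left to right.
row = ["background", "background", "car", "car", "car", "person"]
val = "".join(
    f"#{len(list(run))}V{CLASS_ID[name]}" for name, run in groupby(row)
)

entry = {
    "name": "semantic_mask",
    "val": val,  # "#2V0#3V3#1V1" under the assumed numbering
    "data_type": "",
    "encoding": "rle",
    "stream": "camera1",
}
print(json.dumps({"binary": [entry]}, indent=4))
```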

Example Code

See the "semantic segmentation" page.
