# bbox

A bounding box is a geometric entity that encloses the shape of an object in Cartesian coordinates. It defines minimum and maximum limits in each dimension so that the entire object lies within those limits. A 2D bounding box is defined as a 4-dimensional vector \[x, y, w, h], where \[x, y] is the center of the bounding box and \[w, h] are its width (horizontal, x-coordinate dimension) and height (vertical, y-coordinate dimension), respectively.
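Because the vector stores the center rather than a corner, tools that expect minimum and maximum limits need a conversion. The following is a minimal sketch of that arithmetic; the function name `center_to_corners` is hypothetical and not part of the format.

```python
from typing import Sequence, Tuple


def center_to_corners(val: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a center-based [x, y, w, h] box to (x_min, y_min, x_max, y_max)."""
    if len(val) != 4:
        raise ValueError("expected 4 elements [x, y, w, h]")
    x, y, w, h = val
    if w < 0 or h < 0:
        raise ValueError("width and height must be non-negative")
    # Half of the width/height extends to each side of the center.
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


print(center_to_corners([400, 200, 100, 120]))  # (350.0, 140.0, 450.0, 260.0)
```

For a box centered at (400, 200) with width 100 and height 120, the limits are 350 to 450 on the x-axis and 140 to 260 on the y-axis.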

{% @mermaid/diagram content="erDiagram
objects ||--|{ OBJECT\_UUID : contains
OBJECT\_UUID ||--|| object\_data : contains
object\_data ||--|{ OBJECT\_TYPE : contains
OBJECT\_TYPE ||--|| attributes : contains
objects {
}
OBJECT\_UUID {
type string
name string
object\_data object
}
object\_data {
OBJECT\_TYPE object
}
OBJECT\_TYPE {
name string
val array
stream string
confidence\_score number
attributes object
}" %}

### Example

```json
{
    "bbox": [{
        "name": "bbox_shape",
        "val": [400, 200, 100, 120],
        "stream": "camera1",
        "confidence_score": 0.8,
        "attributes": {
            "boolean": [{
                    "name": "visible",
                    "val": false
                }, {
                    "name": "occluded",
                    "val": false
                }
            ],
            "text": [{
                    "name": "brand",
                    "val": "toyota"
                }, {
                    "name": "color",
                    "val": "red"
                }
            ]
        }
    }]
}
```
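As a rough illustration of consuming such an entry, the sketch below parses the example with Python's standard `json` module and collapses the typed attribute groups into a single mapping. The fragment is wrapped in an outer object here so that it parses as standalone JSON; the helper name `flatten_attributes` is hypothetical.

```python
import json

# The example entry, wrapped in an outer object (an assumption made so the
# fragment is valid standalone JSON).
BBOX_JSON = """
{
    "bbox": [{
        "name": "bbox_shape",
        "val": [400, 200, 100, 120],
        "stream": "camera1",
        "confidence_score": 0.8,
        "attributes": {
            "boolean": [{"name": "visible", "val": false},
                        {"name": "occluded", "val": false}],
            "text": [{"name": "brand", "val": "toyota"},
                     {"name": "color", "val": "red"}]
        }
    }]
}
"""


def flatten_attributes(entry: dict) -> dict:
    """Collapse the typed attribute groups into one {name: val} mapping."""
    flat = {}
    for group in entry.get("attributes", {}).values():
        for item in group:
            flat[item["name"]] = item["val"]
    return flat


data = json.loads(BBOX_JSON)
first = data["bbox"][0]
print(first["stream"], first["val"])
print(flatten_attributes(first))
```

Flattening discards the attribute type (`boolean`, `text`), so it suits quick inspection rather than round-tripping the data.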

### Schema


<figure><img src="https://2101974232-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FO2GXP74UOzykZuHBn8HP%2Fuploads%2FfrogwA0hp1dwc1uil2pc%2Fimage.png?alt=media&#x26;token=59d6734e-ab8d-451c-9d7d-2e35ca075f90" alt=""><figcaption></figcaption></figure>

<table><thead><tr><th width="182">name</th><th width="326">description</th><th width="131">type</th><th width="57">unit</th><th>required</th></tr></thead><tbody><tr><td>${OBJECT_TYPE}</td><td>The name of this type. In this case, "bbox".</td><td>object</td><td>-</td><td>true</td></tr><tr><td>name</td><td>The name of this bounding box. Usually "bbox_shape".</td><td>string</td><td>-</td><td>true</td></tr><tr><td>val</td><td>A 4-dimensional vector [x, y, w, h]. The elements, in order, are:<br>➤ x: the x-coordinate of the center<br>➤ y: the y-coordinate of the center<br>➤ w: the width of the rectangle<br>➤ h: the height of the rectangle</td><td>array of int<br>(4 elements)</td><td>px</td><td>true</td></tr><tr><td>stream</td><td>The stream this shape is on.</td><td>string</td><td>-</td><td>true</td></tr><tr><td>confidence_score</td><td>The confidence score of the model prediction for this object. Ground truth does not have this attribute.</td><td>number</td><td>-</td><td>false</td></tr><tr><td>attributes</td><td>The attributes of this bounding box.</td><td>object</td><td>-</td><td>false</td></tr></tbody></table>
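The constraints in the table can be checked mechanically. The sketch below is one possible hand-written check, assuming only the rules listed above (required fields, types, and a 4-element integer `val`); it is not an official validator, and the name `validate_bbox` is hypothetical.

```python
from numbers import Real

# Fields marked as required in the schema table.
REQUIRED_FIELDS = ("name", "val", "stream")


def validate_bbox(entry: dict) -> list:
    """Return a list of problems found; an empty list means the entry passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing required field: {field}")
    for field in ("name", "stream"):
        if field in entry and not isinstance(entry[field], str):
            problems.append(f"{field} must be a string")
    if "val" in entry:
        val = entry["val"]
        if not isinstance(val, list) or len(val) != 4:
            problems.append("val must be an array of 4 elements")
        # bool is a subclass of int in Python, so exclude it explicitly.
        elif not all(isinstance(v, int) and not isinstance(v, bool) for v in val):
            problems.append("val elements must be integers (pixels)")
    if "confidence_score" in entry:
        score = entry["confidence_score"]
        if isinstance(score, bool) or not isinstance(score, Real):
            problems.append("confidence_score must be a number")
    if "attributes" in entry and not isinstance(entry["attributes"], dict):
        problems.append("attributes must be an object")
    return problems


ok = {"name": "bbox_shape", "val": [400, 200, 100, 120], "stream": "camera1"}
print(validate_bbox(ok))  # []
```

Returning a list of messages instead of raising on the first failure makes it easier to report every problem in an entry at once.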

***

## Use Case

### bbox

To describe a bbox dataset with one camera sensor:

* sensor: camera (#camera1)
* ontology:
  * people
    * ischild - boolean (static info)
    * direction - front, left, right, back (dynamic info)
    * age - number (static info)
  * car
    * color - white, silver, blue, red, black (static info)
  * truck
  * bus

Example Code

{% content-ref url="../../../use-case/bbox" %}
[bbox](https://linkervision.gitbook.io/dataverse/visionai-format/use-case/bbox)
{% endcontent-ref %}
