Build a VLM Flow Detection

A composable pipeline that combines Object Detection, Time Gaps, and a VLM with a logic layer. It first gates candidates with cheaper signals, then asks the VLM to confirm the semantics, which cuts token usage while improving precision.

What is VLM Flow?

VLM Flow is a composable detection pipeline that chains together Object Detection, Time Gaps, VLM, and a logic layer.

The idea is:

  1. Use cheaper signals first (like classic detectors).

  2. Only send the “interesting” cases to the VLM.

  3. Let the logic layer decide when to actually fire an event.

This way you save VLM tokens and still get higher-quality alerts.

Building Blocks:

  • Object Detection (blue): frame-by-frame detections.

  • Time Gap (green): a buffer window (e.g., 10s) to collect frames/events.

  • VLM (brown): semantic reasoning over key frames.

  • Logic (gray): thresholds, zones, aggregation (ALL/ANY), true/false checks, numeric comparisons, and similar rules. This is where you define how events should work.

  • Event Trigger: fire when conditions are met.
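To make the logic block more concrete, here is a minimal sketch of ALL/ANY aggregation over a set of rules. The `evaluate` function, the rule lambdas, and the `confidence` and `person_count` fields are hypothetical names chosen for illustration; they are not part of the product.

```python
from typing import Callable, Dict, List

Rule = Callable[[Dict], bool]


def evaluate(rules: List[Rule], signal: Dict, mode: str = "ALL") -> bool:
    """Combine rule results: ALL needs every rule to pass, ANY needs at least one."""
    results = [rule(signal) for rule in rules]
    if mode == "ALL":
        return all(results)
    if mode == "ANY":
        return any(results)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Hypothetical rules: a confidence threshold and a numeric count check.
rules = [
    lambda s: s["confidence"] >= 0.6,
    lambda s: s["person_count"] >= 3,
]
signal = {"confidence": 0.8, "person_count": 1}

print(evaluate(rules, signal, "ALL"))  # False: the count rule fails
print(evaluate(rules, signal, "ANY"))  # True: the confidence rule passes
```

With ALL, a single failing rule blocks the event, which suits strict alerts. ANY is looser and fires as soon as one condition holds.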

Detection Combinations & Example Use Cases

1. Object Detection → logic → Event

Simple, detector-only setup

  • What it does: Pure Object Detection. As soon as the detector finds something that passes the logic rules, it triggers an event.

  • Behavior

    • Very fast: typically within ~1 second.

    • Low cost: no VLM usage.

    • Good when you just need “something entered / exited this area” and don’t care about deeper context.


Example

  • Detect when a person enters a restricted zone.

  • Count vehicles crossing a line.
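As a rough illustration of the restricted-zone example, the sketch below fires when the center of a "person" box falls inside a rectangular zone. The `Detection` class, the zone coordinates, and the 0.5 confidence threshold are assumptions made for this example, not the product's data model.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


@dataclass
class Detection:
    label: str
    confidence: float
    box: Box


def in_zone(det: Detection, zone: Box) -> bool:
    """True if the center of the detection box lies inside the zone rectangle."""
    x_min, y_min, x_max, y_max = det.box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    return zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]


def should_fire(dets: List[Detection], zone: Box, min_conf: float = 0.5) -> bool:
    """Logic step: any confident person inside the zone triggers the event."""
    return any(
        d.label == "person" and d.confidence >= min_conf and in_zone(d, zone)
        for d in dets
    )


RESTRICTED: Box = (100, 100, 300, 300)  # hypothetical zone in pixel coordinates
dets = [
    Detection("person", 0.9, (120, 150, 180, 280)),  # center (150, 215): inside
    Detection("car", 0.95, (150, 150, 250, 250)),    # inside, but not a person
]

print(should_fire(dets, RESTRICTED))      # True
print(should_fire(dets[1:], RESTRICTED))  # False
```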

2. Object Detection → logic → VLM → logic → Event

Detector + VLM for smarter, cleaner alerts

  • What it does: Object Detection acts as a gate. Only when the detector sees something suspicious do we send frames to the VLM for semantic confirmation.

  • Behavior

    • Big token savings: VLM only runs after the detector says “this might matter”.

    • Higher quality real-time alerts: fewer false positives from simple bounding boxes.

    • Ideal when you already have a good detector, but need semantic understanding to clean up the results.


Example

  • Fire detection that understands “how serious” it is

    • Step 1: Object Detection spots “fire/flame”.

    • Step 2: VLM checks if it’s a real hazard (e.g., a big fire) or something harmless (like a candle or stovetop).

  • Reduce false alarms in “sensitive” alerts by letting the VLM double-check the situation.
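The fire example can be sketched as a gate followed by a severity check. Everything here is illustrative: `fake_vlm` stands in for a real VLM call, and the `severity` field and the threshold of 3 are assumed values, not product defaults.

```python
from typing import Callable, Dict, List


def detector_gate(frame: Dict, min_conf: float = 0.5) -> bool:
    """First logic step: pass only frames with a confident fire/flame detection."""
    return any(
        d["label"] in {"fire", "flame"} and d["confidence"] >= min_conf
        for d in frame["detections"]
    )


def run_gated_flow(
    frames: List[Dict], ask_vlm: Callable[[Dict], Dict], min_severity: int = 3
) -> List[Dict]:
    events = []
    for frame in frames:
        if not detector_gate(frame):
            continue  # the VLM is never called for this frame
        answer = ask_vlm(frame)
        if answer["severity"] >= min_severity:  # second logic step
            events.append({"frame_id": frame["id"], **answer})
    return events


calls: List[int] = []


def fake_vlm(frame: Dict) -> Dict:
    """Stand-in for a VLM: records the call and returns a canned severity."""
    calls.append(frame["id"])
    return {"severity": frame["true_severity"]}


frames = [
    {"id": 1, "detections": [], "true_severity": 0},
    {"id": 2, "detections": [{"label": "flame", "confidence": 0.8}], "true_severity": 1},
    {"id": 3, "detections": [{"label": "fire", "confidence": 0.9}], "true_severity": 4},
]

events = run_gated_flow(frames, fake_vlm)
print(events)  # [{'frame_id': 3, 'severity': 4}]
print(calls)   # [2, 3]
```

Frame 1 never reaches the VLM, and frame 2 (the candle-like case) reaches it but is filtered out by the second logic step.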

3. VLM → logic → Event

VLM-only, semantics-first detection

  • What it does: No classic detector is involved. The VLM periodically looks at frames to understand the overall situation, not just single objects.

  • Behavior

    • Usually runs on a fixed interval (every few minutes), not every frame.

    • Slower and heavier than simple detectors, but much richer in meaning.

    • Good when you care about “what’s going on here overall?” instead of just “is there a person or car?”.


Example

  • Check risk level or disaster signs:

    • Is there flooding?

    • Does the scene look like a traffic jam or accident?

  • Describe the current scene.
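Running the VLM on a fixed interval rather than on every frame can be sketched like this. The function name, the 30-second frame spacing, and the 120-second interval are hypothetical values for the example.

```python
from typing import List


def sample_on_interval(timestamps: List[float], interval_s: float) -> List[float]:
    """Pick the first frame, then each first frame at least interval_s after the last pick."""
    picked: List[float] = []
    last_run = None
    for ts in timestamps:
        if last_run is None or ts - last_run >= interval_s:
            picked.append(ts)
            last_run = ts
    return picked


# One frame every 30 seconds over five minutes.
timestamps = [0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300]

print(sample_on_interval(timestamps, 120))  # [0, 120, 240]
```

Only 3 of the 11 frames would be sent to the VLM in this run, which is where the cost saving comes from.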


4. VLM → logic → Time Gap → VLM → logic → Event

Two-stage VLM: coarse check → fine check / double confirm

(Visually it’s VLM → Time Gap → VLM, with logic before/after.)

  • What it does

    1. First VLM pass (coarse): Run a cheaper, simpler check: “Is there any potential danger here?”

    2. Time Gap: If yes, collect more frames over a short time window to get more context.

    3. Second VLM pass (fine / detailed): Run a more detailed prompt to answer deeper questions:

      • How big is the affected area?

      • What kind of damage is happening?

      • Are emergency workers already present?

      • Do we need to escalate the alert level?

  • Behavior

    • Most expensive pipeline in terms of compute and tokens.

    • Best used for rare but critical events where you want very high confidence and rich information.

    • Great when you want “coarse filter → deep analysis” instead of running expensive VLM on everything.


Example

  • High-value safety or incident monitoring:

    • First stage: detect “possible hazard”.

    • Second stage: only for those cases, analyze impact, severity, people involved, etc.

  • Double confirm:

    • Use multiple frames / time points to reduce wrong decisions from a single unlucky frame.
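A minimal sketch of the coarse check, Time Gap, and fine check sequence follows. The `coarse_vlm` and `fine_vlm` stubs, the `smoke` flag, the 10-second window, and the "more than half of the frames" rule are all assumptions for illustration; a real flow would call actual VLM prompts.

```python
from typing import Callable, Dict, List


def two_stage(
    frames: List[Dict],
    coarse_vlm: Callable[[Dict], bool],
    fine_vlm: Callable[[List[Dict]], Dict],
    window_s: float = 10.0,
) -> List[Dict]:
    events = []
    i = 0
    while i < len(frames):
        trigger = frames[i]
        if not coarse_vlm(trigger):  # stage 1: cheap, single-frame check
            i += 1
            continue
        # Time Gap: collect frames up to window_s after the trigger frame.
        j, window = i, []
        while j < len(frames) and frames[j]["ts"] - trigger["ts"] <= window_s:
            window.append(frames[j])
            j += 1
        report = fine_vlm(window)  # stage 2: detailed check over the window
        if report["confirmed"]:
            events.append({"start_ts": trigger["ts"], **report})
        i = j  # resume after the window
    return events


def coarse_vlm(frame: Dict) -> bool:
    return frame["smoke"]


def fine_vlm(window: List[Dict]) -> Dict:
    """Double confirm: require smoke in more than half of the collected frames."""
    ratio = sum(f["smoke"] for f in window) / len(window)
    return {"confirmed": ratio > 0.5, "hit_ratio": ratio}


frames = [
    {"ts": 0, "smoke": False},
    {"ts": 5, "smoke": True},   # single unlucky frame
    {"ts": 8, "smoke": False},
    {"ts": 12, "smoke": False},
    {"ts": 14, "smoke": False},
    {"ts": 20, "smoke": True},  # sustained smoke starts here
    {"ts": 24, "smoke": True},
    {"ts": 28, "smoke": True},
    {"ts": 40, "smoke": False},
]

events = two_stage(frames, coarse_vlm, fine_vlm)
print(events)  # [{'start_ts': 20, 'confirmed': True, 'hit_ratio': 1.0}]
```

The isolated hit at 5 s passes the coarse check but is rejected once the window is considered, while the sustained run starting at 20 s is confirmed.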


Create a VLM Flow


  1. Go to Navigator → VLM.

  2. Click + VLM Flow Template to create a new flow.

  3. In the canvas, drag and drop the components you need:

    1. Entry Node (Object Detection or VLM)

    2. Timer

    3. Sub VLM

    4. Event Trigger

  4. Connect the nodes to define how data moves through the pipeline and what conditions should trigger an event.
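One way to picture what connecting the nodes means is to treat the Flow as a small directed graph and check that an Event Trigger can be reached from the Entry Node. The dictionary layout and node names below are purely hypothetical. Flows are built on the canvas, and this is not the product's file format or API.

```python
from collections import deque
from typing import Dict, List, Tuple


def reaches_trigger(
    nodes: Dict[str, str], edges: List[Tuple[str, str]], start: str
) -> bool:
    """Breadth-first search: True if an event_trigger node is reachable from start."""
    graph: Dict[str, List[str]] = {name: [] for name in nodes}
    for src, dst in edges:
        graph[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        if nodes[current] == "event_trigger":
            return True
        for nxt in graph[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical flow: Entry Node -> Timer -> Sub VLM -> Event Trigger.
nodes = {
    "entry": "object_detection",
    "timer": "timer",
    "sub_vlm": "sub_vlm",
    "trigger": "event_trigger",
}
edges = [("entry", "timer"), ("timer", "sub_vlm"), ("sub_vlm", "trigger")]

print(reaches_trigger(nodes, edges, "entry"))      # True
print(reaches_trigger(nodes, edges[:2], "entry"))  # False: the last link is missing
```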


Reminder:

Before you build a Flow, make sure you have already created at least one VLM Template. The VLM node in the Flow will reference these templates.

See: Build a VLM Detection

How to Use a VLM Flow in a Task

  1. Open the Task Detail page for the task you want to configure.

  2. Go to Settings → Event.

  3. Click + Detection Event to add a new event rule.

  4. In the VLM Detection section, choose one of the existing VLM Flows you created.

  5. Adjust the event settings as needed, then save. The task will now use that VLM Flow to decide when to trigger events.


VLM Detection Result

After detecting a predefined event, the system automatically runs the VLM for further analysis.

Integrated Event Results:

The VLM analysis results are appended to the detected event for enriched reporting.

  • Example Results:

    • Weather: Sunny, partly cloudy.

    • Alert Level: 2.

    • Vehicle Types: Firetruck, car.

    • Scene Description: Downtown street with a firetruck stopped.


Example Use Case:

If a fire is detected, the VLM analyzes the scene to identify emergency personnel, vehicles, or abnormal conditions.
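Appending VLM results to an event can be pictured as attaching a nested record. The field names (`type`, `camera`, `vlm`, and the keys inside it) are assumptions based on the example results in this section, not the actual event schema.

```python
from typing import Dict


def enrich_event(event: Dict, vlm_result: Dict) -> Dict:
    """Return a copy of the event with the VLM analysis attached under 'vlm'."""
    enriched = dict(event)
    enriched["vlm"] = dict(vlm_result)
    return enriched


event = {"type": "fire_detected", "camera": "cam-01"}  # hypothetical event record
vlm_result = {
    "weather": "Sunny, partly cloudy",
    "alert_level": 2,
    "vehicle_types": ["Firetruck", "car"],
    "scene_description": "Downtown street with a firetruck stopped",
}

enriched = enrich_event(event, vlm_result)
print(enriched["type"], enriched["vlm"]["alert_level"])  # fire_detected 2
```

Keeping the VLM fields in their own sub-record leaves the original event untouched, so downstream consumers that only read the base fields keep working.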

Summary

VLM Flow gives you a flexible way to mix fast detectors with smart VLM reasoning, so you only spend tokens when it really matters. You can start simple with detector-only Flows, then gradually add VLM, Time Gaps, or even two-stage VLM when you need deeper understanding or fewer false alarms.

In practice, the setup is straightforward:

  • Build your VLM Templates,

  • Design a VLM Flow by wiring together Object Detection, Time Gap, VLM, and logic,

  • Attach that Flow to a task’s Event settings.

Once that’s done, your tasks can move from basic “did something enter this zone?” alerts to rich, context-aware event detection that understands what’s actually happening in the scene.
