Build a VLM Flow Detection Pipeline
A composable pipeline that combines Object Detection, Time Gaps, and a VLM (vision-language model) with a logic layer. It first filters candidates using cheaper signals, then asks the VLM to confirm what is actually happening, which cuts token usage while improving precision.
What is VLM Flow?
VLM Flow is a composable detection pipeline that chains together Object Detection, Time Gaps, VLM, and a logic layer.
The idea is:
Use cheaper signals first (like classic detectors).
Only send the “interesting” cases to the VLM.
Let the logic layer decide when to actually fire an event.
This way you save VLM tokens and still get higher-quality alerts.
Building Blocks:
Object Detection (blue): frame-by-frame detections.
Time Gap (green): a buffer window (e.g., 10 s) that collects frames/events.
VLM (brown): semantic reasoning over key frames.
Logic (gray): where you define how events should work, using thresholds, zones, aggregation (ALL/ANY), True/False checks, numeric comparisons, and similar rules.
Event Trigger: fires when the conditions are met.
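These building blocks compose as a chain of gates: each block either passes its data on or stops the flow, and the Event Trigger fires only if every block passes. The following is a minimal Python sketch of that idea. The `Context` structure and the `logic` and `run_flow` helpers are hypothetical names for illustration, not the product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Context:
    """Data handed from one block to the next (hypothetical structure)."""
    detections: list = field(default_factory=list)  # e.g. [{"label": "person", "score": 0.9}]
    vlm_answers: dict = field(default_factory=dict)  # filled in by VLM blocks, if any


# A block takes the context and returns it (continue) or None (stop, no event).
Block = Callable[[Context], Optional[Context]]


def logic(*conditions: Callable[[Context], bool], mode: str = "ALL") -> Block:
    """Logic block: pass only if ALL (or ANY) of the conditions hold."""
    if mode not in ("ALL", "ANY"):
        raise ValueError("mode must be 'ALL' or 'ANY'")
    combine = all if mode == "ALL" else any

    def block(ctx: Context) -> Optional[Context]:
        return ctx if combine(cond(ctx) for cond in conditions) else None

    return block


def run_flow(blocks: list, ctx: Context) -> bool:
    """Run blocks in order; return True (fire the Event Trigger) only if all pass."""
    for block in blocks:
        ctx = block(ctx)
        if ctx is None:
            return False  # a gate failed, so later (costlier) blocks never run
    return True


def has_person(ctx: Context) -> bool:
    return any(d["label"] == "person" for d in ctx.detections)


def confident(ctx: Context) -> bool:
    return all(d["score"] >= 0.5 for d in ctx.detections)


# Usage: fire when a person is detected AND every detection is confident.
flow = [logic(has_person, confident, mode="ALL")]
print(run_flow(flow, Context(detections=[{"label": "person", "score": 0.9}])))  # True
```

Because `run_flow` stops at the first failing block, putting cheap blocks first means the expensive ones (such as a VLM) only run on the cases that survive.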

Detection Combinations & Example Use Cases
1. Object Detection → logic → Event
Simple, detector-only setup
What it does: Pure object detection. As soon as the detector finds something that passes the logic rules, an event is triggered.
Behavior
Very fast: typically within ~1 second.
Low cost: no VLM usage.
Good when you just need “something entered / exited this area” and don’t care about deeper context.
Example
Detect when a person enters a restricted zone.
Count vehicles crossing a line.
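Both examples are plain logic rules over detector output. Here is a minimal sketch, assuming each detection carries a label, a score, and a pixel box `(x1, y1, x2, y2)`, and (for line counting) that a tracker supplies a stable ID per vehicle. All function names are hypothetical.

```python
def in_zone(box, zone):
    """True if the centre of box (x1, y1, x2, y2) lies inside a rectangular zone."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]


def person_in_restricted_zone(detections, zone, min_score=0.5):
    """Logic rule for 'a person enters a restricted zone'."""
    return any(
        d["label"] == "person" and d["score"] >= min_score and in_zone(d["box"], zone)
        for d in detections
    )


def count_line_crossings(tracks, line_y):
    """Count tracked objects whose centre moved from above line_y to on/below it.

    tracks maps a (hypothetical) tracker ID to that object's centre y per frame;
    image y grows downward, so this counts downward crossings.
    """
    count = 0
    for ys in tracks.values():
        for prev, cur in zip(ys, ys[1:]):
            if prev < line_y <= cur:
                count += 1
                break  # count each object at most once
    return count


zone = (0, 0, 100, 100)
dets = [{"label": "person", "score": 0.8, "box": (10, 10, 30, 50)}]
print(person_in_restricted_zone(dets, zone))  # True: centre (20, 30) is inside
print(count_line_crossings({1: [10, 40, 80], 2: [60, 70], 3: [90, 20]}, line_y=50))  # 1
```

Zones are rectangles here for simplicity; a real zone may be an arbitrary polygon.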
2. Object Detection → logic → VLM → logic → Event
Detector + VLM for smarter, cleaner alerts
What it does: Object Detection acts as a gate. Frames are sent to the VLM for semantic confirmation only when the detector sees something suspicious.
Behavior
Big token savings: VLM only runs after the detector says “this might matter”.
Higher-quality real-time alerts: fewer false positives than detector bounding boxes alone.
Ideal when you already have a good detector, but need semantic understanding to clean up the results.
Example
Fire detection that understands “how serious” it is
Step 1: Object Detection spots “fire/flame”.
Step 2: VLM checks if it’s a real hazard (e.g., a big fire) or something harmless (like a candle or stovetop).
Reduce false alarms in “sensitive” alerts by letting the VLM double-check the situation.
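The gating pattern can be sketched as shown below. `detect` and `ask_vlm` are hypothetical stand-ins for the object detector and the VLM. The 0.6 gate threshold and the prompt wording are illustrative assumptions. The function also counts VLM calls, since that count is where the token savings come from.

```python
def detector_gated_vlm(frames, detect, ask_vlm, gate_score=0.6):
    """Run the VLM only on frames where the detector reports fire/flame.

    detect(frame) -> list of {"label", "score"}; ask_vlm(frame, prompt) -> str.
    Both are hypothetical callables standing in for the real models.
    Returns (indices of frames that fired an event, number of VLM calls made).
    """
    prompt = "Is this a real fire hazard rather than a candle or stovetop? Answer yes or no."
    events, vlm_calls = [], 0
    for i, frame in enumerate(frames):
        # Step 1 (logic): cheap gate on detector output.
        if not any(d["label"] in ("fire", "flame") and d["score"] >= gate_score
                   for d in detect(frame)):
            continue  # gate failed: no VLM tokens spent on this frame
        # Step 2: semantic confirmation by the VLM.
        vlm_calls += 1
        # Step 3 (logic): fire only if the VLM confirms.
        if ask_vlm(frame, prompt).strip().lower().startswith("yes"):
            events.append(i)
    return events, vlm_calls


# Stand-in models: each frame is a dict describing what the camera might see.
def fake_detect(frame):
    return [{"label": "fire", "score": frame["fire"]}] if frame["fire"] > 0 else []


def fake_vlm(frame, prompt):
    return "yes" if frame["kind"] == "building" else "no"


frames = [
    {"fire": 0.0, "kind": None},
    {"fire": 0.9, "kind": "candle"},
    {"fire": 0.8, "kind": "building"},
    {"fire": 0.3, "kind": "building"},
]
print(detector_gated_vlm(frames, fake_detect, fake_vlm))  # ([2], 2)
```

In this run only two of the four frames reach the VLM. The empty frame and the low-confidence frame are filtered out by the detector gate, and the candle is rejected by the VLM.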
3. VLM → logic → Event
VLM-only, semantics-first detection
What it does: No classic detector is involved. The VLM periodically looks at frames to understand the overall situation, not just individual objects.
Behavior
Usually runs on a fixed interval (every few minutes), not every frame.
Slower and heavier than simple detectors, but much richer in meaning.
Good when you care about “what’s going on here overall?” instead of just “is there a person or car?”.
Example
Check risk level or disaster signs:
Is there flooding?
Does the scene look like a traffic jam or accident?
Describe the current scene.
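A VLM-only flow is driven by a sampling interval rather than by a detector. This sketch assumes a 180-second interval, yes/no questions, and a hypothetical `ask_vlm(frame, prompt)` callable. The logic layer fires when ANY question is answered yes.

```python
def sample_times(timestamps, interval_s=180.0):
    """Keep timestamps at least interval_s seconds apart; the VLM runs only on these."""
    picked, last = [], None
    for t in timestamps:
        if last is None or t - last >= interval_s:
            picked.append(t)
            last = t
    return picked


# Hypothetical yes/no questions; in the product the prompts live in a VLM Template.
QUESTIONS = {
    "flooding": "Is there flooding in this scene? Answer yes or no.",
    "traffic_jam_or_accident": "Does this look like a traffic jam or accident? Answer yes or no.",
}


def periodic_vlm_check(frames_by_time, ask_vlm, interval_s=180.0):
    """Return (timestamp, [matched questions]) for each sampled frame that fires."""
    events = []
    for t in sample_times(sorted(frames_by_time), interval_s):
        frame = frames_by_time[t]
        hits = [name for name, q in QUESTIONS.items()
                if ask_vlm(frame, q).strip().lower().startswith("yes")]
        if hits:  # logic: ANY question answered "yes" triggers the event
            events.append((t, hits))
    return events


def fake_vlm(frame, prompt):
    return "yes" if "flood" in frame and "flooding" in prompt else "no"


frames = {0: "dry street", 60: "flooded street", 180: "flooded street", 400: "dry street"}
print(periodic_vlm_check(frames, fake_vlm))  # [(180, ['flooding'])]
```

The flooded frame at 60 s is never examined because it falls inside the interval. This is the trade-off of interval-based checks: lower cost, but a situation can be noticed up to one interval late.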
4. VLM → logic → Time Gap → VLM → logic → Event
Two-stage VLM: coarse check → fine check / double confirm
(Visually it’s VLM → Time Gap → VLM, with logic before/after.)
What it does
First VLM pass (coarse): Run a cheaper, simpler check, such as “Is there any potential danger here?”
Time Gap: If yes, collect more frames over a short time window to get more context.
Second VLM pass (fine / detailed): Run a more detailed prompt to answer deeper questions:
How big is the affected area?
What kind of damage is happening?
Are emergency workers already present?
Do we need to escalate the alert level?
Behavior
Most expensive pipeline in terms of compute and tokens.
Best used for few but critical events where you want very high confidence and rich information.
Great when you want “coarse filter → deep analysis” instead of running an expensive VLM on everything.
Example
High-value safety or incident monitoring:
First stage: detect “possible hazard”.
Second stage: only for those cases, analyze impact, severity, people involved, etc.
Double confirm:
Use multiple frames / time points to reduce wrong decisions from a single unlucky frame.
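The two-stage pattern, including the double confirmation, can be sketched as follows. For simplicity the sketch takes frames that have already been buffered with timestamps instead of waiting in real time. The 10-second gap, the `min_agree=2` rule, the prompts, and `ask_vlm` are assumptions for illustration.

```python
def two_stage_check(frames, ask_vlm, gap_s=10.0, min_agree=2):
    """Coarse VLM check, then a Time Gap window, then a detailed VLM pass.

    frames: list of (timestamp_s, frame) in time order; frames[0] is the frame
    the coarse check sees. ask_vlm(frame, prompt) -> str is a hypothetical callable.
    Returns the detailed answers if at least min_agree frames confirm, else None.
    """
    t0, first = frames[0]
    coarse = ask_vlm(first, "Is there any potential danger here? Answer yes or no.")
    if not coarse.strip().lower().startswith("yes"):
        return None  # coarse filter rejected: the expensive pass never runs

    # Time Gap: the frames captured in the gap_s seconds after the coarse hit.
    window = [f for t, f in frames if t0 < t <= t0 + gap_s]

    fine_prompt = (
        "Is this a real hazard? Start with yes or no, then describe how big the "
        "affected area is, what damage is happening, whether emergency workers are "
        "present, and whether the alert level should be raised."
    )
    confirmed = [a for a in (ask_vlm(f, fine_prompt) for f in window)
                 if a.strip().lower().startswith("yes")]
    # Double confirm: require agreement across several frames, not one lucky frame.
    return confirmed if len(confirmed) >= min_agree else None


def fake_vlm(frame, prompt):
    if "potential danger" in prompt:
        return "yes" if frame in ("smoke", "fire") else "no"
    return "yes: large fire, no crews yet" if frame == "fire" else "no"


frames = [(0, "smoke"), (3, "smoke"), (6, "fire"), (9, "fire"), (15, "fire")]
print(two_stage_check(frames, fake_vlm))  # confirmed by the frames at 6 s and 9 s
```

The frame at 15 s falls outside the 10-second window and is not analyzed, so the detailed pass costs three VLM calls instead of one per frame.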
Create a VLM Flow
Go to Navigator → VLM.
Click + VLM Flow Template to create a new flow.
In the canvas, drag and drop the components you need:
Entry Node (Object Detection or VLM)
Timer
Sub VLM
Event Trigger
Connect the nodes to define how data moves through the pipeline and what conditions should trigger an event.
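Before saving a Flow, it can help to check the wiring against the combinations described earlier on this page. The sketch below uses hypothetical lowercase names for the building blocks (these are not the product's identifiers) and treats the four documented combinations as the reference set. The product itself may accept other wirings.

```python
# The four combinations described on this page, as sequences of building blocks.
DOCUMENTED_FLOWS = {
    ("object_detection", "logic", "event"),
    ("object_detection", "logic", "vlm", "logic", "event"),
    ("vlm", "logic", "event"),
    ("vlm", "logic", "time_gap", "vlm", "logic", "event"),
}

ENTRY_NODES = ("object_detection", "vlm")


def check_flow(nodes):
    """Return "ok" or a reason why the wiring doesn't match a documented combination."""
    flow = tuple(nodes)
    if not flow or flow[0] not in ENTRY_NODES:
        return "start with an entry node (object_detection or vlm)"
    if flow[-1] != "event":
        return "end with an event trigger"
    # Entry node first and event last means the flow has at least two nodes here.
    if flow[-2] != "logic":
        return "put a logic step right before the event"
    if flow not in DOCUMENTED_FLOWS:
        return "valid shape, but not one of the four documented combinations"
    return "ok"


print(check_flow(["vlm", "logic", "time_gap", "vlm", "logic", "event"]))  # ok
print(check_flow(["vlm", "event"]))  # put a logic step right before the event
```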

Reminder:
Before you build a Flow, make sure you have already created at least one VLM Template. The VLM nodes in the Flow reference these templates.
How to Use a VLM Flow in a Task
Open the Task Detail page for the task you want to configure.
Go to Settings → Event.
Click + Detection Event to add a new event rule.
In the VLM Detection section, choose one of the existing VLM Flows you created.
Adjust the event settings as needed, then save. The task will now use that VLM Flow to decide when to trigger events.

VLM Detection Result
After detecting a predefined event, the system automatically runs the VLM for further analysis.
Integrated Event Results:
The VLM analysis results are appended to the detected event for enriched reporting.
Example Results:
Weather: Sunny, partly cloudy.
Alert Level: 2.
Vehicle Types: Firetruck, car.
Scene Description: Downtown street with a firetruck stopped.
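If you consume events downstream, results in the “Key: value” form shown above can be turned into structured fields. Here is a minimal sketch, assuming the VLM returns one field per line. The event fields and the `vlm_analysis` key are hypothetical.

```python
def parse_vlm_result(text):
    """Turn "Key: value" lines from a VLM answer into a dict (trailing periods dropped)."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            result[key.strip()] = value.strip().rstrip(".")
    return result


def enrich_event(event, vlm_text):
    """Return a copy of the detected event with the VLM analysis appended."""
    return {**event, "vlm_analysis": parse_vlm_result(vlm_text)}


vlm_text = """Weather: Sunny, partly cloudy.
Alert Level: 2.
Vehicle Types: Firetruck, car.
Scene Description: Downtown street with a firetruck stopped."""

event = {"type": "fire", "camera": "cam-01"}  # hypothetical event fields
print(enrich_event(event, vlm_text)["vlm_analysis"]["Alert Level"])  # 2
```

Values stay as strings here. Converting fields such as the alert level to numbers, or splitting vehicle types into a list, is a further step that depends on how your template phrases its answers.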

Example Use Case:
If a fire is detected, the VLM analyzes the scene to identify emergency personnel, vehicles, or abnormal conditions.
Summary
VLM Flow gives you a flexible way to mix fast detectors with smart VLM reasoning, so you only spend tokens when it really matters. You can start simple with detector-only Flows, then gradually add VLM, Time Gaps, or even two-stage VLM when you need deeper understanding or fewer false alarms.
In practice, the setup is straightforward:
Build your VLM Templates,
Design a VLM Flow by wiring together Object Detection, Time Gap, VLM, and logic,
Attach that Flow to a task’s Event settings.
Once that’s done, your tasks can move from basic “did something enter this zone?” alerts to rich, context-aware event detection that understands what’s actually happening in the scene.