Vision & ML

Point a camera at the world and let DNA read it — bodies, depth, objects, motion — then drive anything in your graph from what it sees.

DNA can run on-device machine-learning models on a live video feed and turn the results into ordinary graph data: points, masks, numbers. A dancer's joints become points you can attach geometry to. A depth read becomes a Field you can displace with. A detected object becomes a box you can trigger from. Wire an input.camera (or any video source) into a vision node and the rest of your graph just sees data.

These models run off to the side, never on the cook thread, so a slow prediction can't stall playback. The live preview trails the camera by a frame or two at most — invisible at performance rates.
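To picture why a slow model can't stall playback, here is a minimal Python sketch of the general "latest result" pattern. It is not DNA's actual implementation: slow_model, LatestResult and the timings are hypothetical stand-ins. A worker thread runs the predictions while the cook loop only ever reads the most recent finished result.

```python
import threading
import time


class LatestResult:
    """Holds the most recent prediction; reading it never waits on the model."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._value = initial

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def slow_model(frame):
    # Hypothetical stand-in for an ML prediction that takes a while.
    time.sleep(0.05)
    return {"frame": frame, "points": [(frame, frame)]}


def run_worker(frames, out, done):
    # Runs on its own thread, away from the cook loop.
    for frame in frames:
        out.set(slow_model(frame))
    done.set()


latest = LatestResult()
done = threading.Event()
worker = threading.Thread(target=run_worker, args=(range(5), latest, done), daemon=True)
worker.start()

cook_reads = 0
while not done.is_set():
    result = latest.get()  # returns immediately; may be a slightly older result
    cook_reads += 1
    time.sleep(0.005)      # simulated cook interval, much shorter than one prediction
worker.join()
final = latest.get()
```

The cook loop runs many times per prediction and never blocks on the model; the cost is that the result it reads can lag the newest frame slightly.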

What you can read

Each vision node takes a raster (your camera feed) and gives you structured data to build with.

The point and mask outputs are normal DNA data. Pose landmarks flow into Scatter, Iterator, simulations and Expressions exactly like any other Collection — there's nothing special to learn once the data is in the graph.

Trackers — following things over time

The readers above look at one frame at a time. Trackers remember what they saw and follow it across frames, so an object keeps the same identity as it moves.
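As an illustration of what "keeping the same identity" means, the sketch below matches each new detection to the nearest remembered point and reuses its ID. This is a generic nearest-centroid approach written for this page, not the algorithm DNA's trackers use; CentroidTracker and max_distance are hypothetical names.

```python
import math


class CentroidTracker:
    """Greedy nearest-point matching: a detection inherits the ID of the closest prior point."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance
        self.tracks = {}      # id -> (x, y) from the previous frame
        self._next_id = 0

    def update(self, detections):
        assigned = {}
        unmatched = dict(self.tracks)
        for point in detections:
            best_id, best_dist = None, self.max_distance
            for track_id, previous in unmatched.items():
                distance = math.dist(point, previous)
                if distance <= best_dist:
                    best_id, best_dist = track_id, distance
            if best_id is None:
                # Nothing close enough: treat this as a new object.
                best_id = self._next_id
                self._next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = point
        # Tracks with no match this frame are dropped in this simple sketch.
        self.tracks = assigned
        return assigned


tracker = CentroidTracker()
frame_1 = tracker.update([(10, 10), (200, 200)])
# Same two objects, slightly moved and listed in the opposite order.
frame_2 = tracker.update([(205, 198), (14, 12)])
```

Even though the detections arrive in a different order in the second frame, each object keeps the ID it was given in the first.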

Each tracker has an enable control. Switching it off doesn't freeze on the last result — it clears the tracker's memory and starts clean next time you switch it on. (This is different from Bypass, which just passes the input through.)
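The difference between the two switches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DNA's node code: SketchTracker is hypothetical, the disabled node returning None is an assumption, and so is Bypass leaving the memory untouched.

```python
class SketchTracker:
    """Toy tracker that only counts the frames it has seen."""

    def __init__(self):
        self.enabled = True
        self.bypass = False
        self._history = []

    def set_enabled(self, enabled):
        if not enabled:
            # Switching off clears the memory rather than freezing the last result.
            self._history.clear()
        self.enabled = enabled

    def cook(self, frame):
        if self.bypass:
            return frame   # Bypass: the input passes straight through
        if not self.enabled:
            return None    # assumption: a disabled tracker emits nothing
        self._history.append(frame)
        return {"frames_seen": len(self._history)}


tracker = SketchTracker()
tracker.cook("frame-a")
before_off = tracker.cook("frame-b")

tracker.set_enabled(False)
tracker.set_enabled(True)
after_on = tracker.cook("frame-c")

tracker.bypass = True
passed_through = tracker.cook("frame-d")
```

Here before_off reports two frames seen, while after_on starts again from one because the off switch wiped the history.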

Platform notes

DNA picks the fastest available path for your machine automatically — you don't choose a backend.

Some readers use Apple's on-device Vision framework and the Apple Neural Engine: Vision Pose, Vision Segment, plus on-device text and scene analysis, and the Vision tracker. These are macOS-only. On other platforms, use the cross-platform Pose and segmentation models instead — they cover the same ground.

On macOS, Depth has an additional fast path. Set the depth node's quality to Fast to use it; other quality settings fall back to the cross-platform model.

The cross-platform models aren't bundled with DNA — the first time you use one, it downloads its weights (verified for integrity) and caches them on disk. That first cook can pause while the download finishes; everything after is instant. The Apple Vision readers need no download.
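The download, verify, cache sequence follows a common pattern, sketched below in Python. Everything specific here is an assumption made for illustration: the SHA-256 check, the load_weights function, the pose.bin file name and the fake_fetch downloader are hypothetical, and DNA's real integrity check and cache layout may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def load_weights(name, expected_sha256, fetch, cache_dir):
    """Return the weight bytes, downloading only when no valid cached copy exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name

    if path.exists():
        data = path.read_bytes()
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data  # cache hit: no download needed

    data = fetch(name)  # the slow step that can pause the first cook
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"integrity check failed for {name}")
    path.write_bytes(data)
    return data


calls = []


def fake_fetch(name):
    # Hypothetical downloader that records each call instead of using the network.
    calls.append(name)
    return b"pretend these bytes are model weights"


expected = hashlib.sha256(b"pretend these bytes are model weights").hexdigest()
with tempfile.TemporaryDirectory() as cache:
    first = load_weights("pose.bin", expected, fake_fetch, cache)
    second = load_weights("pose.bin", expected, fake_fetch, cache)
```

Only the first call reaches the downloader; the second is served from disk, which is why later cooks do not pause.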

Vision nodes need a live video source. The Web Player has no native camera input, so vision and ML workflows run in the desktop app and the DNA Player, not in the browser. See Inputs for what each platform supports.

The optical-flow node currently emits a flat, no-motion frame rather than estimating real movement. Treat it as not-yet-shipped and reach for the point tracker if you need motion.

A quick recipe

Camera → Pose → Scatter geometry on the joints → render. As the performer moves, the points track their body and your geometry follows — a glowing skeleton, particles bursting from the hands, text pinned to the face. Add a MIDI control to fade it live and you have a performance-ready scene.

See also