Vision & ML
Point a camera at the world and let DNA read it — bodies, depth, objects, motion — then drive anything in your graph from what it sees.
DNA can run on-device machine-learning models on a live video feed and turn the results into ordinary graph data: points, masks, numbers. A dancer's joints become points you can attach geometry to. A depth read becomes a Field you can displace with. A detected object becomes a box you can trigger from. Wire an input.camera (or any video source) into a vision node and the rest of your graph just sees data.
These models run asynchronously, never on the cook thread, so a slow prediction can't stall playback. The live preview trails the camera by a frame or two at most, which is imperceptible at performance frame rates.
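To make the idea of keeping inference off the cook thread concrete, here is a minimal Python sketch of the general pattern. It is not DNA's implementation: `LatestResult`, `slow_model`, and `inference_worker` are hypothetical names invented for illustration, and the 50 ms delay is an arbitrary stand-in for model latency. The worker publishes each prediction into a shared slot, and the cook loop only ever reads the newest value without waiting.

```python
import threading
import time


class LatestResult:
    """Holds the most recent prediction so readers never wait on the model."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def publish(self, value):
        with self._lock:
            self._value = value

    def read(self):
        with self._lock:
            return self._value


def slow_model(frame):
    # Stand-in for a real model: pretend inference takes 50 ms.
    time.sleep(0.05)
    return {"frame": frame, "label": "person"}


def inference_worker(frames, slot, done):
    # Runs on its own thread and publishes each result as it finishes.
    for frame in frames:
        slot.publish(slow_model(frame))
    done.set()


slot = LatestResult()
done = threading.Event()
worker = threading.Thread(
    target=inference_worker, args=(range(3), slot, done), daemon=True
)
worker.start()

# The "cook" loop: take whatever result is newest and move on.
cook_reads = []
while not done.is_set():
    cook_reads.append(slot.read())  # returns at once; may be None or stale
    time.sleep(0.01)                # stand-in for the rest of the cook

worker.join()
final = slot.read()
print(final)
```

The trade-off is the one described above: the cook loop may see a result that is a frame or two old, but it never blocks on a prediction.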
What you can read
Each vision node takes a raster (your camera feed) and gives you structured data to build with.
Pose — full-body tracking. Detects up to 133 landmarks across body, both hands, face, and feet, as a Collection of points with confidence, region, and joint name on each. Attach geometry, drive simulations, or measure angles between joints.
Depth — estimates how far each pixel is from the camera, as a grayscale raster / Field. Great for displacement, fog, parallax, or pushing 2D footage into 3D.
Object detection — finds objects in frame and returns labelled boxes with confidence. Use the boxes to position things or to fire a Trigger when something appears.
Segmentation — separates subject from background as a mask Field. Cut a person out of their surroundings, composite them onto a new scene, or use the mask to gate an effect.
Surface normals — estimates the facing direction of every pixel, handy for relighting flat footage.
The point and mask outputs are normal DNA data. Pose landmarks flow into Scatter, Iterator, simulations and Expressions exactly like any other Collection — there's nothing special to learn once the data is in the graph.
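As an illustration of measuring an angle between joints, the sketch below computes the angle at an elbow from three landmarks. It is plain Python, not DNA's Expression syntax, and the landmark layout (a dict with `name`, `x`, `y`, and `confidence` keys) is an assumption made for this example; the real attribute names may differ.

```python
import math


def joint_angle(a, b, c, min_confidence=0.5):
    """Angle at landmark b, in degrees, between segments b->a and b->c.

    Returns None if any landmark is below min_confidence or if two
    landmarks coincide, since the angle is then undefined.
    """
    if min(p["confidence"] for p in (a, b, c)) < min_confidence:
        return None
    v1 = (a["x"] - b["x"], a["y"] - b["y"])
    v2 = (c["x"] - b["x"], c["y"] - b["y"])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding drift
    return math.degrees(math.acos(cos_t))


# Hypothetical landmark records for one arm, in image coordinates.
shoulder = {"name": "left_shoulder", "x": 0.0, "y": 0.0, "confidence": 0.9}
elbow = {"name": "left_elbow", "x": 1.0, "y": 0.0, "confidence": 0.8}
wrist = {"name": "left_wrist", "x": 1.0, "y": 1.0, "confidence": 0.7}

angle = joint_angle(shoulder, elbow, wrist)
print(angle)  # a right angle at the elbow
```

Filtering on confidence first avoids driving geometry from a landmark the model was only guessing at.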
Trackers — following things over time
The readers above look at one frame at a time. Trackers remember what they saw and follow it across frames, so an object keeps the same identity as it moves.
Point tracker — pick points (in the viewport or from an upstream pin) and it follows them through the footage over a sliding window of frames.
Blob tracker — segments the subject, finds separate blobs, and keeps each one's identity frame to frame, giving you centroids, velocities, contours, and a mask.
Vision tracker — follows an object you've boxed, frame to frame.
Each tracker has an enable control. Switching it off doesn't freeze on the last result — it clears the tracker's memory and starts clean next time you switch it on. (This is different from Bypass, which just passes the input through.)
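The difference between per-frame reading and tracking, and what clearing a tracker's memory means, can be sketched with a toy nearest-centroid tracker. This is a simplified illustration in Python, not the algorithm DNA's trackers use. `CentroidTracker`, `max_distance`, and the choice to return an empty result while disabled and to restart IDs from zero are all assumptions made for the example.

```python
import math


class CentroidTracker:
    """Toy tracker: matches each new centroid to the nearest remembered one."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance
        self.enabled = True
        self._tracks = {}   # track id -> last known (x, y)
        self._next_id = 0

    def set_enabled(self, enabled):
        if not enabled:
            # Disabling does not freeze the last result: it forgets everything.
            self._tracks.clear()
            self._next_id = 0
        self.enabled = enabled

    def update(self, centroids):
        if not self.enabled:
            return {}  # assumption: a disabled tracker reports nothing
        unmatched = dict(self._tracks)
        new_tracks = {}
        for point in centroids:
            best_id, best_d = None, self.max_distance
            for track_id, previous in unmatched.items():
                d = math.dist(point, previous)
                if d <= best_d:
                    best_id, best_d = track_id, d
            if best_id is None:
                # Nothing close enough: treat it as a new object.
                best_id = self._next_id
                self._next_id += 1
            else:
                del unmatched[best_id]
            new_tracks[best_id] = point
        self._tracks = new_tracks
        return dict(new_tracks)


tracker = CentroidTracker()
first = tracker.update([(10, 10), (100, 100)])    # two new identities
second = tracker.update([(102, 101), (12, 11)])   # same objects, moved, reordered

tracker.set_enabled(False)                         # memory cleared here
tracker.set_enabled(True)
third = tracker.update([(102, 101)])               # treated as brand new
print(first, second, third)
```

In the second update each blob keeps its identity even though the detections arrive in a different order. After the off/on cycle the same blob is assigned a fresh identity, because nothing was remembered.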
Platform notes
DNA picks the fastest available path for your machine automatically — you don't choose a backend.
Some readers use Apple's on-device Vision framework and the Apple Neural Engine: Vision Pose, Vision Segment, on-device text and scene analysis, and the Vision tracker. These are macOS-only. On other platforms, use the cross-platform Pose and segmentation models instead; they cover the same ground.
Depth has an additional, faster path on macOS. Set the depth node's quality to Fast to use it; other quality settings fall back to the cross-platform model.
The cross-platform models aren't bundled with DNA. The first time you use one, it downloads its weights (verified for integrity) and caches them on disk. That first cook can pause while the download finishes; later cooks read the cached weights and skip the download. The Apple Vision readers need no download.
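The download, verify, and cache flow can be sketched as follows. This is a generic Python illustration of the pattern, not DNA's code: `load_weights`, the `fetch` callback, the file name `pose.bin`, and the use of SHA-256 as the integrity check are all assumptions for the example. The source says only that weights are verified for integrity, not how.

```python
import hashlib
import tempfile
from pathlib import Path


def load_weights(name, expected_sha256, fetch, cache_dir):
    """Return weight bytes, downloading only when the cache has no valid copy."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name

    if path.exists():
        data = path.read_bytes()
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data  # cache hit: no download needed

    data = fetch(name)  # the slow step that can pause the first cook
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"integrity check failed for {name}")
    path.write_bytes(data)
    return data


# A fake downloader so the sketch runs without a network connection.
fetch_calls = []


def fake_fetch(name):
    fetch_calls.append(name)
    return b"pretend these bytes are model weights"


expected = hashlib.sha256(b"pretend these bytes are model weights").hexdigest()

with tempfile.TemporaryDirectory() as cache:
    first = load_weights("pose.bin", expected, fake_fetch, cache)
    second = load_weights("pose.bin", expected, fake_fetch, cache)

print(len(fetch_calls))  # the second call is served from the cache
```

Checking the digest before writing to the cache means a corrupted or truncated download is never stored, so it cannot be mistaken for valid weights later.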
Vision nodes need a live video source. The Web Player has no native camera input, so vision and ML workflows run in the desktop app and the DNA Player, not in the browser. See Inputs for what each platform supports.
The optical-flow node currently emits a flat, no-motion frame rather than estimating real movement. Treat it as not-yet-shipped and reach for the point tracker if you need motion.
A quick recipe
Camera → Pose → Scatter geometry on the joints → render. As the performer moves, the points track their body and your geometry follows — a glowing skeleton, particles bursting from the hands, text pinned to the face. Add a MIDI control to fade it live and you have a performance-ready scene.
See also
Inputs — cameras and every other live source
Live performance — the live-performance overview
Points — what pose landmarks are made of
Fields — depth and masks as Fields
Rasters (images) — working with video frames
Particles — driving particles from tracked motion