Various types of Motion Tracking

Kinect & depth cameras
Vive Ultimate 
AI-based Motion Capture

Depth-sensing / markerless camera-based mocap

 

How it works:

  • Combines an RGB camera with an infrared depth sensor that measures distance per pixel

  • Tracks body skeletons in 3D space without any wearables (see the depth-reading sketch below).
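
As a minimal sketch of reading the raw depth stream, assuming an Azure Kinect and the community pyk4a wrapper (other Kinect models need other SDKs, so treat the library and calls below as one possible setup rather than the only one):

    # Read one depth frame from an Azure Kinect via pyk4a (pip install pyk4a).
    from pyk4a import PyK4A

    k4a = PyK4A()  # default config enables the RGB and depth streams
    k4a.start()

    capture = k4a.get_capture()
    depth = capture.depth  # 2D numpy array, distance per pixel in millimetres

    if depth is not None:
        h, w = depth.shape
        centre = depth[h // 2, w // 2]  # 0 means no valid reading at that pixel
        if centre > 0:
            print(f"centre pixel distance: {centre} mm")

    k4a.stop()

Skeleton tracking itself comes from a separate body-tracking SDK layered on top of depth frames like these.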

Strengths:

  • All-in-one: depth + skeleton tracking

  • Works out-of-the-box with good body tracking

  • Widely used in interactive installations and prototyping

Limitations:

  • Limited range, and sensitive to lighting conditions (the infrared depth sensor struggles in direct sunlight)

  • Skeleton tracking is less robust than pro systems

  • Requires a (Windows) PC and specific SDKs

In art, Kinect is great for:

  • Interactive performances

  • Visuals that respond to body movement (see the OSC sketch after this list)

  • Multi-user installations
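
Installations typically forward this tracking data to visual software (TouchDesigner, Max/MSP, Processing) over OSC. A minimal sending sketch with the python-osc library; the address /performer/position and the host/port below are placeholders for whatever your patch expects:

    # Forward a tracked body position over OSC (pip install python-osc).
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient("127.0.0.1", 9000)  # host/port of the visuals app

    def send_position(x: float, y: float, z: float) -> None:
        # One OSC message per update; the address is a placeholder
        client.send_message("/performer/position", [x, y, z])

    send_position(0.4, 1.2, 2.5)  # e.g. a joint position from the Kinect skeleton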

See more info on 3D depth cameras here

Inside-out inertial tracking with onboard cameras and IMUs
(think of it as a hybrid between inertial and AI/vision-based tracking)

 

How it works:

  • Unlike earlier Vive Trackers that rely on external Lighthouse base stations, the Ultimate Trackers use two onboard cameras and IMUs to track their position in space independently.

  • They perform inside-out tracking: the trackers observe the environment themselves rather than being observed by external sensors.

  • Designed to work with Vive XR systems, but are also being adopted for standalone tracking in XR, motion capture, and performance (see the pose-reading sketch below).
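
Because the trackers surface their poses through SteamVR / OpenVR, a minimal reading sketch with the pyopenvr bindings could look like this (an assumption: SteamVR is running and the trackers are paired; exact calls may differ per version and setup):

    # Print positions of all generic trackers via OpenVR (pip install openvr).
    import openvr

    vr = openvr.init(openvr.VRApplication_Background)
    poses = vr.getDeviceToAbsoluteTrackingPose(
        openvr.TrackingUniverseStanding, 0, openvr.k_unMaxTrackedDeviceCount)

    for i, pose in enumerate(poses):
        if not pose.bPoseIsValid:
            continue
        if vr.getTrackedDeviceClass(i) != openvr.TrackedDeviceClass_GenericTracker:
            continue
        m = pose.mDeviceToAbsoluteTracking   # 3x4 row-major transform
        x, y, z = m[0][3], m[1][3], m[2][3]  # translation column = position in metres
        print(f"tracker {i}: x={x:.2f} y={y:.2f} z={z:.2f}")

    openvr.shutdown()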

Strengths:

  • No need for external base stations; the trackers are fully wireless

  • Much more portable and scalable

  • Accurate enough for many art/performance uses

  • Easier multi-tracker setups

Limitations:

  • Still relatively new, with fewer integrations than the legacy Lighthouse-based trackers

  • Limited support in open-source or non-Vive environments (for now)

  • Needs a well-lit environment with visible features for the onboard cameras to track reliably

In art, Vive Ultimate is great for:

  • Untethered performer tracking

  • Object tracking in environments where base stations are impractical

  • Mobile or temporary installations where quick setup is needed


 

AI-based markerless mocap from standard cameras

How it works:

  • Uses a single camera (or a small number of cameras) and AI algorithms to detect and track body, face, and hand movement (a webcam sketch follows the examples below).

  • Examples include:

    • MediaPipe (Google): Real-time pose estimation in 2D or 3D

    • OpenPose: Widely used for body landmark detection

    • Move.ai: Advanced multi-camera AI mocap, often used with smartphones

    • DepthAI / OAK-D / ZED: Cameras with built-in AI processors that provide depth and pose data
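
As a minimal sketch of this approach: webcam pose estimation with MediaPipe's Python pose solution and OpenCV, printing one landmark (the nose) in normalised image coordinates:

    # Webcam pose estimation (pip install mediapipe opencv-python).
    import cv2
    import mediapipe as mp

    mp_pose = mp.solutions.pose
    cap = cv2.VideoCapture(0)  # default webcam

    with mp_pose.Pose(min_detection_confidence=0.5) as pose:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV delivers BGR
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
                print(f"nose: x={nose.x:.2f} y={nose.y:.2f}")  # 0..1 image coords
            cv2.imshow("pose", frame)
            if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
                break

    cap.release()
    cv2.destroyAllWindows()

The same landmark stream can drive visuals directly or be forwarded over OSC, like the Kinect example above.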

Strengths:

  • No suits or markers needed — just a (web)camera

  • Low cost, often free or open-source

  • Quick to set up, highly accessible for artists and educators

  • Can be embedded into web or mobile apps

  • Good for gesture-based interaction, web-based artworks, or low-budget capture

Limitations:

  • Generally less accurate than optical or inertial systems

  • Often limited to 2D or rough 3D estimation

  • Struggles with occlusion, fast movement, or unusual poses

  • Limited support for fine detail (like fingers or subtle facial expressions)