Various types of motion tracking: a comparison

Kinect & depth cameras  
 
 
 Vive Ultimate  
 
 
 AI-based Motion Capture 
 
 
 
 
 Kinect & depth cameras
 
 Depth-sensing / markerless camera-based mocap 
   
 How it works: 
 
 
 Combines an RGB camera with an infrared depth sensor. 
 
 
 Tracks body skeletons in 3D space without any wearables. 
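
To make the step from a depth image to 3D positions concrete, here is a minimal sketch of the standard pinhole back-projection that turns one depth pixel into a camera-space point. It is not taken from any Kinect SDK: the function name and the intrinsics (FX, FY, CX, CY) are hypothetical values chosen for illustration, and a real sensor would supply its own calibration.

```python
from typing import Tuple


def depth_pixel_to_point(u: float, v: float, depth_m: float,
                         fx: float, fy: float,
                         cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with a depth in metres to a 3D camera-space point."""
    x = (u - cx) * depth_m / fx   # horizontal offset from the optical axis
    y = (v - cy) * depth_m / fy   # vertical offset from the optical axis
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 depth image (not a real device calibration).
FX, FY, CX, CY = 580.0, 580.0, 320.0, 240.0

# A pixel at the image centre lies on the optical axis.
centre = depth_pixel_to_point(320, 240, 2.0, FX, FY, CX, CY)
# A pixel 290 px to the right of centre, also 2 m away.
right = depth_pixel_to_point(610, 240, 2.0, FX, FY, CX, CY)
print(centre, right)
```

Skeleton tracking then fits joints to points like these; that fitting step is done inside the sensor's SDK and is not shown here.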
 
 
 Strengths: 
 
 
 All-in-one: depth + skeleton tracking 
 
 
 Works out-of-the-box with good body tracking 
 
 
 Widely used in interactive installations and prototyping 
 
 
 Limitations: 
 
 
 Limited range; sensitive to lighting conditions 
 
 
 Skeleton tracking is less robust than in professional mocap systems 
 
 
 Requires a (Windows) PC and specific SDKs  
 
 
 In art, Kinect is great for: 
 
 
 Interactive performances 
 
 
 Visuals that respond to body movement 
 
 
 Multi-user installations 
 
 
 See more info on 3D depth cameras here 
 
 
 Vive Ultimate
 
 Inside-out inertial tracking with onboard cameras and IMUs (think of it as a hybrid between inertial and AI/vision-based tracking) 
   
 How it works: 
 
 
 Unlike earlier Vive Trackers that rely on external Lighthouse base stations, the Ultimate Trackers use two onboard cameras and IMUs to track their position in space independently. 
 
 
 They perform inside-out tracking, meaning they observe the environment with their own cameras rather than relying on external sensors placed in it. 
 
 
 Designed to work with Vive XR systems, but are also being adopted for standalone tracking in XR, motion capture, and performance. 
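
The hybrid idea of inertial plus vision-based tracking can be illustrated with a toy one-dimensional complementary filter. This is a sketch under stated assumptions, not HTC's actual algorithm: the class name, the blend factor, and the simulated IMU bias are all hypothetical. It only shows why a tracker that dead-reckons from an IMU benefits from regular camera-based position fixes.

```python
from typing import Optional


class ComplementaryTracker1D:
    """Toy 1D tracker: integrates IMU acceleration, then nudges toward camera fixes."""

    def __init__(self, blend: float = 0.1) -> None:
        self.blend = blend      # share of each camera fix mixed in (assumed value)
        self.position = 0.0     # metres
        self.velocity = 0.0     # metres per second

    def update(self, accel: float, dt: float,
               camera_position: Optional[float] = None) -> float:
        # Inertial step: dead-reckon from acceleration (drifts when used alone).
        self.velocity += accel * dt
        self.position += self.velocity * dt
        # Vision step: when a camera fix is available, pull the estimate toward it.
        if camera_position is not None:
            self.position += self.blend * (camera_position - self.position)
        return self.position


def simulate(use_camera: bool, steps: int = 100,
             dt: float = 0.01, bias: float = 0.5) -> float:
    """Simulate a stationary tracker whose IMU reports a constant false acceleration."""
    tracker = ComplementaryTracker1D(blend=0.1)
    for _ in range(steps):
        fix = 0.0 if use_camera else None   # true position is 0.0 throughout
        tracker.update(accel=bias, dt=dt, camera_position=fix)
    return tracker.position


imu_only = simulate(use_camera=False)
fused = simulate(use_camera=True)
print(f"IMU only drift: {imu_only:.4f} m, fused drift: {fused:.4f} m")
```

In this simulation the IMU-only estimate drifts much further from the true position than the fused one. It also hints at why the onboard cameras need light and visible features: without fixes, only the drifting inertial estimate remains.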
 
 
 Strengths: 
 
 
 No need for external base stations (fully wireless) 
 
 
 Much more portable and scalable 
 
 
 Accurate enough for many art/performance uses 
 
 
 Easier multi-tracker setups 
 
 
 Limitations: 
 
 
 Still relatively new — fewer integrations than legacy trackers 
 
 
 Limited support in open-source or non-Vive environments (for now) 
 
 
 Needs line of sight and light for the onboard cameras to function optimally 
 
 
 In art, Vive Ultimate is great for: 
 
 
 Untethered performer tracking 
 
 
 Object tracking in environments where base stations are impractical 
 
 
 Mobile or temporary installations where quick setup is needed 
 
 
 
 
 
   
 AI-based Motion Capture
 
 How it works: 
 
 
 Uses a single camera (or a small number of cameras) and AI algorithms to detect and track body, face, and hand movement. 
 
 
 Examples include: 
 
 
 MediaPipe (Google): Real-time pose estimation in 2D or 3D 
 
 
 OpenPose: Widely used for body landmark detection 
 
 
 Move.ai: Advanced multi-camera AI mocap, often used with smartphones 
 
 
 DepthAI / OAK-D / ZED: Cameras with built-in AI processors that provide depth and pose data 
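
Whichever tool is used, the output is typically a set of body landmarks per frame. As a rough sketch of what can be done with them, the snippet below computes the angle at a joint from three 2D landmarks. It assumes the landmarks arrive as (x, y) coordinates normalized to the image size, as MediaPipe Pose reports them. The helper names and the sample coordinates are made up for illustration, and no pose library is called.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_pixels(p: Point, width: int, height: int) -> Point:
    """Scale a normalized (0..1) landmark to pixels so angles are not distorted."""
    return (p[0] * width, p[1] * height)


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("landmarks coincide; angle is undefined")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Made-up normalized landmarks for one arm in a 640x480 frame.
WIDTH, HEIGHT = 640, 480
shoulder = to_pixels((0.5, 0.3), WIDTH, HEIGHT)
elbow = to_pixels((0.5, 0.5), WIDTH, HEIGHT)
wrist_bent = to_pixels((0.7, 0.5), WIDTH, HEIGHT)
wrist_straight = to_pixels((0.5, 0.7), WIDTH, HEIGHT)

bent = joint_angle(shoulder, elbow, wrist_bent)
straight = joint_angle(shoulder, elbow, wrist_straight)
print(f"bent elbow: {bent:.1f} deg, straight arm: {straight:.1f} deg")
```

An angle like this can drive a visual or sound parameter directly, which is often enough for gesture-based interaction even when only 2D estimates are available.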
 
 
 
 
 Pros: 
 
 
 No suits or markers needed — just a (web)camera 
 
 
 Low cost, often free or open-source 
 
 
 Quick to set up, highly accessible for artists and educators 
 
 
 Can be embedded into web or mobile apps 
 
 
 Good for gesture-based interaction, web-based artworks, or low-budget capture 
 
 
 Cons: 
 
 
 Generally less accurate than optical or inertial systems 
 
 
 Often limited to 2D or rough 3D estimation 
 
 
 Struggles with occlusion, fast movement, or unusual poses 
 
 
 Limited support for fine detail (like fingers or subtle facial expressions)
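
Jitter and brief occlusions are often softened in software. The following is a minimal sketch of one common approach, not something any of the tools above prescribe: an exponential moving average that ignores samples whose confidence falls below a threshold. The class name, the alpha value, and the threshold are assumptions, and the confidence score stands in for whatever per-landmark visibility or confidence value the chosen tool provides.

```python
from typing import List, Optional, Tuple


class LandmarkSmoother:
    """Smooths one landmark coordinate and holds its value during low-confidence frames."""

    def __init__(self, alpha: float = 0.5, min_confidence: float = 0.5) -> None:
        self.alpha = alpha                    # weight of the newest sample (assumed)
        self.min_confidence = min_confidence  # samples below this are ignored (assumed)
        self.value: Optional[float] = None

    def update(self, measurement: float, confidence: float) -> Optional[float]:
        if confidence < self.min_confidence:
            return self.value                 # likely occluded: keep last trusted value
        if self.value is None:
            self.value = measurement          # first trusted sample initializes the filter
        else:
            self.value = self.alpha * measurement + (1.0 - self.alpha) * self.value
        return self.value


# Made-up (x, confidence) samples; the third one mimics an occlusion glitch.
samples: List[Tuple[float, float]] = [(0.50, 0.9), (0.52, 0.9), (0.90, 0.1), (0.54, 0.9)]
smoother = LandmarkSmoother(alpha=0.5, min_confidence=0.5)
smoothed = [smoother.update(x, conf) for x, conf in samples]
print(smoothed)
```

The trade-off is latency: a lower alpha gives steadier output but lags further behind fast movement.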