# Various types of Motion Tracking, a comparison
##### **Kinect & depth cameras**

Depth-sensing / markerless camera-based mocap.

**How it works:**
- Combines an RGB camera with an infrared depth sensor
- Tracks body skeletons in 3D space without any wearables (a minimal Python sketch appears at the end of this page)

**Strengths:**
- All-in-one: depth sensing plus skeleton tracking
- Works out of the box with good body tracking
- Widely used in interactive installations and prototyping

**Limitations:**
- Limited range and sensitive to lighting conditions
- Skeleton tracking is less robust than professional systems
- Requires a (Windows) PC and specific SDKs

**In art, Kinect is great for:**
- Interactive performances
- Visuals that respond to body movement
- Multi-user installations

[**See more info on 3D depth cameras here**](https://bookstack.hku.nl/books/3d-depth-cameras-motion-tracking "3d")

##### **Vive Ultimate**

Inside-out inertial tracking with onboard cameras and IMUs (think of it as a hybrid between inertial and AI/vision-based tracking).

**How it works:**
- Unlike earlier Vive Trackers, which rely on external Lighthouse base stations, the Ultimate Trackers use two onboard cameras and IMUs to track their position in space independently.
- They perform inside-out tracking, meaning they see the environment rather than relying on it.
- Designed to work with Vive XR systems, but also being adopted for standalone tracking in XR, motion capture, and performance (see the SteamVR polling sketch at the end of this page).

**Strengths:**
- No need for external base stations (fully wireless)
- Much more portable and scalable
- Accurate enough for many art/performance uses
- Easier multi-tracker setups

**Limitations:**
- Still relatively new, with fewer integrations than legacy trackers
- Limited support in open-source or non-Vive environments (for now)
- Needs line of sight and light for the onboard cameras to function optimally

**In art, Vive Ultimate is great for:**
- Untethered performer tracking
- Object tracking in environments where base stations are impractical
- Mobile or temporary installations where quick setup is needed

##### **AI-based Motion Capture**

**How it works:**
- Uses a single camera (or a small number of cameras) and AI algorithms to detect and track body, face, and hand movement (see the MediaPipe sketch at the end of this page).
- Examples include:
  - MediaPipe (Google): real-time pose estimation in 2D or 3D
  - OpenPose: widely used for body landmark detection
  - Move.ai: advanced multi-camera AI mocap, often used with smartphones
  - DepthAI / OAK-D / ZED: cameras with built-in AI processors that provide depth and pose data

**Strengths:**
- No suits or markers needed, just a (web)camera
- Low cost, often free or open source
- Quick to set up, highly accessible for artists and educators
- Can be embedded into web or mobile apps
- Good for gesture-based interaction, web-based artworks, or low-budget capture

**Limitations:**
- Generally less accurate than optical or inertial systems
- Often limited to 2D or rough 3D estimation
- Struggles with occlusion, fast movement, or unusual poses
- Limited support for fine detail (like fingers or subtle facial expressions)
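To make the depth-camera idea concrete, here is a minimal sketch that reads a single depth frame. It assumes an Azure Kinect and the community `pyk4a` binding (both are assumptions, not the only route; the official Kinect SDKs are accessed differently), but any depth camera whose SDK exposes the depth image as a NumPy array can drive an installation the same way.

```python
# Minimal depth-frame sketch, assuming an Azure Kinect and the
# community pyk4a binding (pip install pyk4a).
import numpy as np
from pyk4a import PyK4A

k4a = PyK4A()  # default configuration
k4a.start()

capture = k4a.get_capture()
if capture.depth is not None:
    depth = capture.depth  # uint16 NumPy array, distances in millimetres
    h, w = depth.shape
    print(f"depth image {w}x{h}, distance at centre: {depth[h // 2, w // 2]} mm")
    # A crude interaction trigger: is anything within one metre?
    # (0 means "no depth reading" and must be excluded.)
    near = np.count_nonzero((depth > 0) & (depth < 1000))
    print(f"{near} pixels closer than 1 m")

k4a.stop()
```

In a real installation this loop would run continuously and feed the "near pixels" count (or the full skeleton from the body-tracking SDK) into the visuals.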
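For the Vive Ultimate Trackers, one practical route is to stream them into SteamVR and poll their poses through OpenVR. The sketch below uses the `pyopenvr` binding and assumes the trackers appear as generic trackers in SteamVR; both the binding and that setup are assumptions (the exact call signature also varies a little between `pyopenvr` versions), not a Vive-documented API.

```python
# Polling tracker positions from SteamVR via pyopenvr (pip install openvr).
# Assumption: the Vive Ultimate Trackers are streamed into SteamVR and
# therefore show up as generic trackers.
import openvr

vr_system = openvr.init(openvr.VRApplication_Other)
try:
    # Pre-allocate a pose array for every possible tracked device.
    poses = (openvr.TrackedDevicePose_t * openvr.k_unMaxTrackedDeviceCount)()
    poses = vr_system.getDeviceToAbsoluteTrackingPose(
        openvr.TrackingUniverseStanding, 0, poses)
    for i, pose in enumerate(poses):
        if not pose.bPoseIsValid:
            continue
        if (vr_system.getTrackedDeviceClass(i)
                != openvr.TrackedDeviceClass_GenericTracker):
            continue
        m = pose.mDeviceToAbsoluteTracking  # 3x4 row-major pose matrix
        x, y, z = m[0][3], m[1][3], m[2][3]  # translation column = position
        print(f"tracker {i}: x={x:.3f} y={y:.3f} z={z:.3f} (metres)")
finally:
    openvr.shutdown()
```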
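And for AI-based capture, MediaPipe's Python API makes the "just a webcam" claim easy to verify. This sketch runs the off-the-shelf pose model on a live webcam feed, draws the skeleton, and prints one of the 33 body landmarks; the choice of the nose landmark and the window handling are illustrative.

```python
# Webcam pose estimation with MediaPipe's Python solutions API
# (pip install mediapipe opencv-python).
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 landmarks, x/y normalized to [0, 1], z relative to the hips.
            nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
            print(f"nose: x={nose.x:.2f} y={nose.y:.2f}")
            mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
        cv2.imshow("pose", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```

Because the landmarks are normalized image coordinates rather than metric 3D positions, this illustrates the "rough 3D estimation" limitation noted above: fine for gesture-driven visuals, less so for precise spatial capture.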