# Various types of Motion Tracking, a comparison
##### **Kinect & depth cameras**

Depth-sensing / markerless camera-based mocap.

**How it works:**
- Combines an RGB camera with an infrared depth sensor
- Tracks body skeletons in 3D space without any wearables (a minimal Python sketch appears at the end of this page)

**Strengths:**
- All-in-one: depth sensing plus skeleton tracking
- Works out of the box with good body tracking
- Widely used in interactive installations and prototyping

**Limitations:**
- Limited range and sensitive to lighting conditions
- Skeleton tracking is less robust than professional systems
- Requires a (Windows) PC and specific SDKs

**In art, Kinect is great for:**
- Interactive performances
- Visuals that respond to body movement
- Multi-user installations

[**See more info on 3D depth cameras here**](https://bookstack.hku.nl/books/3d-depth-cameras-motion-tracking "3d")

##### **Vive Ultimate**

Inside-out inertial tracking with onboard cameras and IMUs (think of it as a hybrid between inertial and AI/vision-based tracking).

**How it works:**
- Unlike earlier Vive Trackers, which rely on external Lighthouse base stations, the Ultimate Trackers use two onboard cameras and IMUs to track their position in space independently.
- They perform inside-out tracking, meaning they see the environment rather than relying on it.
- Designed to work with Vive XR systems, but also being adopted for standalone tracking in XR, motion capture, and performance (see the SteamVR polling sketch at the end of this page).

**Strengths:**
- No need for external base stations (fully wireless)
- Much more portable and scalable
- Accurate enough for many art/performance uses
- Easier multi-tracker setups

**Limitations:**
- Still relatively new, with fewer integrations than legacy trackers
- Limited support in open-source or non-Vive environments (for now)
- Needs line of sight and light for the onboard cameras to function optimally

**In art, Vive Ultimate is great for:**
- Untethered performer tracking
- Object tracking in environments where base stations are impractical
- Mobile or temporary installations where quick setup is needed

##### **AI-based Motion Capture**

**How it works:**
- Uses a single camera (or a small number of cameras) and AI algorithms to detect and track body, face, and hand movement (see the MediaPipe sketch at the end of this page).
- Examples include:
  - MediaPipe (Google): real-time pose estimation in 2D or 3D
  - OpenPose: widely used for body landmark detection
  - Move.ai: advanced multi-camera AI mocap, often used with smartphones
  - DepthAI / OAK-D / ZED: cameras with built-in AI processors that provide depth and pose data

**Strengths:**
- No suits or markers needed, just a (web)camera
- Low cost, often free or open source
- Quick to set up, highly accessible for artists and educators
- Can be embedded into web or mobile apps
- Good for gesture-based interaction, web-based artworks, or low-budget capture

**Limitations:**
- Generally less accurate than optical or inertial systems
- Often limited to 2D or rough 3D estimation
- Struggles with occlusion, fast movement, or unusual poses
- Limited support for fine detail (like fingers or subtle facial expressions)
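To make the depth-camera idea concrete, here is a minimal sketch that reads a single depth frame. It assumes an Azure Kinect and the community `pyk4a` binding (both are assumptions, not the only route; the official Kinect SDKs are accessed differently), but any depth camera whose SDK exposes the depth image as a NumPy array can drive an installation the same way.

```python
# Minimal depth-frame sketch, assuming an Azure Kinect and the
# community pyk4a binding (pip install pyk4a).
import numpy as np
from pyk4a import PyK4A

k4a = PyK4A()  # default configuration
k4a.start()

capture = k4a.get_capture()
if capture.depth is not None:
    depth = capture.depth  # uint16 NumPy array, distances in millimetres
    h, w = depth.shape
    print(f"depth image {w}x{h}, distance at centre: {depth[h // 2, w // 2]} mm")
    # A crude interaction trigger: is anything within one metre?
    # (0 means "no depth reading" and must be excluded.)
    near = np.count_nonzero((depth > 0) & (depth < 1000))
    print(f"{near} pixels closer than 1 m")

k4a.stop()
```

In a real installation this loop would run continuously and feed the "near pixels" count (or the full skeleton from the body-tracking SDK) into the visuals.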
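For the Vive Ultimate Trackers, one practical route is to stream them into SteamVR and poll their poses through OpenVR. The sketch below uses the `pyopenvr` binding and assumes the trackers appear as generic trackers in SteamVR; both the binding and that setup are assumptions (the exact call signature also varies a little between `pyopenvr` versions), not a Vive-documented API.

```python
# Polling tracker positions from SteamVR via pyopenvr (pip install openvr).
# Assumption: the Vive Ultimate Trackers are streamed into SteamVR and
# therefore show up as generic trackers.
import openvr

vr_system = openvr.init(openvr.VRApplication_Other)
try:
    # Pre-allocate a pose array for every possible tracked device.
    poses = (openvr.TrackedDevicePose_t * openvr.k_unMaxTrackedDeviceCount)()
    poses = vr_system.getDeviceToAbsoluteTrackingPose(
        openvr.TrackingUniverseStanding, 0, poses)
    for i, pose in enumerate(poses):
        if not pose.bPoseIsValid:
            continue
        if (vr_system.getTrackedDeviceClass(i)
                != openvr.TrackedDeviceClass_GenericTracker):
            continue
        m = pose.mDeviceToAbsoluteTracking  # 3x4 row-major pose matrix
        x, y, z = m[0][3], m[1][3], m[2][3]  # translation column = position
        print(f"tracker {i}: x={x:.3f} y={y:.3f} z={z:.3f} (metres)")
finally:
    openvr.shutdown()
```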
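And for AI-based capture, MediaPipe's Python API makes the "just a webcam" claim easy to verify. This sketch runs the off-the-shelf pose model on a live webcam feed, draws the skeleton, and prints one of the 33 body landmarks; the choice of the nose landmark and the window handling are illustrative.

```python
# Webcam pose estimation with MediaPipe's Python solutions API
# (pip install mediapipe opencv-python).
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 landmarks, x/y normalized to [0, 1], z relative to the hips.
            nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
            print(f"nose: x={nose.x:.2f} y={nose.y:.2f}")
            mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
        cv2.imshow("pose", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```

Because the landmarks are normalized image coordinates rather than metric 3D positions, this illustrates the "rough 3D estimation" limitation noted above: fine for gesture-driven visuals, less so for precise spatial capture.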