Camera (&AI) based Mocap

AI- and webcam-based motion capture is a form of markerless motion capture: instead of using suits with sensors or reflective markers, the system analyzes video images from one or more cameras and estimates the position of the human body using computer vision and machine learning. It can often run using ordinary webcams, smartphones, or consumer cameras.

How AI-Based Mocap Works

1. Video Capture

A webcam, smartphone, DSLR, or multiple cameras record a performer moving through space.
Depending on the system:

Single-camera systems estimate movement from one viewpoint.
Multi-camera systems reconstruct the body more accurately in 3D.
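To illustrate why a second viewpoint helps, here is a minimal sketch of two-camera triangulation using the midpoint method. It assumes the cameras are already calibrated, so that each camera's position and the direction of its viewing ray toward the same body point are known. The function name `triangulate_midpoint` and the example coordinates are hypothetical; real multi-camera tools use more robust methods.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point from two viewing rays (hypothetical helper).

    Each ray starts at a camera position (o1, o2) and points along a
    direction (d1, d2). The estimate is the midpoint of the shortest
    segment connecting the two rays.
    """
    w = _sub(o1, o2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    t = (b * e - c * d) / denom  # distance parameter along ray 1
    s = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = _add(o1, _scale(d1, t))  # closest point on ray 1
    p2 = _add(o2, _scale(d2, s))  # closest point on ray 2
    return _scale(_add(p1, p2), 0.5)


# Assumed setup: two cameras 2 units apart, both seeing the same point.
point = triangulate_midpoint((0, 0, 0), (1, 1, 5), (2, 0, 0), (-1, 1, 5))
```

With a single camera only one ray is available, so depth along that ray has to be inferred by the model rather than measured.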

2. Pose Estimation

AI models analyze each video frame and detect key body points such as:

head
shoulders
elbows
wrists
hips
knees
ankles

These points are often called landmarks or keypoints. The models are trained on large datasets of images and video of people, which lets them recognize body posture even under imperfect lighting or partial occlusion.
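As a rough illustration of what a pose estimator hands back, the sketch below stores each keypoint as a normalized position plus a confidence score, then converts the confident ones to pixel coordinates. The `Keypoint` class, the `to_pixels` helper, the 0.5 threshold, and the example values are assumptions for illustration, not the output format of any particular library.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Keypoint:
    x: float           # normalized horizontal position, 0.0 (left) to 1.0 (right)
    y: float           # normalized vertical position, 0.0 (top) to 1.0 (bottom)
    confidence: float  # how sure the model is, 0.0 to 1.0


def to_pixels(
    pose: Dict[str, Keypoint],
    width: int,
    height: int,
    min_confidence: float = 0.5,
) -> Dict[str, Tuple[int, int]]:
    """Convert confident keypoints to pixel coordinates and drop the rest."""
    pixels = {}
    for name, kp in pose.items():
        if kp.confidence >= min_confidence:
            pixels[name] = (round(kp.x * width), round(kp.y * height))
    return pixels


# Hypothetical single-frame result: the right wrist is partly occluded.
example_pose = {
    "left_wrist": Keypoint(x=0.25, y=0.5, confidence=0.9),
    "right_wrist": Keypoint(x=0.75, y=0.5, confidence=0.2),
}
print(to_pixels(example_pose, width=640, height=480))
```

Filtering by confidence is one simple way to avoid reacting to keypoints the model is only guessing at, for example a hand hidden behind the body.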

3. Skeleton Reconstruction

The detected points are connected into a digital skeleton or “rig.”
The software estimates:

body pose
joint rotation
movement speed
orientation in 3D space

Advanced systems may also track:

fingers
face
eye direction
hand gestures
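Two of the quantities estimated in this step, joint rotation and movement speed, can be approximated directly from keypoints. The sketch below is a simplification: it computes a single bend angle at a joint rather than a full 3D rotation, and it assumes a constant frame rate. The function names and sample coordinates are hypothetical.

```python
import math
from typing import Sequence


def joint_angle(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    Works for 2D or 3D points, e.g. shoulder, elbow, wrist for the elbow bend.
    """
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("points must not coincide with the joint")
    cosine = sum(u * v for u, v in zip(ba, bc)) / norm
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def speed(previous: Sequence[float], current: Sequence[float], fps: float) -> float:
    """Distance travelled per second between two consecutive frames."""
    return math.dist(previous, current) * fps


# Hypothetical keypoints: an arm bent at a right angle.
elbow_bend = joint_angle(a=(0.0, 0.0), b=(1.0, 0.0), c=(1.0, 1.0))
# Hypothetical wrist positions in two consecutive frames at 30 frames per second.
wrist_speed = speed(previous=(0.0, 0.0), current=(3.0, 4.0), fps=30.0)
```

The speed is expressed in whatever units the keypoints use (pixels, normalized units, or meters) per second.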

4. Retargeting

The motion data is transferred (“retargeted”) onto:

3D avatars
virtual characters
particle systems
lighting systems
sound engines
projections
robots or kinetic sculptures

This allows a performer's movement to control digital media, either from recordings or in real time.
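For non-character targets such as lighting, the mapping can be as simple as scaling one tracked value into a control range. The sketch below maps wrist height to a brightness value and smooths it with an exponential moving average to reduce jitter. It assumes normalized keypoint coordinates where y = 0.0 is the top of the frame; the names `map_range`, `Smoother`, and `wrist_to_brightness` and the 0 to 255 range are illustrative choices, not part of any specific tool.

```python
from typing import Optional


def map_range(value: float, in_min: float, in_max: float,
              out_min: float, out_max: float) -> float:
    """Linearly map value from the input range to the output range, clamped."""
    if in_max == in_min:
        raise ValueError("input range must not be empty")
    t = (value - in_min) / (in_max - in_min)
    t = max(0.0, min(1.0, t))
    return out_min + t * (out_max - out_min)


class Smoother:
    """Exponential moving average; a lower alpha gives smoother, slower output."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample  # first sample initializes the average
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


def wrist_to_brightness(wrist_y: float, smoother: Smoother) -> int:
    """Raise the hand (smaller y) to get a brighter light, from 0 to 255."""
    raw = map_range(wrist_y, in_min=1.0, in_max=0.0, out_min=0.0, out_max=255.0)
    return round(smoother.update(raw))
```

In a live setup, `wrist_to_brightness` would be called once per frame and the result sent on to the lighting or media system.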

Example Tools:

MediaPipe

MediaPipe is an open-source framework developed by Google for real-time AI perception. It includes models for:

body tracking
hand tracking
face tracking
gesture recognition

MediaPipe is widely used because it:

works in browsers, Python, mobile apps, and game engines
runs efficiently on consumer hardware
supports real-time interaction
is relatively easy to integrate into creative coding environments

MediaPipe is often connected to:

TouchDesigner
Unity
Unreal Engine
Resolume
Blender

OpenPose

One of the foundational open-source AI pose estimation systems developed by Carnegie Mellon University. Widely used in:

research
interactive installations
experimental audiovisual systems
dance technology

Real-time.

It can track:

full body
hands
fingers
facial landmarks

FreeMoCap

FreeMoCap is an open-source motion capture system focused on accessible, research-based full-body capture. Unlike simple webcam pose estimation, FreeMoCap can use:

multiple webcams
synchronized cameras
AI pose estimation
biomechanical reconstruction

It combines tools such as:

MediaPipe
OpenCV
scientific motion analysis workflows

The system triangulates body positions from multiple camera angles to reconstruct motion in 3D space more accurately than a single webcam setup.

Mostly for recordings, not realtime.

Move.ai

Uses AI and multiple cameras or phones for high-quality markerless mocap without suits. Often used in:

virtual production
indie filmmaking
game animation
previs workflows

Mostly processed capture, some live workflows emerging.

DeepMotion

https://www.deepmotion.com/

Allows users to upload ordinary video footage and generate motion capture animation automatically using AI. Useful for:

rapid prototyping
avatar animation
virtual influencers
metaverse applications

Free version tracks up to 2 people, not realtime.

Rokoko Vision

Originally known for inertial mocap suits, but now also supports AI webcam-based tracking. Popular in:

indie animation
live performance
virtual characters
VTubing
interactive installations

Free for up to 15 seconds.
