
Camera- and AI-Based Mocap

AI- and webcam-based motion capture is a form of markerless motion capture: instead of using suits with sensors or reflective markers, the system analyzes video images from one or more cameras and estimates the position of the human body using computer vision and machine learning. It can often run using ordinary webcams, smartphones, or consumer cameras.

How AI-Based Mocap Works

1. Video Capture

A webcam, smartphone, DSLR, or multiple cameras record a performer moving through space.
Depending on the system:

  • Single-camera systems estimate movement from one viewpoint.
  • Multi-camera systems reconstruct the body more accurately in 3D.

2. Pose Estimation

AI models analyze each video frame and detect key body points such as:

  • head
  • shoulders
  • elbows
  • wrists
  • hips
  • knees
  • ankles

These points are often called landmarks or keypoints. The AI has been trained on massive datasets of human movement, allowing it to recognize body posture even under imperfect lighting or partial occlusion.
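In practice, each frame of pose estimation boils down to a list of named points with positions and a confidence score. The structure below is an illustrative sketch (the names and fields are invented, not any particular library's API), showing the kind of data a pose estimator hands back and why low-confidence points get filtered out:

```python
from dataclasses import dataclass

# Hypothetical minimal keypoint structure, similar in spirit to what
# pose-estimation libraries return. Names and fields are illustrative.
@dataclass
class Keypoint:
    name: str          # e.g. "left_elbow"
    x: float           # normalized horizontal position, 0.0 to 1.0
    y: float           # normalized vertical position, 0.0 to 1.0
    confidence: float  # how sure the model is about this point

# One frame of a pose is simply a list of such keypoints.
frame = [
    Keypoint("head", 0.51, 0.12, 0.98),
    Keypoint("left_shoulder", 0.42, 0.25, 0.95),
    Keypoint("left_elbow", 0.38, 0.40, 0.90),
]

# Occluded or uncertain points are typically dropped before further use.
visible = [kp for kp in frame if kp.confidence > 0.5]
print([kp.name for kp in visible])
```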

3. Skeleton Reconstruction

The detected points are connected into a digital skeleton or “rig.”
The software estimates:

  • body pose
  • joint rotation
  • movement speed
  • orientation in 3D space

Advanced systems may also track:

  • fingers
  • face
  • eye direction
  • hand gestures
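Joint rotation is usually derived from the keypoints themselves: three connected points define an angle at the middle joint. A minimal 2D sketch (real systems work in 3D and handle noise, but the geometry is the same idea):

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by points a-b-c, e.g.
    shoulder-elbow-wrist gives the elbow flexion angle."""
    # Vectors from the joint to its two neighbours.
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# A fully extended arm: shoulder, elbow, wrist in a line -> 180 degrees.
print(joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0)))  # 180.0
# A right-angle bend at the elbow -> 90 degrees.
print(joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))  # 90.0
```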

4. Retargeting

The motion data is transferred (“retargeted”) onto:

  • 3D avatars
  • virtual characters
  • particle systems
  • lighting systems
  • sound engines
  • projections
  • robots or kinetic sculptures

This allows movement to control digital media either from recordings or live, in real time.
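At its simplest, retargeting is a mapping from the tracker's keypoint names onto the target rig's bone names. The toy sketch below uses invented bone names for illustration; real retargeting is far more involved (bone lengths, rotation spaces, skeleton hierarchies), but the core remapping step looks like this:

```python
# Toy retargeting: motion data named by keypoint is remapped onto a
# target rig whose bones use different names. Bone names are invented
# for illustration only.
KEYPOINT_TO_BONE = {
    "left_shoulder": "L_UpperArm",
    "left_elbow": "L_Forearm",
    "left_wrist": "L_Hand",
}

def retarget(frame):
    """Map a {keypoint: (x, y)} frame onto {bone: (x, y)}."""
    return {
        KEYPOINT_TO_BONE[name]: pos
        for name, pos in frame.items()
        if name in KEYPOINT_TO_BONE
    }

frame = {"left_shoulder": (0.42, 0.25), "left_elbow": (0.38, 0.40)}
print(retarget(frame))
```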


Example Tools:

MediaPipe

OpenPose

FreeMoCap

MediaPipe is an open-source framework developed by Google for real-time AI perception.

It includes models for:

  • body tracking
  • hand tracking
  • face tracking
  • gesture recognition

MediaPipe is widely used because it:

  • works in browsers, Python, mobile apps, and game engines
  • runs efficiently on consumer hardware
  • supports real-time interaction
  • is relatively easy to integrate into creative coding environments

MediaPipe is often connected to:

  • TouchDesigner
  • Unity
  • Unreal Engine
  • Resolume
  • Blender
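One practical detail when feeding tracking data into environments like TouchDesigner or Unity: raw per-frame keypoints jitter, so real-time pipelines usually smooth them before they drive visuals. A minimal exponential-moving-average filter, written here as a generic stand-in (not MediaPipe's own smoothing):

```python
class LandmarkSmoother:
    """Exponential moving average over per-frame landmark coordinates.
    alpha near 1.0 follows the raw signal closely; smaller alpha
    smooths more but adds latency. A simple stand-in for the filters
    real-time pipelines apply before driving visuals."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = None  # no previous frame yet

    def update(self, landmarks):
        if self.state is None:
            self.state = list(landmarks)  # first frame passes through
        else:
            self.state = [
                self.alpha * new + (1 - self.alpha) * old
                for new, old in zip(landmarks, self.state)
            ]
        return self.state

smoother = LandmarkSmoother(alpha=0.5)
print(smoother.update([0.0, 0.0]))  # first frame: unchanged
print(smoother.update([1.0, 1.0]))  # halfway between old and new
```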

OpenPose

OpenPose is one of the foundational open-source AI pose estimation systems, developed at Carnegie Mellon University.

Widely used in:

  • research
  • interactive installations
  • experimental audiovisual systems
  • dance technology
  • real-time applications

It can track:

  • full body
  • hands
  • fingers
  • facial landmarks
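OpenPose can export its detections to JSON (one file per frame), where each person's keypoints appear as a flat list of (x, y, confidence) triples. The snippet below parses a hand-written sample in that style; the numbers are made up, not real OpenPose output:

```python
import json

# A hand-written stand-in for an OpenPose JSON export: each person's
# "pose_keypoints_2d" is a flat [x, y, c, x, y, c, ...] list.
sample = json.loads("""
{
  "people": [
    {"pose_keypoints_2d": [320.0, 120.0, 0.92, 300.0, 180.0, 0.88, 0.0, 0.0, 0.0]}
  ]
}
""")

def unpack(flat):
    """Group a flat keypoint list into (x, y, confidence) triples,
    dropping undetected points (reported with zero confidence)."""
    triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return [t for t in triples if t[2] > 0]

for person in sample["people"]:
    print(unpack(person["pose_keypoints_2d"]))
```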

FreeMoCap

FreeMoCap is an open-source motion capture system focused on accessible, research-based full-body capture.

Unlike simple single-webcam pose estimation, FreeMoCap can use:

  • multiple webcams
  • synchronized cameras
  • AI pose estimation
  • biomechanical reconstruction

It combines tools such as:

  • MediaPipe
  • OpenCV
  • scientific motion analysis workflows

The system triangulates body positions from multiple camera angles to reconstruct motion in 3D space more accurately than a single webcam setup.
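The intuition behind that triangulation: a point's apparent shift between two camera views (its disparity) shrinks as the point gets farther away. The sketch below shows the simplest case of two parallel cameras; FreeMoCap's actual multi-view reconstruction is more general, and all numbers here are made up for illustration:

```python
# Toy illustration of why multiple cameras recover depth. With two
# parallel cameras, depth follows the classic stereo relation
# Z = f * B / disparity. Values below are invented for the example.
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth (metres) of a point seen at x_left_px in the left image
    and x_right_px in the right image."""
    disparity = x_left_px - x_right_px  # horizontal shift in pixels
    return focal_px * baseline_m / disparity

# Focal length 800 px, cameras 0.5 m apart, point shifted 40 px
# between the two views -> the point is 10 m away.
print(depth_from_disparity(800, 0.5, 640, 600))  # 10.0
```

A single camera has no disparity to measure, which is why single-webcam systems can only estimate depth, while multi-camera setups can compute it.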

FreeMoCap is aimed mostly at recorded capture rather than real-time use.

Move.ai

DeepMotion

Rokoko Vision

Move.ai

Uses AI and multiple cameras or phones for high-quality markerless mocap without suits. Often used in:

  • virtual production
  • indie filmmaking
  • game animation
  • previs workflows

Mostly processed (offline) capture, though some live workflows are emerging.

DeepMotion (https://www.deepmotion.com/)

Allows users to upload ordinary video footage and generate motion capture animation automatically using AI.

Useful for:

  • rapid prototyping
  • avatar animation
  • virtual influencers
  • metaverse applications

The free version tracks up to two people and does not run in real time.

Rokoko Vision

Rokoko is originally known for its inertial mocap suits, but now also supports AI webcam-based tracking.

Popular in:

  • indie animation
  • live performance
  • virtual characters
  • VTubing
  • interactive installations

Free to use for recordings of up to 15 seconds.