Various types of MoCap, a comparison

MoCap, short for motion capture, is a technique used to digitally record movement. In art, it's a tool that allows creators to translate physical gestures into digital data that can be used to generate or manipulate digital work.

What is MoCap?

Motion capture often involves placing sensors or markers on a person’s body (or using camera-based systems) to track movement in 3D space. This data is then sent to software that interprets the motion and applies it to a digital avatar, 3D model, or visual system.

Examples of use:

Live performance & dance: People wearing MoCap suits can control visuals, sound, or avatars in real time, turning their movement into an interactive experience.
Digital puppetry: Performers use MoCap to animate virtual characters that mirror their movements, creating storytelling pieces or interactive experiences.
Film & animation: MoCap can be used to create detailed, lifelike animation without manual keyframing.
Interactive installations: Viewers’ movements can be captured and visualized, making them part of the artwork.
Experimental art & research: MoCap enables artists to explore themes like embodiment, identity, or data aesthetics by abstracting or transforming movement.

Why artists use it

Expressiveness: It captures the nuance of real human motion.
Efficiency: Complex animations can be recorded rather than animated by hand.
Interactivity: MoCap allows for responsive, real-time work—art that moves because you move.
Hybrid creation: It bridges physical and digital realms, letting artists craft performances or immersive visuals that live in both.

There are various types of MoCap:

Optical Motion Capture

Inertial Motion Capture

AI-based Motion Capture

Optical Motion Capture

How it works:

Uses cameras (usually infrared) to track reflective markers or colored dots placed on the performer.
Multiple cameras triangulate the position of each marker in 3D space.
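To make the triangulation step concrete, here is a minimal sketch in plain Python. It assumes each calibrated camera has already turned its 2D marker detection into a ray (a camera position plus a direction toward the marker), and it estimates the marker as the midpoint of the shortest segment between two such rays. The function name `triangulate_midpoint` and the helper functions are made up for this illustration; commercial systems use more cameras and more robust solvers.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a marker position from two camera rays.

    Ray 1 is o1 + s * d1 and ray 2 is o2 + t * d2. Because of noise the
    rays rarely intersect exactly, so we return the midpoint of the
    shortest segment connecting them.
    """
    w = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = add(o1, scale(d1, s))
    p2 = add(o2, scale(d2, t))
    return scale(add(p1, p2), 0.5)


# Two hypothetical cameras 4 m apart, both seeing a marker at (2, 1, 3).
marker = triangulate_midpoint(
    (0.0, 0.0, 0.0), (2.0, 1.0, 3.0),
    (4.0, 0.0, 0.0), (-2.0, 1.0, 3.0),
)
print(marker)
```

The parallel-ray check also hints at why camera placement matters: two cameras looking along almost the same line give a poorly conditioned estimate.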

Variants:

Passive optical (uses reflective markers + infrared light, e.g., Vicon or OptiTrack)
Active optical (uses LED markers that emit their own light)

Pros:

Very accurate spatial tracking
Excellent for large-scale and high-precision capture (e.g., dance, film, games)
Good for multiple actors and full-body motion

Cons:

Requires a studio setup with multiple calibrated cameras
Sensitive to occlusion (when a marker is hidden from view)
Expensive
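Occlusion shows up in the recorded data as frames where a marker has no position. As a rough illustration of how such gaps can be patched afterwards, the sketch below linearly interpolates one coordinate of a marker across missing frames. The function name `fill_gaps` and the use of `None` for a hidden marker are assumptions made for this example, and it is a deliberately simple stand-in rather than the gap-filling method of any particular tool.

```python
from typing import List, Optional


def fill_gaps(samples: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate runs of None that have valid samples on both sides.

    Gaps at the very start or end are left as None, because there is
    nothing to interpolate towards.
    """
    filled = list(samples)
    last_valid = None  # index of the most recent frame that had a value
    for i, value in enumerate(samples):
        if value is None:
            continue
        if last_valid is not None and i - last_valid > 1:
            start = samples[last_valid]
            step = (value - start) / (i - last_valid)
            for k in range(last_valid + 1, i):
                filled[k] = start + step * (k - last_valid)
        last_valid = i
    return filled


# One coordinate of a marker that was hidden for two frames, then lost again.
print(fill_gaps([0.0, None, None, 3.0, None]))
```

Linear interpolation only looks plausible for short gaps; a long occlusion during a fast movement usually has to be re-recorded or cleaned up by hand.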

Inertial Motion Capture

How it works:

Uses IMUs (Inertial Measurement Units), which are small sensors containing gyroscopes and accelerometers.
Sensors are worn in a suit (e.g., Rokoko, Xsens) and measure rotation and acceleration to calculate joint angles and movement.
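One common textbook way to turn gyroscope and accelerometer readings into an orientation is a complementary filter. The sketch below tracks a single pitch angle. It is not a description of what Rokoko or Xsens run internally, and the axis convention, the function names, and the blend factor `alpha` are assumptions chosen for this example.

```python
import math
from typing import Tuple


def accel_pitch(ax: float, ay: float, az: float) -> float:
    """Pitch (radians) implied by the direction of gravity.

    Only meaningful when the sensor is not accelerating much, and it
    assumes one particular axis convention (z up when level).
    """
    return math.atan2(-ax, math.sqrt(ay * ay + az * az))


def complementary_filter(
    pitch: float,
    gyro_rate: float,
    accel: Tuple[float, float, float],
    dt: float,
    alpha: float = 0.98,
) -> float:
    """Blend the integrated gyro angle with the accelerometer angle.

    The gyroscope is smooth but drifts; the accelerometer is noisy but
    does not drift. alpha close to 1 trusts the gyroscope short-term.
    """
    gyro_estimate = pitch + gyro_rate * dt
    return alpha * gyro_estimate + (1.0 - alpha) * accel_pitch(*accel)


# A level sensor at rest, starting from a wrong estimate of 0.5 rad:
pitch = 0.5
for _ in range(500):
    pitch = complementary_filter(pitch, 0.0, (0.0, 0.0, 9.81), dt=0.01)
print(pitch)  # has decayed towards 0, the true pitch
```

A full suit repeats this kind of fusion in 3D for every sensor and then derives joint angles from the relative orientation of neighbouring body segments.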

Variants:

Can be combined with optical MoCap to improve positional precision.

Pros:

Portable: Can be used anywhere, indoors or outdoors
Not affected by lighting or line-of-sight
Great for live performance, field work, and small studios

Cons:

Less accurate in tracking absolute position (especially in large spaces)
Susceptible to drift over time (though software can correct this)
Some kinds of locomotion, such as jumping or climbing, are harder to capture.
Rokoko: frustrating glitches, and a subscription is needed for real-time use.
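The drift and absolute-position problems follow from simple arithmetic: position is obtained by integrating acceleration twice, so even a tiny constant sensor error grows with the square of time. The sketch below simulates that for a sensor that is actually standing still. The bias value and the function name `position_error_from_bias` are illustrative assumptions, not measurements from a real suit.

```python
def position_error_from_bias(bias: float, duration: float, dt: float = 0.01) -> float:
    """Position error (m) after double-integrating a constant accelerometer bias.

    bias is in m/s^2, duration and dt in seconds. The true acceleration
    is zero, so everything accumulated here is pure drift.
    """
    velocity = 0.0
    position = 0.0
    for _ in range(round(duration / dt)):
        velocity += bias * dt      # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


# A hypothetical bias of 0.01 m/s^2 (about a thousandth of gravity) for one minute.
error = position_error_from_bias(0.01, 60.0)
print(f"{error:.1f} m")  # close to the closed form 0.5 * bias * t**2 = 18 m
```

This is why inertial systems lean on extra constraints, such as foot-contact detection or an external position reference, rather than on raw integration alone.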

AI-based Motion Capture

~~How it works:~~

~~Uses a single camera (or a small number of cameras) and AI algorithms to detect and track body, face, and hand movement.~~

~~Examples include:~~
- ~~MediaPipe (Google): Real-time pose estimation in 2D or 3D~~
- ~~OpenPose: Widely used for body landmark detection~~
- ~~Move.ai: Advanced multi-camera AI mocap, often used with smartphones~~
- ~~DepthAI / OAK-D / ZED: Cameras with built-in AI processors that provide depth and pose data~~

~~Pros:~~

~~No suits or markers needed — just a (web)camera~~

~~Low cost, often free or open-source~~

~~Quick to set up, highly accessible for artists and educators~~

~~Can be embedded into web or mobile apps~~

~~Good for gesture-based interaction, web-based artworks, or low-budget capture~~

~~Cons:~~

~~Generally less accurate than optical or inertial systems~~

~~Often limited to 2D or rough 3D estimation~~

~~Struggles with occlusion, fast movement, or unusual poses~~

~~Limited support for fine detail (like fingers or subtle facial expressions)~~

Some systems combine optical and inertial tracking (e.g., an Xsens suit with camera tracking or facial capture, or Rokoko with an iPhone and Coil), giving the best of both worlds, especially for virtual production and advanced installations.