
ehsanwwe/many_mocap


Many MoCap

AI-powered markerless motion capture from two synchronized camera views.

Two videos in → 3D human pose reconstruction → animation-ready BVH motion file out.



Overview

Many MoCap is a markerless AI motion-capture pipeline that reconstructs 3D human motion from two synchronized video streams recorded from different camera angles.

The system uses AI-based human pose estimation to detect body and hand landmarks in each camera view, then applies stereo computer vision to triangulate the detected 2D landmarks into 3D space. The reconstructed motion is converted into a hierarchical animation skeleton and exported as a BVH motion-capture file.

This project is built for low-cost motion-capture experiments with ordinary cameras instead of expensive marker suits or optical mocap stages.

Camera 0 video        Camera 1 video
     │                     │
     ├── AI pose detection ┤
     │                     │
     └── 2D body + hand keypoints
               │
               ▼
      Stereo triangulation
               │
               ▼
        3D keypoint tracks
               │
               ▼
   Skeleton solving + smoothing
               │
               ▼
        BVH motion capture file

What kind of AI is used?

This is not generative AI.
Many MoCap uses computer-vision AI for human pose estimation.

The AI part of the pipeline detects human landmarks in each video frame:

  • full-body pose landmarks
  • left-hand landmarks
  • right-hand landmarks

Detection runs frame by frame, so the landmarks form motion tracks over time.

The project combines:

Area                    Role
AI / Machine Learning   Human pose and hand landmark detection
Computer Vision         Two-camera reconstruction and camera projection
Geometry                DLT triangulation from two views
Animation Systems       Skeleton hierarchy, joint rotations, BVH export
Signal Processing       Median filtering to reduce jitter

Key features

  • Takes two synchronized videos from different angles.
  • Detects body and hand landmarks using AI pose estimation.
  • Tracks up to 75 landmarks per frame:
    • 33 body pose landmarks
    • 21 left-hand landmarks
    • 21 right-hand landmarks
  • Reconstructs 3D keypoints using calibrated stereo camera geometry.
  • Uses Direct Linear Transform (DLT) for triangulation.
  • Saves intermediate 2D and 3D keypoint data for debugging.
  • Applies motion smoothing to reduce noisy/jittery landmarks.
  • Builds a hierarchical human skeleton.
  • Estimates bone lengths from captured motion.
  • Computes joint rotations frame by frame.
  • Exports the final motion as a BVH file.
  • Includes a 3D visualizer for inspecting reconstructed motion.
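
The smoothing step can be illustrated with a temporal median filter. The following is a minimal NumPy-only sketch, not the repository's code: the function name median_smooth, the (frames, landmarks, dims) array layout, and the default window of 5 frames are assumptions made for illustration.

```python
import numpy as np

def median_smooth(tracks, window=5):
    """Median-filter keypoint tracks along the time axis.

    tracks: array of shape (frames, landmarks, dims)  (assumed layout).
    window: odd number of frames per median window    (assumed default).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    tracks = np.asarray(tracks, dtype=float)
    half = window // 2
    # Repeat the first and last frames so the output keeps the input length.
    padded = np.pad(tracks, ((half, half), (0, 0), (0, 0)), mode="edge")
    # Sliding windows over time: shape (frames, landmarks, dims, window).
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=0)
    return np.median(windows, axis=-1)
```

A median window removes isolated single-frame spikes while leaving steady motion largely untouched, which is why it suits jittery landmark detections better than a plain moving average.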

Why this project is important

Many single-camera pose systems only estimate 2D landmarks or approximate 3D pose.
Many MoCap instead uses two real camera views and camera calibration data, so each landmark's depth comes from triangulation across views rather than from a single-view estimate.

This makes it useful for:

  • indie animation
  • game development
  • virtual production
  • AR / VR avatar motion
  • human-motion analysis
  • robotics and biomechanics experiments
  • low-cost motion capture research
  • animation prototyping for Blender, Unity, Unreal Engine, and similar tools

Pipeline

1. Two-view AI landmark detection

bodypose3d.py reads two video streams:

media/studio7/cam0.mp4
media/studio7/cam1.mp4

For each frame, it runs AI-based landmark detection on both views and extracts body and hand keypoints.

The current keypoint layout is:

0  - 32 : body pose landmarks
33 - 53 : left hand landmarks
54 - 74 : right hand landmarks
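
As an illustration of this layout, the sketch below splits one frame of 75 keypoints into its three groups. The function name split_keypoints and the (75, dims) array shape are assumptions for this example; the repository's own data structures may differ.

```python
import numpy as np

# Index ranges taken from the keypoint layout above.
BODY = slice(0, 33)         # indices 0-32
LEFT_HAND = slice(33, 54)   # indices 33-53
RIGHT_HAND = slice(54, 75)  # indices 54-74

def split_keypoints(frame):
    """Split one (75, dims) frame into body, left-hand and right-hand blocks."""
    frame = np.asarray(frame)
    if frame.shape[0] != 75:
        raise ValueError("expected 75 keypoints per frame")
    return {
        "body": frame[BODY],
        "left_hand": frame[LEFT_HAND],
        "right_hand": frame[RIGHT_HAND],
    }
```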

If a landmark is not detected, it is stored as:

2D: [-1, -1]
3D: [-1, -1, -1]

This allows the pipeline to continue even when some points are temporarily missing.
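
Downstream steps usually need to tell real detections from these placeholders. A minimal sketch of one way to do that follows; the helper names valid_mask and to_nan are hypothetical and not part of the repository.

```python
import numpy as np

def valid_mask(points):
    """Return True for rows holding a detection, False for all -1 placeholders."""
    points = np.asarray(points, dtype=float)
    return ~np.all(points == -1, axis=-1)

def to_nan(points):
    """Return a copy in which placeholder rows are replaced by NaN."""
    points = np.array(points, dtype=float)  # np.array makes a copy
    points[~valid_mask(points)] = np.nan
    return points
```

Converting placeholders to NaN keeps them out of averages and distance computations when combined with functions such as np.nanmedian. One caveat: in 3D, a genuine point at exactly (-1, -1, -1) would be indistinguishable from a missing one.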


2. Camera projection

utils.py loads camera calibration files and builds projection matrices for both cameras.

Expected calibration structure:

camera_parameters/
└── studio7/
    ├── camera0_intrinsics.dat
    ├── camera1_intrinsics.dat
    ├── world_to_camera0_rot_trans.dat
    └── world_to_camera1_rot_trans.dat

These files hold each camera's intrinsic parameters and its extrinsic pose (rotation and translation) relative to the world frame of the capture setup.
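
For reference, a projection matrix is conventionally formed as P = K [R | t], where K is the 3x3 intrinsic matrix and R, t are the world-to-camera rotation and translation. The sketch below shows that construction with NumPy. It takes the matrices as arguments and does not read the .dat files, since their on-disk format is not described here; the function names are assumptions.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build the 3x4 projection matrix P = K [R | t]."""
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    return K @ np.hstack([R, t])

def project(P, point_3d):
    """Project a 3D world point to pixel coordinates with matrix P."""
    x = P @ np.append(np.asarray(point_3d, dtype=float), 1.0)
    return x[:2] / x[2]  # divide by the homogeneous coordinate
```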


3. 3D reconstruction

For every matching landmark pair from the two camera views, the pipeline triangulates a 3D position using DLT.
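
A minimal sketch of two-view DLT triangulation is shown below. It is the textbook formulation, written independently of utils.py, whose exact implementation may differ; the name triangulate_dlt and the example camera values are assumptions.

```python
import numpy as np

def triangulate_dlt(P0, P1, pt0, pt1):
    """Triangulate one 3D point from two pixel observations.

    P0, P1: 3x4 projection matrices of the two cameras.
    pt0, pt1: (u, v) pixel coordinates of the same landmark in each view.
    """
    P0 = np.asarray(P0, dtype=float)
    P1 = np.asarray(P1, dtype=float)
    # Each view gives two linear constraints on the homogeneous point X.
    A = np.array([
        pt0[1] * P0[2] - P0[1],
        P0[0] - pt0[0] * P0[2],
        pt1[1] * P1[2] - P1[1],
        P1[0] - pt1[0] * P1[2],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Example with two made-up cameras that share intrinsics K.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
true_point = np.array([0.2, -0.1, 3.0])
obs0 = P0 @ np.append(true_point, 1.0)
obs1 = P1 @ np.append(true_point, 1.0)
estimate = triangulate_dlt(P0, P1, obs0[:2] / obs0[2], obs1[:2] / obs1[2])
```

With noise-free observations the estimate matches the original point up to floating-point error. With real detections the result is a least-squares compromise between the two viewing rays.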

Generated files:

kpts_cam0.dat   # 2D keypoints from camera 0
kpts_cam1.dat   # 2D keypoints from camera 1
kpts_3d.dat     # reconstructed 3D keypoints

These intermediate files make the system easier to debug and improve.


4. 3D visualization

show_3d_pose.py loads the reconstructed 3D keypoints and visualizes the skeleton motion in a 3D plot.

Run:

python show_3d_pose.py

5. BVH export

BVHmaker4.py converts reconstructed 3D keypoints into an animation skeleton and exports a BVH file.

The exporter:

  • maps keypoint indices to named joints
  • adds virtual HIP and NECK joints
  • defines the body hierarchy
  • calculates bone lengths
  • creates a normalized base skeleton
  • computes root motion
  • computes per-joint rotations
  • writes the HIERARCHY and MOTION sections of a BVH file
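
Two of these steps, virtual joints and bone lengths, can be sketched as follows. How BVHmaker4.py actually defines HIP and NECK is not stated here. This example assumes the common choice of midpoints between the left/right hips and the left/right shoulders (MediaPipe Pose indices 23, 24 and 11, 12), appends them at hypothetical indices 75 and 76, and takes the median distance over all frames as the bone length.

```python
import numpy as np

# MediaPipe Pose landmark indices used in this sketch.
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 11, 12, 23, 24
HIP, NECK = 75, 76  # hypothetical indices for the appended virtual joints

def add_virtual_joints(kpts):
    """Append HIP and NECK midpoints to a (frames, 75, 3) keypoint array."""
    kpts = np.asarray(kpts, dtype=float)
    hip = 0.5 * (kpts[:, L_HIP] + kpts[:, R_HIP])
    neck = 0.5 * (kpts[:, L_SHOULDER] + kpts[:, R_SHOULDER])
    return np.concatenate([kpts, hip[:, None], neck[:, None]], axis=1)

def bone_length(kpts, parent, child):
    """Estimate a bone length as the median joint distance over all frames."""
    dists = np.linalg.norm(kpts[:, child] - kpts[:, parent], axis=-1)
    return float(np.median(dists))
```

Using the median rather than the mean keeps a few badly triangulated frames from distorting the estimated skeleton proportions.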

Run:

python BVHmaker4.py kpts_3d.dat

Default output:

Bebinam_output.bvh

The generated .bvh file can be imported into tools such as:

  • Blender
  • Unity
  • Unreal Engine
  • MotionBuilder
  • Maya
  • other BVH-compatible animation tools
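
For readers unfamiliar with the format, the sketch below builds the text of a very small BVH file with one root and one child joint. It only illustrates the HIERARCHY and MOTION layout; the joint names, channel order, and function name build_minimal_bvh are assumptions and do not reflect the skeleton that BVHmaker4.py exports.

```python
def build_minimal_bvh(offset, frames, frame_time=1.0 / 30.0):
    """Return BVH text for a two-joint skeleton.

    offset: (x, y, z) of the child joint relative to the root.
    frames: rows of 9 floats: root position (3), root rotation (3),
            child rotation (3), with rotations in degrees.
    """
    lines = [
        "HIERARCHY",
        "ROOT Hips",
        "{",
        "  OFFSET 0.0 0.0 0.0",
        "  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation",
        "  JOINT Spine",
        "  {",
        "    OFFSET {:.4f} {:.4f} {:.4f}".format(*offset),
        "    CHANNELS 3 Zrotation Xrotation Yrotation",
        "    End Site",
        "    {",
        "      OFFSET 0.0 10.0 0.0",
        "    }",
        "  }",
        "}",
        "MOTION",
        f"Frames: {len(frames)}",
        f"Frame Time: {frame_time:.6f}",
    ]
    for row in frames:
        if len(row) != 9:
            raise ValueError("each frame needs 9 channel values")
        lines.append(" ".join(f"{value:.4f}" for value in row))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with a .bvh extension gives a file that shows the same two-section structure as the exporter's output, which can help when inspecting a generated file in a text editor.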

Repository structure

many_mocap/
├── bodypose3d.py              # Two-view AI pose detection + 3D reconstruction
├── BVHmaker4.py               # 3D keypoints to BVH motion export
├── BVHmaker3.py               # Earlier BVH conversion experiment
├── show_3d_pose.py            # 3D pose visualization
├── utils.py                   # Camera projection, DLT, rotations, file IO
├── kpts_cam0.dat              # Camera 0 detected 2D keypoints
├── kpts_cam1.dat              # Camera 1 detected 2D keypoints
├── kpts_3d.dat                # Reconstructed 3D keypoints
├── kpts_3d_studio7.dat        # Studio capture 3D keypoint data
├── GrandPapa.bvh              # Sample BVH output
├── GrandMama.bvh              # Sample BVH output
└── GrandPapa_refigned.bvh     # Refined BVH output

Installation

Create a virtual environment:

python -m venv .venv
source .venv/bin/activate

On Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Install dependencies:

pip install numpy scipy opencv-python mediapipe matplotlib

Usage

1. Prepare two synchronized videos

Place your videos here:

media/studio7/cam0.mp4
media/studio7/cam1.mp4

The videos should capture the same motion from two different viewpoints.


2. Add camera calibration files

Place calibration files here:

camera_parameters/studio7/

Required files:

camera0_intrinsics.dat
camera1_intrinsics.dat
world_to_camera0_rot_trans.dat
world_to_camera1_rot_trans.dat
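
Before running the pipeline, it can help to confirm that all four files are in place. The following is a small optional sketch; the helper name missing_calibration_files is hypothetical and not part of the repository.

```python
from pathlib import Path

REQUIRED_FILES = (
    "camera0_intrinsics.dat",
    "camera1_intrinsics.dat",
    "world_to_camera0_rot_trans.dat",
    "world_to_camera1_rot_trans.dat",
)

def missing_calibration_files(folder="camera_parameters/studio7"):
    """Return the names of required calibration files not found in folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means every required file was found.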

3. Generate 3D keypoints

python bodypose3d.py

This creates:

kpts_cam0.dat
kpts_cam1.dat
kpts_3d.dat

You can also pass two webcam device IDs to capture live instead of reading the video files:

python bodypose3d.py 0 1
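
How bodypose3d.py interprets its arguments is not shown here. As a hedged illustration of the idea, the sketch below turns numeric arguments into integer device IDs and leaves anything else as a file path, which matches what cv2.VideoCapture accepts (an integer index or a filename). The function name parse_sources and the fallback to the default video paths are assumptions.

```python
DEFAULT_SOURCES = ("media/studio7/cam0.mp4", "media/studio7/cam1.mp4")

def parse_sources(args, defaults=DEFAULT_SOURCES):
    """Map command-line arguments to two capture sources.

    Numeric strings become integer webcam IDs; other strings stay file paths.
    With no arguments, the default video files are used.
    """
    if len(args) == 0:
        return list(defaults)
    if len(args) != 2:
        raise ValueError("expected exactly two sources")
    return [int(a) if a.isdigit() else a for a in args]
```

In a script this would typically be called as parse_sources(sys.argv[1:]), with each returned item passed to cv2.VideoCapture.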

4. Preview the reconstructed 3D pose

python show_3d_pose.py

5. Export BVH motion

python BVHmaker4.py kpts_3d.dat

Output:

Bebinam_output.bvh

Technical highlights

  • AI pose estimation for body and hands
  • Markerless motion capture without suits or body markers
  • Two-view stereo reconstruction
  • Projection matrices built from camera calibration data
  • DLT triangulation
  • 75-point body + hand landmark representation
  • 3D skeleton visualization
  • Median filtering for jitter reduction
  • Bone-length estimation
  • Hierarchical skeleton solving
  • BVH motion export

Current status

This repository is a working research prototype of a two-camera markerless motion-capture system.

The core concept is implemented:

two videos → AI keypoints → 3D reconstruction → skeleton motion → BVH

Future work can improve the developer experience, calibration flow, retargeting presets, and production packaging.


Roadmap

  • Add a single CLI command for the complete pipeline
  • Add requirements.txt
  • Add sample preview GIFs
  • Add camera calibration helper scripts
  • Add Blender import / retargeting guide
  • Add Unity retargeting guide
  • Add confidence-based keypoint filtering
  • Add interpolation for missing landmarks
  • Add automatic video synchronization helpers
  • Add Docker or reproducible environment setup
  • Add cleaner module structure for production use

Suggested GitHub topics

motion-capture
markerless-mocap
ai
computer-vision
human-pose-estimation
mediapipe
opencv
stereo-vision
3d-reconstruction
bvh
animation
python
virtual-production

Author

Built by Ehsan Moradi as part of research and engineering work in AI, computer vision, 3D reconstruction, real-time systems, and animation pipelines.
