AI-powered markerless motion capture from two synchronized camera views.
Two videos in → 3D human pose reconstruction → animation-ready BVH motion file out.
Many MoCap is a markerless AI motion-capture pipeline that reconstructs 3D human motion from two synchronized video streams recorded from different camera angles.
The system uses AI-based human pose estimation to detect body and hand landmarks in each camera view, then applies stereo computer vision to triangulate the detected 2D landmarks into 3D space. The reconstructed motion is converted into a hierarchical animation skeleton and exported as a BVH motion-capture file.
This project is built for low-cost motion-capture experiments using ordinary cameras instead of expensive marker suits or optical mocap stages.
Camera 0 video        Camera 1 video
      │                     │
      ├── AI pose detection ┤
      │                     │
      └── 2D body + hand keypoints
                  │
                  ▼
         Stereo triangulation
                  │
                  ▼
          3D keypoint tracks
                  │
                  ▼
    Skeleton solving + smoothing
                  │
                  ▼
       BVH motion capture file
This is not generative AI.
Many MoCap uses computer-vision AI for human pose estimation.
The AI part of the pipeline detects human landmarks from video frames, including:
- full-body pose landmarks
- left-hand landmarks
- right-hand landmarks
- frame-by-frame motion tracking
The project combines:
| Area | Role |
|---|---|
| AI / Machine Learning | Human pose and hand landmark detection |
| Computer Vision | Two-camera reconstruction and camera projection |
| Geometry | DLT triangulation from two views |
| Animation Systems | Skeleton hierarchy, joint rotations, BVH export |
| Signal Processing | Median filtering to reduce jitter |
- Takes two synchronized videos from different angles.
- Detects body and hand landmarks using AI pose estimation.
- Tracks up to 75 landmarks per frame:
  - 33 body pose landmarks
  - 21 left-hand landmarks
  - 21 right-hand landmarks
- Reconstructs 3D keypoints using calibrated stereo camera geometry.
- Uses Direct Linear Transform (DLT) for triangulation.
- Saves intermediate 2D and 3D keypoint data for debugging.
- Applies motion smoothing to reduce noisy/jittery landmarks.
- Builds a hierarchical human skeleton.
- Estimates bone lengths from captured motion.
- Computes joint rotations frame by frame.
- Exports the final motion as a BVH file.
- Includes a 3D visualizer for inspecting reconstructed motion.
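The smoothing step in the list above can be illustrated with a sliding-window median along the time axis. This is a minimal sketch, not the project's own filter: the function name `median_smooth` and the window length are assumptions made for the example.

```python
import numpy as np

def median_smooth(track, window=5):
    """Sliding-window median along the time axis (axis 0).

    track:  array of shape (frames, ...), for example (frames, 75, 3)
    window: odd window length in frames (an assumed default)
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    track = np.asarray(track, dtype=float)
    half = window // 2
    # Repeat the first and last frame so the output keeps the same length.
    pad = [(half, half)] + [(0, 0)] * (track.ndim - 1)
    padded = np.pad(track, pad, mode="edge")
    # Stack the time-shifted copies and take the median across them.
    shifted = [padded[i:i + len(track)] for i in range(window)]
    return np.median(np.stack(shifted), axis=0)

# One coordinate of one landmark, with a single-frame spike at index 3.
x = np.array([0.0, 0.1, 0.2, 5.0, 0.4, 0.5, 0.6])
smoothed = median_smooth(x, window=3)
```

A median suppresses isolated spikes without blurring the surrounding frames as much as a moving average would. Missing landmarks stored as -1 would need to be masked before filtering, otherwise the sentinel values take part in the median.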
Many single-camera pose systems only estimate 2D landmarks or approximate 3D pose.
Many MoCap instead triangulates each landmark from two calibrated camera views, so the 3D positions come from measured camera geometry rather than from a single-view depth estimate.
This makes it useful for:
- indie animation
- game development
- virtual production
- AR / VR avatar motion
- human-motion analysis
- robotics and biomechanics experiments
- low-cost motion capture research
- animation prototyping for Blender, Unity, Unreal Engine, and similar tools
bodypose3d.py reads two video streams:
media/studio7/cam0.mp4
media/studio7/cam1.mp4
For each frame, it runs AI-based landmark detection on both views and extracts body and hand keypoints.
The current keypoint layout is:
0 - 32 : body pose landmarks
33 - 53 : left hand landmarks
54 - 74 : right hand landmarks
If a landmark is not detected, it is stored as:
2D: [-1, -1]
3D: [-1, -1, -1]
This allows the pipeline to continue even when some points are temporarily missing.
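As a small illustration of the layout and the missing-value convention above, the sketch below splits one frame into body and hand groups and masks undetected landmarks. The names `BODY`, `LEFT_HAND`, `RIGHT_HAND`, and `valid_mask` are hypothetical, and the frame data is synthetic.

```python
import numpy as np

# Index ranges taken from the layout above (slice end is exclusive).
BODY = slice(0, 33)
LEFT_HAND = slice(33, 54)
RIGHT_HAND = slice(54, 75)

def valid_mask(frame_kpts):
    """True for detected landmarks, False for rows holding the -1 sentinel."""
    frame_kpts = np.asarray(frame_kpts, dtype=float)
    return ~np.all(frame_kpts == -1, axis=-1)

# Synthetic 3D frame: everything detected except the left hand.
frame = np.random.default_rng(0).uniform(0.0, 1.0, size=(75, 3))
frame[LEFT_HAND] = -1

mask = valid_mask(frame)
detected_counts = {
    "body": int(mask[BODY].sum()),
    "left_hand": int(mask[LEFT_HAND].sum()),
    "right_hand": int(mask[RIGHT_HAND].sum()),
}

# Replace sentinels with NaN so NaN-aware statistics can skip them.
frame_nan = np.where(mask[:, None], frame, np.nan)
```

One caveat of a numeric sentinel is that a real 3D point at exactly [-1, -1, -1] cannot be told apart from a missing one, which is why converting to NaN early can be convenient for later processing.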
utils.py loads camera calibration files and builds projection matrices for both cameras.
Expected calibration structure:
camera_parameters/
└── studio7/
    ├── camera0_intrinsics.dat
    ├── camera1_intrinsics.dat
    ├── world_to_camera0_rot_trans.dat
    └── world_to_camera1_rot_trans.dat
These files describe each camera's intrinsic parameters and its position/orientation in the capture setup.
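To make the role of these parameters concrete, the sketch below builds a projection matrix from an intrinsic matrix K and a world-to-camera rotation R and translation t, using the standard pinhole relation P = K [R | t]. It does not parse the .dat files, whose exact layout is not described here. The function names and all numeric values are made up for illustration.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Return the 3x4 matrix P = K [R | t] mapping world points to pixels."""
    K = np.asarray(K, dtype=float).reshape(3, 3)
    R = np.asarray(R, dtype=float).reshape(3, 3)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    return K @ np.hstack([R, t])

def project(P, point_3d):
    """Project a 3D world point to pixel coordinates (u, v)."""
    homog = P @ np.append(np.asarray(point_3d, dtype=float), 1.0)
    return homog[:2] / homog[2]

# Made-up calibration: 800 px focal length, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)    # camera axes aligned with the world axes
t = np.zeros(3)  # camera sitting at the world origin

P0 = projection_matrix(K, R, t)
uv = project(P0, [0.0, 0.0, 2.0])  # a point on the optical axis, 2 units away
```

A point on the optical axis lands on the principal point, which is a quick sanity check when loading a new calibration.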
For every matching landmark pair from the two camera views, the pipeline triangulates a 3D position using DLT.
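A minimal sketch of two-view DLT triangulation is shown below. It is a generic textbook formulation and may differ in detail from the routine in utils.py. The function names `triangulate_dlt` and `project` and the camera values are assumptions for the example.

```python
import numpy as np

def triangulate_dlt(P0, P1, uv0, uv1):
    """Triangulate one landmark from two views with the DLT.

    P0, P1:   (3, 4) projection matrices of camera 0 and camera 1.
    uv0, uv1: pixel coordinates (u, v) of the same landmark in each view.
    Returns [-1, -1, -1] if the landmark is missing in either view.
    """
    uv0 = np.asarray(uv0, dtype=float)
    uv1 = np.asarray(uv1, dtype=float)
    if np.all(uv0 == -1) or np.all(uv1 == -1):
        return np.array([-1.0, -1.0, -1.0])
    # Each view contributes two linear equations in the homogeneous point X:
    # u * (P[2] . X) - (P[0] . X) = 0 and v * (P[2] . X) - (P[1] . X) = 0.
    A = np.array([
        uv0[0] * P0[2] - P0[0],
        uv0[1] * P0[2] - P0[1],
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
    ])
    # The least-squares solution of A X = 0 is the right singular vector
    # that belongs to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, point_3d):
    """Project a 3D point to pixels, used here to create test observations."""
    homog = P @ np.append(point_3d, 1.0)
    return homog[:2] / homog[2]

# Made-up stereo rig: identical intrinsics, camera 1 shifted 0.5 units sideways.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

truth = np.array([0.2, -0.1, 3.0])
recovered = triangulate_dlt(P0, P1, project(P0, truth), project(P1, truth))
missing = triangulate_dlt(P0, P1, [-1, -1], project(P1, truth))
```

With noise-free observations the triangulated point matches the original. With real detections the SVD returns the point that best satisfies both views in a least-squares sense.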
Generated files:
kpts_cam0.dat # 2D keypoints from camera 0
kpts_cam1.dat # 2D keypoints from camera 1
kpts_3d.dat # reconstructed 3D keypoints
These intermediate files make the system easier to debug and improve.
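One way to use the intermediate data for debugging is to reproject the 3D keypoints into a camera and measure how far they land from the detected 2D keypoints. The sketch below works on in-memory arrays and assumes nothing about the .dat file layout. The function name `reprojection_error` and the sample values are hypothetical.

```python
import numpy as np

def reprojection_error(P, points_3d, points_2d):
    """Pixel distance between detected 2D keypoints and reprojected 3D ones.

    P: (3, 4) projection matrix; points_3d: (N, 3); points_2d: (N, 2).
    Rows where either input holds the -1 sentinel come back as NaN.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))]) @ P.T
    with np.errstate(divide="ignore", invalid="ignore"):
        projected = homog[:, :2] / homog[:, 2:3]
    err = np.linalg.norm(projected - points_2d, axis=1)
    missing = np.all(points_3d == -1, axis=1) | np.all(points_2d == -1, axis=1)
    err[missing] = np.nan
    return err

# Made-up camera and three landmarks: exact, 3 px off, and missing.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
pts_3d = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0], [-1.0, -1.0, -1.0]])
pts_2d = np.array([[320.0, 240.0], [523.0, 240.0], [-1.0, -1.0]])

err = reprojection_error(P, pts_3d, pts_2d)
```

Large errors on many landmarks usually point to calibration or synchronization problems, while large errors on a few landmarks point to individual bad detections.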
show_3d_pose.py loads the reconstructed 3D keypoints and visualizes the skeleton motion in a 3D plot.
Run:
python show_3d_pose.py
BVHmaker4.py converts reconstructed 3D keypoints into an animation skeleton and exports a BVH file.
The exporter:
- maps keypoint indices to named joints
- adds virtual HIP and NECK joints
- defines the body hierarchy
- calculates bone lengths
- creates a normalized base skeleton
- computes root motion
- computes per-joint rotations
- writes the HIERARCHY and MOTION sections of a BVH file
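Two of the exporter steps above, virtual joints and bone lengths, can be sketched as follows. The definitions used here are assumptions: HIP and NECK are taken as the midpoints of the left/right hips and shoulders, and a bone length is the median parent-to-child distance over all frames. BVHmaker4.py may compute them differently. The landmark indices follow the MediaPipe Pose numbering.

```python
import numpy as np

# MediaPipe Pose landmark indices.
L_SHOULDER, R_SHOULDER, L_ELBOW, L_HIP, R_HIP = 11, 12, 13, 23, 24

def add_virtual_joints(kpts):
    """Return HIP and NECK tracks as midpoints (an assumed definition).

    kpts: array of shape (frames, 75, 3).
    """
    hip = 0.5 * (kpts[:, L_HIP] + kpts[:, R_HIP])
    neck = 0.5 * (kpts[:, L_SHOULDER] + kpts[:, R_SHOULDER])
    return hip, neck

def bone_length(parent_track, child_track):
    """Median parent-to-child distance over all frames."""
    distances = np.linalg.norm(child_track - parent_track, axis=1)
    return float(np.median(distances))

# Synthetic four-frame capture with one noisy elbow position.
frames = 4
kpts = np.zeros((frames, 75, 3))
kpts[:, L_HIP] = [0.1, 1.0, 0.0]
kpts[:, R_HIP] = [-0.1, 1.0, 0.0]
kpts[:, L_SHOULDER] = [0.2, 1.5, 0.0]
kpts[:, R_SHOULDER] = [-0.2, 1.5, 0.0]
kpts[:, L_ELBOW] = [0.2, 1.2, 0.0]
kpts[2, L_ELBOW] = [0.2, 0.0, 0.0]  # outlier frame

hip, neck = add_virtual_joints(kpts)
spine = bone_length(hip, neck)
upper_arm = bone_length(kpts[:, L_SHOULDER], kpts[:, L_ELBOW])
```

Using the median rather than the mean keeps a single bad frame from stretching the estimated bone.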
Run:
python BVHmaker4.py kpts_3d.dat
Default output:
Bebinam_output.bvh
The generated .bvh file can be imported into tools such as:
- Blender
- Unity
- Unreal Engine
- MotionBuilder
- Maya
- other BVH-compatible animation tools
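For orientation when inspecting an exported file, the sketch below writes a tiny two-joint BVH document with its HIERARCHY and MOTION sections. It is not the skeleton produced by BVHmaker4.py: the function name `build_bvh`, the offsets, the channel order, and the frame rate are all assumptions chosen to keep the example short.

```python
def build_bvh(motion_rows, frame_time=1.0 / 30.0):
    """Return BVH text for a toy HIP -> NECK skeleton.

    motion_rows: one list per frame with 9 values, 6 root channels
                 (position + rotation) followed by 3 NECK rotations.
    """
    header = [
        "HIERARCHY",
        "ROOT HIP",
        "{",
        "  OFFSET 0.0 0.0 0.0",
        "  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation",
        "  JOINT NECK",
        "  {",
        "    OFFSET 0.0 0.5 0.0",
        "    CHANNELS 3 Zrotation Xrotation Yrotation",
        "    End Site",
        "    {",
        "      OFFSET 0.0 0.2 0.0",
        "    }",
        "  }",
        "}",
        "MOTION",
        f"Frames: {len(motion_rows)}",
        f"Frame Time: {frame_time:.6f}",
    ]
    rows = []
    for row in motion_rows:
        if len(row) != 9:
            raise ValueError("expected 9 channel values per frame")
        rows.append(" ".join(f"{value:.4f}" for value in row))
    return "\n".join(header + rows) + "\n"

# Two frames: the root moves up slightly and the neck tilts by 10 degrees.
bvh_text = build_bvh([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.1, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0],
])
```

Each MOTION row lists the channel values in the same order in which the channels appear in HIERARCHY, so the number of values per row must equal the total channel count.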
many_mocap/
├── bodypose3d.py # Two-view AI pose detection + 3D reconstruction
├── BVHmaker4.py # 3D keypoints to BVH motion export
├── BVHmaker3.py # Earlier BVH conversion experiment
├── show_3d_pose.py # 3D pose visualization
├── utils.py # Camera projection, DLT, rotations, file IO
├── kpts_cam0.dat # Camera 0 detected 2D keypoints
├── kpts_cam1.dat # Camera 1 detected 2D keypoints
├── kpts_3d.dat # Reconstructed 3D keypoints
├── kpts_3d_studio7.dat # Studio capture 3D keypoint data
├── GrandPapa.bvh # Sample BVH output
├── GrandMama.bvh # Sample BVH output
└── GrandPapa_refigned.bvh # Refined BVH output
Create a virtual environment:
python -m venv .venv
source .venv/bin/activate
On Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
Install dependencies:
pip install numpy scipy opencv-python mediapipe matplotlib
Place your videos here:
media/studio7/cam0.mp4
media/studio7/cam1.mp4
The videos should capture the same motion from two different viewpoints.
Place calibration files here:
camera_parameters/studio7/
Required files:
camera0_intrinsics.dat
camera1_intrinsics.dat
world_to_camera0_rot_trans.dat
world_to_camera1_rot_trans.dat
python bodypose3d.py
This creates:
kpts_cam0.dat
kpts_cam1.dat
kpts_3d.dat
You can also use webcam IDs:
python bodypose3d.py 0 1
Visualize the reconstructed motion:
python show_3d_pose.py
Export the BVH file:
python BVHmaker4.py kpts_3d.dat
Output:
Bebinam_output.bvh
- AI pose estimation for body and hands
- Markerless motion capture without suits or body markers
- Two-view stereo reconstruction
- Projection matrices built from camera calibration
- DLT triangulation
- 75-point body + hand landmark representation
- 3D skeleton visualization
- Median filtering for jitter reduction
- Bone-length estimation
- Hierarchical skeleton solving
- BVH motion export
This repository is a working research prototype of a two-camera markerless motion-capture system.
The core concept is implemented:
two videos → AI keypoints → 3D reconstruction → skeleton motion → BVH
Future work can improve the developer experience, calibration flow, retargeting presets, and production packaging.
- Add a single CLI command for the complete pipeline
- Add requirements.txt
- Add sample preview GIFs
- Add camera calibration helper scripts
- Add Blender import / retargeting guide
- Add Unity retargeting guide
- Add confidence-based keypoint filtering
- Add interpolation for missing landmarks
- Add automatic video synchronization helpers
- Add Docker or reproducible environment setup
- Add cleaner module structure for production use
motion-capture
markerless-mocap
ai
computer-vision
human-pose-estimation
mediapipe
opencv
stereo-vision
3d-reconstruction
bvh
animation
python
virtual-production
Built by Ehsan Moradi as part of research and engineering work in AI, computer vision, 3D reconstruction, real-time systems, and animation pipelines.