This guide is a walkthrough of how to use the tools in the metric depth video toolbox.
Select a video to work with. This should be a single clip, preferably shorter than 6-7 minutes (due to GPU memory usage), and it should not contain any cuts. The zoom level should preferably stay the same over the whole clip. Due to GPU memory constraints in Video-Depth-Anything, the aspect ratio is best kept at or below 16:9. If you want to convert an entire movie, split it up and process it scene by scene. There are tools (e.g. PySceneDetect) that can cut a movie into its scenes automatically.
I will use in_office_720p.mp4, a clip of two individuals walking in a hallway, obtained from pexels.com.

Install the metric depth video toolbox on your machine; see the main README.
Generate a metric depth video from the source video
python video_metric_convert.py --color_video ~/in_office_720p.mp4
The result is a metric 3D depth video file called ~/in_office_720p.mp4_depth.mkv
View the result in 3D:
python3.11 3d_view_depthfile.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv --yfov 40
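The --yfov value is the camera's vertical field of view in degrees. If you want to sanity-check a guess, it maps to a focal length in pixels via the standard pinhole relation; a small illustrative sketch (the 720-pixel height is just this example clip's resolution):

```python
import math

def yfov_to_focal_length(yfov_deg, height_px):
    """Pinhole model: fy = (H / 2) / tan(yfov / 2)."""
    return (height_px / 2) / math.tan(math.radians(yfov_deg) / 2)

# A 40 degree vertical FOV at 720p corresponds to roughly a 989 px focal length.
print(round(yfov_to_focal_length(40, 720)))
```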
You can skip steps 2 - 6 if you just want a basic 3D stereo video.
Generate a mask video from the source video
./create_video_mask.sh -install
./create_video_mask.sh ~/in_office_720p.mp4
The result is a black-and-white mask video ~/in_office_720p.mp4_mask.mkv
Generate tracking points from the source video. More iterations = more points; steps_bewtwen_track_init is the number of frames between initiations of new tracking points.
python track_points_in_video.py --color_video ~/in_office_720p.mp4 --nr_iterations 4 --steps_bewtwen_track_init 30
The result is a tracking file called ~/in_office_720p.mp4_tracking.json
Visualised here as tiny dots in the images:

Generate camera transformations from the depth video and the source video. We guess the FOV is somewhere in the 30-50 degree range and choose 40 degrees. Later analysis showed that the real FOV is something like 42 degrees. See RECOVER_FOV.md for more info on recovering the FOV of a video. If the video has parallax you can run sam_track_video.py with --optimize_intrinsic and it will give you an accurate FOV.
./install_mdvtoolbox.sh -megasam #takes a long time to install
python sam_track_video.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv --yfov 40
The result is a transformations file ~/in_office_720p.mp4_depth.mkv_transformations.json
and two debug video files ending in _megasam.mkv
Triangulate points to get accurate depth readings and realign the metric depth video to fit the more accurate depth readings.
python3.11 convert_metric_depth_video_to_other_format.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv --yfov 40 --transformation_file ~/in_office_720p.mp4_depth.mkv_transformations.json --track_file ~/in_office_720p.mp4_tracking.json --mask_video ~/in_office_720p.mp4_mask.mkv --show_scene_point_clouds --use_triangulated_points --tringulation_min_observations 20 --save_rescaled_depth --show_both_point_clouds --global_align
The result is a rescaled depth video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv
And two .ply files with point cloud data for the scene: one with triangulated points (in_office_720p.mp4_depth.mkv_triangulated.ply) and one with averages of the depth map (in_office_720p.mp4_depth.mkv_avgmonodepth.ply).
You can run the script again with the new _rescaled.mkv file to get a rescaled version of the _avgmonodepth.ply file.
python3.11 convert_metric_depth_video_to_other_format.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv --yfov 40 --transformation_file ~/in_office_720p.mp4_depth.mkv_transformations.json --track_file ~/in_office_720p.mp4_tracking.json --mask_video ~/in_office_720p.mp4_mask.mkv --show_scene_point_clouds
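The idea behind the triangulation: a tracked point observed from several camera poses defines several rays, and where those rays (nearly) intersect gives an absolute depth that the monocular depth maps can be rescaled against. A minimal two-ray midpoint sketch in plain Python, purely illustrative and not the toolbox's actual implementation:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the closest approach between rays o1 + t1*d1 and o2 + t2*d2."""
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # near zero means the rays are (almost) parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = [o + t1 * x for o, x in zip(o1, d1)]
    p2 = [o + t2 * x for o, x in zip(o2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]

# Two cameras one unit apart, both looking at a point 5 m in front of camera 1.
print(triangulate_midpoint([0, 0, 0], [0, 0, 1], [1, 0, 0], [-1, 0, 5]))
# -> [0.0, 0.0, 5.0]
```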
View the result, where the two subjects are walking through a point cloud. Camera movement has been cancelled out, edges removed, a background .ply file inserted, and we have added a visualisation of the camera view frustum. Finally we use the mask video to mask out the background so we only see the point cloud.
python3.11 3d_view_depthfile.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv --yfov 40 --transformation_file ~/in_office_720p.mp4_depth.mkv_transformations.json --remove_edges --show_camera --x -0.1 --y 0 --z -3 --mask_video ~/in_office_720p.mp4_mask.mkv --invert_mask --background_ply ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_avgmonodepth.ply
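Cancelling the camera movement amounts to transforming each frame's points by that frame's camera-to-world pose so everything lands in one shared world space. A minimal sketch, assuming 4x4 row-major matrices (the actual layout of the transformations JSON may differ):

```python
def apply_transform(m, p):
    """Apply a 4x4 row-major transform to a 3D point."""
    x, y, z = p
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]

# A camera translated 2 m forward along z maps a point 1 m in front of it
# to 3 m from the world origin.
cam_to_world = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
]
print(apply_transform(cam_to_world, [0, 0, 1]))  # -> [0, 0, 3]
```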
Now that we have our depth video we can create a stereo video.
Technically you don't need to do steps 2 - 6, but the end result will be slightly better if you do, since you can then use the rescaled depth instead of the raw depth from the monocular depth model.
This renders one frame for the right eye, then one for the left. You can alter the pupillary distance with --pupillary_distance if you want, but the default of 63 mm is more or less industry standard and good enough for most people. We tell stereo_rerender.py to remove all edges, as we will use infill to fill them in later, and we add an argument to create an infill mask file. If you don't want infill, just skip the last two arguments.
python3.11 stereo_rerender.py --color_video ~/in_office_720p.mp4 --depth_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv --yfov 40 --infill_mask --remove_edges
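How much a pixel shifts between the two eyes follows the standard disparity relation, which is also why nearby objects leave the large parallax gaps the infill step has to fill. A back-of-the-envelope sketch (assumed pinhole geometry, not taken from stereo_rerender.py's source):

```python
import math

def disparity_px(depth_m, ipd_m=0.063, yfov_deg=40, height_px=720):
    """Horizontal pixel disparity between the eyes for a point at depth_m.

    disparity = f * baseline / depth, with the focal length f derived
    from the vertical FOV (square pixels assumed).
    """
    f = (height_px / 2) / math.tan(math.radians(yfov_deg) / 2)
    return f * ipd_m / depth_m

# With the 63 mm default, a point at 1 m shifts ten times as much as one at 10 m.
print(round(disparity_px(1.0), 1))   # about 62 px
print(round(disparity_px(10.0), 1))  # about 6 px
```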
Raw side-by-side stereo (black where there is parallax):

Side-by-side stereo infill mask: black where no infill is needed, and the normal of the projected edge where infill is needed. (From the normal, finding the lower and higher side of the edge is trivial; see mark_lower_side() in stereo_crafter_infill.py.) (The example image is from a different frame.)

Here we use ML to add parallax infill with the tool stereo_crafter_infill.py. StereoCrafter is based on Stable Diffusion, so it is slow; be patient.
./install_mdvtoolbox.sh -stereocrafter #downloads and installs stereocrafter in the right folder
python3.11 stereo_crafter_infill.py --sbs_color_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_stereo.mkv --sbs_mask_video ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_infillmask.mkv
The result will be a video file named: ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_stereo.mkv_infilled.mkv
As is visible in the image below, StereoCrafter does a pretty good job. If you look hard enough you will find discrepancies, but they are not that bad, and since a viewer's focus tends to be on things other than the infilled areas, they may well go unnoticed.
Here we use ffmpeg to extract the original audio and add it back into the video, as well as compressing the large uncompressed video file into a format/size that a modern VR headset or other stereo-capable device can handle.
#Extract audio as a wav file (if you have audio; the example video actually does not have any)
ffmpeg -i ~/in_office_720p.mp4 ~/in_office_720p.wav
#Compress the video for viewing on other devices and add the audio back
ffmpeg -i ~/in_office_720p.mp4_depth.mkv_rescaled.mkv_stereo.mkv_infilled.mkv -i ~/in_office_720p.wav -c:v libx265 -crf 18 -tag:v hvc1 -pix_fmt yuv420p -c:a aac -map 0:v:0 -map 1:a:0 ~/in_office_720p_final_stereo.mp4