Go to the Co-Fusion repository and download one or more of the sequences. For the synthetic sequences, we used the `*.tar.gz` archives.
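For example, assuming the car4-full archive used later in this section has been downloaded and unpacks into a folder of the same name (the dataset directory is a placeholder matching the paths used below):

```bash
# Placeholder dataset directory, matching the paths used in the
# examples below.
mkdir -p ~/co-fusion-datasets
# Extract the downloaded synthetic sequence (assumed to unpack
# into a car4-full/ subfolder).
tar -xzf car4-full.tar.gz -C ~/co-fusion-datasets/
```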
The real-world scenes are provided in the `*.klg` format, so you need to convert them first. A tool for doing this can be found here. After building this tool with CMake (together with the rest of the repository), we used

```
convert_klg -i <path/to/klg> -o <output/path> -frames -sub -png
```

to generate folders for the real-world scenes. Please check `build/bin/convert_klg -h` for what the options do. We additionally renamed the `color` subfolder to `colour` to match the synthetic sequences.
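As a sketch of the whole conversion for one real-world scene (the scene name and paths below are placeholders, not actual Co-Fusion file names):

```bash
# Placeholder paths -- adjust to your download location.
KLG=~/co-fusion-datasets/scene.klg
OUT=~/co-fusion-datasets/scene

# Convert the .klg recording into per-frame files; see
# build/bin/convert_klg -h for what -frames, -sub, and -png do.
./build/bin/convert_klg -i "$KLG" -o "$OUT" -frames -sub -png

# Rename the color subfolder to colour to match the synthetic sequences.
mv "$OUT"/color "$OUT"/colour
```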
In order to preprocess masks, run

```
preprocess_masks -d <dataset/path> [--depthdir depth_noise] [-c <path/to/config>] -m <output/mask/path>
```

The program output will contain a lot of "Buffering failure." messages. This is normal: we do not process every frame and simply skip over non-mask frames, so the buffering thread cannot always keep up with pre-loading the next frame.

The `--depthdir` argument is only needed for the synthetic scenes, since their depth subfolder is named `depth_noise`. The `-c` option is needed e.g. for the robust tracking experiment with the TUM RGB-D scenes, where Mask R-CNN is instructed to only detect persons.
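As a concrete illustration, a preprocessing call for the car4-full sequence used in the run example below might look like the following sketch. It assumes `preprocess_masks` is invoked from the EM-Fusion build folder; the dataset path is a placeholder, while `depth_noise` and `preproc_masks` match the names used below.

```bash
# Synthetic Co-Fusion scene: depth images live in depth_noise and the
# generated masks go into a preproc_masks subfolder (used by the
# EM-Fusion run example below).
./preprocess_masks -d ~/co-fusion-datasets/car4-full/ \
                   --depthdir depth_noise \
                   -m ~/co-fusion-datasets/car4-full/preproc_masks/
```

For the TUM RGB-D robust tracking experiment, you would additionally pass the corresponding config file via `-c`.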
Go to the build folder of EM-Fusion and run the EM-Fusion executable.
This is an example running EM-Fusion with preprocessed masks (the car4-full dataset extracted to `~/co-fusion-datasets/car4-full/` and preprocessed masks in a subfolder called `preproc_masks`). If Mask R-CNN is instead run on-the-fly from EM-Fusion, the output will contain more of the TensorFlow output seen in `preprocess_masks`.
```
./EM-Fusion -d ~/co-fusion-datasets/car4-full/ --depthdir depth_noise \
            -c ../config/default.cfg \
            -m ~/co-fusion-datasets/car4-full/preproc_masks/ \
            --3d-vis
Reading from /home/streckus/co-fusion-datasets/car4-full//
Buffer thread started with id: 139871750289152
Created new Object with ID: 1
Created new Object with ID: 2
Created new Object with ID: 3
Created new Object with ID: 4
Deleting Object 4 because it is not visible!
Finished processing, press any key to end the program!
Program ended successfully!
```

While the program is running, you will see several visualization windows.
If you export the results of EM-Fusion with `-e`, you can evaluate the poses numerically. For dynamic scene datasets, EM-Fusion has to "guess" the object centers, so the object coordinate systems might not be well aligned with the ground truth. The authors of Co-Fusion provide a program to convert these poses: `convert_poses`. The second bullet point (How to compare an object-trajectory of your non-static SLAM-method with ground-truth data?) here explains how to use it.

After converting the poses by using the first frame in the object trajectory as a reference, we evaluated numerical accuracy with scripts from the TUM RGB-D benchmark, namely `evaluate_ate.py` and `evaluate_rpe.py`.
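As a sketch of this final evaluation step, assuming the converted object trajectory and the corresponding ground truth are available as TUM-format trajectory files (the file names below are placeholders):

```bash
# Placeholder file names: ground-truth and converted estimated
# trajectory of one object, both in TUM trajectory format.
GT=object1-groundtruth.txt
EST=object1-poses-converted.txt

# Absolute trajectory error (ATE): aligns the trajectories and
# reports the RMSE of the translational differences.
python evaluate_ate.py "$GT" "$EST" --verbose

# Relative pose error (RPE): drift over fixed one-second intervals.
python evaluate_rpe.py "$GT" "$EST" --fixed_delta --delta_unit s --verbose
```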

