Common terminology for this doc and the rest of the project:
- Position map: A `256x256x3` array whose `[v, u]` index stores the position of the vertex with uv-space coordinates `(u, v)` as an `(x, y, z)` triple.
- Texture: A `256x256x3` array whose `[v, u]` index stores the color of the vertex with uv-space coordinates `(u, v)` as a `(b, g, r)` triple.
- Canonical face/position-map/texture: A canonical face is any face used to align other faces, so that the center, size, and orientation of the other faces are similar to those of the canonical face. Usually, one canonical face is evaluated for every video chunk. The canonical face itself can have any size and orientation, but it usually tends to be aligned so that its `x, y, z` values lie between ±2.5 with mean 0, and so that the face looks directly forward.
- Frontalized position map: A position map that has been aligned with a specific canonical position map.
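To make the indexing convention above concrete, here is a minimal sketch using synthetic arrays (in the real pipeline these come from PRNet as 256x256x3 EXR/WebP files); the particular coordinates and values are made up for illustration:

```python
import numpy as np

# Position map: [v, u] -> (x, y, z); Texture: [v, u] -> (b, g, r).
H = W = 256
posmap = np.zeros((H, W, 3), dtype=np.float32)
texture = np.zeros((H, W, 3), dtype=np.uint8)

u, v = 100, 40                       # uv-space coordinates of one vertex
posmap[v, u] = (0.5, -1.2, 0.3)      # note the [v, u] (row, column) order
texture[v, u] = (255, 128, 0)        # OpenCV-style BGR triple

x, y, z = posmap[v, u]
b, g, r = texture[v, u]
```

The `[v, u]` order matters: `v` selects the row and `u` the column, matching the usual image-array convention.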
Each video/video-chunk should have a corresponding directory, usually located in the `videos` folder, with the following files/subdirectories:
- `input`: [Directory] Contains each video frame as an image (named `00001.jpg`, `00002.jpg`, etc.)
- `posmap`: [Directory] [Optional, not used after data extraction] Contains the position map detected by PRNet for each video frame (named `00001.exr`, `00002.exr`, etc.)
- `canonical`: [Directory] Contains data for the canonical position map and texture. Has the following files:
  - `frontalized.exr`: The canonical position map
  - `texture.webp`: The canonical texture
  - `keypoints.npy`: A 68x3 array of keypoint positions corresponding to `frontalized.exr`
  - `normals.exr`: A `256x256x3` array whose `[v, u]` index stores the area-vector of the quadrilateral formed by all vertices adjacent to the vertex with uv-space coordinates `(u, v)`
  - `selected.txt`: [Optional, not used for anything] A list of all the frames that were used to create this canonical face
- `output`: [Directory] Contains a directory for each face-containing video frame (named `00001`, `00002`, etc.) holding information about the face. Each such directory has the following files within it:
  - `frontalized.exr`: The frontalized position map of the face
  - `texture.webp`: The texture of the face
  - `keypoints.npy`: A 68x3 array of frontalized keypoint positions corresponding to `frontalized.exr`
  - `params.npy`: A vector (usually 20-dimensional) containing PCA parameters for mouth shape and inner-mouth colors
  - `lighting.npy`: A 4-dimensional vector representing the lighting in the frame
  - `texWithoutL.exr`: The texture with the lighting separated out
  - `transform.txt`: A 4x3 matrix which, when multiplied with a homogenized frontalized position, gives the actual non-frontalized position (i.e. the affine transform taking the positions in `frontalized.exr` to the actual positions in the video)
- `pca.npz`: Contains the matrices and coefficients required to go from the actual frontalized positions and inner-mouth colors to the PCA parameters, and vice versa.
- `albedo.npy`: Contains the matrix required to go from the actual texture to the lighting parameters, and vice versa.
- `video.mp4`: [Optional, not used for anything] The actual video.
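As a sketch of how the `transform.txt` matrix is applied: homogenize a frontalized position to `[x, y, z, 1]` and multiply by the 4x3 matrix. The row-vector convention (`[x y z 1] @ M`) is an assumption consistent with the 4x3 shape described above, and the example matrix is made up, not taken from a real file:

```python
import numpy as np

# Made-up 4x3 affine transform: first 3 rows are the linear part
# (uniform scale by 2 here), last row is the translation.
M = np.vstack([np.eye(3) * 2.0,
               [0.1, 0.2, 0.3]])

frontalized = np.array([1.0, -0.5, 0.25])   # a position from frontalized.exr
homogenized = np.append(frontalized, 1.0)   # [x, y, z, 1]
actual = homogenized @ M                    # non-frontalized position in the video
# actual == 2 * frontalized + [0.1, 0.2, 0.3] -> [2.1, -0.8, 0.8]
```

With this convention the last row of the matrix carries the translation, so a single matrix product performs the full affine transform.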
For instructions on how to run a particular Python script within this repo, use `python <script> -h`
- Create the base directory, and the `input` directory and `video.mp4` within it
- Extract images using `ffmpeg -i <path/to/base>/video.mp4 -q:v 2 <path/to/base>/input/%05d.jpg`
- Run `pipeline/demo.py` to get the `posmap` folder
- Run `pipeline/make_canonical.py` to get the canonical face
- Run `pipeline/get_output.py` to get the `output` folder
- Run TMFR to get better results for the keypoints
- Run `pipeline/convert_tmfr_output.py` to get a `new_output` folder using the TMFR results. Replace the `output` folder with this folder
- Run `pipeline/get_pca_params.py` to get `pca.npz` in the base directory, and `params.npy` files in each directory in `output`
- Run `pipeline/save_light_albedo.py` to get `albedo.npy` in the base directory, and `lighting.npy` and `texWithoutL.exr` in each directory in `output`
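The steps above can be sketched as a shell script. This is a dry run: each command is echoed instead of executed (set `RUN=1` to actually execute). `BASE` and the per-script argument shapes are assumptions, not from this doc; check each script's real usage with `python <script> -h`:

```shell
BASE="videos/example"   # assumed base directory; place video.mp4 inside it first
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

run mkdir -p "$BASE/input"
run ffmpeg -i "$BASE/video.mp4" -q:v 2 "$BASE/input/%05d.jpg"
run python pipeline/demo.py "$BASE"              # -> posmap/
run python pipeline/make_canonical.py "$BASE"    # -> canonical/
run python pipeline/get_output.py "$BASE"        # -> output/
# ...run TMFR externally here, then convert its results:
run python pipeline/convert_tmfr_output.py "$BASE"   # -> new_output/, replaces output/
run python pipeline/get_pca_params.py "$BASE"        # -> pca.npz, params.npy
run python pipeline/save_light_albedo.py "$BASE"     # -> albedo.npy, lighting.npy, texWithoutL.exr
```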
Alternatively, just use the extract-data repository (which contains this one and TMFR as submodules).