Outputs

Each backend processor generates outputs in its own format, usually written to disk as text files, so raw output formats vary considerably across backends. Bitbox therefore includes wrapper functions that convert these outputs into a standard Python dictionary format.
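In practice, you call a processor method and work with the returned dictionary, while the backend's native files are written to the output directory. A minimal sketch (processor.fit() is described in the Facial Expressions section; the variable name is illustrative):

# the backend writes its native output files to disk, while the wrapper
# returns the results as a standard Python dictionary
expressions = processor.fit()
print(type(expressions))  # <class 'dict'>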

File Caching System

Running backend processors can take some time, usually a few minutes per video file depending on your hardware. To speed up repeated analyses and provide a simple versioning system, Bitbox includes an integrated file caching mechanism.

Each time a processor runs, Bitbox checks whether the output files and their metadata, which describe the last execution, already exist in the specified output directory. The metadata is stored in .json files. If both are found, Bitbox checks whether the time elapsed since the files were created is within the retention period (six months by default). If so, Bitbox reuses the existing files instead of recreating them, which can save substantial time.
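Conceptually, the check works like the following sketch (a simplified illustration, not Bitbox's actual implementation; the metadata file naming is an assumption):

# simplified sketch of the cache-validity check; illustration only,
# not Bitbox's actual implementation
import json
import os
from datetime import datetime, timedelta

def is_cache_valid(output_file, retention=timedelta(days=183)):  # ~6 months
    meta_file = os.path.splitext(output_file)[0] + '.json'  # assumed naming scheme
    if not (os.path.exists(output_file) and os.path.exists(meta_file)):
        return False
    with open(meta_file) as f:
        metadata = json.load(f)
    created = datetime.strptime(metadata['time'], '%Y-%m-%d %H:%M:%S')
    return datetime.now() - created <= retention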

You can adjust the retention period to fit your needs.

# you can set the retention period using natural language: 1 year, 3 minutes, etc.
processor.cache.change_retention_period('1 year')

Each file saved to disk has an accompanying .json file with the same name that records the details of the most recent execution. For example:

    "backend": "3DI",
    "morphable_model": "BFMmm-19830",
    "camera": 30,
    "landmark": "global4",
    "fast": false,
    "local_bases": "0.0.1.F591-cd-K32d",
    "input_hash": "4e31c4610ad3641ed651394855516d7989f9c5b3127520add6d87efc5618c162",
    "cmd": "CUDA_VISIBLE_DEVICES=1 docker run --rm --gpus device=1 -v /home/test/bitbox/tutorials/data:/app/input -v /home/test/bitbox/tutorials/output:/app/output -w /app/3DI bitbox:cuda12 ./video_detect_landmarks /app/input/elaine.mp4 /app/output/elaine_rects.3DI /app/output/elaine_landmarks.3DI /app/3DI/configs/BFMmm-19830.cfg1.global4.txt > /dev/null",
    "input": "/home/test/bitbox/tutorials/data/elaine.mp4",
    "output": "/home/test/bitbox/tutorials/output",
    "time": "2025-07-18 12:33:50"

Output Types

Bitbox returns Python dictionaries by default after each processing step, allowing you to manipulate the output easily. If you prefer that a step return nothing and only generate the backend output files, set the return_output parameter to None. To receive the paths of the generated files instead, set it to 'file'.
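For example (using processor.fit(), described below; passing return_output per call is shown as an assumption for illustration):

# default: returns a standard Python dictionary
expressions = processor.fit()

# return only the paths of the generated files
paths = processor.fit(return_output='file')

# return nothing; only generate the backend output files
processor.fit(return_output=None)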

Output Formats

Below is a list of common components of face and body analysis pipelines and their associated outputs. The wrapper functions generate these raw behavioral signals, which serve as inputs to analysis functions that produce behavioral measurements. Details on the outputs of analysis functions are given in the Biomechanics, Affective Expressions, and Social Dynamics sections.

Face Rectangles

The dictionary containing the coordinates for the face rectangles is structured as follows.
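A hypothetical sketch with dummy values (key names and layout are assumptions for illustration, not Bitbox's documented schema):

# hypothetical sketch with dummy values; actual key names and layout may differ
face_rectangles = {
    'frame 0': [312, 120, 180, 182],  # x, y, width, height in pixels
    'frame 1': [314, 121, 181, 183],
}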

Head Pose

The dictionary containing head pose is structured as follows.
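A hypothetical sketch with dummy values (the per-frame layout is an assumption; the six values are explained below):

# hypothetical sketch with dummy values; actual layout may differ
head_pose = {
    'frame 0': [1.2, -3.4, 560.0, 0.02, -0.10, 0.01],  # Tx, Ty, Tz, Rx, Ry, Rz
    'frame 1': [1.3, -3.3, 559.5, 0.02, -0.09, 0.01],
}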

The first three values (Tx, Ty, Tz) are the x, y, z components of the translation vector, and the last three values (Rx, Ry, Rz) are the pitch, yaw, and roll angles of the rotation, in radians.

2D Face Landmarks

The dictionary containing the coordinates for the facial landmarks is structured as follows.
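A hypothetical sketch with dummy values (key names and layout are assumptions; each frame holds 51 (x, y) pixel coordinates):

# hypothetical sketch with dummy values; actual layout may differ
# each frame holds 51 (x, y) pixel coordinates, truncated here for brevity
landmarks_2d = {
    'frame 0': [(230.5, 310.2), (236.1, 308.9)],
    'frame 1': [(231.0, 310.5), (236.4, 309.1)],
}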

Below is an illustration showcasing the 51 landmarks from the iBUG schema included in Bitbox.

3D Face Landmarks

3DI and 3DI-Lite backends also estimate 3D coordinates for the same 51 landmarks in a standardized/canonicalized template, normalized for head pose and individual identity, so these coordinates capture expression-related motion only. The dictionary containing these coordinates is structured as follows.
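A hypothetical sketch with dummy values (key names, layout, and units are assumptions; the coordinates live in the canonicalized template space):

# hypothetical sketch with dummy values; actual layout may differ
# each frame holds 51 (x, y, z) template points, truncated here for brevity
landmarks_3d = {
    'frame 0': [(-0.8, 1.2, 0.3), (-0.5, 1.1, 0.4)],
    'frame 1': [(-0.8, 1.2, 0.3), (-0.5, 1.0, 0.4)],
}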

Facial Expressions

The dictionary containing facial expressions is structured as follows.
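A hypothetical sketch with dummy values (key names and layout are assumptions; the format field and the meaning of the columns are explained below):

# hypothetical sketch with dummy values; actual layout may differ
expressions = {
    'format': 'PCA',                 # representation of the columns (see below)
    'frame 0': [0.12, -0.05, 0.31],  # truncated; e.g., 79 coefficients with 3DI fit()
    'frame 1': [0.10, -0.04, 0.29],
}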

Depending on the backend processor used, the columns of the data frame carry different meanings. With 3DI and 3DI-Lite, they represent global, non-interpretable facial deformations along 79 PCA directions when generated by processor.fit(), or localized, interpretable facial motions, similar to Action Units, when generated by processor.localized_expressions(). The format field indicates which representation is used. With OpenFace (coming soon), the columns will correspond to Action Units.

Body Joints

Coming Soon