Advances in the state of the art for 3d human sensing are currently limited by the lack of visual datasets with 3d ground truth, including multiple people, in motion, operating in real-world environments, with complex illumination or occlusion, and potentially observed by a moving camera. Sophisticated scene understanding would require estimating human pose and shape as well as gestures, towards representations that ultimately combine useful metric and behavioral signals with free-viewpoint photo-realistic visualisation capabilities. To sustain progress, we build a large-scale photo-realistic dataset, Human-SPACE (HSPACE), of animated humans placed in complex synthetic indoor and outdoor environments. We combine a hundred diverse individuals of varying ages, gender, proportions, and ethnicity, with hundreds of motions and scenes, as well as parametric variations in body shape (for a total of 1,600 different humans), in order to generate an initial dataset of over 1 million frames. Human animations are obtained by fitting an expressive human body model, GHUM, to single scans of people, followed by novel re-targeting and positioning procedures that support the realistic animation of dressed humans, statistical variation of body proportions, and jointly consistent scene placement of multiple moving people. Assets are generated automatically, at scale, and are compatible with existing real-time rendering and game engines. The dataset with evaluation server will be made available for research. Our large-scale analysis of the impact of synthetic data, in connection with real data and weak supervision, underlines the considerable potential for continuing quality improvements and limiting the sim-to-real gap, in this practical setting, in connection with increased model capacity.


Methodology for Reposing and Reshaping

The first stage in our pipeline is to fit the GHUM model to an initial 3d scan of a person. We build a representation that supports the plausible animation of both the body and the clothing based on different 3d motion capture signals.

The shape can also be varied to obtain scans with different appearance and body mass index, with reasonable automatic clothing deformations synthesized using GHUM statistical shape parameters.
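
As a rough illustration of the reshaping idea, the sketch below draws several shape variations around a base coefficient vector. It assumes that body shape is encoded as a short vector of statistical coefficients with an approximately standard normal prior. The function name, the 16-dimensional vector, and the clamping limit are hypothetical choices for this example; this is not the GHUM API or the procedure used to build HSPACE.

```python
import random
from typing import List


def sample_shape_variations(
    base_shape: List[float],
    num_variations: int,
    scale: float = 1.0,
    limit: float = 3.0,
    seed: int = 0,
) -> List[List[float]]:
    """Draw shape coefficient vectors around a base shape.

    Assumption (not from the source): each coefficient is perturbed with
    independent Gaussian noise and clamped to [-limit, limit] so that
    samples stay inside a plausible region of the shape space.
    """
    rng = random.Random(seed)  # local generator keeps sampling reproducible
    variations = []
    for _ in range(num_variations):
        coeffs = []
        for b in base_shape:
            value = b + rng.gauss(0.0, scale)
            coeffs.append(max(-limit, min(limit, value)))
        variations.append(coeffs)
    return variations


# Hypothetical usage: a neutral 16-dimensional shape and five variations.
neutral_shape = [0.0] * 16
reshaped = sample_shape_variations(neutral_shape, num_variations=5, seed=42)
```

Clamping is one simple way to avoid implausible bodies. A real pipeline would pass each sampled vector to the body model to produce the reshaped mesh and the corresponding clothing deformation.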

Reposed and reshaped scans are animated with retargeted motion capture data. Highly dynamic motions work best with characters wearing tight-fitting clothing, for which the sequences look natural and smooth, but good performance can also be achieved for less tight clothing.

Multiple Render Passes

Scans are animated and placed in 3d scenes with complex lighting. For each sequence, we generate multiple render passes, including RGB, segmentation masks, and ground-truth GHUM 3d pose and shape information for each character.
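
To make the per-frame outputs concrete, the sketch below shows one way the render passes and per-character ground truth could be grouped in code. All class names, field names, and file paths are hypothetical and chosen only for illustration. The actual HSPACE file layout and annotation format are not described here and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CharacterAnnotation:
    """Ground truth for one character in one frame (hypothetical layout)."""
    character_id: int
    pose: List[float]   # body model pose parameters
    shape: List[float]  # body model shape parameters


@dataclass
class FrameRecord:
    """Render passes and annotations for a single frame (hypothetical layout)."""
    frame_index: int
    rgb_path: str
    segmentation_path: str
    characters: List[CharacterAnnotation] = field(default_factory=list)


def characters_by_id(frame: FrameRecord) -> Dict[int, CharacterAnnotation]:
    """Index the characters of a frame by their identifier."""
    return {c.character_id: c for c in frame.characters}


# Hypothetical usage: one frame containing two characters.
frame = FrameRecord(
    frame_index=0,
    rgb_path="sequence_0001/rgb/000000.png",
    segmentation_path="sequence_0001/segmentation/000000.png",
    characters=[
        CharacterAnnotation(character_id=0, pose=[0.0] * 3, shape=[0.0] * 16),
        CharacterAnnotation(character_id=1, pose=[0.1] * 3, shape=[0.5] * 16),
    ],
)
lookup = characters_by_id(frame)
```

Keeping the annotations per character, rather than per frame, mirrors the statement that pose and shape are provided for each character in a multi-person scene.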


If you use this model or code for your publication, please cite the following papers 1, 2 and 3:

Dataset Download

Please fill in this form. By completing this form, you agree to use the HSPACE dataset in accordance with the Google AI Principles. The dataset is currently in alpha version, so there might be upcoming changes. Feedback is welcome!

The data will be available once access is granted.


