DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
CVPR 2025
1Meshcapade,
2Zhejiang University
3Stanford University
4Max Planck Institute for Intelligent Systems
* denotes equal contribution co-authorship

We address the task of reconstructing 3D hair geometry from a single image, which is challenging due to the diversity of hairstyles and the lack of paired image-to-3D hair data. Previous methods are primarily trained on synthetic data and cope with the limited amount of such data by using low-dimensional intermediate representations, such as guide strands and scalp-level embeddings, that require post-processing to decode, upsample, and add realism. These approaches fail to reconstruct detailed hair, struggle with curly hair, or are limited to handling only a few hairstyles. To overcome these limitations, we propose DiffLocks, a novel framework that enables detailed reconstruction of a wide variety of hairstyles directly from a single image. First, we address the lack of 3D hair data by automating the creation of the largest synthetic hair dataset to date, containing 40K hairstyles. Second, we leverage the synthetic hair dataset to learn an image-conditioned diffusion-transformer model that reconstructs accurate 3D strands from a single frontal image. By using a pretrained image backbone, our method generalizes to in-the-wild images despite being trained only on synthetic data. Our diffusion model predicts a scalp texture map in which any point in the map contains the latent code for an individual hair strand. These codes are directly decoded to 3D strands without post-processing techniques. Representing individual strands, instead of guide strands, enables the transformer to model the detailed spatial structure of complex hairstyles. With this, DiffLocks can reconstruct highly curled hair, like afro hairstyles, from a single image for the first time. Qualitative and quantitative results demonstrate that DiffLocks outperforms existing state-of-the-art approaches. Data and code are available for research.
Video
Overview of our DiffLocks pipeline. Given a single RGB image, we use a pretrained DINOv2 model to extract local and global features, which guide a scalp diffusion model. The scalp diffusion model denoises a density map and a scalp texture containing latent codes for strand geometry. Finally, we probabilistically sample texels from the scalp texture and decode each latent code z into a strand of 256 points. Decoding 100K strands in parallel yields the final hairstyle, as sketched in the code below.
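The sketch below illustrates the final sampling-and-decoding stage of the pipeline: picking texels from the scalp texture in proportion to the predicted density map and decoding each texel's latent code into a strand of 256 3D points. The tensor shapes, the `StrandDecoder` architecture, and the function names are illustrative assumptions, not the released DiffLocks implementation.

```python
# Minimal sketch (PyTorch) of strand sampling and decoding, under assumed shapes.
import torch
import torch.nn as nn

class StrandDecoder(nn.Module):
    """Hypothetical decoder: maps a per-strand latent code z to 256 3D points."""
    def __init__(self, latent_dim=64, num_points=256):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

    def forward(self, z):                       # z: (N, latent_dim)
        pts = self.mlp(z)                       # (N, 256*3)
        return pts.view(-1, self.num_points, 3)

def sample_and_decode(scalp_texture, density_map, decoder, num_strands=100_000):
    """Probabilistically sample texels according to the density map and decode
    each sampled texel's latent code into a 3D strand.
    scalp_texture: (C, H, W) latent codes; density_map: (H, W) non-negative weights."""
    C, H, W = scalp_texture.shape
    probs = density_map.flatten().clamp(min=0)
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, num_strands, replacement=True)   # (num_strands,)
    z = scalp_texture.view(C, -1)[:, idx].T                         # (num_strands, C)
    with torch.no_grad():
        strands = decoder(z)                                        # (num_strands, 256, 3)
    return strands

# Toy usage with random tensors standing in for the diffusion model's output.
decoder = StrandDecoder(latent_dim=64)
scalp_texture = torch.randn(64, 256, 256)       # latent code per texel
density_map = torch.rand(256, 256)              # hair density per texel
strands = sample_and_decode(scalp_texture, density_map, decoder, num_strands=1000)
print(strands.shape)  # torch.Size([1000, 256, 3])
```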
DiffLocks dataset
The DiffLocks dataset consists of 40,000 samples, each containing a realistic RGB image rendered in Blender, files for the full 3D hair, and metadata describing the hairstyle. Please check the link at the top of the page for more information on how to download it.
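A minimal sketch of iterating over dataset samples is shown below. The directory layout and file names (`rgb.png`, `strands.npz`, `metadata.json`) are assumptions for illustration only; please consult the dataset documentation for the actual structure.

```python
# Hypothetical loader sketch: one directory per sample with an RGB render,
# strand geometry, and metadata (file names are assumed, not official).
import json
from pathlib import Path

import numpy as np
from PIL import Image

def load_sample(sample_dir: Path):
    """Load one assumed sample: RGB render, strand geometry, and metadata."""
    rgb = np.array(Image.open(sample_dir / "rgb.png"))        # Blender render
    strands = np.load(sample_dir / "strands.npz")["points"]   # e.g. (num_strands, 256, 3)
    metadata = json.loads((sample_dir / "metadata.json").read_text())
    return rgb, strands, metadata

dataset_root = Path("difflocks_dataset")  # hypothetical download location
for sample_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
    rgb, strands, meta = load_sample(sample_dir)
```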
Comparison
Our method can robustly reconstruct a large variety of hairstyles with significantly more detail than previous approaches.
Acknowledgements
The website template was borrowed from Michaël Gharbi.