METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

Abstract
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, requires precisely replicating real-world object diversity. Existing datasets show that this process relies heavily on artist-driven designs, which demand substantial human effort and pose significant scalability challenges. To produce realistic, interactive 3D scenes at scale, we first present METASCENES, a large-scale simulatable 3D scene dataset constructed from real-world scans, comprising 15,366 objects spanning 831 fine-grained categories. We then introduce SCAN2SIM, a robust multi-modal alignment model that enables automated, high-quality asset replacement, eliminating the reliance on artist-driven designs for scaling up 3D scenes. We further propose two benchmarks to evaluate METASCENES: a detailed scene synthesis task focused on small-item layouts for robotic manipulation, and a domain transfer task in vision-and-language navigation (VLN) that validates cross-domain transfer. Results confirm the potential of METASCENES to advance EAI by supporting more generalizable agent learning and sim-to-real applications, opening new possibilities for EAI research.
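The abstract describes SCAN2SIM as a multi-modal alignment model that replaces scanned objects with simulatable assets; the paper itself details the actual architecture and training. Purely as an illustration of the general retrieval-style idea behind automated asset replacement, here is a minimal sketch: embed the scanned object and each candidate CAD asset into a shared feature space, then pick the asset with the highest cosine similarity. All function names, the embedding scheme, and the random-projection encoder below are hypothetical stand-ins, not the authors' method.

```python
# Illustrative sketch only: retrieval-style asset replacement via embedding
# similarity. Every name and encoder here is a hypothetical placeholder;
# SCAN2SIM's real design is described in the paper.
import numpy as np

def embed_scan_object(points: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: map a scanned object's point cloud to a
    unit-norm feature vector. A real system would use a learned
    multi-modal encoder; a fixed random projection stands in here."""
    rng = np.random.default_rng(0)            # fixed seed -> shared projection
    proj = rng.standard_normal((points.shape[1], 128))
    feat = points.mean(axis=0) @ proj         # crude global descriptor
    return feat / (np.linalg.norm(feat) + 1e-8)

def embed_cad_asset(points: np.ndarray) -> np.ndarray:
    """Hypothetical encoder for candidate CAD assets, sharing the
    embedding space with embed_scan_object."""
    return embed_scan_object(points)

def retrieve_replacement(scan_points: np.ndarray, asset_library: dict):
    """Return the library asset whose embedding has the highest cosine
    similarity to the scanned object's embedding (unit norms, so the
    dot product equals cosine similarity)."""
    query = embed_scan_object(scan_points)
    scores = {
        name: float(query @ embed_cad_asset(pts))
        for name, pts in asset_library.items()
    }
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    scan = rng.standard_normal((2048, 3))      # mock scanned point cloud
    library = {f"asset_{i}": rng.standard_normal((2048, 3)) for i in range(5)}
    best, scores = retrieve_replacement(scan, library)
    print("best match:", best, "score:", round(scores[best], 4))
```

In practice, such a retriever would be one stage of a larger pipeline (pose alignment, scale fitting, physical-plausibility checks); the sketch shows only the nearest-neighbor matching step.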
Authors
Huangyue Yu*, Baoxiong Jia*, Yixin Chen*, Yandan Yang†, Puhao Li†, Rongpeng Su†, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang
Publication Year
2025
https://openaccess.thecvf.com/content/CVPR2025/papers/Yu_METASCENES_Towards_Automated_Replica_Creation_for_Real-world_3D_Scans_CVPR_2025_paper.pdf
Publication Venue
CVPR