Total3DUnderstanding
- Scene understanding and 3D shape modeling from a single image

Relation Networks for Object Detection
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
Background
Semantic reconstruction of indoor scenes
- Scene understanding
- Object reconstruction
Brief history of scene reconstruction
- Early work focused on room layout estimation
- With the advance of CNNs, methods moved on to object pose estimation
- Instead of bounding boxes, shape retrieval methods w/ 3D models
- General shape representation: point cloud, patches, primitives w/ post-processing
- Voxel-grid representation: computationally intensive and time-consuming
- Object mesh reconstruction from a template
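
Note on the cost of voxel grids: a minimal sketch, not taken from the paper, of why a dense voxel grid becomes expensive. The function names and the 4-bytes-per-cell figure are assumptions chosen for illustration. The cell count grows with the cube of the per-side resolution, so doubling the resolution multiplies memory by 8.

```python
def voxel_grid_cells(resolution: int) -> int:
    """Cells in a dense cubic grid with `resolution` cells per side."""
    return resolution ** 3


def dense_grid_bytes(resolution: int, bytes_per_cell: int = 4) -> int:
    """Memory for a dense grid; 4 bytes per cell (one float32) is an assumption."""
    return voxel_grid_cells(resolution) * bytes_per_cell


if __name__ == "__main__":
    # Print cell count and memory in MiB for a few common resolutions.
    for r in (32, 64, 128, 256):
        mib = dense_grid_bytes(r) / 2**20
        print(f"{r:>3}^3 -> {voxel_grid_cells(r):>10} cells, {mib:6.2f} MiB")
```

By contrast, a point cloud or mesh stores only surface samples, so its size scales with the surface detail rather than with the enclosing volume.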
Previous works
- Scene understanding w/o shape details of indoor objects (instead, 3D bounding box)
- Scene-level reconstruction w/ object shapes under contextual knowledge
- Depth or voxel representation (note: voxel = 3D pixel)
- Mesh-retrieval methods w/ 3D model retrieval module
- Object-wise mesh reconstruction (e.g., Mesh R-CNN)
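
Note on "voxel = 3D pixel": a small illustrative sketch, with a hypothetical `voxelize` helper that is not from any of the cited works. It assumes the points are already normalized to the unit cube [0, 1]^3 and marks a grid cell as occupied if at least one point falls into it, in the same way a pixel is set when a sample lands on it.

```python
def voxelize(points, resolution):
    """Return the set of occupied (i, j, k) voxel indices.

    Assumes every coordinate lies in [0, 1]; a coordinate of exactly 1.0
    is clamped into the last cell.
    """
    occupied = set()
    for x, y, z in points:
        index = tuple(
            min(int(c * resolution), resolution - 1) for c in (x, y, z)
        )
        occupied.add(index)
    return occupied


if __name__ == "__main__":
    points = [(0.10, 0.10, 0.10), (0.12, 0.11, 0.13), (0.90, 0.50, 0.20)]
    # The first two points are close together and share one voxel.
    print(sorted(voxelize(points, resolution=4)))
```

Nearby points collapse into the same cell, which is the detail loss that motivates mesh-based representations at low grid resolutions.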
Achievement of this paper