Total3DUnderstanding

Scene understanding and 3D shape modeling from a single image

Relation Networks for Object Detection

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

Background

Semantic reconstruction of indoor scenes

Scene understanding
Object reconstruction

Brief history of scene reconstruction

Early works focused on room layout estimation
With the advance of CNNs, methods extended to object pose estimation beyond the layout
Instead of bounding boxes, shape-retrieval methods recover object shapes w/ retrieved 3D models
General shape representations (point clouds, patches, primitives) need post-processing to obtain meshes
Voxel-grid representation: computationally intensive and time-consuming
Object mesh reconstruction by deforming a template mesh
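To make the cost of voxel grids concrete, here is a back-of-the-envelope sketch. It assumes a dense grid that stores one float32 value (e.g., occupancy) per voxel; the helper name `voxel_memory_bytes` is hypothetical and not taken from the paper.

```python
# Hypothetical illustration: a dense voxel grid's memory grows as O(N^3),
# assuming one float32 value stored per voxel.
BYTES_PER_FLOAT32 = 4


def voxel_memory_bytes(resolution: int, bytes_per_voxel: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to store one dense resolution^3 grid, one value per voxel."""
    return resolution ** 3 * bytes_per_voxel


if __name__ == "__main__":
    for res in (32, 64, 128, 256):
        mib = voxel_memory_bytes(res) / 2**20  # convert bytes to MiB
        print(f"{res:>4}^3 grid: {res**3:>10,} voxels, {mib:8.2f} MiB")
```

Each doubling of the resolution multiplies memory by 8 (a 64^3 grid needs 1 MiB, a 256^3 grid 64 MiB), per object and before any network activations, which is why fine voxel grids are expensive compared with a mesh whose vertex count scales with surface detail.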

Previous works

Scene understanding w/o shape details of indoor objects (only 3D bounding boxes)
Scene-level reconstruction w/ object shapes under contextual knowledge
- Depth or voxel representations (cf. voxel = 3D pixel)
- Mesh-retrieval methods w/ a 3D model retrieval module
- Object-wise mesh reconstruction (e.g., Mesh R-CNN)

Achievements of this paper