Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

This paper covers two types of hardware used for 3D object detection: LiDAR and stereo/monocular cameras. LiDAR is expensive, but yields higher detection accuracy than stereo cameras.

To close this gap, the paper proposes a pipeline where depth is first estimated from the stereo images and then, using the camera calibration, back-projected into a 3D point cloud (the "pseudo-LiDAR"). Existing LiDAR-based object detection techniques are then applied directly to this point cloud.
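
The core of this representation change is ordinary pinhole back-projection of a depth map. Below is a minimal sketch of that step, not the authors' code: `depth_to_point_cloud` is a hypothetical helper, and the intrinsics in the example are made-up placeholder values of the kind found in KITTI calibration files.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (meters) into an Nx3 camera-frame point cloud.

    For each pixel (u, v) with depth z:
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop invalid or far-away points, mimicking a LiDAR's limited range.
    return points[(points[:, 2] > 0) & (points[:, 2] < 80)]

# Example with a dummy depth map and placeholder intrinsics:
depth = np.full((375, 1242), 10.0, dtype=np.float32)
cloud = depth_to_point_cloud(depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
print(cloud.shape)  # (N, 3)
```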

The results show a substantial improvement in 3D detection accuracy from this change of representation alone.

Dataset used: KITTI

Learning the Depths of Moving People by Watching Frozen People

This work leverages the large number of videos from the Mannequin Challenge trend, in which people hold still while a camera moves through the scene, and creates an RGBD dataset from them. Because the people are static, multi-view geometry can recover their depth across frames, providing supervision for learning the depth of moving people.
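
The geometric principle behind the dataset can be illustrated with two-view triangulation: since the scene is static, a pixel matched across two frames of the moving camera can be triangulated to a 3D point. This is a simplified toy sketch (the paper runs full SfM/MVS on the videos); the projection matrices and correspondence below are made-up values, and OpenCV is assumed to be available.

```python
import numpy as np
import cv2

# Toy camera: intrinsics K, first camera at the origin, second camera
# translated 0.5 m to the right (both poses would come from SfM in practice).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# A single matched pixel in each frame (2xN arrays of correspondences).
pts1 = np.array([[320.0], [240.0]])
pts2 = np.array([[295.0], [240.0]])

# Triangulate to homogeneous 3D points, then dehomogenize.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).T
print(X)  # recovered 3D point; its z component is the depth (~10 m here)
```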

Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis (2021)

Existing datasets

KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute)

It consists of hours of traffic-scenario recordings, obtained with a variety of sensors such as RGB and grayscale high-resolution stereo cameras and a Velodyne 3D laser scanner (a minimal scan-loading sketch follows this list).

The dataset provides both 2D and 3D representations (2D bounding boxes in the images and 3D bounding boxes in the point cloud).

The dataset doesn't include semantic annotations.

It contains object categories such as cars and pedestrians.
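
For reference, KITTI's Velodyne scans are distributed as flat binary files of float32 quadruples (x, y, z, reflectance). A minimal loading sketch, with a placeholder file name:

```python
import numpy as np

def load_velodyne_scan(path):
    """Read a KITTI Velodyne .bin file into points and reflectance values."""
    scan = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
    points, reflectance = scan[:, :3], scan[:, 3]
    return points, reflectance

points, reflectance = load_velodyne_scan("000000.bin")  # placeholder path
print(points.shape)  # (N, 3) points in the LiDAR frame
```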