Paper: https://arxiv.org/pdf/1904.07850.pdf
Published: 2019-04-16
**Authors:** Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl
Affiliations: UT Austin, UC Berkeley
Code: https://github.com/xingyizhou/CenterNet
Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point — the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
The current mainstream (anchor-based) approach: detection identifies objects as axis-aligned boxes in an image. Most object detectors enumerate a nearly exhaustive list of potential object locations and classify each one (effectively reducing detection to image classification), then filter the results with NMS, which is wasteful and inefficient.
Whether one-stage or two-stage, these detectors all rely on NMS (non-maximum suppression) as a post-processing step, computing the IoU between bounding boxes to remove duplicate detections of the same object. This post-processing is hard to differentiate and train through, so most existing detectors are not end-to-end trainable.
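To make that post-processing concrete, here is a minimal sketch of classic greedy NMS; the `[x1, y1, x2, y2]` box format, the 0.5 IoU threshold, and the function name `nms` are illustrative assumptions rather than any particular detector's implementation.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # process boxes from highest score down
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # overlap of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        # drop candidates that overlap too much: duplicate detections of the same object
        order = order[1:][iou <= iou_thresh]
    return keep
```

The greedy, hard-threshold selection above is exactly the step that is not differentiable, which is why anchor-based pipelines cannot be trained end-to-end through it.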
The approach proposed in this paper: model an object as a single point, namely the center point of its bounding box. The detector uses keypoint estimation to find center points and regresses all other object properties, such as size, 3D location, orientation, and even pose, as illustrated in the figure below.
Object detection thereby becomes a standard keypoint estimation problem. The authors simply feed the image into a fully convolutional network to obtain a heatmap; the peaks of the heatmap are the center points, and the object's width, height, and other properties are regressed from the features at each peak location.
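A minimal PyTorch sketch of this decoding step, assuming the network outputs a class heatmap `heat`, a size map `wh`, and a sub-pixel offset map `reg` (these names and the top-k value of 100 are illustrative assumptions): local peaks are kept with a 3x3 max-pool, which plays the role that NMS plays in anchor-based detectors.

```python
import torch
import torch.nn.functional as F

def decode_centers(heat, wh, reg, k=100):
    """heat: (B, C, H, W) class heatmap after sigmoid;
       wh:   (B, 2, H, W) predicted box width/height;
       reg:  (B, 2, H, W) predicted sub-pixel center offset."""
    B, C, H, W = heat.shape
    # keep only local maxima: a point survives if it equals its 3x3 max-pooled value
    peaks = F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
    heat = heat * (peaks == heat).float()
    # take the k highest-scoring peaks over all classes and positions
    scores, inds = heat.view(B, -1).topk(k)
    classes = inds // (H * W)
    pos = inds % (H * W)
    ys, xs = (pos // W).float(), (pos % W).float()
    # read the regressed size and offset at each peak location
    wh = wh.view(B, 2, -1).gather(2, pos.unsqueeze(1).expand(B, 2, k))
    reg = reg.view(B, 2, -1).gather(2, pos.unsqueeze(1).expand(B, 2, k))
    xs, ys = xs + reg[:, 0], ys + reg[:, 1]
    w, h = wh[:, 0], wh[:, 1]
    boxes = torch.stack([xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], dim=2)
    return boxes, scores, classes
```

The same gather-at-peak pattern can read out whatever extra regression heads exist (depth, orientation, joint offsets), which is why the abstract claims the approach extends directly to 3D boxes and human pose.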
Training uses standard supervised learning, and inference is just a single forward pass through the network, with no NMS-style post-processing.
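As a rough illustration of what that supervision looks like, the sketch below builds a per-class center heatmap target by splatting a Gaussian at each ground-truth center; the fixed `sigma = max(w, h) / 6` spread is a simplifying assumption here, not the paper's exact radius rule, and the predicted heatmap is then trained against this target.

```python
import numpy as np

def draw_center_gaussian(heatmap, cx, cy, sigma):
    """heatmap: (H, W) target for one class; (cx, cy): integer center on the output grid."""
    H, W = heatmap.shape
    ys, xs = np.ogrid[:H, :W]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    # element-wise max so nearby objects of the same class do not overwrite each other
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

# usage: a 128x128 output grid, one object whose box is 30x20 on that grid
hm = np.zeros((128, 128), dtype=np.float32)
box_w, box_h = 30, 20
draw_center_gaussian(hm, cx=40, cy=60, sigma=max(box_w, box_h) / 6.0)
```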
Key contribution: the paper represents an object by the single point at the center of its bounding box (describing a target as a center point, see Figure 2), while the other properties, such as object size, 3D extent, orientation, and pose, are regressed directly from the image features at the center location.