Authors: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

Affiliation: UNC Chapel Hill, Chapel Hill, USA

Published: 2016

Venue (journal/conference): ECCV

Full title: SSD: Single Shot MultiBox Detector

Paper: https://linkspringer.53yu.com/chapter/10.1007/978-3-319-46448-0_2

Code: https://github.com/weiliu89/caffe/tree/ssd

Standing: one of the strongest algorithms in the one-stage detector family

My understanding

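**Key idea:** The main contribution is running detection on feature maps of several different scales taken from a deep neural network. The idea is common today, but it was fairly novel at the time.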
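**Why:** Object detectors at the time fell into two families, two-stage and one-stage. The representative two-stage detector, Faster R-CNN, was quite accurate but slow; the representative one-stage detector, YOLOv1, was fast but not accurate enough. The authors therefore set out to design a detector that performs well on both speed and accuracy.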
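**How:** The authors use a modified VGG16 as the backbone and take 6 feature maps of different sizes from its convolutional layers. A fixed number of prior (default) boxes is placed at every location of each of these 6 feature maps, and a detection head is applied to each map separately; the predictions from all 6 maps together form the final detection results.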
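To make this multi-scale layout concrete, the sketch below counts default boxes and detection-head output channels for an SSD300-style model. The six map sizes (38, 19, 10, 5, 3, 1) with 4 or 6 boxes per location are assumed from the commonly cited SSD300 configuration, and the (c + 4)·k channels per location follow the paper's description of a small convolutional predictor that emits c class scores and 4 box offsets for each of the k default boxes. `num_classes = 21` (20 PASCAL VOC classes plus background) is an illustrative assumption, and the helper names are hypothetical.

```python
# Assumed SSD300-style layout: (feature-map side length, default boxes per location).
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def head_channels(boxes_per_location: int, num_classes: int) -> int:
    """Output channels of one detection head: c class scores + 4 offsets per default box."""
    return boxes_per_location * (num_classes + 4)


def total_default_boxes(layers) -> int:
    """Sum of side * side * k over all feature maps (one prediction per default box)."""
    return sum(side * side * k for side, k in layers)


if __name__ == "__main__":
    num_classes = 21  # assumption: 20 VOC classes + 1 background class
    for side, k in SSD300_LAYERS:
        channels = head_channels(k, num_classes)
        print(f"{side:>2}x{side:<2} map: {k} boxes/location, head outputs {channels} channels")
    print("total default boxes:", total_default_boxes(SSD300_LAYERS))
```

With this layout the total comes to 8732 default boxes per image, which is why even the low-resolution 300 × 300 model produces dense predictions; most of them come from the large 38 × 38 map that handles small objects.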

1. Abstract

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state of the art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.

The paper proposes SSD, an algorithm that performs object detection with a single deep neural network.
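It discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios and scales, placed at every location of feature maps of different sizes.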
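One way to build such a set of default boxes is the scale rule the SSD paper describes later on: scales spaced linearly from s_min = 0.2 to s_max = 0.9 across the m feature maps, width s_k·√a and height s_k/√a for aspect ratios a ∈ {1, 2, 3, 1/2, 1/3}, one extra square box of scale √(s_k·s_{k+1}), and box centers at ((j + 0.5)/f, (i + 0.5)/f) on an f × f map. The sketch below is a plain-Python rendering under those assumptions. The value s_{m+1} = 1.0 for the last map is a hypothetical choice (the rule leaves it open), and the released models adjust some per-layer sizes and use fewer aspect ratios on some layers, so the box count here differs from the 8732 of SSD300.

```python
import math
from itertools import product


def box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> list[float]:
    """Linearly spaced scales s_k for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(feature_sizes, aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """Return default boxes as (cx, cy, w, h) in normalized [0, 1] image coordinates."""
    m = len(feature_sizes)
    scales = box_scales(m)
    # Assumption: s_{m+1} = 1.0 for the extra square box on the last feature map.
    next_scales = scales[1:] + [1.0]
    boxes = []
    for f, s_k, s_next in zip(feature_sizes, scales, next_scales):
        for i, j in product(range(f), repeat=2):  # i = row, j = column
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for a in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            # Extra aspect-ratio-1 box with scale sqrt(s_k * s_{k+1}).
            s_prime = math.sqrt(s_k * s_next)
            boxes.append((cx, cy, s_prime, s_prime))
    return boxes


if __name__ == "__main__":
    boxes = default_boxes([38, 19, 10, 5, 3, 1])
    print(len(boxes), "default boxes; first:", boxes[0])
```

Small scales land on the large, high-resolution maps and large scales on the small maps, which is how a single network covers objects of very different sizes.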
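At prediction time, the network produces a confidence score for the presence of each category in every default box, and predicts adjustments (offsets) to each default box so that it better matches the object's shape.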
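The two outputs per default box can be illustrated with a small sketch: a softmax turns the per-class scores into confidences, and the four predicted offsets are applied to the default box. The `encode`/`decode` pair assumes the center-size parameterization from the paper's training objective (center offsets normalized by the default box's width and height, log-scale width and height). The released Caffe code additionally scales the offsets by per-coordinate variances, which this sketch omits; all function names here are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the class scores of one default box."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode(default_box, gt_box):
    """Training target: offsets that map default_box (cx, cy, w, h) onto gt_box."""
    d_cx, d_cy, d_w, d_h = default_box
    g_cx, g_cy, g_w, g_h = gt_box
    return (
        (g_cx - d_cx) / d_w,
        (g_cy - d_cy) / d_h,
        math.log(g_w / d_w),
        math.log(g_h / d_h),
    )


def decode(default_box, offsets):
    """Inverse of encode: apply predicted offsets to a default box."""
    d_cx, d_cy, d_w, d_h = default_box
    l_cx, l_cy, l_w, l_h = offsets
    return (
        d_cx + l_cx * d_w,
        d_cy + l_cy * d_h,
        d_w * math.exp(l_w),
        d_h * math.exp(l_h),
    )


if __name__ == "__main__":
    default = (0.5, 0.5, 0.2, 0.2)
    scores = softmax([0.1, 2.0, -1.0])  # hypothetical logits: background + 2 classes
    box = decode(default, (0.1, -0.05, math.log(1.5), 0.0))
    print(scores, box)  # box ≈ (0.52, 0.49, 0.3, 0.2)
```

Predicting offsets relative to a default box, rather than absolute coordinates, keeps the regression targets small and roughly scale-invariant, which is what lets one shared head work across boxes of very different sizes.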