<aside> 🔦 Edge detection is one of the core low-level problems in computer vision. This article gives an overview of how the problem is currently formulated, why it's important to predict sharp, well-localized edges, how the latest deep learning techniques tackle the task, and how it reveals some of the current shortcomings of the field. To provide comparable results, all models discussed in this post have been re-trained with the same dataset, training regime, and methods, and some of them have been re-implemented or modified for experimental purposes. As a consequence, the evaluation scores presented in this article won't match those reported in the original research papers.

</aside>


Edge detection and the importance of sharp, accurate detection


Edge detection can be viewed as the task of extracting visually salient edges and object boundaries from a natural image. It can seem trivial for a human brain, but the field has progressed much more slowly than other computer vision tasks like classification or object detection. Classification models, for example, are very performant and sometimes quite robust in real-world applications, but it has been shown that they often achieve this performance by extracting key features from small, high-frequency structures like textures and patterns. And while their ability to build accurate predictions from tiny details can be very impressive, these models rarely have an object-level understanding of the problem: they don't learn to recognize objects by their general shape and structure. Edge and boundary detection, on the contrary, specifically requires a multi-scale understanding of natural images. To tackle this, researchers in the field had to come up with new CNN structures capable of outputting high-resolution, accurate edge maps. This makes edge/boundary detection one of the most interesting problems in computer vision: progress here generally impacts most higher-level tasks, and deep architectures able to learn features at all scales of a natural image will quickly become key to building robust computer vision systems.
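To make the multi-scale idea concrete, here is a minimal PyTorch sketch of the "side output" structure popularized by HED: each stage of a convolutional backbone predicts its own edge map, the coarser predictions are upsampled back to the input resolution, and a fusion layer combines them into the final output. The tiny three-stage backbone below is a stand-in for illustration, not a published architecture.

```python
# Minimal sketch of HED-style multi-scale side outputs.
# The backbone here is a toy stand-in, not an actual published network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiScaleEdgeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # three stages at decreasing resolution (1x, 1/2x, 1/4x)
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # 1x1 convs map each stage's features to a single-channel side edge map
        self.side1 = nn.Conv2d(16, 1, 1)
        self.side2 = nn.Conv2d(32, 1, 1)
        self.side3 = nn.Conv2d(64, 1, 1)
        # fusion layer combines the upsampled side outputs into the final prediction
        self.fuse = nn.Conv2d(3, 1, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # coarser side outputs are upsampled back to the input resolution
        s1 = self.side1(f1)
        s2 = F.interpolate(self.side2(f2), size=(h, w), mode="bilinear", align_corners=False)
        s3 = F.interpolate(self.side3(f3), size=(h, w), mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([s1, s2, s3], dim=1))
        # sigmoid turns logits into per-pixel edge probabilities
        return [torch.sigmoid(s) for s in (s1, s2, s3, fused)]
```

In HED-like models, each side output is supervised with its own loss in addition to the fused one, which pushes every scale of the network to produce a meaningful edge map on its own.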

Performant edge detection provides well-localized information about object shapes and structures, which higher-level models like image segmentation networks often lack and yet need to produce accurate, robust results. For example, edge detection can help estimate a boundary-preserving optical flow in movement-interpolation tasks, generate object proposals that stick to object boundaries, or produce better-localized object masks in image segmentation tasks. A good edge detection model can also be integrated into a computer vision system as a post-processing step that refines a higher-level task's output, as illustrated by the toy sketch below. But to understand how to produce sharp, well-localized edges, we first need to explore how the problem is formulated and how researchers are solving it.
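As a toy intuition for how an edge map can constrain a higher-level output, here is a hypothetical sketch (the function and its parameters are mine, not from any paper) of edge-damped smoothing of a dense flow field: diffusion is suppressed wherever edge strength is high, so motion estimates don't bleed across object boundaries.

```python
# Hypothetical toy example: smooth a dense flow field without bleeding
# across object boundaries, using an edge map as a damping term.
import numpy as np

def edge_aware_smooth(flow, edges, iters=10, alpha=5.0):
    """flow: (H, W, 2) array; edges: (H, W) edge strengths in [0, 1]."""
    f = flow.astype(np.float64).copy()
    # mixing weight per pixel: close to 0 near strong edges, close to 1 elsewhere
    w = np.exp(-alpha * edges)[..., None]
    for _ in range(iters):
        # average of the 4-neighbourhood (np.roll wraps at borders; fine for a toy)
        neigh = (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0) +
                 np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1)) / 4.0
        # pixels far from edges converge to their neighbourhood average,
        # pixels on strong edges keep their own value
        f = (1.0 - w) * f + w * neigh
    return f
```

Real edge-aware interpolation methods, like the one behind the figure below, are far more sophisticated, but they rely on the same principle: a sharp edge map tells the system where not to propagate information.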

The visualisation of the optical flow between two images, computed with the help of two different edge maps output by edge detection models. The Res16x-CED model outputs a much sharper and more detailed edge map than HED, which helps generate finer movement details in the flow.


1. The edge problem formulation


Asking a model for the right location and shape of edges is quite an open question and can be resolved in multiple ways. The expected output is also very sparse: an edge map is evaluated on accurately predicting fewer than 5% of the pixels. An error is therefore paid much more dearly than in an image segmentation task, where every pixel has to be labeled and a single error is quickly absorbed when the loss is averaged over all pixels.
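This imbalance is usually handled directly in the loss. Below is a minimal sketch of the class-balanced cross-entropy introduced by HED, which up-weights the rare edge pixels and down-weights the abundant background ones (the helper name and exact code are mine):

```python
# A minimal sketch of the class-balanced cross-entropy used by HED-style
# models to cope with the edge/non-edge imbalance.
import torch
import torch.nn.functional as F

def class_balanced_bce(logits, target):
    """logits: raw model predictions; target: binary edge map in {0, 1}, float."""
    pos = target.sum()
    neg = target.numel() - pos
    beta = neg / (pos + neg)  # fraction of non-edge pixels, typically > 0.95
    # edge pixels are weighted by beta, non-edge pixels by (1 - beta)
    weight = torch.where(target > 0.5, beta, 1.0 - beta)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weight)
```

With fewer than 5% of the pixels labeled positive, beta sits above 0.95, so each edge pixel ends up weighted roughly 20 times more than a background pixel.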

The input image and ground truth of a sample from the BSDS500 dataset. In this sample, fewer than 5% of the pixels are labeled as edge-positive.


All of this means that the edge detection problem needs a subtle formulation. As in all machine learning applications, posing the problem the right way is often undervalued, yet it matters far more than how you solve it and with which deep architecture. Judging from the slower pace of progress in this field compared to others, edge detection currently seems to suffer from exactly these difficulties.

Posing a problem in machine learning relies on two major steps:

1.1 Dataset approaches


The dataset is a big part of the problem formulation: it defines what the inputs are and what rules govern the creation of the ground truth. Here, two main approaches define quite different tasks: