Segmenting Objects - That should be easy, right?

Fig 1: Examples of the different annotation strategies. Instance Segmentation and Object Detection differentiate between distinct focal pathologies (symbolised by the ducks), while semantic segmentation does not.


Let's start with an example (Figure 1) which displays multiple commonly used annotation schemes. The Ground Truth reflects how humans typically perceive focal pathologies such as tumours, aneurysms or metastases in radiological scans. In this example, the ducks 🦆 symbolise the focal pathologies of interest, and we can differentiate between three distinct regions.

In this guide, detection refers to Instance Segmentation and Object Detection, which both differentiate between multiple ducks. Instance Segmentation refers to pixel-wise segmentations of each duck and allows for various further evaluations, e.g. measuring the volume of each duck. While Object Detection is not suitable for measuring volumes, it is very relevant for diagnostic purposes where clinical decisions are based on the presence of certain pathologies (e.g. cancer diagnostics or counting metastases).

In this example, Semantic Segmentation only differentiates between duck and background, but it does not differentiate between the individual ducks. While we as humans are extremely good at clustering objects simply by their appearance, this can be incredibly difficult for computers, which operate on numerical representations. To illustrate the problem, Figure 2 shows an example where clustering objects solely based on numerical values is much more difficult.

Figure 2: Numerical representations of Instance and Semantic Segmentation. While one of the regions can be identified easily, it is much more difficult to differentiate between the second and third instance (right image, indicated by 2 and 3).


Without proper visual context, even humans cannot reliably group the ones into individual objects. As a consequence, errors in the clustering can lead to inconsistent annotations of objects and influence both training and evaluation of object detection algorithms.
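
A standard automated way to split a semantic segmentation into instances is connected-component labelling, which groups touching foreground pixels into one object. The minimal sketch below (a toy numpy/scipy example, not part of any annotation tool) shows why this is not always enough: as soon as two objects touch, they are merged into a single instance.

```python
import numpy as np
from scipy import ndimage

# Toy semantic segmentation: 1 = foreground ("duck"), 0 = background.
# The two blobs on the right touch each other.
semantic = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
])

# Connected-component labelling assigns a unique id to each connected blob.
instances, num_instances = ndimage.label(semantic)

print(num_instances)  # 2 - the touching blobs on the right are merged into one instance
print(instances)
```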

<aside> 🦆 Conclusion: Detection annotations can differentiate between multiple regions of the same class, while semantic segmentation does not. To determine whether a segmentation is in the correct format, it is always a good idea to ask: Is it possible to count the number of instances from the segmentation?

</aside>
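
Following this rule of thumb, a quick sanity check is to look at the unique values stored in the file. A minimal sketch, assuming the annotation is stored as a NIfTI file and nibabel is available (the file name is taken from the example further below):

```python
import numpy as np
import nibabel as nib  # assumption: the annotation is stored as a NIfTI file

mask = nib.load("case001.nii.gz").get_fdata()

# A single-class semantic segmentation only contains 0 (background) and 1,
# no matter how many objects are present. An instance mask contains one
# unique id per object, so the number of instances can be read off directly.
instance_ids = np.unique(mask)
instance_ids = instance_ids[instance_ids != 0]  # drop the background
print(f"Number of instances: {len(instance_ids)}")
```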

Representing Instance Segmentations

Non-Overlapping Instances

If the regions of interest cannot overlap, it is typically a good idea to assign a unique number (this guide refers to this as an Identifier) to each instance and save them into a single file. To differentiate between the annotation types, this file is referred to as a mask in this guide, while semantic segmentations are simply called segmentations. The mask contains the spatial information of each instance, but it does not include any additional information such as the class of the instance.
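
As a toy example, a mask with three non-overlapping instances could be built like this (a minimal numpy sketch, not the actual nnDetection code):

```python
import numpy as np

# Empty mask: 0 encodes the background.
mask = np.zeros((8, 8), dtype=np.uint8)

# Each instance gets its own identifier (1, 2, 3, ...),
# independent of the class it belongs to.
mask[1:3, 1:3] = 1   # first instance
mask[1:3, 5:7] = 2   # second instance
mask[5:7, 3:5] = 3   # third instance

# The identifiers only carry the spatial information;
# the class of each instance is stored in a separate file (see below).
```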

Overlapping Instances

If instances can overlap, each object needs to be saved into a separate file, or an additional dimension needs to be used to differentiate them.
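
One way to realise the second option is to stack the individual binary masks along an additional dimension, as in the following sketch (illustrative only):

```python
import numpy as np

# Two overlapping instances, each stored as its own binary mask.
instance_1 = np.zeros((8, 8), dtype=np.uint8)
instance_1[1:5, 1:5] = 1

instance_2 = np.zeros((8, 8), dtype=np.uint8)
instance_2[3:7, 3:7] = 1  # overlaps with instance_1 in the centre

# Option A: one file per instance (save instance_1 and instance_2 separately).
# Option B: a single array with an additional instance dimension.
stacked = np.stack([instance_1, instance_2], axis=0)  # shape (2, 8, 8)
```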

Instance Information

Since the masks only contain the spatial information of each instance, we typically need a second file to specify additional properties of each object. A common solution is a CSV file which uses the unique numbers (Identifiers) of the mask and assigns additional information to them. As an alternative, a JSON file can be used to save additional information for each of the masks (this approach is explained in more detail below).
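
For the CSV variant, such a file could be written like this (a hypothetical sketch; the column names are illustrative and not a fixed standard, the identifiers correspond to the values in the mask):

```python
import csv

# Hypothetical mapping from instance identifier (value in the mask) to class id.
instance_classes = {1: 0, 2: 1, 3: 1}

with open("case001.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["instance_id", "class_id"])
    for instance_id, class_id in instance_classes.items():
        writer.writerow([instance_id, class_id])
```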

nnDetection Annotation Format

case001.nii.gz - label (mask) file per case

Figure 3: The perceived instances are depicted on the left side and the mask on the right.


dataset.json - classes for the whole dataset

"labels": {
        "0": "Square",
        "1": "Circle",
    },

case001.json - label information per case

"instances": {
        "1": 0,
        "2": 1,
				"3": 1,
    }

The dataset.json file defines the classes of the dataset. In this example we have two classes, Square and Circle. Since it is easier to work with numerical values (at least inside the code), the Square class is referred to as 0 and the Circle class is referred to as 1. This mapping is defined in the labels field of the dataset.json. Each label (the combination of spatial information and properties of each object) consists of two files: the NIfTI file (mask), which defines the spatial location of each instance, and an accompanying JSON file, which assigns a class to each instance. In this example the first instance is a square, the second instance is a circle and the third instance is another circle.
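
Putting the pieces together, reading one case could look roughly like this (a minimal sketch, assuming nibabel is used to load the NIfTI mask and the file names follow the example above; this is not the internal nnDetection loading code):

```python
import json
import numpy as np
import nibabel as nib  # assumption: the mask is stored as a NIfTI file

# Spatial information: which voxel belongs to which instance.
mask = nib.load("case001.nii.gz").get_fdata()

# Dataset-wide class names and per-case instance classes.
with open("dataset.json") as f:
    class_names = json.load(f)["labels"]          # {"0": "Square", "1": "Circle"}
with open("case001.json") as f:
    instance_classes = json.load(f)["instances"]  # {"1": 0, "2": 1, "3": 1}

for instance_id, class_id in instance_classes.items():
    voxels = int(np.sum(mask == int(instance_id)))
    print(f"Instance {instance_id}: class {class_names[str(class_id)]}, {voxels} voxels")
```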