P² quantile estimator: estimating the median without storing values

Imagine that you are implementing performance telemetry in your application. There is an operation that is executed millions of times, and you want to get its “average” duration. It’s not a good idea to use the arithmetic mean because the obtained value can be easily spoiled by outliers. It’s much better to use the median which is one of the most robust ways to describe the average.

The straightforward median estimation approach requires storing all the values. In our case, it’s a bad idea to keep all the values because it will significantly increase the memory footprint. Such telemetry is harmful because it may become a new bottleneck instead of monitoring the actual performance.

Another way to get the median value is to use a sequential quantile estimator (also known as an online quantile estimator or a streaming quantile estimator). This is an algorithm that allows calculating the median value (or any other quantile value) using a fixed amount of memory. Of course, it provides only an approximation of the real median value, but it’s usually enough for typical telemetry use cases.

In this post, I will show one of the simplest sequential quantile estimators that is called the P² quantile estimator (or the Piecewise-Parabolic quantile estimator).

The P² quantile estimator

This algorithm was initially suggested in [Jain1985]. Below you can find a short overview of this approach, notes about typos in the original paper, numerical simulation, and a C# implementation.

The main idea

Let’s say we have a stream of observations and we want to estimate p-quantile. The suggested approach introduces five markers that correspond to the estimations of

Also, we have to maintain the marker positions . These integer values describe actual marker indexes across obtained observations at the moment.

Next, we have to define the marker desired positions . For the first observations, these real values are defined as follows:

In order to speed up the algorithm, we can precalculate increments of the desired positions which should be added to the current values after each new observation:

Note that in the original paper, the authors use one-based indexing. I decided to adapt it to the zero-based indexing which is more convenient from the implementation point of view.

Initialization

Once we collected the first five elements, we should perform initialization logic:

{q0=x(0),n0=0,n0′=0,dn0′=0,q1=x(1),n1=1,n1′=2p,dn1′=p/2,q2=x(2),n2=2,n2′=4p,dn2′=p,q3=x(3),n3=3,n3′=2+2p,dn3′=(1+p)/2,q4=x(4),n4=4,n4′=4,dn4′=1.

Marker invalidation

Firstly, we should adjust extreme marker heights (if , we should update ; if , we should update ) and find such that (or for ):

Secondly, we should update the marker positions and the marker desired positions:

Finally, we should adjust non-extreme marker heights () and positions () for in the following way: