In this post I describe a new approach to efficiently solving the convex hull problem in two-dimensional space. I'll try to keep it accessible to everybody, avoiding academic jargon and using lots of pictures.

Some time ago, a colleague of mine issued a programming challenge to the team. No prize and no glory, just lots of fun. Since I'm still amazed by the result I obtained, I decided to share it.

The contest consisted of two programming tasks. The first one was to find, as fast as possible, the smallest fence enclosing a herd of sheep, given their positions (the second one was a search problem, not discussed in this post).

This is more or less the input data, visualized.

The input set of the problem is a randomly distributed herd.

The solution is the set of yellow sheep, around which the smallest fence can be built.

Needless to say, I happily accepted the challenge and spent a couple of weeks on it.

A more precise (and hopefully not scary) way of describing the problem is: given a set of points in a two-dimensional space, find the smallest convex polygon that contains all of them. The vertices of this polygon are a subset of the input points.

Luckily, I'm not the first person in history to tackle this problem. In fact, in the literature it is well known as the Convex Hull problem. It has been studied extensively and has applications in many fields, like path finding, GIS, visual pattern matching, video games... and lots of other interesting stuff. There are many algorithms for solving it, so I could reuse other people's work, or at least use it as a source of inspiration.

There is just one little complication. My task would have been very straightforward if only speed didn't matter. It's easy to come up with a working solution, but not so easy to come up with a fast one. By fast, I mean able to process several million points in less than a second.

Unfortunately, my first attempts with the algorithms found on the Internet (for example Graham scan, Quickhull, Monotone chain) performed poorly, because their O(n log n) complexity doesn't scale well to large input sets. They also make heavy use of floating point operations, which are slower than their integer counterparts.
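For reference, here is roughly what one of those conventional algorithms looks like. This is a minimal Python sketch of the Monotone chain (Andrew's) algorithm mentioned above, using only integer arithmetic; it's an illustration for this post, not the contest code:

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB: > 0 for a counter-clockwise
    turn, < 0 for clockwise, 0 if the three points are collinear.
    Integer-only, so it is exact for integer coordinates."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: sort the points, then build the lower
    and upper hulls in one pass each. O(n log n) due to the sort."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(seq):
        hull = []
        for p in seq:
            # Drop the last point while it makes a non-left turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # Concatenate the two halves, dropping the duplicated endpoints.
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Note that the sort dominates the cost: every single input point must be sorted and examined, even the vast majority that can never end up on the hull.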

The intuition: pruning before building

After some failures, I had the feeling there was a better way to solve it. Instead of building the hull by examining all the input points, as conventional algorithms do, what about examining only a small set of potential hull points and ignoring the rest? I realized I didn't need to rewrite an entire convex hull algorithm from scratch; I only needed a way to identify as many non-hull-points* as possible and discard them, just before running a conventional existing algorithm.

Okay, let's clarify the title of this article, which is (intentionally) a bit misleading. What I'm going to describe here is not really a new convex hull algorithm, but a fast pruning algorithm that can be used in conjunction with any convex hull algorithm to dramatically reduce the size of the input data, and consequently the running time.
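To make the pruning idea concrete, the best-known example of this approach is the classic Akl-Toussaint heuristic: take the four extreme points (leftmost, rightmost, bottommost, topmost) and discard every point strictly inside the quadrilateral they form, since such a point can never be a hull vertex. Here is a minimal Python sketch of that heuristic, as an illustration of the principle only (not necessarily the scheme developed in the rest of this post):

```python
def cross(o, a, b):
    # > 0 if o -> a -> b is a counter-clockwise turn, 0 if collinear.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def prune(points):
    """Akl-Toussaint heuristic: keep only the points NOT strictly inside
    the quadrilateral spanned by the four extreme points. The survivors
    are a superset of the hull vertices, so running any convex hull
    algorithm on them yields the hull of the whole set."""
    left   = min(points)                              # smallest x
    right  = max(points)                              # largest x
    bottom = min(points, key=lambda p: (p[1], p[0]))  # smallest y
    top    = max(points, key=lambda p: (p[1], p[0]))  # largest y
    quad = [left, bottom, right, top]                 # counter-clockwise

    def strictly_inside(p):
        # Strictly left of all four edges of the CCW quadrilateral.
        return all(cross(quad[i], quad[(i + 1) % 4], p) > 0
                   for i in range(4))

    return [p for p in points if not strictly_inside(p)]

print(prune([(0, 2), (2, 0), (4, 2), (2, 4), (2, 2)]))
# [(0, 2), (2, 0), (4, 2), (2, 4)] - the interior point (2, 2) is gone
```

Because the inside test uses strict inequality, degenerate quadrilaterals (when some extreme points coincide) simply prune nothing, which is safe: pruning may keep too much, but must never discard a hull vertex.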