A.K.A. efficiently updating time series predictions in a relational database

OK, the title is a bit clickbaity, but to be fair, it was the exact Google search we ran when we faced this problem: how to update time series data in bulk? We store time series data in a Postgres table where each row holds the values for a particular timestamp, so updating the values for a certain time period can mean updating millions of rows in the table. The solution we describe here is for this particular problem.

Normally, time series data describes the past, so there is no need to update it. For example, when collecting IoT sensor data, an append-only data store is enough: the data is a record of history, and we should not (and do not need to) modify history.

However, editing becomes relevant when we are making time series predictions. There should be an option to modify the prediction results because they have not materialised yet. For example, our clients sometimes have deeper domain knowledge than our prediction engine, so we need to offer them an option to edit the prediction results. This creates a unique challenge: updating a single summary number can mean updating millions of rows in the database.

Let's say we have a table that stores the prediction results for cargo volume. We predict the cargo flow for every possible origin-destination pair, broken down by customer group and product type, for the next quarter. Suppose we have 20 origins shipping to 20 destinations, 30 customer groups and 10 products. To cover all unique combinations, we have to predict 20*20*30*10 = 120,000 values for each day within the prediction horizon. The total predicted volume for the next quarter, for example, is calculated by summing up all the predicted values over those 90 days, and that's 90*120,000 = 10,800,000 rows in the database.
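To make the numbers concrete, here is a minimal sketch of what one row of such a prediction table could look like and where the row counts come from. The names (PredictionRow, day, origin, destination, customer_group, product, predicted_volume) are illustrative assumptions, not our actual schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical row layout for the prediction table; column names are
# illustrative assumptions, not the real schema.
@dataclass
class PredictionRow:
    day: date
    origin: str
    destination: str
    customer_group: str
    product: str
    predicted_volume: float

# One row per unique combination per day of the prediction horizon:
origins, destinations, customer_groups, products = 20, 20, 30, 10
rows_per_day = origins * destinations * customer_groups * products  # 120,000
rows_per_quarter = 90 * rows_per_day                                # 10,800,000
```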

Now, in an extreme case, if the client had inside information that a new partnership was about to be announced and wanted to increase the predicted volume by 10% across all routes as a result, it would mean updating the numbers in over ten million rows in the database!

Going into the database and actually updating millions of rows is simply not feasible for a SaaS platform where clients expect all numbers to be updated instantly at the click of a button.

Here's how we dealt with it.

Create edit instructions

We realised that all of the edit instructions from the client shared the same format: increase or decrease the numbers by x% for a given route/product/customer and time range.
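In other words, an edit instruction boils down to a scope (which routes, customers and products it covers), a time range, and a percentage. A minimal sketch of that shape, assuming hypothetical field names rather than the actual columns of our edit_history table:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical shape for one standardised edit instruction; field names are
# assumptions, not the actual edit_history columns.
@dataclass
class EditInstruction:
    start_date: date                      # time range the edit applies to
    end_date: date
    multiplier: float                     # e.g. 1.10 for "increase by 10%"
    origin: Optional[str] = None          # None means "all origins"
    destination: Optional[str] = None     # None means "all destinations"
    customer_group: Optional[str] = None  # None means "all customer groups"
    product: Optional[str] = None         # None means "all products"

# "Increase the predicted volume by 10% across all routes for the next quarter"
# becomes a single stored record instead of ten million row updates:
bump = EditInstruction(start_date=date(2025, 1, 1), end_date=date(2025, 3, 31),
                       multiplier=1.10)
```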

So instead of going into the database and updating those numbers, we standardised the edit instructions and stored them in a new table called edit_history. When we return the prediction results to the client, we aggregate the original prediction results and apply the edit instructions on the fly.
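Building on the hypothetical PredictionRow and EditInstruction sketches above, here is a minimal in-memory illustration of "apply the edit instructions on the fly": the stored predictions are never rewritten, and any matching edit simply scales the values as they are aggregated. In production this logic lives in the query layer rather than in a Python loop; the example values (routes like "SIN" and "HKG") are made up.

```python
from datetime import date

def edit_applies(row, edit) -> bool:
    """An edit covers a row if the row's day falls inside the edit's date range
    and every scope field that is set (not None) matches the row."""
    return (
        edit.start_date <= row.day <= edit.end_date
        and edit.origin in (None, row.origin)
        and edit.destination in (None, row.destination)
        and edit.customer_group in (None, row.customer_group)
        and edit.product in (None, row.product)
    )

def adjusted_total(rows, edits) -> float:
    """Sum predicted volume, applying every matching edit multiplier on the fly."""
    total = 0.0
    for row in rows:
        value = row.predicted_volume
        for edit in edits:
            if edit_applies(row, edit):
                value *= edit.multiplier
        total += value
    return total

# "Increase by 10% across all routes for the next quarter" touches no stored rows:
rows = [
    PredictionRow(date(2025, 1, 5), "SIN", "HKG", "retail", "electronics", 100.0),
    PredictionRow(date(2025, 1, 5), "SIN", "NRT", "retail", "electronics", 50.0),
]
ten_percent_bump = EditInstruction(date(2025, 1, 1), date(2025, 3, 31), 1.10)
print(adjusted_total(rows, [ten_percent_bump]))  # 165.0
```

A percentage edit also distributes over a sum (m*x1 + m*x2 = m*(x1 + x2)), so the multiplier can be applied to already-aggregated values within the edit's scope rather than row by row, which is what keeps the response fast even though the underlying table still holds millions of original rows.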