Why time travel is hard

Data warehouses need to be:

  1. subject oriented
  2. integrated
  3. time variant
  4. non volatile

Achieving subject oriented is all about designing the data model based on the actual business. And making things non volatile is all about just not deleting or updating data and doing insert only flows instead. Integrating data without time aspect is mainly about finding the correct join keys to combine data from different sources but the moment one adds time aspect to it things get kazzillion times more complicated.

Let’s look this example from imaginary b2b SaaS company that sells some digital services and has consumption based pricing. The pricing is calculated from:

  1. what the selected service plan is
  2. which add ons there are
  3. for how long have the customer committed in their billing plan, like monthly vs annual

In addition we want to pull some extra display info from the service_type for easier reporting on human readable terms.

The source schema illustrated below. The purple boxes describe the join logic between the tables.

Screenshot 2023-09-27 at 9.51.45.png

Getting the current state of things is simple joining, but when we need to be able to report historically what was the price and as such revenue at any given point in time in history it’s whole another story.

Recording history permanently