Defining time series and linking dependent time series

We encountered numerous problems defining time series while designing our demand forecasting platform. This is Part 1 of a two-part series. It covers how we ended up defining time series and linking dependent time series. Part 2 goes into the technical details and the techniques we used to optimise the calculations.

Problem statement

We will try to solve two problems.

How to aggregate time series

Aggregation arises naturally because different business units care about the data at different granularities. Sales teams might only need monthly data, while operational teams need daily data.

Hence, for every time series we store, we also define an aggregation method, so that the client controls the granularity they see. The aggregation method can vary with the nature of the time series.
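
As a rough illustration of pairing each series with its own aggregation method, here is a minimal Python sketch. The names (AGGREGATORS, to_monthly), the numbers, and the choice of summing volume while averaging price are all assumptions made for this example, not a description of the platform's actual code.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical registry: each stored series names its own aggregation method.
AGGREGATORS = {"sum": sum, "mean": mean}


def to_monthly(daily, method):
    """Roll a {date: value} daily series up to {(year, month): value}."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    aggregate = AGGREGATORS[method]
    return {month: aggregate(values) for month, values in sorted(buckets.items())}


# Made-up daily numbers, for illustration only.
daily_volume = {date(2021, 1, 30): 100, date(2021, 1, 31): 120, date(2021, 2, 1): 90}
daily_price = {date(2021, 1, 30): 10.0, date(2021, 1, 31): 12.0, date(2021, 2, 1): 11.0}

# Assumption: volume adds up over a period, while price is averaged.
monthly_volume = to_monthly(daily_volume, "sum")
monthly_price = to_monthly(daily_price, "mean")
```

The same daily data yields a different monthly view depending on the method attached to the series, which is why the method has to be stored alongside the series rather than chosen globally.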

How to link dependent time series

Linking would not be necessary if we did not allow clients to adjust the forecasted demand: the different time series could simply be stored and retrieved independently. But we know that our clients' domain knowledge is a gold mine and should be taken seriously. Hence, we give our clients the option of adjusting the forecasted numbers. Linking dependent time series allows us to capture the domino effect of these adjustments.

For example, a shipping line cares about both the forecasted volume of cargo and the forecasted revenue. If the client increases the forecasted volume by a certain percentage because of a new partnership, the revenue should ideally change accordingly, but this is impossible if the time series are stored independently.

We solved the above problems by defining what we call intrinsic metrics and derived metrics. Intrinsic metrics solve the aggregation problem and derived metrics solve the linking problem.

On the Portcast platform, not all forecasted metrics can be adjusted directly, nor do we think they should be. Intrinsic metrics are metrics that can be adjusted directly and are stored in the database. Derived metrics, on the other hand, are not stored in the database at all. They are calculated on the fly from the intrinsic metrics, so that any change to an intrinsic metric is reflected in them instantly.

For example, cargo volume is one metric and cargo price is another, and multiplying the two gives revenue. Instead of storing volume and revenue, we store volume and price in the database and calculate revenue on the fly. In this case, volume and price are what we call intrinsic metrics, and revenue is a derived metric. If the client increases the forecasted volume, the forecasted revenue automatically increases as well, because it is calculated on the fly as the product of volume and price.
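
The volume, price and revenue example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a plain dict stands in for the database, the numbers are made up, and the helper names (revenue, adjust) are hypothetical.

```python
# Intrinsic metrics: the only values that are stored and directly adjustable.
# A plain dict stands in for the database here.
intrinsic = {
    "volume": [100.0, 120.0, 90.0],
    "price": [10.0, 12.0, 11.0],
}


def revenue():
    """Derived metric: recomputed from the intrinsic metrics on every read."""
    return [v * p for v, p in zip(intrinsic["volume"], intrinsic["price"])]


def adjust(metric, factor):
    """Apply a client adjustment by scaling a stored intrinsic metric."""
    intrinsic[metric] = [x * factor for x in intrinsic[metric]]


before = revenue()
adjust("volume", 1.5)  # the client raises the forecasted volume by 50%
after = revenue()      # revenue reflects the change without being stored
```

Because revenue is never written anywhere, there is no second copy that could drift out of sync with the adjusted volume.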

How do you decide which metrics are intrinsic and which are derived? In the same example, any two of the three metrics can be used to derive the third. The decision comes down to the business and the kind of levers it has. In this particular case, market demand drives the volume, and price is the lever the business uses to manage the volume, so it makes sense to make volume and price the intrinsic metrics and revenue a derived one. The forecasted revenue will change only if the forecasted volume or the price changes.

Let's see how we can systematically define these metrics.

Defining metrics

Let's say we are storing the intrinsic metrics in the database. How do we convert a daily metric to a weekly metric, or calculate the overall metric for any fixed time period? We need to define the aggregation methods.