Writing a "fill-forward" view

This SQL will take an intermittent time series data set (one with "holes" in it) and carry values forward to fill the holes (e.g. if July 6th and July 7th are missing but July 5th and July 8th are not, the value from July 5th will be used to fill in the 6th and 7th). Note this is PostgreSQL focused - not all PostgreSQL functions and syntax are SQL standard and the process could vary for other database environments.

STEP 1 - Create and "All-Dates" view using GENERATE SERIES

You'll need a complete time series data set to start, but this can be created dynamically using the generate_series function in PostgreSQL. This function takes 3 inputs:

start date, end date and interval.

The start date can be most easily gotten by grabbing the MIN on the date in your data set you're trying to fill (or you could hardwire it to a date known to be the earlies of interest for your project).

The end date can be set to now(), or now() + interval '1 day', or some other dynamic date referencing today.

The time series interval in this example is 1 day but you could build other time series such as monthly, 15 minute, etc.

Quite likely the dynamic time series view is something you'll use for other things too so it can be created as it's own view and referenced for the fill-forward; for the purposes of this example however we will embed the "all dates" view in the overall fill-forward view as:

<aside> 💡 SELECT generate_series( (SELECT MIN(ddate) FROM data.mydata) AS fill_date, now() + '1 day'::interval, '1 day'::interval)::date) AS fill_date

</aside>

Note we give the generate_series function result an alias so it can be treated as a field.

Create the "Explode" sub-query

With the All Dates view we can now join our data with holes to it using an inequality which creates a temporary "exploded" version of our data where every value is carried forward.

<aside> 💡 SELECT ddate, mydatavalues, fill_date FROM data.mydata d JOIN (SELECT generate_series((SELECT MIN(ddate) FROM data.mydata), now() + '1 day'::interval, '1 day'::interval)::date AS fill_date) alldates ON ddate ≤= fill_date

</aside>

Collapse the Explode sub-query into a "associated dates" view

Now we use a GROUP BY on our exploded data to grab the max fill date associated with each real data point. This results in a query that associates each date in our complete time series with the last available date in your actual data set:

<aside> 💡 SELECT fill_date, max(ddate) AS ddate FROM (explode query from above) GROUP BY fill_date

</aside>

If you run this and ORDER BY fill_date you'll see that there is now a complete time series in the fill_date column, and each date has a "real date" associated with it which is the last available date for which you have actual data.

Pulling it all together

Now we will join together our associated dates with our data table again, but we'll use as our final date field the "fill_date" field, and use the ddate field just to join in the rest of the fields from our data set (note that all other fields will be carried forward for the fill periods).