On October 24, 2022, at 14:33 UTC, we began a planned backend change to our PostgreSQL database schema, changing the default value of some datetime columns from CURRENT_TIMESTAMP to CURRENT_TIMESTAMP(6). This is a no-op change that our CDC system needs in order to track schemas properly. We inadvertently included some columns that had no default value in this change. As a result, for 1 hour and 17 minutes, between 14:33 UTC and 15:50 UTC, webhooks were not delivered and were erroneously reported as archived for being past retention.

We resumed delivery at 15:50 UTC, and the backlog was cleared shortly after.

No webhooks were lost during that outage.
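
To illustrate the failure mode, here is a minimal sketch of a guard that only plans the default change for columns whose current default is already CURRENT_TIMESTAMP, and sets aside everything else, including columns with no default. It is not our actual migration tooling: the Column type, the plan_default_change function, and the table and column names are hypothetical, and it assumes the current default is available as a plain string (for example, read from information_schema.columns).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

OLD_DEFAULT = "CURRENT_TIMESTAMP"
NEW_DEFAULT = "CURRENT_TIMESTAMP(6)"


@dataclass(frozen=True)
class Column:
    """Hypothetical description of one column and its current default."""
    table: str
    name: str
    default: Optional[str]  # None means the column has no default value


def plan_default_change(columns: List[Column]) -> Tuple[List[str], List[Column]]:
    """Return (ALTER statements to run, columns that were skipped).

    Only columns whose default is exactly OLD_DEFAULT are altered.
    Columns with no default, or with any other default, are skipped
    so the change stays a no-op for them.
    """
    statements: List[str] = []
    skipped: List[Column] = []
    for col in columns:
        current = (col.default or "").strip().upper()
        if current == OLD_DEFAULT:
            statements.append(
                f'ALTER TABLE "{col.table}" ALTER COLUMN "{col.name}" '
                f"SET DEFAULT {NEW_DEFAULT};"
            )
        else:
            skipped.append(col)
    return statements, skipped


# Hypothetical columns, for illustration only.
EXAMPLE_COLUMNS = [
    Column("webhooks", "created_at", "CURRENT_TIMESTAMP"),
    Column("webhooks", "archived_at", None),  # no default: must be left alone
]

statements, skipped = plan_default_change(EXAMPLE_COLUMNS)
for stmt in statements:
    print(stmt)
for col in skipped:
    print(f"skipped {col.table}.{col.name} (default: {col.default!r})")
```

With the example input, the plan contains a single statement for created_at, and archived_at lands in the skipped list. Reviewing the skipped list before applying a plan like this is one way to notice columns that were included by mistake.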

Timeline

Lessons learned

What went well

What went wrong

Corrective actions