Give Jobsets an ID; add jobset_id to builds and jobs. by grahamc · Pull Request #710 · NixOS/hydra

RESULT: The latest-finished URL went from taking about 10 minutes to about 500ms. PROBLEM: the change made the existing indexes on Builds that used (project, jobset) useless, and some queries took so long that they took down the frontend. Adding another index (migration #65) fixed it.


Migrations (February 5, 2020)

Plan:

  1. Add an id to the Jobsets table: serial, non-null, unique.
  2. Add a jobset_id to the Jobs table: nullable, foreign key to Jobsets.
  3. Add a jobset_id to the Builds table: nullable, foreign key to Jobsets. (3f074388)
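
    A sketch of what these three schema changes might look like as PostgreSQL
    DDL (illustrative; the actual Hydra migrations may differ in details):

      -- Step 1: give Jobsets a serial id (SERIAL implies NOT NULL);
      -- UNIQUE lets the new column be the target of foreign keys.
      ALTER TABLE jobsets
        ADD COLUMN id SERIAL UNIQUE;

      -- Steps 2 and 3: nullable for now, so existing rows stay valid
      -- until the backfills in steps 4 and 5 complete.
      ALTER TABLE jobs
        ADD COLUMN jobset_id INTEGER NULL REFERENCES jobsets(id);

      ALTER TABLE builds
        ADD COLUMN jobset_id INTEGER NULL REFERENCES jobsets(id);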
  4. Backfill Jobs with jobset_id values.
    1. The naive way to backfill (a single one-shot UPDATE, shown below) has several problems:
      1. It is very slow.
      2. It holds row locks for a huge amount of time.
      3. It rewrites the entire table on disk in one shot, causing a full 2x table bloat.
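
      For contrast, the naive backfill would be a single unbatched statement
      along these lines (illustrative), updating every row in one transaction:

          UPDATE jobs
          SET jobset_id = (
            SELECT jobsets.id
            FROM jobsets
            WHERE jobsets.name = jobs.jobset
              AND jobsets.project = jobs.project
          );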
    2. Solution:
      • Create a purpose-built tool to incrementally backfill the table:
        1. Running this in a loop:

          UPDATE jobs
          SET jobset_id = (
            SELECT jobsets.id
            FROM jobsets
            WHERE jobsets.name = jobs.jobset
              AND jobsets.project = jobs.project
          )
          WHERE (jobs.project, jobs.jobset, jobs.name) IN (
            SELECT jobsprime.project, jobsprime.jobset, jobsprime.name
            FROM jobs jobsprime
            WHERE jobsprime.jobset_id IS NULL
            FOR UPDATE SKIP LOCKED
            LIMIT 10000
          );
          
        2. Every N iterations, run VACUUM (shown after this list).

      • Hydra can stay fully online during the entire migration.
      • Because the subselect picks out a specific batch of rows, the write locks only affect those rows at a time.
      • Periodic VACUUMs reclaim the dead row versions each batch leaves behind, so the 2x table bloat never happens all at once.
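
      The VACUUM mentioned in the loop's step 2 is a plain statement issued
      between batches; PostgreSQL does not allow VACUUM inside a transaction
      block, which is presumably part of why this runs as a purpose-built
      tool rather than a single SQL script:

          VACUUM jobs;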
  5. Backfill Builds with jobset_id values.
    1. The naive one-shot backfill has the same problems:
      1. It is very slow.
      2. It holds row locks for a huge amount of time.
      3. It rewrites the entire table on disk in one shot, causing a full 2x table bloat.
    2. Solution:
      • Create a purpose-built tool to incrementally backfill the table:
        1. Running this in a loop:

          UPDATE builds
          SET jobset_id = (
            SELECT jobsets.id
            FROM jobsets
            WHERE jobsets.name = builds.jobset
              AND jobsets.project = builds.project
          )
          WHERE builds.id IN (
            SELECT buildprime.id
            FROM builds buildprime
            WHERE buildprime.jobset_id IS NULL
            ORDER BY buildprime.id
            FOR UPDATE SKIP LOCKED
            LIMIT 10000
          );
          
        2. Every N iterations, run VACUUM

      • Hydra can stay fully online during the entire migration.
      • Because the subselect picks out a specific batch of ids, the write locks only affect those rows at a time.
      • Periodic VACUUMs reclaim the dead row versions each batch leaves behind, so the 2x table bloat never happens all at once.
  6. Perform an explicit release, put this into production, and run the backfill tool until all rows are updated.
    1. Monitor whether new Builds or Jobs rows are added with null jobset_id fields; any such row would indicate a bug that needs to be fixed.
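
      One way to watch both backfill progress and this bug condition is a
      query like the following (illustrative; any equivalent monitoring
      works), where both counts should fall to zero and stay there:

          SELECT
            (SELECT COUNT(*) FROM jobs   WHERE jobset_id IS NULL) AS jobs_missing_id,
            (SELECT COUNT(*) FROM builds WHERE jobset_id IS NULL) AS builds_missing_id;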
  7. Modify the Builds table, doing two things in one transaction:
    1. Assert that there are no rows with a null jobset_id; any row with a null jobset_id likely comes from a bug we should fix.
    2. Alter the Builds table to make jobset_id not-null. This should hopefully not be very slow; I think PostgreSQL will only have to validate that no rows have a null jobset_id.
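
    A sketch of step 7 in SQL (the DO-block guard is illustrative; the real
    migration may express the assertion differently):

      BEGIN;

      -- 1. Assert that no rows remain with a null jobset_id.
      DO $$
      BEGIN
        IF EXISTS (SELECT 1 FROM builds WHERE jobset_id IS NULL) THEN
          RAISE EXCEPTION 'builds rows with null jobset_id remain; aborting';
        END IF;
      END
      $$;

      -- 2. Make the column non-null. PostgreSQL scans the table once to
      --    validate that no existing row violates the new constraint.
      ALTER TABLE builds ALTER COLUMN jobset_id SET NOT NULL;

      COMMIT;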