Give Jobsets an ID; add jobset_id to builds and jobs. by grahamc · Pull Request #710 · NixOS/hydra

RESULT: The latest-finished URL went from taking about 10 minutes to about 500ms. PROBLEM: the change made the existing indexes on Builds that used (project, jobset) useless, and some queries took so long that they took down the frontend. Adding another index (migration #65) fixed it.


Migrations (February 5, 2020)

Plan:

  1. Add an id to the Jobsets table: serial, non-null, unique.
  2. Add a jobset_id to the Jobs table: nullable, foreign key to Jobsets.
  3. Add a jobset_id to the Builds table: nullable, foreign key to Jobsets. (3f074388)
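
    A sketch of what these three schema changes might look like as PostgreSQL
    DDL (illustrative; the actual Hydra migrations may differ in details):

      -- Step 1: give Jobsets a serial id (SERIAL implies NOT NULL);
      -- UNIQUE lets the new column be the target of foreign keys.
      ALTER TABLE jobsets
        ADD COLUMN id SERIAL UNIQUE;

      -- Steps 2 and 3: nullable for now, so existing rows stay valid
      -- until the backfills in steps 4 and 5 complete.
      ALTER TABLE jobs
        ADD COLUMN jobset_id INTEGER NULL REFERENCES jobsets(id);

      ALTER TABLE builds
        ADD COLUMN jobset_id INTEGER NULL REFERENCES jobsets(id);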
  4. Backfill Jobs with jobset_id values.
    1. The naive way to backfill (a single one-shot UPDATE, shown below) has several problems:
      1. It is very slow.
      2. It holds row locks for a huge amount of time.
      3. It rewrites the entire table on disk in one shot, causing a full 2x table bloat.
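
      For contrast, the naive backfill would be a single unbatched statement
      along these lines (illustrative), updating every row in one transaction:

          UPDATE jobs
          SET jobset_id = (
            SELECT jobsets.id
            FROM jobsets
            WHERE jobsets.name = jobs.jobset
              AND jobsets.project = jobs.project
          );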
    2. Solution:
      • Create a purpose-built tool to incrementally backfill the table:
        1. Running this in a loop:

          UPDATE jobs
          SET jobset_id = (
            SELECT jobsets.id
            FROM jobsets
            WHERE jobsets.name = jobs.jobset
              AND jobsets.project = jobs.project
          )
          WHERE (jobs.project, jobs.jobset, jobs.name) IN (
            SELECT jobsprime.project, jobsprime.jobset, jobsprime.name
            FROM jobs jobsprime
            WHERE jobsprime.jobset_id IS NULL
            FOR UPDATE SKIP LOCKED
            LIMIT 10000
          );
          
        2. Every N iterations, run VACUUM (shown after this list).

      • Hydra can stay fully online during the entire migration.
      • Because the subselect picks out a specific batch of rows, the write locks only affect those rows at a time.
      • Periodic VACUUMs reclaim the dead row versions each batch leaves behind, so the 2x table bloat never happens all at once.
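
      The VACUUM mentioned in the loop's step 2 is a plain statement issued
      between batches; PostgreSQL does not allow VACUUM inside a transaction
      block, which is presumably part of why this runs as a purpose-built
      tool rather than a single SQL script:

          VACUUM jobs;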
  5. Backfill Builds with jobset_id values.
    1. The naive one-shot backfill has the same problems:
      1. It is very slow.
      2. It holds row locks for a huge amount of time.
      3. It rewrites the entire table on disk in one shot, causing a full 2x table bloat.
    2. Solution:
      • Create a purpose-built tool to incrementally backfill the table:
        1. Running this in a loop:

          UPDATE builds
          SET jobset_id = (
            SELECT jobsets.id
            FROM jobsets
            WHERE jobsets.name = builds.jobset
              AND jobsets.project = builds.project
          )
          WHERE builds.id IN (
            SELECT buildprime.id
            FROM builds buildprime
            WHERE buildprime.jobset_id IS NULL
            ORDER BY buildprime.id
            FOR UPDATE SKIP LOCKED
            LIMIT 10000
          );
          
        2. Every N iterations, run VACUUM

      • Hydra can stay fully online during the entire migration.
      • Because the subselect picks out a specific batch of ids, the write locks only affect those rows at a time.
      • Periodic VACUUMs reclaim the dead row versions each batch leaves behind, so the 2x table bloat never happens all at once.
  6. Perform an explicit release, put this into production, and run the backfill tool until all rows are updated.
    1. Monitor whether new Builds or Jobs rows are added with null jobset_id fields; any such row would indicate a bug that needs to be fixed.
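
      One way to watch both backfill progress and this bug condition is a
      query like the following (illustrative; any equivalent monitoring
      works), where both counts should fall to zero and stay there:

          SELECT
            (SELECT COUNT(*) FROM jobs   WHERE jobset_id IS NULL) AS jobs_missing_id,
            (SELECT COUNT(*) FROM builds WHERE jobset_id IS NULL) AS builds_missing_id;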
  7. Modify the Builds table, doing two things in one transaction:
    1. Assert that there are no rows with a null jobset_id; any row with a null jobset_id likely comes from a bug we should fix.
    2. Alter the Builds table to make jobset_id not-null. This should hopefully not be very slow; I think PostgreSQL will only have to validate that no rows have a null jobset_id.
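
    A sketch of step 7 in SQL (the DO-block guard is illustrative; the real
    migration may express the assertion differently):

      BEGIN;

      -- 1. Assert that no rows remain with a null jobset_id.
      DO $$
      BEGIN
        IF EXISTS (SELECT 1 FROM builds WHERE jobset_id IS NULL) THEN
          RAISE EXCEPTION 'builds rows with null jobset_id remain; aborting';
        END IF;
      END
      $$;

      -- 2. Make the column non-null. PostgreSQL scans the table once to
      --    validate that no existing row violates the new constraint.
      ALTER TABLE builds ALTER COLUMN jobset_id SET NOT NULL;

      COMMIT;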