Question Generation Prompt

Give me ONE PostgreSQL index design drill, formatted like a LeetCode SQL practice problem.

Difficulty: Easy | Medium | Hard (I will specify)

Requirements:
- This is NOT a tutorial.
- This is a practice problem I will solve and record in my notes.
- Use realistic schemas (users, orders, events, logs, payments, etc.).
- Assume realistic data volume (100k–10M rows).
- Clearly describe data distribution and skew.
- Use PostgreSQL.
- I MUST be able to actually test this locally.

Output format (VERY IMPORTANT):

1. Problem Title
   - Short, descriptive title

2. Difficulty
   - (Easy) / (Medium) / (Hard)

3. Schema
   - Table definitions (simplified if needed)

4. Data Characteristics
   - Row counts
   - Skew / cardinality
   - Any important assumptions

5. Data Setup (REQUIRED)
   - SQL to INSERT representative test data
   - Use generate_series where appropriate
   - Explicitly model the stated skew and distributions
   - Data volume can be scaled down but proportions must match

6. Query
   - ONE query only (the “problem query”)

7. Task
   - Decide whether to add an index or not
   - If yes, specify the exact index
   - If no, explain why
   - Predict how EXPLAIN will change (scan type / join type / sort)

Rules:
- Do NOT give the answer immediately.
- Do NOT explain theory.
- Do NOT mention join algorithms unless they are visible in EXPLAIN.
- Treat this like a LeetCode-style decision problem.

After I respond:
- Critique my decision
- Show expected EXPLAIN change (high level)
- Explain trade-offs (read vs write, index size, planner behavior)

⭐️ (Medium) Recent Login Events for Active Users

Last review: Feb 10, 2026

Question and Query

CREATE TABLE users (
    id            BIGSERIAL PRIMARY KEY,
    country_code  TEXT NOT NULL,
    status        TEXT NOT NULL,   -- 'active', 'suspended', 'deleted'
    created_at    TIMESTAMP NOT NULL
);

CREATE TABLE user_events (
    id         BIGSERIAL PRIMARY KEY,
    user_id    BIGINT NOT NULL REFERENCES users(id),
    event_type TEXT NOT NULL,      -- 'login', 'logout', 'purchase', 'view'
    created_at TIMESTAMP NOT NULL
);

INSERT INTO users (country_code, status, created_at)
SELECT
    CASE
        WHEN r < 0.85 THEN 'TW'
        WHEN r < 0.95 THEN 'US'
        ELSE 'JP'
    END,
    CASE
        WHEN r2 < 0.80 THEN 'active'
        WHEN r2 < 0.95 THEN 'suspended'
        ELSE 'deleted'
    END,
    now() - (random() * interval '3 years')
FROM (
    SELECT
        random() AS r,
        random() AS r2
    FROM generate_series(1, 200000)
) s;

INSERT INTO user_events (user_id, event_type, created_at)
SELECT
    (random() * 199999 + 1)::bigint,
    CASE
        WHEN r < 0.60 THEN 'view'
        WHEN r < 0.85 THEN 'login'
        WHEN r < 0.95 THEN 'logout'
        ELSE 'purchase'
    END,
    CASE
        WHEN r2 < 0.70
            THEN now() - (random() * interval '180 days')
        ELSE
            now() - (random() * interval '7 days')
    END
FROM (
    SELECT
        random() AS r,
        random() AS r2
    FROM generate_series(1, 5000000)
) s;
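
After the bulk INSERTs it helps to refresh planner statistics before running EXPLAIN, otherwise the row estimates can be far off from the generated distributions. A minimal step (assuming the two tables above):

```sql
-- Refresh planner statistics (and visibility map) after the bulk load
-- so EXPLAIN estimates reflect the generated skew.
VACUUM ANALYZE users;
VACUUM ANALYZE user_events;
```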
EXPLAIN ANALYZE
SELECT
    e.user_id,
    max(e.created_at) AS last_login_at
FROM user_events e
JOIN users u ON u.id = e.user_id
WHERE
    u.country_code = 'TW'
    AND u.status = 'active'
    AND e.event_type = 'login'
    AND e.created_at >= now() - interval '7 days'
GROUP BY e.user_id;

GroupAggregate  (cost=95987.54..130946.10 rows=168051 width=16) (actual time=259.662..324.621 rows=118270.00 loops=1)
  Group Key: e.user_id
  Buffers: shared hit=14390 read=24197, temp read=888 written=891
  ->  Gather Merge  (cost=95987.54..127895.75 rows=273969 width=16) (actual time=259.632..291.731 rows=277996.00 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=14390 read=24197, temp read=888 written=891
        ->  Sort  (cost=94987.52..95272.90 rows=114154 width=16) (actual time=239.366..245.138 rows=92665.33 loops=3)
              Sort Key: e.user_id
              Sort Method: external merge  Disk: 2376kB
              Buffers: shared hit=14390 read=24197, temp read=888 written=891
              Worker 0:  Sort Method: external merge  Disk: 2352kB
              Worker 1:  Sort Method: external merge  Disk: 2376kB
              ->  Parallel Hash Join  (cost=4239.59..83445.23 rows=114154 width=16) (actual time=30.902..220.676 rows=92665.33 loops=3)
                    Hash Cond: (e.user_id = u.id)
                    Buffers: shared hit=14374 read=24197
                    ->  Parallel Seq Scan on user_events e  (cost=0.00..78766.67 rows=167222 width=16) (actual time=0.098..153.172 rows=136422.00 loops=3)
                          Filter: ((event_type = 'login'::text) AND (created_at >= (now() - '7 days'::interval)))
                          Rows Removed by Filter: 1530245
                          Buffers: shared hit=14374 read=22726
                    ->  Parallel Hash  (cost=3235.71..3235.71 rows=80311 width=8) (actual time=30.121..30.121 rows=45314.00 loops=3)
                          Buckets: 262144  Batches: 1  Memory Usage: 7424kB
                          Buffers: shared read=1471
                          ->  Parallel Seq Scan on users u  (cost=0.00..3235.71 rows=80311 width=8) (actual time=9.909..19.398 rows=45314.00 loops=3)
                                Filter: ((country_code = 'TW'::text) AND (status = 'active'::text))
                                Rows Removed by Filter: 21353
                                Buffers: shared read=1471
Planning:
  Buffers: shared hit=10 read=6 dirtied=1
Planning Time: 0.450 ms
JIT:
  Functions: 45
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.774 ms (Deform 0.979 ms), Inlining 0.000 ms, Optimization 1.992 ms, Emission 27.449 ms, Total 32.216 ms
Execution Time: 329.972 ms
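
Before deciding on an index, it is worth measuring how selective each predicate actually is; the critiques below lean on these numbers. A quick check against the setup data above (a sketch, run on the local test database):

```sql
-- Fraction of users passing the users-side filter (~68% with this setup)
SELECT count(*) FILTER (WHERE country_code = 'TW' AND status = 'active')::numeric
       / count(*) AS users_selectivity
FROM users;

-- Fraction of events passing the events-side filter (~8%)
SELECT count(*) FILTER (WHERE event_type = 'login'
                          AND created_at >= now() - interval '7 days')::numeric
       / count(*) AS events_selectivity
FROM user_events;
```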

Ineffective Solution 1

CREATE INDEX CONCURRENTLY idx_user_events_event_type_created_at
  ON user_events(event_type, created_at);
  1. ✅ No index needed on users: the filter u.country_code = 'TW' AND u.status = 'active' matches ~68% of rows (135,942 of 200,000), so it is not selective enough for an index to beat a Seq Scan.
  2. ⚠️ (event_type, created_at) is logically aligned with the predicate, and event_type = 'login' plus the last 7 days covers only ~8% of user_events (409,289 of 5M rows). But those rows are scattered across ~37k heap pages (Heap Blocks: exact=37100), so the Bitmap Heap Scan (~354 ms) still loses to the parallel Seq Scan (~330 ms).
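
To put numbers behind the critique, the size of the new index can be compared to the table it serves (a sketch, assuming the index from this attempt exists):

```sql
-- Rough size check: how large is the two-column index relative to the table?
SELECT pg_size_pretty(pg_relation_size('user_events'))                           AS table_size,
       pg_size_pretty(pg_relation_size('idx_user_events_event_type_created_at')) AS index_size;
```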

Result

EXPLAIN ANALYZE
SELECT
    e.user_id,
    max(e.created_at) AS last_login_at
FROM user_events e
JOIN users u ON u.id = e.user_id
WHERE
    u.country_code = 'TW'
    AND u.status = 'active'
    AND e.event_type = 'login'
    AND e.created_at >= now() - interval '7 days'
GROUP BY e.user_id;

HashAggregate  (cost=80295.33..84651.33 rows=168051 width=16) (actual time=335.887..350.996 rows=118272.00 loops=1)
  Group Key: e.user_id
  Planned Partitions: 4  Batches: 5  Memory Usage: 8249kB  Disk Usage: 1544kB
  Buffers: shared read=40142 written=1, temp read=101 written=266
  ->  Hash Join  (cost=16563.76..62744.06 rows=273971 width=16) (actual time=70.384..260.670 rows=278014.00 loops=1)
        Hash Cond: (e.user_id = u.id)
        Buffers: shared read=40142 written=1
        ->  Bitmap Heap Scan on user_events e  (cost=10386.15..55512.91 rows=401338 width=16) (actual time=28.551..97.246 rows=409289.00 loops=1)
              Recheck Cond: ((event_type = 'login'::text) AND (created_at >= (now() - '7 days'::interval)))
              Heap Blocks: exact=37100
              Buffers: shared read=38671 written=1
              ->  Bitmap Index Scan on idx_user_events_event_type_created_at  (cost=0.00..10285.82 rows=401338 width=0) (actual time=24.273..24.273 rows=409289.00 loops=1)
                    Index Cond: ((event_type = 'login'::text) AND (created_at >= (now() - '7 days'::interval)))
                    Index Searches: 1
                    Buffers: shared read=1571
        ->  Hash  (cost=4471.00..4471.00 rows=136529 width=8) (actual time=41.797..41.798 rows=135942.00 loops=1)
              Buckets: 262144  Batches: 1  Memory Usage: 7359kB
              Buffers: shared read=1471
              ->  Seq Scan on users u  (cost=0.00..4471.00 rows=136529 width=8) (actual time=0.158..23.317 rows=135942.00 loops=1)
                    Filter: ((country_code = 'TW'::text) AND (status = 'active'::text))
                    Rows Removed by Filter: 64058
                    Buffers: shared read=1471
Planning:
  Buffers: shared hit=2 read=6
Planning Time: 0.381 ms
Execution Time: 354.562 ms

Ineffective Solution 2

CREATE INDEX CONCURRENTLY idx_user_events_login_recent_by_user
  ON user_events (created_at DESC)
  WHERE event_type = 'login';
  1. ✅ DESC on created_at, because max(e.created_at) only cares about the latest values.
  2. ✅ 'login' accounts for only ~25% of events, so the partial clause WHERE event_type = 'login' shrinks the index. Keeping event_type as a key column, as in (event_type, created_at DESC) WHERE event_type = 'login', would be redundant since the partial predicate already fixes event_type, so it is dropped from the key.
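
To confirm the size benefit of the partial index, the two attempts can be compared directly (a sketch, assuming both indexes from Solutions 1 and 2 exist):

```sql
-- Compare the full two-column index against the partial single-column one.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_user_events_event_type_created_at',
                  'idx_user_events_login_recent_by_user');
```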