Question Generation Prompt
Give me ONE PostgreSQL index design drill, formatted like a LeetCode SQL practice problem.
Difficulty: Easy | Medium | Hard (I will specify)
Requirements:
- This is NOT a tutorial.
- This is a practice problem I will solve and record in my notes.
- Use realistic schemas (users, orders, events, logs, payments, etc.).
- Assume realistic data volume (100k–10M rows).
- Clearly describe data distribution and skew.
- Use PostgreSQL.
- I MUST be able to actually test this locally.
Output format (VERY IMPORTANT):
1. Problem Title
- Short, descriptive title
2. Difficulty
- (Easy) / (Medium) / (Hard)
3. Schema
- Table definitions (simplified if needed)
4. Data Characteristics
- Row counts
- Skew / cardinality
- Any important assumptions
5. Data Setup (REQUIRED)
- SQL to INSERT representative test data
- Use generate_series where appropriate
- Explicitly model the stated skew and distributions
- Data volume can be scaled down but proportions must match
6. Query
- ONE query only (the “problem query”)
7. Task
- Decide whether to add an index or not
- If yes, specify the exact index
- If no, explain why
- Predict how EXPLAIN will change (scan type / join type / sort)
Rules:
- Do NOT give the answer immediately.
- Do NOT explain theory.
- Do NOT mention join algorithms unless they are visible in EXPLAIN.
- Treat this like a LeetCode-style decision problem.
After I respond:
- Critique my decision
- Show expected EXPLAIN change (high level)
- Explain trade-offs (read vs write, index size, planner behavior)
Last review: Feb 10, 2026
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
country_code TEXT NOT NULL,
status TEXT NOT NULL, -- 'active', 'suspended', 'deleted'
created_at TIMESTAMP NOT NULL
);
CREATE TABLE user_events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
event_type TEXT NOT NULL, -- 'login', 'logout', 'purchase', 'view'
created_at TIMESTAMP NOT NULL
);
INSERT INTO users (country_code, status, created_at)
SELECT
CASE
WHEN r < 0.85 THEN 'TW'
WHEN r < 0.95 THEN 'US'
ELSE 'JP'
END,
CASE
WHEN r2 < 0.80 THEN 'active'
WHEN r2 < 0.95 THEN 'suspended'
ELSE 'deleted'
END,
now() - (random() * interval '3 years')
FROM (
SELECT
random() AS r,
random() AS r2
FROM generate_series(1, 200000)
) s;
INSERT INTO user_events (user_id, event_type, created_at)
SELECT
(random() * 199999 + 1)::bigint,
CASE
WHEN r < 0.60 THEN 'view'
WHEN r < 0.85 THEN 'login'
WHEN r < 0.95 THEN 'logout'
ELSE 'purchase'
END,
CASE
WHEN r2 < 0.70
THEN now() - (random() * interval '180 days')
ELSE
now() - (random() * interval '7 days')
END
FROM (
SELECT
random() AS r,
random() AS r2
FROM generate_series(1, 5000000)
) s;
EXPLAIN ANALYZE
SELECT
e.user_id,
max(e.created_at) AS last_login_at
FROM user_events e
JOIN users u ON u.id = e.user_id
WHERE
u.country_code = 'TW'
AND u.status = 'active'
AND e.event_type = 'login'
AND e.created_at >= now() - interval '7 days'
GROUP BY e.user_id;
GroupAggregate  (cost=95987.54..130946.10 rows=168051 width=16) (actual time=259.662..324.621 rows=118270.00 loops=1)
  Group Key: e.user_id
  Buffers: shared hit=14390 read=24197, temp read=888 written=891
  ->  Gather Merge  (cost=95987.54..127895.75 rows=273969 width=16) (actual time=259.632..291.731 rows=277996.00 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=14390 read=24197, temp read=888 written=891
        ->  Sort  (cost=94987.52..95272.90 rows=114154 width=16) (actual time=239.366..245.138 rows=92665.33 loops=3)
              Sort Key: e.user_id
              Sort Method: external merge  Disk: 2376kB
              Buffers: shared hit=14390 read=24197, temp read=888 written=891
              Worker 0:  Sort Method: external merge  Disk: 2352kB
              Worker 1:  Sort Method: external merge  Disk: 2376kB
              ->  Parallel Hash Join  (cost=4239.59..83445.23 rows=114154 width=16) (actual time=30.902..220.676 rows=92665.33 loops=3)
                    Hash Cond: (e.user_id = u.id)
                    Buffers: shared hit=14374 read=24197
                    ->  Parallel Seq Scan on user_events e  (cost=0.00..78766.67 rows=167222 width=16) (actual time=0.098..153.172 rows=136422.00 loops=3)
                          Filter: ((event_type = 'login'::text) AND (created_at >= (now() - '7 days'::interval)))
                          Rows Removed by Filter: 1530245
                          Buffers: shared hit=14374 read=22726
                    ->  Parallel Hash  (cost=3235.71..3235.71 rows=80311 width=8) (actual time=30.121..30.121 rows=45314.00 loops=3)
                          Buckets: 262144  Batches: 1  Memory Usage: 7424kB
                          Buffers: shared read=1471
                          ->  Parallel Seq Scan on users u  (cost=0.00..3235.71 rows=80311 width=8) (actual time=9.909..19.398 rows=45314.00 loops=3)
                                Filter: ((country_code = 'TW'::text) AND (status = 'active'::text))
                                Rows Removed by Filter: 21353
                                Buffers: shared read=1471
Planning:
  Buffers: shared hit=10 read=6 dirtied=1
Planning Time: 0.450 ms
JIT:
  Functions: 45
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.774 ms (Deform 0.979 ms), Inlining 0.000 ms, Optimization 1.992 ms, Emission 27.449 ms, Total 32.216 ms
Execution Time: 329.972 ms
CREATE INDEX CONCURRENTLY idx_user_events_event_type_created_at
ON user_events(event_type, created_at);
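Before re-running the query, a quick size check puts the read/write trade-off in perspective. A sketch using the index name created above:

```sql
-- How much disk does the new index cost relative to the table it serves?
SELECT pg_size_pretty(pg_relation_size('idx_user_events_event_type_created_at')) AS index_size,
       pg_size_pretty(pg_relation_size('user_events')) AS table_size;
```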
Result: the planner does pick the new index (Bitmap Index Scan feeding a Bitmap Heap Scan in the plan below), but the 7-day 'login' window still matches ~409k rows scattered across ~37k heap pages (Heap Blocks: exact≈37,096), so the non-parallel bitmap plan can't beat the parallel Seq Scan and execution time is essentially unchanged. The users side (u.country_code = 'TW' AND u.status = 'active') is still a full Seq Scan.
EXPLAIN ANALYZE
SELECT
e.user_id,
max(e.created_at) AS last_login_at
FROM user_events e
JOIN users u ON u.id = e.user_id
WHERE
u.country_code = 'TW'
AND u.status = 'active'
AND e.event_type = 'login'
AND e.created_at >= now() - interval '7 days'
GROUP BY e.user_id;
HashAggregate  (cost=80295.33..84651.33 rows=168051 width=16) (actual time=335.887..350.996 rows=118272.00 loops=1)
  Group Key: e.user_id
  Planned Partitions: 4  Batches: 5  Memory Usage: 8249kB  Disk Usage: 1544kB
  Buffers: shared read=40142 written=1, temp read=101 written=266
  ->  Hash Join  (cost=16563.76..62744.06 rows=273971 width=16) (actual time=70.384..260.670 rows=278014.00 loops=1)
        Hash Cond: (e.user_id = u.id)
        Buffers: shared read=40142 written=1
        ->  Bitmap Heap Scan on user_events e  (cost=10386.15..55512.91 rows=401338 width=16) (actual time=28.551..97.246 rows=409289.00 loops=1)
              Recheck Cond: ((event_type = 'login'::text) AND (created_at >= (now() - '7 days'::interval)))
              Heap Blocks: exact=37100
              Buffers: shared read=38671 written=1
              ->  Bitmap Index Scan on idx_user_events_event_type_created_at  (cost=0.00..10285.82 rows=401338 width=0) (actual time=24.273..24.273 rows=409289.00 loops=1)
                    Index Cond: ((event_type = 'login'::text) AND (created_at >= (now() - '7 days'::interval)))
                    Index Searches: 1
                    Buffers: shared read=1571
        ->  Hash  (cost=4471.00..4471.00 rows=136529 width=8) (actual time=41.797..41.798 rows=135942.00 loops=1)
              Buckets: 262144  Batches: 1  Memory Usage: 7359kB
              Buffers: shared read=1471
              ->  Seq Scan on users u  (cost=0.00..4471.00 rows=136529 width=8) (actual time=0.158..23.317 rows=135942.00 loops=1)
                    Filter: ((country_code = 'TW'::text) AND (status = 'active'::text))
                    Rows Removed by Filter: 64058
                    Buffers: shared read=1471
Planning:
  Buffers: shared hit=2 read=6
Planning Time: 0.381 ms
Execution Time: 354.562 ms
CREATE INDEX CONCURRENTLY idx_user_events_login_recent_by_user
ON user_events (created_at DESC)
WHERE event_type = 'login';
Design notes for this second index:
- created_at is DESC because max(e.created_at) only cares about the latest value.
- 'login' accounts for only ~25% of event_type, so a partial index with WHERE event_type = 'login' shrinks the index accordingly.
- Keeping event_type as a key column, i.e. (event_type, created_at DESC) WHERE event_type = 'login', would be redundant (every entry already satisfies the predicate), so event_type is dropped from the key.
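A sketch for verifying the partial-index reasoning locally (index names as created above; the ~4x size figure follows from 'login' being ~25% of rows):

```sql
-- 1) The partial index should be roughly 4x smaller than the full
--    (event_type, created_at) index, since it stores only 'login' rows.
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_user_events_event_type_created_at',
                  'idx_user_events_login_recent_by_user');

-- 2) Drop the broader index, then re-run EXPLAIN ANALYZE on the problem
--    query to confirm the planner switches to the partial index.
DROP INDEX CONCURRENTLY idx_user_events_event_type_created_at;
```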