This note goes hand in hand with the noteIndex .

Commands

-- 0) Refresh planner statistics (affects planning quality)
ANALYZE users;

-- Collects table & column statistics (row count, value distribution, MCV, histograms).
-- Required after bulk data changes to make EXPLAIN meaningful.
-- Most common values (MCV) and histograms drastically affect selectivity estimates.
-- If stats are stale, row estimates can be wildly wrong (leading to bad plans).

-- 1) Show the planner's **estimated plan (NO execution)**
EXPLAIN
SELECT ...;

-- Displays planner **estimates only** (cost, estimated rows, chosen plan).
-- Uses statistics from ANALYZE.
-- Does NOT execute the query.

-- 2) Execute the query and show real **runtime behavior**
EXPLAIN ANALYZE
SELECT ...;

-- Executes the query and shows **actual time**, actual rows, loops.
-- **Still uses planner estimates for plan selection**.
-- **Does NOT refresh statistics**.

-- 3) Add I/O visibility (heap vs index access)
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;

-- PostgreSQL’s cost model uses default I/O and CPU cost constants;
-- tuning these can change plan choices but does not affect correctness.

-- 4) Maximum debugging context
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT ...;

Sequential Scan

Example: Users and Orders

CREATE TABLE users (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email TEXT NOT NULL,
    country TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL
);

CREATE TABLE orders (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id),
    amount INT NOT NULL,
    status TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL
);

INSERT INTO users (email, country, created_at)
SELECT
    'user' || i || '@example.com',
    CASE WHEN i % 2 = 0 THEN 'US' ELSE 'TW' END,
    NOW() - (i || ' days')::INTERVAL
FROM generate_series(1, 1000) AS i; -- ~1,000 users

If the table is small relative to the predicate density, Sequential Scan is often cheapest despite having no index usage.

EXPLAIN
SELECT *
FROM users
WHERE country = 'TW';

-- Output
Seq Scan on users  (cost=0.00..21.50 rows=500 width=34)
   Filter: (country = 'TW'::text)

Sequential Scan → Read every row in users, apply the filter row-by-row
- No index on country
- Table is small (~1000 rows) → Full scan is cheap
cost=0.00..21.50 → an unitless score (NOT time) used only to compare plans
- 0.00 (startup_cost) → Cost before producing the first row **(≈ 0 → start reading immediately)
  - Cost to opens the table
  - Cost to reads the first page
- 21.50 (total_cost) → Cost to produce all rows
  - Cost to read all pages
  - Cost to evaluate country = 'TW' per row
  - Cost to returning ~500 rows
  - (pages_read × seq_page_cost) + (rows × cpu_tuple_cost) + (rows × cpu_operator_cost)
rows=500 → The estimated (not actual) rows.
width=34 → The estimated (not actual) average row size (bytes).
- It is matter for memory usage → Whether data fits in cache

Bitmap Scan

Prerequisite: Where/How PostgreSQL stores the data

Add One Index and Rerun

CREATE INDEX idx_users_country ON users(country);

-- Force PostgreSQL not using Sequential Scan
-- Because the dataset is small or the data selectivity is high (~50%),
-- the cost of I/O (reading index pages) more than reading the table once
SET enable_seqscan = off;

EXPLAIN
SELECT *
FROM users
WHERE country = 'TW';

-- Reset afterward
RESET enable_seqscan;

-- Output
Bitmap Heap Scan on users  (cost=8.03..23.27 rows=500 width=34)
  Recheck Cond: (country = 'TW'::text)
  ->  Bitmap Index Scan on idx_users_country  (cost=0.00..7.90 rows=500 width=0)
        Index Cond: (country = 'TW'::text)

Bitmap Index Scan on idx_users_country
- This phase answers “which rows?”, not “what are the rows?”
- It scans the index only (does not fetch table rows yet)
- It builds a bitmap (Bitmap Index) in memory
  - Set of heap block (page) IDs + tuple (row) offsets → a compact set of row locations
  - Many row hits on the same page collapse into one page fetch
- rows=500 → planner expects many matches the index
- width=0 → no row data is returned
Bitmap Heap Scan
- It walks through the bitmap
  - “I’ll first find which rows match using the index, then fetch the table rows in a cache-friendly order.”
- It double-checks the condition (Bitmap may be lossy)
- cost=8.03..23.27 → In this small dataset, the cost is higher than Sequential Scan ****(cost=0.00..21.50) that’s why the SET enable_seqscan = off; is needed.

The approach reduce the I/O cost: