BigQuery Performance
Five Key Elements of Work
- I/O: how many bytes are read
- Shuffle: how many bytes are passed to the next stage
- Grouping: how many bytes are passed to each group
- Materialization: how many bytes are written to storage
- CPU work: UDFs and built-in functions (CPU usage)
Avoid I/O Waste
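- Don't use SELECT *; select only the columns you need. Storage is columnar, so every referenced column adds to the bytes read.
- Apply WHERE filters as early as possible so later stages process less data.
- Avoid ORDER BY without LIMIT; sorting a large result forces the final sort onto a single worker.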
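Because BigQuery reads data column by column, the cost of SELECT * grows with every column in the table, not just the ones you use. The sketch below models only that projection effect, assuming a made-up table whose per-column sizes in COLUMN_BYTES are hypothetical; it ignores partition/cluster pruning and BigQuery's actual per-type size rules.

```python
# Hypothetical per-column storage sizes (bytes) for an illustrative table.
# In a columnar store, a query reads only the columns it references.
COLUMN_BYTES = {
    "user_id": 8_000_000,
    "event_time": 8_000_000,
    "country": 2_000_000,
    "payload": 500_000_000,  # a wide, rarely needed column
}


def bytes_scanned(columns):
    """Estimate bytes read for a query that references `columns`.

    ["*"] stands for SELECT * (every column). This models columnar
    projection only; it does not model partition or cluster pruning.
    """
    if columns == ["*"]:
        return sum(COLUMN_BYTES.values())
    # set() so a column referenced twice is only counted once
    return sum(COLUMN_BYTES[c] for c in set(columns))


if __name__ == "__main__":
    star = bytes_scanned(["*"])
    narrow = bytes_scanned(["user_id", "country"])
    print(f"SELECT *                -> {star:,} bytes")
    print(f"SELECT user_id, country -> {narrow:,} bytes")
```

Here the narrow query reads 10,000,000 bytes instead of 518,000,000, because the wide payload column is never touched.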
Prevent Data Hotspots
Shuffle wisely
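- Filter early so that JOINs do not overload individual workers.
- In the query explanation (execution details), compare each stage's max vs. average worker times; a large gap points to data skew.
- BigQuery automatically repartitions data when workers become overloaded (dynamic reshuffling).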
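The max-vs-average comparison can be expressed as a simple ratio check. This is a minimal sketch with made-up stage timings; the StageTiming type and the 3.0 threshold are hypothetical choices for illustration, not BigQuery rules. In a real job you would read the per-stage avg/max values from the query plan (for example the compute time fields reported per stage) instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    """Hypothetical summary of one query-plan stage."""
    name: str
    avg_ms: float  # average worker time in the stage
    max_ms: float  # slowest worker time in the stage


def skewed_stages(stages, ratio_threshold=3.0):
    """Return names of stages whose max/avg worker-time ratio exceeds the threshold.

    A large ratio means one worker did far more than its peers, which is
    the usual signature of a hot key. The threshold is arbitrary.
    """
    flagged = []
    for s in stages:
        if s.avg_ms > 0 and s.max_ms / s.avg_ms > ratio_threshold:
            flagged.append(s.name)
    return flagged


if __name__ == "__main__":
    # Made-up timings: the join stage has one very slow worker.
    plan = [
        StageTiming("S00: Input", avg_ms=120, max_ms=150),
        StageTiming("S01: Join+", avg_ms=200, max_ms=2400),
        StageTiming("S02: Output", avg_ms=40, max_ms=55),
    ]
    print(skewed_stages(plan))  # ['S01: Join+']
```

Only the join stage is flagged: its slowest worker took 12x the average, while the other stages stay close to 1x.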
GROUP BY
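- Works best when the number of distinct groups is small.
- Avoid grouping by high-cardinality keys such as unique IDs: each group holds only a few rows, so the shuffle does a lot of work for little reduction.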
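The difference between low- and high-cardinality grouping shows up as how much a GROUP BY actually shrinks the data. This sketch uses synthetic keys (the country_/user_ names are made up) and counts rows in versus groups out with collections.Counter; it illustrates the ratio only, not BigQuery's internal aggregation.

```python
from collections import Counter


def group_reduction(keys):
    """Return (rows_in, groups_out, rows_per_group) for a GROUP BY over `keys`."""
    counts = Counter(keys)
    rows_in = len(keys)
    groups_out = len(counts)
    if groups_out == 0:
        return 0, 0, 0.0  # empty input: nothing to aggregate
    return rows_in, groups_out, rows_in / groups_out


if __name__ == "__main__":
    n = 100_000
    low_card = [f"country_{i % 20}" for i in range(n)]  # 20 distinct groups
    high_card = [f"user_{i}" for i in range(n)]         # every row is its own group
    print(group_reduction(low_card))   # (100000, 20, 5000.0)
    print(group_reduction(high_card))  # (100000, 100000, 1.0)
```

Grouping by a unique ID produces as many groups as input rows, so all the data is shuffled and nothing is actually aggregated.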
Joins & Unions
- Know whether join keys are unique: duplicate keys on both sides multiply the output rows and can turn a join into an accidental cross join.
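The row explosion from non-unique join keys can be predicted before running the join: for an inner equi-join, each key contributes (left count × right count) output rows. This is a minimal sketch with synthetic key lists; join_output_rows is a hypothetical helper, not a BigQuery API.

```python
from collections import Counter


def join_output_rows(left_keys, right_keys):
    """Row count of an INNER JOIN on equal keys.

    Each key shared by both sides yields left_count * right_count rows.
    """
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(left[k] * right[k] for k in left.keys() & right.keys())


if __name__ == "__main__":
    unique_right = ["a", "b", "c"]           # each key appears once
    dup_left = ["a"] * 1000 + ["b"] * 1000   # 1,000 rows per key
    dup_right = ["a"] * 1000 + ["b"] * 1000
    print(join_output_rows(dup_left, unique_right))  # 2000
    print(join_output_rows(dup_left, dup_right))     # 2000000
```

With a unique key on one side the join returns 2,000 rows; with the same key duplicated on both sides it returns 2,000,000, effectively a cross join within each key.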