Advanced Functions + UDF + Partitioning
Function toolbox
- String:
UPPER()
, SUBSTR()
, FORMAT()
- Aggregation:
SUM()
, COUNT()
, AVG()
, MIN()
, MAX()
→ 집계 결과 1행
- Type cast:
CAST(x AS ...)
, SAFE_CAST()
- Date/Time:
PARSE_DATETIME()
, DATE()
, TIMESTAMP()
- Statistical:
STDDEV_POP/STDDEV_SAMP
, VAR_POP/VAR_SAMP
, APPROX_QUANTILES
- Analytic (Window): “행 그룹 위에 계산, 행당 1값 반환” → 이동평균, 순위, LAG/LEAD
- User-Defined Functions (UDF): SQL 또는 JavaScript로 custom function 정의
Analytic / Window functions
- Navigation:
LAG(expr, n)
= 이전 n행 값
LEAD(expr, n)
= 이후 n행 값
NTH_VALUE(expr, n)
= 윈도우 내 n번째
- Ranking/Distribution:
ROW_NUMBER()
(중복 없이 번호)
RANK()
/ DENSE_RANK()
(동점 처리 다름)
PERCENT_RANK()
, CUME_DIST()
- Framing:
OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE ...)
-- Example: state별 최대 고용주 (2015), 동률 허용: RANK()
WITH base AS (
SELECT
ein, name, state,
noemplyeesw3cnt AS num_emp
FROM `bigquery-public-data.irs_990.irs_990_2015` a
JOIN `bigquery-public-data.irs_990.irs_990_ein` b USING (ein)
)
SELECT *
FROM (
SELECT
ein, name, state, num_emp,
RANK() OVER (PARTITION BY state ORDER BY num_emp DESC) AS r
FROM base
)
WHERE r = 1
ORDER BY num_emp DESC;
UDF (User-Defined Functions)
Components
CREATE [TEMP] FUNCTION dataset.fn_name(named_params...)
RETURNS <data_type>
LANGUAGE js | sql
AS """ <code> """