Objective

Build an end-to-end data pipeline that ingests TPCH data from S3 into Snowflake, transforms it with dbt into Silver and Gold layers, is orchestrated by Airflow, and includes one hybrid ETL workflow built in Alteryx.

Architecture

AWS S3 → Snowflake RAW → dbt SILVER → dbt GOLD → Airflow Orchestration → Alteryx ETL
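
As a rough illustration of the flow above, the stages can be written as a dependency graph and sorted into a run order. This is a minimal sketch, not the project's actual Airflow DAG: the stage names are hypothetical labels for the layers listed above, and it assumes Airflow acts as the scheduler that triggers each stage rather than being a stage itself.

```python
from graphlib import TopologicalSorter

# Hypothetical stage labels; each key maps to the stages it depends on.
# Assumption: Airflow schedules these stages and is not itself a node.
STAGE_DEPENDENCIES = {
    "s3_source": set(),
    "snowflake_raw": {"s3_source"},
    "dbt_silver": {"snowflake_raw"},
    "dbt_gold": {"dbt_silver"},
    "alteryx_etl": {"dbt_gold"},
}


def run_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order(STAGE_DEPENDENCIES)))
```

Writing the dependencies down explicitly keeps the ordering rule in one place: a stage runs only after everything it depends on has finished, which is the same guarantee an orchestrator provides.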

Tech Stack

Snowflake: Cloud Data Warehouse
dbt Core: Transformations and testing
Airflow (Docker): Pipeline orchestration
Alteryx Designer: Hybrid ETL
GitHub: Version control
Notion: Documentation
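
To make the dbt and Airflow roles more concrete, the sketch below builds the dbt Core commands an orchestrator task could run for each layer. It is an illustration under assumptions: the selector names `silver` and `gold` are hypothetical model folder names, and the repository may organize or invoke its models differently. `dbt run`, `dbt test`, and the `--select` flag are standard dbt Core CLI features.

```python
import shlex

# Hypothetical selectors: assumes models live in folders named "silver" and "gold".
LAYERS = ["silver", "gold"]


def dbt_commands(layers):
    """Build a run-then-test command pair for each layer, in order."""
    commands = []
    for layer in layers:
        commands.append(["dbt", "run", "--select", layer])
        commands.append(["dbt", "test", "--select", layer])
    return commands


if __name__ == "__main__":
    for command in dbt_commands(LAYERS):
        # Printed only; pass each list to subprocess.run(...) to execute it.
        print(shlex.join(command))
```

Testing each layer right after building it means a failing Silver test can stop the run before the Gold models are built on top of bad data.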

Status Tracker

Phase | Tool | Status
Raw Layer | Snowflake | ✅ Done
Transformations | dbt Core | ✅ Done
Orchestration | Airflow | ✅ Done
Hybrid ETL | Alteryx | ✅ Done
Documentation | GitHub + Notion | ✅ Done

Screenshots

snowflake_initial.png

dbt_structure.png

lineage.png

airflow_dag.png

alteryx_workflow.png

snowflake_final.png

GitHub Repository

https://github.com/ganeshkumar20261/capstone-data-pipeline