GitHub: https://github.com/dhananjay93/data-engineering-end-to-end-demo
Tech Stack: PostgreSQL (Cloud SQL), Airbyte, dbt, Airflow, Tableau, GCP
Built a modern, cloud-native data pipeline using GCP and open-source tools to ingest, transform, orchestrate, and visualize retail data from the Sample Superstore dataset.
Layer | Tool Used | Description |
---|---|---|
Storage | GCP Cloud SQL (Postgres) | Hosted raw and transformed data |
Ingestion | Airbyte (Cloud) | Moved data from raw DB to staging DB (ELT) |
Modeling | dbt (Local) | SQL transformations: intermediate → destination |
Orchestration | Apache Airflow (Local) | Scheduled & automated dbt runs |
Visualization | Tableau | Built dashboards from transformed tables |
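
As a rough illustration of the orchestration layer, the sketch below builds the `dbt run` command that a scheduler task could invoke. It is a minimal sketch, not the repository's actual DAG: the helper names `build_dbt_command` and `run_dbt` and the optional project directory are assumptions made for illustration, while the model names are the two dbt models listed in this README. In Airflow, a command like this would typically be wrapped in a task such as a `BashOperator`.

```python
import shlex
import subprocess
from typing import List, Optional

# Model names are taken from this README; everything else here is hypothetical.
DBT_MODELS = ["master_table", "fct_sales_summary"]


def build_dbt_command(models: List[str], project_dir: Optional[str] = None) -> List[str]:
    """Return the argv for a `dbt run` restricted to the given models."""
    cmd = ["dbt", "run", "--select", *models]
    if project_dir is not None:
        # --project-dir points dbt at the folder containing dbt_project.yml
        cmd += ["--project-dir", project_dir]
    return cmd


def run_dbt(models: List[str], project_dir: Optional[str] = None) -> int:
    """Run dbt and return its exit code. Requires dbt to be installed and on PATH."""
    cmd = build_dbt_command(models, project_dir)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call run_dbt(...) on a machine where dbt is set up.
    print(shlex.join(build_dbt_command(DBT_MODELS)))
```

Keeping command construction separate from execution makes the scheduled step easy to check without a live dbt installation.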
Used Tableau's open-source Sample Superstore dataset, which contains sales, customer, region, and product information.
Excel (.csv)
↓
Cloud SQL (raw.orders, raw.returns)
↓ (Airbyte)
Cloud SQL (intermediate.orders, intermediate.returns)
↓
dbt transformations
↓
Cloud SQL (destination.aggregated)
↓
Tableau dashboards
- `master_table.sql`: Cleaned and formatted raw data
- `fct_sales_summary.sql`: Aggregated metrics by region and category
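
The aggregation by region and category can be sketched with an in-memory SQLite database standing in for Cloud SQL. This is an illustrative assumption, not the project's actual dbt model: the column names (`region`, `category`, `sales`, `profit`), the chosen metrics, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (region, category, sales, profit).
# These values are made up for illustration and are not from the real dataset.
SAMPLE_ORDERS = [
    ("West", "Furniture", 100.0, 20.0),
    ("West", "Furniture", 50.0, 5.0),
    ("West", "Technology", 200.0, 60.0),
    ("East", "Furniture", 80.0, -10.0),
]

# Assumed shape of the summary query: one row per (region, category).
SUMMARY_SQL = """
SELECT region,
       category,
       COUNT(*)    AS order_lines,
       SUM(sales)  AS total_sales,
       SUM(profit) AS total_profit
FROM orders
GROUP BY region, category
ORDER BY region, category
"""


def summarize(rows):
    """Load rows into a temporary table and return the grouped metrics."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (region TEXT, category TEXT, sales REAL, profit REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in summarize(SAMPLE_ORDERS):
        print(row)
```

In the dbt project, the equivalent `SELECT` would live in the model file and read from the cleaned table rather than from a temporary one.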