As a data engineer, I don't want "perfect" pipelines. I want pipelines that are easy to understand, easy for others to pick up, and resilient when requirements change.
After working on several Databricks projects, I found myself rebuilding the same structure again and again.
So I stripped it back to something simple and reusable.
👉 Full working notebooks available here: https://payhip.com/b/S83I9
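Medallion architecture is a common pattern in Databricks where data is processed in three layers: Bronze (raw data, landed as it arrives), Silver (cleaned and validated data), and Gold (business-ready data for reporting and analytics).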
That’s it.
The idea is simple, but in practice it’s very easy to overcomplicate.
Here’s what this looks like in a real project.
Bronze: ingest the raw data (CSV, JSON, etc.), add an ingestion timestamp, and resist the urge to “fix” things at this stage. The goal is to land data reliably, not to perfect it.
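A minimal sketch of that landing step in PySpark might look like the following. The function name `ingest_to_bronze`, the `ingested_at` column name, and the default format are assumptions for illustration, and it assumes a Delta-enabled session such as the `spark` object a Databricks notebook provides.

```python
def ingest_to_bronze(spark, source_path, target_table, fmt="csv"):
    """Land raw files in a Bronze table without cleaning them.

    `spark` is an active SparkSession; `source_path` and `target_table`
    are hypothetical names supplied by the caller.
    """
    reader = spark.read.format(fmt)
    if fmt == "csv":
        # Use the header row for column names. No schema inference is
        # requested, so every CSV column is read as a string.
        reader = reader.option("header", "true")
    raw_df = reader.load(source_path)

    # Add the ingestion timestamp and change nothing else.
    bronze_df = raw_df.selectExpr("*", "current_timestamp() AS ingested_at")

    # Append, so earlier loads stay exactly as they were landed.
    bronze_df.write.format("delta").mode("append").saveAsTable(target_table)
    return bronze_df
```

In a notebook this could be called as `ingest_to_bronze(spark, "/mnt/raw/orders/", "bronze.orders")`, where both the path and the table name are placeholders. Leaving CSV columns as strings is a deliberate choice here: type problems surface later during cleaning instead of failing the load.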
Silver: clean up column names, cast data types, remove duplicates, and filter out bad records. This is where most of the real engineering work happens.
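Those four cleaning steps can be sketched with the PySpark DataFrame API roughly as below. The helper names `clean_column_name` and `clean_to_silver`, the `casts` and `key_columns` parameters, and the rule that a bad record is one with a missing key are all assumptions for illustration. It also assumes Spark 3.2 or later, where `TRY_CAST` is available.

```python
import re


def clean_column_name(name):
    """Normalise a raw column name, e.g. ' Order ID ' becomes 'order_id'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def clean_to_silver(bronze_df, casts, key_columns):
    """Apply the four cleaning steps to a Bronze DataFrame.

    `casts` maps cleaned column names to Spark SQL types;
    `key_columns` lists the cleaned names that identify a record.
    """
    # 1. Clean up column names.
    new_names = [clean_column_name(c) for c in bronze_df.columns]
    df = bronze_df.toDF(*new_names)

    # 2. Cast data types. TRY_CAST yields NULL for values that cannot be
    #    converted; columns not listed in `casts` pass through unchanged.
    select_exprs = [
        f"TRY_CAST(`{c}` AS {casts[c]}) AS `{c}`" if c in casts else f"`{c}`"
        for c in new_names
    ]
    df = df.selectExpr(*select_exprs)

    # 3. Filter out bad records: here, rows whose key is missing or
    #    failed to cast.
    df = df.filter(" AND ".join(f"`{k}` IS NOT NULL" for k in key_columns))

    # 4. Remove duplicates on the key. Which duplicate survives is arbitrary.
    return df.dropDuplicates(key_columns)
```

A call might look like `clean_to_silver(bronze_df, {"order_id": "BIGINT", "amount": "DOUBLE"}, ["order_id"])`, with the column names standing in for your own. If you need a specific duplicate to win, such as the most recently ingested row, `dropDuplicates` is not enough and a window-based rule would be needed instead.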