As a data engineer, I don't want "perfect" pipelines. I want pipelines that are easy to understand, easy for others to pick up, and resilient when requirements change.

After working on several Databricks projects, I found myself rebuilding the same structure again and again.

So I stripped it back to something simple and reusable.

👉 Full working notebooks available here: https://payhip.com/b/S83I9


🟤 What is Medallion Architecture?

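Medallion architecture is a common pattern in Databricks where data is processed in three layers:

Bronze: raw data, landed as it arrives
Silver: cleaned and standardised data
Gold: business-ready outputs

That’s it.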

The idea is simple, but in practice it’s very easy to overcomplicate.


⚪ How I Structure It in Practice

Here’s what this looks like in a real project.

Step 1: Bronze (ingest and keep it raw)

Ingest the raw data (CSV, JSON, etc.), add an ingestion timestamp, and resist the urge to “fix” things at this stage. The goal is to land data reliably, not perfect it.
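
A minimal sketch of the Bronze step. In a Databricks notebook this would normally be written with Spark; plain standard-library Python is used here so the logic can be run anywhere. The function name `ingest_bronze`, the `_ingested_at` column name, and the sample order data are assumptions made up for illustration, not part of any real project.

```python
import csv
import io
from datetime import datetime, timezone


def ingest_bronze(raw_csv, ingested_at=None):
    """Land CSV rows exactly as they arrive and tag each with an ingestion timestamp."""
    # One timestamp per batch, so every row from the same load shares it.
    ingested_at = ingested_at or datetime.now(timezone.utc)
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Values stay as the strings that arrived: no casting, trimming, or filtering.
        row = dict(record)
        row["_ingested_at"] = ingested_at.isoformat()  # hypothetical column name
        rows.append(row)
    return rows


# Hypothetical sample: a duplicate row and a bad amount are landed untouched.
sample = "Order ID,Amount\n1001, 25.50\n1001, 25.50\n1002,not_a_number\n"
for row in ingest_bronze(sample):
    print(row)
```

Note that the duplicate and the unparseable amount both survive. That is deliberate: Bronze only records what arrived and when.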

Step 2: Silver (clean and standardise)

Clean up column names, cast data types, remove duplicates, and filter out bad records. This is where most of the real engineering work happens.
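
A minimal sketch of the Silver step, covering the four tasks above in order. Again this is plain Python rather than Spark so it runs anywhere. The helper names `to_snake_case` and `clean_silver`, the `order_id` and `amount` columns, and the rule that a row is "bad" when either value fails to cast are all assumptions chosen for illustration.

```python
import re


def to_snake_case(name):
    """Normalise a column name, e.g. 'Order ID' becomes 'order_id'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean_silver(bronze_rows):
    """Rename columns, cast types, drop bad records, and de-duplicate on order_id."""
    seen_ids = set()
    silver_rows = []
    for raw in bronze_rows:
        # 1. Clean up column names.
        row = {to_snake_case(col): value for col, value in raw.items()}
        # 2. Cast data types; 3. treat rows that fail to cast as bad records.
        try:
            row["order_id"] = int(row["order_id"])
            row["amount"] = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        # 4. Remove duplicates, keeping the first row seen for each order_id.
        if row["order_id"] in seen_ids:
            continue
        seen_ids.add(row["order_id"])
        silver_rows.append(row)
    return silver_rows


# Hypothetical Bronze-style input: raw strings, one duplicate, one bad amount.
bronze_sample = [
    {"Order ID": "1001", "Amount": " 25.50"},
    {"Order ID": "1001", "Amount": " 25.50"},
    {"Order ID": "1002", "Amount": "not_a_number"},
    {"Order ID": "1003", "Amount": "7"},
]
for row in clean_silver(bronze_sample):
    print(row)
```

Silently dropping bad rows keeps the sketch short. In a real pipeline it is usually worth counting them or writing them to a separate location so they can be inspected later.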

Step 3: Gold (publish business-ready outputs)