What is Kestra?

Kestra is an open-source platform for orchestrating and automating data workflows. It allows you to define pipelines as code (YAML/JSON) and run tasks across distributed systems with built-in logging, retries, and monitoring.

It supports plugins for databases, APIs, cloud storage, and more, making it highly extensible. Unlike Airflow or Prefect, where pipelines are written in Python, Kestra keeps pipelines declarative and versionable, with a focus on cloud-native execution and scalability.

Common use cases include ETL pipelines, machine learning workflows, and batch jobs. Overall, Kestra simplifies running complex, reliable, and scalable data pipelines.

How Can I Install Kestra?

According to the Kestra docs, you can install Kestra using Docker. As we saw in the previous post, Building Data Engineering Pipelines with Docker + PostgreSQL, a Docker image bundles everything an application needs, and the Kestra team provides one with all the resources required to run Kestra:

docker pull kestra/kestra:latest

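Pulling the image only downloads it. Once a container is running from that image with port 8080 published, open http://localhost:8080 in your browser to launch the UI, create your user, and take the product tour to begin building your first flow.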

What Is Workflow Orchestration?

Workflow orchestration is the automation and management of a series of tasks to ensure they run in the correct order. It handles dependencies, scheduling, retries, and monitoring, so pipelines run reliably without manual intervention.

Tools like Apache Airflow, Prefect, or Kestra make orchestration easier, allowing teams to execute complex workflows automatically and track each task’s progress.

For example, a data pipeline that extracts, transforms, and loads data, then runs analytics and generates reports, becomes scalable and reproducible with orchestration. In short, workflow orchestration ensures efficient, reliable, and maintainable pipelines.
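
To make the idea concrete, here is a minimal sketch in plain Python of what an orchestrator does under the hood: it resolves task dependencies into a valid execution order and retries a task when it fails. This is not how Kestra, Airflow, or Prefect are implemented. The function `run_pipeline`, the task names, and the `max_retries` parameter are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failing task.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of tasks it depends on.
    Returns the task names in the order they completed.
    """
    completed = []
    # static_order() yields every task after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                print(f"{name}: attempt {attempt} failed ({exc})")
                if attempt == max_retries + 1:
                    raise  # retries exhausted, stop the whole pipeline
    return completed


# Hypothetical pipeline: extract -> transform -> load -> analytics -> report.
calls = {"transform": 0}


def flaky_transform():
    """Fails on the first call only, to demonstrate the retry logic."""
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise RuntimeError("temporary failure")


tasks = {
    "extract": lambda: None,
    "transform": flaky_transform,
    "load": lambda: None,
    "analytics": lambda: None,
    "report": lambda: None,
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "analytics": {"load"},
    "report": {"analytics"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

A real orchestrator adds scheduling, logging, persistence, and a UI on top of this core loop, which is why a dedicated tool is preferable to hand-rolled scripts.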

Our Workflow

The main idea is to create a system that can download data from multiple sources (Green Trip Taxi Data and Yellow Trip Taxi Data), upload it to a Google Cloud Storage (GCS) bucket, and then create tables in BigQuery.
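
Before building the flow, it can help to write the plan down as data. The sketch below lists the three steps for each taxi dataset and how they depend on one another. All names here are assumptions made for illustration: the `Step` class, the `build_plan` function, the file names, the bucket `example-bucket`, and the dataset `example_dataset` are placeholders, not the actual resources used later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str          # hypothetical step identifier
    target: str        # what the step produces
    depends_on: tuple  # names of the steps that must finish first


def build_plan(taxi_types, bucket, dataset):
    """Return the download -> upload -> create-table steps for each taxi type."""
    plan = []
    for taxi in taxi_types:
        file_name = f"{taxi}_tripdata.csv"  # placeholder file name
        plan.append(Step(f"download_{taxi}", file_name, ()))
        plan.append(
            Step(f"upload_{taxi}", f"gs://{bucket}/{file_name}", (f"download_{taxi}",))
        )
        plan.append(
            Step(f"create_table_{taxi}", f"{dataset}.{taxi}_tripdata", (f"upload_{taxi}",))
        )
    return plan


plan = build_plan(["green", "yellow"], "example-bucket", "example_dataset")
for step in plan:
    print(step.name, "->", step.target)
```

The two datasets share no dependencies, so an orchestrator is free to process green and yellow trips independently.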

Prerequisites