This tutorial walks you through writing a Hello World and a real-world example. We will build an end-to-end (the 'EL' of 'ELT') flow from MySQL into Snowflake. It's a great place to start to see if Typhoon can simplify your workflow for landing data in the cloud.
Why use Typhoon for batch workflow orchestration?
It is the simplest way to write clean and easily maintainable data flows.
```yaml
name: hello_world
schedule_interval: rate(10 minutes)
granularity: hour

tasks:
  send_data:
    function: typhoon.flow_control.branch
    args:
      branches:
        - filename: users.txt
          contents: John, Amy, Adam, Jane
        - filename: animals.txt
          contents: dog, cat, mouse, elephant, giraffe
        - filename: fruits.csv
          contents: apple,pear,apricot

  write_data:
    input: send_data
    function: typhoon.filesystem.write_data
    args:
      hook: !Hook data_lake
      path: !MultiStep
        - !Py $BATCH['filename']
        - !Py $DAG_CONTEXT.interval_end
        - !Py f'/store/{$2}/{$1}'
      data: !Py $BATCH['contents']
      create_intermediate_dirs: True
```
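Before we break this down, here is a rough sketch of what the `!MultiStep` path expression evaluates to at runtime. This is plain Python for illustration only, not Typhoon code: `batch` and `interval_end` are stand-ins for Typhoon's `$BATCH` and `$DAG_CONTEXT.interval_end`, and the example timestamp is made up.

```python
from datetime import datetime

# Illustration only: stand-ins for Typhoon's $BATCH and $DAG_CONTEXT.
batch = {"filename": "users.txt", "contents": "John, Amy, Adam, Jane"}
interval_end = datetime(2021, 5, 1, 10, 0)  # hypothetical interval end

# Step 1 ($1) is the batch filename; step 2 ($2) is the interval end.
# Step 3 combines them, so each run writes to a time-partitioned path.
path = f"/store/{interval_end}/{batch['filename']}"
print(path)  # /store/2021-05-01 10:00:00/users.txt
```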
Let's go through this code step by step, and then we can move on to our real-world example.
```yaml
name: hello_world
schedule_interval: rate(10 minutes)
granularity: hour
```
Very simply, this sets your flow name (no spaces) and the rate at which to run the schedule. With `granularity: hour`, it will use timestamps truncated to the hour for the intervals.
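As a rough illustration of what hour granularity means, truncating a timestamp to the hour looks like this in plain Python. This is a sketch of the concept, not Typhoon's implementation, and the execution time is invented:

```python
from datetime import datetime

# Sketch of the concept, not Typhoon internals: with granularity: hour,
# an execution time of 10:23 maps to the 10:00 interval boundary.
execution_time = datetime(2021, 5, 1, 10, 23)
interval_end = execution_time.replace(minute=0, second=0, microsecond=0)
print(interval_end)  # 2021-05-01 10:00:00
```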
```yaml
tasks:
  send_data:
    function: typhoon.flow_control.branch
    args:
      branches:
        - filename: users.txt
          contents: John, Amy, Adam, Jane
        - filename: animals.txt
          contents: dog, cat, mouse, elephant, giraffe
        - filename: fruits.csv
          contents: apple,pear,apricot
```
Here we are setting up the tasks in our flow. This is a DAG structure similar to many other workflow tools (e.g. Task A → Task B → Task C).
Let's examine our first task, `send_data`, which simply outputs 3 'files' containing CSV-formatted strings, represented as a YAML list of dictionaries. We use `typhoon.flow_control.branch` to yield our 3 branches to the next node.
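Conceptually, a branch-style function behaves like a generator that yields each configured branch as a separate batch for the downstream task. Here is a minimal Python sketch of that idea; it is our own illustration, not Typhoon's actual source:

```python
from typing import Any, Dict, Iterable, List

def branch(branches: List[Dict[str, Any]]) -> Iterable[Dict[str, Any]]:
    """Sketch of a branch-style task: yield each branch as one batch."""
    for item in branches:
        yield item  # each dict becomes one $BATCH for the next task

# Each yielded batch triggers write_data once, so three files get written.
for batch in branch([
    {"filename": "users.txt", "contents": "John, Amy, Adam, Jane"},
    {"filename": "animals.txt", "contents": "dog, cat, mouse, elephant, giraffe"},
    {"filename": "fruits.csv", "contents": "apple,pear,apricot"},
]):
    print(batch["filename"])
```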