This tutorial walks you through writing a Hello World flow and then a real-world example. We will build an end-to-end flow (the 'EL' of 'ELT') from MySQL into Snowflake. It's a great place to start to see whether Typhoon can simplify your workflow for landing data in the cloud.

Introducing Typhoon & HelloWorld.yml

Why use Typhoon for batch workflow orchestration?

It is the simplest way to write clean and easily maintainable data flows.

name: hello_world
schedule_interval: rate(1 hours)
granularity: hour

tasks:
  send_data:
    function: typhoon.flow_control.branch
    args:
      branches:
        - filename: users.txt
          contents: John, Amy, Adam, Jane
        - filename: animals.txt
          contents: dog, cat, mouse, elephant, giraffe
        - filename: fruits.csv
          contents: apple,pear,apricot

  write_data:
    input: send_data
    function: typhoon.filesystem.write_data
    args:
      hook: !Hook data_lake
      path: !MultiStep
        - !Py $BATCH['filename']
        - !Py $DAG_CONTEXT.interval_end
        - !Py f'/store/{$2}/{$1}'
      data: !Py $BATCH['contents']
      create_intermediate_dirs: True

Let's go through this code step by step, and then we can move on to our real-world example.

name: hello_world
schedule_interval: rate(1 hours)
granularity: hour

Very simply, this sets your flow name (no spaces) and the schedule interval, expressed as a rate, at which the flow runs. The interval timestamps are truncated to the hour.
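To make "truncated to the hour" concrete, here is a minimal Python sketch. It is not Typhoon's own code; the helper name `truncate_to_hour` is hypothetical and only illustrates what an hour-granularity timestamp looks like.

```python
from datetime import datetime


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds (hypothetical helper, not a Typhoon API)."""
    return ts.replace(minute=0, second=0, microsecond=0)


# A run triggered at 09:26:53 is attributed to the 09:00:00 boundary.
run_time = datetime(2021, 3, 14, 9, 26, 53)
print(truncate_to_hour(run_time))  # 2021-03-14 09:00:00
```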

tasks:
  send_data:
    function: typhoon.flow_control.branch
    args:
      branches:
        - filename: users.txt
          contents: John, Amy, Adam, Jane
        - filename: animals.txt
          contents: dog, cat, mouse, elephant, giraffe
        - filename: fruits.csv
          contents: apple,pear,apricot

Here we are setting up our tasks in our flow. This is a DAG structure similar to many other workflow tools (e.g. Task A → Task B → Task C).

Let's examine our first task, send_data, which simply outputs 3 'files' containing CSV-formatted strings, represented as a YAML list of dictionaries. We use flow_control.branch to yield each of our 3 branches to the next node.
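Conceptually, the branching step behaves like a Python generator that hands each list item to the next task as its own batch. The sketch below is a stand-in written for illustration, assuming that behaviour; the local `branch` function is hypothetical and is not the actual implementation of typhoon.flow_control.branch.

```python
from typing import Any, Iterable, Iterator


def branch(branches: Iterable[Any]) -> Iterator[Any]:
    """Illustrative stand-in: yield each branch as a separate batch."""
    for item in branches:
        yield item


# The same three 'files' as in the YAML above, as a list of dictionaries.
branches = [
    {"filename": "users.txt", "contents": "John, Amy, Adam, Jane"},
    {"filename": "animals.txt", "contents": "dog, cat, mouse, elephant, giraffe"},
    {"filename": "fruits.csv", "contents": "apple,pear,apricot"},
]

# Each batch would be passed on to the downstream task (write_data) in turn.
for batch in branch(branches):
    print(batch["filename"], "->", batch["contents"])
```

Each yielded dictionary corresponds to what the downstream task reads as $BATCH, which is why write_data can refer to $BATCH['filename'] and $BATCH['contents'].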

Connections & "typhoon status"