This tutorial walks you through writing a Hello World and a real-world example. We will build an end-to-end (the 'EL' of 'ELT') flow from MySQL into Snowflake. It's a great place to start to see if Typhoon can simplify your workflow for landing data in the cloud.
Why use Typhoon for batch workflow orchestration?
It is the simplest way to write clean and easily maintainable data flows.
```yaml
name: hello_world
schedule_interval: rate(10 minutes)
granularity: hour

tasks:
  send_data:
    function: typhoon.flow_control.branch
    args:
      branches:
        - filename: users.txt
          contents: John, Amy, Adam, Jane
        - filename: animals.txt
          contents: dog, cat, mouse, elephant, giraffe
        - filename: fruits.csv
          contents: apple,pear,apricot

  write_data:
    input: send_data
    function: typhoon.filesystem.write_data
    args:
      hook: !Hook data_lake
      path: !MultiStep
        - !Py $BATCH['filename']
        - !Py $DAG_CONTEXT.interval_end
        - !Py f'/store/{$2}/{$1}'
      data: !Py $BATCH['contents']
      create_intermediate_dirs: True
```
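Before we break this down, here is a rough sketch of what the `!MultiStep` path expression evaluates to at runtime. This is plain Python for illustration only, not Typhoon code: `batch` and `interval_end` are stand-ins for Typhoon's `$BATCH` and `$DAG_CONTEXT.interval_end`, and the example timestamp is made up.

```python
from datetime import datetime

# Illustration only: stand-ins for Typhoon's $BATCH and $DAG_CONTEXT.
batch = {"filename": "users.txt", "contents": "John, Amy, Adam, Jane"}
interval_end = datetime(2021, 5, 1, 10, 0)  # hypothetical interval end

# Step 1 ($1) is the batch filename; step 2 ($2) is the interval end.
# Step 3 combines them, so each run writes to a time-partitioned path.
path = f"/store/{interval_end}/{batch['filename']}"
print(path)  # /store/2021-05-01 10:00:00/users.txt
```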
Let's go through this code step by step, and then we can move on to our real-world example.
```yaml
name: hello_world
schedule_interval: rate(10 minutes)
granularity: hour
```
Very simply, this sets your flow name (no spaces) and the rate at which to run the schedule. With `granularity: hour`, it will use timestamps truncated to the hour for the intervals.
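As a rough illustration of what hour granularity means, truncating a timestamp to the hour looks like this in plain Python. This is a sketch of the concept, not Typhoon's implementation, and the execution time is invented:

```python
from datetime import datetime

# Sketch of the concept, not Typhoon internals: with granularity: hour,
# an execution time of 10:23 maps to the 10:00 interval boundary.
execution_time = datetime(2021, 5, 1, 10, 23)
interval_end = execution_time.replace(minute=0, second=0, microsecond=0)
print(interval_end)  # 2021-05-01 10:00:00
```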
```yaml
tasks:
  send_data:
    function: typhoon.flow_control.branch
    args:
      branches:
        - filename: users.txt
          contents: John, Amy, Adam, Jane
        - filename: animals.txt
          contents: dog, cat, mouse, elephant, giraffe
        - filename: fruits.csv
          contents: apple,pear,apricot
```
Here we are setting up the tasks in our flow. This is a DAG structure similar to many other workflow tools (e.g. Task A → Task B → Task C).
Let's examine our first task, `send_data`, which simply outputs 3 'files' containing CSV-formatted strings, represented as a YAML list of dictionaries. We use `typhoon.flow_control.branch` to yield our 3 branches to the next node.
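Conceptually, a branch-style function behaves like a generator that yields each configured branch as a separate batch for the downstream task. Here is a minimal Python sketch of that idea; it is our own illustration, not Typhoon's actual source:

```python
from typing import Any, Dict, Iterable, List

def branch(branches: List[Dict[str, Any]]) -> Iterable[Dict[str, Any]]:
    """Sketch of a branch-style task: yield each branch as one batch."""
    for item in branches:
        yield item  # each dict becomes one $BATCH for the next task

# Each yielded batch triggers write_data once, so three files get written.
for batch in branch([
    {"filename": "users.txt", "contents": "John, Amy, Adam, Jane"},
    {"filename": "animals.txt", "contents": "dog, cat, mouse, elephant, giraffe"},
    {"filename": "fruits.csv", "contents": "apple,pear,apricot"},
]):
    print(batch["filename"])
```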