Credits:

Serverless Data Pipelines Made Easy with Prefect and AWS ECS Fargate

Distributed data pipelines made easy with AWS EKS and Prefect

Distributed Data Pipelines with AWS ECS Fargate and Prefect Cloud


https://lucid.app/publicSegments/view/8ee69b9f-a1b9-4ffe-87de-3934b1880680/image.jpeg

Run the commands below, in any order, before registering your flow:

Orchestration - Start ECS / EKS agent


# Login to Prefect Cloud
prefect auth login -t <TENANT_TOKEN>

# Option 1: ECS Agent
prefect agent ecs start -t <RUNNER_TOKEN> -l <TAGS>

# Option 2: EKS Agent
prefect agent install kubernetes -t <RUNNER_TOKEN> \
    --rbac | kubectl apply -f -

# Run in background using supervisor
# (edit /etc/supervisor/supervisord.conf)
[program:agent]
command=/absolute/path/to/prefect <agent arguments from Option 1 or 2>
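
As a rough illustration of the supervisor step, the sketch below renders a `[program:agent]` section for the Option 1 (ECS) agent command. The helper name `supervisor_program` and the sample token and label values are hypothetical; only the section layout and the `prefect agent ecs start` arguments come from the commands above.

```python
import shlex
from typing import Sequence


def supervisor_program(name: str, prefect_path: str, agent_args: Sequence[str]) -> str:
    """Render a supervisor program section that runs a Prefect agent.

    Hypothetical helper: it only formats text and does not touch supervisor.
    """
    # Quote each part so values containing spaces survive shell-style
    # splitting of the command line.
    command = " ".join(shlex.quote(part) for part in [prefect_path, *agent_args])
    return f"[program:{name}]\ncommand={command}\n"


# Sample values are placeholders, not real credentials.
section = supervisor_program(
    "agent",
    "/absolute/path/to/prefect",
    ["agent", "ecs", "start", "-t", "RUNNER_TOKEN", "-l", "fargate"],
)
print(section)
```

The printed text is what would be appended to /etc/supervisor/supervisord.conf; running `supervisorctl reread` followed by `supervisorctl update` should then pick up the new program.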

Execution - Create ECS / EKS cluster


# Login to AWS-CLI
aws configure

# Option 1: ECS Cluster
aws ecs create-cluster

# Option 2: EKS Cluster
eksctl create cluster --name fargate-eks --region <REGION> --fargate

Create IAM Role (using AWS Console)

<aside> 💡 Refer to section: Creating an IAM role for our ECS tasks (result: task role ARN)

</aside>

# Add permissions for your tasks to access AWS Services
arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>
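
A minimal sketch of how the task role ARN above is assembled, in case you want to build it in code instead of copying it from the console. The function name `task_role_arn` and the sample values are hypothetical, and the sketch assumes a role created without a custom IAM path.

```python
import re


def task_role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN of the form arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>."""
    # AWS account IDs are 12-digit numbers.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be exactly 12 digits")
    # IAM role names allow alphanumerics plus _+=,.@- and are at most 64 characters.
    if not re.fullmatch(r"[A-Za-z0-9_+=,.@-]{1,64}", role_name):
        raise ValueError("role_name contains invalid characters or is too long")
    return f"arn:aws:iam::{account_id}:role/{role_name}"


# Sample values only; substitute your own account ID and role name.
print(task_role_arn("123456789012", "prefect-task-role"))
```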

Storage - Use S3 + Docker Hub, or ECR


Flow code


Option 1: Using S3 to store your flow code

from prefect.storage import S3

STORAGE = S3(bucket='<BUCKET_NAME>')

Option 2: Using ECR to store both your flow code + Docker image

from prefect.storage import Docker

STORAGE = Docker(
    registry_url="<YOUR_ECR_REGISTRY_ID>.dkr.ecr.eu-central-1.amazonaws.com",
    python_dependencies=["pandas==1.1.0"],
    image_tag="latest",
)
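
To make the pieces of the ECR option concrete, the sketch below composes the registry URL and a full image reference from their parts. The helper names, the repository name `my-flow`, and the sample registry ID are hypothetical. The sketch assumes the usual `<registry>/<repository>:<tag>` layout and that the ECR registry ID is your 12-digit AWS account ID.

```python
def ecr_registry_url(registry_id: str, region: str = "eu-central-1") -> str:
    """Return the ECR registry host for a registry ID and region."""
    return f"{registry_id}.dkr.ecr.{region}.amazonaws.com"


def image_reference(registry_url: str, image_name: str, image_tag: str = "latest") -> str:
    """Join registry, repository name, and tag into one image reference."""
    return f"{registry_url}/{image_name}:{image_tag}"


# Sample values only; the registry ID and repository name are placeholders.
registry = ecr_registry_url("123456789012")
print(registry)
print(image_reference(registry, "my-flow"))
```

The first printed value is the kind of string that goes into `registry_url`; the second shows where the `image_tag` ends up in the pushed image name.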

Last Updated: February 27, 2021 8:33 PM (GMT+8)