This technical appendix describes how I designed the AWS pipeline behind the Job Market Skill Tracker. The goal of the project was to build a repeatable workflow that reads real data scientist job descriptions, extracts skills in a controlled way, compares them against my resume, and produces a weekly snapshot of market demand and skill gaps. The sections below focus on how the pipeline is structured, why I chose this setup, and what technical challenges came up along the way.
💡 The full implementation and code are available in the GitHub repository, and the project story can be found here.
This pipeline reads weekly job descriptions from S3, extracts and normalizes skills, aggregates weekly demand signals, compares them against my resume, routes the results through a human review step, and writes the final outputs for later analysis. The overall design prioritized auditability, modularity, and decision-useful outputs over full automation.
The main AWS services used in this workflow are summarized below.
AWS components used
| Component | Role |
|---|---|
| S3 | Store raw job descriptions, resume, and weekly recommendation outputs |
| Step Functions | Orchestrate the pipeline and coordinate each stage |
| Lambda | Run extraction, aggregation, comparison, review, and output steps |
| Bedrock | Handle controlled skill extraction and comparison |
| DynamoDB | Store weekly skill snapshots for downstream analysis |
| SNS | Send notifications for human review |
| API Gateway | Connect the pipeline to the human review interface |
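To make the "controlled extraction" idea behind the Bedrock step concrete, here is a minimal, hedged sketch of the normalization that would follow extraction: raw skill mentions are mapped onto a fixed canonical vocabulary so week-over-week counts stay comparable. The vocabulary and function name below are illustrative, not the project's actual code.

```python
# Hedged sketch: normalize raw skill mentions against a controlled vocabulary.
# The vocabulary here is illustrative; the real project's list may differ.
CANONICAL = {
    "pytorch": "PyTorch",
    "torch": "PyTorch",
    "aws": "AWS",
    "amazon web services": "AWS",
    "sql": "SQL",
}

def normalize_skills(raw_mentions):
    """Map extracted mentions to canonical names, dropping unknown terms."""
    seen = []
    for mention in raw_mentions:
        canonical = CANONICAL.get(mention.strip().lower())
        if canonical and canonical not in seen:
            seen.append(canonical)
    return seen
```

Keeping the vocabulary fixed is what makes the weekly snapshots auditable: every count in DynamoDB traces back to a known canonical term rather than a free-form model output.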
At a high level, the pipeline flow is shown below.

The workflow can be understood as seven main steps, from data ingestion to final output generation.
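The core data flow can be pictured as a sequential pass over per-stage handlers, which is essentially what the Step Functions state machine coordinates. The sketch below is a hedged, local stand-in: each function is a placeholder for a Lambda step (the real extraction goes through Bedrock, and the review and output stages are omitted here).

```python
# Hedged sketch: the pipeline's core stages reduced to a sequential driver.
# Each stub stands in for a Lambda function invoked by Step Functions.
from collections import Counter

def extract_skills(job_descriptions):
    # Placeholder for the controlled Bedrock extraction; here we just
    # split comma-separated skills for illustration.
    return [s.strip() for jd in job_descriptions for s in jd.split(",")]

def aggregate_demand(skills):
    # Weekly demand signal: how often each skill appears across postings.
    return Counter(skills)

def compare_to_resume(demand, resume_skills):
    # Skill gap: skills the market asks for that the resume lacks.
    return {skill: count for skill, count in demand.items()
            if skill not in resume_skills}

def run_weekly_pipeline(job_descriptions, resume_skills):
    skills = extract_skills(job_descriptions)
    demand = aggregate_demand(skills)
    return compare_to_resume(demand, resume_skills)
```

In the actual pipeline these handoffs happen through Step Functions state transitions rather than direct function calls, which keeps each stage independently testable and retryable.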