This technical appendix describes how I designed the AWS pipeline behind the Job Market Skill Tracker. The goal of the project was to build a repeatable workflow that reads real data scientist job descriptions, extracts skills in a controlled way, compares them against my resume, and produces a weekly snapshot of market demand and skill gaps. The sections below focus on how the pipeline is structured, why I chose this setup, and what technical challenges came up along the way.

💡 The full implementation and code are available in the GitHub repository, and the project story can be found here.

Pipeline Architecture

This pipeline reads weekly job descriptions from S3, extracts and normalizes skills, aggregates weekly demand signals, compares them against my resume, routes the results through a human review step, and writes the final outputs for later analysis. The overall design prioritized auditability, modularity, and decision-useful outputs over full automation.
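
As a rough sketch of that flow (the function names, state dict, and sample data are illustrative stand-ins, not the actual Lambda handlers), each stage can be thought of as consuming and extending a shared pipeline state:

```python
from functools import reduce

# Illustrative stand-ins for the pipeline stages; in the real system each
# step runs as a separate Lambda coordinated by Step Functions.
def ingest(state):
    # Real pipeline: read the week's job descriptions from S3.
    return {**state, "jobs": ["jd-1", "jd-2"]}

def extract(state):
    # Real pipeline: controlled skill extraction via Bedrock.
    return {**state, "skills": {"jd-1": ["python", "sql"], "jd-2": ["sql"]}}

def aggregate(state):
    # Count how many postings mention each skill this week.
    counts = {}
    for skills in state["skills"].values():
        for s in skills:
            counts[s] = counts.get(s, 0) + 1
    return {**state, "demand": counts}

def compare(state):
    # Real pipeline: compare weekly demand against resume skills.
    resume = {"python"}
    return {**state, "gaps": sorted(set(state["demand"]) - resume)}

STAGES = [ingest, extract, aggregate, compare]
result = reduce(lambda state, stage: stage(state), STAGES, {})
print(result["demand"])  # {'python': 1, 'sql': 2}
print(result["gaps"])    # ['sql']
```

The point of the chained shape is that each stage is independently testable and replaceable, which mirrors why the real pipeline favors modularity over a single monolithic job.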

The main AWS services used in this workflow are summarized below.

AWS components used

S3: Stores raw job descriptions, the resume, and weekly recommendation outputs
Step Functions: Orchestrates the pipeline and coordinates each stage
Lambda: Runs the extraction, aggregation, comparison, review, and output steps
Bedrock: Handles controlled skill extraction and comparison
DynamoDB: Stores weekly skill snapshots for downstream analysis
SNS: Sends notifications for human review
API Gateway: Connects the pipeline to the human review interface
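
To make the extraction and aggregation roles concrete: skill extraction only produces useful weekly counts if surface variants ("sklearn" vs "scikit-learn") are normalized to one canonical name first. A minimal sketch, with a hypothetical alias map (the real normalization rules live inside the controlled extraction step):

```python
from collections import Counter

# Hypothetical alias map; illustrative only.
ALIASES = {
    "sklearn": "scikit-learn",
    "ml": "machine learning",
    "amazon web services": "aws",
}

def normalize(skill: str) -> str:
    """Map a raw skill string to its canonical lowercase name."""
    key = skill.strip().lower()
    return ALIASES.get(key, key)

def weekly_demand(jobs: list[list[str]]) -> Counter:
    """Count how many postings mention each normalized skill (at most once per posting)."""
    counts = Counter()
    for skills in jobs:
        counts.update({normalize(s) for s in skills})
    return counts

demand = weekly_demand([["Python", "sklearn"], ["python", "ML"], ["scikit-learn"]])
print(demand["python"], demand["scikit-learn"], demand["machine learning"])  # 2 2 1
```

Counting each skill at most once per posting keeps the demand signal a "share of postings" measure rather than rewarding repetitive job descriptions.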

At a high level, the pipeline flow is shown below.

[Architecture diagram: end-to-end pipeline flow from S3 ingestion through extraction, aggregation, comparison, human review, and final output]

The workflow can be understood as 7 main steps, from data ingestion to final output generation.
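
One of those steps, the resume comparison, is essentially a set difference weighted by demand. A minimal sketch (the skill lists, counts, and the cutoff threshold are illustrative; the real pipeline reads these inputs from DynamoDB and S3):

```python
# Illustrative weekly demand counts and resume skills.
weekly_counts = {"sql": 14, "python": 12, "airflow": 6, "spark": 5}
resume_skills = {"python", "sql"}

MIN_MENTIONS = 5  # hypothetical cutoff for "worth flagging"

def skill_gaps(demand: dict, resume: set, min_mentions: int) -> list:
    """Skills the market asks for that are missing from the resume, ranked by demand."""
    gaps = [(s, n) for s, n in demand.items()
            if s not in resume and n >= min_mentions]
    return sorted(gaps, key=lambda kv: -kv[1])

print(skill_gaps(weekly_counts, resume_skills, MIN_MENTIONS))
# [('airflow', 6), ('spark', 5)]
```

Ranking gaps by demand is what makes the output decision-useful: the weekly snapshot surfaces the highest-leverage missing skills first rather than a flat list.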


Technical Challenges and Lessons

Q1: What did I learn about working with LLMs?