1. Problem Understanding
We set out to build a system that matches resumes to job listings efficiently, balancing rule-based filtering (hard constraints) with semantic similarity (embeddings). The goal was to automate recommendations while ensuring quality and explainability.
Initial Idea (Evolution)
- Started with the thought: "Let's parse resumes and listings, then run LLM comparisons."
- Early iterations used LLM calls for everything: very slow, costly, and with no simple way to derive a deterministic scoring mechanism.
- Quickly settled on: embeddings + structured information extraction (IE) + hard rule-based filtering make the comparisons far more deterministic than relying purely on LLM-based outputs.
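The hybrid idea above can be sketched as hard filters gating a semantic score. This is a minimal illustration, not the actual implementation; the field names (`min_years`, `required_skills`, etc.) are assumptions:

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical minimal structures; field names are illustrative assumptions.
@dataclass
class Job:
    min_years: int
    required_skills: set
    embedding: list

@dataclass
class Resume:
    years: int
    skills: set
    embedding: list

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def passes_hard_filters(resume, job):
    # Hard constraints: deterministic, explainable pass/fail decisions.
    return resume.years >= job.min_years and job.required_skills <= resume.skills

def match_score(resume, job):
    # Rule-based filter first; embeddings only break ties among eligible pairs.
    if not passes_hard_filters(resume, job):
        return 0.0
    return cosine(resume.embedding, job.embedding)
```

The key design choice is that hard constraints short-circuit the pipeline, so the (cheaper to explain) rule layer rejects candidates before any semantic scoring runs.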
Final Approach
- Built a modular pipeline to convert job listings into structured dictionaries.
- Built a parallel resume pipeline: parsing → structured extraction → enrichment → embeddings → smart filters → ranking.
- Focused on efficiency: precomputed job embeddings, single-pass resume enrichment.
- Added an evaluation layer + proxy metrics for validation.
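The precomputation point above can be sketched as follows. Assuming job embeddings are already L2-normalized and cached offline, ranking a resume reduces to a single pass of dot products (a sketch, not the actual implementation):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rank_jobs(resume_embedding, job_embeddings, top_k=5):
    """Rank precomputed job embeddings against one resume embedding.

    job_embeddings: {job_id: normalized vector}, computed once offline,
    so each incoming resume costs a single linear scan.
    """
    scored = sorted(
        ((job_id, dot(resume_embedding, emb))
         for job_id, emb in job_embeddings.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]
```

Example usage: `rank_jobs([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]}, top_k=1)` returns the closest job ID with its similarity score.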
Current Limitations
- No fine-tuning or ground-truth dataset yet.
- Heavy reliance on proxy metrics.
- Some fields (e.g., education, certifications, skills) are still handled in overly simplified ways.
- Final score weighting needs refinement. Currently, the scoring mechanism rarely lets the score exceed 0.6 unless every component is a perfect match (which will happen very rarely).
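To illustrate why the score rarely clears 0.6, here is a minimal weighted-sum sketch; the weights and component names are assumptions, not the actual configuration:

```python
# Hypothetical component weights (assumed, not the real configuration).
WEIGHTS = {"skills": 0.4, "experience": 0.3, "semantic": 0.3}

def final_score(components):
    # Each component is in [0, 1]. The total reaches 1.0 only when every
    # component is a perfect match, so a strong-but-imperfect candidate
    # (e.g. 0.7 / 0.5 / 0.6) lands at roughly 0.61.
    return sum(WEIGHTS[k] * components.get(k, 0.0) for k in WEIGHTS)
```

Without renormalization (e.g. rescaling against an empirical maximum per job), realistic component values compress the final score into a narrow band well below 1.0.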
2. Data Insights
Two main data sources:
- Resumes: PDF and DOCX files, parsed and then enriched via LLM-based extraction