Strategic Credit Risk & Profitability Analysis: A 2.26M-Record Study

Tech Stack: Python (Pandas), Data Engineering, Financial Modelling, Tableau

Dataset: 2.26 Million Loan Records (Lending Club)

🎯 Executive Summary

This project evaluates the risk-return profile of a $30B+ loan portfolio. I built a data pipeline that processes 2.4GB of raw data to analyze how credit grades relate to actual financial performance.

Key Discovery: I identified a critical pricing failure in the high-risk segments (Grades C-G), where the default rate significantly outpaced the interest premium, resulting in an expected loss of up to 12.5% per dollar lent in the riskiest tier.

https://github.com/ZIXUANZHAO1998/Credit-Risk-Profitability-Study

🛠 Technical Workflow

1. Data Engineering & "Slimming"

The Challenge: The raw CSV contained 145 columns and occupied 2.4GB, causing memory overflows and slow processing.
The Solution: Implemented a "Data Slimming" strategy in Python that keeps only 10 mission-critical financial attributes.
Result: Reduced the memory footprint by 90%+ and cut data loading time from 3 minutes to 5 seconds.
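A minimal sketch of the slimming step, assuming pandas and a Lending Club-style CSV. The ten names in KEEP_COLS are a hypothetical selection (the project does not list its attributes), and slim_load and memory_mb are illustrative helper names. The savings come from two standard pandas options: usecols, which discards unneeded columns during parsing, and the category dtype for repetitive text fields.

```python
import io
from typing import IO, Union

import pandas as pd

# Hypothetical choice of 10 columns; the project does not publish its exact list.
KEEP_COLS = [
    "loan_amnt", "funded_amnt", "term", "int_rate", "installment",
    "grade", "sub_grade", "annual_inc", "issue_d", "loan_status",
]

# Repetitive text columns stored as "category" instead of Python strings.
CATEGORY_COLS = {
    "term": "category",
    "grade": "category",
    "sub_grade": "category",
    "loan_status": "category",
}


def slim_load(source: Union[str, IO[str]]) -> pd.DataFrame:
    """Read only KEEP_COLS from a Lending Club-style CSV path or file object."""
    return pd.read_csv(
        source,
        usecols=KEEP_COLS,   # columns not listed here are dropped while parsing
        dtype=CATEGORY_COLS,
        low_memory=False,    # parse in one pass to avoid mixed-type chunks
    )


def memory_mb(df: pd.DataFrame) -> float:
    """Deep memory usage in MB, counting the contents of string columns."""
    return df.memory_usage(deep=True).sum() / 1024**2


# Tiny illustrative sample standing in for the 2.4GB file (made-up rows).
SAMPLE_CSV = io.StringIO(
    "id,loan_amnt,funded_amnt,term,int_rate,installment,grade,sub_grade,"
    "annual_inc,issue_d,loan_status,desc\n"
    "1,10000,10000, 36 months, 13.56%,339.6,C,C1,55000,Dec-2015,Fully Paid,car\n"
    "2,20000,20000, 60 months, 25.00%,587.0,G,G2,40000,Dec-2015,Charged Off,debt\n"
)
slim = slim_load(SAMPLE_CSV)
print(slim.dtypes)
print(f"{memory_mb(slim):.4f} MB")
```

Calling memory_mb on a full and a slimmed frame gives a before/after comparison; the actual savings depend on which columns are kept. Persisting the slimmed frame (for example with DataFrame.to_parquet, which needs pyarrow or fastparquet) is one common way to get fast reloads, though the project does not say how its 5-second load time was achieved.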

2. Financial Logic Cleaning

Feature Engineering: Converted int_rate from strings (e.g., "13.56%") to numeric floats so it can be used in calculations.
Data Validation: Ran consistency checks after slimming to confirm that aggregate metrics (mean interest rate, total loan volume) matched the original dataset exactly.
Default Definition: Derived a binary is_bad indicator that flags Non-Performing Loans (NPLs), including "Charged Off", "Default", and "Late (31-120 days)".
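The three cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's code: clean_int_rate, add_features and check_consistency are hypothetical helper names, the 1e-9 tolerance is an assumed choice, and BAD_STATUSES mirrors the three statuses listed above (so "Late (16-30 days)" is not counted as bad).

```python
import pandas as pd

# Statuses counted as non-performing, as listed in the project description.
BAD_STATUSES = {"Charged Off", "Default", "Late (31-120 days)"}


def clean_int_rate(rates: pd.Series) -> pd.Series:
    """Convert strings such as ' 13.56%' to floats (13.56); bad values become NaN."""
    text = rates.astype(str).str.strip().str.rstrip("%")
    return pd.to_numeric(text, errors="coerce").astype(float)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a numeric int_rate and a 0/1 is_bad column."""
    out = df.copy()
    out["int_rate"] = clean_int_rate(out["int_rate"])
    out["is_bad"] = out["loan_status"].isin(BAD_STATUSES).astype(int)
    return out


def check_consistency(full: pd.DataFrame, slim: pd.DataFrame) -> None:
    """Raise if mean interest rate or total loan volume differ after slimming."""
    full_rate = clean_int_rate(full["int_rate"]).mean()
    slim_rate = slim["int_rate"].mean()
    if abs(full_rate - slim_rate) > 1e-9:
        raise ValueError("mean int_rate changed during slimming")
    if full["loan_amnt"].sum() != slim["loan_amnt"].sum():
        raise ValueError("total loan volume changed during slimming")


# Illustrative rows only; "desc" stands in for a column dropped by slimming.
raw = pd.DataFrame({
    "loan_amnt": [10000, 20000, 15000, 30000],
    "int_rate": [" 13.56%", "25.00%", "7.50%", "18.00%"],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Late (31-120 days)"],
    "desc": ["car", "debt", "home", "card"],
})
clean = add_features(raw[["loan_amnt", "int_rate", "loan_status"]])
check_consistency(raw, clean)
print(clean)
```

Because isin matches exact strings, any status variants in the raw file (the Lending Club data also contains labels such as "Does not meet the credit policy. Status:Charged Off") would have to be added to BAD_STATUSES explicitly if they should count as defaults.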

📊 Key Insights & Visualization

1. Risk-Return Divergence Analysis

Figure 1: Interest Rate vs. Default Rate: The Widening Risk Gap
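The comparison in Figure 1 can be approximated with a per-grade summary table. The sketch assumes a cleaned frame with grade, int_rate (in percent) and is_bad columns; grade_risk_table and the risk_gap column are hypothetical. Comparing an average interest rate with a lifetime default share is a crude proxy that ignores loan term, recoveries and principal already repaid, so this is not the project's expected-loss calculation and will not reproduce its 12.5% figure.

```python
import pandas as pd


def grade_risk_table(loans: pd.DataFrame) -> pd.DataFrame:
    """Per grade: loan count, mean interest rate (%), default rate (%) and their gap."""
    table = loans.groupby("grade", observed=True).agg(
        loans=("is_bad", "count"),
        avg_int_rate=("int_rate", "mean"),
        default_rate=("is_bad", "mean"),
    )
    # Share of bad loans -> percent, so both rates use the same unit.
    table["default_rate"] = table["default_rate"] * 100
    # Hypothetical "risk gap": positive when defaults outrun the average rate.
    table["risk_gap"] = table["default_rate"] - table["avg_int_rate"]
    return table.sort_index()


# Illustrative data only: four grade-A and four grade-G loans.
loans = pd.DataFrame({
    "grade": ["A"] * 4 + ["G"] * 4,
    "int_rate": [7.0] * 4 + [28.0] * 4,
    "is_bad": [0, 0, 0, 0, 1, 1, 0, 0],
})
risk = grade_risk_table(loans)
print(risk)
```

Plotting avg_int_rate and default_rate against the grade index (in Tableau or any plotting library) gives a chart in the spirit of Figure 1, with a rising risk_gap marking grades where defaults outpace pricing.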

Why does the portfolio lose money in its high-risk (lower-grade) segments?