Tech Stack: Python (Pandas), Data Engineering, Financial Modelling, Tableau
Dataset: 2.26 Million Loan Records (Lending Club)
This project evaluates the risk-return profile of a $30B+ loan portfolio. By processing 2.4GB of raw data, I built a data pipeline to analyze the correlation between credit grades and actual financial performance. Key Discovery: Identified a critical pricing failure in high-risk segments (Grades C-G), where the Default Rate significantly outpaced the Interest Premium, resulting in an expected loss of up to 12.5% per dollar for the riskiest tier.
https://github.com/ZIXUANZHAO1998/Credit-Risk-Profitability-Study
int_rate from string (e.g., "13.56%") to numeric floats for mathematical modeling.is_bad indicator to capture Non-Performing Loans (NPL), including "Charged Off", "Default", and "Late (31-120 days)".
Figure 1: Interest Rate vs. Default Rate: The Widening Risk Gap
Why does the portfolio lose money in high-grade segments?