📌 Project Objective

The goal of this project is to analyze and predict California housing prices using the California Housing Dataset. The objective is to understand the key factors affecting house prices and to build a predictive model that performs well on held-out test data.


📊 Exploratory Data Analysis (EDA)

Dataset Summary

| # | Column | Non-Null Count | Dtype | Note |
|---|--------------------|--------|---------|------|
| 0 | longitude | 20,640 | float64 | |
| 1 | latitude | 20,640 | float64 | |
| 2 | housing_median_age | 20,640 | float64 | |
| 3 | total_rooms | 20,640 | float64 | |
| 4 | total_bedrooms | 20,433 | float64 | 207 missing values (20,640 - 20,433) |
| 5 | population | 20,640 | float64 | |
| 6 | households | 20,640 | float64 | |
| 7 | median_income | 20,640 | float64 | |
| 8 | median_house_value | 20,640 | float64 | |
| 9 | ocean_proximity | 20,640 | object | Categorical feature |
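
A summary like the one above can be reproduced with pandas. The snippet below is a minimal sketch, not the project's actual code: the helper name `summarize_columns` and the four-row stand-in frame are assumptions made for illustration, and the real dataset would be loaded from the project's own data file instead.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return the non-null count, missing count, and dtype of each column."""
    return pd.DataFrame({
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "dtype": df.dtypes.astype(str),
    })


# Hypothetical stand-in rows; replace with the real housing DataFrame.
demo = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
})

summary = summarize_columns(demo)
print(summary)
```

Applied to the full dataset, the `missing` column is where the gap in total_bedrooms would show up.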

Histogram Analysis

[Figure: Histogram Plots.png]

Train-Test Splitting Strategy

To ensure that the model is trained and tested on data reflecting the income distribution of the entire dataset, we applied stratified sampling based on median_income. Because median_income is a continuous variable, stratifying on it means first grouping its values into income categories.

This keeps the proportion of each income category nearly the same in the training and test sets as in the full dataset, so the test-set error is a more reliable estimate of how well the model generalizes.
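
One way to carry out such a split is sketched below with scikit-learn's `train_test_split` and its `stratify` argument. The bin edges, the 20% test size, the helper column name `income_cat`, and the synthetic income values are all assumptions made for illustration; the project's actual settings may differ.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, right-skewed stand-in for the median_income column (assumption).
rng = np.random.default_rng(42)
demo = pd.DataFrame(
    {"median_income": rng.lognormal(mean=1.2, sigma=0.5, size=2000)}
)

# Bin the continuous income into five categories (bin edges are an assumption).
demo["income_cat"] = pd.cut(
    demo["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

# Stratify on the category so each split keeps the overall proportions.
train_set, test_set = train_test_split(
    demo, test_size=0.2, stratify=demo["income_cat"], random_state=42
)

# Compare category proportions in the full data and in the test set.
overall = demo["income_cat"].value_counts(normalize=True).sort_index()
test_props = test_set["income_cat"].value_counts(normalize=True).sort_index()
max_gap = float((overall - test_props).abs().max())
print(pd.DataFrame({"overall": overall, "test": test_props}))
print("largest proportion gap:", max_gap)
```

The helper column exists only to drive the split, so it would normally be dropped from both sets before training.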