World Life Expectancy Analysis (MySQL)

📌 Project Overview

This project focuses on cleaning and analyzing global life expectancy data using MySQL. The dataset contains health, economic, and demographic indicators across countries and years, with several real-world data quality issues such as missing values, inconsistencies, and duplicates.

The goals of this project are to:

Prepare a reliable, analysis-ready dataset through structured SQL data cleaning.

Extract meaningful insights related to life expectancy, economic factors, and health indicators.

All logic is documented with inline SQL comments, making the project easy to follow and reproduce.

🧱 Dataset Description

The dataset includes country-level data across multiple years with the following key attributes:

Demographics: Country, Year, Status (Developed / Developing)

Health Indicators: Life Expectancy, Adult Mortality, Infant Deaths, HIV/AIDS, BMI

Immunization: Polio, Diphtheria

Economic Factors: GDP, Percentage Expenditure

Education & Nutrition: Schooling, Thinness (1–19 & 5–9 years)

Example records include countries such as Afghanistan, Albania, Algeria, Australia, Argentina, and many more, spanning multiple years.
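
As a rough sketch, the attributes above could map to a MySQL table like the one below. The table name, column names, and data types are assumptions made for illustration, and the actual schema in this project may differ.

```sql
-- Hypothetical schema sketch: names and types are assumptions,
-- not the project's actual table definition.
CREATE TABLE world_life_expectancy_sketch (
    Row_ID                 INT PRIMARY KEY,        -- assumed surrogate key
    Country                VARCHAR(100) NOT NULL,
    `Year`                 INT NOT NULL,
    Status                 VARCHAR(20),            -- 'Developed' or 'Developing'
    Life_expectancy        DECIMAL(5,1),
    Adult_mortality        INT,
    Infant_deaths          INT,
    HIV_AIDS               DECIMAL(5,1),
    BMI                    DECIMAL(5,1),
    Polio                  INT,
    Diphtheria             INT,
    GDP                    DECIMAL(15,2),
    Percentage_expenditure DECIMAL(15,2),
    Schooling              DECIMAL(4,1),
    Thinness_1_19_years    DECIMAL(4,1),
    Thinness_5_9_years     DECIMAL(4,1)
);
```

Leaving the indicator columns nullable in this sketch reflects the missing values mentioned in the overview. A row is expected to be identified by its Country and Year combination.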

🧹 Part 1: Data Cleaning (SQL)

The cleaning phase focuses on making the dataset accurate, consistent, and trustworthy for analysis.

Key Cleaning Steps:

Check for duplicates using the COUNT and CONCAT functions. If any are found, delete them and apply the change to the database:

[Screenshot: duplicate check query]
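
The duplicate check described above can be sketched as follows. This is a minimal example, not necessarily the exact query used in the project: the table life_expectancy_demo, the Row_ID column, and the sample rows are all made up for illustration, and the delete step assumes MySQL 8.0 or later because it relies on the ROW_NUMBER() window function.

```sql
-- Hypothetical demo table with made-up rows; the real table is much wider.
CREATE TABLE life_expectancy_demo (
    Row_ID          INT PRIMARY KEY,
    Country         VARCHAR(100),
    `Year`          INT,
    Life_expectancy DECIMAL(5,1)
);

INSERT INTO life_expectancy_demo (Row_ID, Country, `Year`, Life_expectancy)
VALUES
    (1, 'Afghanistan', 2015, 60.0),
    (2, 'Afghanistan', 2015, 60.0),   -- deliberate duplicate of Row_ID 1
    (3, 'Albania',     2015, 70.0);

-- Step 1: build a Country-Year key with CONCAT and count how often each
-- key occurs. Any key with a count above 1 is duplicated.
SELECT CONCAT(Country, '-', `Year`) AS country_year,
       COUNT(*)                     AS occurrences
FROM life_expectancy_demo
GROUP BY CONCAT(Country, '-', `Year`)
HAVING COUNT(*) > 1;

-- Step 2: number the rows within each Country-Year group and delete every
-- row after the first. The inner SELECT is wrapped in a derived table
-- because MySQL does not allow a DELETE to read directly from the table
-- it is modifying.
DELETE FROM life_expectancy_demo
WHERE Row_ID IN (
    SELECT Row_ID
    FROM (
        SELECT Row_ID,
               ROW_NUMBER() OVER (
                   PARTITION BY Country, `Year`
                   ORDER BY Row_ID
               ) AS row_num
        FROM life_expectancy_demo
    ) AS ranked
    WHERE row_num > 1
);
```

Re-running the Step 1 query after the delete should return no rows. The '-' separator in CONCAT is a small design choice that keeps the combined key unambiguous and easier to read.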