By predicting diabetes status (0 = non-diabetic, 1 = diabetic) from a combination of demographic, lifestyle, and health-related features, we can identify individuals at higher risk earlier and enable proactive healthcare interventions. The dataset includes variables such as age, gender, BMI, hypertension, heart disease, and smoking history, alongside biomarkers like HbA1c and blood glucose levels.

The target variable is the binary column diabetes, which indicates whether a person has been diagnosed with diabetes (1) or not (0).
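To make the prediction task concrete, here is a minimal sketch of a binary classifier for the 0/1 target. It is not the project's actual pipeline. The records are synthetic, the labelling rule is invented for illustration, and only three of the features mentioned above (HbA1c, blood glucose, BMI) are used. All function names are hypothetical. It uses only the Python standard library; a real analysis would more likely rely on a library such as scikit-learn.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_toy_records(n=300, seed=0):
    """Generate SYNTHETIC (features, label) pairs; not the real dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        hba1c = rng.gauss(5.6, 1.0)
        glucose = rng.gauss(140.0, 40.0)
        bmi = rng.gauss(27.0, 5.0)
        # Invented labelling rule, used only so the toy data has some signal.
        score = (hba1c - 5.6) + (glucose - 140.0) / 40.0 + rng.gauss(0.0, 0.5)
        X.append([hba1c, glucose, bmi])
        y.append(1 if score > 1.0 else 0)  # 1 = diabetic, 0 = non-diabetic
    return X, y


def standardise(X):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def predict_proba(w, b, x):
    """Predicted probability that the label is 1 for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression with full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y, threshold=0.5):
    """Share of records whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, xi) >= threshold) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(y)
```

Calling `fit_logistic(standardise(X), y)` on the output of `make_toy_records()` returns one weight per feature plus an intercept. Because the accuracy here is measured on the training data and the labels are synthetic, it says nothing about performance on the real dataset.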

This project supports SDG 3 (Good Health and Well-being) by promoting early detection and personalised prevention strategies, and SDG 10 (Reduced Inequalities) by showing how data-driven approaches can help make healthcare more accessible and equitable across populations.

Impact: By modelling patterns of diabetes risk, healthcare providers can improve patient outcomes through targeted interventions, encourage lifestyle changes before severe complications occur, and reduce the long-term economic and societal burden of diabetes.

GitHub repository

Dataset info & index

Stakeholder report | Machine Learning predicting diabetes risk from demographic and health factors

Milestone statements:

Impact Statement

Alignment Statement

Data quality risk reflection