After looking at regression (simple linear regression and Ridge regression), the next step is binary classification.

It is typically one of the main uses of supervised ML on tabular data.

There are many classifiers, the simplest being logistic regression.

In chapter 1 we saw the spam detection use case, which is a classification problem.

In this chapter, the project is churn prediction.

We will use a Kaggle dataset again:

Telco Customer Churn

1) What is churn?

Let's take the example of a telecom company.

It wants to know which customers plan to leave, that is, to stop using its services, for example to move to a competitor.

The company has information about its customers. From this information, it assigns a score to every customer and identifies those who are most likely to leave.

Based on the score assigned by the model, it then decides whether to make an offer to the customer.
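
To make the "score, then decision" idea concrete, here is a minimal sketch. The customer ids, the scores, and the 0.5 cut-off are all made up for illustration; in practice the scores would come from a trained model and the cut-off would depend on what the offer costs.

```python
# Hypothetical churn scores (values between 0 and 1) produced by some model.
# These numbers are invented for illustration only.
scores = {"customer_a": 0.91, "customer_b": 0.12, "customer_c": 0.55}

# Assumed cut-off: customers at or above it are considered at risk.
THRESHOLD = 0.5


def customers_to_contact(scores, threshold=THRESHOLD):
    """Return the ids of customers at or above the threshold, highest score first."""
    at_risk = [(cid, s) for cid, s in scores.items() if s >= threshold]
    at_risk.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in at_risk]


print(customers_to_contact(scores))
```

With these made-up scores, only `customer_a` and `customer_c` would receive an offer.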

2) How do we approach the problem?

Binary classification fits this kind of use case because the problem has a binary outcome: the customer either churns or does not.

As with the car price prediction project, the question is how to express this in mathematical terms.

How can we approximate y as closely as possible, when y is simply YES or NO?
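
One common way to write this down is sketched below. The notation is an assumption, not taken from the text above: YES is encoded as 1 and NO as 0, x_i is the feature vector of customer i, and g is the model.

```latex
y_i \in \{0, 1\}, \qquad g(x_i) \approx y_i, \qquad g(x_i) \in [0, 1]
```

Under this reading, g(x_i) can be interpreted as the estimated probability that customer i churns, which is the score mentioned earlier.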

The telco company can build a feature matrix X describing its customers, and query its historical data to find out which customers left and what patterns can be found there.
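
As a rough sketch of what building X and y could look like, the snippet below uses a handful of invented records. The field names (`tenure`, `monthly_charges`, `churn`) and all values are assumptions made for illustration, not rows from the actual dataset.

```python
# Toy historical records; every value here is made up for illustration.
history = [
    {"tenure": 1, "monthly_charges": 70.0, "churn": "Yes"},
    {"tenure": 34, "monthly_charges": 56.9, "churn": "No"},
    {"tenure": 2, "monthly_charges": 53.8, "churn": "Yes"},
    {"tenure": 45, "monthly_charges": 42.3, "churn": "No"},
]

# Features we assume are available for each customer.
feature_names = ["tenure", "monthly_charges"]

# X: one row per customer, one column per feature.
X = [[row[name] for name in feature_names] for row in history]

# y: 1 if the customer left (YES), 0 otherwise (NO).
y = [1 if row["churn"] == "Yes" else 0 for row in history]

print(X)
print(y)
```

Each row of X lines up with one entry of y, so a classifier can learn which feature patterns tend to go with customers who left.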