<aside> ⚠️ This note serves as a reminder of the book's content, including additional research on the mentioned topics. It is not a substitute for the book. Most images are sourced from the book or referenced.

</aside>

<aside> 🚨 I've noticed that taking notes on this site while reading the book significantly extends the time it takes to finish it. I've stopped noting everything, as I did in previous chapters, and now keep reading while highlighting or hand-writing notes instead. I plan to return to the detailed style when I have more time.

</aside>

<aside> ✊ This book contains 1007 pages of readable content. If you read at a pace of 10 pages per day, it will take you approximately 3.3 months (without missing a day) to finish it. If you aim to complete it in 2 months, you'll need to read at least 17 pages per day.

</aside>

Information

List of notes for this book


<aside> 📔 Jupyter notebook for this chapter: on GitHub, on Colab, on Kaggle.

</aside>

Main steps

In this chapter you will work through an example project end to end.

Working with Real Data

In this chapter, we use the California Housing Prices dataset (also available for download from the author's repository).

Fig 2-1. California housing prices


This data includes metrics such as the population, median income, and median housing price for each block group (called “district” for short).

Look at the Big Picture

Your model should learn from this data → predict the median housing price in any district.
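
To make the task concrete, here is a minimal sketch of how each district becomes one training example: the median housing price is the label to predict, and the remaining metrics are the inputs. The column names (`population`, `median_income`, `median_house_value`) are my assumption about the dataset's CSV header, the helper `split_features_label` is hypothetical, and the two rows are made-up values for illustration only.

```python
LABEL = "median_house_value"  # assumed column name for the prediction target


def split_features_label(district):
    """Split one district record into (features, label)."""
    features = {name: value for name, value in district.items() if name != LABEL}
    return features, district[LABEL]


# Made-up rows for illustration only (not taken from the real dataset).
toy_districts = [
    {"population": 1000, "median_income": 3.5, "median_house_value": 200000},
    {"population": 2500, "median_income": 5.0, "median_house_value": 310000},
]

# X holds the inputs the model learns from, y holds the values it should predict.
X, y = zip(*(split_features_label(d) for d in toy_districts))
print(X[0])
print(y)
```

In a real project this split would be done on the full table (for example with pandas), but the idea is the same: one row per district, one label per row.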

<aside> ☝ You should pull out this ML project checklist (Appendix A in the book) for each project.

</aside>

Frame the Problem

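Ask questions (what is the business objective? what does the current solution look like?) to decide how to frame the problem and which methods to use.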

<aside> ☝ A sequence of data processing components is called a data pipeline. Each component can be handled by a separate team, and the whole process is robust.

</aside>
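
As a rough illustration of the pipeline idea, the sketch below chains two small components so that the output of one becomes the input of the next. Everything here is an assumption for illustration: the component names (`drop_missing`, `add_rooms_per_household`), the `run_pipeline` helper, and the toy rows are hypothetical. It also runs in memory and in sequence, whereas components in a real pipeline usually run independently and pass data through a data store.

```python
from functools import reduce


def drop_missing(rows):
    """Component 1: discard records that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_rooms_per_household(rows):
    """Component 2: derive a new field from two existing ones."""
    return [
        {**r, "rooms_per_household": r["total_rooms"] / r["households"]}
        for r in rows
    ]


def run_pipeline(rows, components):
    """Feed the output of each component into the next one, in order."""
    return reduce(lambda data, component: component(data), components, rows)


# Made-up rows for illustration only.
toy_rows = [
    {"total_rooms": 800, "households": 200},
    {"total_rooms": None, "households": 150},
]

result = run_pipeline(toy_rows, [drop_missing, add_rooms_per_household])
print(result)
```

Because each component only depends on the shape of its input, one can be replaced or fixed without touching the others, which is what lets separate teams own separate components.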