List of all notes for this book. IMPORTANT UPDATE November 18, 2024: I've stopped taking detailed notes from the book and now only highlight and annotate directly in the PDF files/book. With so many books to read, I don't have time to type everything. In the future, if I make notes while reading a book, they'll contain only the most notable points (for me).
<aside> Jupyter notebook for this chapter: on GitHub, on Colab, on Kaggle.
</aside>
In this chapter you will work through an example project end to end.
In this chapter, we use the California Housing Prices dataset (it can also be downloaded from the author's repository).
Fig 2-1. California housing prices
This data includes metrics such as the population, median income, and median housing price for each block group (called "district" for short).
Your model should learn from this data and then predict the median housing price in any district.
<aside> You should pull out the ML project checklist (Appendix A in the book) for each project.
</aside>
Ask questions to frame the problem and choose the methods.
Question: What exactly is the business objective? (Building a model isn't the final goal.) → Business objective: is it worth investing in a given area?
Fig 2-2. A machine learning pipeline for real estate investments
Question: What does the current solution look like (if any)? → It gives a reference for performance. → Prices are currently estimated manually by experts. → Their estimates were off by more than 30%.
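To make "off by more than 30%" concrete, here is a minimal sketch of how such a baseline error could be measured as an absolute percentage error. The function name and all prices below are hypothetical illustrations, not values from the dataset or the book.

```python
def percentage_errors(estimates, actuals):
    """Return the absolute percentage error of each estimate."""
    return [100 * abs(est - act) / act for est, act in zip(estimates, actuals)]


# Hypothetical expert estimates and true median prices (in dollars).
expert_estimates = [260_000, 150_000, 420_000]
actual_prices = [200_000, 220_000, 300_000]

errors = percentage_errors(expert_estimates, actual_prices)
mean_error = sum(errors) / len(errors)

print([round(e, 1) for e in errors])
print(round(mean_error, 1))
```

A model only needs to beat this kind of number to be useful, which is why the current solution serves as the performance reference.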
<aside> Pipeline: a sequence of data processing components is called a data pipeline. Each component is handled by a team. The whole process is robust.
</aside>
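The pipeline idea can be sketched as a list of small components, each taking the previous component's output. This is only a simplified, in-process illustration: the component names and the record fields (`population`, `total_income`) are hypothetical, and a real pipeline would usually pass data between components through a data store rather than through function calls.

```python
from functools import reduce


def drop_incomplete(rows):
    """Component 1: discard records that have a missing value."""
    return [r for r in rows if None not in r.values()]


def add_income_per_person(rows):
    """Component 2: derive a new field from existing ones."""
    return [
        {**r, "income_per_person": r["total_income"] / r["population"]}
        for r in rows
    ]


def run_pipeline(rows, components):
    """Feed the output of each component into the next one."""
    return reduce(lambda data, component: component(data), components, rows)


# Hypothetical raw records; the second one is incomplete.
raw = [
    {"population": 1000, "total_income": 50_000},
    {"population": None, "total_income": 40_000},
    {"population": 500, "total_income": 30_000},
]

processed = run_pipeline(raw, [drop_incomplete, add_income_per_person])
print(processed)
```

Because each component only depends on the shape of its input, one component can be changed or replaced without touching the others.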
Question: What kind of training supervision does the model need (supervised, unsupervised, semi-supervised, self-supervised, or reinforcement)? Is it a classification task, a regression task, or something else? Should it use batch learning or online learning?
<aside> If the data were huge → either split the batch learning work across multiple servers (using the MapReduce technique) or use online learning instead.
</aside>
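As a rough illustration of the map/reduce idea for batch learning, the sketch below fits a one-feature linear regression by summarizing each data shard separately (map) and then adding the summaries together (reduce). It runs in a single process and does not use any MapReduce framework. The function names and the toy data (which follow y = 2x + 1) are assumptions made for the example.

```python
from functools import reduce


def map_shard(shard):
    """Map step: summarize one shard of (x, y) pairs as sufficient statistics."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    """Reduce step: combine two summaries by adding them element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Least-squares slope and intercept from the combined statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data following y = 2x + 1, split into three shards
# (each shard stands in for the data held by one server).
shards = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7)],
    [(4, 9)],
]

total = reduce(reduce_stats, map(map_shard, shards))
slope, intercept = fit_line(total)
print(slope, intercept)
```

The result is the same as fitting on the full dataset at once, because the summaries are plain sums and sums can be computed shard by shard.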