How To Get Better Machine Learning Performance
Overview
This cheat sheet is designed to give you ideas to lift performance on your machine learning problem.
All it takes is one good idea to get a breakthrough.
Find that one idea, then come back and find another.
I have divided the list into 4 sub-topics:
- Improve Performance With Data.
- Improve Performance With Algorithms.
- Improve Performance With Algorithm Tuning.
- Improve Performance With Ensembles.
The gains often get smaller the further you go down the list.
For example, a new framing of your problem or more data is often going to give you more payoff than tuning the parameters of your best performing algorithm. Not always, but in general.
1. Improve Performance With Data
You can get big wins with changes to your training data and problem definition. Perhaps even the biggest wins.
Strategy: Create new and different perspectives on your data in order to best expose the structure of the underlying problem to the learning algorithms.
Data Tactics
- Get More Data. Can you get more or better quality data? Modern nonlinear machine learning techniques like deep learning continue to improve in performance with more data.
- Invent More Data. If you can’t get more data, can you generate new data? Perhaps you can augment or permute existing data or use a probabilistic model to generate new data.
- Clean Your Data. Can you improve the signal in your data? Perhaps there are missing or corrupt observations that can be fixed or removed, or outlier values outside of reasonable ranges that can be fixed or removed in order to lift the quality of your data.
- Resample Data. Can you resample data to change the size or distribution? Perhaps you can use a much smaller sample of data for your experiments to speed things up or over-sample or under-sample observations of a specific type to better represent them in your dataset.