Intro
Important Concepts
- ML Experiment:
- We are not talking about A/B Testing, but the process of building an ML model.
- Experiment Run:
- Each trial in an ML Experiment (including the model’s version, type, and hyperparameters).
- Run Artifact:
- Any file associated with an ML run.
- Experiment Metadata:
- All the information that is related to the experiment.
Experiment Tracking
- It’s the process of keeping track of all the relevant information from an ML experiment.
- It’s important because it enables:
- Reproducibility
- Organization
- Optimization
- Tracking experiments in spreadsheets may work at first, but it isn’t enough because of:
- Error-prone manual entry
- No standard format
- Poor visibility & collaboration
MLflow
Intro
- It’s the tool we’ll use for experiment tracking instead of spreadsheets.
- It’s an open-source platform for the ML lifecycle.
- It contains 4 main modules:
- Tracking
- Models
- Model Registry
- Projects
- The MLflow Tracking module allows you to keep track of:
- Parameters
- Includes any data relevant to the model experiment:
- Hyperparameters
- Data used
- Metrics
- Metadata
- Artifacts
- Models
- You may skip logging the model itself if you already track the hyperparameters needed to reproduce it for each trial.
- It also logs extra information for each run:
- Source Code (File name that was run)
- Code Version
- Start & End Time
- Name of the author
- To use the model registry feature in MLflow, we’ll need to connect it to an RDBMS:
- PostgreSQL
- MySQL
- SQLite
- MSSQL Server
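For local work, SQLite is the lightest of these options. A sketch of serving the tracking UI against it (the `.db` file name is an example; it’s created if missing):

```shell
# Serve the MLflow UI with SQLite as the backend store,
# which also enables the model registry
mlflow ui --backend-store-uri sqlite:///mlflow.db
```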
Installation
- Prefer to create a requirements.txt file including:
- mlflow
- jupyter
- scikit-learn
- pandas
- seaborn
- hyperopt
- xgboost
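With that file in place, the environment can be set up in one step (the virtual-environment name is arbitrary):

```shell
# Create an isolated environment and install the listed packages
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```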