Intro
Important Concepts
- ML Experiment:
- We are not talking about A/B testing here; an ML experiment is the whole process of building an ML model.
 
 
- Experiment Run:
- Each trial in an ML experiment (including the model's version, type, and hyperparameters).
 
 
- Run Artifact:
- Any file associated with an ML run.
 
 
- Experiment Metadata:
- All the information related to the experiment.
 
 
Experiment Tracking
- It’s the process of keeping track of all the relevant information from an ML experiment.
 
- It’s important because it enables:
- Reproducibility
 
- Organization
 
- Optimization
 
 
- Tracking experiments in spreadsheets may work at first, but it’s not enough because of:
- Errors (manual entry is error prone)
 
- No Standard Format
 
- Limited visibility & collaboration
 
 
MLflow
Intro
- MLflow is the tool we’ll use instead for experiment tracking.
 
- It’s an open-source platform for the ML lifecycle.
 
- It contains 4 main modules:
- Tracking
 
- Models
 
- Model Registry
 
- Projects
 
 
- The MLflow Tracking module allows you to keep track of:
- Parameters
- Includes any data relevant to the model experiment:
- Hyperparameters
 
- data used
 
 
 
- Metrics
 
- Metadata
 
- Artifacts
 
- Models
- You may skip saving the model itself if the logged hyperparameters are enough to reproduce each trial.
 
 
 
- It also logs extra information for each run:
- Source code (the name of the file that was run)
 
- Code Version
 
- Start & End Time
 
- Name of the author
 
 
- To use the model registry feature in MLflow, we’ll need to connect it to an RDBMS:
- PostgreSQL
 
- MySQL
 
- SQLite
 
- MSSQL Server
 
 
Installation
- It’s best to create a requirements.txt file including:
- mlflow
 
- jupyter
 
- scikit-learn
 
- pandas
 
- seaborn
 
- hyperopt
 
- xgboost
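Put together, the requirements.txt would look like the following (versions are left unpinned here; in practice you may want to pin them for reproducibility):

```text
mlflow
jupyter
scikit-learn
pandas
seaborn
hyperopt
xgboost
```

Install everything with `pip install -r requirements.txt`, ideally inside a fresh virtual environment.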