TensorFlow Data Validation (TFDV) detects data anomalies and schema anomalies. It is part of the TensorFlow Extended (TFX) platform and provides libraries for data validation and schema validation on large datasets in an ML pipeline. The key TFX libraries are TensorFlow Data Validation; TensorFlow Transform, for data preprocessing and feature engineering; TensorFlow Model Analysis, for ML model evaluation and analysis; and TensorFlow Serving, for serving ML models.
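To make "schema anomalies" concrete, here is a minimal sketch in plain Python. It is not the TFDV API: the `infer_schema` and `validate` helpers and the toy rows are hypothetical, written only to illustrate the idea of inferring a schema from training data and checking new data against it. In TFDV itself the analogous steps are handled by functions such as `tfdv.infer_schema` and `tfdv.validate_statistics`, which work on computed dataset statistics rather than raw rows.

```python
# Conceptual sketch only: this is NOT the TFDV API, just the underlying idea.

def infer_schema(rows):
    """Infer a toy schema: each feature's type name and, for strings, the values seen."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            entry = schema.setdefault(
                name, {"type": type(value).__name__, "domain": set()}
            )
            if isinstance(value, str):
                entry["domain"].add(value)
    return schema


def validate(rows, schema):
    """Return a sorted list of human-readable anomalies found in `rows`."""
    anomalies = set()
    for row in rows:
        # Features the schema expects but this row lacks.
        for name in schema.keys() - row.keys():
            anomalies.add(f"{name}: missing feature")
        for name, value in row.items():
            if name not in schema:
                anomalies.add(f"{name}: new feature not in schema")
            elif type(value).__name__ != schema[name]["type"]:
                expected = schema[name]["type"]
                anomalies.add(f"{name}: expected {expected}, got {type(value).__name__}")
            elif isinstance(value, str) and value not in schema[name]["domain"]:
                anomalies.add(f"{name}: unexpected value {value!r}")
    return sorted(anomalies)


# Hypothetical toy data.
train_rows = [
    {"age": 34, "payment_type": "cash"},
    {"age": 51, "payment_type": "card"},
]
serving_rows = [
    {"age": "unknown", "payment_type": "card"},  # wrong type for age
    {"age": 29, "payment_type": "crypto"},       # value outside the training domain
    {"payment_type": "cash"},                    # age is missing
]

schema = infer_schema(train_rows)
anomalies = validate(serving_rows, schema)
for line in anomalies:
    print(line)
```

The schema is inferred once from training data and then reused to check later batches, which is why drift between training and serving data shows up as anomalies.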

Review Questions

  1. D ✅
  2. B ❌ A
    1. training on hospital name → data leakage
  3. A ✅
  4. B ✅
  5. B ✅
  6. A ❌ D
    1. split into train / test before transforming the data (fit transforms on the training split only) to avoid data leakage
    2. Option A is also very important, though, so the question itself is arguably flawed
  7. C ✅
  8. A, C, D ❌ A, B, D
    1. All except option C are reasons for data leakage
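
The preprocessing-related leakage in question 6 comes down to which rows the transform's statistics are computed from. A minimal sketch, assuming simple standardization and a hypothetical toy dataset (the `fit_scaler` and `transform` helpers are illustrative names, not a library API): the split happens first, the scaler is fitted on the training rows only, and the same parameters are reused for the test rows.

```python
import statistics


def fit_scaler(values):
    """Compute standardization parameters (mean, population std) from `values`."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize `values` using previously fitted parameters."""
    return [(v - mean) / std for v in values]


# Hypothetical toy feature; the last two rows are unusually large.
data = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 40.0, 42.0]

# 1. Split FIRST (a real pipeline would usually shuffle or split by time).
train, test = data[:8], data[8:]

# 2. Fit the transform on the training split only.
mean, std = fit_scaler(train)

# 3. Apply the same fitted parameters to both splits.
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

# For comparison: fitting on ALL rows lets test-set information
# leak into the parameters used to prepare the training data.
leaky_mean, leaky_std = fit_scaler(data)

print(f"train-only mean: {mean:.2f}, leaky mean: {leaky_mean:.2f}")
```

The gap between the two means shows how much the held-out rows would have influenced preprocessing had the transform been fitted before the split.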