1. Data Loading and Column Selection
- 1.1 Load the data
- 1.2 Clean column names
- 1.3 Select relevant columns
2. Data Cleaning and Formatting
- 2.1 Identify missing values
- 2.2 Drop missing values
- 2.3 Data type conversions
- 2.4 Convert qualified_candidates and candidate_hired to numeric
- 2.5 Convert difficulty_level to numeric
- 2.6 Double-check the data types
3. Feature Engineering
- 3.1 Budget allocation percentages
- 3.2 Cost per application (CPA) per channel
- 3.3 Why CPA is used only for visualization later on
4. Exploratory Data Patterns
- 4.1 Cross-validation with StratifiedKFold
- 4.2 PCA 2D Visualization
- 4.3 K-Means Clustering
- 4.4 Cluster vs. Outcome Analysis