2024 Data Scientist Interview Questions & Answers

Machine Learning

What is the bias-variance tradeoff?
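One way to see the tradeoff is a small simulation. The sketch below assumes a synthetic target f(x) = sin(2πx) with Gaussian noise, which is an illustrative choice and not from any particular dataset. It refits polynomials of a given degree on many fresh training sets and estimates squared bias (how far the average prediction is from f) and variance (how much the predictions scatter across training sets).

```python
import numpy as np


def bias_variance(degree, n_trials=200, n_train=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit.

    Assumption for illustration: the true function is f(x) = sin(2*pi*x) and
    observations carry Gaussian noise with standard deviation `noise`.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0, 1, 50)
    f_test = np.sin(2 * np.pi * x_test)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set and refit the model
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_test) ** 2)  # error of the average model
    variance = np.mean(preds.var(axis=0))         # spread across training sets
    return bias_sq, variance


for d in (1, 3, 9):
    b, v = bias_variance(d)
    print(f"degree {d}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

Typically the degree-1 fit shows high bias and low variance and the degree-9 fit shows the reverse; exact numbers depend on the seed.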
How is KNN different from k-means?
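The key difference shows up in the inputs: KNN is a supervised method that needs labels and uses k as the number of neighbours to consult, whereas k-means is unsupervised and uses k as the number of clusters to find. A minimal KNN classifier sketch in NumPy; Euclidean distance and a plain majority vote are assumptions here (distance-weighted votes and other metrics are also common).

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Label each query point by majority vote of its k nearest labelled neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        dists = np.linalg.norm(X_train - q, axis=1)  # distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest
        votes = Counter(y_train[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[0.2, 0.3], [5.5, 5.2]]))
```

Note that `y_train` is required here; k-means would receive only `X_train` and would have to discover the two groups on its own.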
How would you implement the k-means algorithm?
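A common answer is Lloyd's algorithm: initialise k centroids, then alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the centroids stop changing. A minimal NumPy sketch; random initialisation from the data points is an assumption, and k-means++ seeding is the usual improvement.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centroids, labels = kmeans(X, k=2)
print(centroids, labels)
```

Because the result depends on the starting centroids, practical implementations usually run several initialisations and keep the one with the lowest within-cluster sum of squares.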
How do you choose the k in k-means clustering?
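One common heuristic is the elbow method: run k-means for a range of k, record the inertia (within-cluster sum of squared distances), and pick the k after which adding clusters stops reducing inertia much. The silhouette score and domain knowledge are common complements. The sketch below assumes a deterministic farthest-point seeding so the example is reproducible; that is an illustrative choice, not a requirement of the method.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding (assumed for reproducibility): start at X[0], then keep
    adding the point farthest from every centroid chosen so far."""
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    return np.array(centroids, dtype=float)


def kmeans_inertia(X, k, n_iter=100):
    """Run Lloyd's k-means and return the inertia (within-cluster sum of squares)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Inertia: squared distance from each point to its nearest centroid, summed
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


# Three tight, well-separated blobs of four points each
offsets = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([offsets + centre for centre in ([0, 0], [10, 0], [0, 10])])
inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Here the inertia falls sharply up to k = 3 and only slightly afterwards, so the elbow points to three clusters. On real data the elbow is often much less clear.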
What are the pros and cons of the k-means algorithm?
What does an ROC curve show?
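An ROC curve plots the true positive rate (recall) against the false positive rate as the classification threshold moves from strict to lenient. The area under it (AUC) equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as half. A minimal NumPy sketch that uses every distinct score as a threshold:

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, one point per distinct score used as a threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # Start above the highest score (nothing predicted positive), then loosen
    thresholds = np.concatenate([[np.inf], np.unique(scores)[::-1]])
    fpr, tpr = [], []
    for t in thresholds:
        pred = scores >= t
        tpr.append(np.sum(pred & (y_true == 1)) / n_pos)
        fpr.append(np.sum(pred & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc(fpr, tpr))  # 0.75
```

A diagonal curve (AUC 0.5) is what random scoring produces; the closer the curve hugs the top-left corner, the better the model ranks positives above negatives.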
What is the difference between a Type I and a Type II error?
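A Type I error is a false positive (rejecting a true null hypothesis, or predicting the positive class when the truth is negative); a Type II error is a false negative. Counting both from labelled predictions, assuming 0/1 labels with 1 as the positive class:

```python
def error_counts(y_true, y_pred):
    """Return (type_i, type_ii) = (false positives, false negatives) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    type_i = sum(t == 0 and p == 1 for t, p in pairs)   # predicted positive, actually negative
    type_ii = sum(t == 1 and p == 0 for t, p in pairs)  # predicted negative, actually positive
    return type_i, type_ii


print(error_counts([0, 0, 1, 1, 1], [1, 0, 0, 1, 0]))  # (1, 2)
```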
Define precision and recall.
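Precision is the fraction of predicted positives that are truly positive, TP / (TP + FP); recall (sensitivity) is the fraction of actual positives that the model finds, TP / (TP + FN). A direct translation, assuming 0/1 labels with 1 as the positive class:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    # Convention assumed here: report 0.0 when a denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0]))
```

With 2 true positives, 1 false positive, and 2 false negatives, precision is 2/3 and recall is 0.5.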
What is k-fold cross-validation?
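The data is shuffled and split into k folds; each fold serves once as the validation set while the model is trained on the other k - 1 folds, and the k validation scores are averaged. A dependency-free sketch; the mean-predictor "model" in `cross_val_mse` is a hypothetical stand-in for any real fit/predict step.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) for each of k folds over n shuffled samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def cross_val_mse(y, k=5):
    """Average validation MSE of a baseline that predicts the training-fold mean."""
    scores = []
    for train, val in k_fold_indices(len(y), k):
        mean = sum(y[i] for i in train) / len(train)
        scores.append(sum((y[i] - mean) ** 2 for i in val) / len(val))
    return sum(scores) / k


print(cross_val_mse([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], k=5))
```

Every sample is used for validation exactly once, which gives a less noisy estimate of generalisation error than a single train/validation split.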
Explain what a false positive and a false negative are. Give an example where false positives matter more than false negatives, and one where false negatives matter more than false positives.
When would you use random forests vs. SVMs, and why?
Why is dimensionality reduction important?
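One reason is the curse of dimensionality: as the number of features grows, distances between random points concentrate, so the nearest and farthest neighbours end up almost equally far away and distance-based methods such as KNN and k-means lose contrast. More features also mean more parameters to fit, more risk of overfitting, and more compute. A small NumPy demonstration, assuming points drawn uniformly from the unit hypercube:

```python
import numpy as np


def distance_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query to random points."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

The contrast shrinks steadily as `dim` grows, which is why reducing dimensionality often helps distance-based models.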
What is principal component analysis? What kinds of problems would you use PCA for?
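PCA finds orthogonal directions of maximal variance; projecting onto the first few gives a lower-dimensional representation. Typical uses include compressing many correlated features, visualising high-dimensional data in two or three dimensions, and decorrelating inputs before regression. A minimal sketch via the SVD of the centred data matrix, which is one standard way to compute PCA; standardising features first is often advisable but is omitted here.

```python
import numpy as np


def pca(X, n_components):
    """Return (scores, components, explained_variance_ratio) for the top components."""
    X = np.asarray(X, dtype=float)
    X_centred = X - X.mean(axis=0)           # PCA operates on centred data
    # Rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    var = S ** 2 / (len(X) - 1)              # variance along each direction
    ratio = var / var.sum()
    scores = X_centred @ components.T        # coordinates in the reduced space
    return scores, components, ratio[:n_components]


# Points lying exactly on the line y = 2x: one component captures all the variance
x = np.linspace(-1, 1, 50)
X = np.column_stack([x, 2 * x])
scores, components, ratio = pca(X, n_components=1)
print(components, ratio)
```

The single component points along (1, 2)/√5 (up to sign) and explains essentially 100% of the variance, so the 2-D data can be stored as one number per point without loss.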
What are the assumptions required for linear regression? What happens if some of them are violated?
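The usual assumptions are linearity in the parameters, independent errors, constant error variance (homoscedasticity), no perfect multicollinearity, and, for exact small-sample inference, normally distributed errors. Consequences depend on which assumption fails: nonlinearity biases the predictions, while heteroscedasticity or correlated errors leave the OLS coefficients unbiased but make the usual standard errors and p-values unreliable. A rough residual check in NumPy; the |residual|-versus-fitted correlation is an informal heuristic chosen for this sketch, not a formal test such as Breusch-Pagan.

```python
import numpy as np


def fit_ols(x, y):
    """Ordinary least squares with an intercept. Returns (beta, fitted, residuals)."""
    A = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return beta, fitted, y - fitted


def spread_trend(fitted, residuals):
    """Informal heteroscedasticity check: correlation of |residual| with fitted value.
    Near 0 is consistent with constant variance; clearly positive suggests the
    error spread grows with the prediction."""
    return float(np.corrcoef(fitted, np.abs(residuals))[0, 1])


rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 1000)
y_const = 3 + 2 * x + rng.normal(0, 1, x.size)  # constant error variance
y_grow = 3 + 2 * x + rng.normal(0, 0.5 * x)     # error sd grows with x
for y in (y_const, y_grow):
    beta, fitted, resid = fit_ols(x, y)
    print(beta.round(2), round(spread_trend(fitted, resid), 2))
```

Both fits recover coefficients near (3, 2), but only the second shows a strong trend in residual spread; common responses are heteroscedasticity-robust standard errors or transforming the target.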
What are some of the steps for data wrangling and data cleaning before applying machine learning algorithms?
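Typical steps include removing duplicates, fixing types and inconsistent labels, handling missing values, treating outliers, encoding categorical variables, and scaling numeric features. A minimal pandas sketch of a few of these; median imputation and an explicit "missing" category are illustrative choices, not the only reasonable ones, and the `age`/`city` columns are hypothetical.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, impute missing values, and one-hot encode categoricals."""
    df = df.drop_duplicates().copy()
    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.columns.difference(num_cols)
    # Numeric gaps: fill with the column median (less sensitive to outliers than the mean)
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    # Categorical gaps: keep them as an explicit "missing" level
    df[cat_cols] = df[cat_cols].fillna("missing")
    # One-hot encode so downstream models receive numeric input
    return pd.get_dummies(df, columns=list(cat_cols))


raw = pd.DataFrame({
    "age": [30, None, 30, 40],
    "city": ["A", "B", "A", None],
})
print(basic_clean(raw))
```

In a real pipeline, statistics such as the median should be computed on the training split only and then applied to validation and test data, to avoid leakage.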
What is multicollinearity and how do we deal with it?
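A standard diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. Values above roughly 5 to 10 are a common rule-of-thumb warning sign (a convention, not a hard cutoff). Typical remedies include dropping or combining correlated features, PCA, or regularisation such as ridge regression. A NumPy sketch on synthetic data where one feature is nearly the sum of two others:

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on all other columns plus an intercept
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return out


rng = np.random.default_rng(0)
x1, x2, x4 = rng.normal(size=(3, 500))
x3 = x1 + x2 + rng.normal(scale=0.1, size=500)  # nearly a linear combination
print(vif(np.column_stack([x1, x2, x3, x4])).round(1))
```

The first three columns get very large VIFs because each is almost determined by the other two, while the independent `x4` stays close to 1.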
You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 98%. Is this model ready to be used in production?
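Not necessarily. If positives are rare, accuracy can look excellent while the model misses every cancer case. A quick sanity check compares against the trivial majority-class predictor; the 2% prevalence below is a hypothetical figure chosen purely for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return tp / positives if positives else 0.0


# Hypothetical test set: 1,000 patients, 20 (2%) of whom have cancer
y_true = [1] * 20 + [0] * 980
y_pred = [0] * 1000  # a "model" that always predicts "no cancer"
print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

The do-nothing model also reaches 98% accuracy with zero recall, so a 98% figure says little on its own; recall, precision, the confusion matrix, and how representative the test data are of the deployment population all need checking.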
How would you evaluate an algorithm on imbalanced data?
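Accuracy is dominated by the majority class, so it is usual to report metrics that look at each class: precision, recall, F1, balanced accuracy, and PR-AUC or ROC-AUC, ideally with stratified splits so each fold keeps the class ratio. A dependency-free sketch of the threshold-based ones, assuming 1 marks the minority (positive) class; the predictions are hypothetical.

```python
def imbalanced_metrics(y_true, y_pred):
    """Metrics that stay informative when one class dominates (1 = minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Mean of per-class recall; 0.5 for a constant predictor
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical predictions: 10 positives, 90 negatives
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85
print(imbalanced_metrics(y_true, y_pred))
```

A constant "always negative" predictor on the same data scores 90% accuracy but only 0.5 balanced accuracy and an F1 of 0, which makes the failure visible.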