Linear regression

Classifying labels of unseen data -

build a model
model learns from the labelled data we pass to it
pass unlabelled data to the model as input
model predicts the labels of the unseen data

labelled data = training data
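
The four steps above can be sketched with scikit-learn roughly as follows. The toy arrays and the names X_labelled, y_labelled and X_unseen are made up for illustration; they are not from the course data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy labelled (training) data, assumed for illustration:
# two features per observation, labels 0 or 1
X_labelled = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
                       [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
y_labelled = np.array([0, 0, 0, 1, 1, 1])

# Build the model and let it learn from the labelled data
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_labelled, y_labelled)

# Pass unlabelled, unseen observations to the model to get predicted labels
X_unseen = np.array([[1.1, 1.0], [4.9, 5.1]])
predicted_labels = model.predict(X_unseen)
print(predicted_labels)  # [0 1]
```

Each unseen point sits inside one of the two clusters, so its three nearest neighbours all share one label.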

Measuring Model Performance -

In classification, accuracy is a commonly used metric

Accuracy : Correct Predictions / Total Observations
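
A small worked example of the formula, using hypothetical label arrays (y_true and y_pred are invented here, not taken from any dataset in these notes):

```python
import numpy as np

# Hypothetical true labels and model predictions for 10 observations
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

# Count the positions where the prediction matches the true label
correct_predictions = int(np.sum(y_true == y_pred))
total_observations = len(y_true)

accuracy = correct_predictions / total_observations
print(correct_predictions, total_observations, accuracy)  # 8 10 0.8
```

scikit-learn's accuracy_score(y_true, y_pred) from sklearn.metrics computes the same ratio.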

Splitting data -

Training Set
- Fit/train Classifier on training set
Test Set
- Calculate accuracy using the test set

Train/test Split -

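from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hold out 30% of the data for testing; stratify=y keeps the class
# proportions the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=21, stratify=y)
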
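# Fit a k-nearest-neighbours classifier with k = 6 on the training set
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)

# Accuracy on the held-out test set
print(knn.score(X_test, y_test))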
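
To see what the stratify argument does, here is a minimal sketch on synthetic, imbalanced labels. The data is invented for illustration (80 observations of class 0, 20 of class 1); only the train_test_split call mirrors the notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, assumed for illustration: 100 observations, 2 features,
# with an 80/20 class imbalance
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)

# Share of class 1 in each set; stratification keeps both at 20%
train_share = y_train.mean()
test_share = y_test.mean()
print(len(y_train), len(y_test))  # 70 30
print(train_share, test_share)
```

Without stratify, a random split could leave the test set with noticeably more or fewer class-1 observations than the training set, which makes the accuracy estimate less representative.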

Model Complexity and Over/Underfitting -

import numpy as np

# Record training and test accuracy for each value of k
train_accuracies = {}
test_accuracies = {}
neighbors = np.arange(1, 26)  # k = 1, ..., 25

for neighbor in neighbors:
    knn = KNeighborsClassifier(n_neighbors=neighbor)
    knn.fit(X_train, y_train)
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)

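import matplotlib.pyplot as plt

# Plot training and test accuracy against the number of neighbors
plt.figure(figsize=(8, 6))
plt.title("KNN: Varying Number of Neighbors")
plt.plot(neighbors, list(train_accuracies.values()), label="Training Accuracy")
plt.plot(neighbors, list(test_accuracies.values()), label="Testing Accuracy")
plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")
plt.show()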
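
One way to read the two accuracy curves without a plot is to pick the k with the highest test accuracy. The sketch below is self-contained and uses synthetic data from make_classification, which is an assumption for illustration and not the dataset used in these notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, assumed purely for illustration
X, y = make_classification(n_samples=300, n_features=5, random_state=21)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)

train_accuracies = {}
test_accuracies = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracies[k] = knn.score(X_train, y_train)
    test_accuracies[k] = knn.score(X_test, y_test)

# With k = 1 every training point is its own nearest neighbour, so the
# model reproduces the training labels exactly (a sign of overfitting)
print(train_accuracies[1])

# Choose the k that generalises best to the held-out test set
best_k = max(test_accuracies, key=test_accuracies.get)
print(best_k, test_accuracies[best_k])
```

As a rough guide, very small k gives a complex model that tends to overfit (high training accuracy, lower test accuracy), while very large k gives a simpler model that can underfit (both accuracies drop). The best k usually lies somewhere in between.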