Ryan Chang changrya 1004103748 | Kevin Covelli covellik 1004359758

Part A

Q1

a)

Validation Accuracy

Choose K* = 11

Final test accuracy with chosen K*=11 : 0.684

c)

Validation Accuracy

Choose K* = 21

Final test accuracy with chosen K*=21 : 0.6816257408975445

Underlying assumption: if question A is answered correctly or incorrectly by the same other users as question B, then A's correctness for a given user matches that of question B.
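To illustrate this assumption, here is a minimal sketch of item-based imputation in plain Python. It is not the code used for the results above. The names nan_euclidean and predict_by_item, the use of None for missing entries, and the toy matrix are all hypothetical. Rows are users, columns are questions, and the distance between two questions is computed only over users who answered both.

```python
import math

def nan_euclidean(a, b):
    """Euclidean distance over entries observed in both vectors,
    rescaled to compensate for the skipped (missing) entries."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat as infinitely far
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))

def predict_by_item(matrix, user, question, k):
    """Estimate P(user answers question correctly) as the mean of the
    user's answers on the k questions most similar to `question`."""
    target_col = [row[question] for row in matrix]
    candidates = []
    for q in range(len(matrix[0])):
        # Only questions this user actually answered can vote.
        if q == question or matrix[user][q] is None:
            continue
        col = [row[q] for row in matrix]
        candidates.append((nan_euclidean(target_col, col), matrix[user][q]))
    candidates.sort(key=lambda t: t[0])
    nearest = candidates[:k]
    if not nearest:
        return 0.5  # hypothetical fallback when no neighbor is available
    return sum(v for _, v in nearest) / len(nearest)

# Toy data (made up): question 1 is answered like question 0 by users 0-2,
# while question 2 is answered the opposite way.
toy = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [None, 1, 0],  # user 3 has not answered question 0
]
prediction = predict_by_item(toy, user=3, question=0, k=1)
```

With k=1 the only voter is question 1, whose answer pattern matches question 0 on every other user, so the prediction for user 3 simply copies that user's answer on question 1.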

d)

For smaller values of k, imputing by user is better; for larger values of k, imputing by question is better. However, neither is much better than the other.

e)

limitation 1)

For both knn_impute_by_item and knn_impute_by_user, the dimensionality of each sample is quite large (542 and 1774, respectively). In high dimensions, most points are approximately equally far from one another, so the neighbors the KNN algorithm finds may not actually be the best neighbors. This can limit our algorithm's prediction accuracy.
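The distance concentration effect described here can be checked with a small experiment. This is a sketch on synthetic uniform random points, not on the assignment data (which is sparse and binary), and the function name distance_contrast and its parameters are hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min between a random query
    and n_points random points drawn uniformly from the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Compare a low-dimensional case with the two dimensionalities above.
contrasts = {dim: distance_contrast(dim) for dim in (2, 542, 1774)}
```

A small contrast means the nearest and farthest points are almost equally far from the query, so the "nearest" neighbors carry little extra information. In this synthetic setting the contrast in 2 dimensions is much larger than in 542 or 1774 dimensions.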

limitation 2)