diff --git a/ProgrammingAssignment1.ipynb b/ProgrammingAssignment1.ipynb index 437e0e7386261858a7b2fe0074cedb4f4b31430e..a1aaf8c605994f69b41820141252c6e2dedf67de 100644 --- a/ProgrammingAssignment1.ipynb +++ b/ProgrammingAssignment1.ipynb @@ -89,7 +89,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can start implementing our $k$-NN classifier. $k$-NN class inherits Model class. You'll need to implement \"fit\" and \"predict\" methods. Use the \"distance\" function you defined above. \"fit\" method takes $k$ as an argument. \"predict\" takes as input an $mxd$ array containing $d$-dimensional $m$ feature vectors for examples and outputs the predicted class and the proportion of predicted class labels in $k$ nearest neighbors." + "We can start implementing our $k$-NN classifier. The $k$-NN class inherits from the Model class. You'll need to implement the \"fit\" and \"predict\" methods. Use the \"distance\" function you defined above. The \"fit\" method takes $k$ as an argument. \"predict\" takes as input an $m \\times d$ array containing $m$ $d$-dimensional feature vectors for the examples and outputs the predicted class and the ratio of positive examples among the $k$ nearest neighbors." ] }, { @@ -211,6 +211,7 @@ "metadata": {}, "outputs": [], "source": [ + "# try sizes 0, 100, 200, 300, ..., up to the largest multiple of 100 <= train_size\n", "training_sizes = np.arange(0, my_model.train_size + 1, 100)\n", "\n", "# Calculate error for each entry in training_sizes\n", @@ -238,8 +239,7 @@ "metadata": {}, "outputs": [], "source": [ - "# You should see array([ 196, 106, 193, 105]) with seed 123\n", - "conf_matrix(my_model.labels[my_model.test_indices], final_labels, threshold= 0.5)" + "conf_matrix(my_model.labels[my_model.test_indices], final_labels, threshold = 0.5)" ] }, { @@ -269,35 +269,14 @@ "metadata": {}, "source": [ "### ROC curve and confusion matrix for the final model\n", - "ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cut off points. 
Now, implement a \"ROC\" function that predicts the labels of the test set examples using different $threshold$ values in \"predict\" and plot the ROC curve. \"ROC\" takes a list containing different $threshold$ parameter values to try and returns two arrays; one where each entry is the sensitivity at a given threshold and the other where entries are 1-specificities." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def ROC(model, indices, value_list):\n", - " '''\n", - " model: a fitted k-NN model\n", - " indices: for data points to predict\n", - " value_list: array containing different threshold values\n", - " Calculate sensitivity and 1-specificity for each point in value_list\n", - " Return two nX1 arrays: sens (for sensitivities) and spec_ (for 1-specificities)\n", - " '''\n", - " \n", - " # use predict_batch to obtain predicted labels at different threshold values\n", - " raise NotImplementedError\n", - " \n", - " return sens, spec_" + "ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cutoff points. Now implement a \"ROC\" function in \"model.ipynb\" that predicts the labels of the test set examples using different $threshold$ values in \"predict\", and plot the ROC curve. \"ROC\" takes a list of $threshold$ parameter values to try and returns two arrays: one where each entry is the sensitivity at a given threshold, and the other where each entry is the corresponding 1-specificity." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can finally create the confusion matrix and plot the ROC curve for our optimal $k$-NN classifier." + "We can finally create the confusion matrix and plot the ROC curve for our optimal $k$-NN classifier. 
(Use the $k$ value you found above.)" ] }, { diff --git a/model.ipynb b/model.ipynb index 34e04c4aa09e32b0b8608fe4745c46f0c7ba0287..94618fe6006d89f2379d0484c4ea64eb82020509 100644 --- a/model.ipynb +++ b/model.ipynb @@ -116,7 +116,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## General supervised learning related functions" + "## General supervised learning related functions \n", + "### (To be implemented later, when indicated in the other notebooks)" ] }, { @@ -128,7 +129,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -146,6 +147,38 @@ " # returns the confusion matrix as numpy.ndarray\n", " return np.array([tp, tn, fp, fn])" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cutoff points. Now implement a \"ROC\" function that predicts the labels of the test set examples using different $threshold$ values in \"predict\" and plot the ROC curve. \"ROC\" takes a list of $threshold$ parameter values to try and returns two arrays: one where each entry is the sensitivity at a given threshold, and the other where each entry is the corresponding 1-specificity." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def ROC(model, indices, value_list):\n", + " '''\n", + " model: a fitted supervised learning model\n", + " indices: for data points to predict\n", + " value_list: array containing different threshold values\n", + " Calculate sensitivity and 1-specificity for each point in value_list\n", + " Return two n x 1 arrays: sens (for sensitivities) and spec_ (for 1-specificities)\n", + " '''\n", + " \n", + " # use predict method to obtain predicted labels at different threshold values\n", + " # use conf_matrix to calculate tp, tn, fp, fn\n", + " # calculate sensitivity, 1-specificity\n", + " # return two arrays\n", + " \n", + " raise NotImplementedError\n", + " \n", + " return sens, spec_" + ] } ], "metadata": {
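For reference, one way the "ROC" stub added in this diff could be completed is sketched below. This is only a sketch under assumptions, not the assignment's solution: it takes the positive-class ratios for the test examples directly (in a placeholder argument `pred_ratios`) rather than calling a fitted model's "predict" on `indices`, and it re-implements a `conf_matrix` helper locally with the same `np.array([tp, tn, fp, fn])` return convention the notebook's `conf_matrix` uses.

```python
import numpy as np

def conf_matrix(true_labels, pred_ratios, threshold=0.5):
    # Local stand-in for the notebook's conf_matrix; same [tp, tn, fp, fn] order.
    # A ratio >= threshold is treated as a positive prediction (the >= vs. >
    # convention is an assumption here).
    pred = (np.asarray(pred_ratios) >= threshold).astype(int)
    true = np.asarray(true_labels)
    tp = int(np.sum((pred == 1) & (true == 1)))
    tn = int(np.sum((pred == 0) & (true == 0)))
    fp = int(np.sum((pred == 1) & (true == 0)))
    fn = int(np.sum((pred == 0) & (true == 1)))
    return np.array([tp, tn, fp, fn])

def ROC(true_labels, pred_ratios, value_list):
    # For each threshold: sensitivity = TP / (TP + FN), 1-specificity = FP / (FP + TN).
    # Assumes both classes occur in true_labels, so neither denominator is zero.
    sens, spec_ = [], []
    for t in value_list:
        tp, tn, fp, fn = conf_matrix(true_labels, pred_ratios, threshold=t)
        sens.append(tp / (tp + fn))
        spec_.append(fp / (fp + tn))
    return np.array(sens), np.array(spec_)
```

For example, with true labels `[1, 1, 0, 0]` and ratios `[0.9, 0.4, 0.6, 0.1]`, a threshold of 0.5 yields predictions `[1, 0, 1, 0]`, so both the sensitivity and the 1-specificity are 0.5. Plotting `spec_` on the x-axis against `sens` on the y-axis then gives the ROC curve.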