From 88f10a757294d9c34a0546e96f1cf343bb66cfe0 Mon Sep 17 00:00:00 2001 From: Zeynep Hakguder <zhakguder@cse.unl.edu> Date: Fri, 1 Jun 2018 15:20:07 -0500 Subject: [PATCH] clean --- .../ProgrammingAssignment1-Solution.ipynb | 279 --------- .../ProgrammingAssignment1_solution.ipynb | 581 ------------------ ProgrammingAssignment_1/model_solution.ipynb | 274 --------- 3 files changed, 1134 deletions(-) delete mode 100644 ProgrammingAssignment_1/ProgrammingAssignment1-Solution.ipynb delete mode 100644 ProgrammingAssignment_1/ProgrammingAssignment1_solution.ipynb delete mode 100644 ProgrammingAssignment_1/model_solution.ipynb diff --git a/ProgrammingAssignment_1/ProgrammingAssignment1-Solution.ipynb b/ProgrammingAssignment_1/ProgrammingAssignment1-Solution.ipynb deleted file mode 100644 index c279662..0000000 --- a/ProgrammingAssignment_1/ProgrammingAssignment1-Solution.ipynb +++ /dev/null @@ -1,279 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# k-Nearest Neighbor" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use numpy for array operations and matplotlib for plotting for this assignment. Please do not add other libraries." - ] - }, - { - "cell_type": "code", - "execution_count": 247, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code makes the Model class and relevant functions available from model.ipynb." - ] - }, - { - "cell_type": "code", - "execution_count": 256, - "metadata": {}, - "outputs": [], - "source": [ - "%run 'model-Solution.ipynb'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The choice of distance metric plays an important role in the performance of kNN. Let's start by implementing a distance method in the \"distance\" function below. 
It should take two data points and the name of the metric and return a scalar value." - ] - }, - { - "cell_type": "code", - "execution_count": 257, - "metadata": {}, - "outputs": [], - "source": [ - "def distance(x, y, metric):\n", - " '''\n", - " x: a 1xd array\n", - " y: a 1xd array\n", - " metric: Euclidean, Hamming, etc.\n", - " '''\n", - " #raise NotImplementedError\n", - " \n", - " if metric == 'Euclidean':\n", - " dist = np.sqrt(np.sum(np.square((x-y))))\n", - " \n", - " ####################################\n", - " return dist # scalar distance between x and y" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can now implement our kNN classifier. The kNN class inherits from the Model class. Implement the \"fit\" and \"predict\" methods, using the \"distance\" function you defined above. The \"fit\" method takes $k$ as an argument. \"predict\" takes as input the feature vector for a single test point and outputs the predicted class and the proportion of predicted class labels among the $k$ nearest neighbors."
- ] - }, - { - "cell_type": "code", - "execution_count": 283, - "metadata": {}, - "outputs": [], - "source": [ - "class kNN(Model):\n", - "\n", - " def fit(self, k, distance_f, **kwargs):\n", - " \n", - " #raise NotImplementedError\n", - " \n", - " self.k = k\n", - " self.distance_f = distance_f\n", - " self.distance_metric = kwargs['metric']\n", - " \n", - " \n", - " #######################\n", - " return\n", - " # vary the threshold value for ROC analysis\n", - " def predict(self, test_points):\n", - " \n", - " chosen_labels = []\n", - " for test_point in self.features[test_points]:\n", - " #raise NotImplementedError\n", - " tmp_dist = [np.inf] * self.k\n", - " distances = []\n", - "\n", - " labels = []\n", - " for index in self.training_indices:\n", - " dist = self.distance_f(self.features[index], test_point, self.distance_metric)\n", - " distances.append(dist)\n", - " labels.append(self.labels[index])\n", - " a_order = np.argsort(distances)\n", - " tmp_labels = list(np.array(labels)[a_order][:self.k])\n", - " b = tmp_labels.count(1)\n", - " chosen_labels.append(b/self.k)\n", - " \n", - " ##########################\n", - " # return, for each test point, the ratio of positive labels\n", - " # among its k nearest neighbors\n", - " return np.array(chosen_labels)\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It's time to build and evaluate our model now. Remember that you need to provide values for the $p$ and $v$ parameters of the \"partition\" function and for $file\\_path$ in the \"preprocess\" function."
- ] - }, - { - "cell_type": "code", - "execution_count": 284, - "metadata": {}, - "outputs": [], - "source": [ - "# populate the keyword arguments dictionary kwargs\n", - "kwargs = {'p': 0.3, 'v': 0.1, 'seed': 123, 'file_path': 'madelon_train'}\n", - "# initialize the model\n", - "my_model = kNN(preprocessor_f=preprocess, partition_f=partition, **kwargs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Assign a value to $k$ and fit the kNN model. You do not need to change the value of the $threshold$ parameter yet." - ] - }, - { - "cell_type": "code", - "execution_count": 285, - "metadata": {}, - "outputs": [], - "source": [ - "kwargs_f = {'metric': 'Euclidean'}\n", - "my_model.fit(k=10, distance_f=distance, **kwargs_f)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Evaluate your model on the test data and report your accuracy. Also, calculate and report the confidence interval on the generalization error estimate." - ] - }, - { - "cell_type": "code", - "execution_count": 286, - "metadata": {}, - "outputs": [], - "source": [ - "final_labels = my_model.predict(my_model.test_indices)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that we have the true labels and the predicted ones from our model, we can build a confusion matrix and see how accurate our model is. Implement the \"conf_matrix\" function that takes as input an array of true labels ($true$) and an array of predicted labels ($pred$). It should output a numpy.ndarray. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 289, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([196, 106, 193, 105])" - ] - }, - "execution_count": 289, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# You should see array([ 196, 106, 193, 105]) with seed 123\n", - "conf_matrix(my_model.labels[my_model.test_indices], final_labels, threshold=0.5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cutoff points. Now, implement a \"ROC\" function that predicts the labels of the test set examples by applying different $threshold$ values to the predicted ratios, and plot the ROC curve. \"ROC\" takes a list of $threshold$ values to try and returns a (sensitivity, 1-specificity) pair for each value." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def ROC(true, pred, value_list):\n", - " '''\n", - " true: nx1 array of true labels for test set\n", - " pred: nx1 array of predicted positive-class ratios for test set\n", - " Calculate sensitivity and 1-specificity for each threshold in value_list\n", - " Return two arrays with one entry per threshold: sens (sensitivities) and spec_ (1-specificities)\n", - " '''\n", - " \n", - " \n", - " return sens, spec_" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can finally create the confusion matrix and plot the ROC curve for our kNN classifier."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# confusion matrix\n", - "conf_matrix(true_classes, predicted_classes)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# ROC curve: 1-specificity on the x-axis, sensitivity on the y-axis\n", - "roc_sens, roc_spec_ = ROC(true_classes, predicted_classes, np.arange(0.1, 1.0, 0.1))\n", - "plt.plot(roc_spec_, roc_sens)\n", - "plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/ProgrammingAssignment_1/ProgrammingAssignment1_solution.ipynb b/ProgrammingAssignment_1/ProgrammingAssignment1_solution.ipynb deleted file mode 100644 index 02836ab..0000000 --- a/ProgrammingAssignment_1/ProgrammingAssignment1_solution.ipynb +++ /dev/null @@ -1,581 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# *k*-Nearest Neighbor\n", - "\n", - "We'll implement the *k*-Nearest Neighbor (*k*-NN) algorithm for this assignment. We recommend using the [Madelon](https://archive.ics.uci.edu/ml/datasets/Madelon) dataset, although it is not mandatory. If you choose to use a different dataset, it should meet the following criteria:\n", - "* the dependent variable should be binary (suited for binary classification)\n", - "* the number of features (attributes) should be at least 50\n", - "* the number of examples (instances) should be between 1,000 and 5,000\n", - "\n", - "A skeleton of a general supervised learning model is provided in \"model.ipynb\". The functions you will implement there are indicated in this notebook. 
\n", - "\n", - "### Assignment Goals:\n", - "In this assignment, we will:\n", - "* implement 'Euclidean' and 'Manhattan' distance metrics\n", - "* use the validation dataset to find a good value for *k*\n", - "* evaluate our model with respect to these performance measures:\n", - " * accuracy, generalization error and ROC curve\n", - "* assess whether *k*-NN is suitable for the dataset you used\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# GRADING\n", - "\n", - "You will be graded on parts that are marked with **\\#TODO** comments. Read the comments in the code to make sure you don't miss any.\n", - "\n", - "### Mandatory for 478 & 878:\n", - "\n", - "| | Tasks | 478 | 878 |\n", - "|---|----------------------------|-----|-----|\n", - "| 1 | Implement `distance` | 10 | 10 |\n", - "| 2 | Implement `k-NN` methods | 25 | 20 |\n", - "| 3 | Model evaluation | 25 | 20 |\n", - "| 4 | Learning curve | 20 | 20 |\n", - "| 6 | ROC curve analysis | 20 | 20 |\n", - "\n", - "### Mandatory for 878, bonus for 478\n", - "\n", - "| | Tasks | 478 | 878 |\n", - "|---|----------------|-----|-----|\n", - "| 5 | Optimizing *k* | 10 | 10 |\n", - "\n", - "### Bonus for 478/878\n", - "\n", - "| | Tasks | 478 | 878 |\n", - "|---|----------------|-----|-----|\n", - "| 7 | Assess suitability of *k*-NN | 10 | 10 |\n", - "\n", - "Points are broken down further in the Rubric sections below. The **first** score is for 478 students, the **second** is for 878 students. There are a total of 100 points in this assignment, plus 20 bonus points for 478 students and 10 bonus points for 878 students." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use numpy for array operations and matplotlib for plotting for this assignment. Please do not add other libraries."
- ] - }, - { - "cell_type": "code", - "execution_count": 119, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code makes the Model class and relevant functions available from model.ipynb." - ] - }, - { - "cell_type": "code", - "execution_count": 134, - "metadata": {}, - "outputs": [], - "source": [ - "%run 'model_solution.ipynb'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 1: Implement `distance` function" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The choice of distance metric plays an important role in the performance of *k*-NN. Let's start by implementing a distance method in the \"distance\" function in **model.ipynb**. It should take two data points and the name of the metric and return a scalar value." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Rubric:\n", - "* Euclidean +5, +5\n", - "* Manhattan +5, +5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test `distance`" - ] - }, - { - "cell_type": "code", - "execution_count": 136, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Euclidean distance: 1000.0, Manhattan distance: 10000\n" - ] - } - ], - "source": [ - "x = np.array(range(100))\n", - "y = np.array(range(100, 200))\n", - "dist_euclidean = distance(x, y, 'Euclidean')\n", - "dist_manhattan = distance(x, y, 'Manhattan')\n", - "print('Euclidean distance: {}, Manhattan distance: {}'.format(dist_euclidean, dist_manhattan))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 2: Implement $k$-NN Class Methods" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can start implementing our *k*-NN classifier. The *k*-NN class inherits from the Model class. 
Use the \"distance\" function you defined above. The \"fit\" method takes *k* as an argument. \"predict\" takes as input an *m*x*d* array containing *m* *d*-dimensional feature vectors and outputs, for each example, the ratio of positive examples among its *k* nearest neighbors (predicted classes are obtained later by thresholding this ratio)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Rubric:\n", - "* correct implementation of fit method +5, +5\n", - "* correct implementation of predict method +20, +15" - ] - }, - { - "cell_type": "code", - "execution_count": 137, - "metadata": {}, - "outputs": [], - "source": [ - "class kNN(Model):\n", - " '''\n", - " Inherits Model class. Implements the k-NN algorithm for classification.\n", - " '''\n", - " \n", - " def fit(self, training_features, training_labels, k, distance_f, **kwargs):\n", - " '''\n", - " Fit the model. This is pretty straightforward for k-NN.\n", - " Args:\n", - " training_features: ndarray\n", - " training_labels: ndarray\n", - " k: int\n", - " distance_f: function\n", - " kwargs: dict\n", - " Contains keyword arguments that will be passed to distance_f\n", - " '''\n", - " # TODO\n", - " # set self.train_features, self.train_labels, self.k, self.distance_f, self.distance_metric\n", - " self.train_features = training_features\n", - " self.train_labels = training_labels\n", - " self.k = k\n", - " self.distance_f = distance_f\n", - " self.distance_metric = kwargs['metric']\n", - " \n", - " return\n", - " \n", - " \n", - " def predict(self, test_features):\n", - " \n", - " test_size = len(test_features)\n", - " train_size = len(self.train_labels)\n", - " \n", - " pred = np.empty(len(test_features))\n", - " # TODO\n", - " # for each point in test points\n", - " for idx in range(test_size):\n", - " point = test_features[idx]\n", - " distances = []\n", - " labels = []\n", - " \n", - "\n", - " for tr_idx in range(train_size):\n", - " train_example = self.train_features[tr_idx]\n", - " train_label = self.train_labels[tr_idx]\n", - " 
dist = self.distance_f(point, train_example, metric = self.distance_metric)\n", - " distances.append(dist)\n", - " labels.append(train_label)\n", - " \n", - " # get the order of distances (ascending, nearest first)\n", - " dist_order = np.argsort(distances)\n", - " # get the labels of the k points that are closest to the test point\n", - " k_labels = list(np.array(labels)[dist_order][:self.k])\n", - " \n", - " # get number of positive labels in k neighbours\n", - " b = k_labels.count(1)\n", - " \n", - " pred[idx] = b/self.k\n", - " \n", - " return pred\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 3: Build and Evaluate the Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Rubric:\n", - "* Reasonable accuracy values +10, +5\n", - "* Reasonable confidence intervals on the error estimate +10, +10\n", - "* Reasonable confusion matrix +5, +5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Preprocess the data files and partition the data." - ] - }, - { - "cell_type": "code", - "execution_count": 138, - "metadata": {}, - "outputs": [], - "source": [ - "# initialize the model\n", - "my_model = kNN()\n", - "# obtain features and labels from files\n", - "features, labels = preprocess('../data/madelon.data', '../data/madelon.labels')\n", - "# partition the data set\n", - "val_indices, test_indices, train_indices = partition(features.shape[0], t = 0.3, v = 0.1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Assign a value to *k* and fit the *k*-NN model." 
- ] - }, - { - "cell_type": "code", - "execution_count": 139, - "metadata": {}, - "outputs": [], - "source": [ - "# pass the training features and labels to the fit method\n", - "kwargs_f = {'metric': 'Euclidean'}\n", - "my_model.fit(features[train_indices], labels[train_indices], k=10, distance_f=distance, **kwargs_f)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Computing the confusion matrix for *k* = 10\n", - "Now that we have the true labels and the predicted ones from our model, we can build a confusion matrix and see how accurate our model is. Implement the \"conf_matrix\" function (in model.ipynb) that takes as input an array of true labels (*true*) and an array of predicted labels (*pred*). It should output a numpy.ndarray. You do not need to change the value of the threshold parameter yet." - ] - }, - { - "cell_type": "code", - "execution_count": 140, - "metadata": {}, - "outputs": [], - "source": [ - "# TODO\n", - "\n", - "# get model predictions\n", - "pred_ratios = my_model.predict(features[test_indices])\n", - "# For now, we will consider a data point as predicted in the positive class if at least half\n", - "# of its k neighbors are positive.\n", - "threshold = 0.5\n", - "# convert predicted ratios to predicted labels\n", - "pred_labels = [1 if x >= threshold else 0 for x in pred_ratios]\n", - "tp, tn, fp, fn = conf_matrix(labels[test_indices], pred_labels)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Evaluate your model on the test data and report your **accuracy**. Also, calculate and report the 95% confidence interval on the generalization **error** estimate."
- ] - }, - { - "cell_type": "code", - "execution_count": 142, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Accuracy: 0.475\n", - "Confidence interval: 0.49110132745961876-0.5588986725403813\n" - ] - } - ], - "source": [ - "# TODO\n", - "# Calculate and report accuracy and generalization error with confidence interval here. Show your work in this cell.\n", - "accuracy = (tp+tn)/len(test_indices)\n", - "error = 1 - accuracy\n", - "# 1.96 is the z-value for a two-sided 95% confidence interval\n", - "diff = 1.96 * np.sqrt((error * (1 - error)) / len(test_indices))\n", - "lower_bound = error - diff\n", - "upper_bound = error + diff\n", - "print('Accuracy: {}'.format(accuracy))\n", - "print('Confidence interval: {}-{}'.format(lower_bound, upper_bound))\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " ## TASK 4: Plotting a learning curve\n", - " \n", - "A learning curve shows how error changes as the training set size increases. For more information, see [learning curves](https://www.dataquest.io/blog/learning-curves-machine-learning/).\n", - "We'll plot the error values for training and validation data while varying the size of the training set. Report a training set size that gives a good balance between bias and variance." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Rubric:\n", - "* Correct training error calculation for different training set sizes +8, +8\n", - "* Correct validation error calculation for different training set sizes +8, +8\n", - "* Reasonable learning curve +4, +4" - ] - }, - { - "cell_type": "code", - "execution_count": 144, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3XlcVWX+wPHPwyaoqAiKu6CQO7iQOmUumQuUW1pqq1ZTWma2zExNM+3za89qsnGqsUVNMwu33Cot09REu6BiKuaGCiLuuLA9vz+eC4GCXODCgXu/79frvuSe89x7v4eD33Pu8zzne5TWGiGEEO7Bw+oAhBBCVB5J+kII4UYk6QshhBuRpC+EEG5Ekr4QQrgRSfpCCOFGJOkLIYQbkaQvhBBuRJK+EEK4ES+rA7hUUFCQDgkJsToMIYSoVjZv3nxMa92gpHZVLumHhIQQFxdndRhCCFGtKKX2O9JOuneEEMKNSNIXQgg3IklfCCHciCR9IYRwI5L0hRDCjUjSF0IINyJJXwgh3EiVm6cvhKhgubkwdy6kpkLz5ubRrBk0agSenlZHJyqYJH0h3MnevXDPPfDDD5ev8/KCJk3+OAjkHRAKPm/YEDykg6A6k6QvhDvQGj74AJ54ApSCjz6C4cMhOdk8Dh7845GcDJs3w4IFcPFi4ffx9jYHgOIOCs2bQ1CQ+QxRJUnSF8LVHTgA994L330HN9wA//sftGhh1gUGQmRk0a/TGo4dK/qgcPAgrF8PX34JWVmFX1ejxpUPCs2bQ0CAHBgsIklfCFelNcyYAY8+avrxp0+H++93PNkqBQ0amEeXLkW3yc2FtLTCB4WCB4Yff4RDhyAnp/DratY0B4LWrWHUKPOoU6d82yscorTWVsdQSFRUlJaCa0KUU3Iy/PnPsHw59O1rkn9oqDWx5OSYQeOiDgq//gpJSeDra7qb7rwTBg404wuiVJRSm7XWUSW1k9+sEK5Ea/jsM3jkEdPt8u9/w4MPWjv46ulpBoibNIEePQqv0xo2bTIxz51rHsHBMHasOQB06SLdQE4mw/BCuIojR2DoUBg3Djp1gvh4mDSpas+2UQq6d4f33oPDh83gca9e8P770K2b2Y5XXzXfDIRTVOG/BiGEQ7SG2bOhQwczWDt1qulLDwuzOrLS8fGBYcNg/nxISTFjEHXrwpNPmoHnG24w3wjOnrU60mpNkr4Q1VlqKowcCXfcAW3bgs0GU6ZU7bN7RwQEwAMPwLp1sHs3PPOMucbg7rtN98+dd8LKlZcPEIsSVfO/DCHc2Lx55ux+6VJ4/XX46Sdo08bqqJwvLAyee84M+K5daxL+kiUwaJD5BvCXv8DWrVZHWW1I0heiuklLg1tvhdGjoVUrMwPmiSdcv4SCUnDttabb58gR0w0UFQVvvw0REdC5M7z5plkniiVJX4jq5Ouvzdn9ggXwf/8HP/8M7dpZHVXl8/U13VoLF5ok/9575qKwJ54w8/8HD4bPP4dz56yOtMqRpC9EdZCeDrfdZhJd8+amTMJTT8l8djBlHx56CDZuhN9+M7+X3
36D2283/f/jx8OqVeZCMuFY0ldKDVZK7VRKJSmlnixi/TilVJpSymZ/3Gdf3lkptV4ptV0plaCUGu3sDRDC5S1aBB07mpIHL7wAGzaYqYzicm3awEsvwe+/m6Jyo0ebb0f9+0NIiDkgJCZaHaWlSrwiVynlCewCBgDJwCZgrNY6sUCbcUCU1nrSJa+9CtBa691KqSbAZqCd1vpkcZ8nV+QKYXfihJmJ89lnps/6009Nv7UonfPnzYHzs89gxQoz46dbN7jrLhgzxlQOdQGOXpHryJl+dyBJa/271joTmAsMcyQIrfUurfVu+8+HgaNAA0deK4RbW7bMnN3Png3//Ke5alUSftn4+Zkz/m++MXWApk411zY88oi5Svimm8w6N+FI0m8KHCzwPNm+7FIj7V0485VSzS9dqZTqDvgAe8oUaTVx4vwJJi6ZSHxKvNWhiOro1Cm47z6IiTFz1TduNF06Pj5WR+YagoPNt6fNm2HbNjPwa7OZxD91qtXRVQpHkn5RhS8u7RNaDIRorSOA74BPC72BUo2BmcB4rfVloylKqfuVUnFKqbi0tDTHIq+iluxawvTN0+n+UXfe2fAOVa2gnajCvv3W9NV//LHpe9682XRDiIrRoQO88orp/x85Eh57zDx3cY4k/WSg4Jl7M+BwwQZa63Stdd7dFj4E8v9SlVJ1gG+Af2itNxT1AVrrD7TWUVrrqAYNqnfvjy3Fhq+XL4NaD2LKiinc+PmNpJ5NtTosUZWdOQMTJpjqkrVqmTr1//d/ZgqiqHg+PqbQ29ix5mD7/POm+8dFOZL0NwHhSqlQpZQPMAZYVLCB/Uw+z1Bgh325DxALfKa1/tI5IVdttlQbnRp2YuGYhUyLmcbqfauJmB7B8qTlVocmqqJVq8zZfd5drbZsMQXIROXy8oKZM02xuueeg6efdtnEX2LS11pnA5OAFZhkPk9rvV0p9YJSaqi92WT7tMx4YDIwzr78VqA3MK7AdE6XHY3SWmNLsdG5UWeUUjx49YNs+vMmGtZqSPTsaB5d/igXsy+W/EbC9Z09aypg9u9vzjTXrjWlFPz8rI7MfXl6mruK3X8/vPyyOQi7YuLXWlepR7du3XR1deDkAc1z6Gm/TCu0/FzmOf3w0oc1z6Ej/xOpE48mWhShqBJ+/FHrVq20VkrrKVO0zsiwOiJRUG6u1pMmaQ1aP/SQ1jk5VkfkECBOO5Bj5YpcJ7Kl2ADo3Kjwlxk/bz/ejX6XxWMXc+jMIbp90I3/xv1XBnndTU6OOXvs29c8/+EHM2OkZk0roxKXUgrefRcefxymTTPVPl3oal5J+k4Un2qmaXZqWPTVkjdddRMJExLo1aIXE76ZwMh5I0k/l16ZIQqr5OTAPfeYgmATJkBCAvTubXVUojhKme62p5+Gjz4ypRxcpIyzJH0nsqXYCKsfhn8N/2LbNPZvzPI7lvPGgDdYsmsJkdMjWb13dSVGKSpdXsL/7DMzM+T9980sHVG1KWVKOrzwgtl3d9xhbkFZzUnSd6K8QdySeCgPHr/mcTbct4FaPrXo/1l//v7938nKqf5/UOISOTnmLPGzz0zyeOYZqyMSpfXPf5r5+3PnmrINmZlWR1QukvSd5PTF0+w5sYfOwY5PTurauCtb7t/CvV3u5eW1L3PtjGtJOp5UgVGKSpWX8GfOhBdfNMlDVE9/+5sZf/n6a3Mh14ULVkdUZpL0nSQhNQG4fBC3JLV8avHh0A/58pYv2X18N13+24XP4j+TQd7qLifHzPmeOdN0EfzjH1ZHJMpryhTTNbdkibmX7/nzVkdUJpL0naS4mTuOGtV+FAkTEujauCt3L7ib27++nVMXTjkzRFFZcnLMvVxnzTIJ/+mnrY5IOMvEiWZg99tv4cYbISPD6ohKTZK+k9hSbATVDKKJf5Myv0fzus1ZddcqXur3EvO2z6Pzfzuz/uB6J0YpKlxewp89G/71L0n4rujee80YzY8/mjt0nT5tdUSlIknfSeJT4
/OvxC0PTw9Pnu79NGvvWYtCcd3H1/Hijy+Sk+sa08VcWna2qdE+e7apnfP3v1sdkagod9xhbse4fr2pmXSy2FuEVDmS9J0gOzebralbiQyOdNp79mzWE9sEG2M6juGZH56h36f9OHDqgNPeXzhZdrY5w//8c3MJ/1NPWR2RqGijR5u7mW3ZYspppFePa24k6TvBzmM7uZhzscz9+cWpU6MOs26excwRM7Gl2IicHsmX292ibl31kneGn5fwn7zsjqLCVY0YAbGxsH07XH89HD1qdUQlkqTvBOUdxC3JHRF38OsDv9ImsA23zr+Vexfey9nMsxXyWaKUsrPhzjthzhwzl1sSvvu58UZzO8bdu6FfPzhyxOqIrkiSvhPYUmzU8KxBm8A2FfYZreu35qfxP/H0dU/zse1jun3Qjc2HN1fY5wkH5CX8uXPh1VfNXG7hngYOhKVLYf9+U1spOdnqiIolSd8JbKk2OjbsiLend4V+jrenNy9d/xKr717Nuaxz/Ol/f+KNn98g9/KbkYmKlp1tBvPmzoXXXoO//tXqiITV+vY1N14/cgT69DEHgCpIkn456QI19CtLn5A+xE+IZ0ibIfzl278waNYgjpyp2l8pXUpewv/iC5Pw//IXqyMSVcW118J338Hx46ag3p6qd0twSfrldOTsEY6dO1Y46ScmVnhhpvp+9Zl/y3w+uOkD1h1YR8T0CBbvXFyhnykwCf/2203Cf/11Sfjict27w/ffmxvl9O4NO3daHVEhkvTLKW8QN3+65vLl5obLzZub2ulbt1bYZyul+HO3P7PlgS00q9OMoXOHMmnpJM5nVc/Lw6u8rCy47TaYNw/eeMPsXyGK0rWruV9CVpbp6tm+3eqI8knSL6e8pB8RHGEWzJsHderANdfAO+9ARARERcF771XYPN62QW3ZcO8GHuv5GNM2TePqD69ma2rFHWzcUlaWOcP/8ktTE//xx62OSFR1nTqZq3Y9PEx/f3y81REBkvTLzZZio1VAK+r61jVf/RctgptuMtX4Dh82iT83Fx5+GBo3NhX6Fi92evdPDa8avDnoTZbfvpxj545x9YdX894v70nhNmfIO8PPS/iPPWZ1RKK6aNfOJH5fXzOdMy7O6ogk6ZdXoUHctWvN2fyIEeZ5gwYwebK5Ys9mMzfC/uknGDoUmjUzZ4tO7v4ZFDaIhIkJ9G/Vn4eXPczXO7526vu7nawsGDsW5s+Ht96ShC9KLzwc1qwxPQD9+8OGDZaGI0m/HM5cPEPS8aQ/aujHxkKNGqYI06UiI03SOHQIFi40o/zvvmu6f7p1g3//G44dc0pcDWs1ZNGYRQT6BbJg5wKnvKdbykv4X31l9t2jj1odkaiuQkNN4m/QAAYMMCd/FpGkXw5bj25Fo82ZvtawYIHZobVrF/8ib29zpl+w+0dr842gSROndf94engyOGwwy5OWyzz+ssjKMndJ+uorc/MMSfiivFq0MIm/aVNzYvj995aEIUm/HAqVX/j1Vzhw4I+uHUdUcPdPTHgMx84dI+6w9f2I1Upewv/6a3j7bXPzDCGcoUkT08ffqpUZ+1uxotJDkKRfDvEp8dT3q0+zOs1M146HBwwZUrY3K9j9s2gR9OplunzK0f0zqPUgFIqlu5eWLSZ3lJlpqifmJfxHHrE6IuFqgoNh9Wpo29ac4C2u3OtrJOmXgy3VRmRwpKmhv2ABXHedOXsvD29vc+D46ivT/fPuu2Z5Gbp/AmsG0qNZD5YlLStfTO4iL+HHxppuN0n4oqIEBZnunYgIuPlm8/+9kkjSL6Ps3GwSUhNM105SEmzbBsOHO/dDgoLMVM/Nm80c34cfNjOEStH9ExMWw6ZDmziaUfVLvloqL+EvWGAOtJMnWx2RcHX165uSDVdfbf725syplI+VpF9Gu9N3cyH7gkn6sbFmobOTfkEREWaOeHJyqbp/YsJj0GhWJFV+32G1kZkJt95qEv6//20OrkJUhrp1Tb/+tdeai/8+/
bTCP1JVtYt3oqKidFwVuIChJHO2zuG2r28jfkI8ETdPhPPnzYBsZTp2zJwdfPKJ+ey8rqFx48zsAG9vcnUujd9szPWh1zNnpINnErm5ZnvOnTOPgj8XfBS1vKhltWqZA9PVV5urk8vbBeZMeQl/4UKT8CdNsjoi4Y4yMmDYMLh40ZRv8PQs9VsopTZrraNKaufl4JsNBt4BPIGPtNavXLJ+HPA6cMi+6D2t9Uf2dXcD/7Avf0lrXfGHskpgS7Hh4+lD2+x65j6Zzz1X+UHkdf88/DAkJJizhFmzzCBkw4YwejQeAQFE5zRgUUIsOUvuxvPchZIT9oULZYvHzw9q1vzj37zHoUNmHCLvBKNFC5P88w4C3bpBQIDzfi+OysyEW24x35zeew8eeqjyYxACzIlR3lhdGRJ+aZR4pq+U8gR2AQOAZGATMFZrnVigzTggSms96ZLX1gfigChAA5uBblrrE8V9XnU50x80axBpGWls4QGYMMEk3U6drA7L/NEsX27O/u1/RPO6+DB6WCbrlgRzzZl6hRNyUUm6LMt8fc3speKcPm2mtcbFmcemTYXLzrZuXfhA0LUr+PtX3O8pMxNGjTK/I0n4wgU480y/O5Cktf7d/sZzgWFA4hVfZQwCvtVaH7e/9ltgMFA5IxYVKD4lnpjwGJi+wCSsjh2tDsnI6+IZMsTUAvL0ZMCFk3i8HsTSV+/jmutfsiauOnVMtcE+ff5YduKEGaTOOxCsX29KFgMoBW3aFD4QdO5sDjDldfGiOcNfvBimTYMHHyz/ewpRTTiS9JsCBws8TwZ6FNFupFKqN+ZbwaNa64PFvLZpGWOtMlLOppCakUpk3avg+1lmpodSVod1OS+zewP8Arim+TUsS1rGS1Yl/aIEBMANN5hHnqNHCx8Ivv/edFmB+SbRocMfB4GoKDOQXaOG45958aI5w1+yBN5/HyZOdO42CVHFOZL0i8pml/YJLQbmaK0vKqUmAJ8C1zv4WpRS9wP3A7Ro0cKBkKyVfyXunnOmO6U0V+FaJCYshr+v+jtHzhyhsX9jq8MpXsOGEB1tHnkOHy7cLbRoEcyYYdZ5e5tutYIHgg4dzPJLXbxornP45hv4z39Mt5wQbsaRKZvJQPMCz5sBhws20Fqna60v2p9+CHRz9LX213+gtY7SWkc1qEozO4qRf+OU77aaq+t69rQ4opLFhMcAsDxpucWRlEGTJubahBdegGXLzLeBfftM5cvHHzffGObOhT//Gbp0MV1Jf/qTGeD+9FNzA4tz5yThC4FjZ/qbgHClVChmds4Y4LaCDZRSjbXWeTdpHQrssP+8Avg/pVTe1IyBwFPljtpithQbIXVbUm/Jd6YKYwWPtjtDRHAETfybsCxpGeO7jLc6nPJRClq2NI+RI80yrc3AcMFvBJ98YgZpweyjnByYPh0eeMCy0IWwWolJX2udrZSahEngnsAMrfV2pdQLQJzWehEwWSk1FMgGjgPj7K89rpR6EXPgAHghb1C3OrOl2OisGsPZ/dWiawfMrRWjw6KZnzifrJwsvD2L6P6ozpSCsDDzGDPGLMvJgV27zEFgyxZzQVveQUIINyUXZ5VSRmYG/i/78+ypzjz7URKkpZVuINFCX+/4mpHzRvLjuB/p3bK31eEIIZzI0SmbUoahlPJr6K/bAzfeWG0SPsANrW7Ay8OLZbulAJsQ7kqSfinFp5ibG3feebpia+1UgDo16tCrRS+WJkmpZSHclST9UrKl2Kina9DivHfhaYXVRExYDAmpCSSfTrY6FCGEBSTpl5ItxUZkqkLdMMBMDaxmosPNgapaTt0UQpSbJP1SyMnNISHFRud9F6pd106eDg060LxOc7mblhBuSpJ+KSQdT+JczgU6p2AuFqqGlFLEhMfw7e/fkpmTaXU4QohKJkm/FArdCD042OJoyi4mPIazmWdZe2Ct1aEIISqZJP1SsO38Ae8caN9/jNWhlMv1odfj4+kjUzeFcEOS9EshfscPt
E8Dn5tvsTqUcqntU5veLXvL1E0h3JAk/VKwnfudzhcDoFUrq0Mpt5iwGBLTEtl/cr/VoQghKpEkfQel7tvGkRqZRDa/2upQnCJv6uayJOniEcKdSNJ3UPyS/wHQudcoiyNxjjaBbQitFypTN4VwM5L0HWSzmYuZIq+92eJInCNv6ub3e7/nQnYZb4QuhKh2JOk74swZbKd30SLXn/o1A62Oxmmiw6I5l3WOn/b/ZHUoQohKIknfEcuWYWuYS+eGEVZH4lT9QvtRw7OGdPEI4UYk6Tvg/ML57AyEzu36WR2KU9X0rkm/0H4ydVMINyJJvySZmWzbtJRcD+jcuKvV0ThddFg0u9J3sef4HqtDEUJUAkn6JVm1Cpt/BgCRjSItDsb58m6YLlM3hXAPkvRLEhuLrbk3dXzqEFIvxOponC6sfhjh9cOlX18INyFJ/0pyc2HhQmxt6hDZKBIP5Zq/rpjwGFbvW835rPNWhyKEqGCumcWcZcMGco+mEl/rrKms6aKiw6K5kH2BH/b9YHUoQogKJkn/SmJj2dPAiwx90aWTfp+QPvh5+UkXjxBuQJJ+cbQ2/fn9OwC4dNL39fKlf6v+LE1aitba6nCEEBVIkn5xtm+HPXuI79oYLw8v2jdob3VEFSo6LJrfT/zO7uO7rQ5FCFGBJOkXJzYWlMIWmE27oHb4evlaHVGFig4zVTeli0cI1yZJvzgLFkDPnthO7HDJ+fmXCg0IpV1QO0n6Qrg4SfpF2b8ftmwhbdgNHDpziM7BrtufX1B0WDQ/7v+RjMwMq0MRQlQQSfpFWbAAgPhrzB2yXHkQt6CY8BgyczJZtXeV1aEIISqIQ0lfKTVYKbVTKZWklHryCu1GKaW0UirK/txbKfWpUmqrUmqHUuopZwVeoRYsgA4dsHkeA1yz/EJRerXoRW2f2tLFI4QLKzHpK6U8gWlANNAeGKuUumwqi1LKH5gMbCyw+Baghta6E9ANeEApFVL+sCvQsWOwZg2MGIEtxUazOs0IqhlkdVSVooZXDW5odQPLkpbJ1E0hXJQjZ/rdgSSt9e9a60xgLjCsiHYvAq8BBW/DpIFaSikvwA/IBE6XL+QKtnixKb8wfDjxqfFu07WTJzosmv2n9rPj2A6rQxFCVABHkn5T4GCB58n2ZfmUUl2A5lrrJZe8dj6QARwBDgBvaK2Plz3cSrBgAbRowYWI9uxI2+E2g7h5ZOqmEK7NkaSviliW/91fKeUBTAUeL6JddyAHaAKEAo8rpVpd9gFK3a+UilNKxaWlpTkUeIXIyICVK2H4cLanJZKjc9ymPz9P87rN6dSwk5RaFsJFOZL0k4HmBZ43Aw4XeO4PdAR+UErtA3oCi+yDubcBy7XWWVrro8A6IOrSD9Baf6C1jtJaRzVo0KBsW+IMy5fDhQswfDi2FBvgPjN3CooOi+an/T9x+mLV7okTQpSeI0l/ExCulApVSvkAY4BFeSu11qe01kFa6xCtdQiwARiqtY7DdOlcr4xamAPCb07fCmeJjYXAQLjuOmwpNmr71KZVwGVfTFxeTHgMWblZfP/791aHIoRwshKTvtY6G5gErAB2APO01tuVUi8opYaW8PJpQG1gG+bg8bHWOqGcMVeMrCxYsgSGDAEvL2ypNiKDXbeG/pVc0/wa6tSoI/36QrggL0caaa2XAksvWfZMMW37Fvj5LGbaZtX3ww9w6hQMH06uziU+JZ67Iu+yOipLeHt6M6DVgPypm0oVNawjhKiO3O80tjixsVCzJgwcyN4TezmTecYt+/PzxITHcOjMIbYe3Wp1KEIIJ5KkD/m3RWTwYPDzIz41HnDPQdw8g8MGAzJ1UwhXI0kfYNMmOHwYhg8HwJZiw1N50qFBB4sDs04T/yZ0adRFpm4K4WIk6YPp2vHygptuAkzSbxPUBj9vP4sDs1Z0WDTrDqzj5IWTVocihHASSfpgrsLt2xcCAgCT9N25aydPTHgMOTqHb/d8a3UoQggnkaS/Ywfs3
AkjRgCQfi6dg6cPul35haL0aNaDAN8A6eIRwoVI0o+NNf8OMzXkZBD3D14eXgxsPZBlScvI1blWhyOEcAJJ+gsWQPfu0NTUkMsrv+BuNXeKExMeQ8rZlPzfixCienPvpJ+cbGbu2Lt2wJzpN/FvQsNaDS0MrOrIm7q5bLd08QjhCtw76dtvi5g3VRNkEPdSDWs1JKpJFEuTZL6+EK5Akn7btuYBXMy+SGJaogziXiImLIYNyRs4fr5q3wpBCFEy9036x4+bejsFunYS0xLJzs2W/vxLxITHkKtzWblnpdWhCCHKyX2T/pIlkJNzWdcOyMydS0U1iSLQL1BKMgjhAtw36S9YYGbsRP1xTxdbio1a3rVoHdDawsCqHk8PTwaHDZapm0K4APdM+ufOmbtkDR8OHn/8CmypNiKCI/D08LQwuKopJjyGY+eOEXc4zupQhBDl4J5Jf+VKOH++UNeO1pr4lHjp2inGoNaDUCiZuilENeeeST821tTZ6dMnf9H+U/s5dfGUJP1iBNYMpEezHjJ1U4hqzv2SfnY2LF5sKmp6e+cvlkHcksWExbDp0CbSMtKsDkUIUUbul/TXrIETJwpN1QST9D2UBx0bdrQosKovJjwGjWbFnhVWhyKEKCP3S/qxseDrCwMHFlpsS7FxVeBV1PSuaVFgVV+Xxl1oWKuhTN0Uohpzr6SvtZmqOWgQ1KpVaJWUXyiZh/IgOiyaFXtWkJObY3U4QogycK+kv3mzKbJ2SdfOifMn2H9qv5RfcEBMeAzHzx/nl0O/WB2KEKIM3Cvpx8aCp2f+bRHzSA19xw1oNQAP5SFdPEJUU+6V9BcsgN69ITCw0OL4FEn6jgrwC+Ca5tfI1E0hqin3Sfq7dkFi4mVdO2CuxG1UuxHBtYMtCKz6iQmLYcuRLaScTbE6FCFEKblP0r/ktogFySBu6cSExwCwPGm5xZEIIUrLfZL+ggXQrRu0aFFocWZOJtuPbicyWMopOyoiOIIm/k2kX1+Iasg9kv7hw7BhQ5FdOzvSdpCVmyVn+qWglCI6LJqVe1aSnZttdThCiFJwKOkrpQYrpXYqpZKUUk9eod0opZRWSkUVWBahlFqvlNqulNqqlPJ1RuClsnCh+bdAgbU8Un6hbGLCYzh18RTrD663OhQhRCmUmPSVUp7ANCAaaA+MVUq1L6KdPzAZ2FhgmRcwC5igte4A9AWynBJ5aSxYAOHh0P6ysLGl2PDz8iO8fnilh1Wd3dDqBrw8vKSLR4hqxpEz/e5Aktb6d611JjAXuHw0FF4EXgMuFFg2EEjQWscDaK3TtdaVeynnyZOwapXp2lHqstVSQ79s6tSoQ68WvWTqphDVjCNJvylwsMDzZPuyfEqpLkBzrfWSS157FaCVUiuUUluUUn8t6gOUUvcrpeKUUnFpaU6u4PjNN6ayZhFdO1JDv3xiwmJISE3g0OlDVocihHCQI0n/8tNj0PkrlfIApgKPF9HOC+gF3G7/d4RSqv9lb6b1B1rrKK11VIMGDRwK3GGxsdC4MfTocdmqg6cPcuLCCUkjlTgiAAAgAElEQVT6ZRQdHg3AsiS5sYoQ1YUjST8ZaF7geTPgcIHn/kBH4Ael1D6gJ7DIPpibDPyotT6mtT4HLAW6OiNwh5w/b26LOGxYodsi5skbxJXpmmXToUEHmtdpLv36QlQjjiT9TUC4UipUKeUDjAEW5a3UWp/SWgdprUO01iHABmCo1joOWAFEKKVq2gd1+wCJTt+K4nz3HWRkFDlVE0zSVyg6BXeqtJBciVKKmPAYvvv9OzJzMq0ORwjhgBKTvtY6G5iESeA7gHla6+1KqReUUkNLeO0J4C3MgcMGbNFaf1P+sB0UGwt160LfvkWutqXYCA8Mp7ZP7UoLydXEhMdwJvMM6w6sszoUIYQDvBxppLVeiumaKbjsmWLa9r3k+SzMtM3KlZ0NixbBjTeCj0+RTWwpNq5uenUlB+Zarg+9Hh9PH5buXkq/0H5WhyOEK
[remainder of base64-encoded PNG omitted: plot of training_error (red) and validation_error (green) against training set size]\n", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "600 is a good training size\n" - ] - } - ], - "source": [ - "# train using 10%, 20%, 30%, ..., 100% of training data\n", - "training_proportions = np.arange(0.10, 1.01, 0.10)\n", - "\n", - "# TODO\n", - "\n", - "# Calculate error for each entry in training_sizes\n", - "# for training and validation sets and populate\n", - "# error_train and error_val arrays.
Each entry in these arrays\n", - "# should correspond to each entry in training_sizes.\n", - "\n", - "error_train = []\n", - "error_val = []\n", - "training_sizes = []\n", - "kwargs_f = {'metric': 'Euclidean'}\n", - "size = len(train_indices)\n", - "\n", - "for proportion in training_proportions:\n", - "    # train on the first size_avail training examples\n", - "    size_avail = int(np.ceil(size*proportion))\n", - "    training_sizes.append(size_avail)\n", - "    idx_avail = train_indices[:size_avail]\n", - "    \n", - "    my_model.fit(features[idx_avail], labels[idx_avail], k=10, distance_f=distance, **kwargs_f)\n", - "    \n", - "    # validation error\n", - "    val_pred_ratios = my_model.predict(features[val_indices])\n", - "    val_pred_labels = [1 if x >= threshold else 0 for x in val_pred_ratios]\n", - "    tp, tn, fp, fn = conf_matrix(labels[val_indices], val_pred_labels)\n", - "    val_accuracy = (tp+tn)/len(val_indices)\n", - "    error_val.append(1 - val_accuracy)\n", - "    \n", - "    # training error: predict on the same subset the model was fit on,\n", - "    # so that predictions line up with labels[idx_avail]\n", - "    train_pred_ratios = my_model.predict(features[idx_avail])\n", - "    train_pred_labels = [1 if x >= threshold else 0 for x in train_pred_ratios]\n", - "    tp, tn, fp, fn = conf_matrix(labels[idx_avail], train_pred_labels)\n", - "    train_accuracy = (tp+tn)/size_avail\n", - "    error_train.append(1 - train_accuracy)\n", - "\n", - "plt.plot(training_sizes, error_train, 'r', label='training_error')\n", - "plt.plot(training_sizes, error_val, 'g', label='validation_error')\n", - "plt.legend()\n", - "plt.show()\n", - "print('{} is a good training size'.format(600))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 5: Determining *k*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Rubric:\n", - "* Increased accuracy with new *k* +5, +5\n", - "* Improved confusion matrix +5, +5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can use the validation set to choose a *k* value that improves accuracy.\n", - "\n", - "Below, calculate the accuracies for different values of *k* using the validation set. Report a good *k* value and use it in the analyses that follow this section. Hint: Try values both smaller and larger than 10." - ] - }, - { - "cell_type": "code", - "execution_count": 157, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[0.49666666666666665, 0.55, 0.545, 0.505, 0.495, 0.465, 0.445]" - ] - }, - "execution_count": 157, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# TODO\n", - "k_accuracies = []\n", - "# Change values of k.\n", - "for k in [1, 5, 10, 50, 100, 150, 200]:\n", - "    # Calculate accuracies for the validation set.\n", - "    my_model.fit(features[train_indices], labels[train_indices], k=k, distance_f=distance, **kwargs_f)\n", - "    pred_ratios = my_model.predict(features[val_indices])\n", - "    pred_labels = [1 if x >= threshold else 0 for x in pred_ratios]\n", - "    tp, tn, fp, fn = conf_matrix(labels[val_indices], pred_labels)\n", - "    accuracy = (tp+tn)/len(val_indices)\n", - "    k_accuracies.append(accuracy)\n", - "k_accuracies\n", - "# Report a good k value." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 6: ROC curve analysis\n", - "* ROC curve has correct shape +20, +20" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### ROC curve and confusion matrix for the final model\n", - "ROC curves are a good way to visualize sensitivity vs. 1-specificity across different cutoff (threshold) values. Now, implement in \"model.ipynb\" a \"ROC\" function that labels the test set examples by applying different *threshold* values to the scores returned by \"predict\", and plot the ROC curve. \"ROC\" takes a list of *threshold* values to try and returns two arrays: one where each entry is the sensitivity at a given threshold, and one where each entry is the corresponding 1-specificity."
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, we can create the confusion matrix and plot the ROC curve for our optimal *k*-NN classifier. Use the *k* value you found above if you completed TASK 5; otherwise, use *k* = 10. We'll plot the ROC curve for threshold values from 0.1 to 0.9 in steps of 0.1." - ] - }, - { - "cell_type": "code", - "execution_count": 135, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "[base64-encoded PNG omitted: ROC curve plot]\n", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# TODO\n", - "# ROC curve: sensitivity (y-axis) against 1-specificity (x-axis)\n", - "my_model.fit(features[train_indices], labels[train_indices], k=100, distance_f=distance, **kwargs_f)\n", - "pred_ratios = my_model.predict(features[test_indices])\n", - "\n", - "roc_sens, roc_spec_ = ROC(labels[test_indices], pred_ratios, np.arange(0.1, 1.0, 0.1))\n", - "plt.plot(roc_spec_, roc_sens)\n", - "plt.xlabel('1 - Specificity')\n", - "plt.ylabel('Sensitivity')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 7: Assess suitability of *k*-NN to your dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use this cell to explain why *k*-NN performed well on your dataset, or why it did not. What properties of the dataset affect the performance of the algorithm?" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/ProgrammingAssignment_1/model_solution.ipynb b/ProgrammingAssignment_1/model_solution.ipynb deleted file mode 100644 index c218359..0000000 --- a/ProgrammingAssignment_1/model_solution.ipynb +++ /dev/null @@ -1,274 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# JUPYTER NOTEBOOK TIPS\n", - "\n", - "Each rectangular box is called a cell.
\n", - "* Ctrl+ENTER evaluates the current cell; if it contains Python code, it runs the code, if it contains Markdown, it returns rendered text.\n", - "* Alt+ENTER evaluates the current cell and adds a new cell below it.\n", - "* If you click to the left of a cell, you'll notice the frame changes color to blue. You can erase a cell by hitting 'dd' (that's two \"d\"s in a row) when the frame is blue." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Supervised Learning Model Skeleton\n", - "\n", - "We'll use this skeleton for implementing different supervised learning algorithms." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "class Model:\n", - " \n", - " def fit(self):\n", - " \n", - " raise NotImplementedError\n", - " \n", - " def predict(self, test_points):\n", - " raise NotImplementedError" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def preprocess(feature_file, label_file):\n", - " '''\n", - " Args:\n", - " feature_file: str \n", - " file containing features\n", - " label_file: str\n", - " file containing labels\n", - " Returns:\n", - " features: ndarray\n", - " nxd features\n", - " labels: ndarray\n", - " nx1 labels\n", - " '''\n", - " \n", - " # read in features and labels\n", - " features = np.genfromtxt(feature_file)\n", - " labels = np.genfromtxt(label_file)\n", - " \n", - " return features, labels" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "def partition(size, t, v = 0):\n", - " '''\n", - " Args:\n", - " size: int\n", - " number of examples in the whole dataset\n", - " t: float\n", - " proportion kept for test\n", - " v: float\n", - " proportion kept for validation\n", - " Returns:\n", - " test_indices: ndarray\n", - " 1D array containing test set indices\n", - " val_indices: ndarray\n", - " 1D array containing validation set 
indices\n", - " '''\n", - " \n", - " # number of test and validation examples\n", - " t_size = np.int(np.ceil(size*t))\n", - " v_size = np.int(np.ceil(size*v))\n", - "\n", - " # shuffle the indices\n", - " permuted = np.random.permutation(size)\n", - " \n", - " # spare the first t_size for test\n", - " test_indices = permuted[:t_size]\n", - " # and the next v_size for validation\n", - " val_indices = permuted[t_size+1:t_size+v_size+1]\n", - " train_indices = np.delete(np.arange(size), np.append(test_indices, val_indices), 0)\n", - " \n", - " return test_indices, val_indices, train_indices" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## TASK 1: Implement `distance` function" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\"distance\" function will be used in calculating cost of *k*-NN. It should take two data points and the name of the metric and return a scalar value." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "#TODO: Programming Assignment 1\n", - "def distance(x, y, metric):\n", - " '''\n", - " Args:\n", - " x: ndarray \n", - " 1D array containing coordinates for a point\n", - " y: ndarray\n", - " 1D array containing coordinates for a point\n", - " metric: str\n", - " Euclidean, Manhattan \n", - " Returns:\n", - " dist: float\n", - " '''\n", - " if metric == 'Euclidean':\n", - " dist = np.sqrt(np.sum(np.square((x-y))))\n", - " elif metric == 'Manhattan':\n", - " dist = np.sum(abs(x-y))\n", - " else:\n", - " raise ValueError('{} is not a valid metric.'.format(metric))\n", - " return dist # scalar distance btw x and y" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## General supervised learning performance related functions " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Implement the \"conf_matrix\" function that takes as input an array of true labels (*true*) and an array of 
predicted labels (*pred*). It should return the true positive, true negative, false positive and false negative counts as a numpy.ndarray." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Programming Assignment 1\n", - "\n", - "def conf_matrix(true, pred):\n", - " '''\n", - " Args: \n", - " true: ndarray\n", - " nx1 array of true labels for test set\n", - " pred: ndarray \n", - " nx1 array of predicted labels for test set\n", - " Returns:\n", - " ndarray: counts in the order [tp, tn, fp, fn]\n", - " '''\n", - " \n", - " tp = tn = fp = fn = 0\n", - " # calculate true positives (tp), true negatives (tn)\n", - " # false positives (fp) and false negatives (fn)\n", - " \n", - " size = len(true)\n", - " for i in range(size):\n", - " if true[i]==1:\n", - " if pred[i] > 0:\n", - " \n", - " tp += 1\n", - " else:\n", - " \n", - " fn += 1\n", - " else:\n", - " if pred[i] == 0:\n", - " \n", - " tn += 1 \n", - " else:\n", - " \n", - " fp += 1 \n", - " \n", - " # return the confusion-matrix counts as a flat numpy.ndarray [tp, tn, fp, fn]\n", - " return np.array([tp, tn, fp, fn])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "ROC curves plot sensitivity against 1-specificity as the classification cutoff varies. \"ROC\" takes a list of *threshold* values to try and returns two arrays: one holding the sensitivity at each threshold and the other holding the corresponding 1-specificity." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: Programming Assignment 1\n", - "\n", - "def ROC(true_labels, preds, value_list):\n", - " '''\n", - " Args:\n", - " true_labels: ndarray\n", - " 1D array containing true labels\n", - " preds: ndarray\n", - " 1D array containing the values to be thresholded (e.g. 
proportion of positive neighbors in kNN)\n", - " value_list: ndarray\n", - " 1D array containing different threshold values\n", - " Returns:\n", - " sens: ndarray\n", - " 1D array containing sensitivities\n", - " spec_: ndarray\n", - " 1D array containing 1-specificities\n", - " '''\n", - " \n", - " # use conf_matrix to calculate tp, tn, fp, fn\n", - " # calculate sensitivity, 1-specificity\n", - " # return two arrays\n", - " sens = []\n", - " spec_ = []\n", - " for threshold in value_list:\n", - " pred_labels = [1 if x >= threshold else 0 for x in preds]\n", - " tp, tn, fp, fn = conf_matrix(true_labels, pred_labels)\n", - " se = tp/(tp+fn)\n", - " sens.append(se)\n", - " spec = tn/(tn+fp)\n", - " spec_.append(1 - spec)\n", - " \n", - " return np.array(sens), np.array(spec_)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} -- GitLab
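For reference, the test/validation/training split performed by the deleted `partition` helper can be sketched as a standalone function. This is a minimal sketch that assumes NumPy's `default_rng` Generator API (NumPy 1.17+). The `seed` parameter is a hypothetical addition for reproducibility and is not part of the original notebook. Taking consecutive slices of one permutation keeps the three index sets disjoint and ensures together they cover every example.

```python
import numpy as np

def partition(size, t, v=0.0, seed=None):
    """Split indices 0..size-1 into disjoint test, validation and training sets.

    t, v: fractions of the dataset reserved for test and validation.
    seed: hypothetical extra argument (not in the original notebook) so the
          shuffle can be reproduced.
    """
    rng = np.random.default_rng(seed)
    t_size = int(np.ceil(size * t))
    v_size = int(np.ceil(size * v))
    permuted = rng.permutation(size)                # shuffled 0..size-1
    test_indices = permuted[:t_size]                # first t_size for test
    val_indices = permuted[t_size:t_size + v_size]  # next v_size for validation
    train_indices = permuted[t_size + v_size:]      # everything left over
    return test_indices, val_indices, train_indices

test_idx, val_idx, train_idx = partition(10, 0.2, 0.1, seed=0)
print(len(test_idx), len(val_idx), len(train_idx))  # 2 1 7
```

Unlike the `np.delete` approach, this version returns the training indices in shuffled order rather than sorted order. Sort them with `np.sort` if the order matters.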
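The `distance` function in the deleted notebook compares two points at a time. A *k*-NN prediction needs the distance from one query point to every training point. The sketch below assumes NumPy broadcasting and binary 0/1 labels, and computes all of those distances at once. It then returns the proportion of positive labels among the *k* nearest points, which is the kind of score the `ROC` docstring mentions thresholding. The helper names `distances_to_all` and `knn_positive_ratio` are hypothetical and not part of the assignment.

```python
import numpy as np

def distances_to_all(X, q, metric):
    """Distance from query q (shape (d,)) to every row of X (shape (n, d))."""
    diff = X - q                          # broadcasting: (n, d) - (d,) -> (n, d)
    if metric == 'Euclidean':
        return np.sqrt(np.sum(diff ** 2, axis=1))
    elif metric == 'Manhattan':
        return np.sum(np.abs(diff), axis=1)
    raise ValueError('{} is not a valid metric.'.format(metric))

def knn_positive_ratio(X_train, y_train, q, k, metric='Euclidean'):
    """Fraction of the k nearest training points whose label is 1."""
    d = distances_to_all(X_train, q, metric)
    nearest = np.argsort(d, kind='stable')[:k]   # indices of the k smallest distances
    return np.mean(y_train[nearest] == 1)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])
y = np.array([1, 1, 0, 0])
q = np.array([0.0, 0.0])
ratio = knn_positive_ratio(X, y, q, k=3)   # 2 of the 3 nearest points are labeled 1
```

Here the three nearest points are at distances 0, 1 and 3, so `ratio` is about 0.667. A stable sort breaks distance ties by training-set order, which keeps predictions reproducible.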
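The counting loop in `conf_matrix` and the threshold sweep in `ROC` can also be written with boolean masks. The sketch below assumes 0/1 labels and that both classes occur in the true labels. If a class is missing, the corresponding division produces `nan` with a runtime warning. `confusion_counts` and `roc_points` are hypothetical names. The return order `[tp, tn, fp, fn]` mirrors `conf_matrix`, so the same unpacking works.

```python
import numpy as np

def confusion_counts(true, pred):
    """Return [tp, tn, fp, fn] for binary 0/1 labels."""
    true = np.asarray(true) == 1
    pred = np.asarray(pred) == 1
    tp = np.sum(true & pred)
    tn = np.sum(~true & ~pred)
    fp = np.sum(~true & pred)
    fn = np.sum(true & ~pred)
    return np.array([tp, tn, fp, fn])

def roc_points(true_labels, scores, thresholds):
    """Sensitivity and 1-specificity when predicting 1 for score >= threshold."""
    scores = np.asarray(scores)
    sens, one_minus_spec = [], []
    for th in thresholds:
        pred = (scores >= th).astype(int)
        tp, tn, fp, fn = confusion_counts(true_labels, pred)
        sens.append(tp / (tp + fn))             # sensitivity = tp / (tp + fn)
        one_minus_spec.append(fp / (fp + tn))   # 1 - tn / (tn + fp)
    return np.array(sens), np.array(one_minus_spec)

y_true = np.array([1, 1, 0, 0])
scores = np.array([0.9, 0.4, 0.6, 0.1])   # e.g. proportion of positive neighbors
sens, fpr = roc_points(y_true, scores, [0.0, 0.5, 1.0])
```

With this sample data, `sens` and `fpr` are both `[1.0, 0.5, 0.0]` for thresholds 0.0, 0.5 and 1.0. A threshold of 0.0 labels everything positive, and a threshold above every score labels everything negative.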