Commit b9420489 authored by Zeynep Hakguder

PA1 missing learning curve, PA2 missing naive Bayes
%% Cell type:markdown id: tags:
# $k$-Nearest Neighbor
We'll implement the $k$-Nearest Neighbor ($k$-NN) algorithm for this assignment. A skeleton of a general supervised learning model is provided in "model.ipynb". Please look through it and complete the "preprocess" and "partition" methods.
### Assignment Goals:
In this assignment, we will:
* learn to split a dataset into training/validation/test partitions
* use the validation dataset to find a good value for $k$
* having found the "best" $k$, obtain final performance measures:
  * accuracy, generalization error, and ROC curve
%% Cell type:markdown id: tags:
You can use numpy for array operations and matplotlib for plotting in this assignment. Please do not add other libraries.
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id: tags:
The following code makes the Model class and relevant functions available from "model.ipynb".
%% Cell type:code id: tags:
``` python
%run 'model.ipynb'
```
%% Cell type:markdown id: tags:
The choice of distance metric plays an important role in the performance of $k$-NN. Let's start by implementing the "distance" function below. It should take two data points and the name of the metric and return a scalar value.
%% Cell type:code id: tags:
``` python
def distance(x, y, metric):
    '''
    x: a 1xd array
    y: a 1xd array
    metric: 'Euclidean', 'Hamming', etc.
    Returns the scalar distance between x and y.
    '''
    raise NotImplementedError
```
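%% Cell type:markdown id: tags:
For concreteness, here is a minimal sketch of what a completed "distance" could look like, assuming $x$ and $y$ are 1-d numpy arrays and $metric$ is a string. The name "distance_sketch" is only to keep it separate from your own implementation.
%% Cell type:code id: tags:
``` python
def distance_sketch(x, y, metric):
    '''A sketch of distance, not the assignment's reference solution.'''
    if metric.lower() == 'euclidean':
        return np.sqrt(np.sum((x - y) ** 2))
    elif metric.lower() == 'hamming':
        # number of coordinates where x and y differ
        return np.sum(x != y)
    else:
        raise ValueError('unknown metric: ' + metric)
```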
%% Cell type:markdown id: tags:
### $k$-NN Class Methods
%% Cell type:markdown id: tags:
We can now start implementing our $k$-NN classifier. The $k$-NN class inherits the Model class. You'll need to implement the "fit" and "predict" methods, using the "distance" function you defined above. The "fit" method takes $k$ as an argument. "predict" takes the feature vector of a single test point (and, optionally, a decision threshold) as input and outputs the predicted class along with the proportion of the $k$ nearest neighbors that share that class.
%% Cell type:code id: tags:
``` python
class kNN(Model):
    '''
    Inherits the Model class. Implements the k-NN algorithm for classification.
    '''
    def __init__(self, preprocessor_f, partition_f, distance_f):
        super().__init__(preprocessor_f, partition_f)
        # set self.distance_f and self.distance_metric

    def fit(self, k):
        '''
        Fit the model. This is pretty straightforward for k-NN: store k.
        '''
        raise NotImplementedError

    def predict(self, test_point, threshold=0.5):
        # use self.distance_f(..., self.distance_metric)
        # return the predicted class label and the following ratio:
        #   number of nearest neighbors that share the predicted label / k
        raise NotImplementedError
```
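%% Cell type:markdown id: tags:
As a sketch of how "predict" might use "distance": compute the distance from the test point to every training example, take the $k$ closest, and derive the label and ratio from their labels. The attributes self.k (assumed stored by "fit"), self.training_indices, and the binary $0/1$ labels below are assumptions, not requirements of "model.ipynb".
%% Cell type:code id: tags:
``` python
def predict_sketch(self, test_point, threshold=0.5):
    '''A sketch of predict under a binary 0/1 label assumption.'''
    train_idx = np.asarray(self.training_indices)  # assumed attribute
    dists = np.array([self.distance_f(test_point, self.features[i], self.distance_metric)
                      for i in train_idx])
    nearest = train_idx[np.argsort(dists)[:self.k]]  # indices of the k closest points
    ratio = np.mean(self.labels[nearest] == 1)       # fraction of positive neighbors
    predicted_label = 1 if ratio >= threshold else 0
    return predicted_label, ratio
```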
%% Cell type:markdown id: tags:
### Build and Evaluate the Model (Accuracy, Confidence Interval, Confusion Matrix)
%% Cell type:markdown id: tags:
It's time to build and evaluate our model. Remember that you need to provide values for the $p$ and $v$ parameters of the "partition" function and for $file\_path$ in the "preprocess" function.
%% Cell type:code id: tags:
``` python
# populate the keyword arguments dictionary kwargs
kwargs = {'p': 0.3, 'v': 0.1, 'file_path': 'mnist_test.csv', 'metric': 'Euclidean'}
# initialize the model
my_model = kNN(preprocessor_f=preprocess, partition_f=partition, distance_f=distance, **kwargs)
```
%% Cell type:markdown id: tags:
Assign a value to $k$ and fit the $k$-NN model.
%% Cell type:code id: tags:
``` python
my_model.fit(k=10)
```
%% Cell type:markdown id: tags:
You can use "predict_batch" function below to evaluate your model on the test data. You do not need to change the value of the threshold yet.
%% Cell type:code id: tags:
``` python
def predict_batch(model, indices, threshold=0.5):
    '''
    model: a fitted k-NN model
    indices: indices of the data points to predict
    threshold: lower limit on the ratio for a point to be predicted positive
    '''
    predicted_labels = []
    true_labels = []
    for index in indices:
        # vary the threshold value for ROC analysis
        label, ratio = model.predict(model.features[index], threshold)
        predicted_labels.append(label)
        true_labels.append(model.labels[index])
    return predicted_labels, true_labels
```
%% Cell type:markdown id: tags:
Use "predict_batch" function above to report your model's accuracy on the test set. Also, calculate and report the confidence interval on the generalization error estimate.
%% Cell type:code id: tags:
``` python
predicted_labels, true_labels = predict_batch(my_model, my_model.test_indices)
# Calculate accuracy and the generalization error with its confidence interval here.
```
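%% Cell type:markdown id: tags:
One standard way to get the confidence interval is the normal approximation to the binomial: with estimated error $error$ on $n$ test examples, a $95\%$ interval is $error \pm 1.96\sqrt{error(1-error)/n}$. A sketch:
%% Cell type:code id: tags:
``` python
predicted_labels = np.asarray(predicted_labels)
true_labels = np.asarray(true_labels)
accuracy = np.mean(predicted_labels == true_labels)
error = 1 - accuracy                      # generalization error estimate
n = len(true_labels)
# 95% confidence interval via the normal approximation to the binomial
half_width = 1.96 * np.sqrt(error * (1 - error) / n)
print('accuracy: {:.3f}'.format(accuracy))
print('error: {:.3f} +/- {:.3f}'.format(error, half_width))
```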
%% Cell type:markdown id: tags:
Now that we have the true labels and our model's predictions, we can build a confusion matrix and see where the model goes wrong. Implement the "conf_matrix" function that takes an array of true labels ($true$) and an array of predicted labels ($pred$) as input. It should output the confusion matrix as a numpy.ndarray.
%% Cell type:code id: tags:
``` python
def conf_matrix(true, pred):
    '''
    true: nx1 array of true labels for the test set
    pred: nx1 array of predicted labels for the test set
    Returns the confusion matrix as a numpy.ndarray.
    '''
    raise NotImplementedError
```
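%% Cell type:markdown id: tags:
A possible shape for "conf_matrix", assuming integer class labels $0, \ldots, c-1$; entry $[i, j]$ counts examples with true class $i$ and predicted class $j$:
%% Cell type:code id: tags:
``` python
def conf_matrix_sketch(true, pred):
    '''A sketch of conf_matrix for integer labels 0..c-1.'''
    true = np.asarray(true, dtype=int).ravel()
    pred = np.asarray(pred, dtype=int).ravel()
    c = int(max(true.max(), pred.max())) + 1
    c_mat = np.zeros((c, c), dtype=int)
    for t, p in zip(true, pred):
        c_mat[t, p] += 1   # row: true class, column: predicted class
    return c_mat
```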
%% Cell type:markdown id: tags:
### Finding a good value for $k$
We can use the validation set to choose a $k$ value that gives better accuracy. Additionally, in some applications, correctly predicting examples from a certain class is more critical than for the other classes. In those cases, we can use the confusion matrix to find a good trade-off between correct and wrong predictions, allowing more wrong predictions in some classes in order to predict more examples correctly in the critical class.
Below, calculate the accuracies and confusion matrices for different values of $k$ using the validation set. Report a good $k$ value and use it in the analyses that follow this section.
%% Cell type:code id: tags:
``` python
# Try different values of k.
# Calculate accuracies and confusion matrices for the validation set.
# Report a good k value to use in the following analyses.
```
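%% Cell type:markdown id: tags:
A sketch of such a sweep, assuming "model.ipynb" exposes the validation partition as my_model.val_indices (it is used the same way later in this notebook); the candidate $k$ values are placeholders:
%% Cell type:code id: tags:
``` python
val_accuracies = {}
for k in [1, 3, 5, 10, 20]:          # candidate values; adjust as needed
    my_model.fit(k=k)
    pred, true = predict_batch(my_model, my_model.val_indices)
    val_accuracies[k] = np.mean(np.asarray(pred) == np.asarray(true))
    print('k =', k, 'accuracy =', val_accuracies[k])
    print(conf_matrix(true, pred))
best_k = max(val_accuracies, key=val_accuracies.get)
```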
%% Cell type:markdown id: tags:
### ROC curve and confusion matrix for the final model
ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cut-off points. Now implement a "ROC" function that predicts the labels of the test set examples using different $threshold$ values in "predict", and plot the ROC curve. "ROC" takes a list of $threshold$ values to try and returns two arrays: one where each entry is the sensitivity at a given threshold, and another where each entry is the corresponding 1-specificity.
%% Cell type:code id: tags:
``` python
def ROC(model, indices, value_list):
    '''
    model: a fitted k-NN model
    indices: indices of the data points to predict
    value_list: array containing different threshold values
    Calculates sensitivity and 1-specificity for each value in value_list.
    Returns two nx1 arrays: sens (sensitivities) and spec_ (1-specificities).
    '''
    # use predict_batch to obtain predicted labels at different threshold values
    raise NotImplementedError
```
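%% Cell type:markdown id: tags:
A sketch of "ROC" under the same binary $0/1$ label assumption used above (positive class labeled $1$):
%% Cell type:code id: tags:
``` python
def ROC_sketch(model, indices, value_list):
    '''A sketch of ROC; the positive class is assumed to be labeled 1.'''
    sens, spec_ = [], []
    for threshold in value_list:
        pred, true = predict_batch(model, indices, threshold)
        pred, true = np.asarray(pred), np.asarray(true)
        tp = np.sum((pred == 1) & (true == 1))
        fn = np.sum((pred == 0) & (true == 1))
        fp = np.sum((pred == 1) & (true == 0))
        tn = np.sum((pred == 0) & (true == 0))
        sens.append(tp / (tp + fn))    # sensitivity = true positive rate
        spec_.append(fp / (fp + tn))   # 1 - specificity = false positive rate
    return np.array(sens), np.array(spec_)
```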
%% Cell type:markdown id: tags:
We can finally create the confusion matrix and plot the ROC curve for our optimal $k$-NN classifier.
%% Cell type:code id: tags:
``` python
# confusion matrix
predicted_labels, true_labels = predict_batch(my_model, my_model.test_indices)
conf_matrix(true_labels, predicted_labels)
```
%% Cell type:code id: tags:
``` python
# ROC curve
roc_sens, roc_spec_ = ROC(my_model, my_model.test_indices, np.arange(0.1, 1.0, 0.1))
plt.plot(roc_spec_, roc_sens)  # x: 1-specificity, y: sensitivity
plt.xlabel('1 - specificity')
plt.ylabel('sensitivity')
plt.show()
```
%% Cell type:markdown id: tags:
# Linear Regression & Naive Bayes
We'll implement the linear regression and naive Bayes algorithms for this assignment. Please modify the "preprocess" and "partition" methods in "model.ipynb" to suit your datasets for this assignment. This time we have a small dataset available to us, so we won't have examples to spare for a validation set; instead, we'll use cross-validation to tune hyperparameters.
### Assignment Goals:
In this assignment, we will:
* implement linear regression
* use gradient descent for optimization
* use residuals to decide if we need a polynomial model
* change our model to quadratic/cubic regression and use cross-validation to find the "best" polynomial degree
* implement regularization techniques
* $l_1$/$l_2$ regularization
* use cross-validation to find a good regularization parameter $\lambda$
* implement Naive Bayes
  * address the sparse data problem with **pseudocounts** (the **$m$-estimate**; see the note after this list)
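
As a reminder, a common form of the $m$-estimate replaces the maximum-likelihood estimate $n_c/n$ of a conditional probability with

$$P(x_i \mid c) = \frac{n_c + mp}{n + m}$$

where $n$ is the number of training examples of class $c$, $n_c$ is the number of those that have feature value $x_i$, $p$ is a prior estimate of the probability (e.g. uniform over the possible feature values), and $m$ is the equivalent sample size of the prior. Choosing $p$ uniform and $m$ equal to the number of distinct feature values gives Laplace (add-one) smoothing.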
%% Cell type:markdown id: tags:
You can use numpy for array operations and matplotlib for plotting in this assignment. Please do not add other libraries.
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id: tags:
The following code makes the Model class and relevant functions available from "model.ipynb".
%% Cell type:code id: tags:
``` python
%run 'model.ipynb'
```
%% Cell type:code id: tags:
``` python
def mse(y_pred, y_true):
    '''
    y_pred: values predicted by our method
    y_true: true y values
    Returns the mean squared error between y_pred and y_true.
    '''
    raise NotImplementedError
```
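%% Cell type:markdown id: tags:
A minimal sketch of "mse", assuming y_pred and y_true are arrays of the same shape:
%% Cell type:code id: tags:
``` python
def mse_sketch(y_pred, y_true):
    '''Mean squared error: the average of (y_pred - y_true)^2.'''
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean((y_pred - y_true) ** 2)
```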
%% Cell type:markdown id: tags:
We'll start by implementing a partition function for $k$-fold cross-validation. $5$ and $10$ are commonly used values for $k$. You can use either one of them.
%% Cell type:code id: tags:
``` python
def kfold(k):
    '''
    k: number of desired splits of the data.
    Assumes the test set is already separated.
    Chooses 1/k of the training indices uniformly at random and separates
    them out as the validation partition.
    Returns the selected 1/k of the indices as val_indices and the
    remaining (k-1)/k as train_indices.
    '''
    raise NotImplementedError
```
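%% Cell type:markdown id: tags:
A sketch of the selection step. The explicit training_indices argument is an assumption; how the training indices are actually exposed depends on your "model.ipynb".
%% Cell type:code id: tags:
``` python
def kfold_sketch(k, training_indices):
    '''A sketch of kfold with the training indices passed in explicitly.'''
    shuffled = np.random.permutation(training_indices)
    n_val = len(shuffled) // k           # size of the validation partition
    val_indices = shuffled[:n_val]
    train_indices = shuffled[n_val:]
    return train_indices, val_indices
```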
%% Cell type:code id: tags:
``` python
class linear_regression(Model):
    def __init__(self, preprocessor_f, partition_f, **kwargs):
        super().__init__(preprocessor_f, partition_f, **kwargs)
        # initialize the model parameters
        self.theta = None

    # You can disregard polynomial_degree and regularizer in your first pass.
    def fit(self, learning_rate=0.001, epochs=1000, regularizer=None, polynomial_degree=1, **kwargs):
        for epoch in range(epochs):
            # compute y_hat, the model predictions for the training examples
            y_hat = None
            # use the mse function to find the cost
            cost = None
            # calculate the gradient of the cost wrt theta
            grad_theta = None
            # update theta
            theta_curr = None
            raise NotImplementedError
        return self.theta

    def predict(self, indices):
        # returns y_hat, the predictions for the given indices
        raise NotImplementedError
```
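%% Cell type:markdown id: tags:
For the plain (unregularized) linear model $\hat{y} = X\theta$, the gradient of the MSE cost with respect to $\theta$ is $\frac{2}{n} X^T (X\theta - y)$. A sketch of one gradient-descent epoch, assuming $X$ is an $n \times d$ design matrix (with a bias column) and $y$ an $n$-vector:
%% Cell type:code id: tags:
``` python
def gd_epoch_sketch(X, y, theta, learning_rate):
    '''One gradient-descent step on the MSE cost; a sketch, not the full fit.'''
    y_hat = X @ theta                                 # model predictions
    grad_theta = 2.0 / len(y) * X.T @ (y_hat - y)     # gradient of MSE wrt theta
    return theta - learning_rate * grad_theta         # updated parameters
```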
%% Cell type:code id: tags:
``` python
# populate the keyword arguments dictionary kwargs
kwargs = {'p': 0.3, 'v': 0.0, 'file_path': 'mnist_test.csv', 'k': 5}
# initialize the model
my_model = linear_regression(preprocessor_f=preprocess, partition_f=partition, k_fold=True, **kwargs)
```
%% Cell type:code id: tags:
``` python
# use fit_kwargs to pass arguments to regularization function
fit_kwargs = {}
my_model.fit(**fit_kwargs)
```
%% Cell type:markdown id: tags:
Residuals are the differences between the true value $y$ and the predicted value $\hat{y}$ for each example. Predict $\hat{y}$ for the validation set, then calculate and plot the residuals.
%% Cell type:code id: tags:
``` python
y_hat_val = my_model.predict(my_model.val_indices)
residuals = my_model.labels[my_model.val_indices] - y_hat_val
plt.plot(residuals)
plt.show()
```
%% Cell type:markdown id: tags:
If the data is better suited to quadratic/cubic regression, regions of positive and negative residuals will alternate in the plot. Regardless, modify "fit" and "predict" in the class definition to raise the feature values to $polynomial\_degree$. Make the modification directly in the definition above; do not repeat it here. Then use the validation set to find the polynomial degree that results in the lowest "mse".
%% Cell type:code id: tags:
``` python
# calculate mse for the linear model
fit_kwargs = {}
my_model.fit(polynomial_degree=1, **fit_kwargs)
pred_1 = my_model.predict(my_model.val_indices)
mse_1 = mse(pred_1, my_model.labels[my_model.val_indices])
# calculate mse for the quadratic model
my_model.fit(polynomial_degree=2, **fit_kwargs)
pred_2 = my_model.predict(my_model.val_indices)
mse_2 = mse(pred_2, my_model.labels[my_model.val_indices])
# calculate mse for the cubic model
my_model.fit(polynomial_degree=3, **fit_kwargs)
pred_3 = my_model.predict(my_model.val_indices)
mse_3 = mse(pred_3, my_model.labels[my_model.val_indices])
```
%% Cell type:markdown id: tags:
Define "regularization" function which implements $l_1$ and $l_2$ regularization. You'll use this function in "fit" method of "linear_regression" class.
%% Cell type:code id: tags:
``` python
def regularization(method):
    '''
    method: 'l1' or 'l2'
    '''
    if method == 'l1':
        raise NotImplementedError
    elif method == 'l2':
        raise NotImplementedError
```
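%% Cell type:markdown id: tags:
As a sketch of what the two branches could compute when the penalty contributes to the gradient update. Passing theta and the regularization parameter lam explicitly is an assumption, since the skeleton above only receives the method name:
%% Cell type:code id: tags:
``` python
def regularization_sketch(method, theta, lam):
    '''Gradient contribution of the penalty term; a sketch only.'''
    if method == 'l1':
        return lam * np.sign(theta)    # subgradient of lam * ||theta||_1
    elif method == 'l2':
        return 2 * lam * theta         # gradient of lam * ||theta||_2^2
```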
%% Cell type:markdown id: tags:
Using the validation set and the value of $polynomial\_degree$ you found above, try different values of $\lambda$ to find a good value that results in low "mse". You can use cross-validation with the "kfold" function you implemented above.