"* ctrl+ENTER evaluates the current cell; if it contains Python code, it runs the code, if it contains Markdown, it returns rendered text.\n",
"* alt+ENTER evaluates the current cell and adds a new cell below it.\n",
"* If you click to the left of a cell, you'll notice the frame changes color to blue. You can erase a cell by hitting 'dd' (that's two \"d\"s in a row) when the frame is blue."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Supervised Learning Model Skeleton\n",
"\n",
"We'll use this skeleton for implementing different supervised learning algorithms."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"class Model:\n",
" \n",
" def fit(self):\n",
" \n",
" raise NotImplementedError\n",
" \n",
" def predict(self, test_points):\n",
" raise NotImplementedError"
]
},
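{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hypothetical illustration of how the skeleton is used (not part of the assignment), a concrete model subclasses `Model` and overrides `fit` and `predict`. The `MeanPredictor` below simply remembers the mean of its training labels; its `fit` taking the labels as an argument is an assumption about the eventual interface."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical example model: always predicts the mean of the training labels.\n",
"class MeanPredictor(Model):\n",
"    \n",
"    def fit(self, labels):\n",
"        # remember the mean of the training labels\n",
"        self.mean = np.mean(labels)\n",
"    \n",
"    def predict(self, test_points):\n",
"        # predict the stored mean for every test point\n",
"        return np.full(len(test_points), self.mean)"
]
},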
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def preprocess(data_f, feature_names_f):\n",
" '''\n",
" data_f: where to read the dataset from\n",
" feature_names_f: where to read the feature names from\n",
" Returns:\n",
" features: ndarray\n",
" nxd array containing `float` feature values\n",
" labels: ndarray\n",
" 1D array containing `float` label\n",
" '''\n",
" # You might find np.genfromtxt useful for reading in the file. Be careful with the file delimiter, \n",
" # e.g. for comma-separated files use delimiter=',' argument.\n",
"In cases where data is not abundantly available, we resort to getting an error estimate from average of error on different splits of dataset. In this case, every fold of data is used for testing and for training in turns, i.e. assuming we split our data into 3 folds, we'd\n",
"* train our model on fold-1+fold-2 and test on fold-3\n",
"* train our model on fold-1+fold-3 and test on fold-2\n",
"* train our model on fold-2+fold-3 and test on fold-1.\n",
"\n",
"We'd use the average of the error we obtained in three runs as our error estimate. \n",
"\n",
"Implement function \"kfold\" below.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Programming Assignment 2\n",
"\n",
"def kfold(indices, k):\n",
"\n",
" '''\n",
" Args:\n",
" indices: ndarray\n",
" 1D array with integer entries containing indices\n",
" k: int \n",
" Number of desired splits in data.(Assume test set is already separated.)\n",
" Returns:\n",
" fold_dict: dict\n",
" A dictionary with integer keys corresponding to folds. Values are (training_indices, val_indices).\n",
" \n",
" val_indices: ndarray\n",
" 1/k of training indices randomly chosen and separates them as validation partition.\n",
" train_indices: ndarray\n",
" Remaining 1-(1/k) of the indices.\n",
" \n",
" e.g. fold_dict = {0: (train_0_indices, val_0_indices), \n",
" 1: (train_0_indices, val_0_indices), 2: (train_0_indices, val_0_indices)} for k = 3\n",
" '''\n",
" \n",
" return fold_dict"
]
},
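{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a hedged usage sketch (not the required implementation) of how the fold dictionary returned by `kfold` could drive the averaging described above. The names `model`, `features` and `labels`, and the assumption that the model exposes `fit(features, labels)` and `predict(points)`, are placeholders for whatever you implement later; `mse` is the error function defined further below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative sketch only: estimate error as the average validation error over k folds.\n",
"# Assumes kfold and mse (defined in this notebook) and a model whose fit/predict take data.\n",
"def cross_val_error(model, features, labels, k):\n",
"    indices = np.arange(len(labels))\n",
"    fold_dict = kfold(indices, k)\n",
"    errors = []\n",
"    for fold in range(k):\n",
"        train_idx, val_idx = fold_dict[fold]\n",
"        # train on the k-1 training folds, evaluate on the held-out fold\n",
"        model.fit(features[train_idx], labels[train_idx])\n",
"        errors.append(mse(model.predict(features[val_idx]), labels[val_idx]))\n",
"    # the cross-validation estimate is the mean of the per-fold errors\n",
"    return np.mean(errors)"
]
},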
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement \"mse\" and regularization functions. They will be used in the fit method of linear regression."
]
},
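{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the standard definitions (stated here as a reminder, not as a prescription of the required interface): for predictions $\\hat{y}_i$ and true values $y_i$, $i = 1, \\dots, n$,\n",
"\n",
"$$\\mathrm{MSE} = \\frac{1}{n} \\sum_{i=1}^{n} (\\hat{y}_i - y_i)^2,$$\n",
"\n",
"and the usual penalties on a weight vector $w$ are $\\sum_j |w_j|$ for \"l1\" and $\\sum_j w_j^2$ for \"l2\"."
]
},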
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#TODO: Programming Assignment 2\n",
"def mse(y_pred, y_true):\n",
" '''\n",
" Args:\n",
" y_hat: ndarray \n",
" 1D array containing data with `float` type. Values predicted by our method\n",
" y_true: ndarray\n",
" 1D array containing data with `float` type. True y values\n",
" Returns:\n",
" cost: ndarray\n",
" 1D array containing mean squared error between y_pred and y_true.\n",
" \n",
" '''\n",
" raise NotImplementedError\n",
"\n",
" return cost\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#TODO: Programming Assignment 2\n",
"def regularization(weights, method):\n",
" '''\n",
" Args:\n",
" weights: ndarray\n",
" 1D array with `float` entries\n",
" method: str\n",
" Returns:\n",
" value: float\n",
" A single value. Regularization term that will be used in cost function in fit.\n",
" '''\n",
" if method == \"l1\":\n",
" value = None\n",
" elif method == \"l2\":\n",
" value = None\n",
" raise NotImplementedError\n",
" return value"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## General supervised learning performance related functions "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement the \"conf_matrix\" function that takes as input an array of true labels (*true*) and an array of predicted labels (*pred*)."
" # false positives (fp) and false negatives (fn)\n",
" \n",
" size = len(true)\n",
" for i in range(size):\n",
" if true[i]==1:\n",
" if pred[i] == 1: \n",
" tp += 1\n",
" else: \n",
" fn += 1\n",
" else:\n",
" if pred[i] == 0:\n",
" tn += 1 \n",
" else:\n",
" fp += 1 \n",
" \n",
" # returns the confusion matrix as numpy.ndarray\n",
" return np.array([tp,tn, fp, fn])"
]
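},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check for `conf_matrix` on hypothetical toy labels (the arrays below are made up for illustration and are not taken from any assignment data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical toy example: 5 points with binary labels\n",
"conf_matrix(np.array([1, 0, 1, 1, 0]), np.array([1, 0, 0, 1, 1]))\n",
"# expected: array([2, 1, 1, 1]), i.e. [tp, tn, fp, fn]"
]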
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}