Skip to content
Snippets Groups Projects
model.ipynb 6.65 KiB
Newer Older
Zeynep Hakguder's avatar
pa1
Zeynep Hakguder committed
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# JUPYTER NOTEBOOK TIPS\n",
    "\n",
    "Each rectangular box is called a cell. \n",
    "* Ctrl+ENTER evaluates the current cell; if it contains Python code, it runs the code, if it contains Markdown, it returns rendered text.\n",
    "* Alt+ENTER evaluates the current cell and adds a new cell below it.\n",
    "* If you click to the left of a cell, you'll notice the frame changes color to blue. You can erase a cell by hitting 'dd' (that's two \"d\"s in a row) when the frame is blue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Learning Model Skeleton\n",
    "\n",
    "We'll use this skeleton for implementing different supervised learning algorithms."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Model:\n",
    "        \n",
    "    def fit(self):\n",
    "        \n",
    "        raise NotImplementedError\n",
    "    \n",
    "    def predict(self, test_points):\n",
    "        raise NotImplementedError"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess(feature_file, label_file):\n",
    "    '''\n",
    "    Args:\n",
    "        feature_file: str \n",
    "            file containing features\n",
    "        label_file: str\n",
    "            file containing labels\n",
    "    Returns:\n",
    "        features: ndarray\n",
    "            nxd features\n",
    "        labels: ndarray\n",
    "            nx1 labels\n",
    "    '''\n",
    "    \n",
    "    # read in features and labels\n",
Zeynep Hakguder's avatar
pa1
Zeynep Hakguder committed
    "    return features, labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "def partition(size, t, v = 0):\n",
    "    '''\n",
    "    Args:\n",
    "        size: int\n",
    "            number of examples in the whole dataset\n",
    "        t: float\n",
    "            proportion kept for test\n",
    "        v: float\n",
    "            proportion kept for validation\n",
    "    Returns:\n",
    "        test_indices: ndarray\n",
    "            1D array containing test set indices\n",
    "        val_indices: ndarray\n",
    "            1D array containing validation set indices\n",
    "    '''\n",
    "    \n",
    "    # number of test and validation examples\n",
    "    \n",
    "    return test_indices, val_indices, train_indices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TASK 1: Implement `distance` function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\"distance\" function will be used in calculating cost of *k*-NN. It should take two data points and the name of the metric and return a scalar value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#TODO: Programming Assignment 1\n",
    "def distance(x, y, metric):\n",
    "    '''\n",
    "    Args:\n",
    "        x: ndarray \n",
    "            1D array containing coordinates for a point\n",
    "        y: ndarray\n",
    "            1D array containing coordinates for a point\n",
    "        metric: str\n",
    "            Euclidean, Manhattan \n",
    "    Returns:\n",
    "        dist: float\n",
    "    '''\n",
Zeynep Hakguder's avatar
Zeynep Hakguder committed
    "if metric == 'Euclidean':\n",
        "raise NotImplementedError",
    "elif metric == 'Manhattan':\n",
        "raise NotImplementedError\n",
    "else:\n",
        "raise ValueError('{} is not a valid metric.'.format(metric))\n",
Zeynep Hakguder's avatar
pa1
Zeynep Hakguder committed
    "    return dist # scalar distance btw x and y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## General supervised learning performance related functions "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Implement the \"conf_matrix\" function that takes as input an array of true labels (*true*) and an array of predicted labels (*pred*). It should output a numpy.ndarray."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Programming Assignment 1\n",
    "\n",
    "def conf_matrix(true, pred, n_classes):\n",
    "    '''\n",
    "    Args:    \n",
    "        true:  ndarray\n",
    "            nx1 array of true labels for test set\n",
    "        pred: ndarray \n",
    "            nx1 array of predicted labels for test set\n",
    "        n_classes: int\n",
    "    Returns:\n",
    "        result: ndarray\n",
    "            n_classes x n_classes array confusion matrix\n",
    "    '''\n",
    "    raise NotImplementedError\n",
    "    result = np.ndarray([n_classes, n_classes])\n",
    "    \n",
    "    \n",
    "    # returns the confusion matrix as numpy.ndarray\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cut off points. \"ROC\" takes a list containing different *threshold* parameter values to try and returns two arrays; one where each entry is the sensitivity at a given threshold and the other where entries are 1-specificities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Programming Assignment 1\n",
    "\n",
    "def ROC(true_labels, preds, value_list):\n",
    "    '''\n",
    "    Args:\n",
    "        true_labels: ndarray\n",
    "            1D array containing true labels\n",
    "        preds: ndarray\n",
    "            1D array containing thresholded value (e.g. proportion of neighbors in kNN)\n",
    "        value_list: ndarray\n",
    "            1D array containing different threshold values\n",
    "    Returns:\n",
    "        sens: ndarray\n",
    "            1D array containing sensitivities\n",
    "        spec_: ndarray\n",
    "            1D array containing 1-specifities\n",
    "    '''\n",
    "    \n",
    "    # calculate sensitivity, 1-specificity\n",
    "    # return two arrays\n",
    "    \n",
    "    raise NotImplementedError\n",
    "    \n",
    "    return sens, spec_"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}