Update model.ipynb

f3d2919a · Zeynep Hakguder · 8b26d097 · f3d2919a
Commit f3d2919a authored Jun 6, 2018 by Zeynep Hakguder
--- a/ProgrammingAssignment_1/model.ipynb
+++ b/ProgrammingAssignment_1/model.ipynb
 %% Cell type:markdown id: tags:
 # JUPYTER NOTEBOOK TIPS
 Each rectangular box is called a cell.
 * Ctrl+ENTER evaluates the current cell: if it contains Python code, it runs the code; if it contains Markdown, it renders the text.
 * Alt+ENTER evaluates the current cell and adds a new cell below it.
 * If you click to the left of a cell, you'll notice the frame changes color to blue. You can erase a cell by hitting 'dd' (that's two "d"s in a row) when the frame is blue.
 %% Cell type:markdown id: tags:
 # Supervised Learning Model Skeleton
 We'll use this skeleton for implementing different supervised learning algorithms.
 %% Cell type:code id: tags:
 ``` python
 class Model:
    def fit(self):
        raise NotImplementedError
    def predict(self, test_points):
        raise NotImplementedError
 ```
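As a rough illustration of how the skeleton might be subclassed, here is a minimal sketch of a baseline model. `MajorityClassModel` and its `train_labels` constructor argument are hypothetical names, not part of the assignment. Because `fit` takes no arguments in the skeleton, the sketch assumes the training data is handed to the constructor.

```python
from collections import Counter


class Model:
    def fit(self):
        raise NotImplementedError

    def predict(self, test_points):
        raise NotImplementedError


class MajorityClassModel(Model):
    """Hypothetical baseline: always predicts the most frequent training label."""

    def __init__(self, train_labels):
        # Assumption: fit() takes no arguments, so the labels are stored here.
        self.train_labels = list(train_labels)
        self.majority_label = None

    def fit(self):
        # Remember the single most common label seen in training.
        self.majority_label = Counter(self.train_labels).most_common(1)[0][0]
        return self

    def predict(self, test_points):
        if self.majority_label is None:
            raise RuntimeError('call fit() before predict()')
        # One prediction per test point, regardless of its features.
        return [self.majority_label for _ in test_points]
```

A model built this way would be used as `MajorityClassModel(labels).fit().predict(points)`.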
 %% Cell type:code id: tags:
 ``` python
 def preprocess(feature_file, label_file):
    '''
    Args:
        feature_file: str
            file containing features
        label_file: str
            file containing labels
    Returns:
        features: ndarray
            nxd features
        labels: ndarray
            nx1 labels
    '''
    # read in features and labels
    return features, labels
 ```
 %% Cell type:code id: tags:
 ``` python
 def partition(size, t, v = 0):
    '''
    Args:
        size: int
            number of examples in the whole dataset
        t: float
            proportion kept for test
        v: float
            proportion kept for validation
    Returns:
        test_indices: ndarray
            1D array containing test set indices
        val_indices: ndarray
            1D array containing validation set indices
        train_indices: ndarray
            1D array containing training set indices
    '''
    # number of test and validation examples
    return test_indices, val_indices, train_indices
 ```
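One possible way to fill in `partition` is sketched below. It assumes a random shuffle of the indices followed by slicing, and it rounds the test and validation counts to the nearest integer. The `seed` parameter is a hypothetical addition for reproducibility and is not in the skeleton's signature.

```python
import numpy as np


def partition(size, t, v=0, seed=None):
    # seed is a hypothetical extra parameter, added here only for reproducibility
    rng = np.random.default_rng(seed)
    permuted = rng.permutation(size)

    # number of test and validation examples (rounded; an assumption)
    n_test = int(round(size * t))
    n_val = int(round(size * v))

    # consecutive, non-overlapping slices of the shuffled indices
    test_indices = permuted[:n_test]
    val_indices = permuted[n_test:n_test + n_val]
    train_indices = permuted[n_test + n_val:]
    return test_indices, val_indices, train_indices
```

Slicing one shuffled permutation keeps the three index sets disjoint, so no example ends up in more than one split.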
 %% Cell type:markdown id: tags:
 ## TASK 1: Implement `distance` function
 %% Cell type:markdown id: tags:
 The `distance` function will be used in calculating the cost of *k*-NN. It should take two data points and the name of the metric (`Euclidean` or `Manhattan`) and return a scalar value.
 %% Cell type:code id: tags:
 ``` python
 #TODO: Programming Assignment 1
 def distance(x, y, metric):
    '''
    Args:
        x: ndarray
            1D array containing coordinates for a point
        y: ndarray
            1D array containing coordinates for a point
        metric: str
            Euclidean, Manhattan
    Returns:
        dist: float
    '''
 if metric == 'Euclidean':
-raise NotImplementedError
+   raise NotImplementedError
 elif metric == 'Manhattan':
-raise NotImplementedError
+   raise NotImplementedError
 else:
-raise ValueError('{} is not a valid metric.'.format(metric))
+   raise ValueError('{} is not a valid metric.'.format(metric))
-    return dist # scalar distance btw x and y
+   return dist # scalar distance btw x and y
 ```
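For reference, the two metrics named in the docstring could be computed as in the sketch below. This is one possible completion of the TODO under the assumption that `x` and `y` have the same length, not the assignment's official solution.

```python
import numpy as np


def distance(x, y, metric):
    # Convert to float arrays so lists and integer arrays are handled uniformly.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if metric == 'Euclidean':
        # square root of the sum of squared coordinate differences
        dist = np.sqrt(np.sum((x - y) ** 2))
    elif metric == 'Manhattan':
        # sum of absolute coordinate differences
        dist = np.sum(np.abs(x - y))
    else:
        raise ValueError('{} is not a valid metric.'.format(metric))
    return float(dist)  # scalar distance between x and y
```

For example, the points `[0, 0]` and `[3, 4]` are 5 apart under the Euclidean metric and 7 apart under the Manhattan metric.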
 %% Cell type:markdown id: tags:
 ## General supervised learning performance related functions
 %% Cell type:markdown id: tags:
 Implement the `conf_matrix` function, which takes as input an array of true labels (*true*), an array of predicted labels (*pred*), and the number of classes (*n_classes*). It should return an `n_classes` x `n_classes` numpy.ndarray.
 %% Cell type:code id: tags:
 ``` python
 # TODO: Programming Assignment 1
 def conf_matrix(true, pred, n_classes):
    '''
    Args:
        true:  ndarray
            nx1 array of true labels for test set
        pred: ndarray
            nx1 array of predicted labels for test set
        n_classes: int
    Returns:
        result: ndarray
            n_classes x n_classes array confusion matrix
    '''
    raise NotImplementedError
    result = np.zeros((n_classes, n_classes), dtype=int)  # zero-initialized counts; assumes numpy is imported as np
    # returns the confusion matrix as numpy.ndarray
    return result
 ```
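A minimal sketch of how the confusion matrix might be filled in follows. It assumes labels are integers in `0..n_classes-1` and uses rows for true labels and columns for predicted labels. The skeleton does not state the orientation, so treat that convention as an assumption.

```python
import numpy as np


def conf_matrix(true, pred, n_classes):
    # Assumption: labels are integers in 0..n_classes-1.
    true = np.asarray(true, dtype=int).ravel()
    pred = np.asarray(pred, dtype=int).ravel()

    # Assumption: rows index the true class, columns the predicted class.
    result = np.zeros((n_classes, n_classes), dtype=int)

    # Unbuffered in-place add, so repeated (true, pred) pairs are all counted.
    np.add.at(result, (true, pred), 1)
    return result
```

With `true = [0, 0, 1, 1]` and `pred = [0, 1, 1, 1]`, entry `[0, 1]` counts the single class-0 example that was predicted as class 1.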
 %% Cell type:markdown id: tags:
 ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cutoff points. `ROC` takes a list of *threshold* values to try and returns two arrays: one where each entry is the sensitivity at a given threshold, and another where each entry is the corresponding 1-specificity.
 %% Cell type:code id: tags:
 ``` python
 # TODO: Programming Assignment 1
 def ROC(true_labels, preds, value_list):
    '''
    Args:
        true_labels: ndarray
            1D array containing true labels
        preds: ndarray
            1D array containing the values to be thresholded (e.g. proportion of neighbors in kNN)
        value_list: ndarray
            1D array containing different threshold values
    Returns:
        sens: ndarray
            1D array containing sensitivities
        spec_: ndarray
            1D array containing 1-specificities
    '''
    # calculate sensitivity, 1-specificity
    # return two arrays
    raise NotImplementedError
    return sens, spec_
 ```
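The computation described for `ROC` could look roughly like the sketch below. It assumes binary labels with 1 as the positive class and 0 as the negative class, and it assumes an example is predicted positive when its value is greater than or equal to the threshold. Neither convention is fixed by the skeleton.

```python
import numpy as np


def ROC(true_labels, preds, value_list):
    # Assumptions: labels are 0/1 with 1 = positive; value >= threshold means "predict positive".
    true_labels = np.asarray(true_labels).ravel()
    preds = np.asarray(preds, dtype=float).ravel()
    positives = true_labels == 1
    negatives = ~positives

    sens = np.zeros(len(value_list))
    spec_ = np.zeros(len(value_list))
    for i, threshold in enumerate(value_list):
        predicted_pos = preds >= threshold
        tp = np.sum(predicted_pos & positives)  # true positives
        fp = np.sum(predicted_pos & negatives)  # false positives
        # sensitivity = TP / P and 1-specificity = FP / N; max(..., 1) avoids dividing by zero
        sens[i] = tp / max(np.sum(positives), 1)
        spec_[i] = fp / max(np.sum(negatives), 1)
    return sens, spec_
```

Plotting `spec_` on the horizontal axis against `sens` on the vertical axis then gives the ROC curve.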