Commit f3d2919a authored by Zeynep Hakguder

Update model.ipynb

parent 8b26d097
%% Cell type:markdown id: tags:
# JUPYTER NOTEBOOK TIPS
Each rectangular box is called a cell.
* Ctrl+Enter evaluates the current cell: if it contains Python code, it runs the code; if it contains Markdown, it renders the text.
* Alt+Enter evaluates the current cell and inserts a new cell below it.
* If you click to the left of a cell, its frame turns blue. While the frame is blue, you can delete the cell by pressing 'dd' (that's two "d"s in a row).
%% Cell type:markdown id: tags:
# Supervised Learning Model Skeleton
We'll use this skeleton for implementing different supervised learning algorithms.
%% Cell type:code id: tags:
``` python
class Model:
    def fit(self):
        raise NotImplementedError

    def predict(self, test_points):
        raise NotImplementedError
```
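%% Cell type:markdown id: tags:
The following sketch shows one way the skeleton might be extended. It subclasses `Model` with a hypothetical `MajorityClass` baseline that always predicts the most frequent training label. The skeleton's `fit` takes no arguments, so this sketch assumes the training data is passed to the constructor instead. Both the class name and that design choice are illustrative assumptions, not part of the assignment.
%% Cell type:code id: tags:
``` python
from collections import Counter

import numpy as np


class Model:
    def fit(self):
        raise NotImplementedError

    def predict(self, test_points):
        raise NotImplementedError


class MajorityClass(Model):
    '''Hypothetical baseline: predicts the most common training label.'''

    def __init__(self, features, labels):
        # assumption: training data is stored at construction time,
        # because the skeleton's fit() takes no arguments
        self.features = np.asarray(features)
        self.labels = np.asarray(labels).ravel()
        self.majority = None

    def fit(self):
        # most_common(1) returns [(label, count)] for the most frequent label
        self.majority = Counter(self.labels.tolist()).most_common(1)[0][0]
        return self

    def predict(self, test_points):
        if self.majority is None:
            raise RuntimeError('call fit() before predict()')
        # one prediction per test point
        return np.full(len(test_points), self.majority)
```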
%% Cell type:code id: tags:
``` python
def preprocess(feature_file, label_file):
    '''
    Args:
        feature_file: str
            file containing features
        label_file: str
            file containing labels
    Returns:
        features: ndarray
            n x d array of features
        labels: ndarray
            n x 1 array of labels
    '''
    # read in features and labels
    raise NotImplementedError
    return features, labels
```
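%% Cell type:markdown id: tags:
The skeleton does not specify a file format. The sketch below assumes plain-text files with one example per row and whitespace-separated values: a feature file with *d* columns and a label file with one label per line. Under that assumption, `preprocess` could be written with `np.loadtxt`. Adjust `delimiter` or the loader to match the actual data files. Labels come back as floats, so cast them with `.astype(int)` if integer class labels are needed.
%% Cell type:code id: tags:
``` python
import numpy as np


def preprocess(feature_file, label_file):
    '''Sketch assuming whitespace-delimited text files (an assumption).'''
    # ndmin=2 keeps an n x d shape even when n == 1 or d == 1
    features = np.loadtxt(feature_file, ndmin=2)
    # reshape labels into an n x 1 column, as the docstring specifies
    labels = np.loadtxt(label_file).reshape(-1, 1)
    return features, labels
```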
%% Cell type:code id: tags:
``` python
def partition(size, t, v=0):
    '''
    Args:
        size: int
            number of examples in the whole dataset
        t: float
            proportion kept for test
        v: float
            proportion kept for validation
    Returns:
        test_indices: ndarray
            1D array containing test set indices
        val_indices: ndarray
            1D array containing validation set indices
        train_indices: ndarray
            1D array containing training set indices
    '''
    # number of test and validation examples
    raise NotImplementedError
    return test_indices, val_indices, train_indices
```
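%% Cell type:markdown id: tags:
Here is a minimal sketch of `partition`, assuming the split should be random. It shuffles all indices once with NumPy's `default_rng`, slices off the test and validation portions, and keeps the rest for training. Two details are assumptions: the extra `seed` parameter, added for reproducibility, and the use of `int()` to round the split sizes down.
%% Cell type:code id: tags:
``` python
import numpy as np


def partition(size, t, v=0, seed=None):
    '''Random split sketch; `seed` is an added, hypothetical parameter.'''
    rng = np.random.default_rng(seed)
    # one random ordering of 0..size-1, so the three sets cannot overlap
    perm = rng.permutation(size)
    # number of test and validation examples (rounded down)
    n_test = int(t * size)
    n_val = int(v * size)
    test_indices = perm[:n_test]
    val_indices = perm[n_test:n_test + n_val]
    train_indices = perm[n_test + n_val:]
    return test_indices, val_indices, train_indices
```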
%% Cell type:markdown id: tags:
## TASK 1: Implement `distance` function
%% Cell type:markdown id: tags:
The `distance` function will be used in *k*-NN to compute how far apart two data points are. It should take two data points and the name of a metric and return a scalar value.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
def distance(x, y, metric):
    '''
    Args:
        x: ndarray
            1D array containing coordinates for a point
        y: ndarray
            1D array containing coordinates for a point
        metric: str
            'Euclidean' or 'Manhattan'
    Returns:
        dist: float
    '''
    if metric == 'Euclidean':
        raise NotImplementedError
    elif metric == 'Manhattan':
        raise NotImplementedError
    else:
        raise ValueError('{} is not a valid metric.'.format(metric))
    return dist  # scalar distance between x and y
```
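%% Cell type:markdown id: tags:
The two metrics are the Euclidean distance $\sqrt{\sum_i (x_i - y_i)^2}$ and the Manhattan distance $\sum_i |x_i - y_i|$. The NumPy sketch below is one possible implementation. It converts its inputs with `np.asarray`, which is an assumption beyond the docstring's ndarray inputs, so that plain lists also work.
%% Cell type:code id: tags:
``` python
import numpy as np


def distance(x, y, metric):
    '''Sketch: scalar distance between two 1D points.'''
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if metric == 'Euclidean':
        # square root of the sum of squared coordinate differences
        dist = np.sqrt(np.sum(diff ** 2))
    elif metric == 'Manhattan':
        # sum of absolute coordinate differences
        dist = np.sum(np.abs(diff))
    else:
        raise ValueError('{} is not a valid metric.'.format(metric))
    return float(dist)
```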
%% Cell type:markdown id: tags:
## General supervised learning performance related functions
%% Cell type:markdown id: tags:
Implement the `conf_matrix` function. It takes an array of true labels (*true*), an array of predicted labels (*pred*), and the number of classes (*n_classes*), and returns the confusion matrix as an *n_classes* x *n_classes* numpy.ndarray.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
import numpy as np

def conf_matrix(true, pred, n_classes):
    '''
    Args:
        true: ndarray
            nx1 array of true labels for test set
        pred: ndarray
            nx1 array of predicted labels for test set
        n_classes: int
    Returns:
        result: ndarray
            n_classes x n_classes array confusion matrix
    '''
    # start from all-zero counts (np.ndarray would leave the memory uninitialized)
    result = np.zeros((n_classes, n_classes), dtype=int)
    raise NotImplementedError
    # returns the confusion matrix as numpy.ndarray
    return result
```
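%% Cell type:markdown id: tags:
The sketch below implements `conf_matrix` under two assumptions. First, labels are integers in `0..n_classes-1`. Second, rows index the true class and columns the predicted class, which is a common convention (scikit-learn uses it, for example). The assignment does not fix an orientation, so check which one is expected. The sketch uses `np.add.at` because it accumulates correctly when the same (true, predicted) pair occurs more than once.
%% Cell type:code id: tags:
``` python
import numpy as np


def conf_matrix(true, pred, n_classes):
    '''Sketch: rows = true class, columns = predicted class (assumed).'''
    # flatten n x 1 inputs to 1D integer arrays
    true = np.asarray(true).ravel().astype(int)
    pred = np.asarray(pred).ravel().astype(int)
    result = np.zeros((n_classes, n_classes), dtype=int)
    # unbuffered add: every repeated (true, pred) index pair is counted
    np.add.at(result, (true, pred), 1)
    return result
```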
%% Cell type:markdown id: tags:
An ROC curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate) as the decision cutoff varies. `ROC` takes a list of *threshold* values to try (*value_list*) and returns two arrays. In the first, each entry is the sensitivity at a given threshold; in the second, each entry is the corresponding 1 - specificity.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
def ROC(true_labels, preds, value_list):
    '''
    Args:
        true_labels: ndarray
            1D array containing true labels
        preds: ndarray
            1D array containing the values to be thresholded (e.g. proportion of neighbors in kNN)
        value_list: ndarray
            1D array containing different threshold values
    Returns:
        sens: ndarray
            1D array containing sensitivities
        spec_: ndarray
            1D array containing 1-specificities
    '''
    # calculate sensitivity, 1-specificity
    # return two arrays
    raise NotImplementedError
    return sens, spec_
```
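%% Cell type:markdown id: tags:
This sketch of `ROC` assumes binary labels with 1 as the positive class. It also assumes an example is predicted positive when its value in `preds` is at least the threshold (`>=`); a strict `>` is an equally valid choice. Sensitivity is TP / (TP + FN), and 1 - specificity is FP / (FP + TN). The sketch assumes both classes appear in `true_labels`, so neither denominator is zero.
%% Cell type:code id: tags:
``` python
import numpy as np


def ROC(true_labels, preds, value_list):
    '''Sketch: sensitivity and 1-specificity for each threshold.'''
    true_labels = np.asarray(true_labels).ravel()
    preds = np.asarray(preds, dtype=float).ravel()
    positives = true_labels == 1  # assumption: 1 is the positive class
    negatives = ~positives
    sens, spec_ = [], []
    for threshold in value_list:
        # assumption: values at or above the threshold are predicted positive
        predicted_pos = preds >= threshold
        tp = np.sum(predicted_pos & positives)
        fn = np.sum(~predicted_pos & positives)
        fp = np.sum(predicted_pos & negatives)
        tn = np.sum(~predicted_pos & negatives)
        sens.append(tp / (tp + fn))   # true positive rate
        spec_.append(fp / (fp + tn))  # false positive rate = 1 - specificity
    return np.array(sens), np.array(spec_)
```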