"raise ValueError('{} is not a valid metric.'.format(metric))\n",
" return dist # scalar distance btw x and y"
]
},
...
...
%% Cell type:markdown id: tags:
# JUPYTER NOTEBOOK TIPS
Each rectangular box is called a cell.
* Ctrl+ENTER evaluates the current cell; if it contains Python code, it runs the code, if it contains Markdown, it returns rendered text.
* Alt+ENTER evaluates the current cell and adds a new cell below it.
* If you click to the left of a cell, you'll notice the frame changes color to blue. You can erase a cell by hitting 'dd' (that's two "d"s in a row) when the frame is blue.
%% Cell type:markdown id: tags:
# Supervised Learning Model Skeleton
We'll use this skeleton for implementing different supervised learning algorithms.
%% Cell type:code id: tags:
``` python
classModel:
deffit(self):
raiseNotImplementedError
defpredict(self,test_points):
raiseNotImplementedError
```
%% Cell type:code id: tags:
``` python
defpreprocess(feature_file,label_file):
'''
Args:
feature_file: str
file containing features
label_file: str
file containing labels
Returns:
features: ndarray
nxd features
labels: ndarray
nx1 labels
'''
# read in features and labels
returnfeatures,labels
```
%% Cell type:code id: tags:
``` python
defpartition(size,t,v=0):
'''
Args:
size: int
number of examples in the whole dataset
t: float
proportion kept for test
v: float
proportion kept for validation
Returns:
test_indices: ndarray
1D array containing test set indices
val_indices: ndarray
1D array containing validation set indices
'''
# number of test and validation examples
returntest_indices,val_indices,train_indices
```
%% Cell type:markdown id: tags:
## TASK 1: Implement `distance` function
%% Cell type:markdown id: tags:
"distance" function will be used in calculating cost of *k*-NN. It should take two data points and the name of the metric and return a scalar value.
%% Cell type:code id: tags:
``` python
#TODO: Programming Assignment 1
defdistance(x,y,metric):
'''
Args:
x: ndarray
1D array containing coordinates for a point
y: ndarray
1D array containing coordinates for a point
metric: str
Euclidean, Manhattan
Returns:
dist: float
'''
ifmetric=='Euclidean':
raiseNotImplementedError
elifmetric=='Manhattan':
raiseNotImplementedError
else:
raiseValueError('{} is not a valid metric.'.format(metric))
returndist# scalar distance btw x and y
```
%% Cell type:markdown id: tags:
## General supervised learning performance related functions
%% Cell type:markdown id: tags:
Implement the "conf_matrix" function that takes as input an array of true labels (*true*) and an array of predicted labels (*pred*). It should output a numpy.ndarray.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
defconf_matrix(true,pred,n_classes):
'''
Args:
true: ndarray
nx1 array of true labels for test set
pred: ndarray
nx1 array of predicted labels for test set
n_classes: int
Returns:
result: ndarray
n_classes x n_classes array confusion matrix
'''
raiseNotImplementedError
result=np.ndarray([n_classes,n_classes])
# returns the confusion matrix as numpy.ndarray
returnresult
```
%% Cell type:markdown id: tags:
ROC curves are a good way to visualize sensitivity vs. 1-specificity for varying cut off points. "ROC" takes a list containing different *threshold* parameter values to try and returns two arrays; one where each entry is the sensitivity at a given threshold and the other where entries are 1-specificities.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
defROC(true_labels,preds,value_list):
'''
Args:
true_labels: ndarray
1D array containing true labels
preds: ndarray
1D array containing thresholded value (e.g. proportion of neighbors in kNN)