"raise ValueError('{} is not a valid metric.'.format(metric))\n",
" raise ValueError('{} is not a valid metric.'.format(metric))\n",
" return dist # scalar distance btw x and y"
" return dist # scalar distance btw x and y"
%% Cell type:markdown id: tags:
# JUPYTER NOTEBOOK TIPS
Each rectangular box is called a cell.
* Ctrl+ENTER evaluates the current cell: if it contains Python code, it runs the code; if it contains Markdown, it renders the text.
* Alt+ENTER evaluates the current cell and adds a new cell below it.
* If you click to the left of a cell, you'll notice the frame changes color to blue. You can delete a cell by pressing 'dd' (that's two "d"s in a row) while the frame is blue.
%% Cell type:markdown id: tags:
# Supervised Learning Model Skeleton
We'll use this skeleton for implementing different supervised learning algorithms.
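As a rough illustration of how the skeleton might be extended, the sketch below subclasses `Model` with a majority-class baseline. The name `MajorityClassifier`, the `labels` argument to `fit`, and the `majority_` attribute are assumptions made for this example only; they are not part of the assignment.

```python
import numpy as np


class Model:
    # Same skeleton as in the notebook, repeated so the example is self-contained.
    def fit(self):
        raise NotImplementedError

    def predict(self, test_points):
        raise NotImplementedError


class MajorityClassifier(Model):
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, labels):
        # Count how often each label occurs and remember the most common one.
        values, counts = np.unique(labels, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, test_points):
        # One prediction per test point, all equal to the majority label.
        return np.full(len(test_points), self.majority_)


baseline = MajorityClassifier().fit(np.array([0, 1, 1, 1, 0]))
baseline_preds = baseline.predict(np.zeros((3, 2)))
```

Any algorithm implemented later can follow the same pattern: override `fit` to store whatever the model needs, and override `predict` to return one label per test point.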
%% Cell type:code id: tags:
``` python
class Model:

    def fit(self):
        raise NotImplementedError

    def predict(self, test_points):
        raise NotImplementedError
```
%% Cell type:code id: tags:
``` python
def preprocess(feature_file, label_file):
    '''
    Args:
        feature_file: str
            file containing features
        label_file: str
            file containing labels
    Returns:
        features: ndarray
            nxd features
        labels: ndarray
            nx1 labels
    '''
    # read in features and labels
    return features, labels
```
%% Cell type:code id: tags:
``` python
def partition(size, t, v=0):
    '''
    Args:
        size: int
            number of examples in the whole dataset
        t: float
            proportion kept for test
        v: float
            proportion kept for validation
    Returns:
        test_indices: ndarray
            1D array containing test set indices
        val_indices: ndarray
            1D array containing validation set indices
        train_indices: ndarray
            1D array containing training set indices
    '''
    # number of test and validation examples
    return test_indices, val_indices, train_indices
```
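%% Cell type:markdown id: tags:
One possible way to fill in `partition` is sketched below, assuming a random shuffle of the indices followed by slicing. The name `partition_sketch`, the `seed` parameter, and the choice to round the test and validation counts to the nearest integer are assumptions for illustration, not requirements of the assignment.

```python
import numpy as np


def partition_sketch(size, t, v=0, seed=0):
    # Number of test and validation examples (rounded to the nearest integer).
    n_test = int(round(size * t))
    n_val = int(round(size * v))

    # Shuffle all indices once, then carve out disjoint slices.
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(size)

    test_indices = permutation[:n_test]
    val_indices = permutation[n_test:n_test + n_val]
    train_indices = permutation[n_test + n_val:]
    return test_indices, val_indices, train_indices


test_idx, val_idx, train_idx = partition_sketch(10, 0.2, 0.1)
```

Because all three arrays are slices of a single permutation, they are disjoint and together cover every index exactly once.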
%% Cell type:markdown id: tags:
## TASK 1: Implement `distance` function
%% Cell type:markdown id: tags:
The `distance` function will be used to calculate the cost of *k*-NN. It should take two data points and the name of a metric, and return a scalar value.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
def distance(x, y, metric):
    '''
    Args:
        x: ndarray
            1D array containing coordinates for a point
        y: ndarray
            1D array containing coordinates for a point
        metric: str
            Euclidean, Manhattan
    Returns:
        dist: float
    '''
    if metric == 'Euclidean':
        raise NotImplementedError
    elif metric == 'Manhattan':
        raise NotImplementedError
    else:
        raise ValueError('{} is not a valid metric.'.format(metric))
    return dist  # scalar distance between x and y
```
%% Cell type:markdown id: tags:
## General supervised learning performance related functions
%% Cell type:markdown id: tags:
Implement the `conf_matrix` function, which takes as input an array of true labels (*true*), an array of predicted labels (*pred*), and the number of classes (*n_classes*). It should output a numpy.ndarray.
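A minimal sketch of one way to build such a matrix is shown below. It assumes the labels are integers in `0 .. n_classes - 1` and places true labels on the rows and predicted labels on the columns; both the orientation and the name `conf_matrix_sketch` are assumptions, since the assignment text does not fix them.

```python
import numpy as np


def conf_matrix_sketch(true, pred, n_classes):
    # Start from zero counts; entry [i, j] counts points with true label i
    # that were predicted as label j.
    result = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(np.ravel(true), np.ravel(pred)):
        result[int(t), int(p)] += 1
    return result


cm = conf_matrix_sketch(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), 2)
# cm is [[1, 1],
#        [0, 2]]
```

With this orientation, the diagonal holds the correctly classified counts, so `np.trace(cm) / cm.sum()` gives the accuracy.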
%% Cell type:code id: tags:
``` python
import numpy as np

# TODO: Programming Assignment 1
def conf_matrix(true, pred, n_classes):
    '''
    Args:
        true: ndarray
            nx1 array of true labels for test set
        pred: ndarray
            nx1 array of predicted labels for test set
        n_classes: int
    Returns:
        result: ndarray
            n_classes x n_classes array confusion matrix
    '''
    raise NotImplementedError
    # zero-initialized counts (np.ndarray would leave the entries uninitialized)
    result = np.zeros((n_classes, n_classes), dtype=int)
    # returns the confusion matrix as numpy.ndarray
    return result
```
%% Cell type:markdown id: tags:
ROC curves are a good way to visualize sensitivity vs. 1-specificity across varying cutoff points. `ROC` takes a list of *threshold* values to try and returns two arrays: one whose entries are the sensitivities at each threshold, and one whose entries are the corresponding 1-specificities.
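The computation can be sketched as follows, under several assumptions that the text does not state: labels are binary with 1 as the positive class, a point is predicted positive when its score is greater than or equal to the threshold, and both classes appear in `true_labels` (otherwise the divisions below are undefined). The name `roc_sketch` is hypothetical.

```python
import numpy as np


def roc_sketch(true_labels, preds, value_list):
    true_labels = np.asarray(true_labels)
    preds = np.asarray(preds)
    positives = true_labels == 1
    negatives = ~positives

    sens, one_minus_spec = [], []
    for threshold in value_list:
        predicted_positive = preds >= threshold
        tp = np.sum(predicted_positive & positives)  # true positives
        fp = np.sum(predicted_positive & negatives)  # false positives
        sens.append(tp / positives.sum())            # sensitivity = TP / P
        one_minus_spec.append(fp / negatives.sum())  # 1 - specificity = FP / N
    return np.array(sens), np.array(one_minus_spec)


tpr, fpr = roc_sketch([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], [0.0, 0.5, 1.0])
# tpr is [1.0, 0.5, 0.0] and fpr is [1.0, 0.5, 0.0]
```

Plotting `fpr` on the x-axis against `tpr` on the y-axis gives the ROC curve.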
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 1
def ROC(true_labels, preds, value_list):
    '''
    Args:
        true_labels: ndarray
            1D array containing true labels
        preds: ndarray
            1D array containing thresholded value (e.g. proportion of neighbors in kNN)