Commit c67dd046 authored by Zeynep Hakguder

Update model.ipynb

parent dfb615ca
%% Cell type:markdown id: tags:
# JUPYTER NOTEBOOK TIPS
Each rectangular box is called a cell.
* ctrl+ENTER evaluates the current cell; if it contains Python code, it runs the code, and if it contains Markdown, it renders the text.
* alt+ENTER evaluates the current cell and adds a new cell below it.
* If you click to the left of a cell, the frame changes color to blue. You can delete a cell by pressing 'dd' (two "d"s in a row) while the frame is blue.
%% Cell type:markdown id: tags:
# Supervised Learning Model Skeleton
We'll use this skeleton for implementing different supervised learning algorithms.
%% Cell type:code id: tags:
``` python
class Model:
    def fit(self):
        # To be implemented by subclasses: learn parameters from training data.
        raise NotImplementedError
    def predict(self, test_points):
        # To be implemented by subclasses: return predictions for test_points.
        raise NotImplementedError
```
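%% Cell type:markdown id: tags:
As an illustration of how this skeleton gets used (the subclass and its name are made up here, not part of the assignment), a trivial model that always predicts the mean of the training labels might look like:
%% Cell type:code id: tags:
``` python
import numpy as np

class Model:
    def fit(self):
        raise NotImplementedError
    def predict(self, test_points):
        raise NotImplementedError

class MeanRegressor(Model):
    # Hypothetical example subclass: ignores the features entirely and
    # always predicts the mean of the labels it was fit on.
    def fit(self, labels):
        self.mean_ = float(np.mean(labels))
    def predict(self, test_points):
        return np.full(len(test_points), self.mean_)
```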
%% Cell type:code id: tags:
``` python
import numpy as np

def preprocess(data_f, feature_names_f):
    '''
    Args:
        data_f: where to read the dataset from
        feature_names_f: where to read the feature names from
    Returns:
        features: ndarray
            nxd array containing `float` feature values
        feature_names: ndarray
            1D array containing the feature names as strings
        target: ndarray
            1D array containing the `float` labels
    '''
    # You might find np.genfromtxt useful for reading in the file. Be careful with the file delimiter,
    # e.g. for comma-separated files use the delimiter=',' argument.
    data = np.genfromtxt(data_f)
    features = data[:, :-1]
    target = data[:, -1]
    feature_names = np.genfromtxt(feature_names_f, dtype='unicode')
    return features, feature_names, target
```
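%% Cell type:markdown id: tags:
A minimal sanity check of the feature/label split that `preprocess` performs (using an in-memory, comma-separated stand-in for a real data file):
%% Cell type:code id: tags:
``` python
import numpy as np
from io import StringIO

# A tiny in-memory "file" with two feature columns and a label column,
# comma-separated, standing in for a real dataset file.
raw = StringIO("1.0,2.0,10.0\n3.0,4.0,20.0")

# Note the delimiter argument, as the comment in preprocess warns.
data = np.genfromtxt(raw, delimiter=',')
features = data[:, :-1]   # all columns except the last
target = data[:, -1]      # the last column is the label
```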
%% Cell type:markdown id: tags:
When data is not abundantly available, we estimate the error by averaging the errors obtained on different splits of the dataset. Each fold of the data is used in turn for testing and for training; assuming we split our data into 3 folds, we'd
* train our model on fold-1+fold-2 and test on fold-3,
* train our model on fold-1+fold-3 and test on fold-2,
* train our model on fold-2+fold-3 and test on fold-1.
We'd use the average of the errors obtained in the three runs as our error estimate.
Implement the function "kfold" below.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 2
def kfold(size, k):
    '''
    Args:
        size: int
            Number of examples in the dataset that you want to split into k folds.
        k: int
            Number of desired splits in data. (Assume the test set is already separated.)
    Returns:
        fold_dict: dict
            A dictionary with integer keys corresponding to folds. Values are (train_indices, val_indices).
            val_indices: ndarray
                1/k of the indices, randomly chosen and separated out as the validation partition.
            train_indices: ndarray
                The remaining 1-(1/k) of the indices.
            e.g. fold_dict = {0: (train_0_indices, val_0_indices),
                              1: (train_1_indices, val_1_indices),
                              2: (train_2_indices, val_2_indices)} for k = 3
    '''
    raise NotImplementedError
    return fold_dict
```
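%% Cell type:markdown id: tags:
The implementation is left as a TODO above; as one possible sketch (not the graded implementation, and the function name here is made up to avoid clashing with it), you could shuffle the indices once and use each chunk in turn as the validation fold:
%% Cell type:code id: tags:
``` python
import numpy as np

def kfold_sketch(size, k, seed=0):
    # Shuffle the indices once, then cut them into k roughly equal folds.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(size)
    folds = np.array_split(indices, k)
    fold_dict = {}
    for i in range(k):
        val_indices = folds[i]
        # Training indices are everything outside the i-th fold.
        train_indices = np.concatenate([folds[j] for j in range(k) if j != i])
        fold_dict[i] = (train_indices, val_indices)
    return fold_dict
```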
%% Cell type:markdown id: tags:
Implement the "mse" and "regularization" functions below. They will be used in the fit method of linear regression.
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 2
def mse(y_pred, y_true):
    '''
    Args:
        y_pred: ndarray
            1D array containing data with `float` type. Values predicted by our method.
        y_true: ndarray
            1D array containing data with `float` type. True y values.
    Returns:
        cost: float
            Mean squared error between y_pred and y_true.
    '''
    raise NotImplementedError
    return cost
```
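%% Cell type:markdown id: tags:
For reference, mean squared error is the average of the squared residuals, (1/n) * sum((y_pred_i - y_true_i)^2). A one-line NumPy sketch (again, not the graded implementation, hence the different name):
%% Cell type:code id: tags:
``` python
import numpy as np

def mse_sketch(y_pred, y_true):
    # Mean of squared differences; returns a single float.
    return float(np.mean((y_pred - y_true) ** 2))
```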
%% Cell type:code id: tags:
``` python
# TODO: Programming Assignment 2
def regularization(weights, method):
    '''
    Args:
        weights: ndarray
            1D array with `float` entries
        method: str
            Either "l1" or "l2".
    Returns:
        value: float
            A single value. Regularization term that will be used in the cost function in fit.
    '''
    if method == "l1":
        value = None  # TODO: compute the l1 regularization term
    elif method == "l2":
        value = None  # TODO: compute the l2 regularization term
    raise NotImplementedError
    return value
```
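%% Cell type:markdown id: tags:
For reference, the l1 term is commonly the sum of the absolute weights and the l2 term the sum of the squared weights; conventions vary (e.g. a 1/2 factor or a square root is sometimes used), so check the assignment's definition. A hedged sketch under the sum-of-absolute-values / sum-of-squares convention:
%% Cell type:code id: tags:
``` python
import numpy as np

def regularization_sketch(weights, method):
    # One common convention; verify against the assignment's definition.
    if method == "l1":
        return float(np.sum(np.abs(weights)))   # sum of |w_i|
    elif method == "l2":
        return float(np.sum(weights ** 2))      # sum of w_i^2
    raise ValueError("unknown method: " + method)
```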