Commit f024f4e6 authored by Zeynep Hakguder

Upload New File

parent d347c796
%% Cell type:markdown id: tags:
# Linear Regression
In the linear regression part of this assignment, we have only a small dataset available. We cannot spare examples for a separate validation set, so we'll use cross-validation to tune hyperparameters.
### Assignment Goals:
In this assignment, we will:
* implement linear regression
* use gradient descent for optimization
* implement regularization techniques
    * $l_1$/$l_2$ regularization
* use cross-validation to find a good regularization parameter $\lambda$
### Note:
You are not required to follow this exact template. You can change what parameters your functions take or partition the tasks across functions differently. However, make sure there are outputs and implementation for items listed in the rubric for each task. Also, indicate in code with comments which task you are attempting.
%% Cell type:markdown id: tags:
# GRADING
You will be graded on parts that are marked with **\#TODO** comments. Read the comments in the code to make sure you don't miss any.
### Mandatory for 478 & 878:
| | Tasks | 478 | 878 |
|---|----------------------------|-----|-----|
| 1 | Implement `kfold` | 10 | 10 |
| 2 | Implement `mse` | 5 | 5 |
| 3 | Implement `fit` method | 20 | 20 |
| 4 | Implement `predict` method | 10 | 10 |
| 5 | Implement `regularization` | 10 | 5 |
### Bonus for 478 & 878
| | Tasks | 478 | 878 |
|---|----------------------------|-----|-----|
| 3 | `fit` (learning rate) | 5 | 5 |
| 6 | Polynomial regression | 5 | 5 |
| 7 | Grid search | 10 | 5 |
Points are broken down further in the Rubric sections below. The **first** score is for 478 students, the **second** is for 878 students. Including bonus tasks, this part of assignment 2 is worth a total of 75 points for 478 students and 65 points for 878 students.
%% Cell type:markdown id: tags:
You can use numpy for array operations and matplotlib for plotting for this assignment. Please do not add other libraries.
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id: tags:
The following code makes the Model class and related functions available from "model.ipynb".
%% Cell type:code id: tags:
``` python
%run 'model.ipynb'
```
%% Cell type:markdown id: tags:
The target value (house price, in thousands of dollars) is plotted against each feature below.
%% Cell type:code id: tags:
``` python
features, feature_names, targets = preprocess('../data/housing.data', '../data/housing.names')
print('There are {} examples with {} features.'.format(features.shape[0], features.shape[1]))
%matplotlib inline
fig, axs = plt.subplots(4, 4, figsize=(15, 15), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace = 0.2, wspace=.20)
# DISREGARD LAST 3 EMPTY PLOTS
for index, feature_name in enumerate(feature_names):
    axs[index // 4][index % 4].scatter(features[:, index], targets)
    axs[index // 4][index % 4].set_xlabel(feature_name)
fig.text(0.06, 0.5, 'House Value in $1000', ha='center', va='center', rotation='vertical', size=24)
```
%% Output
There are 506 examples with 13 features.
Text(0.06,0.5,'House Value in $1000')
%% Cell type:markdown id: tags:
## TASK 1: Implement `kfold`
%% Cell type:markdown id: tags:
Implement the "kfold" function for $k$-fold cross-validation in "model.ipynb". Commonly used values for $k$ are 5 and 10; you can use either one.
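One way to think about the split is sketched below. This is only an illustration under stated assumptions, not the required solution: the name `kfold_indices_sketch` and the `seed` parameter are hypothetical, and the return format (a list of `(train_indices, test_indices)` pairs) is inferred from how `splits[i][0]` and `splits[i][1]` are used in the test cell.

```python
import numpy as np

def kfold_indices_sketch(size, k=5, seed=0):
    """Hypothetical helper: return k (train_indices, test_indices) pairs."""
    rng = np.random.default_rng(seed)
    # Shuffle the example indices once so folds are not tied to file order.
    shuffled = rng.permutation(size)
    # array_split tolerates sizes that are not an exact multiple of k.
    folds = np.array_split(shuffled, k)
    splits = []
    for i in range(k):
        test = folds[i]
        # Every fold except the i-th one goes into the training part.
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, test))
    return splits

example_splits = kfold_indices_sketch(506, k=5)
print([len(test) for _, test in example_splits])
```

Because each index lands in exactly one fold, the test folds cannot overlap, and each training part excludes its own test fold by construction.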
%% Cell type:markdown id: tags:
### Rubric:
* No intersection between test and train parts +5, +5
* No intersection between test folds +5, +5
%% Cell type:markdown id: tags:
### Test `kfold`
%% Cell type:code id: tags:
``` python
# Obtain 5 splits of data.
splits = kfold(targets.shape[0], k=5)
# Check that test folds are completely different
# Check that for a given i, train and test are completely different
for i in range(5):
    intersection = set(splits[i][0]).intersection(set(splits[i][1]))
    if intersection:
        print('Test-train splits intersect!')
    for j in range(5):
        if i != j:
            intersection = set(splits[i][1]).intersection(set(splits[j][1]))
            if intersection:
                print('Test splits intersect!')
```
%% Cell type:markdown id: tags:
## TASK 2: Implement `mse`
%% Cell type:markdown id: tags:
We'll use mean squared error (mse) as the cost for linear regression. Next, implement the "mse" function in "model.ipynb"; it takes predicted and true target values and returns the mse between them.
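For reference, mean squared error is the average of the squared differences between predictions and targets. A minimal sketch, assuming both inputs are one-dimensional arrays of equal length (the name `mse_sketch` is hypothetical and is not the function graded in "model.ipynb"):

```python
import numpy as np

def mse_sketch(y_hat, y):
    """Hypothetical helper: mean of squared differences."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((y_hat - y) ** 2)

# Both predictions are off by 100, so each squared error is 100**2.
print(mse_sketch(np.array([100, 300]), np.array([200, 400])))  # 10000.0
```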
%% Cell type:markdown id: tags:
### Rubric:
* Correct mse +5, +5
%% Cell type:markdown id: tags:
### Test `mse`
%% Cell type:code id: tags:
``` python
mse(np.array([100, 300]), np.array([200, 400]))
```
%% Cell type:markdown id: tags:
## TASKS 3, 4, 5: Implement `fit`, `predict`, `regularization`
%% Cell type:markdown id: tags:
We can now define our linear regression model class. Implement the "fit" and "predict" methods.
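The update rule behind "fit" can be sketched on synthetic data. This is an illustrative sketch, not the graded implementation: the function name `gradient_descent_sketch`, the demo data, and the learning rate of 0.1 are assumptions chosen so that the toy problem converges. With predictions $\hat{y} = X\theta$ and cost $\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$, the gradient with respect to $\theta$ is $\frac{2}{n} X^\top (\hat{y} - y)$.

```python
import numpy as np

def gradient_descent_sketch(X, Y, learning_rate=0.1, epochs=2000):
    """Hypothetical helper: unregularized batch gradient descent on mse."""
    n = X.shape[0]
    # Prepend a column of ones so theta[0] acts as the bias.
    features = np.hstack((np.ones((n, 1)), X))
    theta = np.zeros(features.shape[1])
    for _ in range(epochs):
        y_hat = features @ theta
        # Gradient of mean((y_hat - Y)**2) with respect to theta.
        grad = (2.0 / n) * features.T @ (y_hat - Y)
        theta = theta - learning_rate * grad
    return theta

# Noise-free toy data generated from y = 3 + 2x.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(100, 1))
Y_demo = 3.0 + 2.0 * X_demo[:, 0]
theta_demo = gradient_descent_sketch(X_demo, Y_demo)
print(theta_demo)  # close to [3, 2]
```

A step size that works on this small, well-scaled toy problem may be far too large for features with large magnitudes, which is one reason to monitor the cost while experimenting with the learning rate.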
%% Cell type:markdown id: tags:
### Rubric:
* fit without regularization +10, +10
* learning rate interpretation +5, +5 (BONUS for both)
* $l_1$ regularization +5, +2.5
* $l_2$ regularization +5, +2.5
* fit works with regularization +10, +10
* predict +10, +10
%% Cell type:code id: tags:
``` python
class Linear_Regression(Model):
    # You can disregard regularizer and kwargs for TASK 3
    def fit(self, X, Y, learning_rate = 0.001, epochs = 2000, regularizer=None, **kwargs):
        '''
        Args:
            learning_rate: float
                step size for parameter update
            epochs: int
                number of updates that will be performed
            regularizer: str
                one of l1 or l2
            lambd: float
                regularization coefficient (passed through kwargs)
        '''
        # we will need to add a column of 1's for bias
        size = X.shape[0]
        ones = np.ones(size)
        ones = np.reshape(ones, (size, -1))
        features = np.hstack((ones, X))
        # theta_hat contains the parameters for the model
        # initialize theta_hat as zeros
        # one parameter for each feature and one for bias
        theta_hat = np.zeros(features.shape[1])
        # TODO
        # for each epoch
        for epoch in range(epochs):
            # compute model predictions for training examples
            y_hat = None
            if regularizer is None:
                # use mse function to find the cost
                cost = mse(y_hat, Y)
                # You can use below print statement to monitor cost
                #print('Current cost is {}'.format(cost))
                # calculate gradients wrt theta
                grad_theta = None
                # update theta
                theta_hat = None
                raise NotImplementedError
            else:
                # take regularization into account
                # use your regularization function
                # you will need to compute the gradient of the regularization term
                raise NotImplementedError
        # update the model parameters to be used in predict method
        self.theta = theta_hat

    def predict(self, test_features):
        # obtain test features for current fold
        # do not forget to add a column for bias
        # as in fit method
        # TODO
        # get predictions from model
        y_hat = None
        raise NotImplementedError
        return y_hat
```
%% Cell type:markdown id: tags:
Initialize and fit the model. During training, monitor your cost function. Experiment with different learning rates. Insert a cell below to summarize and briefly interpret your observations.
%% Cell type:code id: tags:
``` python
# initialize and fit the model
my_model = Linear_Regression()
# change lr to try different learning rates
lr = 0.0001
my_model.fit(features[splits[0][0]], targets[splits[0][0]], learning_rate = lr)
```
%% Cell type:markdown id: tags:
Define a "regularization" function in "model.ipynb" that implements $l_1$ and $l_2$ regularization.
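As a rough illustration, the two penalties and their gradients might look like the sketch below. The names `regularization_sketch` and `regularization_gradient_sketch` are hypothetical, and the conventions used here are assumptions: the $l_2$ penalty is taken as the plain sum of squares (some formulations use a factor of $\frac{1}{2}$ or a square root), and the bias term is not treated specially.

```python
import numpy as np

def regularization_sketch(weights, method='l2'):
    """Hypothetical helper: penalty value for a weight vector."""
    w = np.asarray(weights, dtype=float)
    if method == 'l1':
        return np.sum(np.abs(w))   # sum of absolute values
    if method == 'l2':
        return np.sum(w ** 2)      # sum of squares (assumed convention)
    raise ValueError('method must be "l1" or "l2"')

def regularization_gradient_sketch(weights, method='l2'):
    """Hypothetical helper: (sub)gradient of the penalty above."""
    w = np.asarray(weights, dtype=float)
    if method == 'l1':
        return np.sign(w)          # subgradient; 0 is used at w == 0
    if method == 'l2':
        return 2.0 * w
    raise ValueError('method must be "l1" or "l2"')

demo_weights = [0.0, 0.5, -1.0, 2.0]
for method in ['l1', 'l2']:
    print(method, regularization_sketch(demo_weights, method=method))
```

During training, the penalty is typically scaled by $\lambda$ and added to the mse cost, and its scaled gradient is added to the mse gradient before the parameter update.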
%% Cell type:code id: tags:
``` python
weights = list(np.arange(0, 1.1 , 0.1))
for method in ['l1', 'l2']:
    print(regularization(weights, method=method))
```
%% Cell type:markdown id: tags:
## TASK 6: Polynomial Regression
%% Cell type:markdown id: tags:
Do you think the dataset would benefit from polynomial regression? Please briefly explain why or why not.
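For context, polynomial regression fits a linear model on powers of the original features. A minimal sketch of the feature expansion for a single feature, assuming a hypothetical helper named `polynomial_features_sketch` (it does not answer the question above, which depends on the shapes seen in the scatter plots):

```python
import numpy as np

def polynomial_features_sketch(x, degree=2):
    """Hypothetical helper: columns x, x**2, ..., x**degree for one feature."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x_demo = np.array([1.0, 2.0, 3.0])
print(polynomial_features_sketch(x_demo, degree=3))
```

The expanded columns can then be passed to the same linear model, so the fitting procedure itself does not change.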
%% Cell type:markdown id: tags:
### Rubric:
* Sound reasoning +5, +5
%% Cell type:markdown id: tags:
## TASK 7: Grid Search
%% Cell type:markdown id: tags:
Using cross-validation, try different values of $\lambda$ for $l_1$ and $l_2$ regularization to find good $\lambda$ values that result in low average _mse_.
%% Cell type:markdown id: tags:
### Rubric:
* Different methods are tried with different values of $\lambda$ +10, +5
%% Cell type:markdown id: tags:
### Test: Grid Search
%% Cell type:code id: tags:
``` python
# initialize the model
my_model = Linear_Regression()
# two regularization methods
for method in ['l1', 'l2']:
    # different lambda
    for lmbd in np.arange(0, 1, 0.1):
        k_fold_mse = 0
        for k in range(5):
            # fit on training, passing the regularizer and its coefficient
            my_model.fit(features[splits[k][0]], targets[splits[k][0]], regularizer = method, lambd = lmbd)
            # predict test
            pred = my_model.predict(features[splits[k][1]])
            k_fold_mse += mse(pred, targets[splits[k][1]])
        # average mse over the 5 folds for this method and lambda
        print(method, lmbd, k_fold_mse/5)
```