"Points are broken down further below in Rubric sections. The **first** score is for 478, the **second** is for 878 students. There are a total of 75 points in this part of assignment 2 for 478 and 65 points for 878 students."
"Points are broken down further below in Rubric sections. The **first** score is for 478, the **second** is for 878 students. There are a total of 140 points in this part of assignment 2 for 478 and 120 points for 878 students."
]
]
},
},
{
{
...
@@ -158,8 +158,8 @@
...
@@ -158,8 +158,8 @@
"metadata": {},
"metadata": {},
"source": [
"source": [
"### Rubric:\n",
"### Rubric:\n",
" * No intersection between test and train parts +5, +5\n",
" * No intersection between test and train parts +10, +10\n",
" * No intersection between test folds +5, +5"
" * No intersection between test folds +10, +10"
]
]
},
},
{
{
...
@@ -210,7 +210,7 @@
"metadata": {},
"source": [
"### Rubric:\n",
" * Correct mse +10, +10"
]
},
{
...
@@ -248,12 +248,12 @@
"metadata": {},
"source": [
"### Rubric:\n",
"* fit without regularization +20, +20\n",
"* learning rate interpretation +10, +10 (BONUS for both)\n",
"* $l_1$ regularization +10, +5\n",
"* $l_2$ regularization +10, +5\n",
"* fit works with regularization +20, +20\n",
"* predict +20, +20"
]
},
{
...
@@ -393,7 +393,7 @@
"metadata": {},
"source": [
"### Rubric:\n",
"* Sound reasoning +10, +5"
]
},
{
...
%% Cell type:markdown id: tags:
# Linear Regression
In the linear regression part of this assignment, we have a small dataset available to us. We won't have examples to spare for a validation set; instead, we'll use cross-validation to tune hyperparameters.
### Assignment Goals:
In this assignment, we will:
* implement linear regression
* use gradient descent for optimization
* implement regularization techniques
    * $l_1$/$l_2$ regularization
* use cross-validation to find a good regularization parameter $\lambda$
### Note:
You are not required to follow this exact template. You can change what parameters your functions take or partition the tasks across functions differently. However, make sure there are outputs and an implementation for the items listed in the rubric for each task. Also, indicate with comments in your code which task you are attempting.
%% Cell type:markdown id: tags:
# GRADING
You will be graded on parts that are marked with **\#TODO** comments. Read the comments in the code to make sure you don't miss any.
### Mandatory for 478 & 878:
|   | Tasks                      | 478 | 878 |
|---|----------------------------|-----|-----|
| 1 | Implement `kfold`          | 20  | 20  |
| 2 | Implement `mse`            | 10  | 10  |
| 3 | Implement `fit` method     | 40  | 40  |
| 4 | Implement `predict` method | 20  | 20  |
| 5 | Implement `regularization` | 20  | 10  |
### Bonus for 478 & 878
|   | Tasks                      | 478 | 878 |
|---|----------------------------|-----|-----|
| 3 | `fit` (learning rate)      | 10  | 10  |
| 6 | Polynomial regression      | 10  | 5   |
| 7 | Grid search                | 10  | 5   |
Points are broken down further below in Rubric sections. The **first** score is for 478, the **second** is for 878 students. There are a total of 140 points in this part of assignment 2 for 478 and 120 points for 878 students.
%% Cell type:markdown id: tags:
You can use numpy for array operations and matplotlib for plotting in this assignment. Please do not add other libraries.
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id: tags:
The following code makes the Model class and relevant functions available from "model.ipynb".
%% Cell type:code id: tags:
``` python
%run 'model.ipynb'
```
%% Cell type:markdown id: tags:
The target value (house prices in $1,000s) is plotted against feature values below.
We'll use mean squared error (mse) for linear regression. Next, implement the "mse" function in "model.ipynb" that takes predicted and true target values and returns the mse between them.
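%% Cell type:markdown id: tags:
For reference, here is a minimal sketch of what such a function could look like. It assumes the predicted and true targets arrive as equal-length numpy arrays (or array-likes); the name `mse_sketch` and its signature are only illustrative, and your version in "model.ipynb" may differ.
%% Cell type:code id: tags:
``` python
# Hypothetical sketch of an mse helper; assumes equal-length array-like inputs.
def mse_sketch(y_pred, y_true):
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    # mean of the squared differences between predictions and true targets
    return np.mean((y_pred - y_true) ** 2)

# quick sanity check on toy values
print(mse_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 4/3 ≈ 1.333
```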
%% Cell type:code id: tags:
``` python
        # You can use below print statement to monitor cost
        # print('Current cost is {}'.format(cost))

        # calculate gradients wrt theta
        grad_theta = None
        # update theta
        theta_hat = None
        raise NotImplementedError
    else:
        # take regularization into account
        # use your regularization function
        # you will need to compute the gradient of the regularization term
        raise NotImplementedError

    # update the model parameters to be used in predict method
    self.theta = theta_hat

def predict(self, test_features):
    # obtain test features for current fold
    # do not forget to add a column for bias
    # as in fit method
    # TODO

    # get predictions from model
    y_hat = None
    raise NotImplementedError
    return y_hat
```
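%% Cell type:markdown id: tags:
For reference, below is a minimal sketch of the unregularized gradient computation and parameter update that the placeholders above leave as TODO. It assumes `features` already includes a bias column, `targets` is a 1-D array, and `alpha` is the learning rate; all names here are illustrative, not the template's.
%% Cell type:code id: tags:
``` python
# Hypothetical sketch of one batch gradient-descent step for linear regression
# with an mse cost; `features`, `targets`, `theta_hat`, and `alpha` are assumed names.
def gradient_step_sketch(features, targets, theta_hat, alpha=0.01):
    n = features.shape[0]
    # predictions and current cost (useful for monitoring)
    y_hat = features @ theta_hat
    cost = np.mean((y_hat - targets) ** 2)
    # gradient of the mse cost wrt theta: (2/n) * X^T (X theta - y)
    grad_theta = (2.0 / n) * features.T @ (y_hat - targets)
    # gradient-descent update with learning rate alpha
    theta_hat = theta_hat - alpha * grad_theta
    return theta_hat, cost
```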
%% Cell type:markdown id: tags:
Initialize and fit the model. During training, monitor your cost function. Experiment with different learning rates. Insert a cell below and summarize and briefly interpret your observations.
Define the "regularization" function, which implements $l_1$ and $l_2$ regularization, in "model.ipynb".
%% Cell type:code id: tags:
``` python
weights = list(np.arange(0, 1.1, 0.1))
for method in ['l1', 'l2']:
    print(regularization(weights, method=method))
```
%% Cell type:markdown id: tags:
## TASK 6: Polynomial Regression
%% Cell type:markdown id: tags:
Do you think the dataset would benefit from polynomial regression? Please briefly explain why or why not.
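%% Cell type:markdown id: tags:
To make the question concrete: polynomial regression can reuse the same linear fit/predict machinery by expanding each feature into powers before fitting. A minimal sketch, assuming a 2-D `features` array and a maximum `degree` (names are illustrative):
%% Cell type:code id: tags:
``` python
# Hypothetical sketch: expand features into polynomial terms (no cross terms),
# so the existing linear regression code can be applied to the expanded matrix.
def polynomial_features_sketch(features, degree=2):
    features = np.asarray(features, dtype=float)
    # stack x, x^2, ..., x^degree column-wise for every original feature
    return np.hstack([features ** d for d in range(1, degree + 1)])
```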
%% Cell type:markdown id: tags:
### Rubric:
* Sound reasoning +10, +5
%% Cell type:markdown id: tags:
## TASK 7: Grid Search
%% Cell type:markdown id: tags:
Using cross-validation, try different values of $\lambda$ for $l_1$ and $l_2$ regularization to find good $\lambda$ values that result in low average _mse_.
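%% Cell type:markdown id: tags:
A minimal sketch of such a grid search is below. It assumes a `kfold`-style helper that yields (train_indices, test_indices) pairs, a model whose `fit` accepts a regularization method and a $\lambda$ value, and the mse function from earlier; every name and signature here is an assumption about your own implementation, so adapt it to your actual API.
%% Cell type:code id: tags:
``` python
# Hypothetical grid-search sketch over lambda values for l1/l2 regularization.
# `make_model`, `folds`, `features`, `targets`, and `mse_fn` stand in for your own objects.
def grid_search_sketch(make_model, folds, features, targets, lambdas, mse_fn):
    results = {}
    for method in ['l1', 'l2']:
        for lam in lambdas:
            fold_errors = []
            for train_idx, test_idx in folds:
                model = make_model()
                # fit on the training part of the fold with the chosen penalty
                model.fit(features[train_idx], targets[train_idx],
                          regularizer=method, lam=lam)
                preds = model.predict(features[test_idx])
                fold_errors.append(mse_fn(preds, targets[test_idx]))
            # average mse across folds for this (method, lambda) pair
            results[(method, lam)] = np.mean(fold_errors)
    # report the best setting per method
    for method in ['l1', 'l2']:
        best = min((k for k in results if k[0] == method), key=results.get)
        print('{}: best lambda = {}, average mse = {:.4f}'.format(
            method, best[1], results[best]))
    return results
```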
%% Cell type:markdown id: tags:
### Rubric:
* Different methods are tried with different values of $\lambda$ +10, +5