"In the linear regression part of this assignment, we have a small dataset available to us. We won't have examples to spare for a validation set; instead, we'll use cross-validation to tune hyperparameters.\n",
"\n",
"### Assignment Goals:\n",
"In this assignment, we will:\n",
"* implement linear regression\n",
" * use gradient descent for optimization\n",
" * implement regularization techniques\n",
" * $l_1$/$l_2$ regularization\n",
" * use cross-validation to find a good regularization parameter $\\lambda$\n",
" \n",
"### Note:\n",
"\n",
"You are not required to follow this exact template. You can change what parameters your functions take, or partition the tasks across functions differently. However, make sure there are outputs and implementations for the items listed in the rubric for each task. Also, indicate with comments in the code which task you are attempting."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GRADING\n",
"\n",
"You will be graded on parts that are marked with **\\#TODO** comments. Read the comments in the code to make sure you don't miss any.\n",
"Points are broken down further below in Rubric sections. The **first** score is for 478, the **second** is for 878 students. There are a total of 75 points in this part of assignment 2 for 478 and 65 points for 878 students.\n",
"\n",
"### Mandatory for 478 & 878:\n",
"\n",
"|   | Tasks                      | 478 | 878 |\n",
"|---|----------------------------|-----|-----|\n",
"| 1 | Implement `kfold`          | 10  | 10  |\n",
"| 2 | Implement `mse`            | 5   | 5   |\n",
"| 3 | Implement `fit` method     | 20  | 20  |\n",
"| 4 | Implement `predict` method | 10  | 10  |\n",
"| 5 | Implement `regularization` | 10  | 5   |\n",
"\n",
"### Bonus for 478 & 878:\n",
"\n",
"|   | Tasks                      | 478 | 878 |\n",
"|---|----------------------------|-----|-----|\n",
"| 3 | `fit` (learning rate)      | 5   | 5   |\n",
"| 6 | Polynomial regression      | 5   | 5   |\n",
"| 7 | Grid search                | 10  | 5   |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use numpy for array operations and matplotlib for plotting in this assignment. Please do not add other libraries."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code makes the Model class and relevant functions available from \"model.ipynb\"."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%run 'model.ipynb'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The target value (house prices in $1,000) is plotted against feature values below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TASK 1: Implement `kfold`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implement the \"kfold\" function for $k$-fold cross-validation in \"model.ipynb\". 5 and 10 are commonly used values for $k$; you can use either one."
]
},
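As an illustration of what $k$-fold splitting has to guarantee, here is a minimal sketch; `kfold_sketch` and its list-of-`(train, test)`-index-pairs return value are assumptions for illustration, not the required interface:

```python
import numpy as np

def kfold_sketch(n, k=5, seed=0):
    """Split indices 0..n-1 into k (train, test) pairs with disjoint test folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)        # shuffle once so folds are randomized
    folds = np.array_split(indices, k)  # k nearly equal, pairwise-disjoint test folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, test))
    return splits
```

Every example lands in exactly one test fold, so the test folds never intersect each other, and each train part is the complement of its own test fold.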
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rubric:\n",
" * No intersection between test and train parts +5, +5\n",
" * No intersection between test folds +5, +5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test `kfold`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Obtain 5 splits of data.\n",
"splits = kfold(targets.shape[0], k=5)\n",
"\n",
"# Check that test folds are completely different\n",
"# Check that for a given i, train and test are completely different"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use mean squared error (mse) for linear regression. Next, implement the \"mse\" function in \"model.ipynb\"; it takes predicted and true target values and returns the mse between them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"    def fit(self, X, Y, learning_rate, epochs, regularizer=None, lam=0):\n",
"        # (the exact parameters are up to you; see the note at the top of the assignment)\n",
"\n",
"        # we will need to add a column of 1's for bias\n",
"        size = X.shape[0]\n",
"        ones = np.ones(size)\n",
"        ones = np.reshape(ones, (size, -1))\n",
"        features = np.hstack((ones, X))\n",
"\n",
"        # theta_hat contains the parameters for the model;\n",
"        # initialize it as zeros: one parameter for each feature and one for bias\n",
"        theta_hat = np.zeros(features.shape[1])\n",
"\n",
"        # TODO\n",
"\n",
"        # for each epoch\n",
"        for epoch in range(epochs):\n",
"            # compute model predictions for training examples\n",
"            y_hat = None\n",
"\n",
"            if regularizer is None:\n",
"\n",
"                # use the mse function to find the cost\n",
"                cost = mse(y_hat, Y)\n",
"\n",
"                # You can use the print statement below to monitor the cost\n",
"                # print('Current cost is {}'.format(cost))\n",
"\n",
"                # calculate gradients wrt theta\n",
"                grad_theta = None\n",
"                # update theta\n",
"                theta_hat = None\n",
"                raise NotImplementedError\n",
"\n",
"            else:\n",
"                # take regularization into account:\n",
"                # use your regularization function; you will need to\n",
"                # compute the gradient of the regularization term as well\n",
"                raise NotImplementedError\n",
"\n",
"        # update the model parameters to be used in the predict method\n",
"        self.theta = theta_hat\n",
"\n",
"    def predict(self, test_features):\n",
"\n",
"        # obtain test features for the current fold;\n",
"        # do not forget to add a column for bias, as in the fit method\n",
"\n",
"        # TODO\n",
"\n",
"        # get predictions from the model\n",
"        y_hat = None\n",
"        raise NotImplementedError\n",
"\n",
"        return y_hat"
]
},
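For orientation, mean squared error itself is a one-liner in numpy; a minimal sketch, assuming `mse` receives array-like predicted and true values:

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error between predicted and true target values."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((y_hat - y) ** 2)
```

For example, predictions `[1, 2]` against targets `[2, 5]` give errors of 1 and 3, so the mse is $(1^2 + 3^2)/2 = 5$.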
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize and fit the model. During training, monitor your cost function. Experiment with different learning rates. Insert a cell below and summarize and briefly interpret your observations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define a \"regularization\" function in \"model.ipynb\" that implements $l_1$ and $l_2$ regularization."
]
},
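A minimal sketch of what such a function might compute, assuming the `regularization(weights, method=...)` signature used in the test cell below, and assuming the $l_2$ penalty is the plain sum of squares (conventions differ on a $\frac{1}{2}$ factor or on excluding the bias weight):

```python
import numpy as np

def regularization(weights, method='l2'):
    """Return the l1 or l2 penalty of a weight vector (without the lambda factor)."""
    w = np.asarray(weights, dtype=float)
    if method == 'l1':
        return np.sum(np.abs(w))   # sum of absolute values
    if method == 'l2':
        return np.sum(w ** 2)      # sum of squares
    raise ValueError("method must be 'l1' or 'l2'")
```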
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weights = list(np.arange(0, 1.1 , 0.1))\n",
"for method in ['l1', 'l2']:\n",
" print(regularization(weights, method=method))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TASK 6: Polynomial Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do you think the dataset would benefit from polynomial regression? Please briefly explain why or why not."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rubric:\n",
"* Sound reasoning +5, +5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TASK 7: Grid Search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using cross-validation, try different values of $\\lambda$ for $l_1$ and $l_2$ regularization to find good $\\lambda$ values that result in low average _mse_."
]
},
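A self-contained sketch of the cross-validated grid search over $\lambda$. All names here are illustrative assumptions: `kfold_` and `mse_` are minimal stand-ins for the functions you implement in "model.ipynb", and a closed-form $l_2$ (ridge) solve stands in for your gradient-descent `Model` (only $l_2$ has a closed form; your gradient-descent version covers $l_1$ as well):

```python
import numpy as np

def kfold_(n, k=5):
    # stand-in for your kfold: contiguous, pairwise-disjoint test folds
    folds = np.array_split(np.arange(n), k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i]) for i in range(k)]

def mse_(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def grid_search(X, Y, lambdas, k=5):
    """Return (lambda, avg_mse) for the lambda with the lowest cross-validated mse."""
    best = None
    for lam in lambdas:
        errors = []
        for train, test in kfold_(X.shape[0], k=k):
            # add a bias column, as in the fit method
            Xtr = np.hstack([np.ones((len(train), 1)), X[train]])
            Xte = np.hstack([np.ones((len(test), 1)), X[test]])
            # closed-form l2-regularized fit, standing in for gradient descent
            theta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]),
                                    Xtr.T @ Y[train])
            errors.append(mse_(Xte @ theta, Y[test]))
        avg = float(np.mean(errors))
        if best is None or avg < best[1]:
            best = (lam, avg)
    return best
```

The same double loop (over `method` and over `lam`) around your own `Model.fit`/`Model.predict` is all the task requires; only the average test-fold mse decides the winner.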
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rubric:\n",
"* Different methods are tried with different values of $\\lambda$ +10, +5"
]
}