Lasso Regression

Least Absolute Shrinkage and Selection Operator Regression (Lasso)

$$J(\theta) = MSE(\theta) + \alpha \sum_{i=1}^{n}|\theta_{i}|$$
  • Add the regularization term to the cost function only during training
  • Once the model is trained, evaluate it with the unregularized performance measure
  • Tends to eliminate the weights of the least important features
  • Automatically performs feature selection and outputs a sparse model
  • The Lasso cost function is not differentiable at $\theta_{i} = 0$, so Gradient Descent can bounce around the optimum at the end; to avoid this, gradually reduce the learning rate during training
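
The points above can be sketched with plain NumPy. This is a minimal illustration, not how scikit-learn trains Lasso: it runs batch subgradient descent on $J(\theta)$, leaves the bias term $\theta_0$ unpenalized, and shrinks the learning rate as `eta0 / t**power_t`. The values of `eta0`, `power_t` and `n_iter`, the seed, and all variable names are assumptions chosen for the example.

```python
import numpy as np

# Synthetic data of the same form as the dataset used in this notebook
# (the seed is an assumption, added for reproducibility)
rng = np.random.default_rng(0)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo[:, 0] + rng.standard_normal(100)

alpha = 0.1                  # regularization strength
eta0, power_t = 0.1, 0.25    # hypothetical learning-rate schedule parameters
n_iter = 2000

X_b = np.c_[np.ones(len(X_demo)), X_demo]   # prepend a bias column of ones
theta = np.zeros(2)                         # [theta_0 (bias), theta_1]

for t in range(1, n_iter + 1):
    residual = X_b @ theta - y_demo
    grad_mse = 2 / len(y_demo) * X_b.T @ residual   # gradient of MSE(theta)
    subgrad_l1 = alpha * np.sign(theta)             # subgradient of alpha * sum|theta_i|
    subgrad_l1[0] = 0.0                             # the sum starts at i=1: no penalty on the bias
    eta = eta0 / t ** power_t                       # gradually reduce the learning rate
    theta -= eta * (grad_mse + subgrad_l1)

# Evaluate with the unregularized performance measure (plain MSE)
mse = np.mean((X_b @ theta - y_demo) ** 2)
print(theta, mse)
```

The penalty appears only inside the training loop; the final `mse` is computed without it.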

Create Random Dataset

In [1]:
import numpy as np

X = 2*np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)

Train Model with Lasso Regression

In [2]:
from sklearn.linear_model import Lasso

model = Lasso(alpha = 0.1)

model.fit(X, Y)
Out[2]:
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
In [3]:
model.intercept_, model.coef_
Out[3]:
(array([4.41541135]), array([2.61496965]))
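
The fitted slope is below the true value of 3 because the L1 penalty shrinks it. For a single feature this shrinkage has a closed form (soft-thresholding), sketched below. It assumes scikit-learn's documented Lasso objective, `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`, which scales the error term differently from the $MSE(\theta)$ form written at the top. The seed and the names `x_demo`, `y_demo`, `check_model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Same kind of data as above, with an assumed seed for reproducibility
rng = np.random.default_rng(0)
x_demo = 2 * rng.random(100)
y_demo = 4 + 3 * x_demo + rng.standard_normal(100)
alpha = 0.1

# Center the data; the intercept is not penalized
x_c = x_demo - x_demo.mean()
y_c = y_demo - y_demo.mean()
cov_xy = np.mean(x_c * y_c)
var_x = np.mean(x_c ** 2)

# Soft-thresholding: shrink the covariance toward zero by alpha, then rescale
slope = np.sign(cov_xy) * max(abs(cov_xy) - alpha, 0.0) / var_x
intercept = y_demo.mean() - slope * x_demo.mean()

# Compare with scikit-learn's coordinate-descent solution
check_model = Lasso(alpha=alpha).fit(x_demo.reshape(-1, 1), y_demo)
print(slope, check_model.coef_[0])
print(intercept, check_model.intercept_)
```

If `alpha` is at least as large as `abs(cov_xy)`, the slope becomes exactly zero, which is how Lasso eliminates a weight.
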
In [4]:
X_new = np.linspace(0, 2, 100).reshape(-1, 1)
Y_predict = model.predict(X_new)
In [6]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots();

ax.scatter(X, Y);
ax.plot(X_new, Y_predict, 'r')
Out[6]:
[<matplotlib.lines.Line2D at 0x1a172bdc10>]
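
The dataset above has a single feature, so the feature-selection behaviour of Lasso is not visible in it. A minimal sketch on a hypothetical five-feature dataset, where only the first two features influence the target, shows the sparse model. The data, seed, and names `X_multi`, `y_multi`, `sparse_model` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X_multi = rng.standard_normal((500, 5))

# Only features 0 and 1 carry signal; features 2, 3 and 4 are pure noise
y_multi = 3 * X_multi[:, 0] - 2 * X_multi[:, 1] + 0.5 * rng.standard_normal(500)

sparse_model = Lasso(alpha=0.1)
sparse_model.fit(X_multi, y_multi)

# Weights of the uninformative features are driven to exactly zero
print(sparse_model.coef_)
print("selected features:", np.flatnonzero(sparse_model.coef_))
```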

Train Model with Stochastic Gradient Descent

In [7]:
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(penalty='l1')

# SGDRegressor expects a 1-D target, so flatten the column vector Y
model.fit(X, Y.ravel())
Out[7]:
SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
             eta0=0.01, fit_intercept=True, l1_ratio=0.15,
             learning_rate='invscaling', loss='squared_loss', max_iter=1000,
             n_iter_no_change=5, penalty='l1', power_t=0.25, random_state=None,
             shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
             warm_start=False)
In [8]:
model.intercept_, model.coef_
Out[8]:
(array([3.82020529]), array([3.16096941]))
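
These values are not directly comparable with the Lasso fit: Out[7] shows `alpha=0.0001` (the SGDRegressor default), while the Lasso model used `alpha=0.1`, so the L1 penalty here is almost negligible. The sketch below fits both estimators with the same `alpha`. It assumes both scale the penalty relative to the error term in the same way; `max_iter=2000`, `tol=None`, the seeds, and the names `X_demo`, `y_demo`, `lasso`, `sgd` are choices made for this example, not settings from the cells above.

```python
import numpy as np
from sklearn.linear_model import Lasso, SGDRegressor

# Same kind of data as above, with an assumed seed for reproducibility
rng = np.random.default_rng(0)
X_demo = 2 * rng.random((100, 1))
y_demo = 4 + 3 * X_demo[:, 0] + rng.standard_normal(100)   # 1-D target

alpha = 0.1

# Coordinate-descent Lasso
lasso = Lasso(alpha=alpha).fit(X_demo, y_demo)

# SGD with an L1 penalty of the same strength; tol=None runs all max_iter epochs
sgd = SGDRegressor(penalty='l1', alpha=alpha, max_iter=2000, tol=None,
                   random_state=0)
sgd.fit(X_demo, y_demo)

print("Lasso:", lasso.intercept_, lasso.coef_)
print("SGD:  ", sgd.intercept_, sgd.coef_)
```

SGD is stochastic, so its result only approximates the Lasso solution and changes with `random_state`.
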
In [9]:
X_new = np.linspace(0, 2, 100).reshape(-1, 1)
Y_predict = model.predict(X_new)
In [10]:
fig, ax = plt.subplots();

ax.scatter(X, Y);
ax.plot(X_new, Y_predict, 'r')
Out[10]:
[<matplotlib.lines.Line2D at 0x1a17315410>]