Ridge Regression

$$J(\theta) = \text{MSE}(\theta) + \alpha \dfrac{1}{2} \sum_{i=1}^{n}\theta_{i}^{2}$$
  • Add the regularization term to the cost function only during training
  • Once the model is trained, evaluate it with the unregularized performance measure
  • If $\alpha = 0$, Ridge Regression is just Linear Regression
  • As $\alpha \to \infty$, all weights end up very close to zero and the result is a flat line through the data's mean (demonstrated after the first plot below)
  • Ridge Regression is sensitive to the scale of the input features, so scale the data before regularizing (see the sketch after this list)
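
Because of that scale sensitivity, a common pattern is to standardize the features inside a pipeline. A minimal sketch (the variable name scaled_ridge is illustrative, not from the original notebook):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# standardize features so alpha penalizes all weights on a comparable scale
scaled_ridge = make_pipeline(StandardScaler(), Ridge(alpha=1))

Fitting scaled_ridge in place of a bare Ridge applies the same regularization, but to standardized inputs.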

Closed-Form Equation

$$\theta = (X^{T}X+\alpha A)^{-1} X^{T} y$$
  • $A$ is the $(n+1)\times(n+1)$ identity matrix, except with a 0 in the top-left cell, so the bias term $\theta_{0}$ is not regularized (the sum in the cost function starts at $i=1$)
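
A sketch of this equation in plain NumPy (the helper name ridge_closed_form is illustrative, not a library function):

import numpy as np

def ridge_closed_form(X, y, alpha=1.0):
    X_b = np.c_[np.ones((len(X), 1)), X]  # prepend a bias column of 1s
    A = np.identity(X_b.shape[1])
    A[0, 0] = 0                           # leave the bias term unregularized
    # solve (X^T X + alpha A) theta = X^T y rather than inverting the matrix
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

Calling ridge_closed_form(X, Y) on the dataset created below should closely match the Ridge(alpha=1) fit that follows.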

Create Random Dataset

In [2]:
import numpy as np

X = 2 * np.random.rand(100, 1)           # 100 points, uniform on [0, 2)
Y = 4 + 3 * X + np.random.randn(100, 1)  # y = 4 + 3x plus Gaussian noise

Train Model with Ridge Regression

In [3]:
from sklearn.linear_model import Ridge

model = Ridge(alpha=1, solver='cholesky')  # 'cholesky' uses the closed-form solution

model.fit(X, Y)
Out[3]:
Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='cholesky', tol=0.001)
In [4]:
model.intercept_, model.coef_
Out[4]:
(array([4.11992605]), array([[2.71625373]]))
In [5]:
X_new = np.linspace(0, 2, 100).reshape(-1, 1)
Y_predict = model.predict(X_new)
In [7]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots();

ax.scatter(X, Y);
ax.plot(X_new, Y_predict, 'r')
Out[7]:
[<matplotlib.lines.Line2D at 0x1a1f2b3890>]
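
To visualize the effect of $\alpha$ from the bullet list above, fit Ridge on the same data with a few illustrative values (chosen arbitrarily) and plot the resulting lines:

fig, ax = plt.subplots()
ax.scatter(X, Y, s=10)
for a in (0, 10, 100000):  # alpha=0 is plain Linear Regression; large alpha flattens the line
    m = Ridge(alpha=a, solver='cholesky').fit(X, Y)
    ax.plot(X_new, m.predict(X_new), label=f'alpha={a}')
ax.legend()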

Train Model with Stochastic Gradient Descent

Setting penalty='l2' adds the same $\ell_2$ regularization term to the cost function, so SGDRegressor with this penalty is Ridge Regression trained by Stochastic Gradient Descent.

In [12]:
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(penalty='l2')

# SGDRegressor expects a 1-d target array, so flatten Y with ravel()
# to avoid a DataConversionWarning
model.fit(X, Y.ravel())
Out[12]:
SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
             eta0=0.01, fit_intercept=True, l1_ratio=0.15,
             learning_rate='invscaling', loss='squared_loss', max_iter=1000,
             n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=None,
             shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
             warm_start=False)
In [13]:
model.intercept_, model.coef_
Out[13]:
(array([3.63160452]), array([3.14894044]))
In [14]:
X_new = np.linspace(0, 2, 100).reshape(-1, 1)
Y_predict = model.predict(X_new)
In [15]:
fig, ax = plt.subplots();

ax.scatter(X, Y);
ax.plot(X_new, Y_predict, 'r')
Out[15]:
[<matplotlib.lines.Line2D at 0x1a1f45b850>]