Linear Regression

Create Random Dataset

In [4]:
import numpy as np

X = 2 * np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)

Train Model

In [7]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X, Y)
Out[7]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [9]:
model.intercept_, model.coef_
Out[9]:
(array([4.10386405]), array([[2.9677176]]))

Prediction

In [12]:
X_new = np.linspace(0, 2, 100).reshape(-1, 1)
Y_predict = model.predict(X_new)
In [19]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots();

ax.scatter(X, Y);
ax.plot(X_new, Y_predict, 'r')
Out[19]:
[<matplotlib.lines.Line2D at 0x1a1fa72590>]

Approach in Scikit-Learn

$$\theta = X^{+}Y$$
  • $X^{+}$ is the pseudoinverse of $X$, computed by np.linalg.pinv() using Singular Value Decomposition (SVD): $X$ is decomposed into $U\Sigma V^{T}$
  • $X^{+} = V\Sigma^{+}U^{T}$
  • To build $\Sigma^{+}$, take $\Sigma$, set to zero all singular values smaller than a tiny threshold, replace the remaining nonzero values with their inverses, and finally transpose the resulting matrix
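The pseudoinverse route above can be sketched in NumPy, both via np.linalg.pinv() directly and via a manual SVD (the seed and the zeroing threshold are illustrative choices, not values scikit-learn uses):

```python
import numpy as np

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)

# Add a bias column of ones so theta contains [intercept, slope]
X_b = np.c_[np.ones((100, 1)), X]

# Pseudoinverse route: theta = X^+ Y
theta_pinv = np.linalg.pinv(X_b) @ Y

# Same result built by hand from the SVD: X^+ = V Sigma^+ U^T
U, s, Vt = np.linalg.svd(X_b, full_matrices=False)
s_inv = np.where(s > 1e-10, 1 / s, 0.0)  # invert only the nonzero singular values
theta_svd = Vt.T @ np.diag(s_inv) @ U.T @ Y

print(theta_pinv.ravel())  # close to the true parameters 4 and 3
```

Both estimates agree to machine precision, since pinv() performs exactly this SVD-based computation internally.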

Computational Complexity

  • Normal Equation: $O(n^{2.4})$ to $O(n^{3})$, where $n$ is the number of features
  • SVD: $O(n^{2})$
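As a sketch of the two approaches being compared (the dataset and seed are illustrative), the Normal Equation $\theta = (X^{T}X)^{-1}X^{T}Y$ can be written out directly and checked against an SVD-based least-squares solve:

```python
import numpy as np

np.random.seed(0)
X = 2 * np.random.rand(100, 1)
Y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]  # bias column for the intercept

# Normal Equation: inverts an n x n matrix, O(n^2.4) to O(n^3) in features n
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ Y

# SVD-based least squares, O(n^2); also works when X^T X is singular
# (e.g. redundant features), where inv() would raise an error
theta_svd, *_ = np.linalg.lstsq(X_b, Y, rcond=None)
```

Here both give the same $\theta$; the SVD route is preferred in practice for its lower cost and numerical robustness.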