Cross Decomposition

  • Finds the fundamental relations between two matrices (X and Y)
  • Projects X and Y into low-dimensional spaces so that the covariance between the projected matrices is maximal
  • Enables dimensionality reduction that takes the targets Y into account

Partial Least Squares Canonical (PLSCanonical)

  • Dimensionality reduction
  • Regression
In [59]:
import numpy as np
from sklearn.cross_decomposition import PLSCanonical
X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])
In [60]:
plsca = PLSCanonical(n_components=2)
plsca.fit(X, Y)
Out[60]:
PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2,
             scale=True, tol=1e-06)
In [61]:
# Predict targets of given samples
plsca.predict(X[:2, :])
Out[61]:
array([[-0.17050568, -1.29053068],
       [-0.50933055,  0.66811955]])
In [62]:
# Dimensionality reduction
X_c, Y_c = plsca.transform(X, Y)
In [63]:
# Map the reduced data back to the original space
# (approximate, since n_components < n_features)
plsca.inverse_transform(X_c)
Out[63]:
array([[ 0.39787189, -0.33790545,  0.62699746],
       [ 0.84311677,  0.13323811,  0.1470771 ],
       [ 1.41749463,  2.49471135,  2.54609533],
       [ 2.34151671,  4.70995599,  3.67983011]])
In [64]:
plsca.coef_
Out[64]:
array([[ 2.40432756,  1.68695162],
       [-0.44704791,  5.41523294],
       [ 4.86740938, -0.33590654]])

PLSSVD

  • Simplified version of PLSCanonical
In [66]:
from sklearn.cross_decomposition import PLSSVD
pls = PLSSVD(n_components=2).fit(X, Y)
In [67]:
# PLSSVD does not implement predict; this reuses the fitted
# PLSCanonical model from above
plsca.predict(X[:2, :])
Out[67]:
array([[-0.17050568, -1.29053068],
       [-0.50933055,  0.66811955]])
In [69]:
X_c, Y_c = pls.transform(X, Y)
X_c, Y_c
Out[69]:
(array([[-1.39700475, -0.10283021],
        [-1.19678754,  0.17159333],
        [ 0.56032252, -0.10849725],
        [ 2.03346977,  0.03973413]]),
 array([[-1.22601804, -0.01930121],
        [-0.9602955 ,  0.04015847],
        [ 0.32491535, -0.04311171],
        [ 1.86139819,  0.02225445]]))

PLSRegression

  • Known as PLS1 (a single target) and PLS2 (multiple targets)
  • A form of regularized linear regression
In [74]:
from sklearn.cross_decomposition import PLSRegression
pls2 = PLSRegression(n_components=2)

pls2.fit(X, Y)
Out[74]:
PLSRegression(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)
In [76]:
pls2.predict(X[:2, :])
Out[76]:
array([[0.26087869, 0.15302213],
       [0.60667302, 0.45634164]])

Canonical Correlation Analysis (CCA)

  • Unstable if the number of features or targets is greater than the number of samples
In [81]:
from sklearn.cross_decomposition import CCA
cca = CCA(n_components=2)

cca.fit(X, Y)
Out[81]:
CCA(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)
In [82]:
cca.predict(X[:2, :])
Out[82]:
array([[-1.51106526, -2.12247471],
       [-0.43537494,  0.32314375]])
In [83]:
X_c, Y_c = cca.transform(X, Y)
X_c, Y_c
Out[83]:
(array([[-1.14979915,  0.07023102],
        [-0.95304207, -0.16529138],
        [ 0.35047354,  0.17359282],
        [ 1.75236768, -0.07853247]]),
 array([[-0.85511537,  0.0249032 ],
        [-0.70878547, -0.05861063],
        [ 0.26065014,  0.06155424],
        [ 1.3032507 , -0.02784681]]))