Pipeline¶

  • Fit
    • Transformers: fit_transform fits each preprocessing step and transforms the raw data
    • Forecaster: fit trains the model on the transformed data
  • Predict
    • Forecaster: predict produces the forecast on the transformed scale
    • Transformers: inverse_transform converts the forecast back to the scale of the raw data
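The fit and predict order listed above can be sketched in plain Python. This is only an illustration under stated assumptions: `MeanShift`, `Scaler`, `LastValueForecaster` and `MiniPipeline` are hypothetical toy classes written for this note, not sktime classes, and the data is made up.

```python
class MeanShift:
    """Toy transformer (hypothetical): subtracts the training mean."""

    def fit_transform(self, y):
        self.mean_ = sum(y) / len(y)
        return [v - self.mean_ for v in y]

    def inverse_transform(self, y):
        return [v + self.mean_ for v in y]


class Scaler:
    """Toy transformer (hypothetical): multiplies by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit_transform(self, y):
        return [v * self.factor for v in y]

    def inverse_transform(self, y):
        return [v / self.factor for v in y]


class LastValueForecaster:
    """Toy forecaster (hypothetical): repeats the last training value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        return [self.last_ for _ in fh]


class MiniPipeline:
    """Chains transformers and a forecaster in the order described above."""

    def __init__(self, transformers, forecaster):
        self.transformers = transformers
        self.forecaster = forecaster

    def fit(self, y):
        # Fit: apply fit_transform of each transformer in order, then fit the model
        for t in self.transformers:
            y = t.fit_transform(y)
        self.forecaster.fit(y)
        return self

    def predict(self, fh):
        # Predict: forecast on the transformed scale, then undo the
        # transformers in reverse order to return to the raw scale
        y_pred = self.forecaster.predict(fh)
        for t in reversed(self.transformers):
            y_pred = t.inverse_transform(y_pred)
        return y_pred


pipe = MiniPipeline([MeanShift(), Scaler(10)], LastValueForecaster())
pipe.fit([1.0, 2.0, 3.0])
forecast = pipe.predict([1, 2])
print(forecast)
```

The inverse transforms run in reverse order, so the forecast comes back on the same scale as the input series.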
In [1]:
import warnings
warnings.filterwarnings('ignore')

Load Data¶

In [2]:
from sktime.datasets import load_longley
_, y = load_longley()  # load_longley returns (y, X); keep X, the 16 x 5 frame, as the multivariate series
y.head()
Out[2]:
GNPDEFL GNP UNEMP ARMED POP
Period
1947 83.0 234289.0 2356.0 1590.0 107608.0
1948 88.5 259426.0 2325.0 1456.0 108632.0
1949 88.2 258054.0 3682.0 1616.0 109773.0
1950 89.5 284599.0 3351.0 1650.0 110929.0
1951 96.2 328975.0 2099.0 3099.0 112075.0
In [3]:
from sktime.forecasting.model_selection import temporal_train_test_split
y_train, y_test = temporal_train_test_split(y, test_size=4) # hold out last 4 years
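A temporal split keeps the time order and holds out the last observations instead of sampling rows at random. The sketch below illustrates the idea in plain Python on the 16 yearly periods of this dataset (1947 to 1962). `temporal_split` is a hypothetical helper written for this note, not the sktime function.

```python
def temporal_split(rows, test_size):
    """Hold out the last `test_size` rows, preserving order (hypothetical helper)."""
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]


years = list(range(1947, 1963))  # 16 yearly periods
train_years, test_years = temporal_split(years, test_size=4)
print(train_years[0], train_years[-1])  # first and last training year
print(test_years)                       # held-out years
```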

Build pipeline with TransformedTargetForecaster¶

In [4]:
from sktime.forecasting.var import VAR
from sktime.forecasting.model_evaluation import evaluate
from sktime.utils.plotting import plot_series
In [5]:
import numpy as np
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.detrend import Detrender
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.transformations.series.func_transform import FunctionTransformer

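# create pipeline: remove a linear trend, rescale, then fit a VAR model
trend = PolynomialTrendForecaster(degree=1)

# toy invertible transform pair for FunctionTransformer
def forward_transform(y):
    return y*10

def backward_transform(y):
    return y/10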

forecaster = TransformedTargetForecaster(
    [
        ("trend", Detrender(forecaster=trend)),
        ("process-function", FunctionTransformer(func = forward_transform, inverse_func = backward_transform)),
        ("forecast", VAR()),
    ]
)

# training
forecaster.fit(y_train)


# forecasting
fh = np.arange(1, 5)
y_pred = forecaster.predict(fh)

# evaluation
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
mean_absolute_percentage_error(y_test, y_pred, symmetric=False, multioutput = 'raw_values')
Out[5]:
array([0.01881077, 0.02820391, 0.25109832, 0.10095591, 0.01535041])
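With multioutput = 'raw_values' the metric returns one score per column, so the five values above presumably follow the column order of y (GNPDEFL, GNP, UNEMP, ARMED, POP). As a rough sketch of what the non-symmetric variant computes, the mean of |y_true - y_pred| / |y_true| per column, here is a plain-Python version on made-up numbers. `mape_per_column` is a hypothetical helper, and it skips the small-denominator guard a library implementation would normally apply.

```python
def mape_per_column(y_true, y_pred):
    """Non-symmetric MAPE for each column.

    y_true, y_pred: dict mapping column name -> list of floats (toy layout,
    assumed for this sketch). No guard against zero denominators.
    """
    scores = {}
    for col in y_true:
        errors = [abs(t - p) / abs(t) for t, p in zip(y_true[col], y_pred[col])]
        scores[col] = sum(errors) / len(errors)
    return scores


# made-up values, for illustration only
y_true_toy = {"a": [100.0, 200.0], "b": [10.0, 20.0]}
y_pred_toy = {"a": [110.0, 180.0], "b": [10.0, 25.0]}
scores = mape_per_column(y_true_toy, y_pred_toy)
print(scores)
```

Read this way, a value of 0.25 means the forecast for that column is off by about 25% on average over the horizon.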
In [6]:
import matplotlib.pyplot as plt

def get_plots(y_train, y_test, y_pred):
    columns = list(y_train.columns)
    
    for column in columns:
        fig, ax = plt.subplots(figsize=(8, 6))
        line1, = ax.plot(y_train.index.to_timestamp(), y_train[column], 'bo-')
        line2, = ax.plot(y_test.index.to_timestamp(), y_test[column], 'go-')
        line3, = ax.plot(y_pred.index.to_timestamp(), y_pred[column], 'yo-')
        ax.legend((line1, line2, line3), ('y', 'y_test', 'y_pred'))
        ax.set_ylabel(column)
    
# visualization
get_plots(y_train, y_test, y_pred)

Implement pipeline step by step¶

In [7]:
# create preprocessing pipeline
trend = PolynomialTrendForecaster(degree=1)

def forward_transform(y):
    return y*10

def backward_transform(y):
    return y/10

preprocessing =  Detrender(forecaster=trend) \
                * FunctionTransformer(func = forward_transform, inverse_func = backward_transform)

# preprocessing, fit_transform
data_pre = preprocessing.fit_transform(y_train)

fh = np.arange(1, 5)
forecaster = VAR()

# training, fit
forecaster.fit(data_pre)

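# forecasting, predict
data_pred = forecaster.predict(fh)

# postprocessing, inverse_transform: map the forecast back to the raw data scale
y_pred = preprocessing.inverse_transform(data_pred)

# evaluation
mean_absolute_percentage_error(y_test, y_pred, symmetric=False, multioutput = 'raw_values')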
Out[7]:
array([0.01881077, 0.02820391, 0.25109832, 0.10095591, 0.01535041])

Build pipeline with multiplication¶

In [8]:
# create preprocessing pipeline
trend = PolynomialTrendForecaster(degree=1)

def forward_transform(y):
    return y*10

def backward_transform(y):
    return y/10

preprocessing =  Detrender(forecaster=trend) \
                * FunctionTransformer(func = forward_transform, inverse_func = backward_transform)

# build pipeline: multiplying the preprocessing steps by a model (here VAR)
# chains them into a single forecaster
forecaster = (
   preprocessing * VAR()
)

# training
forecaster.fit(y_train)

# forecasting
fh = np.arange(1, 5)
y_pred = forecaster.predict(fh)

# evaluation
mean_absolute_percentage_error(y_test, y_pred, symmetric=False, multioutput = 'raw_values')
Out[8]:
array([0.01881077, 0.02820391, 0.25109832, 0.10095591, 0.01535041])
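The scores match the TransformedTargetForecaster version, which suggests the `*` operator builds an equivalent pipeline. How operator overloading can chain steps this way is sketched below in plain Python. `ToyStep` and `ToyPipeline` are hypothetical classes written for this note and do not reflect sktime's actual implementation.

```python
class ToyPipeline:
    """Ordered list of steps; `*` appends further steps (hypothetical class)."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __mul__(self, other):
        # A forecaster must be the final step, so nothing can follow it
        if self.steps[-1].is_forecaster:
            raise TypeError("cannot chain a step after a forecaster")
        other_steps = other.steps if isinstance(other, ToyPipeline) else [other]
        return ToyPipeline(self.steps + other_steps)

    def names(self):
        return [step.name for step in self.steps]


class ToyStep:
    """Single named step, either a transformer or a forecaster (hypothetical class)."""

    def __init__(self, name, is_forecaster=False):
        self.name = name
        self.is_forecaster = is_forecaster

    def __mul__(self, other):
        # Wrap this step in a one-element pipeline and delegate the chaining
        return ToyPipeline([self]) * other


preprocessing_toy = ToyStep("detrend") * ToyStep("scale")
pipeline_toy = preprocessing_toy * ToyStep("var", is_forecaster=True)
print(pipeline_toy.names())
```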
