Pipeline

  • Fit
    • Transformers: fit_transform fits each preprocessing step and transforms the raw data
    • Forecaster: fit trains the model on the transformed data
  • Predict
    • Forecaster: predict produces the forecast on the transformed scale
    • Transformers: inverse_transform converts the forecast back to the scale of the raw data
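
The call order in the list above can be sketched without sktime. This is a minimal pure-Python illustration, not sktime's implementation; ScaleTransformer, LastValueForecaster, and ToyPipeline are hypothetical names invented for the sketch.

```python
class ScaleTransformer:
    """Hypothetical transformer: multiplies by a factor, divides to invert."""

    def __init__(self, factor):
        self.factor = factor

    def fit_transform(self, y):
        # Nothing is learned here; a real transformer would estimate parameters.
        return [v * self.factor for v in y]

    def inverse_transform(self, y):
        return [v / self.factor for v in y]


class LastValueForecaster:
    """Hypothetical forecaster: repeats the last observed value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        return [self.last_ for _ in fh]


class ToyPipeline:
    """Hypothetical pipeline showing the fit/predict call order."""

    def __init__(self, transformers, forecaster):
        self.transformers = transformers
        self.forecaster = forecaster

    def fit(self, y):
        # Fit: apply fit_transform of each transformer in order, then fit the model.
        for t in self.transformers:
            y = t.fit_transform(y)
        self.forecaster.fit(y)
        return self

    def predict(self, fh):
        # Predict: forecast on the transformed scale, then undo the
        # transformers in reverse order to get back to the raw scale.
        y_pred = self.forecaster.predict(fh)
        for t in reversed(self.transformers):
            y_pred = t.inverse_transform(y_pred)
        return y_pred


pipe = ToyPipeline([ScaleTransformer(10), ScaleTransformer(2)], LastValueForecaster())
pipe.fit([1.0, 2.0, 3.0])
toy_pred = pipe.predict([1, 2])
```

The forecaster only ever sees transformed values, while the caller passes and receives values on the raw scale.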

Build pipeline with TransformedTargetForecaster

In [4]:
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.utils.plotting import plot_series
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
import numpy as np

# load data and prepare training data and test data
y = load_airline() # 144 monthly observations spanning 12 years
y_train, y_test = temporal_train_test_split(y, test_size=36) # hold out last 3 years

# pipeline: deseasonalize the target, then forecast with ARIMA
forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", Deseasonalizer(model="multiplicative", sp=12)),
        ("forecast", ARIMA()),
    ]
)

# training
forecaster.fit(y_train)

# forecasting
fh = np.arange(1, 37)
y_pred = forecaster.predict(fh)

# evaluation
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
mean_absolute_percentage_error(y_test, y_pred, symmetric=False)
/opt/anaconda3/envs/python3.11/lib/python3.11/site-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
Out[4]:
0.13969973600689436
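
The value in Out[4] is a mean absolute percentage error computed with symmetric=False, so roughly 0.14 means the forecasts are off by about 14% of the actual values on average. As a sketch of the standard non-symmetric definition (not sktime's code; mape is a hypothetical helper and assumes no actual value is zero):

```python
def mape(y_true, y_pred):
    """Mean of |y_true - y_pred| / |y_true| over all time points."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Assumption: no entry of y_true is zero, so the division is safe.
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


# Both forecasts miss by 10% of the actual value.
example_score = mape([100.0, 200.0], [110.0, 180.0])
```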

Implement pipeline step by step

In [5]:
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import ARIMA
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.transformations.series.func_transform import FunctionTransformer
from sktime.utils.plotting import plot_series
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
import numpy as np

# load data and prepare training data and test data
y = load_airline() # 144 monthly observations spanning 12 years
y_train, y_test = temporal_train_test_split(y, test_size=36) # hold out last 3 years

# preprocessing: a toy scaling transform and its exact inverse
def forward_transform(y):
    return y * 10

def backward_transform(y):
    return y / 10

# chain the transformers with *; they are applied left to right
preprocessing = (
    Deseasonalizer(model="multiplicative", sp=12)
    * Deseasonalizer(model="multiplicative", sp=3)
    * FunctionTransformer(func=forward_transform, inverse_func=backward_transform)
)

# preprocessing, fit_transform
data_pre = preprocessing.fit_transform(y_train)

fh = np.arange(1, 37)
forecaster = ARIMA()

# training, fit
forecaster.fit(data_pre)

# forecasting, predict
data_pred = forecaster.predict(fh)

# inverse preprocessing, inverse_transform
y_pred = preprocessing.inverse_transform(data_pred)

plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
mean_absolute_percentage_error(y_test, y_pred, symmetric=False)
/opt/anaconda3/envs/python3.11/lib/python3.11/site-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
Out[5]:
0.13847654137903667

Build pipeline with multiplication

In [6]:
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.arima import ARIMA
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.transformations.series.func_transform import FunctionTransformer
from sktime.utils.plotting import plot_series
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
import numpy as np

# load data and prepare training data and test data
y = load_airline() # 144 monthly observations spanning 12 years
y_train, y_test = temporal_train_test_split(y, test_size=36) # hold out last 3 years

def forward_transform(y):
    return y * 10

def backward_transform(y):
    return y / 10

# preprocessing pipeline
preprocessing = (
    Deseasonalizer(model="multiplicative", sp=12)
    * Deseasonalizer(model="multiplicative", sp=3)
    * FunctionTransformer(func=forward_transform, inverse_func=backward_transform)
)

# build pipeline
forecaster = preprocessing * ARIMA()

# training
forecaster.fit(y_train)

# forecasting
fh = np.arange(1, 37)
y_pred = forecaster.predict(fh)

plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
mean_absolute_percentage_error(y_test, y_pred, symmetric=False)
/opt/anaconda3/envs/python3.11/lib/python3.11/site-packages/sktime/forecasting/compose/_pipeline.py:91: UserWarning: in TransformedTargetForecaster, found steps of length 1, this will result in the same behaviour as not wrapping the single step in a pipeline. Consider not wrapping steps in TransformedTargetForecaster as it is redundant.
  warn(msg)
/opt/anaconda3/envs/python3.11/lib/python3.11/site-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
Out[6]:
0.13847654137903667
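
Out[6] matches Out[5], so the pipeline built with * behaves like the step-by-step version. One way to picture the * composition is plain operator overloading: each multiplication appends a step to an ordered chain. The sketch below is an assumption about the general idea, not sktime's actual implementation; Step and Chain are hypothetical classes.

```python
class Step:
    """Hypothetical pipeline step identified only by a name."""

    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        # step * other: start a chain with this step, then append the other.
        return Chain([self]) * other


class Chain:
    """Hypothetical ordered collection of steps built with *."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __mul__(self, other):
        # chain * other: append a single step, or concatenate two chains.
        other_steps = other.steps if isinstance(other, Chain) else [other]
        return Chain(self.steps + other_steps)


preprocessing_chain = Step("deseasonalize_12") * Step("deseasonalize_3") * Step("scale")
full_chain = preprocessing_chain * Step("arima")
step_names = [s.name for s in full_chain.steps]
```

Because * is left-associative in Python, the steps end up in the order they are written, which is the order in which they are applied during fit.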
