Probabilistic Forecasting¶

  • produce low and high scenarios around a point forecast
  • quantify the uncertainty of a forecast
  • produce the expected range of variation of future values

Train Forecaster¶

In [2]:
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster

# up to and including fit, this is identical to the simple forecasting workflow
y = load_airline()

fh = np.arange(1, 13)

forecaster = ThetaForecaster(sp=12)
forecaster.fit(y, fh=fh)
y_pred = forecaster.predict()

predict_interval¶

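  • predict_interval(fh=None, X=None, coverage=0.90)
  • returns symmetric prediction intervals: a lower and an upper bound for each step of the forecasting horizon and each coverage level
  • the bounds are the quantiles at 0.5 - coverage/2 and 0.5 + coverage/2; for coverage=0.90 these are the 0.05 and 0.95 quantiles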
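The mapping from a coverage level to the two quantile levels can be sketched in plain Python. The helper name `coverage_to_quantile_levels` is hypothetical and not part of sktime; it only restates the arithmetic from the bullets above.

```python
def coverage_to_quantile_levels(coverage):
    """Return the (lower, upper) quantile levels of a symmetric interval.

    Hypothetical helper for illustration; not an sktime function.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    lower = 0.5 - coverage / 2
    upper = 0.5 + coverage / 2
    return lower, upper


for cov in (0.5, 0.8, 0.9, 0.95):
    lower, upper = coverage_to_quantile_levels(cov)
    print(f"coverage={cov}: quantile levels {lower:.3f} and {upper:.3f}")
```

A wider coverage pushes both levels towards the tails, which is why the intervals for higher coverage levels are wider.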
In [22]:
coverage = 0.9
y_pred_ints = forecaster.predict_interval(coverage=coverage)
y_pred_ints.head()
Out[22]:
           Coverage
                0.9
              lower       upper
1961-01  418.280121  464.281951
1961-02  402.215881  456.888055
1961-03  459.966113  522.110500
1961-04  442.589309  511.399214
1961-05  443.525027  518.409480
In [23]:
from sktime.utils.plotting import plot_series

plot_series(y, y_pred, labels=["y", "y_pred"], pred_interval=y_pred_ints)
Out[23]:
(<Figure size 1600x400 with 1 Axes>,
 <Axes: ylabel='Number of airline passengers'>)
In [26]:
fig, ax = plot_series(y, y_pred, labels=["y", "y_pred"])
ax.fill_between(
    ax.get_lines()[-1].get_xdata(),  # x values of the forecast line
    y_pred_ints["Coverage"][coverage]["lower"],  # lower interval bound
    y_pred_ints["Coverage"][coverage]["upper"],  # upper interval bound
    alpha=0.2,
    color=ax.get_lines()[-1].get_c(),
    label=f"{coverage} cov.pred.intervals",
)
ax.legend()
Out[26]:
<matplotlib.legend.Legend at 0x133d629d0>
In [35]:
# Multiple Coverages
coverages = [0.5, 0.8, 0.95]
y_pred_ints = forecaster.predict_interval(coverage=coverages)
y_pred_ints
Out[35]:
           Coverage
               0.50                    0.80                    0.95
              lower       upper       lower       upper       lower       upper
1961-01  431.849266  450.712806  423.360378  459.201694  413.873755  468.688317
1961-02  418.342514  440.761421  408.253656  450.850279  396.979011  462.124925
1961-03  478.296822  503.779790  466.829089  515.247523  454.013504  528.063109
1961-04  462.886144  491.102379  450.188398  503.800124  435.998232  517.990291
1961-05  465.613670  496.320837  451.794965  510.139542  436.352089  525.582418
1961-06  530.331440  563.342111  515.476124  578.197428  498.874797  594.798754
1961-07  586.791063  621.954661  570.966896  637.778829  553.282845  655.462879
1961-08  584.116789  621.308897  567.379760  638.045925  548.675556  656.750129
1961-09  505.795123  544.910684  488.192511  562.513297  468.520987  582.184821
1961-10  437.370840  478.319605  418.943257  496.747188  398.349800  517.340645
1961-11  377.660798  420.364142  358.443627  439.581313  336.967779  461.057161
1961-12  426.638370  471.026993  406.662797  491.002565  384.339409  513.325954
In [36]:
fig, ax = plot_series(y, y_pred, labels=["y", "y_pred"])

colors = ['red', 'blue', 'yellow']

for i, coverage in enumerate(coverages):
    ax.fill_between(
        ax.get_lines()[-1].get_xdata(),  # x values of the forecast line
        y_pred_ints["Coverage"][coverage]["lower"],  # lower interval bound
        y_pred_ints["Coverage"][coverage]["upper"],  # upper interval bound
        alpha=0.4,
        color=colors[i],
        label=f"{coverage} cov.pred.intervals",
    )
    
ax.legend()
Out[36]:
<matplotlib.legend.Legend at 0x133e81ad0>

predict_quantiles¶

  • predict_quantiles(fh=None, X=None, alpha=[0.05, 0.95])
  • returns the forecast quantiles at the probability levels given in alpha, one column per level
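Because predict_proba on this forecaster returns a Normal distribution (see the predict_proba section of this notebook), each forecast quantile should match the quantile of a normal distribution with the same mu and sigma. The following is a minimal check with the standard library only, not the way sktime computes quantiles internally; the mu and sigma values are copied from the 1961-01 row of the predict_proba output in this notebook.

```python
from statistics import NormalDist

# mu and sigma for 1961-01, copied from the predict_proba output in this notebook
mu, sigma = 441.281036, 13.983564
dist = NormalDist(mu=mu, sigma=sigma)

# a quantile is the inverse CDF evaluated at the probability level
quantiles = {alpha: dist.inv_cdf(alpha) for alpha in (0.275, 0.975)}
for alpha, q in quantiles.items():
    print(f"alpha={alpha}: {q:.3f}")
```

Up to rounding, the two values should agree with the 1961-01 row returned by predict_quantiles(alpha=[0.275, 0.975]).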
In [5]:
y_pred_quantiles = forecaster.predict_quantiles(alpha=[0.275, 0.975])
y_pred_quantiles.head()
Out[5]:
          Quantiles
              0.275       0.975
1961-01  432.922219  468.688317
1961-02  419.617696  462.124925
1961-03  479.746287  528.063109
1961-04  464.491077  517.990291
1961-05  467.360286  525.582418
In [50]:
fig, ax = plot_series(y, y_pred, labels=["y", "y_pred"])

ax.fill_between(
    ax.get_lines()[-1].get_xdata(),  # x values of the forecast line
    y_pred_quantiles["Quantiles"][0.275],  # lower quantile
    y_pred_quantiles["Quantiles"][0.975],  # upper quantile
    alpha=0.2,
    color=ax.get_lines()[-1].get_c(),
    label="0.275-0.975",
)
ax.legend()
Out[50]:
<matplotlib.legend.Legend at 0x13432e0d0>

predict_var¶

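  • predict_var(fh=None, X=None, cov=False)
  • returns the forecast variance for each step of the forecasting horizon
  • cov=True requests the covariance between horizon steps instead; not all estimators support it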
In [9]:
y_pred_var = forecaster.predict_var(cov=False)  # marginal variance per horizon step
y_pred_var.head()
Out[9]:
                  0
1961-01  195.540049
1961-02  276.196509
1961-03  356.852968
1961-04  437.509428
1961-05  518.165887
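The variance forecast ties back to the prediction intervals. Assuming normally distributed forecasts (consistent with the Normal object that predict_proba returns in this notebook), an interval is the mean plus or minus a z-score times the standard deviation. This is a minimal sketch of that relationship, not sktime's implementation; the variance is the 1961-01 value above and the mean is the 1961-01 mu from the predict_proba output.

```python
from math import sqrt
from statistics import NormalDist

variance = 195.540049  # predict_var output for 1961-01
mean = 441.281036      # mu for 1961-01, from the predict_proba output
coverage = 0.9

sigma = sqrt(variance)  # standard deviation of the forecast
z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for 90% coverage
lower, upper = mean - z * sigma, mean + z * sigma
print(f"sigma={sigma:.4f}, interval=({lower:.3f}, {upper:.3f})")
```

Up to rounding, the result should match the 1961-01 row of predict_interval(coverage=0.9).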

predict_proba¶

  • predict_proba(fh=None, X=None, marginal=True)
  • returns a distribution object for the forecast; here a Normal with mean mu and standard deviation sigma for each step of the forecasting horizon
In [51]:
y_pred_proba = forecaster.predict_proba()
y_pred_proba # mu, sigma
Out[51]:
Normal(columns=Index(['Number of airline passengers'], dtype='object'),
       index=PeriodIndex(['1961-01', '1961-02', '1961-03', '1961-04', '1961-05', '1961-06',
             '1961-07', '1961-08', '1961-09', '1961-10', '1961-11', '1961-12'],
            dtype='period[M]'),
       mu=         Number of airline passengers
1961-01                    441.281036
1961-02                    429.551968
1961-03                    491.038306
1961-04                    476.994261
1961-05                    480.967253
1961-06                    546.836776
1961-07                    604.372862
1961-08                    602.712843
1961-09                    525.352904
1961-10                    457.845222
1961-11                    399.012470
1961-12                    448.832681,
       sigma=                 0
1961-01  13.983564
1961-02  16.619161
1961-03  18.890552
1961-04  20.916726
1961-05  22.763257
1961-06  24.470847
1961-07  26.066814
1961-08  27.570551
1961-09  28.996409
1961-10  30.355365
1961-11  31.656036
1961-12  32.905336)
In [78]:
# y_pred_proba.mean() returns the mean forecast (mu); var() returns sigma squared
y_pred_proba.var()
Out[78]:
         Number of airline passengers
1961-01                    195.540049
1961-02                    276.196509
1961-03                    356.852968
1961-04                    437.509428
1961-05                    518.165887
1961-06                    598.822347
1961-07                    679.478807
1961-08                    760.135266
1961-09                    840.791726
1961-10                    921.448185
1961-11                   1002.104645
1961-12                   1082.761105
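The variances above suggest that, for this fitted forecaster, the uncertainty grows by a roughly constant amount per horizon step. This is an observation about the printed numbers only, not a general statement about sktime forecasters. A quick check on the values copied from the output:

```python
# variances for 1961-01 to 1961-12, copied from the output above
variances = [
    195.540049, 276.196509, 356.852968, 437.509428, 518.165887, 598.822347,
    679.478807, 760.135266, 840.791726, 921.448185, 1002.104645, 1082.761105,
]

# difference between consecutive horizon steps
increments = [b - a for a, b in zip(variances, variances[1:])]
spread = max(increments) - min(increments)
print(f"mean increment: {sum(increments) / len(increments):.4f}")
print(f"spread of increments: {spread:.6f}")
```

Since the variance is close to linear in the horizon step, the standard deviation, and with it the interval width, grows more slowly than linearly as the horizon extends.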
