Deep4Cast Documentation¶
Decision making incorporating uncertainty.¶
This package is under active development. Things may change :-).
Deep4Cast is a scalable machine learning package implemented in Python and Torch, with a front-end API similar to scikit-learn. It is designed for medium to large time series data sets and allows for modeling of forecast uncertainties. The network architecture is based on WaveNet; regularization and approximate sampling from the posterior predictive distribution of forecasts are achieved via Concrete Dropout.
Authors¶
- Toby Bischoff
- Austin Gross
- Kenneth Tran
References¶
- Concrete Dropout is used for approximate Bayesian posterior inference.
- WaveNet is used as the encoder network.
Getting Started¶
Main Requirements¶
Deep4Cast requires Python with PyTorch (torch); the tutorial below additionally uses NumPy, pandas, and Matplotlib.
Installation¶
Deep4Cast can be cloned from GitHub. Before installing, we recommend setting up a clean virtual environment. From the package directory, install the requirements and then the package itself (e.g., with pip install -r requirements.txt followed by pip install .).
Tutorial: Forecasting Github Daily Active Users¶
In this notebook, we show how to use Deep4Cast to forecast a single time series of Github daily active users. The data can be gathered from Github Archive and is entirely public. The idea is to show how to handle a short time series with many characteristics: positive values, a trend, multiple seasonalities, etc.
[1]:
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader
from deep4cast.forecasters import Forecaster
from deep4cast.models import WaveNet
from deep4cast.time_series_dataset import TimeSeriesDataset
import deep4cast.metrics as metrics
# Make RNG predictable
np.random.seed(0)
torch.manual_seed(0)
# Use a gpu if available, otherwise use cpu
device = ('cuda' if torch.cuda.is_available() else 'cpu')
%matplotlib inline
Forecasting parameters¶
We first need to specify how much history to use in creating a forecast of a given length:
- horizon = time steps (days) to forecast
- lookback = time steps (days) leading up to the period to be forecast
[2]:
horizon = 90
lookback = 256
Dataset¶
Loading and visualization¶
The data set consists of only one time series, Github daily active users, that we want to model.
[3]:
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar
# Get data
df = pd.read_csv('data/raw/github_dau_2011-2018.csv')
# Remove NaNs
df = df.dropna()
# Convert date to datetime
df['date'] = pd.to_datetime(df.date)
# Create an age variable
df['age'] = df.index.astype('int')
# Create a day of week field
df['day'] = df.date.dt.dayofweek
# Create a month of year field
df['month'] = df.date.dt.month
# Create a boolean for US federal holidays
holidays = calendar().holidays(start=df.date.min(), end=df.date.max())
df['holiday'] = df['date'].isin(holidays).apply(int)
# Rearrange columns
df = df[
    [
        'date',
        'count',
        'age',
        'month',
        'day',
        'holiday'
    ]
]
# Create monthly dummies
tmp = pd.get_dummies(df.month)
tmp.columns = ['month' + str(value) for value in tmp.columns]
df = pd.concat([df, tmp], axis=1)
# Create daily dummies
tmp = pd.get_dummies(df.day)
tmp.columns = ['day' + str(value) for value in tmp.columns]
df = pd.concat([df, tmp], axis=1)
# Reset index
df = df.reset_index(drop=True)
# Drop unnecessary columns
df = df.drop(['month', 'day', 'age'], axis=1)
data = df.dropna()
# Plot the data to help our imaginations
data[['count']].plot(subplots=True, figsize=(15, 8))
plt.xlabel('Time in days since beginning')
plt.ylabel('DAU')
plt.show()

Divide into train and test¶
[4]:
# Calculate training & testing boundary
test_ind = data[data['date'] == dt.datetime(2017,6,4)].index[0]
data = data.set_index('date')
data_arr = data.values
data_arr = np.expand_dims(data_arr.T, 0) # Get array into right shape for chunking
# Sequentialize the training and testing dataset
data_train, data_test = [], []
for time_series in data_arr:
    data_train.append(time_series[:, :test_ind])
    data_test.append(time_series[:, test_ind-lookback:])
data_train = np.array(data_train)
data_test = np.array(data_test)
We follow Torchvision in processing examples using Transforms chained together by Compose:
- Tensorize creates a tensor of the example.
- LogTransform takes the natural logarithm of the targets after adding the offset (similar to torch.log1p).
- RemoveLast subtracts the final value of the lookback from both the lookback and the horizon.
- Target specifies which index in the array to forecast.
[5]:
transform = [
    {'Tensorize': None},
    {'LogTransform': {'targets': [0], 'offset': 1.0}},
    {'RemoveLast': {'targets': [0]}},
    {'Target': {'targets': [0]}}
]
TimeSeriesDataset inherits from Torch's Dataset for use with the Torch DataLoader. It handles the creation of the examples used to train the network, using lookback and horizon to partition the time series.
[6]:
data_train = TimeSeriesDataset(
    data_train,
    lookback,
    horizon,
    transform=transform
)
data_test = TimeSeriesDataset(
    data_test,
    lookback,
    horizon,
    step=horizon,
    transform=transform
)
# Create mini-batch data loader
dataloader_train = DataLoader(
    data_train,
    batch_size=32,
    shuffle=True,
    pin_memory=True,
    num_workers=1
)
dataloader_test = DataLoader(
    data_test,
    batch_size=3,
    shuffle=False
)
WaveNet¶
Our network architecture is based on ideas related to WaveNet (see references). We employ the same architecture with a few modifications (e.g., a fully connected output layer for vector forecasts). It turns out that we do not need many layers in this example to achieve state-of-the-art results, most likely because of the simple autoregressive nature of the data.
In many ways, a temporal convolutional architecture is among the simplest possible architectures that we could employ using neural networks. In our approach, every layer has the same number of convolutional filters and uses residual connections.
[7]:
# Define the model architecture
n_layers = 6
model = WaveNet(input_channels=21,
                output_channels=1,
                horizon=horizon,
                n_layers=n_layers)
# .. and the optimizer
optim = torch.optim.Adam(model.parameters(), lr=0.001)
# .. and the loss
loss = torch.distributions.Normal
print('Number of model parameters: {}.'.format(model.n_parameters))
print('Receptive field size: {}.'.format(model.receptive_field_size))
Number of model parameters: 116734.
Receptive field size: 64.
[8]:
# Fit the forecaster
forecaster = Forecaster(model, loss, optim, n_epochs=10, device=device)
forecaster.fit(dataloader_train, eval_model=True)
Number of model parameters being fitted: 116734.
Epoch 1/10 [1940/1940 (98%)] Loss: 0.043966 Elapsed/Remaining: 0m4s/0m32s
Training error: 7.59e+00.
Epoch 2/10 [1940/1940 (98%)] Loss: -0.156904 Elapsed/Remaining: 0m8s/0m32s
Training error: -2.01e+01.
Epoch 3/10 [1940/1940 (98%)] Loss: -0.342942 Elapsed/Remaining: 0m13s/0m30s
Training error: -3.69e+01.
Epoch 4/10 [1940/1940 (98%)] Loss: -0.732276 Elapsed/Remaining: 0m17s/0m26s
Training error: -4.65e+01.
Epoch 5/10 [1940/1940 (98%)] Loss: -0.605564 Elapsed/Remaining: 0m22s/0m22s
Training error: -4.94e+01.
Epoch 6/10 [1940/1940 (98%)] Loss: -0.755465 Elapsed/Remaining: 0m26s/0m17s
Training error: -5.77e+01.
Epoch 7/10 [1940/1940 (98%)] Loss: -0.558515 Elapsed/Remaining: 0m31s/0m13s
Training error: -6.27e+01.
Epoch 8/10 [1940/1940 (98%)] Loss: -0.746483 Elapsed/Remaining: 0m35s/0m9s
Training error: -6.75e+01.
Epoch 9/10 [1940/1940 (98%)] Loss: -0.865499 Elapsed/Remaining: 0m40s/0m4s
Training error: -7.02e+01.
Epoch 10/10 [1940/1940 (98%)] Loss: -0.876480 Elapsed/Remaining: 0m44s/0m0s
Training error: -7.20e+01.
[9]:
# Let's plot the training error
plt.plot(range(1, forecaster.n_epochs+1), forecaster.history['training'])
plt.xlabel('Epoch')
plt.ylabel('Training error')
plt.show()

Evaluation¶
Before any evaluation score can be calculated, we need to transform the output forecasts.
[10]:
# Get time series of actuals for the testing period
y_test = []
for example in dataloader_test:
    example = data_test.uncompose(example)
    y_test.append(example['y'])
y_test = np.concatenate(y_test)
y_test = np.reshape(y_test, y_test.shape[0] * y_test.shape[2])
# Get corresponding predictions
y_samples = forecaster.predict(dataloader_test, n_samples=100)
# The samples are of the shape (n_samples, n_series, n_covariates, n_timesteps), whereas the actuals are of the shape
# (n_series, n_covariates, n_timesteps)
shape_new = (y_samples.shape[0],   # number of samples
             y_samples.shape[2],   # number of targets
             y_samples.shape[1] *  # number of series
             y_samples.shape[3])   # number of timesteps
y_samples = np.reshape(y_samples, shape_new)
y_test_mean = np.mean(y_samples, axis=0)
y_test_lq = np.percentile(y_samples, q=5, axis=0)
y_test_uq = np.percentile(y_samples, q=95, axis=0)
We calculate the symmetric MAPE as well as the empirical coverage for the test set data.
[11]:
# Evaluate forecasts
test_smape = metrics.smape(y_samples, y_test)
test_perc = np.arange(1, 99)
test_cov = metrics.coverage(y_samples, y_test, percentiles=test_perc)
print('SMAPE: {}%'.format(test_smape))
plt.figure(figsize=(6, 6))
plt.plot(test_perc, test_perc)
plt.plot(test_perc, test_cov)
plt.xlabel('actual percentile')
plt.ylabel('model percentile')
plt.title('Coverage')
plt.show()
SMAPE: 7.026%

Let’s have a closer look at what a forecast looks like. We can use the model output to graph the mean, 95th and 5th percentiles. We’ll also graph the actuals for comparison.
[12]:
# Plot the test set actuals and the predictions for the DAU data
plt.figure(figsize=(16, 4))
plt.plot(y_test, 'k-')
plt.plot(y_test_mean.T, 'g-')
plt.plot(y_test_uq.T, 'g--', alpha=0.25)
plt.plot(y_test_lq.T, 'g--', alpha=0.25)
plt.ylim([0, 0.4e6])
plt.xlim([0, y_test.shape[-1]])
plt.ylabel('DAU')
plt.show()

Dataset¶
Inherits from PyTorch's Dataset to allow use with the PyTorch DataLoader.
class time_series_dataset.TimeSeriesDataset(time_series: list, lookback: int, horizon: int, step=1, transform=None, static_covs=None)¶
Takes a list of time series and provides access to windowed subseries for training (see the sketch after the parameter list).
Parameters:
- time_series – List of time series arrays.
- lookback – Length of time window used as input for forecasting.
- horizon – Number of time steps to forecast.
- step – Time step size between consecutive examples.
- transform – List of transformation dictionaries applied to each example (see Transformations).
- static_covs – Static covariates for each time series.
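To make the windowing concrete, the sketch below shows, in plain NumPy, how a single series can be cut into (lookback, horizon) pairs with a given step. This is a conceptual illustration only, not the library's implementation, and the helper name window_series is hypothetical.

import numpy as np

def window_series(series, lookback, horizon, step=1):
    """Conceptual sketch: cut a (n_covariates, n_timesteps) array into
    (lookback, horizon) example pairs, advancing `step` time steps each time."""
    examples = []
    n_timesteps = series.shape[-1]
    for start in range(0, n_timesteps - lookback - horizon + 1, step):
        X = series[:, start:start + lookback]                        # network input
        y = series[:, start + lookback:start + lookback + horizon]   # forecast target
        examples.append((X, y))
    return examples

# Toy series: one covariate, 20 time steps
toy = np.arange(20).reshape(1, -1)
pairs = window_series(toy, lookback=5, horizon=3, step=3)
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)  # 5 (1, 5) (1, 3)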
Transformations¶
Transformations of the time series, intended to be used in a similar fashion to torchvision transforms. The user specifies a list of transformation dictionaries, applied in order.
Example:
>>> transform = [
>>> {'Tensorize': None},
>>> {'LogTransform': {'targets': [0], 'offset': 1.0}},
>>> {'RemoveLast': {'targets': [0]}},
>>> {'Target': {'targets': [0]}}
>>> ]
class transforms.LogTransform(offset=0.0, targets=None)¶
Natural logarithm of the target covariates plus an offset:
\[y_i = \log_e ( x_i + \mathrm{offset} )\]
class transforms.RemoveLast(targets=None)¶
Subtract the last time series point from the time series.
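To see the effect of the two transforms above on a toy target series, here is a small NumPy sketch (a conceptual illustration, not the library code), following the order used in the tutorial: LogTransform with offset 1.0, then RemoveLast.

import numpy as np

# Toy target series split into lookback and horizon windows
lookback_vals = np.array([10., 20., 40.])
horizon_vals = np.array([80., 160.])

# LogTransform with offset=1.0: y = log(x + 1), similar to np.log1p
lb = np.log(lookback_vals + 1.0)
hz = np.log(horizon_vals + 1.0)

# RemoveLast: subtract the last value of the (transformed) lookback window
# from both the lookback and the horizon
last = lb[-1]
lb, hz = lb - last, hz - last
print(lb[-1])  # 0.0 -- the series now ends at zero right before the horizon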
class transforms.Target(targets)¶
Retain only target covariates for output.
class transforms.Tensorize(device='cpu')¶
Convert ndarrays to Tensors.
Models¶
WaveNet¶
Implements WaveNet architecture for time series forecasting.
class models.WaveNet(input_channels: int, output_channels: int, horizon: int, hidden_channels=64, skip_channels=64, dense_units=128, n_layers=7, n_blocks=1, dilation=2)¶
Parameters:
- input_channels – Number of covariates in the input time series.
- output_channels – Number of covariates in target time series.
- horizon – Number of time steps for forecast.
- hidden_channels – Number of channels in convolutional hidden layers.
- skip_channels – Number of channels in convolutional layers for skip connections.
- dense_units – Number of hidden units in final dense layer.
- n_layers – Number of layers per Wavenet block (determines receptive field size).
- n_blocks – Number of Wavenet blocks.
- dilation – Dilation factor for temporal convolution.
decode(inputs)¶
Decoder part of the architecture.
encode(inputs)¶
Encoder part of the architecture.
forward(inputs)¶
Returns the parameters for a Gaussian distribution.
n_parameters¶
Return the number of parameters of the model.
receptive_field_size¶
Return the length of the receptive field.
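The relationship between n_layers and the receptive field can be illustrated with a small sketch. This is an assumption on my part that happens to match the tutorial output (n_layers=6, dilation=2, n_blocks=1 gives a receptive field of 64); the library's exact computation may differ.

# Illustrative only: dilated convolutions with exponentially growing dilation
# give a receptive field of roughly dilation**n_layers per block.
def approx_receptive_field(n_layers, dilation=2, n_blocks=1):
    return n_blocks * dilation ** n_layers

print(approx_receptive_field(n_layers=6))  # 64, as reported in the tutorial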
Forecasters¶
class forecasters.Forecaster(model, loss, optimizer, n_epochs=1, device='cpu', checkpoint_path='./', verbose=True, nan_budget=5)¶
Handles training of a PyTorch model. Can be used to generate samples from the approximate posterior predictive distribution.
Parameters:
- model – Deep4Cast neural network from the models module.
- loss – PyTorch distribution.
- optimizer – PyTorch optimizer.
- n_epochs – Number of training epochs.
- device – Device used for training (cpu or cuda).
- checkpoint_path – Path for writing model checkpoints.
- verbose – Switch to toggle verbosity of the forecaster during fitting.
- nan_budget – Number of times the forecaster will try to continue training a batch when a NaN loss is encountered.
embed(dataloader, n_samples=100)¶
Generate embedding vectors.
fit(dataloader_train, dataloader_val=None, eval_model=False)¶
Fit model to data.
predict(dataloader, n_samples=100)¶
Generate predictions.
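For reference, a minimal usage sketch based on the signatures above and on the tutorial earlier in this page; it assumes that model, loss, optim, device, dataloader_train and dataloader_test have been defined as in the tutorial.

# Train, then draw samples from the approximate posterior predictive distribution
forecaster = Forecaster(model, loss, optim, n_epochs=10, device=device)
forecaster.fit(dataloader_train, eval_model=True)              # train the network
samples = forecaster.predict(dataloader_test, n_samples=100)   # posterior predictive samples
embeddings = forecaster.embed(dataloader_test, n_samples=100)  # embedding vectors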
Metrics¶
Common evaluation metrics for time series forecasts.
metrics.corr(data_samples, data_truth, agg=None, **kwargs)¶
Returns the Pearson correlation coefficient between observed values and aggregated predictions.
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
metrics.coverage(data_samples, data_truth, percentiles=None, **kwargs)¶
Computes coverage rates of the prediction interval.
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- percentiles – Percentiles to compute coverage for.
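As an illustration of the idea (a sketch, not the library's implementation), empirical coverage can be estimated by checking, for each nominal percentile, what fraction of the actuals falls at or below the corresponding predicted percentile:

import numpy as np

def empirical_coverage(samples, truth, percentiles):
    """Sketch: samples has shape (n_samples, n_timesteps), truth (n_timesteps,)."""
    cov = []
    for p in percentiles:
        q = np.percentile(samples, q=p, axis=0)   # predicted p-th percentile per time step
        cov.append(100.0 * np.mean(truth <= q))   # share of actuals at or below it
    return np.array(cov)

A well-calibrated forecaster produces coverage close to the nominal percentiles, which is why the tutorial's coverage plot compares the curve against the diagonal.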
metrics.mae(data_samples, data_truth, agg=None, **kwargs)¶
Computes the mean absolute error (MAE).
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
metrics.mape(data_samples, data_truth, agg=None, **kwargs)¶
Computes the mean absolute percentage error (MAPE).
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
metrics.mase(data_samples, data_truth, data_insample, frequencies, agg=None, **kwargs)¶
Computes the mean absolute scaled error (MASE).
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- data_insample – In-sample time series values.
- frequencies – Frequencies used for the seasonal naive forecast.
- agg – Property of the forecast distribution to use for evaluation.
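For orientation, the textbook definition of MASE scales the forecast MAE by the in-sample MAE of a seasonal naive forecast. A minimal single-frequency sketch follows; it is an illustration of that textbook definition, not the library's implementation, which operates on forecast samples and accepts multiple frequencies.

import numpy as np

def mase_textbook(forecast, truth, insample, m=1):
    """Textbook MASE with seasonal period m: MAE of the forecast divided by
    the in-sample MAE of the m-step seasonal naive forecast."""
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(forecast - truth)) / scale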
metrics.mse(data_samples, data_truth, agg=None, **kwargs)¶
Computes the mean squared error (MSE).
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
metrics.pinball_loss(data_samples, data_truth, percentiles=None, **kwargs)¶
Computes the pinball loss of a quantile \(\tau\) given the actual value \(y\) and the predicted quantile \(z\):
\[L_{\tau}(y, z) = \begin{cases} (y - z)\,\tau & \mbox{if } y \geq z \\ (z - y)(1 - \tau) & \mbox{if } z > y \end{cases}\]
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- percentiles – Percentiles at which to compute the pinball loss.
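A minimal NumPy sketch of the formula above for a single quantile \(\tau\) (illustrative; the library's function operates on forecast samples and a list of percentiles):

import numpy as np

def pinball(y, z, tau):
    """Pinball loss for actuals y and predicted tau-quantiles z (arrays of equal shape)."""
    return np.mean(np.where(y >= z, (y - z) * tau, (z - y) * (1.0 - tau)))

y = np.array([10.0, 12.0, 9.0])   # actuals
z = np.array([11.0, 10.0, 9.5])   # predicted 90th percentiles
print(pinball(y, z, tau=0.9))     # 0.65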
metrics.rmse(data_samples, data_truth, agg=None, **kwargs)¶
Computes the root mean squared error (RMSE).
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
metrics.rse(data_samples, data_truth, agg=None, **kwargs)¶
Computes the root relative squared error (RSE).
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
metrics.smape(data_samples, data_truth, agg=None, **kwargs)¶
Computes the symmetric mean absolute percentage error (SMAPE) on the mean.
Parameters:
- data_samples – Predicted time series values (n_timesteps, n_timeseries).
- data_truth – Actual values observed.
- agg – Property of the forecast distribution to use for evaluation.
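For reference, one common way to compute SMAPE on the sample mean is sketched below; this is illustrative only, and the library's exact definition and scaling may differ.

import numpy as np

def smape_on_mean(samples, truth):
    """Sketch: symmetric MAPE of the mean forecast, in percent.
    samples: (n_samples, n_timesteps), truth: (n_timesteps,)."""
    forecast = samples.mean(axis=0)                        # point forecast from samples
    denom = (np.abs(truth) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(forecast - truth) / denom)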