This post is a part of our series exploring different options for long-term demand forecasting. To better understand our journey, you might want to check out our introductory blog post: Long-Term Demand Forecasting

Today, we will cover another popular approach to forecasting: Recurrent Neural Networks (RNNs), in particular Long Short-Term Memory (LSTM) networks. We initially believed that the dataset was too small for this kind of machine learning model, but our first experiments produced quite promising results.

Basic LSTMs

To test whether such a network works at all, we implemented the most basic LSTM regression model possible. Imports aside, it consists of four lines of Keras code:

from keras.models import Sequential
from keras.layers import LSTM, Dense

basic_model = Sequential()
# input_shape is (timesteps, features), taken from the training array
basic_model.add(LSTM(500, input_shape=(X_train_vals.shape[1], X_train_vals.shape[2])))
basic_model.add(Dense(500))
basic_model.compile(loss='mean_absolute_error', optimizer='adam')

RNNs should not require any complex features for this task, so we used a simple vector of sales from the previous day as input and trained the network to predict sales on the following day. We then fed each prediction back into the network to predict further days, which is the same approach we used for the step-by-step regression models.
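To make the feedback loop concrete, here is a minimal sketch of how next-day training pairs and a recursive forecast could be set up. It is not our actual pipeline: `make_supervised`, `recursive_forecast` and the toy `toy_predict` function are hypothetical names, and the stand-in "model" simply adds 1 to every series so the result is easy to follow.

```python
import numpy as np

def make_supervised(sales):
    """Turn a (n_days, n_series) array into next-day training pairs.

    X has shape (n_days - 1, 1, n_series): one timestep per sample,
    matching the (timesteps, features) input_shape of an LSTM layer.
    y has shape (n_days - 1, n_series): the sales on the following day.
    """
    sales = np.asarray(sales, dtype=float)
    X = sales[:-1, np.newaxis, :]
    y = sales[1:, :]
    return X, y

def recursive_forecast(predict_fn, last_day, horizon):
    """Predict `horizon` days by feeding each prediction back as input.

    predict_fn maps an array of shape (1, 1, n_series) to (1, n_series),
    for example the predict method of a trained model (an assumption here).
    """
    current = np.asarray(last_day, dtype=float)
    forecasts = []
    for _ in range(horizon):
        next_day = predict_fn(current[np.newaxis, np.newaxis, :])[0]
        forecasts.append(next_day)
        current = next_day  # the prediction becomes the next input
    return np.stack(forecasts)

# Toy data: 4 days of sales for 3 series.
toy_sales = np.arange(12, dtype=float).reshape(4, 3)
X, y = make_supervised(toy_sales)

# Stand-in for a trained model: adds 1 to every series.
toy_predict = lambda batch: batch[:, 0, :] + 1.0
forecast = recursive_forecast(toy_predict, toy_sales[-1], horizon=3)
```

One consequence of this design is that errors can accumulate: every forecast beyond the first day is based on predicted rather than observed sales.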

After seeing that this model actually works, we added more features: one-hot encoded days of the week and months of the year, to make sure the network takes seasonality into account. The results were quite good: we achieved a mean SMAPE of 12.40 on our validation set and 14.58 on Kaggle, which was one of the top scores we had so far. This led us to explore more complex RNN architectures.
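As an illustration of the calendar features, the sketch below one-hot encodes the day of the week and the month for a range of dates and appends them to a sales array. The function name `calendar_features`, the start date and the random sales data are all made up for the example; whether the features should describe the input day or the predicted day is a modelling choice that this sketch leaves open.

```python
from datetime import date, timedelta
import numpy as np

def calendar_features(start, n_days):
    """One-hot encode day of week (7 columns) and month (12 columns)."""
    features = np.zeros((n_days, 7 + 12))
    for i in range(n_days):
        day = start + timedelta(days=i)
        features[i, day.weekday()] = 1.0      # Monday = 0 ... Sunday = 6
        features[i, 7 + day.month - 1] = 1.0  # January = column 7
    return features

# Hypothetical data: 10 days of sales for 5 series.
sales = np.random.default_rng(0).poisson(20, size=(10, 5)).astype(float)
cal = calendar_features(date(2017, 1, 1), n_days=10)

# Each row now holds the sales vector followed by 19 calendar columns.
inputs = np.hstack([sales, cal])
```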

Combining LSTMs and CNNs

One of the first ideas that came to our attention was using Convolutional Recurrent Neural Networks (CRNNs). We found a paper describing such an architecture in enough detail to be easily reproduced. Although the authors of the paper proposed the network for time series classification, not forecasting, we decided to try adapting it to our case. Adding the convolutional layers should allow the network to learn about relations between sales values, so we expected it to outperform basic LSTMs and other models that predict sales of every item in every store independently.
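To show why a convolutional layer can pick up relations between series, here is a small NumPy sketch of a 1D convolution over time in which every filter mixes all input series (as in most deep learning libraries, this is technically a cross-correlation). It is only an illustration of the idea, not the architecture from the paper; `conv1d_valid`, the window length and the random weights are assumptions made for the example.

```python
import numpy as np

def conv1d_valid(x, kernels):
    """1D convolution over time with channel mixing (no padding).

    x:       (n_days, n_series) input window
    kernels: (kernel_size, n_series, n_filters) weights
    returns: (n_days - kernel_size + 1, n_filters) feature maps
    """
    kernel_size, _, n_filters = kernels.shape
    n_out = x.shape[0] - kernel_size + 1
    out = np.empty((n_out, n_filters))
    for t in range(n_out):
        window = x[t:t + kernel_size]  # (kernel_size, n_series)
        # Sum over both the time offsets and the series for each filter.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(42)
x = rng.normal(size=(7, 4))           # 7 days, 4 series
kernels = rng.normal(size=(3, 4, 2))  # kernel size 3, 2 filters

base = conv1d_valid(x, kernels)

# Changing a single series changes the feature maps, because each
# filter looks at all series at once.
x_changed = x.copy()
x_changed[:, 2] += 1.0
changed = conv1d_valid(x_changed, kernels)
```

A recurrent layer placed after such a convolution receives features that already combine several series and several days, rather than raw per-series values.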

After experimenting with a bunch of different network configurations, all based on the paper above, we found one that got really good results on our validation set (mean SMAPE 12.94) and decided to upload it, even though the validation score was slightly worse than that of the first LSTM model. It scored 14.02 on Kaggle, which was an improvement: it seems to generalize better to new data and to be less prone to overfitting. With a promising network architecture in hand, we tried tuning its hyperparameters and ended up with 14.01 on Kaggle.
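For reference, SMAPE (symmetric mean absolute percentage error) can be computed as in the sketch below. Several variants of the metric exist; this one assumes the version that ranges from 0 to 200 and treats a term as 0 when both the actual and the forecast value are 0, which may differ in detail from the exact scoring used on Kaggle.

```python
import numpy as np

def smape(actual, forecast):
    """SMAPE in percent, on a 0 to 200 scale (assumed variant)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    diff = np.abs(forecast - actual)
    denom = np.abs(actual) + np.abs(forecast)
    # Define the term as 0 when both values are 0 to avoid 0/0.
    ratio = np.divide(2.0 * diff, denom,
                      out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()

def mean_smape(actual, forecast):
    """Average of per-series SMAPE for (n_days, n_series) arrays."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    scores = [smape(actual[:, j], forecast[:, j])
              for j in range(actual.shape[1])]
    return float(np.mean(scores))

perfect = smape([100, 200], [100, 200])
worst = smape([100], [0])
ten_percent_over = smape([100], [110])
```

Because the denominator contains both the actual and the forecast value, an over-forecast and an under-forecast of the same absolute size do not receive the same penalty.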

[Figure: Complex model architecture (LSTM with convolutional layers)]

We experimented with various other RNN architectures, but couldn’t produce a better score. In particular, we tried stacking two LSTM layers, using only convolutional layers, and placing convolutional layers before the LSTMs. Given what we managed to achieve with these models, we suspect that we are close to the best accuracy they can reach on this dataset.

[Figure: SMAPE score distribution on 500 time series]

The code for both models is available on our GitHub.

Pros and cons of RNNs

In our experiments, RNNs were the most powerful tools we tried for time series prediction, and Keras makes them really easy to implement and train. Nonetheless, it is important to know their limitations before using them in production. Most importantly, neural networks are not interpretable: we have little to no idea how their predictions are made. If the results need to be interpretable, it is probably best to stick with the econometric models that we talked about at the very beginning of this series. For Kaggle contests, however, where only the score matters, deep neural networks were clearly the best choice for us.

This is the end of our short series about forecasting demand. We invite you to keep following our blog, as there are more posts about machine learning coming soon. In the meantime, feel free to check out our code on GitHub.