Seyeon An April 30, 2021

<aside> 🔗 We have reposted this blog on our Medium publication. Read this on Medium.

</aside>

Ranging from the fluctuating stock price to the complex traffic situation, most of the data we see daily are represented as time series. Such time series data have temporal patterns, and these patterns can be both clear and ambiguous. Time series forecasting is thus a common problem which is actively studied in machine learning and data mining.

Most time series data are multivariate. This refers to time series variables that have correlations to each other. We utilize the relationships between these variables to perform the jobs above, as predicting the traffic of a road, predicting the electricity consumption of a city, and predicting the price of a stock. In other words, we can hugely improve the accuracy of time series forecasting using the relationship in between different regions, different cities, and different stocks.

Thus, the core problem of the paper is to improve the multivariate time series forecasting model, which handles the problem below:

<aside> 💡 Given multivariate time series $X$—with the number of variables $d$ and the number of recent observations $w$—and prediction horizon $h$, predict the observation $y$ after $h$ time steps.

</aside>

Multivariate Time Series Forecasting

There currently exist a number of models for multivariate time series forecasting. Yet, existing models require too many parameters, since it considers the patterns in each variable and the relationships between different variables in a single step. Since there are many variables to predict with insufficient training data in the case of time series data, overfitting is very common, which lowers prediction model efficiency and makes hyperparameter tuning very hard. The trend of time series often changes over time, which forces complex models to fail at future predictions.

There have been a few past attempts for building such model. For example, the LSTNet (Lai et al., SIGIR 2018) model showed great accuracy using the RNN (recurrent neural networks) structure, the convolution kernels, and the temporal attention on time scale, but showed low efficiency because it required many parameters. Thus, the goal of this work is to build an accurate and efficient forecasting model, which improves both the accuracy and the efficiency of the existing forecasting models.

Attention-Based Autoregression (AttnAR)

This paper proposes an AttnAR (Attention-Based Autoregression) model which shows great performance using simple attention operations. The structure of the AttnAR model is illustrated in the figure below, and the main ideas of the model can be illustrated as below:

Separable Modules : Let's say we want to predict the stock price of BTC, ETH, an XRP— using the multivariate time series forecasting model. We don't want our model to lose its efficiency and speed due to its heavy volume, when being punctual is more important than anything else. That's why we've separate these modules, to make it easy to tune its capacity based on the property of each dataset, with separable modules that aim at a sepcific objective.
- The extractor module captures univariate patterns, and transforms them to pattern vectors.
- The attention module correlates multiple variables.
- The predictor module simply produces the final prediction given the pattern vectors.

Separable Modules in AttnAR

Mixed Convolution Extractor : The mixed convolution extractor model captures
Shallow dense layers connect distance time steps and shows low degree of abstraction, while deep convolution layers focus on adjacent time steps and shows high degree of abstraction. Amongst the two of them, which extractor model should be used for fast and accurate prediction?
The AttnAR model suggests the MCE (Mixed Convolution Extractor) model to capture complex variable-wise patterns, which consists of both deep convolutions and shallow fully-connected layers.

Mixed Convolution Extractor in AttnAR

Time-invariant Attention : Attention is a known technique to compute a weighted average of values given in a query.