A comprehensive introduction to how ML can add value to the design and execution of algorithmic trading strategies
View the Project on GitHub stefan-jansen/machine-learning-for-trading
In this chapter, we will build dynamic linear models to explicitly represent time and include variables observed at specific intervals or lags. A key characteristic of time-series data is their sequential order: rather than random samples of individual observations as in the case of cross-sectional data, our data are a single realization of a stochastic process that we cannot repeat.
Our goal is to identify systematic patterns in time series that help us predict how the time series will behave in the future. More specifically, we focus on models that extract signals from a historical sequence of the output and, optionally, other contemporaneous or lagged input variables to predict future values of the output. For example, we might try to predict future returns for a stock using past returns, combined with historical returns of a benchmark or macroeconomic variables. We focus on linear time-series models before turning to nonlinear models like recurrent or convolutional neural networks in Part 4.
Time-series models are very popular given the time dimension inherent to trading. Key applications include the prediction of asset returns and volatility, as well as the identification of co-movements of asset price series. Time-series data are likely to become more prevalent as an ever-broader array of connected devices collects regular measurements with potential signal content.
We first introduce tools to diagnose time-series characteristics and to extract features that capture potential patterns. Then we introduce univariate and multivariate time-series models and apply them to forecast macro data and volatility patterns. We conclude with the concept of cointegration and how to apply it to develop a pairs trading strategy.
Most of the examples in this section use data provided by the Federal Reserve that you can access using the pandas datareader that we introduced in Chapter 2, Market and Fundamental Data.
Time series data typically contains a mix of various patterns that can be decomposed into several components, each representing an underlying pattern category. In particular, time series often consist of the systematic components trend, seasonality and cycles, and unsystematic noise. These components can be combined in an additive, linear model, in particular when fluctuations do not depend on the level of the series, or in a non-linear, multiplicative model.
pandas
Time Series and Date functionality docsThe pandas library includes very flexible functionality to define various window types, including rolling, exponentially weighted and expanding windows.
pandas
window function docsAutocorrelation (also called serial correlation) adapts the concept of correlation to the time series context: just as the correlation coefficient measures the strength of a linear relationship between two variables, the autocorrelation coefficient measures the extent of a linear relationship between time series values separated by a given lag.
We present the following tools to measure autocorrelation:
The statistical properties, such as the mean, variance, or autocorrelation, of a stationary time series are independent of the period, that is, they don’t change over time. Hence, stationarity implies that a time series does not have a trend or seasonal effects and that descriptive statistics, such as the mean or the standard deviation, when computed for different rolling windows, are constant or do not change much over time.
To satisfy the stationarity assumption of linear time series models, we need to transform the original time series, often in several steps. Common transformations include the application of the (natural) logarithm to convert an exponential growth pattern into a linear trend and stabilize the variance, or differencing.
Unit roots pose a particular problem for determining the transformation that will render a time series stationary. In practice, time series of interest rates or asset prices are often not stationary, for example, because there does not exist a price level to which the series reverts. The most prominent example of a non-stationary series is the random walk.
The defining characteristic of a unit-root non-stationary series is long memory: since current values are the sum of past disturbances, large innovations persist for much longer than for a mean-reverting, stationary series. Identifying the correct transformation, and in particular, the appropriate number and lags for differencing is not always clear-cut. We present a few heuristics to guide the process.
Statistical unit root tests are a common way to determine objectively whether (additional) differencing is necessary. These are statistical hypothesis tests of stationarity that are designed to determine whether differencing is required.
statsmodels
Time Series Analysis docsUnivariate time series models relate the value of the time series at the point in time of interest to a linear combination of lagged values of the series and possibly past disturbance terms.
While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data. ARIMA(p, d, q) models require stationarity and leverage two building blocks:
An AR model of order p aims to capture the linear dependence between time series values at different lags. It closely resembles a multiple linear regression on lagged values of the outcome.
An MA model of order q uses q past disturbances rather than lagged values of the time series in a regression-like model. Since we do not observe the white-noise disturbance values, MA(q) is not a regression model like the ones we have seen so far. Rather than using least squares, MA(q) models are estimated using maximum likelihood (MLE).
Autoregressive integrated moving-average ARIMA(p, d, q) models combine AR(p) and MA(q) processes to leverage the complementarity of these building blocks and simplify model development by using a more compact form and reducing the number of parameters, in turn reducing the risk of overfitting.
We will build a SARIMAX model for monthly data on an industrial production time series for the 1988-2017 period. See notebook arima_models for implementation details.
A particularly important area of application for univariate time series models is the prediction of volatility. The volatility of financial time series is usually not constant over time but changes, with bouts of volatility clustering together. Changes in variance create challenges for time series forecasting using the classical ARIMA models.
The development of a volatility model for an asset-return series consists of four steps:
The notebook arch_garch_models demonstrates the usage of the ARCH library to estimate time series models for volatility forecasting with NASDAQ data.
Multivariate time series models are designed to capture the dynamic of multiple time series simultaneously and leverage dependencies across these series for more reliable predictions.
Univariate time-series models like the ARMA approach are limited to statistical relationships between a target variable and its lagged values or lagged disturbances and exogenous series in the ARMAX case. In contrast, multivariate time-series models also allow for lagged values of other time series to affect the target. This effect applies to all series, resulting in complex interactions.
In addition to potentially better forecasting, multivariate time series are also used to gain insights into cross-series dependencies. For example, in economics, multivariate time series are used to understand how policy changes to one variable, such as an interest rate, may affect other variables over different horizons.
The vector autoregressive VAR(p) model extends the AR(p) model to k series by creating a system of k equations where each contains p lagged values of all k series.
VAR(p) models also require stationarity, so that the initial steps from univariate time-series modeling carry over. First, explore the series and determine the necessary transformations, and then apply the augmented Dickey-Fuller test to verify that the stationarity criterion is met for each series and apply further transformations otherwise. It can be estimated with OLS conditional on initial information or with MLE, which is equivalent for normally distributed errors but not otherwise.
If some or all of the k series are unit-root non-stationary, they may be cointegrated (see next section). This extension of the unit root concept to multiple time series means that a linear combination of two or more series is stationary and, hence, mean-reverting.
The notebook vector_autoregressive_model demonstrates how to use statsmodels
to estimate a VAR model for macro fundamentals time series.
statsmodels
Vector Autoregression docsThe concept of an integrated multivariate series is complicated by the fact that all the component series of the process may be individually integrated but the process is not jointly integrated in the sense that one or more linear combinations of the series exist that produce a new stationary series.
In other words, a combination of two co-integrated series has a stable mean to which this linear combination reverts. A multivariate series with this characteristic is said to be co-integrated. This also applies when the individual series are integrated of a higher order and the linear combination reduces the overall order of integration.
We demonstrate two major approaches to testing for cointegration:
Statistical arbitrage refers to strategies that employ some statistical model or method to take advantage of what appears to be relative mispricing of assets while maintaining a level of market neutrality.
Pairs trading is a conceptually straightforward strategy that has been employed by algorithmic traders since at least the mid-eighties (Gatev, Goetzmann, and Rouwenhorst 2006). The goal is to find two assets whose prices have historically moved together, track the spread (the difference between their prices), and, once the spread widens, buy the loser that has dropped below the common trend and short the winner. If the relationship persists, the long and/or the short leg will deliver profits as prices converge and the positions are closed.
This approach extends to a multivariate context by forming baskets from multiple securities and trade one asset against a basket of two baskets against each other.
In practice, the strategy requires two steps:
Several approaches to the formation and trading phases have emerged from increasingly active research in this area across multiple asset classes over the last several years. The next subsection outlines the key differences before we dive into an example application.
A recent comprehensive survey of pairs trading strategies Statistical Arbitrage Pairs Trading Strategies: Review and Outlook, Krauss (2017) identifies four different methodologies plus a number of other more recent approaches, including ML-based forecasts:
The distance approach identifies pairs using the correlation of (normalized) asset prices or their returns and is simple and orders of magnitude less computationally intensive than cointegration tests.
The speed advantage is particularly valuable because the number of potential pairs is the product of the number of candidates to be considered on either side so that evaluating combinations of 100 stocks and 100 ETFs requires comparing 10,000 tests (we’ll discuss the challenge of multiple testing bias below).
On the other hand, distance metrics do not necessarily select the most profitable pairs: correlation is maximized for perfect co-movement that in turn eliminates actual trading opportunities. Empirical studies confirm that the volatility of the price spread of cointegrated pairs is almost twice as high as the volatility of the price spread of distance pairs (Huck and Afawubo 2015).
To balance the tradeoff between computational cost and the quality of the resulting pairs, Krauss (2017) recommends a procedure that combines both approaches based on his literature review:
This process aims to select cointegrated pairs with lower divergence risk while ensuring more volatile spreads that in turn generate higher profit opportunities.
A large number of tests introduce data snooping bias as discussed in Chapter 6, The Machine Learning Workflow: multiple testing is likely to increase the number of false positives that mistakenly reject the null hypothesis of no cointegration. While statistical significance may not be necessary for profitable trading (Chan 2008), a study of commodity pairs (Cummins and Bucca 2012) shows that controlling the familywise error rate to improve the tests’ power according to Romano and Wolf (2010) can lead to better performance.
The securities represent the largest average dollar volume over the sample period in their respective class; highly correlated and stationary assets have been removed. See the notebook create_datasets in the data folder of the GitHub repository for downloading for instructions on how to obtain the data and the notebook cointegration_tests for the relevant code and additional preprocessing and exploratory details.
The notebook statistical_arbitrage_with_cointegrated_pairs implements a statistical arbitrage strategy based on cointegration for the sample of stocks and ETFs and the 2017-2019 period.
It first generates and stores the cointegration tests for all candidate pairs and the resulting trading signals before we backtest a strategy based on these signals given the computational intensity of the process.