A comprehensive introduction to how ML can add value to the design and execution of algorithmic trading strategies
View the Project on GitHub stefan-jansen/machine-learning-for-trading
Algorithmic trading strategies are driven by signals that indicate when to buy or sell assets to generate superior returns relative to a benchmark such as an index. The portion of an asset’s return that is not explained by exposure to this benchmark is called alpha, and hence the signals that aim to produce such uncorrelated returns are also called alpha factors.
If you are already familiar with ML, you may know that feature engineering is a key ingredient for successful predictions. This is no different in trading. Investment, however, is particularly rich in decades of research into how markets work and which features may work better than others to explain or predict price movements as a result. This chapter provides an overview as a starting point for your own search for alpha factors.
This chapter also presents key tools that facilitate the computing and testing alpha factors. We will highlight how the NumPy, pandas and TA-Lib libraries facilitate the manipulation of data and present popular smoothing techniques like the wavelets and the Kalman filter that help reduce noise in data.
We also preview how you can use the trading simulator Zipline to evaluate the predictive performance of (traditional) alpha factors. We discuss key alpha factor metrics like the information coefficient and factor turnover. An in-depth introduction to backtesting trading strategies that use machine learning follows in Chapter 6, which covers the ML4T workflow that we will use throughout the book to evaluate trading strategies.
Please see the Appendix - Alpha Factor Library for additional material on this topic, including numerous code examples that compute a broad range of alpha factors.
Zipline
Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.
There are also factors that explain price movements based on the economics or institutional setting of financial markets, or investor behavior, including known biases of this behavior. The economic theory behind factors can be rational, where the factors have high returns over the long run to compensate for their low returns during bad times, or behavioral, where factor risk premiums result from the possibly biased, or not entirely rational behavior of agents that is not arbitraged away.
In an idealized world, categories of risk factors should be independent of each other (orthogonal), yield positive risk premia, and form a complete set that spans all dimensions of risk and explains the systematic risks for assets in a given class. In practice, these requirements will hold only approximately.
Based on a conceptual understanding of key factor categories, their rationale and popular metrics, a key task is to identify new factors that may better capture the risks embodied by the return drivers laid out previously, or to find new ones. In either case, it will be important to compare the performance of innovative factors to that of known factors to identify incremental signal gains.
The notebook feature_engineering.ipynb in the data directory illustrates how to engineer basic factors.
The notebook how_to_use_talib illustrates the usage of TA-Lib, which includes a broad range of common technical indicators. These indicators have in common that they only use market data, i.e., price and volume information.
The notebook common_alpha_factors in th appendix contains dozens of additional examples.
The notebook kalman_filter_and_wavelets demonstrates the use of the Kalman filter using the PyKalman
package for smoothing; we will also use it in Chapter 9 when we develop a pairs trading strategy.
The notebook kalman_filter_and_wavelets also demonstrates how to work with wavelets using the PyWavelets
package.
Zipline
The open source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm-development and live-trading. It automates the algorithm’s reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.
installation
folder, including to address know issues.The notebook single_factor_zipline develops and test a simple mean-reversion factor that measures how much recent performance has deviated from the historical average. Short-term reversal is a common strategy that takes advantage of the weakly predictive pattern that stock price increases are likely to mean-revert back down over horizons from less than a minute to one month.
The Quantopian research environment is tailored to the rapid testing of predictive alpha factors. The process is very similar because it builds on zipline
, but offers much richer access to data sources.
The notebook multiple_factors_quantopian_research illustrates how to compute alpha factors not only from market data as previously but also from fundamental and alternative data.
The notebook performance_eval_alphalens introduces the alphalens library for the performance analysis of predictive (alpha) factors, open-sourced by Quantopian. It demonstrates how it integrates with the backtesting library zipline
and the portfolio performance and risk analysis library pyfolio
that we will explore in the next chapter.
alphalens
facilitates the analysis of the predictive power of alpha factors concerning the:
The analysis can be conducted using tearsheets
or individual computations and plots. The tearsheets are illustrated in the online repo to save some space.
alphalens
tutorial by Quantopian