Machine Learning for Trading

A comprehensive introduction to how ML can add value to the design and execution of algorithmic trading strategies

View the Project on GitHub stefan-jansen/machine-learning-for-trading

Appendix - Alpha Factor Library

Throughout this book, we emphasized how the smart design of features, including appropriate preprocessing and denoising, typically leads to an effective strategy. This appendix synthesizes some of the lessons learned on feature engineering and provides additional information on this vital topic.

Chapter 4 categorized factors by the underlying risk they represent and for which an investor would earn a reward above and beyond the market return. These categories include value vs growth, quality, and sentiment, as well as volatility, momentum, and liquidity. Throughout the book, we used numerous metrics to capture these risk factors. This appendix expands on those examples and collects popular indicators so you can use it as a reference or inspiration for your own strategy development. It also shows you how to compute them and includes some steps to evaluate these indicators.

To this end, we focus on the broad range of indicators implemented by TA-Lib (see Chapter 4) and WorldQuant’s 101 Formulaic Alphas paper (Kakushadze 2016), which presents real-life quantitative trading factors used in production with an average holding period of 0.6-6.4 days.

This appendix covers:

  1. The Indicator Zoo
  2. Code example: common alpha factors implemented in TA-Lib
  3. Code example: WorldQuant’s quest for formulaic alphas
  4. Code example: Bivariate and multivariate factor evaluation

The Indicator Zoo

Chapter 4, Financial Feature Engineering: How to Research Alpha Factors, summarized the long-standing efforts of academics and practitioners to identify information or variables that help reliably predict asset returns. This research led from the single-factor capital asset pricing model to a “zoo of new factors” (Cochrane 2011).

This factor zoo contains hundreds of firm characteristics and security price metrics presented as statistically significant predictors of equity returns in the anomalies literature since 1970 (see a summary in Green, Hand, and Zhang, 2017).

Code example: common alpha factors implemented in TA-Lib

The TA-Lib library is widely used by trading software developers to perform technical analysis of financial market data. It includes over 150 popular indicators from multiple categories that range from Overlap Studies, including moving averages and Bollinger Bands, to Statistic Functions such as linear regression.

Function Group           # Indicators
Overlap Studies          17
Momentum Indicators      30
Volume Indicators        3
Volatility Indicators    3
Price Transform          4
Cycle Indicators         5
Math Operators           11
Math Transform           15
Statistic Functions      9

The notebook common_alpha_factors contains the relevant code samples.
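
To make the Overlap Studies category concrete, here is a minimal, dependency-free sketch of Bollinger Bands. It is not taken from the notebook and does not call TA-Lib (whose Python wrapper exposes this indicator as `BBANDS`). The function name `bollinger_bands`, its parameters, the sample prices, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (lower, middle, upper) lists aligned with `prices`.

    Entries are None until `window` observations are available.
    Hypothetical helper for illustration; not the TA-Lib implementation.
    """
    lower, middle, upper = [], [], []
    for end in range(1, len(prices) + 1):
        if end < window:
            # Not enough history yet to fill the rolling window.
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        chunk = prices[end - window:end]
        mean = fmean(chunk)              # simple moving average (middle band)
        width = num_std * pstdev(chunk)  # assumed: population standard deviation
        lower.append(mean - width)
        middle.append(mean)
        upper.append(mean + width)
    return lower, middle, upper


# Made-up closing prices, purely for illustration.
closes = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0]
lower, middle, upper = bollinger_bands(closes, window=3, num_std=2.0)
```

The bands are symmetric around the moving average by construction, so a price's position between `lower` and `upper` can itself serve as a normalized feature.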

Code example: WorldQuant’s quest for formulaic alphas

We introduced WorldQuant in Chapter 1, Machine Learning for Trading: From Idea to Execution, as part of a trend towards crowd-sourcing investment strategies. WorldQuant maintains a virtual research center where quants worldwide compete to identify alphas. These alphas are trading signals in the form of computational expressions that help predict price movements just like the common factors described in the previous section.

These formulaic alphas translate the mechanism to extract the signal from data into code and can be developed and tested individually with the goal of integrating their information into a broader automated strategy (Tulchinsky 2019). As stated repeatedly throughout the book, mining for signals in large datasets is prone to multiple testing bias and false discoveries. Regardless of these important caveats, this approach represents a modern alternative to the more conventional features presented in the previous section.

Kakushadze (2016) presents 101 examples of such alphas, 80 percent of which were used in a real-world trading system at the time. It defines a range of functions that operate on cross-sectional or time-series data and can be combined, for example, in nested form.

The notebook 101_formulaic_alphas contains the relevant code.
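
The following sketch illustrates how a time-series operator and a cross-sectional operator can be nested into a single expression. It is not one of the 101 alphas and is not taken from the notebook: the expression `rank(-delta(close, d))`, the helper names, the dict-of-lists data layout, and the tickers are hypothetical, chosen only to show the composition pattern in plain Python.

```python
def delta(series, d):
    """Time-series operator: value today minus the value d periods earlier."""
    return [None] * d + [series[t] - series[t - d] for t in range(d, len(series))]


def cross_sectional_rank(values):
    """Cross-sectional operator: percentile rank in (0, 1], ties averaged."""
    n = len(values)
    ranks = []
    for v in values:
        less = sum(1 for w in values if w < v)
        equal = sum(1 for w in values if w == v)
        ranks.append((less + (equal + 1) / 2) / n)
    return ranks


def reversal_alpha(closes, d=1):
    """Illustrative nested expression: rank(-delta(close, d)) for each date.

    `closes` maps a ticker to its list of closing prices (assumed layout).
    Returns {date_index: {ticker: rank}} for dates with enough history.
    """
    tickers = sorted(closes)
    n_dates = len(closes[tickers[0]])
    changes = {ticker: delta(closes[ticker], d) for ticker in tickers}
    alpha = {}
    for day in range(d, n_dates):
        # Negate the price change so recent losers receive the highest rank.
        signal = [-changes[ticker][day] for ticker in tickers]
        alpha[day] = dict(zip(tickers, cross_sectional_rank(signal)))
    return alpha


# Made-up prices for three hypothetical tickers.
closes = {
    "AAA": [10.0, 11.0, 10.5],
    "BBB": [20.0, 19.0, 19.5],
    "CCC": [30.0, 30.0, 31.0],
}
alpha = reversal_alpha(closes)
```

Each operator works along one axis only (time or assets), which is what makes arbitrary nesting straightforward. With real data, a vectorized DataFrame of dates by tickers would be the more natural container.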

Code example: Bivariate and multivariate factor evaluation

To evaluate the numerous factors, we rely on the various performance measures introduced in this book, including the following:

The notebooks factor_evaluation and alphalens_analysis contain the relevant code examples.
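
As one example of a bivariate measure, the sketch below computes an information coefficient as the Spearman rank correlation between factor values and forward returns for a single cross-section. It is independent of the notebooks. The function names, the sample numbers, and the choice of this particular metric are assumptions for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        stop = start
        # Extend the group while the next sorted value is tied with this one.
        while stop + 1 < len(order) and values[order[stop + 1]] == values[order[start]]:
            stop += 1
        average = (start + stop) / 2 + 1
        for k in range(start, stop + 1):
            ranks[order[k]] = average
        start = stop + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def information_coefficient(factor, forward_returns):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(factor), average_ranks(forward_returns))


# Made-up factor values and one-period forward returns for five assets.
factor = [0.1, 0.4, 0.2, 0.9, 0.5]
forward_returns = [-0.01, 0.02, 0.00, 0.03, 0.01]
ic = information_coefficient(factor, forward_returns)
```

Because the measure depends only on ranks, it is robust to outliers in either series. In practice it would be computed per date and then averaged over time.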