Machine Learning for Trading

A comprehensive introduction to how ML can add value to the design and execution of algorithmic trading strategies

View the Project on GitHub stefan-jansen/machine-learning-for-trading

Alternative Data for Trading

This chapter explains how individuals, business processes, and sensors produce alternative data. It also provides a framework to navigate and evaluate the proliferating supply of alternative data for investment purposes.

It demonstrates the workflow in Python, from acquisition to preprocessing and storage, using data obtained through web scraping to set the stage for the application of ML. It concludes by providing examples of sources, providers, and applications.

Content

  1. The Alternative Data Revolution
  2. Sources of alternative data
  3. Criteria for evaluating alternative datasets
  4. The Market for Alternative Data
  5. Working with Alternative Data

The Alternative Data Revolution

For algorithmic trading, new data sources offer an informational advantage if they provide access to information unavailable from traditional sources, or provide access sooner. Following global trends, the investment industry is rapidly expanding beyond market and fundamental data to alternative sources to reap alpha through an informational edge. Annual spending on data, technological capabilities, and related talent is expected to increase from the current $3 billion by 12.8% annually through 2020.

Today, investors can access macro or company-specific data in real-time that historically has been available only at a much lower frequency. Use cases for new data sources include the following:

Resources

Sources of alternative data

Alternative datasets are generated by many sources but can be classified at a high level as predominantly produced by:

The nature of alternative data continues to evolve rapidly as new data sources become available and sources previously labeled “alternative” become part of the mainstream. The Baltic Dry Index (BDI), for instance, assembles data from several hundred shipping companies to approximate the supply/demand of dry bulk carriers and is now available on the Bloomberg Terminal.

Alternative data sources differ in crucial respects that determine their value or signal content for algorithmic trading strategies.

Criteria for evaluating alternative datasets

The ultimate objective of alternative data is to provide an informational advantage in the competitive search for trading signals that produce alpha, namely positive, uncorrelated investment returns. In practice, the signals extracted from alternative datasets can be used on a standalone basis or combined with other signals as part of a quantitative strategy.

Resources

The Market for Alternative Data

The investment industry is going to spend an estimated $2bn-3bn on data services in 2018, and this number is expected to grow at double digits per year in line with other industries. This expenditure includes the acquisition of alternative data, investments in related technology, and the hiring of qualified talent.

Working with Alternative Data

This section illustrates the acquisition of alternative data using web scraping, first targeting OpenTable restaurant data and then moving on to earnings call transcripts hosted by Seeking Alpha.

Code Example: OpenTable Web Scraping

Note: unlike all other examples, the code that uses Selenium is written to run on a host rather than in the Docker image because it relies on a browser. The code has been tested on Ubuntu and Mac only.

The subfolder 01_opentable contains the script opentable_selenium to scrape OpenTable data using Scrapy and Selenium.
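Once a browser has rendered a page, the remaining work is to extract fields from the HTML. The following is a minimal sketch of that parsing step only. It uses Python's standard-library `html.parser` instead of Scrapy or Selenium so that it runs without a browser or network access. The markup, the class names (`restaurant`, `name`, `bookings`), and the helper functions are hypothetical and do not reflect OpenTable's actual page structure or the contents of opentable_selenium.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are illustrative, not OpenTable's real ones.
SAMPLE_HTML = """
<div class="restaurant">
  <span class="name">Cafe Uno</span>
  <span class="bookings">Booked 12 times today</span>
</div>
<div class="restaurant">
  <span class="name">Bistro Due</span>
  <span class="bookings">Booked 3 times today</span>
</div>
"""


class RestaurantParser(HTMLParser):
    """Collect one dict per restaurant from the hypothetical markup above."""

    FIELDS = {"name", "bookings"}

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the field whose text is being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "restaurant":
            self.records.append({})  # start a new record
        elif tag == "span" and css_class in self.FIELDS and self.records:
            self._field = css_class

    def handle_data(self, data):
        # Only text inside a recognized <span> is stored.
        if self._field is not None:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_restaurants(html):
    """Return a list of raw records parsed from an HTML string."""
    parser = RestaurantParser()
    parser.feed(html)
    parser.close()
    return parser.records


def booking_count(text):
    """Extract the first integer from text such as 'Booked 12 times today'."""
    digits = [token for token in text.split() if token.isdigit()]
    return int(digits[0]) if digits else None


records = parse_restaurants(SAMPLE_HTML)
for record in records:
    record["bookings"] = booking_count(record["bookings"])
```

Converting the free-text booking label to an integer at parse time keeps the stored records numeric, which simplifies later preprocessing. A real page would require selectors matched to its actual structure.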

Code Example: Seeking Alpha Earnings Transcripts

Update: unfortunately, Seeking Alpha has updated its website to use a CAPTCHA, so automatic downloads are no longer possible in the way described here.

Note: unlike all other examples, the code is written to run on a host rather than in the Docker image because it relies on a browser. The code has been tested on Ubuntu and Mac only.

The subfolder 02_earnings_calls contains the script sa_selenium to scrape earnings call transcripts from the Seeking Alpha website.
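Regardless of how a transcript is obtained, its text typically needs some structure before analysis. The following is a minimal sketch of splitting a transcript into speaker turns. It assumes a hypothetical plain-text layout in which each turn begins with `Name: ` at the start of a line. The sample text, the regular expression, and the function `split_turns` are illustrative assumptions and are not taken from sa_selenium.

```python
import re

# Hypothetical plain-text transcript; real transcripts are HTML and laid out differently.
SAMPLE_TRANSCRIPT = """\
Operator: Good day and welcome to the earnings call.
Jane Doe: Thank you. Revenue grew this quarter.
We also expanded margins.
John Roe: Can you comment on guidance?
"""

# Assumed convention: a turn starts with "Name: " at the beginning of a line.
SPEAKER_LINE = re.compile(r"^(?P<speaker>[A-Z][\w .'-]*?):\s*(?P<text>.*)$")


def split_turns(transcript):
    """Return a list of {'speaker', 'text'} dicts, one per speaker turn."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            # A new speaker label opens a new turn.
            turns.append({"speaker": match["speaker"], "text": match["text"]})
        elif turns:
            # Lines without a speaker label continue the previous turn.
            turns[-1]["text"] += " " + line
    return turns


turns = split_turns(SAMPLE_TRANSCRIPT)
```

This heuristic would misread any continuation line that itself begins with a capitalized phrase followed by a colon (for example, "Note: ...") as a new speaker, so a production parser would more likely rely on the page's markup than on text patterns.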

Python Libraries & Documentation