Note: For an updated version, please refer to this slide.

Introduction

Pairs trading is a market-neutral arbitrage strategy pioneered by Gerry Bamberger and later developed further by Nunzio Tartaglia's quantitative group at Morgan Stanley in the 1980s.

The assets in each pair should be cointegrated, which means that the spread (the difference between their prices) is mean-reverting. Given such a pair, the strategy buys the first asset and sells the second when the spread goes down, and sells the first asset and buys the second when the spread goes up, betting that the spread will revert to its mean. To detect cointegration between assets, we can apply a test such as the Augmented Dickey-Fuller (ADF) test to the spread: a small p-value rejects the unit-root null hypothesis, which suggests the spread is stationary.

However, past cointegration does not guarantee future cointegration. To choose pairs whose cointegration is robust, we can use a data-driven approach to identify intrinsically similar assets.

In this project, we propose an unsupervised-learning-based approach to pairs selection in the cryptocurrency perpetual futures market. We first use dimensionality reduction and a clustering algorithm to group the assets into clusters. Then we use the ADF test to select the most strongly cointegrated pairs within each cluster. The results show that our strategy outperforms a pure cointegration-testing strategy in terms of PnL (profit and loss) and Sharpe ratio.
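Both evaluation metrics can be computed from a strategy's hourly return series. This sketch rests on several assumptions. PnL is taken as the cumulative sum of per-period returns on fixed capital, and the risk-free rate is zero. The Sharpe ratio is annualized with 24 × 365 hourly periods, because crypto futures trade around the clock. The function name `pnl_and_sharpe` is hypothetical.

```python
import numpy as np

HOURS_PER_YEAR = 24 * 365  # crypto perpetuals trade 24/7, so every hour counts


def pnl_and_sharpe(returns, periods_per_year=HOURS_PER_YEAR):
    """Cumulative PnL (sum of per-period returns) and annualized Sharpe ratio.

    Assumes a zero risk-free rate and fixed capital (returns are not compounded).
    """
    r = np.asarray(returns, dtype=float)
    cumulative_pnl = np.cumsum(r)
    sharpe = np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
    return cumulative_pnl, sharpe
```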

The source code[1] of this project is available.

Dataset

We choose the Binance perpetual futures market[4] as our universe of assets. To simplify the training process, we further narrow the universe down to 12 cryptocurrencies. The data are collected via the Binance API and cover the period from 2021/06/01 to 2022/12/01 at 1-hour resolution.
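Here is a minimal sketch of the download step. It assumes the public USDⓈ-M futures kline endpoint `GET /fapi/v1/klines`, with parameters `symbol`, `interval`, `startTime`, `endTime` and `limit`, and at most 1500 candles per request. The helper names are hypothetical, and the code does no retry or rate-limit handling. Because a single request returns at most 1500 hourly candles, the 2021/06/01 to 2022/12/01 period is split into consecutive chunks.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://fapi.binance.com/fapi/v1/klines"  # USD-M futures klines
HOUR_MS = 3_600_000
MAX_LIMIT = 1500  # maximum candles per request

START = datetime(2021, 6, 1, tzinfo=timezone.utc)
END = datetime(2022, 12, 1, tzinfo=timezone.utc)


def to_ms(dt):
    """Convert an aware datetime to a Unix timestamp in milliseconds."""
    return int(dt.timestamp() * 1000)


def time_windows(start_ms, end_ms, limit=MAX_LIMIT, step_ms=HOUR_MS):
    """Split [start_ms, end_ms) into request windows of at most `limit` candles."""
    windows = []
    t = start_ms
    while t < end_ms:
        stop = min(t + limit * step_ms, end_ms)
        windows.append((t, stop - 1))  # endTime is inclusive, so stop 1 ms early
        t = stop
    return windows


def parse_closes(rows):
    """Keep (open_time_ms, close_price); index 4 of each kline row is the close."""
    return [(int(row[0]), float(row[4])) for row in rows]


def fetch_hourly_closes(symbol, start, end):
    """Download hourly closes for one symbol (network call, no retries)."""
    closes = []
    for lo, hi in time_windows(to_ms(start), to_ms(end)):
        query = urllib.parse.urlencode(
            {"symbol": symbol, "interval": "1h", "startTime": lo, "endTime": hi, "limit": MAX_LIMIT}
        )
        with urllib.request.urlopen(f"{BASE_URL}?{query}") as resp:
            closes.extend(parse_closes(json.load(resp)))
    return closes


windows = time_windows(to_ms(START), to_ms(END))
```

For example, `fetch_hourly_closes("BTCUSDT", START, END)` would issue 9 requests to cover the 13,152 hourly candles in the dataset period.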

Methodology

The overall clustering and pairs selection pipeline of our strategy is as follows:

Calculate returns, (current price - price w periods earlier) / (price w periods earlier), over several window lengths w, and use them as the features of each price series.
Reduce the features to 12 principal components via Principal Component Analysis (PCA).
Apply agglomerative clustering to the resulting 12x12 matrix (12 assets, each described by 12 principal components).
Obtain 6 clusters from the previous step.
Within each cluster, apply the ADF cointegration test to every possible pair of assets.
Choose the top 3 pairs with the lowest ADF p-values.

The above steps are rerun every month to update the 3 selected pairs. Both PCA and agglomerative clustering are implemented with the scikit-learn[2] library.
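The monthly re-selection amounts to a walk-forward schedule: fit the clusters and pairs on a formation window, then trade the selected pairs in the following month. This sketch uses only the standard library. The one-month formation lookback is a hypothetical choice, because the original does not state how much history each monthly refit uses.

```python
from datetime import datetime, timezone


def add_months(d, n):
    """Shift a first-of-month datetime by n calendar months (n may be negative)."""
    years, month0 = divmod(d.month - 1 + n, 12)
    return d.replace(year=d.year + years, month=month0 + 1)


def monthly_schedule(start, end, lookback_months=1):
    """Return (formation_start, trade_start, trade_end) for each trading month.

    Pairs are selected on [formation_start, trade_start) and traded on
    [trade_start, trade_end). lookback_months is a hypothetical setting.
    """
    schedule = []
    trade_start = add_months(start, lookback_months)
    while add_months(trade_start, 1) <= end:
        schedule.append(
            (add_months(trade_start, -lookback_months), trade_start, add_months(trade_start, 1))
        )
        trade_start = add_months(trade_start, 1)
    return schedule


schedule = monthly_schedule(
    datetime(2021, 6, 1, tzinfo=timezone.utc), datetime(2022, 12, 1, tzinfo=timezone.utc)
)
```

With the dataset period and a one-month lookback, this yields 17 trading months. The first uses June 2021 for formation and trades in July 2021.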

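For trade execution, we first apply a Kalman filter to the spread of each pair. When the filtered spread rises above the upper threshold, we short the spread (sell the first asset and buy the second); when it falls below the lower threshold, we long the spread (buy the first asset and sell the second). We choose mean ± 2 * standard deviation as the upper and lower thresholds.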
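Here is a minimal sketch of the execution rule. It assumes a one-dimensional local-level Kalman filter, in which the spread is modeled as a random walk observed with noise. The noise variances `q` and `r` are hypothetical, since the original does not specify the state-space model. A signal of +1 means long the spread (buy the first asset, sell the second), and -1 means short the spread. For simplicity the thresholds come from the whole filtered series. In a backtest they should come from the formation window only, to avoid look-ahead bias.

```python
import numpy as np


def kalman_filter_spread(spread, q=0.1, r=1.0):
    """Local-level Kalman filter.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: z_t = x_t + v_t,      v_t ~ N(0, r)
    """
    z = np.asarray(spread, dtype=float)
    filtered = np.empty_like(z)
    x, p = z[0], 1.0  # initial state estimate and its variance
    for t, obs in enumerate(z):
        p = p + q                # predict: variance grows by the process noise
        k = p / (p + r)          # Kalman gain
        x = x + k * (obs - x)    # update the estimate with the new observation
        p = (1.0 - k) * p        # posterior variance
        filtered[t] = x
    return filtered


def threshold_signals(filtered, n_std=2.0):
    """+1 below mean - n_std*std (long spread), -1 above mean + n_std*std (short spread)."""
    mean, std = filtered.mean(), filtered.std()
    upper, lower = mean + n_std * std, mean - n_std * std
    signals = np.zeros(len(filtered), dtype=int)
    signals[filtered > upper] = -1  # sell first asset, buy second
    signals[filtered < lower] = 1   # buy first asset, sell second
    return signals
```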