This article continues the series on quantitative trading, which started with the Beginner’s Guide and Strategy Identification. Both of these longer, more involved articles have been very popular so I’ll continue in this vein and provide detail on the topic of strategy backtesting.
Algorithmic backtesting requires knowledge of many areas, including psychology, mathematics, statistics, software development and market/exchange microstructure. I couldn’t hope to cover all of those topics in one article, so I’m going to split them into two or three smaller pieces. What will we discuss in this section? I’ll begin by defining backtesting and then I will describe the basics of how it is carried out. Then I will elucidate upon the biases we touched upon in the Beginner’s Guide to Quantitative Trading. Next I will present a comparison of the various available backtesting software options.
In subsequent articles we will look at the details of strategy implementations that are often barely mentioned or ignored. We will also consider how to make the backtesting process more realistic by including the idiosyncrasies of a trading exchange. Then we will discuss transaction costs and how to correctly model them in a backtest setting. We will end with a discussion on the performance of our backtests and finally provide an example of a common quant strategy, known as a mean-reverting pairs trade.
Let’s begin by discussing what backtesting is and why we should carry it out in our algorithmic trading.
Algorithmic trading stands apart from other investment approaches because the abundance of available data allows us to form more reliable expectations about future performance from past performance. The process by which this is carried out is known as backtesting.
In simple terms, backtesting is carried out by exposing your particular strategy algorithm to a stream of historical financial data, which leads to a set of trading signals. Each trade (which we will mean here to be a ‘round-trip’ of two signals) will have an associated profit or loss. The accumulation of this profit/loss over the duration of your strategy backtest will lead to the total profit and loss (also known as the ‘P&L’ or ‘PnL’). That is the essence of the idea, although of course the “devil is always in the details”!
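To make the idea concrete, here is a minimal sketch of accumulating P&L from a stream of signals. The prices and the long/flat position vector are entirely hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical illustration: a 'round-trip' trade pairs an entry signal (1)
# with a later exit signal (0); its P&L is the price change over that span.
prices  = np.array([100.0, 101.5, 103.0, 102.0, 104.5, 106.0])
signals = np.array([0, 1, 1, 1, 0, 0])   # long from bar 1 to bar 3, flat otherwise

# Per-bar P&L: the position held at the previous bar applied to this bar's change
returns = np.diff(prices)                # price change per bar
pnl = signals[:-1] * returns             # position held going into each bar
total_pnl = pnl.sum()
print(total_pnl)                         # 104.5 - 101.5 = 3.0
```

In a real backtest the signal series would be produced by the strategy itself, and transaction costs would be deducted from each round-trip.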
What are the key reasons for backtesting an algorithmic strategy?
Backtesting provides a host of advantages for algorithmic trading. However, it is not always possible to straightforwardly backtest a strategy. In general, as the frequency of the strategy increases, it becomes harder to correctly model the microstructure effects of the market and exchanges. This leads to less reliable backtests and thus a trickier evaluation of a chosen strategy. This is a particular problem where the execution system is the key to the strategy performance, as with ultra-high frequency algorithms.
Unfortunately, backtesting is fraught with biases of all types. We have touched upon some of these issues in previous articles, but we will now discuss them in depth.
There are many biases that can affect the performance of a backtested strategy. Unfortunately, these biases have a tendency to inflate the performance rather than detract from it. Thus you should always consider a backtest to be an idealised upper bound on the actual performance of the strategy. It is almost impossible to eliminate biases from algorithmic trading so it is our job to minimise them as best we can in order to make informed decisions about our algorithmic strategies.
There are four major biases that I wish to discuss: Optimisation Bias, Look-Ahead Bias, Survivorship Bias and Psychological Tolerance Bias.
This is probably the most insidious of all backtest biases. It involves adjusting or introducing additional trading parameters until the strategy performance on the backtest data set is very attractive. However, once live the performance of the strategy can be markedly different. Another name for this bias is “curve fitting” or “data-snooping bias”.
Optimisation bias is hard to eliminate as algorithmic strategies often involve many parameters. "Parameters" in this instance might be the entry/exit criteria, look-back periods, averaging periods (i.e. the moving average smoothing parameter) or the volatility measurement frequency. Optimisation bias can be minimised by keeping the number of parameters to a minimum and increasing the quantity of data points in the training set. One must be careful with the latter, however, as older training points can be subject to a prior regime (such as a different regulatory environment) and thus may not be relevant to your current strategy.
One method to help mitigate this bias is to perform a sensitivity analysis. This means varying the parameters incrementally and plotting a "surface" of performance. Sound, fundamental reasoning for parameter choices should, with all other factors considered, lead to a smoother parameter surface. If you have a very jumpy performance surface, it often means that a parameter is not reflecting a genuine phenomenon but is instead an artefact of the test data. There is a vast literature on multi-dimensional optimisation algorithms and it is a highly active area of research. I won't dwell on it here, but keep it in the back of your mind when you find a strategy with a fantastic backtest!
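As a sketch of such a sensitivity analysis, one can sweep a pair of look-back parameters and inspect the resulting performance "surface". Everything here is synthetic and illustrative: the price series is a random walk and the crossover rule is a stand-in for whatever parameters your strategy actually has:

```python
import numpy as np

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(0, 1, 500))  # synthetic random-walk series

def sharpe(fast, slow):
    """Naive in-sample Sharpe of a moving-average crossover (illustrative only)."""
    f = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    s = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    n = min(len(f), len(s))
    pos = np.where(f[-n:] > s[-n:], 1.0, -1.0)   # long/short on crossover
    rets = pos[:-1] * np.diff(prices[-n:])       # next-bar P&L per unit
    return rets.mean() / (rets.std() + 1e-12)

# Sweep the two look-back parameters and inspect the performance "surface":
surface = np.array([[sharpe(f, s) for s in range(20, 60, 5)]
                    for f in range(5, 20, 3)])
print(surface.round(3))  # a jumpy surface here suggests curve fitting, not signal
```

Since the underlying data is pure noise, any impressive-looking cell in this surface is, by construction, an artefact of the test data rather than a genuine effect.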
Look-ahead bias is introduced into a backtesting system when future data is accidentally included at a point in the simulation where that data would not have actually been available. If we are running the backtest chronologically and we reach time point N, then look-ahead bias occurs if data is included for any point N+k, where k>0. Look-ahead bias errors can be incredibly subtle, often arising from technical bugs such as off-by-one indexing errors, or from calculating strategy parameters (such as regression coefficients) over the entire dataset rather than only the data available up to each point.
As with optimisation bias, one must be extremely careful to avoid its introduction. It is often the main reason why trading strategies underperform their backtests significantly in “live trading”.
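As a concrete sketch (with synthetic data), a classic off-by-one error is to trade bar t using a signal that itself requires bar t's closing price. Lagging the signal by one bar removes the bias:

```python
import numpy as np

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))
rets = np.diff(np.log(prices))

signal = np.sign(rets)  # hypothetical signal computed from bar t's own return

# BIASED: trading bar t's return with a signal that needs bar t's close.
biased_pnl = (signal * rets).sum()           # perfectly "predicts" itself

# CORRECT: the signal is only available from the next bar onwards.
lagged_pnl = (signal[:-1] * rets[1:]).sum()  # shift the signal forward one bar
print(biased_pnl, lagged_pnl)
```

The biased version looks spectacular on random noise; the correctly lagged version, as expected, earns roughly nothing. That gap is exactly the "underperformance in live trading" described above.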
Survivorship bias is a particularly dangerous phenomenon and can lead to significantly inflated performance for certain strategy types. It occurs when strategies are tested on datasets that do not include the full universe of prior assets that may have been chosen at a particular point in time, but only consider those that have “survived” to the current time.
As an example, consider testing a strategy on a random selection of equities before and after the 2001 market crash. Some technology stocks went bankrupt, while others managed to stay afloat and even prospered. If we had restricted this strategy only to stocks which made it through the market drawdown period, we would be introducing a survivorship bias because they have already demonstrated their success to us. In fact, this is just another specific case of look-ahead bias, as future information is being incorporated into past analysis.
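A quick synthetic simulation (hypothetical numbers throughout) shows how filtering out delisted names inflates measured returns:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical universe: 500 stocks, each with a random annual return;
# stocks losing more than 60% are "delisted" and vanish from later datasets.
returns = rng.normal(0.05, 0.40, 500)
survived = returns > -0.60

full_universe_mean = returns.mean()
survivors_only_mean = returns[survived].mean()
print(full_universe_mean, survivors_only_mean)
# Backtesting only on survivors systematically overstates the average return,
# because the worst outcomes have been removed from the sample.
```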
There are two main ways to mitigate survivorship bias in your strategy backtests: use survivorship-bias-free datasets, which include assets that have since been delisted (often at additional expense), or use more recent data, for which fewer assets will have dropped out of the universe.
We will now consider certain psychological phenomena that can influence your trading performance.
This particular phenomenon is not often discussed in the context of quantitative trading. However, it is discussed extensively in regard to more discretionary trading methods. It has various names, but I’ve decided to call it “psychological tolerance bias” because it captures the essence of the problem. When creating backtests over a period of 5 years or more, it is easy to look at an upwardly trending equity curve, calculate the compounded annual return, Sharpe ratio and even drawdown characteristics and be satisfied with the results. As an example, the strategy might possess a maximum relative drawdown of 25% and a maximum drawdown duration of 4 months. This would not be atypical for a momentum strategy. It is straightforward to convince oneself that it is easy to tolerate such periods of losses because the overall picture is rosy. In practice, however, it is far harder!
If historical drawdowns of 25% or more occur in the backtests, then in all likelihood you will see periods of similar drawdown in live trading. These periods of drawdown are psychologically difficult to endure. I have observed first hand what an extended drawdown can be like, in an institutional setting, and it is not pleasant - even if the backtests suggest such periods will occur. The reason I have termed it a “bias” is that often a strategy which would otherwise be successful is stopped from trading during times of extended drawdown and thus will lead to significant underperformance compared to a backtest. Thus, even though the strategy is algorithmic in nature, psychological factors can still have a heavy influence on profitability. The takeaway is to ensure that if you see drawdowns of a certain percentage and duration in the backtests, then you should expect them to occur in live trading environments, and will need to persevere in order to reach profitability once more.
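Drawdown depth and duration are straightforward to compute from an equity curve. Here is a minimal sketch (the equity series is invented for illustration):

```python
import numpy as np

def drawdown_stats(equity):
    """Max relative drawdown and longest drawdown duration (in bars)."""
    equity = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(equity)   # running high-water mark
    dd = (peaks - equity) / peaks           # relative drawdown at each bar
    # Duration: longest run of consecutive bars below the prior peak
    longest, run = 0, 0
    for underwater in dd > 0:
        run = run + 1 if underwater else 0
        longest = max(longest, run)
    return dd.max(), longest

equity = [100, 110, 105, 90, 95, 112, 108, 120]
max_dd, duration = drawdown_stats(equity)
print(max_dd, duration)   # 0.1818... (20/110), 3 bars underwater at worst
```

Running this on your backtest equity curve before going live at least tells you, in advance, the depth and length of the losing stretches you are signing up to endure.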
The software landscape for strategy backtesting is vast. Solutions range from fully-integrated, institutional-grade software through to programming languages such as C++, Python and R, where nearly everything must be written from scratch (or suitable ‘plugins’ obtained). As quant traders we are interested in the balance of being able to “own” our trading technology stack versus the speed and reliability of our development methodology. The key considerations for software choice are execution capability and broker integration, customisation, the degree of strategy complexity supported, ease of bias minimisation, development speed, execution speed and cost.
Now that we have listed the criteria with which we need to choose our software infrastructure, I want to run through some of the more popular packages and how they compare:
Note: I am only going to include software that is available to most retail practitioners and software developers, as this is the readership of the site. While more institutional-grade software is available, I feel it is too expensive to be used effectively in a retail setting, and I personally have no experience with it.
Backtesting Software Comparison
MS Excel

Description: WYSIWYG (what-you-see-is-what-you-get) spreadsheet software. Extremely widespread in the financial industry. Data and algorithm are tightly coupled.
Execution: Yes, Excel can be tied into most brokerages.
Customisation: VBA macros allow more advanced functionality at the expense of hiding implementation.
Strategy Complexity: More advanced statistical tools are harder to implement as are strategies with many hundreds of assets.
Bias Minimisation: Look-ahead bias is easy to detect via cell-highlighting functionality (assuming no VBA).
Development Speed: Quick to implement basic strategies.
Execution Speed: Slow execution speed - suitable only for lower-frequency strategies.
Cost: Cheap or free (depending upon license).
MATLAB

Description: Programming environment originally designed for computational mathematics, physics and engineering. Very well suited to vectorised operations and those involving numerical linear algebra. Provides a wide array of plugins for quant trading. In widespread use in quantitative hedge funds.
Execution: No native execution capability; MATLAB requires a separate execution system.
Customisation: Huge array of community plugins for nearly all areas of computational mathematics.
Strategy Complexity: Many advanced statistical methods already available and well-tested.
Bias Minimisation: Harder to detect look-ahead bias, requires extensive testing.
Development Speed: Short scripts can create sophisticated backtests easily.
Execution Speed: Assuming a vectorised/parallelised algorithm, MATLAB is highly optimised. Poor for traditional iterated loops.
Cost: ~1,000 USD for a license.
Alternatives: Octave, SciLab
Python

Description: High-level language designed for speed of development. Wide array of libraries for nearly any programmatic task imaginable. Gaining wider acceptance in the hedge fund and investment bank community. Not quite as fast as C/C++ for execution speed.
Execution: Python plugins exist for larger brokers, such as Interactive Brokers. Hence backtest and execution system can all be part of the same “tech stack”.
Customisation: Python has a very healthy development community and is a mature language. NumPy/SciPy provide fast scientific computing and statistical analysis tools relevant for quant trading.
Strategy Complexity: Many plugins exist for the main algorithms, but not quite as big a quant community as exists for MATLAB.
Bias Minimisation: Same bias minimisation problems exist as for any high level language. Need to be extremely careful about testing.
Development Speed: Python’s main advantage is development speed, with robust built-in testing capabilities.
Execution Speed: Not quite as fast as C++, but scientific computing components are optimised and Python can talk to native C code with certain plugins.
Cost: Free/Open Source
Alternatives: Ruby, Erlang, Haskell
R

Description: Environment designed for advanced statistical methods and time series analysis. Wide array of specific statistical, econometric and native graphing toolsets. Large developer community.
Execution: R possesses plugins to some brokers, in particular Interactive Brokers. Thus an end-to-end system can be written entirely in R.
Customisation: R can be customised with any package, but its strengths lie in statistical/econometric domains.
Strategy Complexity: Mostly useful if performing econometric, statistical or machine-learning strategies due to available plugins.
Bias Minimisation: A similar level of bias possibility exists as for any high-level language such as Python or C++, so testing must be carried out.
Development Speed: R is rapid for writing strategies based on statistical methods.
Execution Speed: R is slower than C++, but remains relatively optimised for vectorised operations (as with MATLAB).
Cost: Free/Open Source
Alternatives: SPSS, Stata
C++

Description: Mature, high-level language designed for speed of execution. Wide array of quantitative finance and numerical libraries. Harder to debug and often takes longer to implement than Python or MATLAB. Extremely prevalent on both the buy- and sell-side.
Execution: Most brokerage APIs are written in C++ and Java. Thus many plugins exist.
Customisation: C/C++ allows direct access to underlying memory, hence ultra-high frequency strategies can be implemented.
Strategy Complexity: The C++ STL provides a wide array of optimised algorithms. Nearly any specialised mathematical algorithm possesses a free, open-source C/C++ implementation on the web.
Bias Minimisation: Look-ahead bias can be tricky to eliminate, but no harder than in other high-level languages. Good debugging tools exist, but one must be careful when dealing with underlying memory.
Development Speed: C++ is quite verbose compared to Python or MATLAB for the same algorithm. More lines of code (LOC) often lead to a greater likelihood of bugs.
Execution Speed: C/C++ has extremely fast execution speed and can be well optimised for specific computational architectures. This is the main reason to utilise it.
Cost: Various compilers: Linux/GCC is free, MS Visual Studio has differing licenses.
Alternatives: C#, Java, Scala
Different strategies will require different software packages. HFT and UHFT strategies will be written in C/C++ (these days they are often carried out on GPUs and FPGAs), whereas low-frequency directional equity strategies are easy to implement in TradeStation, due to the “all in one” nature of the software/brokerage.
My personal preference is for Python as it provides the right degree of customisation, speed of development, testing capability and execution speed for my needs and strategies. If I need anything faster, I can “drop in” to C++ directly from my Python programs. One method favoured by many quant traders is to prototype their strategies in Python and then convert the slower execution sections to C++ in an iterative manner. Eventually the entire algo is written in C++ and can be “left alone to trade”!
In the next few articles on backtesting we will take a look at some particular issues surrounding the implementation of an algorithmic trading backtesting system, as well as how to incorporate the effects of trading exchanges. We will discuss strategy performance measurement and finally conclude with an example strategy.