The post is suitable for those who are beginning quantitative trading as well as those who have had some experience with the area. The post discusses the common pitfalls of backtesting, as well as some uncommon ones!
It also looks at the different sorts of backtesting mechanisms as well as the software landscape that implements these approaches. Then we discuss whether it is worth building your own backtester, even with the prevalence of open source tools available today.
Finally, we discuss the ins-and-outs of an event-driven backtesting system, a topic that I’ve covered frequently on QuantStart in prior posts.
A backtest is the application of trading strategy rules to a set of historical pricing data. That is, if we define a set of mechanisms for entry and exit into a portfolio of assets, and apply those rules to historical pricing data of those assets, we can attempt to understand the performance of this “trading strategy” that might have been attained in the past.
It was once said that “All models are wrong, but some are useful”. The same is true of backtests. So what purpose do they serve?
Backtests ultimately help us decide whether it is worth live-trading a set of strategy rules. They provide us with an idea of how a strategy might have performed in the past. Essentially they allow us to filter out bad strategy rules before we allocate any real capital.
It is easy to generate backtests. Unfortunately backtest results are not live trading results. They are instead a model of reality, one that usually contains many assumptions.
There are two main types of software backtest - the “for-loop” and the “event-driven” systems.
When designing backtesting software there is always a trade-off between accuracy and implementation complexity. The above two backtesting types represent either end of the spectrum for this tradeoff.
There are many pitfalls associated with backtesting. They all concern the fact that a backtest is just a model of reality. Some of the more common pitfalls include:
There are some more subtle issues with backtesting that are not discussed as often, but are still incredibly important to consider. They include:
Much has been written about the problems with backtesting. Tucker Balch and Ernie Chan both consider the issues at length.
A For-Loop Backtester is the most straightforward type of backtesting system and the variant most often seen in quant blog posts, purely for its simplicity and transparency.
Essentially the For-Loop system iterates over every trading day (or OHLC bar), performs some calculation related to the price(s) of the asset(s), such as a Moving Average of the close, and then goes long or short a particular asset (often on the same closing price, but sometimes the day after). The iteration then continues. All the while the total equity is being tracked and stored to later produce an equity curve.
Here is the pseudo-code for such an algorithm:
    for each trading bar:
        do_something_with_prices();
        buy_sell_or_hold_something();
        next_bar();

As you can see the design of such a system is incredibly simple. This makes it attractive for getting a “first look” at the performance of a particular strategy ruleset.
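To make the pseudo-code concrete, here is a minimal sketch of a For-Loop backtester in plain Python. The function name `for_loop_backtest`, the long/flat moving-average rule and the default parameters are illustrative assumptions rather than anything prescribed above. It assumes fills at the close with no costs or spread, which is exactly the kind of simplification that makes this style of backtest optimistic.

```python
from statistics import mean


def for_loop_backtest(closes, window=3, initial_equity=1000.0):
    """Long/flat moving-average rule (hypothetical); returns the equity curve.

    The position decided on bar i uses closes up to and including bar i
    and is only applied to the move from bar i to bar i + 1.
    """
    equity = initial_equity
    curve = [equity]
    position = 0  # 0 = flat, 1 = long; always decided on the previous bar
    for i in range(1, len(closes)):
        # Earn this bar's return on the position held coming into it.
        bar_return = closes[i] / closes[i - 1] - 1.0
        equity *= 1.0 + position * bar_return
        curve.append(equity)
        # Decide the position for the next bar using data up to bar i only.
        if i + 1 >= window:
            sma = mean(closes[i + 1 - window : i + 1])
            position = 1 if closes[i] > sma else 0
    return curve


equity_curve = for_loop_backtest([10, 10, 10, 11, 12.1, 12.1], window=3)
```

The equity curve is the only state that is tracked, which is why such a loop is quick to write and fast to run across many parameter combinations.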
For-Loop backtesters are straightforward to implement in nearly any programming language and are very fast to execute. The latter advantage means that many parameter combinations can be tested in order to optimise the trading setup.
The main disadvantage with For-Loop backtesters is that they are quite unrealistic. They often have no transaction cost capability unless specifically added. Usually orders are filled immediately “at market” with the midpoint price. As such there is often no accounting for spread.
There is minimal code re-use between the backtesting system and the live-trading system. This means that code often needs to be written twice, introducing the possibility of more bugs.
For-Loop backtesters are prone to Look-Ahead Bias, due to bugs with indexing. For instance, should you have used “i”, “i+1” or “i-1” in your panel indexing?
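The following sketch shows how a single index offset can introduce Look-Ahead Bias. The rule ("hold the asset after an up bar"), the `lag` parameter and the price series are all made up for illustration. With `lag=1` the signal comes from the previous bar. With `lag=0` the signal is computed from the very bar whose return it then collects.

```python
def strategy_return(closes, lag):
    """Total return of a toy 'long after an up bar' rule.

    lag=1 trades on the previous bar's signal (no look-ahead).
    lag=0 uses the current bar's own move to decide whether to hold it,
    which is look-ahead bias.
    """
    returns = [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    equity = 1.0
    for i in range(1, len(returns)):
        signal = 1 if returns[i - lag] > 0 else 0
        equity *= 1.0 + signal * returns[i]
    return equity - 1.0


closes = [100, 102, 99, 103, 101, 104, 100]
honest = strategy_return(closes, lag=1)
biased = strategy_return(closes, lag=0)
```

On this choppy series the biased version only ever holds winning bars and so shows a profit, while the honest version loses money. The two differ by one character in the index.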
For-Loop backtesters should really be utilised solely as a filtration mechanism. You can use them to eliminate the obviously bad strategies, but you should remain skeptical of strong performance. Further research is often required. Strategies rarely perform better in live trading than they do in backtests!
Event-Driven Backtesters lie at the other end of the spectrum. They are much more akin to live-trading infrastructure implementations. As such, they are often more realistic, and the gap between backtested and live trading performance tends to be smaller.
Such systems are run in a large “while” loop that continually looks for “events” of differing types in the “event queue”. Potential events include:
When a particular event is identified it is routed to the appropriate module(s) in the infrastructure, which handles the event and then potentially generates new events which go back to the queue.
The pseudo-code for an Event-Driven backtesting system is as follows:
    while event_queue_isnt_empty():
        event = get_latest_event_from_queue();
        if event.type == "tick":
            strategy.calculate_trading_signals(event);
        else if event.type == "signal":
            portfolio.handle_signal(event);
        else if event.type == "order":
            portfolio.handle_order(event);
        else if event.type == "fill":
            portfolio.handle_fill(event);
        sleep(600);  # Sleep for, say, 10 mins
As you can see there is a heavy reliance on the portfolio handler module. Such a module is the “heart” of an Event-Driven backtesting system as we will see below.
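As a rough, runnable counterpart to the pseudo-code, the sketch below wires a toy strategy and portfolio handler to a shared event queue. The class bodies, the dictionary-based events and the "long on an up-tick" rule are assumptions made purely for illustration. The method names follow the pseudo-code. The `sleep` call is omitted because a backtest replays historical ticks rather than waiting for new ones.

```python
from collections import deque


class Strategy:
    """Toy strategy: emit a long signal whenever the price ticks up."""

    def __init__(self, events):
        self.events = events
        self.last_price = None

    def calculate_trading_signals(self, event):
        price = event["price"]
        if self.last_price is not None and price > self.last_price:
            self.events.append({"type": "signal", "direction": "long", "price": price})
        self.last_price = price


class Portfolio:
    """Toy portfolio handler: one unit per signal, filled at the signal price."""

    def __init__(self, events):
        self.events = events
        self.position = 0

    def handle_signal(self, event):
        self.events.append({"type": "order", "quantity": 1, "price": event["price"]})

    def handle_order(self, event):
        # A fuller system would route the order to an execution handler here.
        self.events.append({"type": "fill", "quantity": event["quantity"], "price": event["price"]})

    def handle_fill(self, event):
        self.position += event["quantity"]


def run_backtest(prices):
    events = deque()
    strategy, portfolio = Strategy(events), Portfolio(events)
    processed = []  # event types, in the order they were handled
    for price in prices:
        events.append({"type": "tick", "price": price})
        while events:  # drain the queue before the next tick arrives
            event = events.popleft()
            processed.append(event["type"])
            if event["type"] == "tick":
                strategy.calculate_trading_signals(event)
            elif event["type"] == "signal":
                portfolio.handle_signal(event)
            elif event["type"] == "order":
                portfolio.handle_order(event)
            elif event["type"] == "fill":
                portfolio.handle_fill(event)
    return portfolio.position, processed


position, processed = run_backtest([10, 11, 10, 12])
```

Each handler only reads one event and appends new ones, so any module can be swapped out without touching the loop itself.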
There are many advantages to using an Event-Driven backtester:
While the advantages are clear, there are also some strong disadvantages to using such a complex system:
In this section we will consider software (both open source and commercial) that exists for both For-Loop and Event-Driven systems.
For For-Loop backtesters, the main programming languages/software that are used include Python (with the Pandas library), R (and the quantmod library) and MATLAB. There are plenty of code snippets to be found on quant blogs. A great list of such blogs can be found on Quantocracy.
The market for Event-Driven systems is much larger, as clients/users often want the software to be capable of both backtesting and live trading in one package.
The expensive commercial offerings include Deltix and QuantHouse. They are often found in quant hedge funds, family offices and prop trading firms.
Cloud-based backtesting and live trading systems are relatively new. Quantopian is an example of a mature web-based setup for both backtesting and live trading.
Institutional quants often also build their own in house software. This is due to a mix of regulatory constraints, investor relations/reporting and auditability.
Retail quants have a choice between using the “cloud+data” approach of Quantopian or “rolling their own” using a cloud vendor such as Amazon Web Services, Rackspace Cloud or Microsoft Azure, along with an appropriate data vendor such as DTN IQFeed or QuantQuote.
In terms of open source software, there are many libraries available. They are mostly written in Python (for reasons I will outline below) and include Zipline (Quantopian), PyAlgoTrade, PySystemTrade (Rob Carver/Investment Idiocy) and QSTrader (QuantStart’s own backtester).
One of the most important aspects, however, is that no matter which piece of software you ultimately use, it must be paired with an equally solid source of financial data. Otherwise you will be in a situation of “garbage in, garbage out” and your live trading results will differ substantially from your backtests.
While software takes care of the details for us, it also hides many implementation details that are often crucial when we wish to expand our trading strategy complexity. At some point it is often necessary to write our own systems and the first question that arises is “Which programming language should I use?”.
Despite having a background as a quantitative software developer I am not personally interested in “language wars”. There are only so many hours in the day and, as quants, we need to get things done - not spend time arguing language design on internet forums!
We should only be interested in what works. Here are some of the main contenders:
Python is an extremely easy to learn programming language and is often the first language individuals come into contact with when they decide to learn programming. It has a standard library of tools that can read in nearly any form of data imaginable and talk to any other “service” very easily.
It has some exceptional quant/data science/machine learning (ML) libraries in NumPy, SciPy, Pandas, Scikit-Learn, Matplotlib, PyMC3 and Statsmodels. While it is great for ML and general data science, it does suffer a bit for more extensive classical statistical methods and time series analysis.
It is great for building both For-Loop and Event-Driven backtesting systems. In fact, it is perhaps one of the only languages that straightforwardly permits end-to-end research, backtesting, deployment, live trading, reporting and monitoring.
Perhaps its greatest drawback is that it is quite slow to execute when compared to other languages such as C++. However, work is being carried out to improve this problem and over time Python is becoming faster.
R is a statistical programming environment, rather than a full-fledged “first class programming language” (although some might argue otherwise!). It was designed primarily for performing advanced statistical analysis for time series, classical/frequentist statistics, Bayesian statistics, machine learning and exploratory data analysis.
It is widely used for For-Loop backtesting, often via the quantmod library, but is not particularly well suited to Event-Driven systems or live trading. It does however excel at strategy research.
C++ has a reputation for being extremely fast, and speed is its primary advantage. Nearly all scientific high-performance computing is carried out in either Fortran or C++. Hence if you are considering high frequency trading, or work on legacy systems in large organisations, then C++ is likely to be a necessity.
Unfortunately it is painful for carrying out strategy research. Because it is statically typed, loading, reading and formatting data is trickier than it is in Python or R.
Despite its relative age, it has recently been modernised substantially with the introduction of C++11/C++14 and further standards refinements.
You may also wish to take a look at Java, Scala, C#, Julia and many of the functional languages. However, my recommendation is to stick with Python, R and/or C++, as the quant trading communities are much larger.
It is a great learning experience to write your own Event-Driven backtesting system. Firstly, it forces you to consider all aspects of your trading infrastructure, not just spend hours tinkering on a particular strategy.
Even if you don’t end up using the system for live trading, it will provide you with a huge number of questions that you should be asking of your commercial or FOSS backtesting vendors.
For example: How does your current live system differ from your backtest simulation in terms of:
While Event-Driven systems are not quick or easy to write, the experience will pay huge educational dividends later on in your quant trading career.
How do you go about writing such a system?
The best way to get started is to simply download Zipline, QSTrader, PyAlgoTrade, PySystemTrade etc and try reading through the documentation and code. They are all written in Python (due to the reasons I outlined above) and thankfully Python is very much like reading pseudo-code. That is, it is very easy to follow.
I've also written many articles on Event-Driven backtest design on QuantStart, which guide you through the development of each module of the system. Rob Carver, at Investment Idiocy, also lays out his approach to building such systems to trade futures.
Remember that you don’t have to be an expert on day #1. You can take it slowly, day-by-day, module-by-module. If you need help, you can always contact me or other willing quant bloggers. See the end of the article for my contact email.
I'll now discuss the modules that are often found in many Event-Driven backtesting systems. While not an exhaustive list, it should give you a “flavour” of how such systems are designed.
This is where all of the historical pricing data is stored, along with your trading history, once live. A professional system is not just a few CSV files from Yahoo Finance!
Instead, we use a “first class” database or file system, such as PostgreSQL, MySQL, SQL Server or HDF5.
Ideally, we want to obtain and store tick-level data as it gives us an idea of trading spreads. It also means we can construct our own OHLC bars, at lower frequencies, if desired.
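As an illustration of building bars from ticks, here is a small sketch that aggregates trade ticks into fixed-width OHLC bars. The function name `ticks_to_ohlc`, the `(timestamp_seconds, price)` tuple layout and the 60-second default are assumptions. It uses trade prices only, so a version that preserves spread information would need to carry bid and ask as well.

```python
def ticks_to_ohlc(ticks, bar_seconds=60):
    """Aggregate (timestamp_seconds, price) ticks, sorted by time, into OHLC bars.

    Returns a dict keyed by the bar's start timestamp.
    """
    bars = {}
    for ts, price in ticks:
        bar_start = ts - ts % bar_seconds  # floor the timestamp to the bar boundary
        bar = bars.get(bar_start)
        if bar is None:
            # First tick of the bar sets all four fields.
            bars[bar_start] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the latest tick seen so far is the close
    return bars


ticks = [(0, 10.0), (20, 10.5), (59, 9.8), (60, 9.9), (119, 10.2)]
bars = ticks_to_ohlc(ticks)
```

Changing `bar_seconds` produces bars at any lower frequency from the same stored ticks, which is the flexibility that tick-level storage buys.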
We should always be aware of handling corporate actions (such as stock splits and dividends), survivorship bias (stock de-listing) as well as tracking the timezone differences between various exchanges.
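To show why corporate actions matter, the sketch below back-adjusts a short close series for a stock split. The function name, the `(iso_date, close)` layout and the sample prices are hypothetical, and dividends are deliberately ignored. Without the adjustment, a 2-for-1 split looks like a 50% overnight loss to any strategy reading the raw series.

```python
def back_adjust_for_split(bars, split_date, ratio):
    """Divide closes before split_date by the split ratio.

    bars: list of (iso_date, close) tuples. ISO date strings sort
    chronologically, so plain string comparison is sufficient here.
    ratio: 2.0 means a 2-for-1 split effective on split_date.
    """
    return [(d, c / ratio if d < split_date else c) for d, c in bars]


raw = [("2020-01-01", 100.0), ("2020-01-02", 102.0), ("2020-01-03", 51.5)]
adjusted = back_adjust_for_split(raw, split_date="2020-01-03", ratio=2.0)
```

After adjustment the series reads 50.0, 51.0, 51.5, so the bar-to-bar returns reflect actual price moves rather than the change in share count.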
Individual/retail quants can compete here as many production-quality database technologies are mature, free and open source. Data itself is becoming cheaper and “democratised” via sites like Quandl.
There are still plenty of markets and strategies that are too small for the big funds to be interested in. This is a fertile ground for retail quant traders.
The trading strategy module in an Event-Driven system generally runs some kind of predictive or filtration mechanism on new market data.
It receives bar or tick data and then uses these mechanisms to produce a trading signal to long or short an asset. This module is NOT designed to produce a quantity; that is handled by the position-sizing module.
95% of quant blog discussion usually revolves around trading strategies. I personally believe it should be more like 20%. This is because I think it is far easier to increase expected returns by reducing costs through proper risk management and position sizing, rather than chasing strategies with “more alpha”.
The “heart” of an Event-Driven backtester is the Portfolio & Order Management system. It is the area which requires the most development time and quality assurance testing.
The goal of this system is to go from the current portfolio to the desired portfolio, while minimising risk and reducing transaction costs.
The module ties together the strategy, risk, position sizing and order execution capabilities of the system. It also handles the position calculations while backtesting to mimic a brokerage’s own calculations.
The primary advantage of using such a complex system is that it allows a variety of financial instruments to be handled under a single portfolio. This is necessary for institutional-style portfolios with hedging. Such complexity is very tricky to code in a For-Loop backtesting system.
Separating out the risk management into its own module can be extremely advantageous. The module can modify, add or veto orders that are sent from the portfolio.
In particular, the risk module can add hedges to maintain market neutrality. It can reduce order sizes due to sector exposure or ADV limits. It can completely veto a trade if the spread is too wide, or fees are too large relative to the trade size.
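A risk module of this kind can be sketched as a function that takes a proposed order and returns it unchanged, resized or vetoed. The `Order` fields, the function name and the threshold values (a 20 basis point spread limit and 1% of ADV) are illustrative assumptions, not recommended settings. Hedging and sector exposure checks are left out to keep the example short.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Order:
    symbol: str
    quantity: int  # signed: positive buys, negative sells
    bid: float
    ask: float
    adv: int  # average daily volume, in shares


def apply_risk_rules(order, max_spread_bps=20.0, max_adv_fraction=0.01) -> Optional[Order]:
    """Return the order, a resized copy of it, or None if it is vetoed."""
    mid = (order.bid + order.ask) / 2.0
    spread_bps = (order.ask - order.bid) / mid * 10_000
    if spread_bps > max_spread_bps:
        return None  # veto: the spread is too wide
    cap = int(order.adv * max_adv_fraction)
    if abs(order.quantity) > cap:
        # Resize to the ADV cap while keeping the order's direction.
        sign = 1 if order.quantity > 0 else -1
        return replace(order, quantity=sign * cap)
    return order


resized = apply_risk_rules(Order("ABC", 5000, 99.95, 100.05, 100_000))
vetoed = apply_risk_rules(Order("XYZ", 100, 99.0, 101.0, 100_000))
```

Because the function only sees orders, the rules can be tightened or extended without touching the strategy or portfolio code.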
A separate position sizing module can implement volatility estimation and position sizing rules such as Kelly leverage. In fact, utilising a modular approach allows extensive customisation here, without affecting any of the strategy or execution code.
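As one possible reading of "Kelly leverage", the sketch below uses the commonly quoted continuous-time approximation, in which the leverage is the mean excess return divided by the variance of returns. That choice of formula, the use of the population variance and the sample returns are assumptions for illustration. Practitioners often trade a fraction of this figure (for example "half Kelly") because the inputs are noisy estimates.

```python
from statistics import mean, pvariance


def kelly_leverage(excess_returns):
    """Kelly leverage estimate: mean excess return / variance of returns."""
    variance = pvariance(excess_returns)
    if variance == 0:
        raise ValueError("returns have zero variance; Kelly leverage is undefined")
    return mean(excess_returns) / variance


# Made-up per-period excess returns for a single strategy.
sample_returns = [0.02, -0.01, 0.03, 0.0]
full_kelly = kelly_leverage(sample_returns)
half_kelly = 0.5 * full_kelly
```

The very large figure produced by this tiny sample (a leverage of about 40) is a reminder of how sensitive the estimate is to the inputs, and why it belongs in its own module where it can be capped and tested independently.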
Such topics are not well-represented in the quant blogosphere. However, this is probably the biggest difference between how institutions and some retail traders think about their trading. Perhaps the simplest way to get better returns is to begin implementing risk management and position sizing in this manner.
In real life we are never guaranteed to get a market fill at the midpoint!
We must consider transactional issues such as capacity, spread, fees, slippage, market impact and other algorithmic execution concerns, otherwise our backtesting returns are likely to be vastly overstated.
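A backtest can approximate some of these costs by filling market orders at the far side of the spread, plus slippage and fees, rather than at the midpoint. The sketch below does this under simple assumptions: the function name, the fixed basis-point slippage and the per-share fee are all hypothetical, and capacity and market impact are not modelled.

```python
def simulate_fill(side, quantity, bid, ask, slippage_bps=0.0, fee_per_share=0.0):
    """Fill a market order at the far touch plus slippage, and report its cost.

    Returns the fill price, the fees, and the total cost relative to
    an idealised fill at the midpoint.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    mid = (bid + ask) / 2.0
    slip = slippage_bps / 10_000.0
    if side == "buy":
        price = ask * (1.0 + slip)  # buyers pay the ask, or worse
        shortfall = (price - mid) * quantity
    else:
        price = bid * (1.0 - slip)  # sellers receive the bid, or worse
        shortfall = (mid - price) * quantity
    fees = fee_per_share * quantity
    return {"price": price, "fees": fees, "cost_vs_mid": shortfall + fees}


fill = simulate_fill("buy", 100, bid=99.9, ask=100.1, fee_per_share=0.01)
```

In this example a 100-share purchase costs roughly 11 currency units more than a midpoint fill would suggest: about 10 from the half-spread and 1 from fees. Multiplied over thousands of trades, this is the gap that overstates backtested returns.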
The modular approach of an Event-Driven system allows us to easily switch-out the BacktestExecutionHandler with the LiveExecutionHandler and deploy to the remote server.
We can also easily add multiple brokerages utilising the OOP concept of “inheritance”. This of course assumes that said brokerages have a straightforward Application Programming Interface (API) and don’t force us to utilise a Graphical User Interface (GUI) to interact with their system.
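The switch between backtest and live execution can be sketched with an abstract base class. The class names `BacktestExecutionHandler` and `LiveExecutionHandler` come from the text above, while the `ExecutionHandler` base class, the `execute_order` method, the dictionary-based orders and the `broker_client.submit` call are assumptions. In particular, `submit` stands in for whatever a real brokerage API exposes and does not correspond to any specific library.

```python
from abc import ABC, abstractmethod


class ExecutionHandler(ABC):
    """Common interface that the rest of the system codes against."""

    @abstractmethod
    def execute_order(self, order):
        """Send an order for execution and return the resulting fill."""


class BacktestExecutionHandler(ExecutionHandler):
    """Simulated execution: every order is filled immediately."""

    def __init__(self):
        self.fills = []

    def execute_order(self, order):
        fill = dict(order, status="filled")
        self.fills.append(fill)
        return fill


class LiveExecutionHandler(ExecutionHandler):
    """Live execution via a brokerage client (hypothetical interface)."""

    def __init__(self, broker_client):
        self.broker_client = broker_client

    def execute_order(self, order):
        # 'submit' is a placeholder for the real brokerage API call.
        return self.broker_client.submit(order)


def send(handler: ExecutionHandler, order):
    # Calling code is identical for backtesting and live trading.
    return handler.execute_order(order)


result = send(BacktestExecutionHandler(), {"symbol": "ABC", "quantity": 10})
```

Supporting another brokerage then amounts to adding one more subclass, leaving the strategy and portfolio modules untouched.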
One issue to be aware of is that of “trust” with third party libraries. There are many such modules that make it easy to talk to brokerages, but it is necessary to perform your own testing. Make sure you are completely happy with these libraries before committing extensive capital, otherwise you could lose a lot of money simply due to bugs in these modules.
Retail quants can and should borrow the sophisticated reporting techniques utilised by institutional quants. Such tools include live “dashboards” of the portfolio and corresponding risks, a “backtest equity” vs “live equity” difference or “delta”, along with all the “usual” metrics such as costs per trade, the returns distribution, high water mark (HWM), maximum drawdown, average trade latency as well as alpha/beta against a benchmark.
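Two of these metrics, the high water mark and the maximum drawdown, can be computed directly from an equity curve. The sketch below is one straightforward way to do so. The function names and the sample equity values are illustrative, and the equity is assumed to stay positive.

```python
def high_water_marks(equity):
    """Running maximum of the equity curve."""
    marks, peak = [], float("-inf")
    for value in equity:
        peak = max(peak, value)
        marks.append(peak)
    return marks


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    worst = 0.0
    for value, peak in zip(equity, high_water_marks(equity)):
        worst = max(worst, (peak - value) / peak)
    return worst


equity = [100, 120, 90, 110, 130, 117]
hwm = high_water_marks(equity)
mdd = max_drawdown(equity)
```

Here the worst drawdown is the fall from 120 to 90, or 25%. Running the same functions on both the backtest and live equity curves gives a simple way to track the difference between the two.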
Consistent incremental improvements should be made to this infrastructure. This can really enhance returns over the long term, simply by eliminating bugs and improving issues such as trade latency. Don’t simply become fixated on improving the “world’s greatest strategy” (WGS).
The WGS will eventually erode due to “alpha decay”. Others will eventually discover the edge and will arbitrage away the returns. However, a robust trading infrastructure, a solid strategy research pipeline and continual learning are great ways of avoiding this fate.
Infrastructure optimisation may be more “boring” than strategy development but it becomes significantly less boring when your returns are improved!
Deployment to a remote server, along with extensive monitoring of this remote system, is absolutely crucial for institutional grade systems. Retail quants can and should utilise these ideas as well.
A robust system must be remotely deployed in “the cloud” or co-located near an exchange. Home broadband, power supplies and other factors mean that utilising a home desktop/laptop is too unreliable. Often things fail right at the worst time and lead to substantial losses.
The main issues when considering a remote deployment include: monitoring of hardware (such as CPU, RAM/swap, disk and network I/O); high-availability and redundancy of systems; a well thought through backup AND restoration plan; extensive logging of all aspects of the system; and continuous integration, unit testing and version control.
Remember Murphy’s Law - “If it can fail it will fail.”
There are many vendors on offer that provide relatively straightforward cloud deployments, including Amazon Web Services, Microsoft Azure, Google and Rackspace. For software engineering tasks vendors include Github, Bitbucket, Travis, Loggly and Splunk, as well as many others.
Unfortunately there is no “quick fix” in quant trading. It involves a lot of hard work and learning in order to be successful.
Perhaps a major stumbling block for beginners (and some intermediate quants!) is that they concentrate too much on the best “strategy”. Such strategies always eventually succumb to alpha decay and thus become unprofitable. Hence it is necessary to be continually researching new strategies to add to a portfolio. In essence, the “strategy pipeline” should always be full.
It is also worth investing a lot of time in your trading infrastructure. Spend time on issues such as deployment and monitoring. Always try to reduce transaction costs, as profitability is as much about reducing costs as it is about gaining trading revenue.
I recommend writing your own backtesting system simply to learn. You can either use it and continually improve it or you can find a vendor and then ask them all of the questions that you have discovered when you built your own. It will certainly make you aware of the limitations of commercially available systems.
Finally, always be reading, learning and improving. There is a wealth of textbooks, trade journals, academic journals, quant blogs, forums and magazines which discuss all aspects of trading. For more advanced strategy ideas I recommend SSRN and arXiv - Quantitative Finance.