How Reliable Are Predictions from Backtest Results?

October 3, 2005

 

by Jack Parker, PhD

One of the greatest appeals and advantages of mechanical trading systems is the ability to evaluate their historical performance by "backtesting" the strategies on historical price data. While we may have just a handful of months of actual performance data available, computers and backadjusted data make it possible to see what a system "would have done" going back years and years.

The problem, of course, is that the system has been designed on this very same data. Because systems are designed on past data, they are often the victims, intentionally or not, of what we call "curve fitting". That makes the ability to backtest one of the biggest disadvantages of trading systems as well.

The easiest way to understand "curve fitting" is through a simple example. Imagine a system that buys or sells Soybean futures on a breakout above or below the high or low for the past X number of days. When testing the system on the past data, the testing may show $5,000 in profits when using a 10 day high/low, $10,000 in profits when using a 20 day high/low, and $20,000 when using a 30 day high/low.

If you were the developer, which value would you use in designing the system, 10, 20, or 30? I would guess most people would use the 30 value, as it gives the highest profit. Now a developer will look at more than just profit, and test for lowest drawdown or most winning months, for example; but whatever your goal for the system, it is human nature to design a system whose parameters produce results as close as possible to those desired. The problem is, just because one parameter worked on the past data does not mean it will work on the future, unknown data.
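To make the example concrete, here is a minimal Python sketch of the kind of parameter sweep described above. Everything in it is an assumption for illustration: the prices are a synthetic random walk rather than Soybean futures, the names `breakout_pnl` and `random_walk` are hypothetical, and commissions and slippage are ignored. It picks the lookback with the best result on the first half of the data, then reports how that same lookback fares on the second half, which the "developer" never saw.

```python
import random


def breakout_pnl(prices, lookback):
    """Total P&L in price points for a simple channel breakout.

    Goes long on a close above the prior `lookback`-day high and short on a
    close below the prior `lookback`-day low. No costs or slippage.
    """
    position = 0  # +1 long, -1 short, 0 flat
    pnl = 0.0
    for i in range(lookback, len(prices)):
        # Credit yesterday's position with today's price change first,
        # so the signal never trades on information it did not yet have.
        pnl += position * (prices[i] - prices[i - 1])
        window = prices[i - lookback:i]
        if prices[i] > max(window):
            position = 1
        elif prices[i] < min(window):
            position = -1
    return pnl


def random_walk(n, seed):
    """Synthetic price series: a driftless Gaussian random walk."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices


prices = random_walk(2000, seed=1)
in_sample, out_of_sample = prices[:1000], prices[1000:]
lookbacks = (10, 20, 30)

# "Design" the system: keep whichever lookback looked best on past data.
best = max(lookbacks, key=lambda x: breakout_pnl(in_sample, x))
print("chosen lookback:", best)
print("in-sample P&L:", round(breakout_pnl(in_sample, best), 2))
print("out-of-sample P&L:", round(breakout_pnl(out_of_sample, best), 2))
```

Because the random walk has no trend to exploit by construction, any in-sample edge the winning lookback shows is luck, which is the essence of curve fitting.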

What's your Confidence Level?

A sometimes cruel seduction of mechanical systems is the lure of beautifully smooth backtested equity curves that entice naïve traders to risk (and often lose) too much of their hard-earned cash. But are these beautiful, straight-line equity curves, rising at a 45-degree angle, too good to be true? Before putting money in harm's way, it is prudent to ask (and preferably answer) the question: "What confidence can I place in the backtest results?"

My friends at Attain Capital will tell you that factors such as the amount of actual trading data available, the developer's background, and the amount of slippage used in the testing go a long way toward setting a confidence level for backtested results. For a scientist, however, this is merely a question of statistical inference that may be addressed by various statistical methods.

To investigate further, let us consider a hypothetical portfolio consisting of a swing trading system and a day trading system for time-frame diversification. We used Mesa Bonds and Notes and R-Mesa SP as an example.

Assuming our objective is to achieve the maximum return possible while holding max drawdown below 15 percent, we can adjust how much equity we should risk per trade to come up with the desired results. We ran several backtests to find the optimum risk per trade allocations to each system, risking a little more than 2% per trade on the day trading system and a little more than 1% on the swing trade system.

Using these allocations, we ran an 8-year backtest with position sizing based on a fixed risk-to-equity ratio and obtained a mean annual return of 35 percent, an annual standard deviation of 19 percent, and a max drawdown over the test period of 15 percent.
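For readers unfamiliar with sizing positions by a fixed risk-to-equity ratio, the idea can be sketched as below. This is a generic illustration under stated assumptions, not the sizing code behind the backtest: the function name `contracts_for_risk`, the account size, and the dollar risk per contract are all hypothetical.

```python
import math


def contracts_for_risk(equity, risk_fraction, risk_per_contract):
    """Number of whole contracts such that the dollars at risk on one trade
    do not exceed `risk_fraction` of current account equity."""
    if risk_per_contract <= 0:
        raise ValueError("risk_per_contract must be positive")
    return math.floor(equity * risk_fraction / risk_per_contract)


# Hypothetical account: $100,000 equity, $900 at risk per contract.
print(contracts_for_risk(100_000, 0.02, 900))  # roughly 2% risk per trade
print(contracts_for_risk(100_000, 0.01, 900))  # roughly 1% risk per trade
```

Because the contract count is recomputed from current equity, position size grows after gains and shrinks after losses, which is what ties the risk per trade to the drawdown target.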

Figure 1: Hypothetical Backtest Equity Curve & Performance Stats

This looks very good, but what is the likelihood of achieving this performance going forward? How much confidence can an investor have in this backtesting? This is a very difficult question to answer, and a good way to start is to flip the question around and start with a confidence limit, then see what performance numbers can be achieved at that limit.

Looking at our task that way, we can slightly recast the question by asking what is the "worst case" performance that may be expected in the future with a confidence of 99 percent (that is, with a probability of failure smaller than 1 percent).

To determine the answer to this question, it is important to remember that the backtest performance statistics are merely sample estimates of the true population values that we have inferred from a limited time series (8 years). If we had 100 years of data or more, we would have a lot more confidence in the numbers, but unfortunately we only have 8 years, and therefore must estimate future returns based on that limited time series.

Estimating values involves some uncertainty by definition, so we must account for this uncertainty in our backtested results.

Uncertainty in estimated mean returns may be estimated by

where S is the standard deviation of returns, Sm is the standard deviation of the mean return, N is the number of months in the backtest, and df is the degrees of freedom of the data set, equal to 1 minus the number of adjustable parameters in the trading system(s) including any nonrandom markets selected or deselected for trading.

After calculating the uncertainty adjusted standard deviation of returns, we can now compute the average annual return we can expect at different confidence levels.

The average return at a certain probability level (MLCL) may now be computed as

where Mest is the estimated mean return and t is the t-statistic at a given probability level. The formulas for calculating t-statistics are quite involved, so suffice it to say here that for a probability of 99 percent, the t-statistic is approximately 3.
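As a rough illustration of the calculation, the Python sketch below computes a lower confidence limit on the mean monthly return. It is a textbook-style variant built on assumptions that should be read as such: the degrees of freedom are taken as the number of months minus the number of adjustable parameters, the standard deviation of the mean as S divided by the square root of df, and the limit as Mest minus t times Sm. The function name, the `n_params` argument, and the sample returns are hypothetical; only the default t-statistic of 3 comes from the text.

```python
import math
import statistics


def mean_lower_confidence_limit(monthly_returns, n_params=0, t_stat=3.0):
    """Return (m_est, s_m, lcl) for a series of monthly returns.

    Assumed relationships (illustrative, not the article's own formulas):
        df  = N - n_params
        S_m = S / sqrt(df)
        LCL = M_est - t * S_m
    """
    n = len(monthly_returns)
    df = n - n_params
    if n < 2 or df <= 0:
        raise ValueError("need at least 2 returns and more returns than parameters")
    m_est = statistics.fmean(monthly_returns)
    s = statistics.stdev(monthly_returns)  # sample standard deviation
    s_m = s / math.sqrt(df)
    return m_est, s_m, m_est - t_stat * s_m


# Hypothetical monthly returns as fractions; not real system data.
sample = [0.04, -0.02, 0.06, 0.01, -0.03, 0.05,
          0.02, -0.01, 0.07, 0.00, 0.03, -0.04]
m_est, s_m, lcl = mean_lower_confidence_limit(sample, n_params=4)
print(f"mean {m_est:.4f}, std of mean {s_m:.4f}, lower limit {lcl:.4f}")
```

Under these assumptions, a shorter backtest or a system with more adjustable parameters widens the gap between the backtested mean and the lower limit.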

We applied this analysis to the monthly returns for each of the trading systems in our example portfolio and summed them to determine the following portfolio performance. The mean annual return that may be expected in the future with a confidence of 99 percent (that is, with a probability of failure smaller than 1 percent) given the backtest uncertainty, is only 10 percent (compared to 35 percent in the backtest). The annualized standard deviation stayed about the same, but the expected future Max DD jumped from 15 percent up to 25 percent. Monte Carlo analyses based on the LCL statistics would yield larger max drawdown values still.
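A Monte Carlo check of max drawdown can be sketched along the following lines. This is one simple variant (resampling monthly returns with replacement), not necessarily the procedure used for this article. The function names, the `shift` argument that lowers every monthly return by a constant to mimic a more pessimistic mean, and the sample returns are all hypothetical.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough decline, as a fraction of the peak, for an
    equity curve compounded from a sequence of periodic returns."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def drawdown_percentile(monthly_returns, shift=0.0, n_paths=2000,
                        pct=0.99, seed=0):
    """Max drawdown at percentile `pct` across resampled return paths."""
    rng = random.Random(seed)
    n = len(monthly_returns)
    draws = []
    for _ in range(n_paths):
        path = [r - shift for r in rng.choices(monthly_returns, k=n)]
        draws.append(max_drawdown(path))
    draws.sort()
    return draws[min(n_paths - 1, int(pct * n_paths))]


# Hypothetical monthly returns as fractions; not real system data.
sample = [0.04, -0.02, 0.06, 0.01, -0.03, 0.05,
          0.02, -0.01, 0.07, 0.00, 0.03, -0.04]
print("99th percentile max DD:", round(drawdown_percentile(sample), 3))
print("with lower mean:", round(drawdown_percentile(sample, shift=0.01), 3))
```

Lowering the mean return while keeping the same month-to-month variation can only deepen the simulated drawdowns, which is why drawdown estimates built on lower-confidence-limit statistics come out larger than the backtest figure.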

Figure 2: 99% Confidence Limit Equity Curve & Performance Stats

The very nice looking equity curve in figure one degenerates to the much less attractive equity curve in figure two. But this second equity curve and statistics are what to expect in the future at a 99% confidence limit (just a 1% chance of failure). That means most, if not all, cases will end up BETTER than these results - thus the 99% confidence in them.

Our initial question was how much confidence we can have in the normal backtested results put in front of investors. In this example, the results indicate there is a 50% probability that the long-term mean annual return will be greater than 35% and the max DD will be less than 15% (on an 8-year basis), and that there is a 99% probability that the mean return will be greater than 10% and the max DD will be less than 25%. While a 50% chance that returns will be lower and the drawdown greater than in the backtest may sound discouraging, the expected positive return and moderate DD at the 99% probability level should provide solace.

In conclusion, it is important to dig beneath the surface of computed statistics to assess the effects of uncertainty in estimates for actual trading performance prior to putting money on the line. If you can live with performance estimates at a given probability of failure and allocate capital accordingly, you'll have a very good chance of avoiding surprises and sticking with the system during normal ups and downs.

- Jack Parker

IMPORTANT RISK DISCLOSURE
Futures based investments are often complex and can carry the risk of substantial losses. They are intended for sophisticated investors and are not suitable for everyone. The ability to withstand losses and to adhere to a particular trading program in spite of trading losses are material points which can adversely affect investor returns.


Chart of the Week : Q3 '05 Performance Summary


Another enormous hurricane coupled with another 25 basis point rate hike and rising energy prices left the US Stock market stagnant in September. Wall Street analysts are on both sides of the fence when it comes to predicting where the SP 500 is headed in the fourth quarter. The Bears point to rising energy prices and inflation as signs the economy is ready to stumble, while Bulls look at impressive earnings and economic reports as signs the economy is shrugging off all the negatives and heading higher.

The constant push and pull left the market in a tight trading range that saw the SP futures gain only +0.05% in September. The technology stocks weren’t much better with NASDAQ futures gaining only +1.07%. Even the small caps were hard pressed for activity with Russell 2000 futures gaining just +0.02% and Midcap 400 futures gaining +0.05%. In comparison the Japanese Stock Market has had no trouble rallying as investors saw the Nikkei 225 futures climb +9.41% last month.

While the current stock index and bond market conditions have made life difficult for day and swing trading systems, the recent hurricanes look to be providing long overdue relief to long term system traders in the form of market trends. A quick scan of September's results shows 5 out of the top 7 performing systems were trend following systems.

Nearly every commodity market worldwide has been affected either directly or indirectly by Hurricane Katrina and Hurricane Rita. The most obvious markets that come to mind are the energies, but it has been other markets like Copper, Sugar, and Cotton that have entered very bullish trends. Sugar +6.75%, copper +6.66%, and cotton +6.35% all traded higher last month due in no small part to the hurricanes.

The aforementioned energy markets were busy in September, and while you might not see a change at the pump, Unleaded gas and crude prices did manage to move lower during the month. Crude Oil prices fell -4.84% as Hurricane Rita did not cause as much damage to Gulf Coast oil rigs and refineries as originally predicted. Unleaded Gas futures followed suit and fell -5.54%, but the heating oil and natural gas markets are beginning to look downright scary to consumers. The winter of 2005/2006 could be the most EXPENSIVE winter ever in the US, as Natural Gas futures continue to rise to historically high levels. Last month alone saw market prices climb +15.00%.

Meanwhile, the Federal Reserve is not sitting back and taking the higher energy prices lightly. Convinced that the recent run-up is another sign of inflation, the Fed raised the overnight interest rate another 25 basis points this past month, and all signs point to another raise in October. This has been good news for the US Dollar, with the US Dollar Index gaining +3.07% in September, while Eurocurrency (-3.07%), the Swiss Franc (-3.41%), the Japanese Yen (-2.90%), and British Pound (-2.15%) all moved lower against the Dollar. The Canadian Dollar (+2.04%) was the only market to gain ground against the US Dollar, and not surprisingly crude oil prices have played a major role in the rally as analysts expect Canada’s oil output to increase significantly in the near future.

**Day Trading**

September is historically a down month for stocks, but unfortunately it wasn’t as easy as just selling the market short every day. Stock index futures continued to trade in a consolidated range for the month, and the end result for most day trading systems was negative returns in the last month of the third quarter.

Ironically, two systems that were profitable for the month are like night and day when it comes to their logic and trading characteristics. Helix ES had a breakout month, making +$1,272.50 per contract while trading an astonishing 38 times. On the opposite end of the spectrum (pun intended) were Spectrum SP and Spectrum eRL, which made +$950 and +$540 respectively on just a single trade, bringing their total trades for the year up to five apiece.

The lack of volatility also kept some other systems out of the market more than usual. Daybreaker SP, which usually trades about two to three times a week, traded just four times for the month for a small gain of +$238.75. Impetus eRL took a similar approach and traded just three times for the month for net profits of +$10 per contract after commissions.

The rest of the results were unfortunately not as impressive. Despite an otherwise strong year, RC Success ES struggled in the third quarter and lost -$102.50 in September to close it out. RC Success still has a spot near the top for YTD returns and will look to return to its winning ways in the last quarter of the year. Nautilus ES, a system by Mariner Futures, traded three times for a small loss of -$202.50 per emini contract. The system can be traded in the SP or ES, but signals are generated from the full-size chart. R-Mesa SP has been mired in a drawdown and lost -$833.25 for the month, but the system came roaring back from its drawdown lows with profits of $3,700 in the last week of the month.

Elsewhere, BWT Zones eRL took second place for most trades in a month behind Helix ES, but lost -$1,010 when the dust settled. RC Miracles ES lost -$1,445 for the month, with most of the losses coming from reversal trades that went against the system, while Clipper eRL struggled as well, losing -$2,494.20 for the month. R-Mesa eRL could not repeat its performance from August and gave back those gains and then some. The system lost -$2,789.10 for the month but will look to bounce back in October.

And finally, Compass SP had a few near misses, and ended up losing -$3,698 by month-end, while BWT Zones SP traded about once a day for the month and lost -$5,387.50.

**Swing Trading**

Swing trading has been a very difficult method of trading for most of the year, and September was no exception for a majority of systems.

Fortunately, not all systems saw red in September. Investors who stuck by Eclipse eRL after a tough August were rewarded as the system went on to earn +$1,163.3 in September and is currently holding long with open trade profits of +$730. Investors also experienced yet another positive month of trading in Axiom eMD, which was earning +$1,400 in open and closed trade profits as of the month's end. Axiom eMD is only $675 off of its equity high and is earning +15.2% for the year based on $15,000 invested.

Contrary to the positive performance above, other index systems were not able to capitalize on the narrow market ranges. Axiom Index ended the month down overall in the remaining 3 markets with the ES losing -$2,090, the eRL down -$475.40, and the NQ only down -$72.42.

As of the end of the month, Axiom ES had exceeded our statistically calculated stop trade point, or “Line in the Sand”, and we are now recommending that all customers consider putting the individual market on hold. As a complement to our analysis, the developer has also recommended suspending the market from all portfolios indefinitely.

In other index trading, the Tzar portfolio had a difficult month, losing -$3,810 in the eMD, -$2,388 in the eRL, -$2,040 in the ES, and -$1,968 in the NQ. Historically the last 3 months of the year have been impressive months for Tzar…we’ll be watching. Wrapping up the index trading, Apollo ES lost -$1,555 and Athena eRL lost -$730 in open and closed trade losses.

Bond market results were mixed as the Jaws Narrowneck portfolio earned $381.25. Mesa Bonds and Notes, on the other hand, continued to hold long the entire month; however, bonds sold off significantly, causing open trade losses of -$3,609.38 and -$1,671.88 respectively.

Finally, energies continued to be in the limelight with a strong sell-off and then subsequent rally. Axiom CL 135 is currently long Crude Oil, earning +$1,270 in open trade profits; however, Axiom CL 90 was caught on the wrong side of the market, losing -$6,120 on the month.

**Long Term**

As we mentioned in the intro, market activity in commodities like copper, sugar, cotton, and coffee has picked up significantly as a result of the dual hurricanes. Systems with positions in these markets include SEMA4 Symmetry, which is long in crude oil for gains of +$1800.00 per contract, in Gold for gains of +$1220.00 per contract, and in sugar for gains of +$689.20 per contract. Not surprisingly, SEMA4 Symmetry was one of the top performing systems in September.

Andromeda was also active in these markets with gains of +$3032.50 per contract in High Grade Copper and +$1910.00 per contract in Gold. Meanwhile, Axiom LT is long in London Sugar, making good money, and long in Cotton for a loss of -$750.00 per contract.

Where’s the beef? This old TV commercial brings a smile to many who remember it, and it's been a mantra in the office here as the meat markets were on a roll in September. Feeder Cattle futures gained +5.10%, Live Cattle futures were up +6.00%, and lean hog futures climbed +3.72% during September. Prices might be higher at the supermarket, but trend following system traders probably won’t mind too much as the trends have been good for their trading accounts. Systems with trades on include Axiom LT, which is long in live cattle for gains of +$790.00 per contract, and Aberration Plus, which is losing -$210.00 per contract after entering long last week.

One glance at the performance table below will tell you that it wasn’t all good news for trend followers in September. Axiom LT struggled mightily in markets like the Dollar Index, US Bonds, and Crude Oil, while Aberration Plus gave back profits in the Eurobund and Palladium. Andromeda wasn’t immune to bad trades either, with losses in Soybean Oil, Palladium, and the Eurobund.

Please Login to: http://www.attainaccess.com for the latest updated statistics.

