Out-of-Sample Testing Makes or Breaks Trading Strategies


I've seen thousands of good-looking trading strategies turn into a big, steaming pile of horse manure when traded in the real world.

How can a strategy that seems great on paper turn into a loser almost immediately?

Without a doubt, it's because the designer didn't understand the basics of statistical analysis and the out-of-sample testing it requires.

Let's discuss the importance of out-of-sample testing and then a formula taken from electrical engineering that measures a trading system's robustness.


A Curve Fit Trading Strategy Will Ruin You

Has this ever happened to you?

You buy or build a trading strategy that looks fantastic in its back-test, thinking you've just unlocked your financial freedom...

And then it falls apart once you start trading it with real money?

Figure: Over-curve-fit trading strategy that performs poorly in real time

I'd bet most, if not all, readers of this blog would say YES!

It happens to us all when we first discover the advantages of testing our trading rules. So don't take it too hard, but don't take it too lightly either.

What you see above is a swing trading strategy that has been over-optimized or "curve-fit" to all the available data.

Buyers and builders beware!

So how do we keep this from happening in the first place?

What Are the Basics of a Good Trading Strategy?

  • The trading strategy makes lots of money
  • The strategy has small drawdowns when it does lose money
  • The strategy has lots of trade samples
  • Out-of-sample results look like in-sample results
  • Commission and slippage are factored in (especially important for day-trading strategies)

Trading strategy design is an exercise in minimums.

You have to set certain minimum thresholds, and if a strategy misses them, you toss it out completely!

The following are the minimum quantifiable limits I use when deciding if a strategy is robust enough to be worth trading.

Remember, we want our trading strategies to be robust so they can survive the hardest trading environment on the planet, i.e., real-time!

Minimum Requirements for a Robust Strategy:

  • Minimum of 100 trades (prefer multiple hundreds)
  • At least 10 years of data (use all the data you can find), no cherry-picking!
  • A statistical significance factor ( Profit Factor * sqrt ( Number of Trades ) ) >= 30
  • 20% or more Out-of-Sample data used
  • Out-of-Sample Profit Factor divided by In-Sample Profit Factor > 0.70 (that is, above 70%)
  • Net Profit divided by Max Drawdown > 10
  • Overall Profit Factor >= 2

Throw any strategy away if it fails to meet even one of these requirements.
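As a rough illustration, the checklist above can be written as a small Python filter. This is only a sketch: the `BacktestStats` fields and the `failed_requirements` helper are names I made up for the example, and it reads the out-of-sample threshold as a ratio of 0.70 (70%).

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BacktestStats:
    """Summary numbers from a back-test (hypothetical field names)."""
    num_trades: int
    years_of_data: float
    profit_factor: float       # overall profit factor
    oos_fraction: float        # share of data held out, e.g. 0.20
    oos_profit_factor: float   # out-of-sample profit factor
    is_profit_factor: float    # in-sample profit factor
    net_profit: float
    max_drawdown: float        # positive number, same currency as net_profit


def failed_requirements(s: BacktestStats) -> list[str]:
    """Return the names of the minimum requirements the strategy fails."""
    checks = {
        "at least 100 trades": s.num_trades >= 100,
        "at least 10 years of data": s.years_of_data >= 10,
        "significance >= 30": s.profit_factor * sqrt(s.num_trades) >= 30,
        "20% or more out-of-sample data": s.oos_fraction >= 0.20,
        # Assumption: the threshold is a ratio of 0.70, i.e. 70%.
        "OOS PF / IS PF > 0.70": s.oos_profit_factor / s.is_profit_factor > 0.70,
        "net profit / max drawdown > 10": s.net_profit / s.max_drawdown > 10,
        "overall profit factor >= 2": s.profit_factor >= 2,
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up numbers, purely for illustration.
candidate = BacktestStats(
    num_trades=50, years_of_data=12, profit_factor=2.5, oos_fraction=0.25,
    oos_profit_factor=2.4, is_profit_factor=2.6,
    net_profit=120_000, max_drawdown=10_000,
)
print(failed_requirements(candidate))  # an empty list means "keep evaluating"
```

A strategy is only worth scoring when the returned list is empty; a single failed check is enough to throw it away.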

But if a strategy does pass all these criteria, it's time to score it with a formula borrowed from signal processing that combines the inputs above into a single number.

The higher the score, the better.

How To Evaluate Any Trading Strategy

Score = Profit Factor * sqrt ( Number of Trades ) * ( Net Profit / Max Drawdown ) * ( Out-of-Sample Profit Factor / In-Sample Profit Factor )

The first two terms in the formula measure the statistical significance of the strategy:

Profit Factor * sqrt ( Number of Trades )

The profit factor (gross profit divided by gross loss) is basically the signal-to-noise ratio, similar to the ratio used in communications engineering.

If you have a lot of signal, but a small number of samples, you might have some significance.

You can also have significance with a low amount of signal and a large number of samples.

If the output of these terms is above 30 then I consider the strategy to have sufficient statistical significance to trade with real money.

If this value is less than 30, you are probably seeing random noise and are being fooled into thinking something is really there.
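To make the signal-versus-samples trade-off concrete, here is a small Python sketch. The function names and the example numbers are mine, chosen only for illustration; `trades_needed` simply rearranges the significance term to show how many trades a given profit factor needs to reach 30.

```python
from math import ceil, sqrt


def significance(profit_factor: float, num_trades: int) -> float:
    """Profit Factor * sqrt(Number of Trades)."""
    return profit_factor * sqrt(num_trades)


def trades_needed(profit_factor: float, threshold: float = 30.0) -> int:
    """Smallest trade count at which the significance reaches the threshold."""
    return ceil((threshold / profit_factor) ** 2)


print(significance(4.0, 64))    # lots of signal, few samples   -> 32.0
print(significance(1.5, 400))   # little signal, many samples   -> 30.0
print(significance(2.0, 100))   # modest signal, too few trades -> 20.0
print(trades_needed(2.0))       # trades needed at a profit factor of 2 -> 225
```

Note how a strategy sitting right at the minimum profit factor of 2 needs well over 100 trades before the significance term clears 30.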

We Don't Want Large Drawdowns

We don't want to lose $80,000 before going on to make $125,000, right?

While it might sound ok in the end, sitting through that sort of drawdown would most likely make you stop trading the strategy altogether...and it would probably give you a heart attack.

We don't want that.  

We don't want to lose lots of money, even if that means we are going to make it all back and then way more.

Thus, we want to divide our net profit by the worst drawdown:

( Net Profit / Max Drawdown )

Expressing this as a ratio takes raw profit out of the picture, so strategies of different sizes can be compared. The higher this ratio, the better.
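If your back-testing software doesn't report the max drawdown, one common way to get it is to walk the equity curve and track the largest drop from a running peak. The sketch below assumes the equity curve is a simple list of cumulative profit values; the helper names are hypothetical, and the sample curve just replays the $80,000 / $125,000 example from above.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough drop in an equity curve, as a positive number."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)           # highest equity seen so far
        worst = max(worst, peak - value)  # deepest drop below that peak
    return worst


def profit_to_drawdown(equity: list[float]) -> float:
    """Net Profit / Max Drawdown for an equity curve."""
    net_profit = equity[-1] - equity[0]
    drawdown = max_drawdown(equity)
    return float("inf") if drawdown == 0 else net_profit / drawdown


# Lose $80,000 first, then finish $125,000 ahead.
curve = [0.0, -80_000.0, 125_000.0]
print(max_drawdown(curve))        # 80000.0
print(profit_to_drawdown(curve))  # 1.5625, far below the minimum of 10
```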

Matching Out-of-Sample and In-Sample

Next comes the ratio of the Out-of-Sample and In-Sample Profit Factors.

We want a trading strategy whose profit factor on data it has not "seen" is similar to (if not better than) its profit factor on the data it has "seen".

( Out Of Sample Profit Factor / In Sample Profit Factor )

It's a beautiful thing when this ratio is greater than one because then the Out-of-Sample Profit Factor is greater than the In-Sample profit factor.
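Here is a rough Python sketch of that ratio, computed from a chronological list of per-trade profits and losses. It makes a simplifying assumption that is mine, not part of the method above: the split is done by trade count (the last 20% of trades are treated as out-of-sample) rather than by calendar date. The function names and trade values are made up for the example.

```python
def profit_factor(trade_pnls: list[float]) -> float:
    """Gross profit divided by gross loss."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def oos_is_ratio(trade_pnls: list[float], oos_fraction: float = 0.20) -> float:
    """Out-of-Sample Profit Factor / In-Sample Profit Factor.

    Assumes trades are in chronological order and that the final
    oos_fraction of them were never used to build the strategy.
    """
    n_oos = round(len(trade_pnls) * oos_fraction)
    split = len(trade_pnls) - n_oos
    in_sample, out_of_sample = trade_pnls[:split], trade_pnls[split:]
    return profit_factor(out_of_sample) / profit_factor(in_sample)


# Ten made-up trades: the first eight are in-sample, the last two out-of-sample.
pnls = [100.0, -50.0, 100.0, -50.0, 100.0, -50.0, 100.0, -50.0, 120.0, -50.0]
print(oos_is_ratio(pnls))  # in-sample PF 2.0, out-of-sample PF 2.4
```

The important part is that the out-of-sample trades are kept strictly separate while the strategy is being designed; the arithmetic itself is trivial.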

Let's look at a few examples of this equation at work.

First, we will look at a swing trading strategy I wrote years ago for the S&P 500, traded through the ETF SPY.

SPY Swing Trading Strategy With Out-of-Sample Testing:

Figure: This SPY swing trading strategy shows the power of out-of-sample testing

All the data before the purple line is the data I used to create this strategy.

Between the purple and blue lines is the Out-of-Sample data, which the strategy had not seen when I built it. As you can see, the green equity curve keeps going up.

Then the masterstroke: everything after the blue line is real-time trading.

I can't stress this enough: all three sections look identical to each other; this is exactly what you want to see in your trading.

Now, let's plug and chug the values into our ranking equation above:

Profit Factor * ( ( Number Of Trades ) ^ ( 0.5 ) ) * ( Net Profit / Max Drawdown ) * ( Out of Sample Profit Factor / In Sample Profit Factor ) =

2.19 * ( ( 550 ) ^ (0.5) ) * ( 538,000 / 30,000 ) * ( 2.30 / 2.20 ) ≈ 963

Next, let's look at this Gold trend following strategy:

Gold Trend Following Strategy

Figure: This gold trend following trading strategy shows the power of out-of-sample testing

You can immediately see a huge difference between trading the S&P 500 and gold.

This is due to the fundamental way each market works internally.

The S&P 500 is a mean-reverting market and Gold is a trending market.

You must use the correct trading method with the right market; a mean-reverting trading strategy does not work in a trending market.

You'll also note that the green equity curve for trading gold looks choppy, not as nice and clean as the S&P 500 strategy.

This again is due to the trending nature of gold; you have to put up with a lot of little losses while waiting to catch the massive trends higher.

Let's run the numbers and see how these two strategies stack up to each other.

Profit Factor * ( ( Number Of Trades ) ^ ( 0.5 ) ) * ( Net Profit / Max Drawdown ) * ( Out Of Sample Profit Factor / In Sample Profit Factor ) =

2.85 * ( (125) ^ (0.5) ) * ( 681,000 / 35,000 ) * ( 2.8 / 3 ) ≈ 579
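Here is a minimal Python sketch of the scoring equation, fed with the rounded inputs quoted for the two strategies above. The function and parameter names are my own illustrative choices, and because the inputs are rounded, the result can land a few points away from a score worked out from the full back-test report.

```python
from math import sqrt


def strategy_score(
    profit_factor: float,
    num_trades: int,
    net_profit: float,
    max_drawdown: float,
    oos_profit_factor: float,
    is_profit_factor: float,
) -> float:
    """Profit Factor * sqrt(Trades) * (Net Profit / Max DD) * (OOS PF / IS PF)."""
    significance = profit_factor * sqrt(num_trades)
    profit_to_drawdown = net_profit / max_drawdown
    oos_to_is = oos_profit_factor / is_profit_factor
    return significance * profit_to_drawdown * oos_to_is


# Inputs as quoted in the two examples above.
spy = strategy_score(2.19, 550, 538_000, 30_000, 2.30, 2.20)
gold = strategy_score(2.85, 125, 681_000, 35_000, 2.8, 3.0)

print(f"SPY swing strategy:  {spy:.0f}")
print(f"Gold trend strategy: {gold:.0f}")
```

Keeping the three factors as separate variables makes it easy to see which one is dragging a strategy's score down.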

Different Out-of-Sample Scores Are OK

There is a pretty large scoring difference between these two algorithmic trading strategies.

Nonetheless, any strategy that scores above 400 is worth trading.

But why trade gold in the first place?

Trading different, non-correlated asset classes (Stocks, Gold, Oil, etc.) smooths out your portfolio growth over time.

When one strategy is zigging, the other is zagging.
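To see the zig-zag effect in numbers, here is a toy Python sketch. The monthly returns are invented purely for illustration (they are not from the strategies above), and the `correlation` helper is a plain Pearson correlation written out by hand.

```python
from statistics import mean, pstdev


def correlation(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two equally long return series."""
    mean_a, mean_b = mean(a), mean(b)
    covariance = mean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return covariance / (pstdev(a) * pstdev(b))


# Made-up monthly returns in percent for two strategies.
stocks = [3.0, -1.0, 2.0, -2.0, 4.0, -1.0]
gold = [-1.0, 2.0, -1.0, 3.0, -2.0, 2.0]

# Portfolio that splits capital evenly between the two strategies.
combined = [(s + g) / 2 for s, g in zip(stocks, gold)]

print(f"correlation:         {correlation(stocks, gold):.2f}")
print(f"stocks volatility:   {pstdev(stocks):.2f}")
print(f"gold volatility:     {pstdev(gold):.2f}")
print(f"combined volatility: {pstdev(combined):.2f}")
```

With these invented numbers the two series are strongly negatively correlated, so the combined portfolio swings far less than either strategy alone. Real strategies will rarely offset each other this neatly.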

The only real Holy Grail in trading is the use of multiple strategies trading different asset classes.

And now we have a scientific way to measure strategies against each other.

Use the above equation on your own strategies and see how they stack up against each other.

You can use the equation on any trading strategy over any time period and on any time frame (like day-trading strategies).

Why Out-of-Sample Testing Is Needed

  • Good trading strategies make lots of money in the real world
  • They have large net-profit-to-drawdown ratios
  • They have lots of trades!
  • Their Out-of-Sample results look like their In-Sample results
  • They have commission and slippage factored in

About the Author

Hello! I'm Kurt, the "Relaxed Trader" writing the stuff on this website. Shoot me an email at kurt@relaxedtrader.com or leave a comment below. Cheers!
