I've seen hundreds of good-looking trading strategies turn into a big, steaming pile of horse manure when traded in the real world, all because they weren't out-of-sample tested.
How can a strategy that seems great on paper turn into a loser almost immediately?
Without a doubt, it's because the designer didn't understand the basics of statistical analysis and the out-of-sample testing it requires.
But you can determine if a strategy is junk or pure gold using a formula taken from real engineering.
Let me show it to you so you don't waste any more time and money.
A Curve Fit Trading Strategy Will Ruin You
Has this ever happened to you?
You buy or build a trading strategy that looks fantastic in its back-test, thinking you've just unlocked your financial freedom...
And then it falls apart once you start trading it with real money?
I'd bet most, if not all, readers of this blog would say YES!
It happens to all of us when we first discover the advantages of testing our trading rules. So don't take it too hard, but don't take it too lightly either.
What you see above is a swing trading strategy that has been over-optimized, or "curve-fit", to all the available data.
Buyers and builders beware!
So how do we keep this from happening in the first place?
What Are the Basics of a Good Strategy?
- The trading strategy makes lots of money
- The strategy has small drawdowns when it does lose money
- The strategy has lots of trade samples
- Out-of-sample testing looks like in-sample testing
- Commission and slippage are factored in (especially important for day-trading strategies)
Trading strategy design is an exercise in minimization.
You need certain minimum thresholds, and if a strategy misses any of them, you toss it out completely!
The following are the minimum quantifiable limits I use when deciding whether a strategy is robust enough to be worth trading.
Remember, we want our trading strategies to be robust so they can survive the hardest trading environment on the planet: real-time trading!
Minimum Requirements for a Robust Strategy:
- Minimum of 100 trades (prefer multiple hundreds)
- At least 10 years of data (use all the data you can find); no cherry-picking!
- A statistical significance factor ( Profit Factor * sqrt ( Number of Trades ) ) >= 30
- At least 20% of the data held out as Out-of-Sample data
- Out-of-Sample Profit Factor divided by In-Sample Profit Factor > 0.70 (i.e., 70%)
- Net Profit divided by Max Drawdown > 10
- Overall Profit Factor >= 2
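The checklist above can be turned into a quick screening function. This is a minimal Python sketch; the `StrategyStats` fields and the `failed_minimums` name are hypothetical, and it reads the "70" in the out-of-sample rule as a ratio of 0.70 (70%), since the ratio of two similar profit factors sits near 1.

```python
import math
from dataclasses import dataclass


@dataclass
class StrategyStats:
    """Back-test summary for one strategy (hypothetical field names)."""
    profit_factor: float      # overall profit factor
    num_trades: int
    years_of_data: float
    oos_fraction: float       # share of the data held out as out-of-sample, 0..1
    is_profit_factor: float   # in-sample profit factor
    oos_profit_factor: float  # out-of-sample profit factor
    net_profit: float
    max_drawdown: float       # worst drawdown as a positive dollar amount


def failed_minimums(s: StrategyStats) -> list[str]:
    """Return every minimum requirement the strategy fails (empty list = keep it)."""
    checks = {
        "at least 100 trades": s.num_trades >= 100,
        "at least 10 years of data": s.years_of_data >= 10,
        "significance factor >= 30": s.profit_factor * math.sqrt(s.num_trades) >= 30,
        "at least 20% out-of-sample data": s.oos_fraction >= 0.20,
        # Assumption: the "> 70" rule is read as OOS PF / IS PF > 0.70.
        "OOS PF / IS PF > 0.70": s.oos_profit_factor / s.is_profit_factor > 0.70,
        "net profit / max drawdown > 10": s.net_profit / s.max_drawdown > 10,
        "overall profit factor >= 2": s.profit_factor >= 2,
    }
    return [name for name, passed in checks.items() if not passed]


# SPY figures from the example later in this post; years and OOS fraction are made up.
spy = StrategyStats(2.19, 550, 15, 0.25, 2.20, 2.30, 538_000, 30_000)
print(failed_minimums(spy))  # -> []
```

An empty list means the strategy survives the screen; any name in the list is a reason to throw it out.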
Throw in the trash any strategy that fails even one of these requirements.
But if a strategy passes all of these criteria, it's time to evaluate it with a formula borrowed from signal processing, which scores the strategy based on all of the above inputs.
The higher the score, the better.
A Formula to Evaluate Any Trading Strategy
Score = Profit Factor * sqrt ( Number of Trades ) * ( Net Profit / Max Drawdown ) * ( Out-of-Sample Profit Factor / In-Sample Profit Factor )
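Here is the same formula as a small Python function, a sketch with hypothetical names. It simply multiplies the four factors, so one weak factor drags the whole score down.

```python
import math


def strategy_score(profit_factor: float, num_trades: int,
                   net_profit: float, max_drawdown: float,
                   is_profit_factor: float, oos_profit_factor: float) -> float:
    """Score = PF * sqrt(trades) * (net profit / max DD) * (OOS PF / IS PF)."""
    significance = profit_factor * math.sqrt(num_trades)  # statistical significance
    reward_to_risk = net_profit / max_drawdown              # profit per dollar of worst drawdown
    consistency = oos_profit_factor / is_profit_factor      # > 1: out-of-sample beat in-sample
    return significance * reward_to_risk * consistency


# Inputs quoted for the SPY swing strategy example in this post.
print(round(strategy_score(2.19, 550, 538_000, 30_000, 2.20, 2.30)))  # -> 963
```

Because the quoted inputs are rounded, a score computed this way can differ slightly from one computed on your platform's unrounded statistics.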
The first two terms in the formula measure the statistical significance of the strategy:
Profit Factor * sqrt ( Number of Trades )
Profit factor (gross profit divided by gross loss) is essentially a signal-to-noise ratio, similar to the one used in communications engineering.
If you have a lot of signal but only a small number of samples, you might have some significance.
You can also have significance with a small amount of signal and a large number of samples. The number of trades enters through its square root, the same way random noise in an average shrinks with the square root of the sample count.
If the product of these two terms is 30 or more, I consider the strategy to have sufficient statistical significance to trade with real money.
If it is less than 30, you are probably seeing random noise and being fooled into thinking something is really there.
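To see how signal and sample size trade off, here is a minimal sketch that computes profit factor and the significance factor from a list of per-trade profits and losses. The function names and the trade list are hypothetical.

```python
import math


def profit_factor(trade_pnls: list[float]) -> float:
    """Gross profit divided by gross loss (both as positive numbers)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: treat the profit factor as infinite (or 0 if nothing was won).
        return math.inf if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def significance_factor(trade_pnls: list[float]) -> float:
    """Profit factor times the square root of the number of trades."""
    return profit_factor(trade_pnls) * math.sqrt(len(trade_pnls))


few = [300, -100, 200, -100]  # hypothetical: profit factor 2.5 over 4 trades
many = few * 36               # same profit factor over 144 trades

print(significance_factor(few))   # -> 5.0
print(significance_factor(many))  # -> 30.0
```

The profit factor is 2.5 in both cases; only the larger sample clears the threshold of 30.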
We Don't Want Large Drawdowns in Our Accounts
We don't want to lose $80,000 before going on to make $125,000, right?
While that might sound OK in the end, sitting through that sort of drawdown would most likely make you stop trading the strategy altogether... and it might give you a heart attack.
We don't want that.
We don't want to lose lots of money, even if we would eventually make it all back and then way more.
Thus, we want to divide our net profit by the worst drawdown:
( Net Profit / Max Drawdown )
Expressing this as a ratio takes raw dollar profit out of the picture: what counts is how much the strategy made for each dollar of its worst drawdown. The higher this ratio, the better.
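Max drawdown is the largest peak-to-trough drop in the equity curve. Below is a minimal sketch with hypothetical function names and a hypothetical $100,000 starting balance. It reads "make $125,000" in the scenario above as a net profit of $125,000 and compares that painful path with a smoother one that ends in the same place.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough drop in an equity curve, as a positive amount."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)           # highest equity seen so far
        worst = max(worst, peak - value)  # deepest fall below that peak
    return worst


def profit_to_drawdown(equity: list[float]) -> float:
    """Net profit (last minus first equity value) divided by max drawdown."""
    drawdown = max_drawdown(equity)
    net_profit = equity[-1] - equity[0]
    return net_profit / drawdown if drawdown > 0 else float("inf")


painful = [100_000, 20_000, 225_000]           # lose $80,000, finish $125,000 up
smooth = [100_000, 150_000, 140_000, 225_000]  # same finish, $10,000 worst dip

print(profit_to_drawdown(painful))  # -> 1.5625
print(profit_to_drawdown(smooth))   # -> 12.5
```

Both curves earn the same net profit, but only the smooth one clears the "> 10" minimum.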
Matching Out-of-Sample and In-Sample Results Are Good
Next comes the ratio of the Out-of-Sample Profit Factor to the In-Sample Profit Factor.
We want a trading strategy that has a similar (if not better) profit factor on the data it has not "seen" compared to the data it has "seen".
( Out Of Sample Profit Factor / In Sample Profit Factor )
It's a beautiful thing when this ratio is greater than one, because then the Out-of-Sample Profit Factor is greater than the In-Sample profit factor.
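As a sketch of how this ratio might be computed from a chronologically ordered list of trade results, the hypothetical helper below holds out the last 20% of trades as out-of-sample. That split is an assumption for illustration: the examples in this post split the data by date, and in a real test the out-of-sample trades must come from data the rules were never tuned on.

```python
def _profit_factor(pnls: list[float]) -> float:
    """Gross profit / gross loss; assumes at least one losing trade."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss


def oos_to_is_ratio(trade_pnls: list[float], oos_fraction: float = 0.20) -> float:
    """Out-of-sample profit factor divided by in-sample profit factor."""
    cut = round(len(trade_pnls) * (1 - oos_fraction))  # index of first out-of-sample trade
    in_sample, out_of_sample = trade_pnls[:cut], trade_pnls[cut:]
    return _profit_factor(out_of_sample) / _profit_factor(in_sample)


# Hypothetical trades: 8 in-sample (PF 2.0), then 2 out-of-sample (PF 3.0).
trades = [200, -100] * 4 + [300, -100]
print(oos_to_is_ratio(trades))  # -> 1.5
```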
Let's look at a few examples of this equation at work.
First, we will look at a swing trading strategy I wrote years ago for the S&P 500, traded through the SPY ETF.
SPY Swing Trading Strategy With Out of Sample Testing:
All the data before the purple line is the data I used to create this strategy.
Between the purple and blue lines is the Out-of-Sample data, which the strategy had not seen yet; as you can see, the green equity curve keeps going up.
Then comes the master stroke: everything after the blue line is real-time trading.
I can't stress this enough: all three sections look identical to each other, and this is exactly what you want to see in your trading.
Now, let's plug and chug the values into our ranking equation above:
Profit Factor * ( ( Number Of Trades ) ^ ( 0.5 ) ) * ( Net Profit / Max Drawdown ) * ( Out of Sample Profit Factor / In Sample Profit Factor ) =
2.19 * ( ( 550 ) ^ (0.5) ) * ( 538,000 / 30,000 ) * ( 2.30 / 2.20 ) ≈ 963
Next, we will look at a gold trend-following strategy that trades the @GC futures contracts:
Trend Trading Strategy With Good Out of Sample Testing
You can immediately see a huge difference between trading the S&P 500 and trading gold.
This is due to the fundamental way each market works internally.
The S&P 500 is a mean-reverting market, and gold is a trending market.
You must use the right trading method for the market; a mean-reverting trading strategy does not work in a trending market.
You'll also note that the green equity curve for trading gold looks choppy, not as nice and clean as the S&P 500 strategy.
This again is due to the trending nature of gold; you have to put up with a lot of little losses while waiting to catch the massive trends higher.
Let's run the numbers and see how gold trading compares to S&P 500 trading (I'm sure you already know it's going to rank lower just by looking at the chart).
Profit Factor * ( ( Number Of Trades ) ^ ( 0.5 ) ) * ( Net Profit / Max Drawdown ) * ( Out Of Sample Profit Factor / In Sample Profit Factor ) =
2.85 * ( (125) ^ (0.5) ) * ( 681,000 / 35,000 ) * ( 2.8 / 3 ) ≈ 579
Different Out of Sample Scores Are Okay
There is a pretty large scoring difference between these two algorithmic trading strategies.
Nonetheless, any strategy that scores above 400 is worth trading.
But why trade gold in the first place?
Trading different, uncorrelated asset classes (stocks, gold, oil, etc.) smooths out your portfolio growth over time.
When one strategy is zigging, the other is zagging.
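A toy illustration of zigging and zagging, assuming two hypothetical strategies whose per-period results exactly offset each other's losses. Real strategies never offset this perfectly, so treat it as an idealized sketch of why combining them shrinks drawdowns, not a model of any actual market.

```python
from itertools import accumulate


def equity_curve(pnls: list[float]) -> list[float]:
    """Cumulative equity, starting from 0, for a sequence of per-period P&L."""
    return [0, *accumulate(pnls)]


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough drop, as a positive amount."""
    peak, worst = equity[0], 0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


stocks = [300, -200, 300, -200, 300, -200]  # hypothetical per-period P&L
gold = [-200, 300, -200, 300, -200, 300]    # zags when stocks zig
portfolio = [a + b for a, b in zip(stocks, gold)]

for name, pnls in [("stocks", stocks), ("gold", gold), ("portfolio", portfolio)]:
    curve = equity_curve(pnls)
    print(name, "net:", curve[-1], "max drawdown:", max_drawdown(curve))
```

Each strategy alone nets 300 with a 200 drawdown; together they net 600 with no drawdown at all.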
The only real Holy Grail in trading is the use of multiple strategies trading different asset classes.
And now we have a scientific way to measure strategies against each other.
Use the above equation on your own strategies and see how they stack up against each other.
You can use the equation on any trading strategy over any time period and on any time-frame (like day-trading strategies).
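To stack several strategies against each other, you can score each one and sort. Here is a minimal sketch using the figures quoted for the two example strategies in this post; the tuple layout and names are hypothetical, and the 400 cutoff is the rule of thumb stated above.

```python
import math


def strategy_score(pf, trades, net_profit, max_dd, is_pf, oos_pf):
    """Profit Factor * sqrt(trades) * (Net Profit / Max DD) * (OOS PF / IS PF)."""
    return pf * math.sqrt(trades) * (net_profit / max_dd) * (oos_pf / is_pf)


# (profit factor, trades, net profit, max drawdown, in-sample PF, out-of-sample PF)
strategies = {
    "SPY swing": (2.19, 550, 538_000, 30_000, 2.20, 2.30),
    "Gold trend (@GC)": (2.85, 125, 681_000, 35_000, 3.0, 2.8),
}

ranked = sorted(
    ((name, strategy_score(*stats)) for name, stats in strategies.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    verdict = "worth trading" if score > 400 else "keep looking"
    print(f"{name:<18} {score:6.0f}  {verdict}")
```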
Conclusion: Why Out of Sample Testing Is Needed
- Good trading strategies make lots of money in the real world
- (And have very simple rules, like this swing trading strategy that has a score of 1365!)
- They have large Net-Profit-to-Drawdown ratios
- They have lots of trades!
- Their Out-of-Sample results look like their In-Sample results
- They have commission and slippage factored in
- They have a score greater than 400 on this formula:
- Score = Profit Factor * sqrt ( Number of Trades ) * ( Net Profit / Max Drawdown ) * ( Out-of-Sample Profit Factor / In-Sample Profit Factor )