A backtest is not a strategy.
A backtest is a measurement under a set of assumptions – many of which quietly fail the moment you go live.
In crypto spot and futures, the gap between a beautiful curve and a real system is usually not one big mistake. It’s a stack of small, compounding mismatches: costs, fills, regime changes, operational constraints, and risk sizing interacting in ways the backtest never modeled.
This post is a checklist of where “edge” commonly disappears – and how we pressure-test ideas before they’re allowed to graduate.
1) Your fills are the product, not the signal
Most backtests implicitly assume you can trade at (or near) the price you see.
Live execution often looks like:
- you get filled when you shouldn’t (adverse selection)
- you don’t get filled when you need to (liquidity disappears)
- your fills cluster in the worst moments (volatility expansions)
A signal that works only with perfect fills is not a tradable signal.
What we do: model slippage conservatively, test sensitivity to worse fills, and measure realized slippage in early deployment stages.
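A minimal sketch of that sensitivity test, with invented per-trade numbers (nothing here is a measured result): widen the assumed slippage and check whether the average trade stays positive.

```python
# Toy slippage stress test. All trade returns below are made up for illustration.

def net_returns(gross_returns_bps, slippage_bps):
    """Subtract a flat per-trade slippage assumption from gross trade returns."""
    return [r - slippage_bps for r in gross_returns_bps]

def survives(gross_returns_bps, slippage_bps, min_avg_bps=0.0):
    """Does the strategy stay profitable on average under this slippage level?"""
    net = net_returns(gross_returns_bps, slippage_bps)
    return sum(net) / len(net) > min_avg_bps

trades = [12, -4, 9, -7, 15, -3, 8]  # hypothetical gross per-trade returns, bps

for slip in (0, 2, 5, 10):
    print(slip, survives(trades, slip))
```

If the answer flips from True to False at a slippage level you routinely see live, the signal was never tradable.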
2) Costs aren’t a footnote – they’re the tax on noise
Crypto strategies often live on small edges. Costs are not a minor adjustment.
Costs include:
- taker/maker fees (and fee tier drift)
- spread
- impact
- funding (perps)
- borrow/financing where relevant
- rebalances and implicit turnover costs
If your edge is 5 bps and your true cost is 8 bps, you don’t have an edge – you have an illusion.
What we do: assume costs are higher than you think, then ask “does it still work?”
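The 5-vs-8 bps arithmetic above can be made explicit. This is a sketch with assumed component values, not a real fee schedule:

```python
# Illustrative round-trip cost stack for a perp trade.
# All bps figures are assumptions chosen for the example.

def round_trip_cost_bps(taker_fee_bps, half_spread_bps, impact_bps, funding_bps=0.0):
    """Pay fee, half-spread, and impact on entry AND exit, plus funding held."""
    per_side = taker_fee_bps + half_spread_bps + impact_bps
    return 2 * per_side + funding_bps

edge_bps = 5.0  # hypothetical gross edge per round trip
cost = round_trip_cost_bps(taker_fee_bps=2.0, half_spread_bps=1.0, impact_bps=1.0)
print(cost, edge_bps - cost)  # 8.0 bps of cost turns a 5 bps "edge" into -3 bps
```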
3) Time and labeling errors create fake predictability
Backtests can accidentally “cheat” without anyone intending to cheat:
- Lookahead leakage (subtle timestamp issues)
- Using close prices when you can’t actually trade at the close
- Using future-revised data (especially with aggregated feeds)
- Misaligned bars across venues
- Incorrect handling of funding timestamps or settlement mechanics
These mistakes produce strategies that look stable right up until they hit production – where they immediately fall apart.
What we do: enforce strict timestamp discipline, data lineage, and reproducible pipelines. If we can’t explain the data, we don’t trade it.
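Lookahead leakage is easy to demonstrate on toy data. The sketch below builds a "signal" from the same bars it trades; with no lag it looks brilliant, with an honest one-bar lag the edge evaporates. The numbers are synthetic.

```python
# Minimal lookahead check: a signal observed at bar t may only trade
# the return of bar t+1. Trading bar t's own return is a classic leak.

def pnl(signals, returns, lag=1):
    """Trade each signal against the return `lag` bars later."""
    n = len(returns)
    return sum(signals[t] * returns[t + lag] for t in range(n - lag))

rets   = [0.01, -0.02, 0.03, -0.01, 0.02]
# "Signal" built FROM the same bars' returns -> leaked if traded with lag=0
leaked = [1 if r > 0 else -1 for r in rets]

print(pnl(leaked, rets, lag=0))  # trades its own label: spuriously profitable
print(pnl(leaked, rets, lag=1))  # honest alignment: the "edge" disappears
```

Any pipeline where shifting the signal by one bar destroys the result deserves deep suspicion.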
4) Regimes don’t average out the way you want
Backtests often report an “average” that doesn’t exist in practice.
You might have:
- one regime where it works very well
- several regimes where it bleeds slowly
- occasional regimes where it breaks violently
And the distribution matters more than the mean, because risk limits force you to stop before the mean has time to arrive.
What we do: break results by regime proxies (volatility, trend, liquidity), stress test worst slices, and treat cliff-risk as disqualifying unless there’s a hard control.
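A sketch of that regime breakdown, using a simple volatility threshold as the regime proxy. All returns and vol readings are synthetic, picked to show how a blended average can hide a high-vol bleed:

```python
# Bucket per-period returns by a volatility proxy instead of blending them.

def by_regime(returns, vols, threshold):
    """Split returns into low-vol / high-vol buckets via a simple threshold."""
    low  = [r for r, v in zip(returns, vols) if v < threshold]
    high = [r for r, v in zip(returns, vols) if v >= threshold]
    return low, high

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

rets = [0.5, 0.4, 0.6, -1.5, -1.0, 0.3, 0.5, 0.4]  # strategy returns, % (synthetic)
vols = [1.0, 1.1, 0.9,  3.5,  4.0, 1.2, 1.0, 1.1]  # realized-vol proxy, % (synthetic)

low, high = by_regime(rets, vols, threshold=2.0)
print(mean(rets), mean(low), mean(high))  # blended mean masks the high-vol losses
```

The blended mean is slightly positive; the high-vol bucket is sharply negative. The average you reported is a regime mix you may never actually experience.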
5) Position sizing is where most strategies actually live
Two strategies with the same entry/exit can behave completely differently depending on sizing.
Common sizing failures:
- sizing that expands exposure as volatility rises (often without anyone noticing)
- no cap on correlated exposure
- compounding into drawdowns
- too much reliance on “average case” behavior
Sizing can turn a mild model error into a liquidation event.
What we do: design sizing and limits as first-class components, not an afterthought.
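One common first-class design is volatility targeting with a hard leverage cap, so size shrinks rather than expands when volatility spikes. The parameters below are illustrative assumptions, not our production settings:

```python
# Sketch: volatility-targeted sizing with a hard cap and a safe default.

def position_size(equity, target_vol, realized_vol, max_leverage=2.0):
    """Size = equity * target_vol / realized_vol, capped at max leverage."""
    if realized_vol <= 0:
        return 0.0  # safe default: no size on missing or bad vol inputs
    raw = equity * target_vol / realized_vol
    return min(raw, equity * max_leverage)

equity = 100_000
print(position_size(equity, target_vol=0.10, realized_vol=0.05))  # calm: capped
print(position_size(equity, target_vol=0.10, realized_vol=0.40))  # stressed: shrinks
```

The cap matters as much as the formula: uncapped vol targeting quietly levers up in calm markets, which is exactly when correlations are about to stop cooperating.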
6) Correlation shows up when you can’t afford it
In calm markets, positions look diversified.
In stress, many things become the same trade.
The backtest may show a nice portfolio curve, but in real stress:
- correlations spike
- liquidity disappears together
- funding moves together
- de-risking becomes crowded
What we do: test portfolio behavior under stress, cap gross/net exposure, and model drawdown behavior as a system property.
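A toy two-asset example makes the correlation spike concrete. Weights, vols, and correlation values are invented; the point is how quickly "diversified" portfolio vol converges toward single-trade vol:

```python
import math

# Portfolio vol under calm vs stressed pairwise correlation (synthetic inputs).

def portfolio_vol(weights, vols, corr):
    """Portfolio volatility assuming a single pairwise correlation for all pairs."""
    var = sum((w * v) ** 2 for w, v in zip(weights, vols))
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            var += 2 * corr * weights[i] * vols[i] * weights[j] * vols[j]
    return math.sqrt(var)

w, v = [0.5, 0.5], [0.20, 0.20]
calm     = portfolio_vol(w, v, corr=0.2)
stressed = portfolio_vol(w, v, corr=0.9)
print(calm, stressed)  # the "diversification benefit" largely vanishes in stress
```

Risk limits sized off the calm number are limits for a portfolio you only hold in good weather.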
7) Ops and failure modes are part of the strategy
What happens when:
- a feed drops?
- a venue API rate-limits you?
- order acknowledgements lag?
- funding logic misfires?
- your process restarts mid-position?
If your backtest doesn’t include operational failure modes, it’s incomplete. Live trading will include them.
What we do: build safe defaults, kill-switches, and monitoring. If a component is uncertain, the system should reduce risk – not guess.
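The "reduce risk, don't guess" principle can be sketched as a guard on the target position. Structure and thresholds here are hypothetical, for illustration only:

```python
# Sketch: if data is stale or the kill-switch is tripped, the allowed
# position is flat - never the last known signal.

STALE_AFTER_S = 5.0  # illustrative staleness threshold

def target_position(signal_size, feed_age_s, kill_switch=False):
    """Return the position the engine is allowed to hold right now."""
    if kill_switch:
        return 0.0  # hard stop: operator or monitor tripped it
    if feed_age_s > STALE_AFTER_S:
        return 0.0  # stale feed: safe default is flat, not a stale signal
    return signal_size

print(target_position(1.5, feed_age_s=0.2))                    # healthy: trade
print(target_position(1.5, feed_age_s=12.0))                   # stale: flat
print(target_position(1.5, feed_age_s=0.2, kill_switch=True))  # killed: flat
```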
What a “good” backtest is actually good for
Backtests are valuable when used correctly. We use them to:
- falsify weak ideas early
- estimate sensitivity to costs and execution
- identify regime dependence
- design controls around known failure modes
- define what must be monitored in production
A backtest should not be a sales pitch. It should be a tool for discovering how an idea breaks.
Takeaway
The strategy isn’t the curve.
The strategy is:
- the signal plus execution assumptions
- plus costs
- plus sizing and limits
- plus monitoring and failure handling
- plus the discipline of iteration
That full stack is what survives live markets.
And that’s the only thing that matters.