A backtest is not a strategy.
A backtest is a measurement under a set of assumptions – many of which quietly fail the moment you go live.
In crypto spot and futures, the gap between a beautiful curve and a real system is usually not one big mistake. It’s a stack of small, compounding mismatches: costs, fills, regime changes, operational constraints, and risk sizing interacting in ways the backtest never modeled.
This post is a checklist of where “edge” commonly disappears – and how we pressure-test ideas before they’re allowed to graduate.
1) Your fills are the product, not the signal
Most backtests implicitly assume you can trade at (or near) the price you see.
Live execution often looks like:
- you get filled when you shouldn’t (adverse selection)
- you don’t get filled when you need to (liquidity disappears)
- your fills cluster in the worst moments (volatility expansions)
A signal that works only with perfect fills is not a tradable signal.
What we do: model slippage conservatively, test sensitivity to worse fills, and measure realized slippage in early deployment stages.
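A minimal sketch of that sensitivity test, with invented per-trade numbers (nothing here is a measured result): widen the assumed slippage and check whether the average trade stays positive.

```python
# Toy slippage stress test. All trade returns below are made up for illustration.

def net_returns(gross_returns_bps, slippage_bps):
    """Subtract a flat per-trade slippage assumption from gross trade returns."""
    return [r - slippage_bps for r in gross_returns_bps]

def survives(gross_returns_bps, slippage_bps, min_avg_bps=0.0):
    """Does the strategy stay profitable on average under this slippage level?"""
    net = net_returns(gross_returns_bps, slippage_bps)
    return sum(net) / len(net) > min_avg_bps

trades = [12, -4, 9, -7, 15, -3, 8]  # hypothetical gross per-trade returns, bps

for slip in (0, 2, 5, 10):
    print(slip, survives(trades, slip))
```

If the answer flips from True to False at a slippage level you routinely see live, the signal was never tradable.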
2) Costs aren’t a footnote – they’re the tax on noise
Crypto strategies often live on small edges. Costs are not a minor adjustment.
Costs include:
- taker/maker fees (and fee tier drift)
- spread
- impact
- funding (perps)
- borrow/financing where relevant
- rebalances and implicit turnover costs
If your edge is 5 bps and your true cost is 8 bps, you don’t have an edge – you have an illusion.
What we do: assume costs are higher than you think, then ask “does it still work?”
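The 5-vs-8 bps arithmetic above can be made explicit. This is a sketch with assumed component values, not a real fee schedule:

```python
# Illustrative round-trip cost stack for a perp trade.
# All bps figures are assumptions chosen for the example.

def round_trip_cost_bps(taker_fee_bps, half_spread_bps, impact_bps, funding_bps=0.0):
    """Pay fee, half-spread, and impact on entry AND exit, plus funding held."""
    per_side = taker_fee_bps + half_spread_bps + impact_bps
    return 2 * per_side + funding_bps

edge_bps = 5.0  # hypothetical gross edge per round trip
cost = round_trip_cost_bps(taker_fee_bps=2.0, half_spread_bps=1.0, impact_bps=1.0)
print(cost, edge_bps - cost)  # 8.0 bps of cost turns a 5 bps "edge" into -3 bps
```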
3) Time and labeling errors create fake predictability
Backtests can accidentally “cheat” without anyone intending to cheat:
- Lookahead leakage (subtle timestamp issues)
- Using close prices when you can’t actually trade at the close
- Using future-revised data (especially with aggregated feeds)
- Misaligned bars across venues
- Incorrect handling of funding timestamps or settlement mechanics
These mistakes produce strategies that look stable right up until they hit production – where they immediately fall apart.
What we do: enforce strict timestamp discipline, data lineage, and reproducible pipelines. If we can’t explain the data, we don’t trade it.
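Lookahead leakage is easy to demonstrate on toy data. The sketch below builds a "signal" from the same bars it trades; with no lag it looks brilliant, with an honest one-bar lag the edge evaporates. The numbers are synthetic.

```python
# Minimal lookahead check: a signal observed at bar t may only trade
# the return of bar t+1. Trading bar t's own return is a classic leak.

def pnl(signals, returns, lag=1):
    """Trade each signal against the return `lag` bars later."""
    n = len(returns)
    return sum(signals[t] * returns[t + lag] for t in range(n - lag))

rets   = [0.01, -0.02, 0.03, -0.01, 0.02]
# "Signal" built FROM the same bars' returns -> leaked if traded with lag=0
leaked = [1 if r > 0 else -1 for r in rets]

print(pnl(leaked, rets, lag=0))  # trades its own label: spuriously profitable
print(pnl(leaked, rets, lag=1))  # honest alignment: the "edge" disappears
```

Any pipeline where shifting the signal by one bar destroys the result deserves deep suspicion.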
4) Regimes don’t average out the way you want
Backtests often report an “average” that doesn’t exist in practice.
You might have:
- one regime where it works very well
- several regimes where it bleeds slowly
- occasional regimes where it breaks violently
And the distribution matters more than the mean, because risk limits force you to stop before the mean has time to arrive.
What we do: break results by regime proxies (volatility, trend, liquidity), stress test worst slices, and treat cliff-risk as disqualifying unless there’s a hard control.
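A sketch of that regime breakdown, using a simple volatility threshold as the regime proxy. All returns and vol readings are synthetic, picked to show how a blended average can hide a high-vol bleed:

```python
# Bucket per-period returns by a volatility proxy instead of blending them.

def by_regime(returns, vols, threshold):
    """Split returns into low-vol / high-vol buckets via a simple threshold."""
    low  = [r for r, v in zip(returns, vols) if v < threshold]
    high = [r for r, v in zip(returns, vols) if v >= threshold]
    return low, high

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

rets = [0.5, 0.4, 0.6, -1.5, -1.0, 0.3, 0.5, 0.4]  # strategy returns, % (synthetic)
vols = [1.0, 1.1, 0.9,  3.5,  4.0, 1.2, 1.0, 1.1]  # realized-vol proxy, % (synthetic)

low, high = by_regime(rets, vols, threshold=2.0)
print(mean(rets), mean(low), mean(high))  # blended mean masks the high-vol losses
```

The blended mean is slightly positive; the high-vol bucket is sharply negative. The average you reported is a regime mix you may never actually experience.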
5) Position sizing is where most strategies actually live
Two strategies with the same entry/exit can behave completely differently depending on sizing.
Common sizing failures:
- sizing that expands exposure as volatility rises (often without anyone noticing)
- no cap on correlated exposure
- compounding into drawdowns
- too much reliance on “average case” behavior
Sizing can turn a mild model error into a liquidation event.
What we do: design sizing and limits as first-class components, not an afterthought.
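One common first-class design is volatility targeting with a hard leverage cap, so size shrinks rather than expands when volatility spikes. The parameters below are illustrative assumptions, not our production settings:

```python
# Sketch: volatility-targeted sizing with a hard cap and a safe default.

def position_size(equity, target_vol, realized_vol, max_leverage=2.0):
    """Size = equity * target_vol / realized_vol, capped at max leverage."""
    if realized_vol <= 0:
        return 0.0  # safe default: no size on missing or bad vol inputs
    raw = equity * target_vol / realized_vol
    return min(raw, equity * max_leverage)

equity = 100_000
print(position_size(equity, target_vol=0.10, realized_vol=0.05))  # calm: capped
print(position_size(equity, target_vol=0.10, realized_vol=0.40))  # stressed: shrinks
```

The cap matters as much as the formula: uncapped vol targeting quietly levers up in calm markets, which is exactly when correlations are about to stop cooperating.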
6) Correlation shows up when you can’t afford it
In calm markets, positions look diversified.
In stress, many things become the same trade.
The backtest may show a nice portfolio curve, but in real stress:
- correlations spike
- liquidity disappears together
- funding moves together
- de-risking becomes crowded
What we do: test portfolio behavior under stress, cap gross/net exposure, and model drawdown behavior as a system property.
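A toy two-asset example makes the correlation spike concrete. Weights, vols, and correlation values are invented; the point is how quickly "diversified" portfolio vol converges toward single-trade vol:

```python
import math

# Portfolio vol under calm vs stressed pairwise correlation (synthetic inputs).

def portfolio_vol(weights, vols, corr):
    """Portfolio volatility assuming a single pairwise correlation for all pairs."""
    var = sum((w * v) ** 2 for w, v in zip(weights, vols))
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            var += 2 * corr * weights[i] * vols[i] * weights[j] * vols[j]
    return math.sqrt(var)

w, v = [0.5, 0.5], [0.20, 0.20]
calm     = portfolio_vol(w, v, corr=0.2)
stressed = portfolio_vol(w, v, corr=0.9)
print(calm, stressed)  # the "diversification benefit" largely vanishes in stress
```

Risk limits sized off the calm number are limits for a portfolio you only hold in good weather.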
7) Ops and failure modes are part of the strategy
What happens when:
- a feed drops?
- a venue API rate-limits you?
- order acknowledgements lag?
- funding logic misfires?
- your process restarts mid-position?
If your backtest doesn’t include operational failure modes, it’s incomplete. Live trading will include them.
What we do: build safe defaults, kill-switches, and monitoring. If a component is uncertain, the system should reduce risk – not guess.
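The "reduce risk, don't guess" principle can be sketched as a guard on the target position. Structure and thresholds here are hypothetical, for illustration only:

```python
# Sketch: if data is stale or the kill-switch is tripped, the allowed
# position is flat - never the last known signal.

STALE_AFTER_S = 5.0  # illustrative staleness threshold

def target_position(signal_size, feed_age_s, kill_switch=False):
    """Return the position the engine is allowed to hold right now."""
    if kill_switch:
        return 0.0  # hard stop: operator or monitor tripped it
    if feed_age_s > STALE_AFTER_S:
        return 0.0  # stale feed: safe default is flat, not a stale signal
    return signal_size

print(target_position(1.5, feed_age_s=0.2))                    # healthy: trade
print(target_position(1.5, feed_age_s=12.0))                   # stale: flat
print(target_position(1.5, feed_age_s=0.2, kill_switch=True))  # killed: flat
```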
What a “good” backtest is actually good for
Backtests are valuable when used correctly. We use them to:
- falsify weak ideas early
- estimate sensitivity to costs and execution
- identify regime dependence
- design controls around known failure modes
- define what must be monitored in production
A backtest should not be a sales pitch. It should be a tool for discovering how an idea breaks.
Takeaway
The strategy isn’t the curve.
The strategy is:
- the signal plus execution assumptions
- plus costs
- plus sizing and limits
- plus monitoring and failure handling
- plus the discipline of iteration
That full stack is what survives live markets.
And that’s the only thing that matters.