<style> .math-block { margin: 1.5em 0; text-align: center; overflow-x: auto; } .qh-table-wrap { display: flex; justify-content: center; margin: 1.5em 0; } .qh-table-wrap table { border-collapse: collapse; } .qh-table-wrap th, .qh-table-wrap td { border: 1px solid #ccc; padding: 8px 24px; text-align: center; } .qh-table-wrap th { font-weight: 600; } </style> <p>You have a strategy you believe can beat the market. But how would you know whether your hypothesis holds up in the real world? You decide to test the model on a paper trading account. It executes 200 trades over the following weeks and ends up 10% in profit, so you finally deploy it in the live market. A few months later you are down 15%, and your investors want their money back. How did this happen?</p> <h3>The Core Problem</h3> <p>People often assume that trading algorithms do not require extensive testing to predict their behaviour. In reality, a single trade, a single week, even a single month of trades can look "amazing" or "horrible" purely by chance. This uncertainty is why professional institutions and traders turn to the Law of Large Numbers (LLN), which helps them separate genuine edge from luck. Today we will use it to model the behaviour of a few trading algorithms with probability, and then demonstrate the true risks of small backtests and why large datasets are required.</p> <p>It comes as no surprise that, before deploying a model they believe will outperform the market, traders test it without risking real money. Many let it run on a paper trading account first and watch how it behaves. However, even when short-term results look consistent, small backtests can be misleading and do not show the full picture.
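To see how easily a short paper-trading run can mislead, here is a minimal Monte Carlo sketch. It assumes a hypothetical strategy with zero edge and a 2% per-trade standard deviation (both figures chosen purely for illustration) and asks how often a 200-trade backtest still ends up more than 10% in profit by chance alone:

```python
import random

# A hypothetical strategy with NO edge: each trade's return is pure noise.
# mu = 0 (no edge), sigma = 2% per trade -- figures assumed for illustration.
random.seed(42)

def backtest(n_trades, mu=0.0, sigma=0.02):
    """Simulate one backtest and return the total (summed) return."""
    return sum(random.gauss(mu, sigma) for _ in range(n_trades))

# Run 10,000 independent 200-trade paper-trading experiments.
runs = [backtest(200) for _ in range(10_000)]
profitable = sum(r > 0.10 for r in runs) / len(runs)

print(f"Runs ending above +10% despite zero edge: {profitable:.1%}")
```

Under these assumptions, roughly a third of the zero-edge runs clear the +10% bar, so a single 200-trade paper-trading profit is weak evidence of any real edge.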
Before we continue, let us first define the LLN.</p> <h3>The Law of Large Numbers</h3> <p>Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$, and define the sample average $\overline{X}_n := \frac{1}{n}\sum_{i=1}^{n} X_i$. The Law of Large Numbers states that $\overline{X}_n$ converges to $\mu$ as $n \to \infty$.</p> <p>How does this apply to trading? Let's apply it to a simple model of "a strategy". Imagine a model that produces returns $R_1, R_2, \ldots, R_n$ over $n$ trades (or days/weeks). The returns are random because markets are uncertain. We will assume that $E[R] = \mu$ and $\text{Var}(R) = \sigma^2$, and define the average return after $n$ trades as:</p> <div class="math-block">$$\overline{R}_n = \frac{1}{n}\sum_{i=1}^{n} R_i$$</div> <p>By the Law of Large Numbers, $\overline{R}_n \to \mu$ as $n \to \infty$: with enough trades, the average return you observe settles near the true average $\mu$. The question is, how fast does it settle? Even though the LLN guarantees eventual convergence, we need to know how unreliable the estimate is when $n$ is 20, 100 or 500.</p> <p>A crucial fact from probability is that $\text{Var}(\overline{R}_n) = \frac{\sigma^2}{n}$, so the standard deviation of the average, also called the standard error, is $\text{SE}(\overline{R}_n) = \frac{\sigma}{\sqrt{n}}$. This leads us to a key idea: doubling the number of trades $n$ does not double the reliability of the estimate. Precision improves only like $\sqrt{n}$, so halving the standard error requires four times as many trades.</p> <h3>Real World Application</h3> <p>Suppose your strategy has $\mu = 0.10\%$ per trade (your edge) and $\sigma = 2\%$ per trade, and you want to gauge how reliable your estimate of that edge is over time. These figures are realistic for institutions such as hedge funds: returns are noisy, and the average edge is tiny in today's efficient markets.
By "noisy," we mean that the market moves unpredictably in the short term because many traders hold different views on the asset's valuation. From our previous formula, we know that: $\text{SE}(\overline{R}_n) = \frac{0.02}{\sqrt{n}}$, so our results will be:</p> <div class="qh-table-wrap"> <table> <thead> <tr> <th>Trades $n$</th> <th>Standard Error of Returns $\text{SE}(\overline{R}_n)$</th> </tr> </thead> <tbody> <tr><td>10</td><td>0.63%</td></tr> <tr><td>50</td><td>0.28%</td></tr> <tr><td>200</td><td>0.14%</td></tr> <tr><td>1000</td><td>0.063%</td></tr> </tbody> </table> </div> <p>From this table, we see that even after 200 trades, the standard error exceeds our market edge, making it appear partly as noise, when it is not. As a result, your hypothesis and conclusion about the model can be inaccurate, as in reality, you do not know the true $\mu$ of your model, and can only approximate it by backtesting extensively. This could also backfire, as a model with $\mu = -0.10\%$ would yield nearly identical results in the short run during backtesting.</p> <p>Now, suppose that you have another strategy, where you have calculated the probability $p = 0.51$ that your trade will yield $+1\%$ profits and the probability $1 - p$ of $-1\%$ profits. Your edge would then be: $\mu = (2p-1) \cdot 1\% = +0.02\%$ per trade, with $\sigma = 2\sqrt{p(1-p)} \cdot 1\% \simeq 0.9998\%$. The profit probability follows a binomial distribution in this case, and for $n = 1000$, we can apply the Central Limit Theorem, and get a probability of 25% that our model yields a loss, while for $n = 200$, it would be 36%.</p> <h3>What Did We Learn?</h3> <p>The Law of Large Numbers shows us the importance of extensive backtesting and the avoidance of reliance on short-term results. With small margins and noisy returns, many models do not exhibit their true accuracy without sufficient data. 
Even a strategy with a genuinely positive expected edge can be down after dozens or even hundreds of trades; that is entirely normal. The opposite is equally true: a strategy with a negative edge can look profitable in the short run, producing a false positive. Always test your hypothesis on enough data, otherwise you are only lying to yourself.</p>
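As a closing illustration, the sketch below simulates the running average return of a strategy with the assumed $\mu = 0.10\%$ and $\sigma = 2\%$ per trade, showing the LLN in action: the observed average wanders early and only settles near the true edge after thousands of trades.

```python
import random

random.seed(7)

# Assumed per-trade edge and volatility from the worked example:
mu, sigma = 0.001, 0.02   # +0.10% edge, 2% standard deviation

returns = [random.gauss(mu, sigma) for _ in range(20_000)]

# Print the running average at a few checkpoints to watch it converge.
total = 0.0
checkpoints = {10, 100, 1_000, 20_000}
for i, r in enumerate(returns, start=1):
    total += r
    if i in checkpoints:
        print(f"after {i:6d} trades: average return = {total / i:+.4%}")
```

At $n = 10$ the running average is essentially noise (standard error 0.63%, six times the edge); only by the last checkpoint is it pinned close to the true $0.10\%$.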
