
Note: This post was written in collaboration with Oryah Lancry-Dayan, Lead Statistician
Prolog:
Running an A/B test involves a long chain of decisions for an analyst: sharpening the hypothesis, selecting the right primary KPI, defining acceptable error rates, and calculating the required sample size. Yet one decision that often flies under the radar is how to allocate users between groups.
The default approach is to split users equally between control and treatment (50/50). This rule of thumb exists for a good reason: for a fixed total sample size, an equal allocation maximizes statistical power. However, theory and practice do not always align, and there are situations where deviating from equal allocation is necessary.
In these cases, a natural follow-up question arises: how should the groups be structured? Suppose you want only 10% of traffic exposed to the treatment. One option is to keep equal group sizes but run the test on a smaller subset of users (a 10%–10% split). Another is to assign a small fraction to treatment while using the remainder of the population as control (a 10%–90% split).
In what follows, we examine the trade-off between these two designs. We begin by understanding the statistical implications of deviating from the classical 50/50 split, and then compare balanced (e.g., 10–10) and unbalanced (e.g., 10–90) allocation approaches in terms of statistical error and practical considerations.
The Impact of Balanced and Unbalanced Allocation on Test Duration
To understand how allocation affects experiment runtime, two key points matter. First, for a fixed power and effect size, total sample size is minimized under equal allocation; a 50/50 split is the most statistically efficient (Figure 1).
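The first point can be checked with a short derivation. As a sketch, assume a comparison of means with the same outcome variance \sigma^2 in both groups, a total of N users, and a fraction p of them assigned to treatment (these symbols are introduced here for illustration and are not taken from the original analysis):

```latex
\operatorname{Var}\!\left(\bar{X}_T - \bar{X}_C\right)
  = \frac{\sigma^2}{pN} + \frac{\sigma^2}{(1-p)N}
  = \frac{\sigma^2}{N\,p(1-p)}
\qquad\Longrightarrow\qquad
\frac{N(p)}{N(1/2)} = \frac{1}{4\,p(1-p)}
```

For a fixed power and effect size the variance of the difference must hit a fixed target, so the required total N is proportional to 1/(p(1-p)). The product p(1-p) peaks at p = 1/2, which is why the 50/50 split needs the fewest users overall.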

Second, with unequal allocation, the total sample size grows, but the treatment group itself becomes smaller. This happens because a larger control group provides a more precise baseline estimate, allowing fewer treated users while maintaining statistical power. Analytically, assuming equal outcome variance in both groups, the required treatment size scales with the allocation ratio r = n_C / n_T as:
n_T = n × (1 + r) / (2r)
where n is the per-group sample size required under equal allocation and n_C = r × n_T is the size of the control group.
How does this translate into the difference between a balanced (10–10) and an unbalanced (10–90) design? A balanced design corresponds to equal allocation, where users are split evenly between treatment and control. This allocation minimizes the total number of users required to achieve a given level of statistical power. However, when treatment exposure is constrained (e.g., only 10% of users can receive the treatment), a balanced design effectively uses only a subset of the available traffic, since a large portion of users (in the 10–10 example, 80%) remains unassigned to any experimental condition.
In contrast, an unbalanced design corresponds to unequal allocation, where users are intentionally split asymmetrically between groups. This approach increases the total sample size required to maintain statistical power, but it reduces the number of users exposed to the treatment while making full use of the available traffic.
To illustrate this trade-off, consider a scenario where only 10% of users can be exposed to the treatment. Suppose a power analysis shows that a balanced design (10–10 allocation) requires 10,000 users per group, for a total of 20,000 users. Under a 10–90 allocation, the control group is nine times larger (r = 9), and the scaling relationship implies that the treatment group requires only about 56% of the per-group sample size needed under equal allocation, since (1 + 9) / (2 × 9) ≈ 0.556. In this case, the treatment group would include approximately 5,556 users, with roughly 50,000 users in control, for a total of about 55,556 users.
Although this total is larger than in the balanced design, the unbalanced approach makes use of the entire traffic pool, whereas the balanced design leaves a substantial portion of users outside the experiment.
To make this trade-off more concrete, assume the product receives 1,000 users per day. The table below compares the expected runtime under each scenario.

Unsurprisingly, a 50–50 allocation yields the fastest test: 20,000 users at 1,000 per day take 20 days. But once unequal allocation is required, an interesting pattern emerges: in this example, the 10–90 design uses all of the traffic and finishes in about 56 days, whereas the 10–10 design enrolls only 200 users per day and needs 100 days, roughly 1.8 times as long!
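The arithmetic behind this comparison can be reproduced with a few lines of Python. This is a minimal sketch, assuming a comparison of means with equal outcome variance in both groups, so that the variance of the difference is proportional to 1/n_T + 1/n_C. The function names and the percentage-based traffic parameter are ours, chosen for illustration.

```python
import math


def unbalanced_sizes(n_equal, r):
    """Treatment and control sizes that match the power of n_equal users per group.

    Assumes equal outcome variance in both groups and n_control = r * n_treat,
    which gives n_treat = n_equal * (1 + r) / (2 * r).
    """
    n_treat = math.ceil(n_equal * (1 + r) / (2 * r))
    n_control = math.ceil(r * n_treat)
    return n_treat, n_control


def runtime_days(total_users, daily_users, traffic_percent):
    """Days to collect total_users when traffic_percent of daily traffic is enrolled."""
    enrolled_per_day_x100 = daily_users * traffic_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_users * 100 // enrolled_per_day_x100)


if __name__ == "__main__":
    n_equal, daily = 10_000, 1_000
    n_t, n_c = unbalanced_sizes(n_equal, r=9)
    scenarios = [
        ("50-50", 2 * n_equal, 100),  # all traffic, equal groups
        ("10-10", 2 * n_equal, 20),   # only 20% of traffic enters the test
        ("10-90", n_t + n_c, 100),    # all traffic, unequal groups
    ]
    for name, total, percent in scenarios:
        print(name, total, "users,", runtime_days(total, daily, percent), "days")
```

Rounding each group up to a whole user makes the 10–90 total slightly larger than the closed-form value, which does not change the runtime in days.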
The Impact of Balanced and Unbalanced Allocation on Error Rate
In the previous section, we saw that unbalanced allocation can reduce runtime compared with a balanced design. But before declaring a clear winner, it’s important to remember that, for skewed metrics, an unbalanced design can leave the sampling distribution of the difference between groups skewed and slow its convergence to normality, potentially affecting test validity and increasing the likelihood of errors.
Which errors? In A/B testing, statistical inference generally addresses two key questions: whether an effect exists at all, where the risk is a false positive, and how large the effect is, where the risk is a confidence interval that fails to cover the true effect (a coverage error).
To examine how allocation affects false positives and coverage errors, we simulated 5,000 tests using balanced (r = 1) and unbalanced (r = 9) designs with three highly skewed revenue datasets (skewness: 11.82, 28.63, 40.17) to ensure ecological validity. For false positives, control and treatment groups were sampled from the same distribution. For coverage, a true effect was simulated by scaling treatment observations by 1 ± MDE (depending on hypothesis direction), and we assessed whether the confidence intervals captured this effect. We examined three factors that interact with skewness and influence error rates: sample size, hypothesis type (two-sided, left-tailed, or right-tailed), and winsorization.

Our simulations confirm that, as expected, datasets with higher skewness are more prone to inflated false positive rates when using an unbalanced design. This effect is observed for two-sided and left-tailed hypotheses, whereas for right-tailed hypotheses, the error rate is actually below the nominal threshold. Furthermore, applying winsorization markedly reduces skewness and brings the false positive rate to the intended level.

A similar pattern appears for confidence interval coverage: unbalanced designs without winsorization show lower-than-desired coverage. However, as sample size increases and winsorization is applied, coverage approaches the intended level.
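To get a feel for the false positive part of this experiment, the sketch below runs a much smaller simulation in plain Python. It is not the code behind the results above. It assumes a lognormal distribution as a stand-in for revenue (sigma = 1.5, theoretical skewness of roughly 33), a Welch-style z-test with normal critical values, and winsorization as a cap at a pooled upper quantile. All of these are illustrative assumptions, since the post does not specify the test statistic, the winsorization rule, or the datasets.

```python
import math
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    mean = math.fsum(xs) / n
    var = math.fsum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


def false_positive_rates(n_treat, n_control, n_sims=1000, sigma=1.5,
                         cap_quantile=None, alpha=0.05, seed=0):
    """Share of A/A simulations rejected under each hypothesis type."""
    rng = random.Random(seed)
    z_two = NormalDist().inv_cdf(1 - alpha / 2)
    z_one = NormalDist().inv_cdf(1 - alpha)
    rejections = {"two-sided": 0, "left": 0, "right": 0}
    for _ in range(n_sims):
        # Both groups come from the same distribution, so every rejection is false.
        treat = [rng.lognormvariate(0.0, sigma) for _ in range(n_treat)]
        control = [rng.lognormvariate(0.0, sigma) for _ in range(n_control)]
        if cap_quantile is not None:
            # Assumed winsorization rule: cap values at a pooled upper quantile.
            pooled = sorted(treat + control)
            cap = pooled[int(cap_quantile * (len(pooled) - 1))]
            treat = [min(x, cap) for x in treat]
            control = [min(x, cap) for x in control]
        m_t, v_t = mean_and_var(treat)
        m_c, v_c = mean_and_var(control)
        z = (m_t - m_c) / math.sqrt(v_t / n_treat + v_c / n_control)
        rejections["two-sided"] += abs(z) > z_two
        rejections["left"] += z < -z_one   # H1: treatment < control
        rejections["right"] += z > z_one   # H1: treatment > control
    return {name: count / n_sims for name, count in rejections.items()}


if __name__ == "__main__":
    for label, n_c in [("balanced", 100), ("unbalanced", 900)]:
        for cap in (None, 0.99):
            print(label, "cap:", cap, false_positive_rates(100, n_c, cap_quantile=cap))
```

With a fixed seed the run is reproducible, but the exact rates depend on the assumed distribution and sample sizes, so they should be read as qualitative only.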
The Impact of Balanced and Unbalanced Allocation on Differences between Groups
So far, we have focused on the statistical properties of unbalanced allocation, highlighting its ability to reduce experiment duration and the importance of sample size, hypothesis type and winsorization for maintaining test validity. In practice, however, unbalanced designs can make experiments more vulnerable to operational effects that would otherwise affect groups more evenly. When group sizes differ substantially, system dynamics can interact with the allocation itself, introducing unintended biases.
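For example, in an unbalanced design, the smaller treatment group may be more affected by cookie churn. Experiments typically rely on cookies to track users across sessions. Since users sometimes lose or refresh cookies, those in smaller groups are more likely to be re-randomized into a different group: under a 10–90 split, a re-randomized treatment user lands in control with probability 0.9, whereas a re-randomized control user lands in treatment with probability only 0.1. This can have two key consequences: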
Another potential issue arises when groups share system resources. Unbalanced designs can introduce bias if users compete for limited resources, such as LRU caches. The larger variant naturally occupies more cache entries, which can give it a performance advantage (e.g., faster page loads) over the smaller variant. This may confound experimental results by creating an artificial boost unrelated to the treatment itself.
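The cache effect is easy to reproduce in a toy simulation. The sketch below assumes that cached entries are variant-specific (for example, rendered pages that differ between variants), that both variants share one LRU cache, and that each request picks a key uniformly at random. The function name, key counts, and cache size are hypothetical values chosen for illustration.

```python
import random
from collections import OrderedDict


def simulate_shared_lru(treat_share=0.1, capacity=100, keys_per_variant=100,
                        n_requests=50_000, seed=0):
    """Per-variant hit rate when both variants share a single LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()  # insertion order doubles as recency order
    hits = {"control": 0, "treatment": 0}
    requests = {"control": 0, "treatment": 0}
    for _ in range(n_requests):
        variant = "treatment" if rng.random() < treat_share else "control"
        key = (variant, rng.randrange(keys_per_variant))
        requests[variant] += 1
        if key in cache:
            hits[variant] += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return {v: hits[v] / requests[v] for v in hits}


if __name__ == "__main__":
    print(simulate_shared_lru())
```

Because control keys are requested about nine times as often, they are refreshed before eviction far more frequently, so the control variant sees a higher hit rate even though nothing about the treatment itself differs.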
To Balance or Not to Balance: Choosing the Right Experiment Design
The goal of this post was to highlight key considerations when unequal allocation is needed for business reasons. While unbalanced designs (e.g., a 10–90 split) can reduce experiment duration, several factors require careful attention. Let’s summarize the main takeaways:
While the main advantage of unbalanced allocation is faster experiments, operational factors can introduce unintended biases. As a best practice, running an A/A test with the intended allocation ratio helps ensure the design does not create sample ratio mismatches or other unintended differences between variants. Ultimately, there is no one-size-fits-all answer: choosing between balanced and unbalanced designs depends on the context and the characteristics of the testing environment and system.
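As one way to act on the A/A recommendation, the sketch below checks observed group sizes against the intended allocation with a chi-square goodness-of-fit test. It uses only the standard library: for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x / 2)). The function name and the example counts are hypothetical, and the threshold for flagging a mismatch is left to the analyst.

```python
import math


def srm_p_value(n_treat, n_control, expected_treat_share):
    """P-value of a chi-square test (1 degree of freedom) for sample ratio mismatch."""
    total = n_treat + n_control
    expected_treat = total * expected_treat_share
    expected_control = total - expected_treat
    chi2 = ((n_treat - expected_treat) ** 2 / expected_treat
            + (n_control - expected_control) ** 2 / expected_control)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Intended 10-90 split; hypothetical observed counts from an A/A run.
    print(srm_p_value(1_000, 9_000, 0.10))  # counts match the intended split
    print(srm_p_value(900, 9_100, 0.10))    # treatment is 10% short of its target
```

A very small p-value suggests that the assignment or logging pipeline is not delivering the intended ratio, which is worth resolving before interpreting any metric differences.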