You can have it all: Parallel Testing in A/B Testing


Allon Korem
Chief Executive Officer

Note: This post was written in collaboration with Oryah Lancry-Dayan, Lead Statistician

In today's fast-paced industry, analysts are under constant pressure to run rapid experiments and deliver quick insights. However, many struggle to keep up with these demands, especially in companies that operate under the constraint that only one A/B test can run at a time for a given aspect of the product. This approach creates a bottleneck, delaying the launch of new experiments until the ongoing one is completed.

In this blog, we challenge the idea that A/B testing must always be conducted one at a time, advocating instead for parallel testing. We’ll explain why running multiple experiments simultaneously is not only feasible but also beneficial, explore its key advantages and potential challenges, and share best practices for successful implementation. If you're looking to accelerate your experimentation process, you’ve come to the right place!

Why Test in Parallel?

Well, perhaps the first question to ask is: why not? Analysts often hesitate to run parallel tests due to a common concern that simultaneous experiments might interfere with each other, making it difficult to isolate the impact of a specific change. In extreme cases, there’s even a fear that parallel changes could cancel each other out, causing a genuine effect to go undetected.

While this concern is understandable, statisticians have developed methods to manage such possible interactions between treatments. As we’ll explore below, we can assess whether two treatments influence each other; specifically, whether the effect of one treatment remains consistent across all levels of the other. In most cases, given that individual changes in A/B testing tend to have relatively small effects, the likelihood of significant interference is low. Regardless, we’ll outline how to estimate and interpret these interactions in the last section of this blog.

Now that we can set aside concerns about the feasibility of parallel testing, it is time to discuss why you would want to test in parallel in the first place:

  • Accelerate A/B Testing Timelines - As discussed earlier, limiting experiments to a one-at-a-time approach creates a bottleneck that slows down the overall testing process. Often, new experiments are ready to launch but remain on hold until the current test concludes. By removing the restriction of sequential testing, analysts can significantly increase the pace of A/B testing, enabling faster insights and more efficient product development.
  • Enhance Statistical Power - When tests are conducted sequentially, analysts often feel pressured to shorten their duration to keep the testing pipeline moving. To achieve this, they may compromise statistical power to reduce the required sample size. As a result, the risk of missing meaningful effects increases. While minimizing sample size is also a consideration in parallel testing, the urgency is reduced since one experiment does not delay the next. Thus, shifting to a parallel testing approach can help maintain high statistical power and reduce the likelihood of overlooking impactful changes.
  • Unlock Deeper Insights Through Complex Experimentation - While analysts often worry about interactions between tests, discovering such interactions can be valuable. For example, if you're testing how color and font size impact revenue, and the real improvement comes from their combination rather than each factor alone, you would only uncover this insight by testing them in parallel. Running experiments simultaneously allows for a more comprehensive understanding of how multiple factors interact, leading to more informed decision-making.

What Should You Watch Out For?

The feasibility of parallel testing does not mean that any set of tests can automatically be run simultaneously. Before launching tests in parallel, there are two key considerations to keep in mind:

  • Avoid Negative Product Experiences - When implementing multiple changes to a product simultaneously, it’s essential to ensure that their combined effect doesn’t result in a negative user experience. For example, some experiments may involve sending notifications via email. While a single email campaign can add value, running several tests that each trigger email notifications can overwhelm users, creating a spam-like experience that may ultimately lead them to unsubscribe from the service.
  • Consider Interaction Between Treatments - While it’s possible to test for interactions between two treatments, it’s important to avoid scenarios where multiple changes may unintentionally cancel each other out. For instance, you might want to assess the effectiveness of pop-ups in drawing user attention to specific features on your website. However, running several tests with different pop-ups at the same time can dilute their impact, as each one competes for attention in different areas of the page, leading to inconsistent user experiences and inconclusive results.

How Can You Test in Parallel Effectively?

Testing in parallel is conceptually similar to testing one experiment at a time, but with an important preliminary step: checking for interaction effects between the tests. Statistically, this means examining whether the effect of one treatment varies across the levels of the other treatment.

To conduct an interaction analysis, the first step is to create an interaction variable by multiplying the two (or more) independent variables that encode the treatment groups. This interaction variable captures the combined effect of the treatments across their various levels, allowing analysts to assess whether it significantly predicts the key performance indicator (KPI).
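As a rough sketch (with hypothetical column names), the interaction variable for two binary treatments is simply the elementwise product of the two treatment indicators:

```python
import pandas as pd

# Hypothetical per-user assignment data: 1 = test variant, 0 = control
df = pd.DataFrame({
    "treatment_a": [0, 0, 1, 1],
    "treatment_b": [0, 1, 0, 1],
})

# The interaction variable equals 1 only for users exposed to both test variants
df["interaction"] = df["treatment_a"] * df["treatment_b"]
print(df)
```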

  • If the interaction effect is not significant, you can proceed with analyzing each test separately, as the treatments do not influence each other. This allows you to treat each test as independent.
  • However, in the rare cases where the interaction is significant, you need to interpret the combined effect carefully and adjust your analysis to account for the interaction. In this case, you should look at each test separately and examine how it is influenced by the other test. Based on these insights, you may want to revisit the design or consider new hypotheses about how the treatments work together.

To better understand how parallel testing works in practice, let’s walk through a classic example: testing changes to a product’s purchase button. Suppose Test A investigates the impact of button color (control: white, test: black), while Test B examines the effect of font color (control: red, test: blue). Your main KPI is conversion rate (CR), the percentage of users who complete a purchase.

The first step is to create dummy variables for each level of the tests. For instance, we can code button color as 0 for white (control) and 1 for black (test), and apply a similar coding for font color (e.g., 0 for red, 1 for blue). To capture potential interaction effects, we then include a new variable that represents the product of these two dummy variables. Regression models, easily applied in R or Python, allow you to incorporate this interaction term into your model and assess whether the combination of treatments leads to a different outcome than expected from their individual effects. 
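For illustration, here is a minimal sketch in Python on simulated data, using the statsmodels formula API; the column names (button_black, font_blue, converted) and the effect sizes are hypothetical, not results from an actual experiment:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated per-user data (hypothetical column names):
#   button_black: 1 if the user saw the black button, 0 for white (control)
#   font_blue:    1 if the user saw the blue font, 0 for red (control)
#   converted:    1 if the user completed a purchase, 0 otherwise
rng = np.random.default_rng(42)
n = 20_000
df = pd.DataFrame({
    "button_black": rng.integers(0, 2, size=n),
    "font_blue": rng.integers(0, 2, size=n),
})

# Hypothetical effects: a 20% baseline conversion rate, small main effects,
# and an extra lift only when both test variants appear together (interaction)
p = (0.20
     + 0.02 * df["button_black"]
     + 0.01 * df["font_blue"]
     + 0.03 * df["button_black"] * df["font_blue"])
df["converted"] = rng.binomial(1, p)

# In the formula, `*` expands to both main effects plus their product,
# so the button_black:font_blue coefficient estimates the interaction
model = smf.logit("converted ~ button_black * font_blue", data=df).fit()
print(model.summary())
print("Interaction p-value:", model.pvalues["button_black:font_blue"])
```

A non-significant interaction term suggests each test can be analyzed on its own; a significant one means the combined variants should be compared directly, as discussed in the previous section.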

The illustration below outlines two possible scenarios for this kind of test:

Figure 1. Pipeline of parallel testing

As shown in the illustration, the tests do not need to be perfectly aligned in time. Instead, we can analyze potential interactions by focusing on the overlap period, when users are exposed to both tests simultaneously (the parallel testing phase shown in the graph). The figure considers two possible scenarios:

  1. No significant interaction between the two tests (top) - This is the more common case. In this scenario, the effect of one test remains consistent across the variations of the other. For instance, the black button consistently shows a 10% higher conversion rate than the white button, regardless of the font color. Here, we can safely analyze each test independently. The results would indicate that the black button performs better, and the blue font is more effective. The optimal combination, therefore, would be a black button with blue print.
  2. Significant interaction between the two tests (bottom) - In rarer cases, the two tests may interact, meaning the effect of one test depends on the variation of the other. In the illustration, for example, font color has a different impact depending on the button color. While conversion rates are similar for both font colors on the white button, the blue font performs better on the black button. In this case, we must examine the combinations of test variations rather than analyzing each test in isolation. This leads us to the conclusion that the best-performing combination is the black button with blue print.

This illustration clearly shows the benefits of parallel testing. In the first scenario, the primary advantage is time efficiency: we don’t need to wait for one test to conclude before starting the next. In the second scenario, the benefits go beyond just saving time. Parallel testing allows us to uncover insights that would be missed if the tests were run sequentially. For example, suppose we start with Test A (button color) and find no significant difference (since the conversion rate is 20% for both button colors). Based on this, we choose to stick with the white button. Next, we run Test B (font color) and find that the blue print performs better, leading us to adopt a white button with blue print. However, as the illustration shows, this combination isn’t the optimal one. Only by testing both variables in parallel and observing their interaction can we identify that the best-performing combination is actually the black button with blue print.

Conclusions

Parallel testing provides a straightforward and effective way to accelerate A/B testing cycles. While concerns about interactions between treatments and potential negative product experiences are valid, these issues can be mitigated with careful planning and analysis. Specifically, by examining the interaction effect, you can determine whether each treatment can be assessed independently. So, why wait? There’s no reason to hold back your experiments any longer!
