Note: This post was written in collaboration with Oryah Lancry-Dayan, Lead Statistician
In today's fast-paced industry, analysts are under constant pressure to run rapid experiments and deliver quick insights. However, many struggle to keep up with these demands, especially in companies that operate under the constraint that only one A/B test can run at a time for a given aspect of the product. This approach creates a bottleneck, delaying the launch of new experiments until the ongoing one is completed.
In this blog, we challenge the idea that A/B testing must always be conducted one at a time, advocating instead for parallel testing. We’ll explain why running multiple experiments simultaneously is not only feasible but also beneficial, explore its key advantages and potential challenges, and share best practices for successful implementation. If you're looking to accelerate your experimentation process, you’ve come to the right place!
Why run tests in parallel? Well, perhaps the first question to ask is the opposite one: why not? Analysts often hesitate to run parallel tests due to a common concern that simultaneous experiments might interfere with each other, making it difficult to isolate the impact of a specific change. In extreme cases, there’s even a fear that parallel changes could cancel each other out, causing a genuine effect to go undetected.
While this concern is understandable, statisticians have developed methods to manage such possible interactions between treatments. As we’ll explore below, we can assess whether two treatments influence each other; specifically, whether the effect of one treatment remains consistent across all levels of the other. In most cases, given that individual changes in A/B testing tend to have relatively small effects, the likelihood of significant interference is low. Regardless, we’ll outline how to estimate and interpret these interactions in the last section of this blog.
Now that we can set aside concerns about the feasibility of parallel testing, it is time to discuss why we would want parallel testing in the first place. The payoff is twofold: parallel tests shorten the experimentation cycle, since we no longer wait for one test to finish before launching the next, and they reveal how treatments behave in combination, insights that sequential testing would simply miss.
The feasibility of parallel testing does not mean that any two tests can automatically be run simultaneously. Before launching tests in parallel, there are two key considerations to keep in mind: whether the treatments are likely to interact with each other, and whether exposing users to several changes at once could harm the product experience.
Testing in parallel is conceptually similar to testing one experiment at a time, but with an important preliminary step: checking for interaction effects between the tests. Statistically, this means examining whether the effect of one treatment varies across the levels of the other treatment.
To conduct an interaction analysis, the first step is to create an interaction variable by multiplying the two (or more) independent variables that encode the treatment groups. This interaction variable captures the combined effect of the treatments across their various levels, allowing analysts to assess whether it significantly predicts the key performance indicator (KPI).
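To make this step concrete, here is a minimal sketch in Python. It assumes user-level data with one row per user; the data frame and the column names (`treatment_a`, `treatment_b`, `converted`) are hypothetical placeholders, not from an actual experiment.

```python
import pandas as pd

# Hypothetical user-level data: each treatment is a 0/1 dummy (0 = control, 1 = test)
# and the KPI is a binary conversion flag.
df = pd.DataFrame({
    "treatment_a": [0, 0, 1, 1, 0, 1],
    "treatment_b": [0, 1, 0, 1, 1, 0],
    "converted":   [0, 1, 0, 1, 0, 1],
})

# The interaction variable is simply the product of the two dummies:
# it equals 1 only for users who received both test treatments at once.
df["interaction"] = df["treatment_a"] * df["treatment_b"]
```

In practice the dummy variables come from the experiments’ assignment logs rather than being typed by hand; the key point is that the interaction column is just their product.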
To better understand how parallel testing works in practice, let’s walk through a classic example: testing changes to a product’s purchase button. Suppose Test A investigates the impact of button color (control: white, test: black), while Test B examines the effect of font color (control: red, test: blue). Your main KPI is conversion rate (CR), the percentage of users who complete a purchase.
The first step is to create dummy variables for each level of the tests. For instance, we can code button color as 0 for white (control) and 1 for black (test), and apply a similar coding for font color (e.g., 0 for red, 1 for blue). To capture potential interaction effects, we then include a new variable that represents the product of these two dummy variables. Regression models, easily applied in R or Python, allow you to incorporate this interaction term into your model and assess whether the combination of treatments leads to a different outcome than expected from their individual effects.
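As an illustration, here is a minimal sketch of such a model in Python using statsmodels’ formula interface. The data are simulated and the effect sizes are invented purely for demonstration; in practice you would use the user-level data collected during the overlap period of the two tests.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 4000

# Hypothetical user-level data: each user is independently randomized
# into each test, so all four combinations occur.
df = pd.DataFrame({
    "button_black": rng.integers(0, 2, n),  # 0 = white button, 1 = black button
    "font_blue":    rng.integers(0, 2, n),  # 0 = red font, 1 = blue font
})

# Simulate conversions with a small lift for the blue font and an extra lift
# when it is paired with the black button (i.e. a true interaction).
p = 0.20 + 0.03 * df["font_blue"] + 0.04 * df["button_black"] * df["font_blue"]
df["converted"] = rng.binomial(1, p)

# "button_black * font_blue" expands to both main effects plus their product,
# so the interaction term is estimated alongside the individual effects.
model = smf.logit("converted ~ button_black * font_blue", data=df).fit()
print(model.summary())

# The coefficient on button_black:font_blue estimates the interaction:
# if it is not significant, each treatment can be interpreted on its own;
# if it is, the effect of one treatment depends on the level of the other.
```

A logistic regression is a natural choice here because the KPI is binary; for a continuous KPI, an ordinary least squares model with the same interaction term works just as well.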
The illustration below outlines two possible scenarios for this kind of test:
As shown in the illustration, the tests do not need to be perfectly aligned in time. Instead, we can analyze potential interactions by focusing on the overlap period, when users are exposed to both tests simultaneously (the parallel testing phase shown in the graph). The figure considers two possible scenarios: one in which the treatments do not interact, and one in which the effect of one treatment depends on the level of the other.
This illustration clearly shows the benefits of parallel testing. In the first scenario, the primary advantage is time efficiency: we don’t need to wait for one test to conclude before starting the next. In the second scenario, the benefits go beyond just saving time. Parallel testing allows us to uncover insights that would be missed if the tests were run sequentially. For example, suppose we start with Test A (button color) and find no significant difference (since the conversion rate is 20% for both button colors). Based on this, we choose to stick with the white button. Next, we run Test B (font color) and find that the blue font performs better, leading us to adopt a white button with blue font. However, as the illustration shows, this combination isn’t the optimal one. Only by testing both variables in parallel and observing their interaction can we identify that the best-performing combination is actually the black button with blue font.
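To make the arithmetic of this trap concrete, here is a tiny sketch with hypothetical conversion rates chosen only to mirror the scenario above (the 20% figure comes from the example; the other numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical conversion rates for the four combinations in the second scenario.
rates = pd.DataFrame(
    {"red font": [0.20, 0.20], "blue font": [0.23, 0.27]},
    index=["white button", "black button"],
)

# Sequential path: Test A (red font fixed) sees 20% vs. 20%  -> keep the white button.
# Test B next (white button fixed) sees 23% > 20%            -> ship white + blue (23%).
# Looking at all four cells at once reveals the true optimum:
best_combo = rates.stack().idxmax()
print(best_combo, rates.stack().max())  # ('black button', 'blue font') 0.27
```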
Parallel testing provides a straightforward and effective way to accelerate A/B testing cycles. While concerns about interactions between treatments and potential negative product experiences are valid, these issues can be mitigated with careful planning and analysis. Specifically, by examining the interaction effect, you can determine whether each treatment can be assessed independently. So, why wait? There’s no reason to hold back your experiments any longer!