
Is It Correct To Make Multiple Design Changes In A Single Test Variation?

Tagged As: Question

As you look at and analyze a given UI screen or flow, design ideas for how you might improve it rush into your consciousness. Here you are faced with two general approaches: you either a/b test each change individually, or you group some of them together into a single variation. Which is right?

Your Opinion [Poll Closed]

My Thoughts & Experience [Updated Sep 27, 2018]

First of all, thank you for voting, everyone! I’m surprised we received that many votes. Thanks again!

As for the results, it clearly looks like more people are in favor of isolating a single change within an experiment. This, I think, is often motivated by a desire to understand whether a given change has an impact or not (as seen in the comments below). Alternatively, grouping multiple changes together in a single variation is done with the hope of achieving a higher overall impact, on the assumption that most of the changes are positive and stack up. Of course, the reality is that multiple changes also run the risk of cancelling each other out (with some being negative and some positive).
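To make that stacking assumption concrete, here is a minimal sketch (with made-up lift numbers, purely for illustration) of how individual changes would combine if we treated them as independent and multiplicative:

```python
# A minimal sketch: combining hypothetical conversion lifts multiplicatively.
# The numbers are invented for illustration, not real test results.

def combined_lift(lifts):
    """Combine individual relative lifts, treating them as independent."""
    total = 1.0
    for lift in lifts:
        total *= (1.0 + lift)
    return total - 1.0

print(combined_lift([0.10, 0.08]))   # two positives stack: ~+18.8%
print(combined_lift([0.10, -0.05]))  # a negative partially cancels: ~+4.5%
```

Under this simplified view, one bad change doesn’t just dilute a variation – it can erase the gains of the good ones, which is exactly the cancellation risk described above.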

How Do Bigger Vs Smaller Tests Compare In Terms Of Impact?

To answer the above question, we should look at evidence of how larger tests (with multiple changes) compare to smaller tests (with isolated changes). Luckily, we have some data on this.

For a number of years we’ve been running larger tests for our clients and writing about them under the GoodUI Datastories project. Some of these stories include retests of failed attempts, so the median impact is slightly inflated. Nevertheless, when we look across 26 such projects, we see a median impact of 23%.

Now in comparison, we have also been collecting more isolated test results as patterns. Currently, we have published 159 smaller a/b tests with more isolated changes, showing a median impact of 6.6%.

This comparison is our first signal validating the potential of grouping changes into single variations.

Should We Make As Many Changes As We Can Think Of?

Quite recently we were also privileged to run an interesting experiment on a landing page for a premium service of an online driving school. I just want to focus on the high-level test setup, which was designed in the following way: the A variant acted as the existing control, the B variant combined a handful of high-probability changes based on already-tested patterns, and the C variant packed in as many changes as possible.

As you can imagine, the results were the following:

[Figure: Test results comparing high probability changes (B) to as many changes as possible (C)]

This suggestive test (it was stopped early for external business reasons) is a subtle demonstration that “more changes” are not necessarily better, as seen in the C variation. Instead, the B variation combined only a handful of changes based on already-tested patterns (all with a net positive probability). Essentially, we were using positive past test results to decide which changes to group together. This approach, taken in the B variant, outperformed both C and A.
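For illustration, here is a hypothetical sketch of that selection logic; the pattern names, probabilities, and the 50% threshold are all assumptions of mine, not the actual data behind this test:

```python
# Hypothetical sketch: build a "B-style" variation by grouping only changes
# whose past tests suggest a net positive probability of helping.

past_patterns = [
    {"name": "shorter form",     "positive_probability": 0.86},
    {"name": "benefit headline", "positive_probability": 0.74},
    {"name": "social proof",     "positive_probability": 0.61},
    {"name": "image carousel",   "positive_probability": 0.42},  # likely negative
]

THRESHOLD = 0.5  # "net positive": more likely to help than to hurt

variation_b = [p["name"] for p in past_patterns
               if p["positive_probability"] > THRESHOLD]

print(variation_b)  # ['shorter form', 'benefit headline', 'social proof']
```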

Summarizing This As Principles

Taking all of the above into consideration, we have come up with the following two guiding principles to be used on our projects:

Designing Experiments

IF: YOU WANT THE BIGGEST IMPACT IN THE SHORTEST TIME
THEN: GROUP HIGH PROBABILITY IDEAS INTO 1 VARIATION
Designing Experiments

IF: YOU WANT TO LEARN IF A CHANGE HAS AN EFFECT
THEN: ISOLATE THE CHANGE INTO 1 VARIATION
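Read as a decision rule, the two principles might look like this minimal sketch (the goal labels are my own shorthand, not formal terminology):

```python
# A minimal sketch of the two principles as a decision rule.
# The goal labels are shorthand assumptions, not formal terminology.

def design_experiment(goal):
    if goal == "biggest impact in the shortest time":
        return "Group high probability ideas into 1 variation"
    if goal == "learn if a change has an effect":
        return "Isolate the change into 1 variation"
    raise ValueError(f"Unknown goal: {goal}")

print(design_experiment("learn if a change has an effect"))
```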

To make things more interesting, please also share your thoughts on why you voted a particular way in a comment. Perhaps there are cases when both answers are true? Let’s talk about it.



Posted by Jakub Linowski on Sep 24, 2018

Comments
Annand

I think it’s just important to understand the trade-off of testing atomically versus testing multiple changes at once. At that point, you can decide what makes the most sense for your tests.

Rob Spangler

On an outdated page with multiple problems, it’s simply too expensive to test individual updates. In that case, I’d suggest determining a winner and then going back through to test individual elements by priority. On the other hand, you may have a page that’s working generally well and needs several enhancements. If these are tested all at once in a single variation, you’ll never know which update caused the difference. Perhaps your conversion went up 10% due to the CTA, but went down 5% due to the heading. You’ll only see +5%, but remain blind to the cause. What…

Steve Place

I would say generally no; however, it depends on what you’re testing. Subtle design changes in a single variation could be okay, but several copy or flow changes would not be.

elisa

Actually, it depends on the reason why those changes may occur. Is it to improve the UX based on data results (a high abandonment rate on a specific page)? Is it just based on assumptions (“I am sure this will look or work better if we make those changes”)? I believe that any requested change should be data-driven (when available, of course) vs. assumption-based. Once an issue has been identified through GA or tools such as Hotjar, for instance, the idea would be to implement the change(s) related to a specific issue. Then to analyze all the changes based…

Bryce Lee

For an optimization test, I would say no. But you can’t optimize your way to a new design, so if you are redesigning the site, you can do a benchmarking test before and after, where everything has changed.

Liz Pok Lynch

I agree that in a perfect testing world, you’d want to isolate each variable. But in the imperfect real world, where you might need a very large impact very quickly, I think it becomes “ok” to change multiple things in a single test. It’s critical that everyone understand that that’s the case, however, so no one points to a single change and says that’s what drove the response. If you really need to understand the impact of a single variable, then you’d better isolate that variable.

Vito

Even a single change may consist of multiple micro-changes. Still, to better understand the cause of a certain result, it is better to limit the variations.

Neil

Making multiple changes in one variant is not best practice, as you really need to be able to determine the impact of each change – not all of them may show a positive uplift.

However, running an experiment between two wildly different experiences can help save time by confirming a new baseline, should the new experience win, to run future iterative changes on.

Richard

In my opinion both approaches are right, and it all comes down to the purpose of the given test. I see two basic options – a) to enhance the overall user flow / UX of a given web page, or b) to optimize the performance of a page by enhancing certain website sections. In other words, if your goal is to fine-tune the performance of specific website elements, it should be more effective to test each of these changes separately. But if you try to enhance a more complex behavior-based experience, I believe it would be necessary to make several connected changes that could cause that…

Adrian

It’s perfectly OK to group multiple changes into a single variation. It gives you a greater chance of getting bigger lifts. Generally, small changes = small gains. Also, unless you have a lot of traffic, it can take too long to test multiple variations/iterations to get the significance required to gauge a real win. You need to go for 10%+ lifts in conversion because any lower, particularly in the 3–5% range, and the winners often ‘disappear’ for a number of reasons – natural variance/noise being just one.

Joerg

In my experience, multiple changes in one test are fine as long as all of them aim at the same hypothesis. Especially with lower-traffic sites, our test strategy is more about confirming the right hypothesis than about measuring the incremental impact of each individual change (which would take ages to reach significance, given the low contrast between variations with only subtle design changes).