Better Experiments: Increase Testing Velocity

Increasing Testing Velocity

The more design experiments you can run, the more likely you are to uncover new insights with positive effects for your site (unless all of your tests flop, of course, which is rather unlikely). Put another way, if you don't test, it's a given that you will not identify any interesting (and possibly profitable) new gains. So the number of tests you can execute in a given time frame is a key starting criterion for getting better at optimizing sites for conversions. Here are the approaches we have learned over the years to improve this important factor:

1) Start That Test Sooner - Start It Today

Chances are that you have a screen (homepage, checkout, search results, landing page, shopping cart page) that is exposed to traffic but not generating any new insights. People are coming and going, and you are not learning anything from that precious traffic. From this perspective, I strongly believe that it's better to test anything than nothing (improving win rates and effect sizes is something we'll cover in a future article). Non-testing days should be avoided due to regret and opportunity cost. Initially, it's good practice to start with an easy test and, while it's running, work on something with a greater probability of success.

TOOL RECOMMENDATION: VWO is a great starter A/B testing tool if you are looking for something better than free (Google Optimize) but not as expensive as, say, Optimizely.

2) Agile Stop Rules

Some companies test using fixed time frames, which limits them to a predictable number of tests, nothing more, nothing less (e.g., one test per week or one test per month are common). Moving to a more agile, flexible time-frame approach lets you stop less promising tests faster and thus increases your testing velocity.

EXAMPLE: One such stopping rule that we might apply on projects is to stop a test if: 1) either the control or the variation reaches 100 conversions, and 2) you have a negative result with a p-value of less than 0.03. These thresholds can be adjusted to what you find acceptable. The point is that once you are becoming certain that you are exposing your site to a loss, there is no need to run a hopeless test for its complete pre-established time frame. Agile stop rules allow you to cut your losses sooner and move on to more promising tests.
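As a minimal sketch, the stop rule above could be coded as follows. This assumes a one-sided, pooled two-proportion z-test as the significance check (the article doesn't specify which test); the function name and thresholds are illustrative, with the 100-conversion and 0.03 figures taken from the example.

```python
from math import erf, sqrt

def should_stop_early(conv_a, n_a, conv_b, n_b,
                      min_conversions=100, alpha=0.03):
    """Agile stop rule: stop if either arm has at least
    `min_conversions` AND the variation is significantly
    worse than the control (one-sided p-value < alpha)."""
    if max(conv_a, conv_b) < min_conversions:
        return False  # rule 1 not met yet; keep the test running
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # pooled two-proportion z-test
    p = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # one-sided p-value for "variation is worse than control"
    p_value = 0.5 * (1 + erf(z / sqrt(2)))
    return z < 0 and p_value < alpha

# Control: 120/1000 (12%), variation: 70/1000 (7%) -> clearly losing, stop.
print(should_stop_early(120, 1000, 70, 1000))
```

A real program might also add an early-stopping correction (peeking at results repeatedly inflates false-positive rates), which is the kind of detail a statistician would tune per test.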

3) Parallel A/B Testing

Another area where huge testing velocity gains are possible is the number of parallel tests run together. Instead of running a single test, you could be running one test on your checkout, another on your homepage, and another on a paid-traffic landing page, all at the same time. This is an area of debate, but most experts believe the benefits usually outweigh the risk of data pollution (from increased variance). We definitely advocate this approach and recommend the following tips when running parallel tests:

  • Avoid running two tests on the same page (unless you have the right tools and professional statisticians to help you analyze the results)
  • Avoid testing a similar idea in two or more tests at the same time (e.g., social proof being tested on a homepage while social proof is also being tested on a subsequent search results screen)
  • Try to keep parallel tests as far away from each other as possible (e.g., it's fine to run a test on your homepage and a separate one on a checkout screen that's 4-5 steps away)
  • Try to track different primary metrics for parallel tests (e.g., a homepage test could track signups, while a checkout test could track sales)
  • If you are really worried about parallel test interactions, you can always exclude participants who enter one test from the second test
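The last tip, excluding participants in one test from another, can be done with deterministic hash bucketing so a user always lands in the same arm. This is a hypothetical sketch (the test names, 50/50 splits, and helper functions are all illustrative, not from the article):

```python
import hashlib

def bucket(user_id: str, salt: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket 0..99 for a given salt,
    so the same user always gets the same assignment."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def assign(user_id: str) -> dict:
    """Assign a user to at most ONE of two parallel tests:
    users who enter the homepage test are excluded from the checkout test."""
    assignments = {}
    if bucket(user_id, "homepage_test") < 50:       # 50% enter homepage test
        arm = "variation" if bucket(user_id, "homepage_test:arm") < 50 else "control"
        assignments["homepage_test"] = arm
    elif bucket(user_id, "checkout_test") < 50:     # remainder eligible here
        arm = "variation" if bucket(user_id, "checkout_test:arm") < 50 else "control"
        assignments["checkout_test"] = arm
    return assignments
```

Because each test uses a different salt, the splits are independent, and the `elif` guarantees mutual exclusion when you want zero interaction between tests.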

4) Prebuild Tests

Finally, we typically see slowdowns when a test completes and the question comes up: what should we test next? Days or weeks can go by before the next test starts, which is a big missed opportunity for testing velocity. Instead of slowing down, make sure you have tests that are prebuilt, checked, and ready to run as soon as a running test stops.
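One way to keep that pipeline full is a prioritized backlog of prebuilt tests, popping the most promising one the moment a slot frees up. A minimal sketch (the class name and the expected-impact scoring are hypothetical, not something the article prescribes):

```python
import heapq

class TestBacklog:
    """Priority queue of prebuilt, QA-checked tests, ready to launch
    as soon as a running test stops."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker preserves insertion order

    def add(self, name: str, expected_impact: float):
        # Negate impact so the highest-impact test pops first (min-heap).
        heapq.heappush(self._heap, (-expected_impact, self._counter, name))
        self._counter += 1

    def next_test(self):
        """Return the most promising prebuilt test, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

backlog = TestBacklog()
backlog.add("checkout-copy", 0.02)
backlog.add("homepage-hero", 0.05)
print(backlog.next_test())  # highest expected impact launches first
```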

What about you? How have you managed to increase testing velocity?


  • WpWebhelp · 5 years ago

    A/B testing is one of the most important things to do when considering the UX side. Especially in today's era, as search engines also look at UX in their ranking algorithms.

  • Cameron Howieson · 5 years ago

    These are great ideas that make a lot of sense. Solid post, Jakub.

    At Opencare, we have this challenge and have started to run design sprints on our tests before we run an A/B test. This allows us to get 4-5 qualitative data points in one day that tell us two primary things:

    a) Point out any glaring issues with the experiment, e.g. comprehension of copy, unnoticed buttons, unexpected user behaviour.

    b) Gives us a sense of the enthusiasm for the change. If users don't notice it, act more quickly, or say "yeah, this is better / clearer," it tends to imply that the change won't make a big difference. For example, we recently changed one of our landing pages. Previously, almost everyone would scroll down to get more information. In our design sprint with the new design, every person went straight to the call to action without scrolling. Once A/B tested, we saw a 60% increase in clicks on the call to action.

  • brian birkhead · 5 years ago

    Hi Jakub

    V. interesting article.

    Just a few points:

    1. the process of how you infer & learn from tests is equally as important as how many tests you do
    2. these sequential inferences should drive the design of the next set of tests, so the idea of pre-building tests may not always be a good idea
    3. at some point there needs to be a balance struck between how many customers get put into tests and how many are treated with winning variants from previous tests (i.e. maximizing the exploitation of learning)
    4. I'm one of the strongest advocates of Testing you'll ever find, but a cycle of:
    hypothesis - screen - design - test - learn - (exploitation, new hypotheses)
    should lie at the heart of all testing programmes to temper the enthusiasm to test everything that moves
    5. "screen" in point 4. above refers to filtering hypotheses by their potential bottom-line impact before testing & provides an effective check on that enthusiasm
    6. in terms of early stopping - rather than use rules of thumb, there is a body of statistical theory which will provide the rule specific to each test you undertake