Better Experiments: Predict & Run Your Highest Impact A/B Tests First
Some A/B tests are better than others, and it is in your best interest to run your best test ideas first. From a business perspective, I think the criteria for selecting your best tests should include at least two things: highest impact and lowest effort. Effort is the easier one to estimate once you talk to your developers. But how do you predict the impact of your upcoming tests? Here is how we do it.
For Completely New Ideas: We Simply Don’t Know
If we are testing a new design idea for which no past data exists, then we admit that we simply don’t know what effect to expect. We stopped ranking ideas as low, medium, and high impact based on pure guesswork and started acknowledging total uncertainty in these cases. I think that’s healthy and honest: it’s fine not to know.
For Tested Ideas: Median Data
The more tests we run (and the more we talk to people who run A/B tests), the more we realize that people test similar changes. If we group similar tests together and remember their effects, we can begin to sort (and run) the highest impact tests first. We built GoodUI Fastforward with exactly this predictive mindset. For each pattern, defined by a change or set of changes, we calculate median data from past test results. We like medians because they are skewed less than averages when big-effect outliers (possibly from underpowered tests) are introduced.
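To make the case for medians concrete, here is a minimal sketch comparing the mean and the median of one pattern's past effects. The numbers are made up for illustration and are not real test results; the last value stands in for a big-effect outlier of the kind an underpowered test might produce.

```python
from statistics import mean, median

# Hypothetical relative effects (in %) from past tests of a single pattern.
# The last value plays the role of a big-effect outlier.
effects = [4.0, 6.0, 8.0, 10.0, 90.0]

mean_effect = mean(effects)      # pulled upward by the outlier
median_effect = median(effects)  # stays at the middle value

print(f"mean: {mean_effect:.1f}%, median: {median_effect:.1f}%")
```

In this made-up sample the single outlier roughly triples the mean, while the median still reflects what a typical test of the pattern showed.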
We also go one step further and calculate the medians of both shallow and deep metrics. Shallow metrics for us usually measure low-intent visits to the next immediate step. Deeper metrics, such as real signups or real sales, measure the higher intent of people who actually carry through multiple steps all the way to the end.
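The grouping described above could be sketched roughly as follows. The pattern names, the "shallow" and "deep" labels as strings, the function name median_effects, and all effect values are assumptions made for illustration; this is not how GoodUI Fastforward is actually implemented.

```python
from collections import defaultdict
from statistics import median

# Hypothetical past results: (pattern, metric depth, relative effect in %).
past_tests = [
    ("pattern_a", "shallow", 12.0),
    ("pattern_a", "shallow", 20.0),
    ("pattern_a", "deep", 5.0),
    ("pattern_a", "deep", 9.0),
    ("pattern_a", "deep", 3.0),
    ("pattern_b", "shallow", -2.0),
    ("pattern_b", "deep", 1.0),
]


def median_effects(tests):
    """Return {pattern: {depth: (median effect, number of tests)}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for pattern, depth, effect in tests:
        grouped[pattern][depth].append(effect)
    return {
        pattern: {depth: (median(values), len(values))
                  for depth, values in depths.items()}
        for pattern, depths in grouped.items()
    }


summary = median_effects(past_tests)
for pattern, depths in summary.items():
    for depth, (effect, count) in depths.items():
        print(f"{pattern} {depth}: {effect:+.1f}% from {count} test(s)")
```

Keeping the test count next to each median makes it easy to see how much evidence stands behind a number before trusting it.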
Here is an example of a few patterns that contain test data for checkout pages. As you can see, the No Coupon Field pattern so far has the highest median effect (+24%) based on 3 tests, followed by the Fewer Form Field pattern (+13%) that is based on 4 tests.
When we optimize multiple websites, having such median data is very useful in helping us select the highest impact ideas. And the amazing thing is that with each test we run, our defined patterns become more accurate as their medians are corrected with new results.
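As a rough sketch of how such a selection could work, the snippet below ranks ideas by their median effect and keeps untested ideas in a separate list instead of guessing a rank for them. The two medians and test counts are the checkout figures quoted above; the third idea, the field names, and the prioritize function are hypothetical.

```python
# Median effects (in %) for the two checkout patterns quoted above.
# The third entry is a hypothetical new idea with no past data.
ideas = [
    {"name": "Fewer Form Fields", "median_effect": 13.0, "tests": 4},
    {"name": "No Coupon Field", "median_effect": 24.0, "tests": 3},
    {"name": "Brand new idea", "median_effect": None, "tests": 0},
]


def prioritize(ideas):
    """Split ideas into a ranked list (known medians) and an unknown list."""
    known = [idea for idea in ideas if idea["median_effect"] is not None]
    unknown = [idea for idea in ideas if idea["median_effect"] is None]
    ranked = sorted(known, key=lambda idea: idea["median_effect"], reverse=True)
    return ranked, unknown


ranked, unknown = prioritize(ideas)
for idea in ranked:
    print(f"{idea['name']}: {idea['median_effect']:+.0f}% ({idea['tests']} tests)")
for idea in unknown:
    print(f"{idea['name']}: effect unknown")
```

Effort is left out of this sketch; one could add an effort estimate from the developers as a second sort key.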