Simulating How Long To Run Your Test
How much time is enough for the true performance of your variations to come through the noise?
In this video, we'll see a simulation of an A/A/B/C/D test as it moves from an initial state dominated by chance towards a state of equilibrium. Along the way, we observe how the measured performance of each variation can change over time due to chance alone, and what sorts of intermediate outcomes we can expect. How does a false positive tend to behave over time? What is a true +10% winner likely to do halfway into the test? Answering these questions helps me interpret real tests.
To speed things up, this simulation is based on a 20% baseline conversion rate and 1,000 visitor hits per day. The duration of 10 days is just an example. In your real tests, the conversion rate might be as low as 1%, which means it would take far longer to get to a similar equilibrium.
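If you'd like to try something similar yourself, here is a minimal sketch of such a simulation in Python. It is not the code behind the video. The arm names, the true lifts for B, C and D, and the even split of the 1,000 daily visitors across the five arms are all assumptions I made for illustration; only the 20% baseline, the daily traffic and the 10 days come from the setup above.

```python
import random

BASELINE = 0.20          # baseline conversion rate from the post
VISITORS_PER_DAY = 1000  # daily visitors; assumed to be split evenly across arms
DAYS = 10                # example duration from the post

# Hypothetical true relative lifts (the post does not state them).
# A1 and A2 are identical, so any gap between them is pure noise.
TRUE_LIFTS = {"A1": 0.0, "A2": 0.0, "B": 0.10, "C": 0.05, "D": -0.05}


def simulate(seed=0):
    """Return a list of (day, observed cumulative lift of each arm vs. A1)."""
    rng = random.Random(seed)
    per_arm = VISITORS_PER_DAY // len(TRUE_LIFTS)
    visitors = {arm: 0 for arm in TRUE_LIFTS}
    conversions = {arm: 0 for arm in TRUE_LIFTS}
    history = []
    for day in range(1, DAYS + 1):
        for arm, lift in TRUE_LIFTS.items():
            rate = BASELINE * (1 + lift)
            visitors[arm] += per_arm
            # Each visitor converts independently with probability `rate`.
            conversions[arm] += sum(rng.random() < rate for _ in range(per_arm))
        control_rate = conversions["A1"] / visitors["A1"]
        observed = {
            arm: (conversions[arm] / visitors[arm]) / control_rate - 1
            for arm in TRUE_LIFTS
            if arm != "A1"
        }
        history.append((day, observed))
    return history


if __name__ == "__main__":
    for day, observed in simulate():
        row = "  ".join(f"{arm}: {lift:+.1%}" for arm, lift in observed.items())
        print(f"day {day:2d}  {row}")
```

Run it with a few different seeds and watch the A2 column: it has no true lift, so any movement there is noise, and it gives a feel for how far the other arms can drift by chance in the early days.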
- Use Evan Miller's Sample Size Calculator to calculate the sample size needed to detect a 10% relative lift over a 20% baseline (the answer is at the bottom of this post). Leave the statistical power and significance level at their defaults.
- Rewatch the simulation video and see how the test behaves as it approaches this sample size target.
- Consider: How accurate is the relative performance of each variation at this point? What sort of outcomes are still possible by chance alone that would obscure the true performance of the variations? Based on this simulation, would you run your test for longer or shorter than this target?
Do you use simulations for planning and analysis? Share with us.
(Answer: 6,347 visitors per variation)
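To check this figure without the calculator, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. I'm assuming 80% power and a 5% two-sided significance level as the defaults, and that the calculator uses a formula of this form; the function name is my own.

```python
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation (two-sided z-test on proportions)."""
    delta = baseline * relative_lift               # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    # Standard deviation terms under "no difference" and under "lift of delta".
    sd_null = (2 * baseline * (1 - baseline)) ** 0.5
    sd_alt = (
        baseline * (1 - baseline)
        + (baseline + delta) * (1 - baseline - delta)
    ) ** 0.5
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / delta ** 2


if __name__ == "__main__":
    n = sample_size_per_variation(baseline=0.20, relative_lift=0.10)
    print(f"about {round(n):,} visitors per variation")
```

With these inputs the result should land at roughly the figure above. Try a 1% baseline with the same relative lift to see how sharply the required sample grows.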