A/B Testing Sample Size – The Convergence Method
Running A/B split tests is a common practice for the internet marketer. The approach is fairly straightforward: run two concurrent versions of your ad, landing page, or other creative, one being the control (the current version) and the other your treatment (the version you think will do better). While the approach is fairly simple, quite a few people get hung up on how to determine the correct sample size. There are plenty of resources out there, with recommendations ranging from always using 100 conversions to complex statistical formulas. While you can pull a number out of a hat or run some complex math, there is a much easier method: the Convergence Method.
It's pretty simple. Let's break it down:
In a typical test you run the following:
- A - Control
- B - Treatment
You stop the test once you hit the predetermined sample size you spent hours calculating.
Using the Convergence Method, you run the following:
- A1 - Control #1
- A2 - Control #2
- B - Your treatment
By monitoring the convergence in performance of the two identical controls, you're able to understand when you've hit your minimum sample size. In other words, once the cumulative performance of A1 and A2 comes within your required variance (the largest gap you will accept between the two controls' cumulative conversion rates), the overall results of your test are statistically valid. An acceptable variance might be 0.5%, or whatever you choose.
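One way to set up the three arms is to bucket each visitor deterministically, so the same person always sees the same version. The sketch below is only an illustration: the `assign_arm` function, the `salt` value, and the idea of keying on a visitor ID are assumptions, not part of the method as described.

```python
import hashlib

# Two identical controls plus one treatment, as in the list above.
ARMS = ("A1", "A2", "B")

def assign_arm(visitor_id: str, salt: str = "test-001") -> str:
    """Map a visitor to one of the three arms (hypothetical helper).

    Hashing the ID keeps the assignment stable across visits and splits
    traffic roughly evenly; the salt lets a new test reshuffle visitors.
    """
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]

# The same visitor always lands in the same arm.
arm = assign_arm("visitor-12345")
```

A1 and A2 must serve exactly the same creative; only the bucket label differs, which is what lets their gap stand in for sampling noise.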
Here's what it looks like:
As you can see above, the cumulative conversion variance between A1 and A2 decreases over time and eventually settles into a relatively steady variance of approximately ±0.1% around hour 13. It's at this point that we consider the overall results of the test valid. It's also important to note that your measure must be computed cumulatively over time (total conversions to date divided by total visitors to date), not at discrete time intervals.
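The cumulative check described above can be sketched in a few lines of Python. Everything here is illustrative: the function names, the hourly numbers, and the choice to read the tolerance as an absolute gap in conversion rate (0.005 meaning 0.5 percentage points) are assumptions, not figures from the test shown.

```python
from itertools import accumulate

def cumulative_rates(visits, conversions):
    """Conversion rate to date after each period, from running totals."""
    total_visits = accumulate(visits)
    total_conversions = accumulate(conversions)
    return [c / v if v else 0.0 for v, c in zip(total_visits, total_conversions)]

def convergence_index(rates_a1, rates_a2, tolerance=0.005):
    """First period from which |A1 - A2| stays within tolerance.

    "Stays" means through the end of the data seen so far; returns None
    if the two controls have not yet settled.
    """
    gaps = [abs(x - y) for x, y in zip(rates_a1, rates_a2)]
    index = None
    # Walk backwards so one late spike resets the convergence point.
    for i in range(len(gaps) - 1, -1, -1):
        if gaps[i] > tolerance:
            break
        index = i
    return index

# Made-up hourly data: 100 visitors per hour in each control.
a1 = cumulative_rates([100] * 6, [8, 3, 6, 5, 5, 5])
a2 = cumulative_rates([100] * 6, [3, 6, 5, 5, 6, 5])
settled_at = convergence_index(a1, a2, tolerance=0.005)  # index 4, the fifth hour
```

Because the check only looks at data collected so far, it makes sense to rerun it as new hours arrive and to wait until the index stops moving before reading the A versus B result.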
In addition to monitoring the convergence of the controls, you'll also want to consider the length of the test based on potential variability in performance over a given time period. In this test I chose 24 hours. You may want to run it longer if you feel the day of the week would affect your control and treatment differently. Don't get too worried about this though. Odds are that most time-of-day and day-of-week variances will affect your control and treatment equally.
A lot of the material out there on calculating sample sizes assumes you need to calculate it before you run the test. In the offline world this is usually true: you need to know how many mailers to print or survey candidates to find. With internet marketing tests you generally have access to real-time performance measures, which is what enables this method to work.
So put down that stats book and calculator and use convergence in your next test.

LinkedIn – jeffbollinger
Twitter @jbollinger