How to Determine Your A/B Testing Sample Size & Time Frame

When it comes to email marketing, A/B testing is one of the most powerful tactics for optimizing your results. By comparing the performance of different versions of an email, you can identify what resonates best with your audience and continuously improve your email program.

However, running a successful email A/B test requires more than just coming up with two variations and seeing which one gets more opens or clicks. To get trustworthy, statistically valid results that you can act on with confidence, you need to put careful thought into your sample size and testing time frame.

Send your test to too small an audience or end it too soon and your results may not be statistically significant. But wait too long and you pay an opportunity cost: every extra hour spent testing is an hour the winning variation isn’t going out to the rest of your list.

Determining the right sample size and testing duration for your email A/B tests is both an art and a science. This post will break down the key factors to consider and walk through how to calculate your ideal email testing parameters.

Factors That Impact Email A/B Test Sample Size and Duration

There are a few key variables that impact how many recipients should be included in your email A/B test and how long the test should run:

  1. Overall email list size
  2. Typical email engagement rates
  3. Desired confidence level and margin of error
  4. Urgency to get results
  5. Number of variations

Let’s look at each one in more detail.

Overall email list size

The size of your email list is one of the biggest factors that determines your required A/B test sample size. The larger your full email list, the smaller the proportion you need to include in your test to reach statistically significant results.

As a general rule of thumb, aim to have at least 1,000 contacts in each variation in order to have a big enough sample size relative to the full list. If you have a list of 10,000 contacts, that means allocating about 20% of your list to the A/B test. But if you have a list of 100,000 contacts, you only need to devote about 2% of your list to the test.

When your email list is small, reaching statistical significance becomes much harder. If your list has fewer than 1,000 contacts total, you likely need to allocate such a large portion to the A/B test (over 80%) that you might as well just do a 50/50 split test of the full list.
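
To make that arithmetic concrete, here is a minimal Python sketch that turns the 1,000-contacts-per-variation rule of thumb into a share of the overall list. The function name and the figures are just the rule-of-thumb numbers from above, not the output of a statistical formula.

    def share_of_list_in_test(list_size, contacts_per_variation=1_000, variations=2):
        """Fraction of the list consumed by an A/B test of a given size."""
        return contacts_per_variation * variations / list_size

    for list_size in (10_000, 100_000):
        pct = share_of_list_in_test(list_size) * 100
        print(f"List of {list_size:,} contacts: test uses {pct:.0f}% of the list")

    # List of 10,000 contacts: test uses 20% of the list
    # List of 100,000 contacts: test uses 2% of the list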

Typical email engagement rates

Your typical email engagement rates impact the number of recipients needed to get reliable results. The lower your standard open and click rates, the more contacts you need to include to reach the same level of precision.

For example, if your typical click rate is 5%, then a list of 1,000 contacts will yield about 50 data points (clicks). But if your typical click rate is 1%, then you need a list of 5,000 to produce that same 50 clicks.
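
In code, that scaling looks like the sketch below: it estimates how many recipients you would need to expect a given number of clicks at your typical click rate. The 50-click target is simply the illustrative figure from this example, not a recommended threshold.

    import math

    def recipients_for_expected_clicks(target_clicks, click_rate):
        """Recipients needed for the expected click count to reach the target."""
        return math.ceil(target_clicks / click_rate)

    print(recipients_for_expected_clicks(50, 0.05))  # 1000 recipients at a 5% click rate
    print(recipients_for_expected_clicks(50, 0.01))  # 5000 recipients at a 1% click rate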

Before running an A/B test, look at the metrics from your 5-10 most recent email campaigns to get a sense of your average open, click, and conversion rates. This will help inform your sample size calculation.

Desired confidence level and margin of error

In statistics, confidence level refers to how confident you can be that your sample results accurately represent the full population. A 95% confidence level is the common standard for most research; it means that if you repeated the test 100 times, about 95 of those results would fall within the margin of error of the true value.

Margin of error is the range around your observed result within which the true value is likely to fall. For example, if Variation A had a 25% open rate and a margin of error of 5%, then the true open rate for the full population could be anywhere from 20-30%.

To increase your confidence level or decrease your margin of error, you need a larger sample size. Deciding on the right balance depends on the importance of the metric you’re measuring and the risk tolerance for your business.

For most email marketing A/B tests, a 95% confidence level and 5% margin of error provide sufficient precision. Use this as a starting point, but consider adjusting up or down based on the stakes of the test.
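
If you want to sanity-check the margin of error on a result you have already observed, a normal-approximation (Wald) interval works as a back-of-the-envelope tool. The sketch below is just that approximation; the 25% open rate matches the example above, while the 1,000-recipient sample size is a hypothetical figure for illustration.

    from statistics import NormalDist

    def margin_of_error(rate, n, confidence=0.95):
        """Normal-approximation (Wald) margin of error for an observed rate."""
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95% confidence
        return z * (rate * (1 - rate) / n) ** 0.5

    rate, n = 0.25, 1_000                    # hypothetical: 25% open rate on 1,000 sends
    moe = margin_of_error(rate, n)
    print(f"Open rate: {rate:.0%} +/- {moe:.1%}")              # about 25% +/- 2.7%
    print(f"Plausible range: {rate - moe:.1%} to {rate + moe:.1%}")

Shrink the sample to roughly 300 recipients and the same 25% open rate carries about the 5% margin of error used in the example above.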

Urgency to get results

The urgency with which you need test results impacts how long you can let your test run. If you have flexibility on timing, you can extend your test duration to a few days to gather the most complete data.

However, if you‘re testing time-sensitive content like a flash sale announcement or new product launch email, you may only have a few hours to get results before you need to send to the full list. Determining winners based on a shorter time frame is not ideal but is sometimes necessary to ensure your email is still relevant.

When running tests on a short turnaround, send your test to a larger proportion of your list to reach your required sample size more quickly. Just be sure to still reserve a large enough remainder of the list to send the winning variation to.

Number of variations

Most email A/B tests involve comparing 2 variations against each other – version A and version B. But some marketers choose to test 3 or more variations in a single test.

The more variations you include, the more contacts you need in your sample for each one. Otherwise you spread your sample too thin.

For example, if you’re testing 2 variations of an email to a list of 10,000 contacts, you might send each version to 2,000 contacts (20% of the list each, 40% in total). But if you’re testing 3 variations, you need to send each version to about 2,600 contacts (26% each, or roughly 78% of the list in total) to maintain the same level of statistical significance.

In general, it’s best to limit the number of variations in a single A/B test to 2-3 to avoid diluting your sample size. If you want to try out multiple variations, run a series of tests rather than testing them all at once.
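
One common way calculators account for extra variations, though not the only one and not necessarily what your tool does, is to tighten the confidence level applied to each pairwise comparison (a Bonferroni-style correction), which in turn raises the sample needed per variation. The sketch below illustrates only that adjustment; the function name is my own.

    def per_comparison_confidence(overall_confidence, variations):
        """Bonferroni-style adjustment: split the allowed error rate across
        every comparison of a challenger against the control."""
        comparisons = variations - 1
        alpha = (1 - overall_confidence) / comparisons
        return 1 - alpha

    for variations in (2, 3, 4):
        conf = per_comparison_confidence(0.95, variations)
        print(f"{variations} variations -> run each comparison at {conf:.1%} confidence")

    # 2 variations -> 95.0%, 3 variations -> 97.5%, 4 variations -> 98.3%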

How to Calculate Sample Size for an Email A/B Test

Now that you understand the key factors involved in email A/B test sample size, here are the steps to actually calculate it:

  1. Determine your baseline conversion rate from past email campaigns. This is typically your click rate. Let’s say it’s 4%.

  2. Choose your minimum detectable effect (MDE). This is the lift you want to be able to detect from your test. A 10-20% relative change is a good starting point. Let’s say we want to detect a 20% lift.

  3. Set your desired confidence level and margin of error. We’ll use the standard of 95% and 5%.

  4. Plug these numbers into a sample size calculator. There are many free tools available online. Using one I found with a quick Google search, I entered these parameters:

Baseline conversion rate: 4%
MDE: 20%
Confidence level: 95%
Margin of error: 5%

The calculator shows we need a sample size of about 3,300 contacts for each variation. So if we’re testing 2 variations, the total sample size should be approximately 6,600 contacts, split evenly between the two.

If your email list has 50,000 contacts, this means allocating about 13.2% of the list to the A/B test. 3,300 contacts would receive Variation A, another 3,300 contacts would receive Variation B, and the remaining 43,400 contacts would receive the winning variation after the test concludes.
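
If you would rather script the calculation than rely on an online tool, the sketch below implements a standard two-proportion sample-size approximation. It adds an explicit statistical power parameter (80% is a common default) that the walkthrough above does not mention, and different calculators make different assumptions about power and one- versus two-sided tests, so its output will not necessarily match the roughly 3,300-per-variation figure quoted above.

    import math
    from statistics import NormalDist

    def sample_size_per_variation(baseline_rate, relative_mde,
                                  confidence=0.95, power=0.80):
        """Approximate contacts per variation to detect a relative lift
        with a two-sided, two-proportion z-test."""
        p1 = baseline_rate
        p2 = baseline_rate * (1 + relative_mde)
        z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        z_beta = NormalDist().inv_cdf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

    n = sample_size_per_variation(0.04, 0.20)   # 4% baseline click rate, 20% relative lift
    print(f"Contacts per variation: {n:,}")
    print(f"Total for a two-way test: {2 * n:,}")

If the number this returns is noticeably larger than what your calculator reports, the gap usually comes down to those defaults rather than an error on either side.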

Here’s a step-by-step overview of how this might play out:

  • Email list size: 50,000 contacts
  • Baseline click rate: 4%
  • Variations: 2
  • Desired confidence level: 95%
  • Desired margin of error: 5%
  • Minimum detectable effect: 20% lift (4.8% vs. 4%)
  • Contacts per variation needed: 3,300
  • Total test sample size: 6,600
  • Percentage of list in test: 13.2%
  • A/B test duration: 24 hours
  • Winning variation click rate: 4.6%
  • Contacts receiving winning variation: 43,400

Of course, all of these parameters can be adjusted based on your unique goals and constraints. But this provides a general framework for approaching email A/B test sample size and duration.

How to Determine Email A/B Test Duration

In addition to sample size, you also need to decide how long to run your email A/B test before determining a winner. The right testing time frame depends on the nature of your email and how quickly you need to act on the results.

If you have flexibility, look at past email sends to see when engagement tends to trail off. Calculate the percentage of total opens or clicks that occur in the first 4 hours, 8 hours, 12 hours, 24 hours, and 48 hours after sending.

For example, you might find that for your list:

  • 15% of opens occur in the first 4 hours
  • 35% of opens occur in the first 8 hours
  • 55% of opens occur in the first 12 hours
  • 75% of opens occur in the first 24 hours
  • 90% of opens occur in the first 48 hours

There are diminishing returns to waiting longer than 24 hours in this scenario. The vast majority of engagement happens within the first day, so you can confidently call the test at the 24-hour mark and send the winning version to the rest of your list without sacrificing much data.
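
If you keep raw engagement data from past sends, this analysis is easy to script. The sketch below uses hypothetical open counts that mirror the percentages above and picks the earliest window that captures a chosen share of total opens; the 75% coverage target is just an example cut-off, not a universal rule.

    # Hypothetical cumulative open counts by hours since send, out of 1,000 total opens.
    cumulative_opens = {4: 150, 8: 350, 12: 550, 24: 750, 48: 900}
    total_opens = 1_000

    def earliest_window(opens_by_window, total, coverage=0.75):
        """Earliest window whose cumulative opens reach the coverage target."""
        for hours in sorted(opens_by_window):
            if opens_by_window[hours] / total >= coverage:
                return hours
        return max(opens_by_window)

    for hours in sorted(cumulative_opens):
        print(f"First {hours:>2} hours: {cumulative_opens[hours] / total_opens:.0%} of opens")

    print("Suggested test duration:", earliest_window(cumulative_opens, total_opens), "hours")
    # Suggested test duration: 24 hours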

Of course, if your email is promoting a flash sale or other time-bound offer, you may need to declare a winner even sooner, like within a 2-4 hour window. While not ideal from a statistical perspective, some data is better than flying blind. Just be sure to adjust your sample size to reach your threshold more quickly.

Some email testing tools have a built-in feature that automatically detects when a winner reaches statistical significance and immediately sends that version to the remaining list. This is very useful for maximizing the impact of your tests while gathering sufficient data.

Challenges of A/B Testing with Small Email Lists

As noted above, reaching statistical significance with email lists under about 1,000 contacts is very difficult. The percentage of your list you would need to include in your test to be representative of the full population becomes impractically large.

If you have a list of 500 contacts and want to test two variations at 95% confidence, you would need a combined sample of around 430 contacts (a bit over 215 per variation), or 86% of your list. At that point, you might as well just split the list down the middle, send one version to each half, and compare results.
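
One way to see where a figure like that comes from is the survey-style sample size formula with a finite population correction, using the conservative p = 0.5 assumption. The sketch below follows that approach; it is an approximation, not necessarily the formula behind any particular calculator.

    from statistics import NormalDist

    def representative_sample(population, confidence=0.95, margin_of_error=0.05, p=0.5):
        """Survey-style sample size with a finite population correction."""
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2   # ~385 before the correction
        return round(n0 / (1 + (n0 - 1) / population))     # finite population correction

    list_size = 500
    per_variation = representative_sample(list_size)
    total = 2 * per_variation
    print(f"Per variation: {per_variation} contacts")
    print(f"Total for a two-way test: {total} ({total / list_size:.0%} of the list)")
    # Per variation: 217 contacts; total for a two-way test: 434 (87% of the list)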

While this 50/50 split testing approach won’t produce statistically significant results, it’s still useful directional data as you’re growing your list. You can use your learnings to inform future email strategies. Once your list reaches a substantial size, you can transition to sending a small portion an A/B test and then immediately sending the winner to the rest of the list.

Some other tactics to consider when you have a small email list:

  • Combine similar segments into a larger list for testing. For example, rather than sending separate tests to your lists of 300 prospects and 400 customers, combine them into a single test to a list of 700.

  • If you have multiple small email lists, run the same A/B test to all of them and aggregate your results. Just be sure your lists are relatively similar for this data consolidation to be useful.

  • Test more substantial changes. When your sample size is limited, you likely won’t have enough precision to detect small performance differences. Focus your tests on elements that will produce larger swings, like completely different subject lines or CTAs.

  • Run tests at a cadence that aligns with your list size. If you only have 1,000 contacts, running an A/B test every week will quickly wear out your list. Instead, focus on a small number of high-impact tests each month or quarter.

There’s no magic threshold at which A/B testing suddenly becomes possible, but the larger your list, the more flexibility you have to run insightful tests. Growing your email list should be an ongoing priority for all email marketers.

Conclusion

A/B testing is an indispensable tool for enhancing your email marketing program. But to actually benefit from your tests, you need to be mathematical in your approach to sample size and duration. Sending tests to an arbitrary percentage of your list or waiting too long (or not long enough) for results can lead you astray.

Luckily, getting your email testing variables right doesn’t have to involve overly complicated calculations. By gathering a few key inputs like your baseline engagement rates, desired confidence levels, and required turnaround time, you can plug the numbers into a sample size calculator and be on your way.

You should now have the knowledge to approach your next email A/B test strategically to produce reliable, meaningful results. Adhering to statistical principles and best practices will help you maximize learnings, avoid costly mistakes, and continuously improve your email performance.