A/B Testing Myths and Quagmires
By Justin Talerico
Apr 16 2019
I love testing. I’ve tested everything I could for the last 11 years or so. As a bootstrapped SaaS CEO masquerading as a CMO, I found testing to be the mother lode of certainty that helped us waste less capital. But there are some A/B testing misconceptions I feel compelled to share. There are situations in which testing is a waste of time and others in which it’s downright destructive to the business. So here are my eight ways to preserve growth and momentum by avoiding those A/B testing quagmires.
1. Sample Size is a Blocker
If your sample size is niche-startup small, you can’t test much of anything. Small samples rule out everything except perhaps the most black-and-white alternatives. Why? The more different the alternatives are, the faster a test will likely reach statistical significance. If you’re A/B testing shades of gray, you’ll need a much larger sample to find an answer.
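To make the shades-of-gray point concrete, here is a minimal Python sketch of the standard normal-approximation sample size formula for comparing two conversion rates. The 95% confidence level, 80% statistical power, 5% baseline, and the function name sample_size_per_arm are illustrative assumptions. They are not figures from any particular testing tool.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_a: float, p_b: float,
                        confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH arm of a two-sided A/B test
    comparing conversion rates p_a and p_b (normal approximation).
    The confidence and power defaults are illustrative assumptions."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)                      # quantile for desired power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    effect = abs(p_a - p_b)
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical baseline: 5% conversion.
baseline = 0.05
for variant in (0.10, 0.06, 0.055):  # big, medium, and "shades of gray" differences
    print(f"{baseline:.1%} vs {variant:.1%}: "
          f"{sample_size_per_arm(baseline, variant):,} visitors per arm")
```

Running it shows the required sample climbing steeply as the variant gets closer to the baseline. This happens because the estimate scales with one over the squared difference between the two rates.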
2. Statistical Confidence is a Fork in the Road
The certainty attributed to a result is its statistical confidence. Do you want to be 80% sure or 99% sure of the outcome? Everyone logically says 99%, but higher confidence takes longer to declare, and time is often the enemy of the agile SaaS. And, if your sample sizes are small and you want 99% confidence, you will likely pay off your mortgage before you have results. The flip side is that you have to go into your CEO’s office and tell her that you’re only 80% sure of an outcome. Is that enough? The answer to that question often depends on the context and value of the decision being made. And remember, testing is just tilting the odds in your favor. There are no absolute certainties.
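Here is a rough sketch of how the confidence choice turns into calendar time, using the same kind of normal-approximation estimate. The traffic (200 visitors a day), the 3% vs 4% conversion rates, the 80% statistical power, and the helper days_to_declare are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def days_to_declare(p_a: float, p_b: float, daily_visitors: int,
                    confidence: float, power: float = 0.80) -> int:
    """Rough number of days a two-arm test needs, assuming traffic is split
    evenly between A and B. The power default is an assumption."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    per_arm = (z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2
    return ceil(2 * per_arm / daily_visitors)  # both arms share the daily traffic


# Hypothetical small-SaaS page: 200 visitors/day, 3% vs 4% conversion.
for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: ~{days_to_declare(0.03, 0.04, 200, level)} days")
```

With these made-up inputs, moving from 80% to 99% confidence makes the test run roughly two and a half times as long.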
3. All A/B Tests are Not Equal
This ties back to both sample size and confidence. Not all A/B tests are of equal value, and consequently, they don’t all require the same certainty. For example, you might be message testing and be perfectly happy to declare winners at 80%. That keeps you responsive to the market while staying scientific in your process and decision making. At the other end of the spectrum, you might be A/B testing price for a mass-market product. In that case, there may be millions of dollars at stake, and you’d better be damn sure you’re right when you declare a winner (99%).
4. Variables Add Time
Just as comparing varieties of apples takes longer than comparing apples and oranges, adding variables adds time. The most common example is a clear A/B test muddied with C and D alternatives (an A/B/n test). The line often heard in the room is “well, let’s just test that one too.” So instead of two options, there are four. The same traffic is now split four ways instead of two, so each alternative gets half as many visitors, and it’s likely to take a lot longer to declare a winner. Another example is from back when we had multivariate testing (MVT) in the ion platform. Customers could test any number of elements within a page. That might mean six headlines, three images, three calls to action, and two forms. That’s 6 × 3 × 3 × 2 = 108 alternatives, and a recipe for a years-long test unless you have Google-homepage-level traffic. So just because you CAN test more options doesn’t mean you SHOULD. Sanity-check time and traffic first.
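The arithmetic behind that warning is simple enough to sanity-check in a few lines. This sketch multiplies out the element counts from the ion example. It then estimates how long each design would run with an even traffic split, under a hypothetical requirement of 5,000 visitors per variant and 1,000 visitors a day (both assumptions). The helper days_to_fill is also hypothetical.

```python
from math import prod

# Element counts from the MVT example: headlines, images, CTAs, forms.
elements = {"headline": 6, "image": 3, "call_to_action": 3, "form": 2}
mvt_variants = prod(elements.values())  # full factorial: every combination is a variant


def days_to_fill(variants: int, visitors_per_variant: int, daily_visitors: int) -> float:
    """Days until every variant reaches its target sample, assuming an even split."""
    return variants * visitors_per_variant / daily_visitors


# Hypothetical requirements: 5,000 visitors per variant, 1,000 visitors per day.
for label, variants in (("A/B", 2), ("A/B/C/D", 4), ("MVT", mvt_variants)):
    print(f"{label}: {variants} variants, ~{days_to_fill(variants, 5_000, 1_000):.0f} days")
```

Under these assumptions the A/B test fills in about 10 days and the four-way test in about 20. The 108-variant MVT needs about 540 days, which is roughly a year and a half.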
5. Email A/B Testing Trip Lines
Email marketing is ripe for A/B testing, but there are a couple of common ways it goes badly. The first is misjudging what is reasonably testable. Email is great for testing because you get a big, instant burst of traffic. So something like the subject line is perfect, because everyone in the test sample sees it (a significant sample) and opens are highly measurable. But once you get beyond the subject line to elements within the email, only the people who open see them. Your sample size is drastically smaller, and your time to declare is drastically longer. Email A/B testing is typically run on a sample of the overall drop (say, 20%) ahead of the drop, so time is not something you typically have lying around. You want to test, get a fast winner, and send that version to the other 80% of your list. The second trip line is the metric. Subject line tests should be measured by click-throughs more than by opens, because opens aren’t the goal; deeper engagement is. So there are your two email trip lines: what to test and what metric to measure. Oh, and by the way, if you have 100 customers and a customer nurture stream that drips to them, you can’t test (see 1 and 2 above).
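Here is a small sketch of why in-email elements are so much harder to test than subject lines. The list size, 20% test sample, two arms, 20% open rate, and the helper email_test_audience are hypothetical. The point is only that body content is seen by openers, not by the whole test sample.

```python
def email_test_audience(list_size: int, test_fraction: float,
                        arms: int, open_rate: float) -> dict:
    """Who actually sees each element in a pre-send email A/B test.

    Every recipient in the test sample sees the subject line, but only
    the recipients who open see anything inside the email body.
    All inputs are hypothetical planning numbers.
    """
    test_sample = round(list_size * test_fraction)
    per_arm = test_sample // arms
    return {
        "test_sample": test_sample,
        "rest_of_list_gets_winner": list_size - test_sample,
        "subject_line_per_arm": per_arm,
        "body_content_per_arm": round(per_arm * open_rate),
    }


# Hypothetical numbers: 50,000 contacts, 20% test sample, 2 arms, 20% open rate.
print(email_test_audience(50_000, 0.20, 2, 0.20))
```

With those made-up numbers, each subject line is judged by 5,000 recipients, but each in-body variant is judged by only about 1,000 openers. A click-based metric narrows the pool further.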
6. Overgeneralization of Results
Oh, this is a big one. Some testing folks are methodical and treat testing as an everyday discipline (that was me). Others think a result from one context can safely be applied to other contexts. Nope. Doesn’t work. That’s dangerous. I’ve seen the same A/B test yield 180° different results in different contexts (even different streams of traffic to the same A/B test). If you test in PPC and apply the findings to social, you’re kidding yourself to think that’s valid. Test everywhere you can, as often as you have the volume and value to justify it.
7. A/B Testing for No Good Reason
Someone at some point has to ask why. If the outcome of a test will not have value, don’t run the test. Testing isn’t busy work and it shouldn’t be done just as a party trick. When it’s limited to testing alternatives that are different and have organizational value, then it’s important and meaningful. If testing decisions degrade towards things that are merely interesting or clever, you’re missing the point and robbing energy from other areas that will more directly grow your SaaS.
8. Small Differences, Small Wins
I’ve alluded to this one but need to call it out on its own because it’s such a big problem. The easiest way to think about it is this: test big things for big gains and little things for little gains. Since all tests require roughly the same amount of work, why not swing for the fences and test big things that are fundamentally different from one another? The more different the alternatives are, the greater the potential gains. Other upsides are faster results and the chance to reach a higher confidence level.
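To illustrate how much more quickly a big difference can reach significance, this sketch runs a standard pooled two-proportion z-test on two hypothetical outcomes at the same sample size. The visitor and conversion counts are invented. Reporting 1 minus the p-value as "confidence" roughly mirrors the loose sense of confidence used in this post, not a strict statistical definition.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided pooled z-test; returns 1 - p-value as a loose 'confidence'
    that A and B really differ."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                 # rate assuming no real difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))   # standard error of the difference
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(z))
    return 1 - p_value


# Hypothetical: 2,000 visitors per arm, baseline converts 100 (5%).
small = two_proportion_confidence(100, 2000, 110, 2000)  # 5.0% vs 5.5%
big = two_proportion_confidence(100, 2000, 160, 2000)    # 5.0% vs 8.0%
print(f"small change: {small:.1%} confident, big change: {big:.1%} confident")
```

With 2,000 visitors per arm, the half-point lift stays far below even an 80% threshold, while the three-point lift clears 99%.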
So there are my eight A/B testing potholes to avoid on your road to SaaS growth utopia. I continue to believe in testing to tilt the odds in my favor and bring a semblance of scientific reasoning to the world of subjective marketing. So go and test messaging, pricing and conversion. And leave the adjective testing for someone less busy.