What is A/B testing and why do we need it?

A/B testing is the process of making a change to your product and determining whether that change altered your key metrics.

Your ‘product’ is whatever you’re working on that you’d like to improve. It could be a website or an app (or part of one). It could be an advert, a landing page, an email, a growth campaign, and so on.

Let’s call your current product A and the product you’d like to test B. B is the same as A but with one small difference. Half your audience will use A and the other half will use B.
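To make that split concrete, here is a minimal sketch, assuming each visitor has a stable identifier (the function name, experiment name, and user id below are illustrative, not a real library API). Hashing the identifier, rather than flipping a coin on every visit, means the same person always sees the same variant:

```python
# A minimal sketch of a 50/50 split, assuming each visitor has a
# stable user_id. The experiment name is mixed into the hash so the
# same user can land in different groups across different experiments.
import hashlib

def assign_variant(user_id: str, experiment: str = "subscribe-headline") -> str:
    """Deterministically assign a user to 'A' (control) or 'B' (challenger)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # a number from 0 to 99
    return "A" if bucket < 50 else "B"      # 50% control, 50% challenger

print(assign_variant("user-42"))  # always the same answer for user-42
```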

By constructing a rigorous A/B test (known as a controlled experiment) and analysing it correctly, we can compare the results from group A with the results from group B to see whether the small change we introduced in B caused a change in the performance of your product. B is called the “challenger” (or the “treatment”) and A is called the “control” (because no change has been made to your product in A, so we can use it as a fair comparison).

By “analysing it correctly” we mean using null hypothesis statistical significance testing and objective reasoning. The “results” we compare are the data measured during the experiment about how the people in group A and group B behaved. And the “performance of your product” means the measured values of your key metrics. We’ll cover all of these in later sections.

For example, say you work for a magazine and your product is managing the subscription side of the business. You’d like to see whether adding a headline proclaiming the number of existing subscribers will increase the number of new people who subscribe. You print half the magazines with the normal subscription advert (this is A) and half the magazines with the additional headline in the advert (this is B). You then mix up the magazines (like shuffling a deck of cards) and send them to the shops. When people buy a copy, they will get either A or B. Whenever you receive a subscription form you know whether it came from A or B. You can count how many A forms and B forms are returned to infer whether adding the headline increased the subscription rate.

[Figure: the magazine example]
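To make the counting concrete, here is a sketch with invented numbers (the print runs and form counts are hypothetical). It uses a two-proportion z-test, one common form of the significance testing mentioned above:

```python
# A worked version of the magazine example with made-up numbers:
# count the returned forms from each print run and test whether the
# difference in subscription rate is bigger than chance alone explains.
from math import sqrt
from scipy.stats import norm

forms_a, printed_a = 200, 10_000   # control: normal advert (hypothetical)
forms_b, printed_b = 250, 10_000   # challenger: advert with headline (hypothetical)

rate_a = forms_a / printed_a       # 2.0% subscription rate
rate_b = forms_b / printed_b       # 2.5% subscription rate

# Two-proportion z-test: under the null hypothesis the two rates are equal.
pooled = (forms_a + forms_b) / (printed_a + printed_b)
se = sqrt(pooled * (1 - pooled) * (1 / printed_a + 1 / printed_b))
z = (rate_b - rate_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided

print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```

A small p-value suggests the observed difference in rates would be surprising if the headline had no effect. We’ll define these terms properly in later sections.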

To make the best possible product we need to understand how well it works, how well it addresses your customers’ needs, and how well it solves the problem that you have identified as a business opportunity (even if that problem is “having fun”, because fun, after all, is the most important problem to solve).

We measure things like visitors, clicks, impressions, conversions, purchases, referrals, and so on, to gauge how our product is doing. These are examples of metrics, and the ones that matter most to you are called key metrics.

We A/B test ideas and run controlled experiments because we need to measure whether changes we want to introduce improve the performance of our product. If we cannot do this, then we are flying blind and relying on luck. By comparing the changed product with the original product, we can measure the difference in performance and decide whether we are moving in the right direction.

Why can’t I release my change and compare the metrics to the previous week’s metrics?

Unfortunately, it’s not as simple as making the change, measuring your key metrics, and comparing those values to your original product’s values. Lots of other things in the world could also have changed in the meantime. There are many factors that can affect these metrics and influence the performance of your product, not just the change you introduced. They could be in-house things that you were unaware of, like marketing campaigns or other changes to your product. They could be things you can’t control, like effects due to seasonality, bank holidays or major sports events. Alternatively, they could be things that you can’t predict, like extreme weather, political instability, plane crashes, or other breaking news events.

For example, imagine you wanted to reduce the time it takes to drive to work and tried a new route that was a shorter distance. On arriving at work, you are delighted to see that it took you 20 minutes less than your normal route. However, it was sunny that day and, unknown to you, more people than usual had decided to cycle instead of drive. Plus, it was a bank holiday for some people, which you didn’t realise, and which meant there were fewer cars on the road. So you may change your route only for the new one to take the same time or more under normal circumstances. The scenario is no better if the new route takes 20 minutes longer than your normal route. Say the trains happened to be on strike that day, so more people had to drive, and there were roadworks on a nearby street, so more cars than usual were using your road. You might rule out a new route that is actually quicker due to a higher-than-usual volume of traffic that day.

The aim with A/B testing is to remove as many external influences as possible in order to isolate the impact your change has had.

By randomly splitting your customers evenly between A and B they both experience the same conditions. The only difference between them is the change you are testing (in B). If everything else is the same then any observed differences between your metrics are probably* due to the change you introduced in B.
*We will use statistics to help us rule out any differences that could be caused by chance alone.
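The footnote is worth seeing in action. In this sketch (the 2% conversion rate and sample sizes are invented), both groups are identical, yet their measured rates still differ on every run, purely by chance:

```python
# A small simulation of the footnote's point: even when A and B are
# identical (both converting at a true 2%), random sampling alone
# produces observed differences between the groups.
import random

random.seed(1)
TRUE_RATE, N = 0.02, 10_000

for trial in range(5):
    conv_a = sum(random.random() < TRUE_RATE for _ in range(N))
    conv_b = sum(random.random() < TRUE_RATE for _ in range(N))
    print(f"trial {trial}: A = {conv_a / N:.2%}, B = {conv_b / N:.2%}")

# The rates differ slightly every time. Statistics tells us how big a
# difference has to be before chance becomes an unlikely explanation.
```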

[Figure: what caused this increase?]

Designing, running and analysing an experiment properly is hard. There are lots of things to consider and pitfalls to avoid.

In addition, we face two considerable pressures: wanting results quickly, and wanting our ideas to succeed.

It’s always tempting to skip steps and try to speed up the process. It’s human nature to notice the data that corroborates your idea while ignoring the data that argues against it (due to influences like confirmation bias and the IKEA effect, or due to pressure to deliver results). We need to resist these temptations. Each step in the process needs to be done well, because getting even a small part of it wrong can lead to utterly incorrect conclusions. We need to be impartial and objective, because the consequences of making a change to your product without a thorough understanding of its impact can be disastrous. Would you want to be responsible for introducing a change that makes the product worse and damages your company’s performance?

A lack of thorough knowledge of the experimental process is just as dangerous, and no less excusable. Making a change in good faith but with a flawed understanding could let performance decline undetected and put your company’s future at risk.

Remember: experiments are not about improving performance. They’re about learning. Even if an A/B test shows that the proposed change would make your key metrics worse, you’ll still gain priceless insights. You’ll increase your knowledge, and the more you understand about your customers and domain, the more likely you are to make better decisions and design even better A/B tests in the future. Plus, ruling out ideas is just as valuable as releasing changes: it saves you spending time, energy and money developing something that would eventually turn out to be a dead end.