A/B testing is popular among digital marketers, content strategists and web designers, and for good reason. Apart from increasing a website's conversion rate, it improves user engagement, carries low risk, reduces bounce rates and helps you produce better content. You probably already know how to run the good old A/B test and reap its many benefits. But if you don't understand how to use A/B testing to test prices, let's do a brief overview before heading to the meat of this article.
A/B testing is the process of comparing two variants against each other to figure out which is more effective. Putting it into practice, let's assume you want to test two different prices on your pricing page to see which leads to more revenue. Usually, the lower price looks more attractive and produces more conversions, but each sale brings in less money. Remember that your goal is to increase revenue, which is the price multiplied by the number of purchases, not the number of purchases alone. Thus, you create two pricing versions, version A and version B, and show them to different audiences for a period of time. At the end of the experiment, the version that performs better on revenue becomes your new price.
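To make the revenue objective concrete, here's a quick back-of-the-envelope comparison. The prices and conversion counts are hypothetical:

```python
# Hypothetical results from a two-price test: the cheaper price
# converts better but earns less revenue per visitor
visitors = 10_000

price_a, conversions_a = 9.99, 600    # 6.0% conversion
price_b, conversions_b = 14.99, 450   # 4.5% conversion

revenue_a = price_a * conversions_a   # 5,994.00
revenue_b = price_b * conversions_b   # 6,745.50

print(f"Revenue per visitor, A: {revenue_a / visitors:.3f}")  # 0.599
print(f"Revenue per visitor, B: {revenue_b / visitors:.3f}")  # 0.675
```

Here version A wins on conversions, but version B wins on revenue, so B becomes the new price.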
It becomes a different ballgame altogether when you’re testing prices for subscriptions since you have to take into account the following.
- You are likely to test for more than two prices. Sometimes you may not even know which price to set and may need to explore about five to 10 options.
- You are likely to waste money on poorly performing prices.
- You may be unable to reach statistical significance because paid conversions are rare, which leaves you with few positive samples.
- Your objective becomes more complicated: you are optimizing not just for conversions but for an increase in your company's revenue.
Judging from the above, A/B tests might not be the ideal method for every situation, since they run into the first three issues listed above. It's at this point we'll turn to other experiments to help us out.
What Are The Other Types Of Experiments?
To answer this question, let's first consider the characteristic features of an experiment. Every experiment is distinguished by a few attributes:
Number Of Arms
An arm represents one version of whatever you're testing. A/B testing is thus a two-arm test, while a test that requires three or more arms is a multi-arm test.
Experiment Duration
This is all about how long you plan on running an experiment. It is of two types:
Fixed duration: This is computed in advance from the website or mobile app traffic and the number of samples required for statistical significance (a sample-size sketch follows below). For example, you might run an experiment for exactly 35 days and only look at the final results when the time is up.
Flexible duration: This type involves running an experiment until one arm turns out to be statistically significantly better than the other arms.
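To make the fixed-duration calculation concrete, here's a minimal sketch of a standard two-proportion power calculation. The baseline conversion rate, detectable difference and daily traffic are assumptions for illustration, and the result depends on the significance level and power you pick:

```python
from scipy.stats import norm

def sample_size_per_arm(p_base, delta, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute change
    of `delta` in a baseline conversion rate `p_base`."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance
    z_power = norm.ppf(power)
    p1, p2 = p_base, p_base + delta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / delta ** 2

n = sample_size_per_arm(p_base=0.05, delta=0.01)
daily_traffic_per_arm = 250             # hypothetical
print(f"~{n:.0f} users per arm, ~{n / daily_traffic_per_arm:.0f} days")
```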
Traffic Split
There are two types of traffic split:
- Equal traffic split between arms, which is characteristic of the A/B test.
- Unequal traffic split, where the worse-performing arms receive less traffic so as to reduce the financial losses you would otherwise incur by selling your product at bad prices.
Experiment Types
Based on the above features, we can now divide price experiments into two major types.
A/B/N test family: This type has two or more arms, gets an equal traffic split and runs for a fixed, pre-computed duration; the experiment can't be stopped early. A/B/N tests are robust and inspire strong trust in their results.
Bayesian test family: This type also has two or more arms but gets either an equal or an unequal traffic split. It runs for a flexible duration, usually highlighting the winning arm much faster than an A/B test, and thus can be stopped at any time. But unlike A/B test results, Bayesian test results are harder to interpret. Moreover, the flexible duration makes it tempting to stop the experiment too early, which leads to incorrect results. Typically, multi-armed bandit algorithms are used for the Bayesian test family.
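For a sense of what interpreting a Bayesian result involves, here's a minimal sketch that compares two arms using Beta posteriors over their conversion rates. The visitor and conversion counts are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: visitors and paid conversions per arm
visitors_a, conversions_a = 4000, 190   # price A
visitors_b, conversions_b = 4000, 230   # price B

# With a uniform Beta(1, 1) prior, the posterior over each arm's
# conversion rate is Beta(1 + conversions, 1 + non-conversions)
post_a = rng.beta(1 + conversions_a, 1 + visitors_a - conversions_a, 100_000)
post_b = rng.beta(1 + conversions_b, 1 + visitors_b - conversions_b, 100_000)

# Probability that arm B converts better than arm A
print(f"P(B > A) = {np.mean(post_b > post_a):.3f}")
```

A value close to 1 means the data strongly favor arm B, but early in an experiment this probability swings around, which is exactly why stopping too soon is dangerous.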
How Does A Multi-Armed Bandit Work?
During a test, a multi-armed bandit algorithm automatically allocates more traffic to well-performing variations while reducing traffic to underperforming ones. This approach not only delivers high returns but also minimizes the regret caused by losing valuable conversions.
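Here's a minimal sketch of how that reallocation might work using Thompson sampling, one of the bandit algorithms listed later in this article. The prices and true conversion rates are hypothetical, and the arm is chosen by sampled conversion rate times price because the goal is revenue, not raw conversion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prices and their true conversion rates
# (the latter are unknown to the algorithm; used only for simulation)
prices = np.array([9.99, 14.99, 19.99, 24.99])
true_rates = np.array([0.08, 0.06, 0.04, 0.02])

# Beta(1, 1) prior over each arm's conversion rate
alpha = np.ones(len(prices))
beta = np.ones(len(prices))

for _ in range(10_000):                     # one iteration per visitor
    # Draw one plausible conversion rate per arm and show the price
    # with the highest expected revenue under that draw
    sampled_rates = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled_rates * prices))

    # Simulate whether this visitor converts at the shown price
    converted = rng.random() < true_rates[arm]

    # Update the posterior for the arm that was shown
    alpha[arm] += converted
    beta[arm] += 1 - converted

posterior_means = alpha / (alpha + beta)
best = int(np.argmax(prices * posterior_means))
print(f"Winning price: {prices[best]}, "
      f"estimated conversion: {posterior_means[best]:.3f}")
```

Badly performing arms quickly develop pessimistic posteriors and stop being shown, which is the reduced-traffic behavior described above.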
In any case, understand that each type of test is suited to different purposes. But for this article, we are making a case for why Bayesian-style experimentation (the multi-armed bandit) might be better than an A/B test.
Minimal Wastage
Because of the equal traffic split, A/B tests waste money on poorly performing arms, and this can't be prevented until the end of the experiment. In contrast, a multi-armed bandit allows you to reduce the amount of traffic sent to arms with poor performance.
Data Efficiency
Since A/B tests require a lot of samples to reach statistical significance, it's usually impractical to test more than two or three arms. For example, if paid conversion is 5%, then to detect a change in the paid conversion rate of ±1 percentage point, you will need 7,663 signed-up users for each arm (the sample-size sketch above shows how such numbers are derived). A multi-armed bandit, on the other hand, is more data-efficient and allows you to test more arms because traffic is not wasted on poorly performing ones. But be aware that the unequal traffic split makes the final results vulnerable to seasonality bias, which is not an issue with an equal split.
Difference Between The Bayesian Experiment And The Multi-Armed Bandit
The multi-armed bandit is an algorithm family, while the Bayesian approach is a way of interpreting the collected data and producing experiment results using a set of formulas from Bayesian statistics. Note that not every multi-armed bandit is Bayesian. Typical multi-armed bandits are (see the sketch after this list):
- Epsilon-greedy.
- UCB exploration.
- Thompson sampling (Bayesian).
- Contextual bandit (ML-based).
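For contrast with the Bayesian Thompson sampling sketch above, here's a minimal epsilon-greedy sketch, the simplest non-Bayesian member of the family. The reward probabilities are placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
n_arms = 4
epsilon = 0.1                        # fraction of traffic spent exploring

counts = np.zeros(n_arms)            # times each arm was shown
rewards = np.zeros(n_arms)           # total reward collected per arm

def choose_arm():
    # Explore a random arm with probability epsilon (or while some
    # arm is still untried); otherwise exploit the best average so far
    if rng.random() < epsilon or counts.min() == 0:
        return int(rng.integers(n_arms))
    return int(np.argmax(rewards / counts))

true_means = np.array([0.2, 0.5, 0.3, 0.4])   # hypothetical payoffs

for _ in range(5_000):
    arm = choose_arm()
    counts[arm] += 1
    rewards[arm] += float(rng.random() < true_means[arm])

print("Plays per arm:", counts.astype(int))
print("Estimated means:", np.round(rewards / counts, 3))
```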
Last Words From Practical Experience
In practice, it makes sense to combine both experiment types.
- Run a Bayesian experiment for a while to get your winning candidate.
- Run a strict A/B test to compare the winning candidate with the baseline price to decide whether to change the price for all users.