19 Must-Have Tools To Begin Your CRO Journey
https://vwo.com/blog/cro-tools/

Conversion rate optimization (CRO) is incomplete without the right tools. To aid you in your optimization journey, this blog provides a curated list of essential CRO tools. But before delving into the list, let's first explore what CRO tools are and when you need them.

What are CRO tools?

Conversion rate optimization tools are software apps or platforms that marketers, developers, and UI/UX designers use to dig into user behavior, spot any roadblocks, and experiment with the user experience in the hopes of boosting their conversion rates. 

CRO tools come with a range of useful features, either as standalone options or bundled together. These features typically include heat mapping, session recording, form analytics, A/B testing, personalization, and feedback capabilities that let you create opt-in forms and deploy surveys.

CRO tools help you overcome the leaky-bucket syndrome, where hard-won traffic drips away without converting, and improve the returns on your business efforts.

When do you need CRO tools?

A website is home to many actions. Button clicks, form fills, purchases, and many more events collectively constitute engagement and conversions. Experience managers tend to look at these actions from both macro (site-wide) and micro (visitor-level) lenses to understand their visitors and map their business's overall conversion rate. A CRO tool helps them in analyzing and optimizing the conversion rate.

Download Free: Conversion Rate Optimization Guide

But there’s a problem.

Even though experience managers take every bit of information into account to analyze customer behavior and optimize business performance, many don't use the right conversion rate optimization tools, leaving room for flaws and glitches to creep in. To avoid such hiccups, we highly recommend the following CRO tools, organized by conversion rate optimization stage, to effectively enhance your website's and business's performance.


Before you add any CRO tools to your arsenal, carefully evaluate whether to build them yourself or buy them, keeping the following criteria in mind.

How to choose the right CRO tools?

GDPR Compliance

Since you're engaging with visitor data at each stage, it's essential to ensure that your data collection tools are GDPR compliant.

Integrations

To avoid data silos when using multiple data analytics tools, ensure that the tool has ‘integrations’ with other tools you use or open APIs to build custom integrations. This helps prevent data duplication, confusion, and related uncertainties.

Security

If you plan to install supporting CRO software on your website, ensure the tool cannot be breached, especially while experiments are running. Your CRO tools must be safe and secure to use; features such as single sign-on and multi-step login help ensure security. Check any CRO tool under consideration for these critical features.

A multi-user-friendly dashboard

When selecting a CRO tool, make sure it offers an integrated dashboard where mapping your experiments and other activities is easy. A user-friendly dashboard also allows cross-team collaboration, which is a building block in CRO. 

After evaluating the tools, let’s now look at some must-have CRO tools based on different conversion rate optimization stages. 

Top CRO tools for different testing stages

The research stage

When it comes to conducting research, multiple CRO tools exist that help map both quantitative and qualitative data. Quantitative data tools, such as web analytics, offer insights into what’s happening on your site. Qualitative tools such as heatmaps, scroll maps, surveys, and the like give context to why it’s happening.

Mentioned below are some of the best CRO analyzer and research tools you can use to collect necessary visitor data to form data-backed hypotheses for your CRO test campaigns.   

Google Analytics

Google Analytics is one of the best web analytics tools that track website traffic and user activities, such as session duration, pages per session, bounce rate, and more in real-time, across various site pages. It also offers additional information such as traffic source(s), visitor location and demographics, page performance, and conversions. It is one of the free CRO tools that has a premium version, Google Analytics 360, to unlock more in-depth insights. 

Cost/month – Free

screenshot of the Google Analytics Dashboard
Image source: Google Analytics

VWO Insights

VWO Insights is a popular and must-have user behavior research product for CRO professionals. It helps understand customer behavior through heat mapping tools, session recordings, on-page surveys, funnel analytics, and more. The qualitative user behavioral data you get from VWO Insights helps form thorough hypotheses for your CRO roadmap.

Cost/month: Free plan for up to 5,000 monthly tracked users; the cost of the three paid plans (Growth, Pro, and Enterprise) varies with the number of monthly tracked users. Visit plans and pricing to know more.

Vwo Insights Must Have Cro Tool

Heap Analytics

Heap Analytics is one of the conversion optimization tools that capture visitor interactions, including clicks, form submits, and transactions, and help identify the behaviors and marketing channels that convert the most. Heap also has a clean data analytics dashboard that's handy and easy to use. When using the tool, you don't have to create additional 'events' to track basic website interactions, as you do in Google Analytics.

Cost/month: Four plans (Free, Growth, Pro, and Premium); pricing for the paid plans is custom.

Heap Analytics features
Image source: Heap Analytics

Mixpanel

If a popular web analytics tool like Google Analytics sheds light on what exactly is happening on your website, Mixpanel helps you see who did what. With visitor behavior tracking, you also get the advantage of viewing specific insights into which set of website visitors have entered your sales funnel, which ones are bouncing off, and so on.

Mixpanel also gives you a second data channel to compare numbers against Google Analytics, as it's never a good idea to blindly trust one tool for all analytical data.

Cost/month: Free plan, Growth – $20+/month, Enterprise – $833+/month

Mixpanel features
Image source: Mixpanel

UsabilityHub

One of the biggest advantages of having UsabilityHub in your conversion tool kit is that it takes the guesswork out of design decisions by validating them with real users. In other words, it's an opinion-gathering tool that enables you to sample responses from real users and make necessary design decisions.

UsabilityHub comes packed with capabilities, including design survey tools, first-click tests, five-second tests, and preference tests to uncover useful user insights.

Cost/month: It offers a free plan, an $89/month basic plan, a $199/month pro plan, and Custom pricing for enterprise services

Usabilityhub features
Image source: UsabilityHub

Intercom

Besides the above-mentioned user behavior research tools, having Intercom in your CRO arsenal is also a must. The tool helps fetch immensely valuable data. Intercom enables experience managers and optimizers to analyze chat logs, call recordings, and customer support threads to discover patterns. Recurring questions can serve as useful insights to uncover problematic areas on the website and subsequently fix them to offer a better user experience. Such insights also prove beneficial while drafting testing ideas for your CRO program.

Cost/month: Three plans – Starter ($74/month), with custom pricing for the Pro and Premium plans.

Intercom features
Image source: Intercom

Userpeek

Although Userpeek is not an A/B testing tool, the CRO software surely helps understand how users interact with your website. It enables you to test your site for ideas, hypotheses, and prototypes, and map the performance of existing site assets with real users in the shortest possible time with minimal effort and budget. The tool can give you a clear insight into which direction to follow when it comes to A/B testing.

Cost/month: Flex Plan – $55/test (pay per test); Pro Plan – $211/month. Custom pricing for Team Plan.

The Hypothesis stage

Once you’ve gathered and analyzed all the necessary website visitor data using the above-mentioned research tools, it’s time to move to the next stage – create a hypothesis – for your (next) experiment(s). 

A data-backed hypothesis is key to properly running a CRO experiment. Here are some must-have tools to use when crafting a hypothesis.

Craig Sullivan’s hypothesis framework

Craig Sullivan is a CRO influencer who shares a handy kit with two options for writing a good, data-backed hypothesis. Here, take a look:

Craig Sullivan’s Hypothesis Framework

While each has its own meaning and relevance, we’d recommend using the Advanced Kit when running high-impact experiments.  

Test hypothesis creator

This 7-step test hypothesis creator draws inspiration from Sullivan’s ‘legacy’ hypothesis format. The tool allows you to easily create a hypothesis by filling in the empty form fields. However, we’d recommend you use one more hypothesis creator to validate your hypothesis. 

Cost: Free

7-step test hypothesis creator based on Sullivan’s legacy hypothesis format

VWO Plan

VWO Plan enables you to manage and prioritize all your experimentation programs in one single place. It provides an expert-recommended framework that helps you build comprehensive hypotheses. You can also assign scores to each of these hypotheses based on impact, confidence, and ease, to create a prioritization pipeline.

VWO Plan gives you the advantage of automating your workflow as well. You can quickly move hypotheses from testing to completion and archive stage as they move across various testing stages. 

Cost: Free

Vwo Plan CRO Tool for planning

Download Free: Conversion Rate Optimization Guide

The prioritization stage

Once you've created your hypothesis, it's time to design, develop, and test the variations you're planning to run against the original versions. You can either hire a dedicated resource or a team to build A/B testing experiments or use the support provided by VWO.

Here are some must-have tools for this stage.

TestLodge

TestLodge is a test management tool that helps manage various project requirements, test plans, test cases, test runs, user testing, and related reporting. With a simple, easy-to-learn interface and no user limits, the tool allows you and your entire team to manage your A/B testing plans, requirements, and test cases in one single place. Another feature that makes TestLodge stand out is that it's a web-based, cloud-hosted tool, enabling you to access it anytime and anywhere you want.

Cost/month: Team – $99/month, Business – $299/month, and Enterprise – $499/month

Testlodge CRO Tool features
Image source: TestLodge

VWO Plan

Besides enabling you to build hypotheses, with the VWO Plan, you can prioritize multiple experimentation ideas. Each hypothesis can be scored on its winning likelihood, expected impact on macro goals, technical feasibility, and time investment. Additionally, you can manage hypotheses through their entire lifecycle with VWO’s integrated custom workflows. 

At a glance, you can track a backlog of test opportunities, what’s next in the A/B testing pipeline, and which ones have completed their course. 

Cost: Free

Vwo Plan CRO Tool for prioritization

TestRail

TestRail is one of the leading test case and test management software tools. It enables you to easily manage and track your testing ideas and hypotheses. Its web-based user interface enables you to easily create test cases, manage different test suites, and even coordinate throughout the entire testing process with various stakeholders. The tool also provides real-time insights into your testing progress and even boosts productivity by personalizing your to-do lists, filters, and email notifications. 

Cost/month: The pricing depends on the number of users and cloud and server management.

Testrail CRO Tool for prioritization
Image source: TestRail

The testing stage

Now that the treatment is ready, it's time to run your planned CRO experiments. But before you jump in, make sure to get your sample size and test duration right, as they're two crucial factors in running a meaningful CRO experiment. Here are some handy tools to help you at this stage.

Pre-testing stage tools

A/B test duration calculator

This handy A/B test duration calculator enables you to easily and quickly calculate how long a test should run to get a statistically significant result, based on your existing conversion rate, the improvement you expect to detect, the number of variations, and your daily traffic. Additionally, the calculator uses a Bayesian approach to reduce the chances of implementing non-significant test variations, which may show negative results.

Cost/month: Free

A/B test significance calculator

An A/B test significance calculator is a simple calculator that shows if an A/B (or multivariate) test exhibits a statistically significant result. 

Cost/month: Free

AB Test Significance Calculator

Sample size calculator

A sample size calculator helps calculate a sample size of customers that you must consider when running an A/B test to ensure statistical significance.

Cost/month: Free

The split test duration calculator

If you're planning to run a split test, then this is your go-to tool. The split test duration calculator, designed and offered by Michael Kjeldsen, shows how long your split test must run based on your current conversion rate and planned uplift, the number of variations being tested, average daily traffic, and the statistical confidence level.

Cost/month: Free

Split Test Duration Calculator

Testing stage tools

VWO Testing

VWO is one of the most comprehensive and popular testing tools for all types and sizes of businesses. With VWO Testing, you can easily do A/B testing, split testing, and multivariate testing using its code and visual editor. From A/B testing small changes such as changing the CTA button position on a product page to experimenting with multiple page elements, the possibilities of experimentation are endless. The design dashboard is also quite simple and easy to understand.

Cost/month: Free plan for up to 5,000 monthly tracked users; the cost of the three paid plans (Growth, Pro, and Enterprise) varies with the number of monthly tracked users.

Vwo Testing CRO Tool

The Five-Second Test

The Five-Second Test tool is built on the premise that five seconds are enough for customers or visitors to take in a website's design and grasp its primary message. These five seconds can shed light on useful user insights, such as their actions, first impressions of the brand, and more. Hence, the Five-Second Test can be a valuable addition to your CRO tool kit.

Cost/month: It offers a free plan, an $89/month basic plan, a $199/month pro plan, and Custom pricing for enterprise services.

The Five Second Test
Image source: Five-Second Test

The learning stage

This is the stage where you conclude your tests, close the conversion rate optimization loop, and make a note of all the learnings. With a CRO solution like VWO, you get access to an in-built reporting system to analyze the performance of your tests and customer journeys and pen down learnings for the next set of experiments, irrespective of whether your test wins or doesn’t.

Wrapping it up!

This isn't an exhaustive list of all the CRO tools you must have or use, but a glimpse of the most recommended tools across the conversion rate optimization domain. You can select a handful of these tools as per your optimization requirements and get started. If you prefer using one single, integrated tool for all your optimization needs, then VWO is your go-to tool.

VWO gives you the advantage of collecting data using its sophisticated data-gathering capabilities and creating and penning down all hypotheses in one single place. You can also prioritize tests, quickly set up and run A/B tests, and even track results through the entire optimization process using various other VWO capabilities.

Here’s a quick glimpse into VWO’s capabilities.

A quick overview of everything that you can do with VWO

If you have any questions related to VWO or wish to learn more about its capabilities, sign up for a free trial or request a demo today!

What factors should you consider before selecting a CRO tool?

In this blog, we have shared four main factors to consider when selecting a CRO tool:
– GDPR compliance
– Friendly user interface
– Integrations with other complementary tools
– Security

What are some of the CRO tools in the market?

It depends. The answer is directly related to the different stages within a CRO program: research, hypothesis, prioritization, testing, and learning. We have covered some of the best conversion rate optimization tools, such as VWO (our own platform), Google Analytics, and TestRail, and categorized each of them by the stage of the CRO program it supports.

Are CRO tools really worth it?

Yes, CRO tools are worth it. Bear Mattress saw an uplift of 16% in revenue because of using VWO as conversion rate optimization software.

How to Calculate AdWords Profitability for your SaaS Business
https://vwo.com/blog/saas-adwords-profitability-calculator/

Google Ads (formerly AdWords) is a core digital marketing channel for most businesses. But how do you know whether it will be profitable for you?

In this post we’ll show you how to calculate if a PPC campaign will work for your SaaS business as a profitable customer acquisition channel. We’ll also provide a free PPC Profitability Calculator you can use to figure out if AdWords (or any other PPC channel) works for you.

Download Free: A/B Testing Guide

Seven easy steps to calculate the profitability of Google Ads

Step 1: Find a keyword you want to bid on

Find a keyword that is relevant to your product or business. As an example, let's assume your product is a self-service tool to create and host landing pages, so 'landing page creator' will be a relevant keyword. Go to Google Ads and open the Keyword Planner tool under the Tools menu.

Keyword Planner
Image source: Google

Click on ‘Get traffic estimates for a list of keywords’ and type ‘landing page creator’. This will show you a graph of how many clicks and impressions the keyword gets in a day and at what cost. See here how to create high-converting search ads.

Step 2: Decide your average CPC

If the CPC you’re willing to bid is lower than the maximum shown, shift the slider to bring it to your desired CPC. In this example, we set it at $4.

CPC

With the CPC set at $4, 100 clicks on the ad will cost a total of $400. So if the ad gets 100 clicks over the duration of the campaign, you've spent $400 in total.

Step 3: Calculate the average number of free trials

Assuming the AdWords visitor to free trial conversion rate is 10%, then you’ve just acquired 10 free trial users at a total cost of $400 — meaning $40 per free trial customer. Tip: optimizing your landing pages will help to maximize conversions from website visitors into free trials and customers. VWO Testing allows you to A/B test any element on your website and increase conversion rates.

Step 4: Calculate the number of paying customers

According to Userpilot, the average conversion rate for free trial to paying customers is 25%. But let’s be conservative and use a 15% conversion rate for this example. So the number of paid customers will be 15% of 10 (number of free trials), giving us 1.5 paying customers.

Step 5: Calculate your customer acquisition cost (CAC)

The number of paying customers acquired from that investment was 1.5. So, the average customer acquisition cost (CAC) is $400 / 1.5 = $266. This may sound expensive, but to understand whether you've made a profit, we need to take into account the average revenue from a customer.

Step 6: Calculate the average customer lifetime value (LTV)

Simply determine the average number of months a customer pays for your product before churning, and the value of the most commonly bought subscription plan. Suppose your most commonly bought plan is $49 and the average lifetime is 10 months; then the average customer lifetime value (LTV) is $49 x 10 = $490.

Download Free: A/B Testing Guide

Step 7: Deduct CAC from LTV to calculate gross profit

LTV = $490

CAC = $266

Gross profit = $224

So we calculate that from this Google Ads spend, you'll make a gross profit of $224 per acquired customer. From a purely marketing perspective, this seems okay, as the channel is profitable. But in truth, there is more to consider.
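If you'd rather run these numbers in code than in a spreadsheet, here is a minimal Python sketch of the same calculation. The inputs are just the illustrative figures used in the steps above; swap in your own CPC, conversion rates, plan price, and average customer lifetime.

```python
# A minimal sketch of the seven-step calculation above. The inputs are the
# illustrative figures from this post, not benchmarks for your business.

def ppc_profitability(cpc, clicks, visit_to_trial, trial_to_paid,
                      monthly_price, avg_lifetime_months):
    ad_spend = cpc * clicks                         # Step 2: total ad spend
    free_trials = clicks * visit_to_trial           # Step 3: free trials acquired
    paying_customers = free_trials * trial_to_paid  # Step 4: paying customers
    cac = ad_spend / paying_customers               # Step 5: customer acquisition cost
    ltv = monthly_price * avg_lifetime_months       # Step 6: customer lifetime value
    gross_profit = ltv - cac                        # Step 7: gross profit per customer
    return cac, ltv, gross_profit

cac, ltv, profit = ppc_profitability(cpc=4, clicks=100, visit_to_trial=0.10,
                                     trial_to_paid=0.15, monthly_price=49,
                                     avg_lifetime_months=10)
print(f"CAC: ${cac:.2f}, LTV: ${ltv:.2f}, gross profit per customer: ${profit:.2f}")
# CAC: $266.67, LTV: $490.00, gross profit per customer: $223.33
# (the post rounds CAC down to $266 and gross profit up to $224)
```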

Considering profitability more broadly

AdWords probably won’t be your only marketing channel, and there are other costs beyond media spend to calculate the Return On Investment (ROI) of all your marketing efforts. But you can use these same calculations to determine the profitability of any sales or marketing spend. Also, consider other operating costs such as salaries, rent, and hardware to calculate the profitability of your company overall. If you’ve got the exact numbers down, you’re good. If not, then aim for LTV to be at least three times CAC.

Here’s the AdWords / PPC Profitability calculator

PPC Profitability calculator
PPC Viability Calculator


Access the calculator on Google Spreadsheets here.

A/B Testing for Big Wins – When You Should Do It & How You Should Do It
https://vwo.com/blog/ab-test-big-changes/

Button colour tests. Font size tests. Headline tests. Run them all you want.

But remember that your website's conversion rate is only limited by the risks you take. Small tweaks = small wins. If you're craving big wins, you'll have to make big changes.

I understand that you want to “play it safe” and be cautious. But sometimes you should test a radical website redesign against your original one, instead of testing small tweaks over and again.

Download Free: A/B Testing Guide

Scenarios When a Radical Redesign Makes More Sense than Testing Small Tweaks:

When You Have a Low-traffic Site

The problem with a low-traffic site is that small tweaks will take a long time to reach the 95% confidence level for a valid test result. But if you test two radically different designs against each other, you will require a smaller sample size and reach conclusive results in a much shorter duration.

You can see this elaborated in the tables below, where the test durations have been calculated with our split test duration calculator:

Assumptions:
Number of visitors: 500 per day
Current conversion rate of the site: 2%
Variations: 2

Percentage Increase | Split Test Duration
5% | 1254 Days
10% | 314 Days
15% | 139 Days
25% | 50 Days
50% | 13 Days

Assumptions:
Number of visitors: 20,000 per day
Current conversion rate of the site: 2%
Variations: 2

Percentage Increase | Split Test Duration
5% | 31 Days
10% | 8 Days
15% | 3 Days
25% | 1 Day
50% | Less than 1 day

As you can see in the above tables, the higher the expected increase in the conversion rate (which also reflects how big a change you have made), the shorter the duration for which you will have to run the test.

The only difference between the two tables is the number of daily visitors received by the sites.

This clearly shows that a high-traffic site can test small changes as much as it wants. But for sites that receive less traffic, it is best to start by making big changes. After that, you can tweak your new winning design and optimize it further for small wins.
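For a rough sense of how these durations come about, here is a small Python sketch that estimates the required sample size with the standard normal approximation for two proportions and converts it into days of traffic. It assumes a one-sided test at 95% confidence and 80% power; VWO's calculator uses its own statistical settings, so the exact numbers will differ from the tables above, but the pattern (bigger expected lift, much shorter test) is the same.

```python
# Rough sketch: how expected lift, baseline conversion rate, number of
# variations, and daily traffic drive test duration. Assumes a one-sided
# test at 95% confidence and 80% power; VWO's calculator uses its own
# settings, so these numbers won't match the tables above exactly.
import math
from scipy.stats import norm

def test_duration_days(baseline_cr, relative_lift, variations, daily_visitors,
                       alpha=0.05, power=0.80):
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    # Visitors needed per variation (normal approximation for two proportions)
    n = ((z_a * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return math.ceil(n * variations / daily_visitors)

for lift in (0.05, 0.10, 0.25, 0.50):
    days = test_duration_days(0.02, lift, variations=2, daily_visitors=500)
    print(f"{lift:.0%} expected lift -> roughly {days} days at 500 visitors/day")
```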

When No Matter What Tweak You Test, You Get Negative Results

When you focus on small tests only and test every element that you can find on the page, there will come a time when any tweak that you make will only give you negative results. This means that your current design has reached its maximum conversion potential, also known as the “local maximum.”

Solution? Test your fully-optimized page against a new design that is entirely different from your current design.

When Your Current Design Has Huge Scope for Improvement

If you're a beginner in A/B testing, it's perfectly fine to start with 2-3 small tests first because you're testing the waters. And if you're someone who needs to convince your management of the power of A/B testing, small tests are again your best bet.

But if you are well aware of conversion optimization practices and see a huge scope for improvement in the current website design, be bold and start with testing a different design at once.

Testing smaller tweaks on a current design that needs a complete overhaul is a mere waste of time.

When You are Not Ready to Settle for Less and You Want Big Wins

Small tweaks can only take you so far. But if you hear about huge wins and wonder when you will hit the jackpot, it's time to leave the "safe harbour" of running small tests and test some radical redesigns.

This involves high risk, yes. But if you do it right, your chances of hitting your jackpot will also be much higher.

But wait…”how exactly can I do it right?” Is that what you’re thinking? Read the approaches given below and you will hopefully find your way to a great hypothesis.

4 Approaches to Come Up with a Solid Hypothesis for a High-Converting Redesign

1. Take Them Closer to the Must-have Experience

The must-have experience means allowing visitors to understand the true value of the product by trying it out themselves.

Reduce the number of steps or even the number of clicks that a user must complete to try out your product. If your product is powerful, making people realize how useful your product is, can do wonders for your conversions.

Eliminate any complicated steps or requirements, unless necessary. The less effort required from the user, the better.

Hello Bar understands this perfectly well and guides visitors to try out their product with a single click on their homepage. Here’s what their visitors see on the next page:

Hello Bar's Must Have Experience is Only One-click Away for Their Visitors
Image Source: Hellobar

Notice how the fields above do not ask for any personal information of the user at all. Visitors can try out their product right away without giving any email address, or name, or filling out signup forms.

The point is that your conversion goal should come after the must-have experience, not before it.

It might seem like this approach is only fit for SaaS websites, but that’s not true. See the “Click to look inside” option provided by Amazon:

Amazon provides the must-have experience to its customers
Image Source: Amazon.com

If the nature of your business does not allow you to provide this must-have experience on your website, you can treat your final order or sale as your must-have experience.

Adding a sign-up goal before the must-have experience (the final sale) may make you lose tons of money. This is why guest checkouts are so popular these days. And how can you forget the awesome 1-click checkout of Amazon?

Start from your conversion funnel in web analytics. Look out for steps where you are getting the most drop-offs.

Consider if you can completely skip any of these steps, or at least reduce the friction for visitors by reducing the number of clicks or form fields that stand in the way of getting closer to the must-have experience.

2. Challenge the Approach of the Current Design

Testing a radical makeover doesn't mean you randomly design a page that is drastically different from your current website design.

The point here is to focus on challenging the assumptions or approaches of the current design and testing it based on a solid hypothesis.

For example, Sierra Tucson is a rehabilitation facility that tested a radical redesign of its web page. They found that their trust-focused landing page converted better than their luxury-focused page. This got them 220% more leads.

Download Free: A/B Testing Guide

3. Conduct Customer Surveys

You cannot be your customer. To understand what special points about your product or service resonate best with your audience or if they have any concerns or apprehensions, conducting customer surveys can provide great insights.

Once you know what your customers like the most about you, you can emphasize it on your page to get positive responses. You can combine this with heatmaps/clickmaps in your VWO app to see what interests your audience the most and redesign your page accordingly.

One of our customers recently conducted a customer survey, followed by testing a radical redesign against their original page. The insights from the survey were used to design almost every element on the redesigned page, and the copy was rewritten using the same words their customers had used in the survey to describe the core benefit of the service.

The redesigned page increased their sales by 64.8%. You can read the complete case study here.

4. Test Pricing Experiments

When it comes to prices, asking people how much they would pay for a particular thing is not a good idea. After all, spending hypothetical dollars is a lot easier than spending real ones. This is why you should experiment with real prices rather than rely on what people say they would pay. And when you have A/B testing to do it in real time, what's better than that?

One of the most popular pricing experiments you must have seen around is prices ending with the magical number 9. For example, a $1400 bed will be displayed at the charming price of $1399.

You can even try playing with your pricing plans. One of our previous customers gave up their freemium plan, which increased their paid signups by 268.14%. The test involved a lot of risk, but the percentage increase sure made it worthwhile.

Contrast is another great concept that works well with pricing. Conversion expert, Peep Laja, explains this well in his post about pricing experiments:

Nothing is cheap or expensive by itself, but compared to something.

Once you’ve seen a $150 burger on the menu, $50 sounds reasonable for a steak. At Ralph Lauren, that $16,995 bag makes a $98 T-shirt look cheap.

What's the best way to sell a $2,000 wristwatch? Right next to a $12,000 watch.

Just like in any other business, you will have to take calculated risks to grow your online business too. As long as you are taking informed, data-driven risks, rather than testing randomly, it should be fine. Remember, with greater risks come greater rewards.

How to Calculate A/B Testing Sample Sizes?
https://vwo.com/blog/how-to-calculate-ab-test-sample-size/

(This post is a scientific explanation of the optimal sample size for your tests to hold true statistically. VWO's test reporting is engineered in a way that you would not waste your time looking up p-values or determining statistical significance – the platform reports 'probability to win' and makes test results easy to interpret. Sign up for a free trial here.)

“How large does the sample size need to be?”

In the online world, the possibilities for A/B testing just about anything are immense. And many experiments are indeed done, the results of which are interpreted following the rules of null-hypothesis testing: "are the results statistically significant?"

An important aspect in the work of the database analyst then is to determine appropriate sample sizes for these tests.

Download Free: A/B Testing Guide

On the basis of an everyday case, a number of current approaches for calculating the desired sample size are discussed.

Case for calculating sample size:

The marketer has devised an alternative for a landing page and wants to put this alternative to a test. The original landing page has a known conversion of 4%. The expected conversion of the alternative page is 5%. So the marketer asks the analyst “how large should the sample be to demonstrate with statistical significance that the alternative is better than the original?”

Solution: “default sample size”

The analyst says: split run (A/B test) with 5,000 observations each and a one-sided test with a reliability of .95. Out of habit.

What happens here?

What happens when drawing two samples to estimate the difference between the two, with a one-sided test and a reliability of .95? This can be demonstrated by infinitely drawing two samples of 5,000 observations each from a population with a conversion of 4%, and plotting the difference in conversion per pair (per 'test') between the two samples in a chart.

Figure 1: sampling distribution for the difference between two proportions with p1=p2=.04 and n1=n2=5,000; a significance area is indicated for alpha=.05 (reliability= .95) using a one-sided test.


This chart reflects what is formally called the 'sampling distribution for the difference between two proportions.' It is the probability distribution of all possible sample results calculated for the difference, with p1=p2=.04 and n1=n2=5,000. This distribution is the basis (the reference distribution) for null hypothesis testing, the null hypothesis being that there is no difference between the two landing pages. This is the distribution used for actually deciding on significance or non-significance.

p=.04 means 4% conversion. Statisticians usually talk about proportions that lie between 0 and 1, whereas in everyday language mostly percentages are communicated. To match the chart, the proportion notation is used here.

This probability distribution can be replicated roughly using this SPSS syntax (thirty paired samples from a population.sps). Not infinitely, but 30 times, two samples are drawn with p1=p2=.04 and n1=n2=5,000. The differences between the two samples are then plotted in a histogram with the normal distribution overlaid (the last chart in the output). This normal curve will be quite similar to the curve in figure 1. The reason for performing this experiment is to demonstrate the essence of a sampling distribution.

The modal value of the difference in conversion between the two groups is zero. That makes sense; both groups come from the same population with a conversion of 4%. Deviations from zero both to the left (original does better) and to the right (alternative does better) can and will occur, just by chance. The further from zero, however, the smaller the probability of this happening. The pink area with the character alpha indicated in it is the significance area, or unreliability = 1 - reliability = 1 - .95 = .05.

If in a test the difference in conversion between the alternative page and the original page falls in the pink area, then the null hypothesis that there is no difference between both pages is rejected in favour of the hypothesis that the alternative page returns a higher conversion than the original. The logic behind this is that if the null hypothesis were really true, such result would be a rather ‘rare’ outcome.

The x axis in figure 1 doesn't display the value of the test statistic (Z in this case), as would usually be the case. For clarity's sake, the concrete difference in conversion between the two landing pages has been displayed.

So when in a split run test the alternative landing page returns a conversion rate that is 0.645% higher or more than the original landing page (hence falls in the significance area), then the null hypothesis stating there is no difference in conversion between the landing pages is rejected in favour of the hypothesis that the alternative does better (the 0.645% corresponds to a test statistic Z value of 1.65).
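As a quick sanity check, that 0.645% threshold can be reproduced in a few lines of Python (a sketch assuming the same setup: p = .04 under H0, 5,000 observations per group, one-sided alpha = .05):

```python
# Reproduces the ~0.645% significance threshold described above: the smallest
# observed conversion difference that falls in the pink (significance) area.
import math
from scipy.stats import norm

p, n, alpha = 0.04, 5000, 0.05           # conversion under H0, visitors per group, one-sided alpha
se = math.sqrt(2 * p * (1 - p) / n)      # standard error of the difference under H0
critical_diff = norm.ppf(1 - alpha) * se # one-sided critical value on the difference scale
print(f"{critical_diff:.3%}")            # ~0.645%, i.e. a Z value of about 1.65
```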

The advantage of the "default sample size" approach is that choosing a fixed sample size brings in a certain standardization. Various tests are comparable ("stand an equal chance") in that respect.

The disadvantage of this approach is that whereas the chance to reject the null hypothesis when the null hypothesis (H0) is true is well known, namely the self-selected alpha of .05, the chance to not reject H0 when H0 is not true remains unknown. These are two false decisions, known as type 1 error and type 2 error respectively.

A type 1 error, or alpha, is made when H0 is rejected when in fact H0 is true. Alpha is the probability of saying, on the outcome of a test, that there is an effect for the manipulation, while on population level there actually is none. 1-alpha is the chance to accept the null hypothesis when it is true (a correct decision). This is called reliability.

A type 2 error, or beta, is made when H0 is not rejected when in fact H0 is not true. Beta is the probability of saying, on the outcome of a test, that there is no effect for the manipulation, while on population level there actually is. 1-beta is the chance to reject the null hypothesis when it is not true (a correct decision). This is called power.

Power is a function of alpha, sample size, and effect (the effect here is the difference in conversion between the two landing pages, i.e. at population level the added value of the alternate site compared to the original site). The smaller the alpha, sample size, or effect, the smaller the power.

In this example, alpha is set by the analyst at .05. Sample sizes are also set by the analyst: 5,000 for the original, 5,000 for the alternative. Which leaves the effect. And the actual effect is by definition unknown. However, it is not unrealistic to use commercial targets or experiential numbers as an anchor value, as was formulated by the marketer in the current case: an expected improvement from 4% to 5%. Now if that effect were really true, the marketer of course would want to find statistically significant results in a test.

An example may help to make this concept insightful and to clarify the importance of power: suppose the actual (= population) conversion of the alternative page is indeed 5%. The sampling distribution for the difference between two proportions with conversion1=4%, conversion2=5% and n1=n2=5,000 is plotted in combination with the previously shown sampling distribution for the difference between two proportions with conversion1=conversion2=4% and n1=n2=5,000 (figure 1).

Figure 2: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=5,000(red line) and p1=.04, p2=.05, n1=n2=5,000 (dotted blue line), with a one-sided test and a reliability of .95.


The dotted blue line shows the sampling distribution of the difference in conversion rates between original and alternative when in reality (on population level) the original page makes 4% conversion and the alternate page 5%, with samples of 5,000 each. The sampling distribution when H0 is true, the red line, has basically shifted to the right. The modal value of this new distribution with the supposed effect of 1% is of course 1%, with random deviations both to the left and to the right.

Now, all outcomes, i.e. test results, on the right side of the green line (marking the significance area) are regarded as significant. All observations on the left side of the green line are regarded as not significant. The area under the 'blue' distribution left of the significance line is beta, the chance to not reject H0 when H0 is in fact not true (a false decision), and it covers 22% of that distribution.

That makes the area under the blue distribution to the right of the significance line the power area, and this area covers 78% of the sampling distribution. It is the probability to reject H0 when H0 is not true, a correct decision.

So the power of this specific test with its specific parameters is .78.

In 78% of the cases when this test is done, it will yield a significant effect and a consequent rejection of H0. That could be acceptable, or perhaps not; that is a question for the marketer and analyst to agree upon.

No simple matter, but important. Suppose for example that an expectation of 10% increase in conversion would be realistic as well as commercially interesting: 4.0% original versus 4.4% for the alternative. Then the situation changes as follows.

Figure 3: sampling distributions for the difference between two proportions with p1=p2=.040, n1=n2=5,000 (red line) and p1=.040, p2=.044, n1=n2=5,000 (dotted blue line), with a one-sided test and a reliability of .95.


Now the power is .26. Under these circumstances, the test would not make much sense and is in fact counter-productive, since the chance that such a test will lead to a significant result is as low as .26.

The above figures are calculated and made with the application ‘Gpower’:

This program calculates achieved power for many types of tests, based on desired sample size, alpha, and supposed effect.
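For readers without GPower at hand, the power figures quoted above (.78 for the 4% vs 5% case, .26 for the 4.0% vs 4.4% case) can be approximated with the normal approximation for two proportions. This is a sketch, not GPower's exact computation, but it lands on the same rounded values:

```python
# Approximates the power figures quoted above using the normal approximation
# (pooled SE under H0, unpooled SE under the supposed effect). GPower's exact
# computation differs slightly but rounds to the same values.
import math
from scipy.stats import norm

def power_two_proportions(p1, p2, n, alpha=0.05):
    p_bar = (p1 + p2) / 2
    se_h0 = math.sqrt(2 * p_bar * (1 - p_bar) / n)            # SE if there is no difference
    se_h1 = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)  # SE under the supposed effect
    critical_diff = norm.ppf(1 - alpha) * se_h0               # one-sided significance threshold
    # Probability that the observed difference exceeds the threshold when the effect is real
    return 1 - norm.cdf((critical_diff - (p2 - p1)) / se_h1)

print(round(power_two_proportions(0.04, 0.05, 5000), 2))    # ~0.78 (4% vs 5%, n = 5,000)
print(round(power_two_proportions(0.040, 0.044, 5000), 2))  # ~0.26 (4.0% vs 4.4%, n = 5,000)
```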

Likewise, the required sample size can be calculated from desired power, alpha, and expected effect; the required alpha can be calculated from desired power, sample size, and expected effect; and the required effect can be calculated from desired power, alpha, and sample size.

Should a power of .95 be desired for a supposed p1=.040, p2=.044, then the required sample sizes are 54,428 each.

Figure 4: sampling distributions for the difference between two proportions with p1=p2=.040 (red line) and p1=.040, p2=.044 (dotted blue line), using a one-sided test, with a reliability of .95 and a power of .95.


This figure shows information omitted in previous charts. This also gives an impression of the interface of the program.

Download Free: A/B Testing Guide

Important aspects of power analysis are careful evaluation of the consequences of rejecting the null hypothesis when the null hypothesis is in fact true – e.g. based on test results a costly campaign is implemented under the assumption that it will be a success, and that success doesn't come true – and the consequences of not rejecting the null hypothesis when the null hypothesis is not true – e.g. based on test results a campaign is not implemented, whereas it would have been a success.

Solution: “default number of conversions” 

The analyst says: split run with a minimum of 100 conversions per competing page and a one-sided test with a reliability of .95.

In the current case with expected conversion of the original page 4% and expected conversion of the alternate page 5%, a minimum of 2,500 observations per page will be advised.

When put to the power test though, this scenario demonstrates a power of just a little over .5.

Figure 5: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,500 (red line) and p1=.04, p2=.05, n1=n2=2,500 (dotted blue line), using a one-sided test, with a reliability of .95.


For a better power, a greater effect should be present, a larger sample size must be chosen, or alpha should be increased, e.g. to .2:

Figure 6: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,500 (red line) and p1=.04, p2=.05, n1=n2=2,500 (dotted blue line), using a one-sided test, with a reliability of .80.

An alpha of .2 returns a power of .8. The power is more acceptable; the 'cost' for this bigger power consists of a magnified chance to reject H0 when H0 is actually true.

Again, business considerations involving the impact of alpha and beta play a key role in such decisions.

The "default number of conversions" approach, with its rule of thumb on the number of conversions, actually puts a kind of limit on the effect sizes that still make sense to put to a test (i.e. with a reasonable power). In that regard it also comprises a sort of standardization, and that in itself is not a problem, as long as its consequences are understood and recognised.

Solution: “significant sample result”

The analyst says: split run with enough observations to get a statistically significant result if the supposed effect actually occurs in the test, tested one-sided with a reliability of .95.

That sounds a little weird, and it is. Unfortunately this logic is often applied in practice. The required sample size is basically calculated assuming the supposed effect to actually occur in the sample.

In the example used: if in a test the original has a conversion of 4% and the alternative 5%, then 2,800 cases per group would be necessary to reach statistical significance. This can be demonstrated with the accompanying SPSS syntax (limit at significant test result.sps).

These sorts of calculations are applied by various online tools offering to calculate sample size. This approach ignores the concept of random sampling error, thus ignoring the essence of inferential statistics and null hypothesis testing. In practice, this will always yield a power of .5 plus a small additional excess.

Figure 7: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,800 (red line) and p1=.04, p2=.05, n1=n2=2,800 (dotted blue line), using a one-sided test, with a reliability of .95.


Using this system a sort of standardisation is actually also applied, namely on power, but that’s not the apparent goal this method was invented for.

Solution: “default reliability and power”

The analyst says: split run with a power of .8 and a reliability of .95 with a one-sided test.

In the current case with 4% conversion for the original page versus 5% expected conversion for the alternate page, alpha=.05 and power=.80, Gpower advises two samples of 5,313.

Figure 8: sampling distributions for the difference between two proportions with p1=p2=.04(red line) and p1=.04, p2=.05 (dotted blue line), using a one-sided test with reliability .95 and power .80.


This approach uses desired reliability, expected effect and desired power in the calculation of the required sample size.

Now the analyst has a grip on the probability that an expected/desired/necessary effect will lead to statistically significant results in a test, namely .8.
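The GPower sample sizes quoted in this post can also be approximated with the standard normal-approximation formula for two proportions. The sketch below is not GPower itself, but it reproduces the quoted figures to within a few observations:

```python
# Approximate required sample size per group for a one-sided two-proportion
# test, given desired reliability (1 - alpha) and power. Not GPower itself,
# but it reproduces the figures quoted in this post to within a few observations.
import math
from scipy.stats import norm

def required_n_per_group(p1, p2, alpha, power):
    p_bar = (p1 + p2) / 2
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return math.ceil(n)

print(required_n_per_group(0.04, 0.05, alpha=0.05, power=0.80))    # ~5,313
print(required_n_per_group(0.04, 0.05, alpha=0.10, power=0.90))    # ~5,645 (reliability .90, power .90)
print(required_n_per_group(0.040, 0.044, alpha=0.05, power=0.95))  # ~54,428
```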

Some online tools, for example VWO’s Split Test Duration Calculator, use the concept of power in their sample size calculation.

In a presentation by VWO “Visitors needed for A/B testing” a power of .8 is mentioned as a regular measure.

It can be questioned why that should be an acceptable rule. Why could the size of the power, as well as the size of the reliability, not be chosen more dynamically?

Solution: “desired reliability and power”

The analyst says: split run with desired power and reliability using a one-sided test.

A discussion follows on what constitutes acceptable power and reliability in this case, with the conclusion, say, that both should be 90%. The result according to Gpower: 2 times 5,645 observations:

Figure 9: sampling distributions for the difference between two proportions with p1=p2=.04 (red line) and p1=.04, p2=.05 (dotted blue line), using a one-sided test with reliability=.90 and power=.90.


What if the marketer says "It takes too long to gather that many observations. The landing page will then not be important anymore. There is room for a total of 3,000 test observations. Reliability is as important as power. The test should preferably be carried out and a decision should follow"?

Result on the basis of this constraint: reliability and power both .75. If this doesn’t pose problems for those concerned, the test may continue on the basis of alpha=.25 and power=.75.

Figure 10: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=1500 (red line), and p1=.04, p2=.05, n1=n2=1500(dotted blue line), using a one-sided test with equal reliability and power.


This approach allows for flexible choice of reliability and power. The consequent lack of standardization is a disadvantage.

Conclusion

There are multiple approaches to calculate the required sample size, from questionable logic to strongly substantiated.

For strategically important ‘crucial experiments’, preference goes out to the most comprehensive method in which both “desired reliability and power” are involved in the calculation. If there is no possibility of checking against prior effects, an effect can be estimated using a pilot with “default sample size” or “default number of conversions”.

For the majority of decisions throughout the year “default reliability and power” is recommended, for reasons of comparability between tests.

Working with the recommended approaches based on calculated risk will lead to valuable optimization and correct decision making.

Note: Screenshots used in the blog belong to the author.

FAQs on A/B Testing Sample Size

What is the formula for determining sample size?

There are multiple approaches to determine the required sample size for A/B testing. For strategically important ‘crucial experiments’, preference goes out to the most comprehensive method in which both “desired reliability and power” are involved in the calculation.

What should be the required sample size for an ab test?

In the online world, the possibilities for A/B testing just about anything are immense. The sample size should be large enough to demonstrate with statistical significance that the alternative version is better than the original.

A/B Test Duration Calculator [Free Downloadable Excel]
https://vwo.com/blog/ab-test-duration-calculator/

In a previous post, I provided a downloadable A/B testing significance calculator (in Excel). In this post, I will provide a free calculator that lets you estimate how many days you should run a test to obtain statistically significant results. But, first, a disclaimer.

There is no guarantee of results for an A/B test.

When someone asks how long they should run an A/B test, the ideal answer would be until eternity or until you get results (whichever is sooner). In an A/B test, you can never say with full confidence that you will get statistically significant results after running the test for X number of days. Instead, what you can say is that there is an 80% (or 95%, whatever you choose) probability of getting a statistically significant result (if it indeed exists) after X number of days. But, of course, it may also be the case that there is no difference in the performance of control and variation, so no matter how long you wait, you will never get a statistically significant result.

Download Free: A/B Testing Guide

So, how long should you run your A/B test?

Download and use the calculator below to find out how many visitors you need to include in the test. There are 4 pieces of information that you need to enter:

  • The conversion rate of the original page
  • % difference in conversion rate that you want to detect (if you want to detect even the slightest improvement, it will take much longer)
  • Number of variations to test (the more variations you test, the more traffic you need)
  • Average daily traffic on your site (optional)

Once you enter these 4 parameters, the calculator below will find out how many visitors you need to test (for an 80% and 95% probability of finding the result). You can stop the test after you have tested that many visitors. If you stop before that, you may end up getting wrong results.

A/B test duration calculator (Excel spreadsheet)

Click below to download the calculator:

ab test duration calculator excel sheet

Download A/B testing duration calculator.

Please feel free to share the file with your friends and colleagues or post it on your blog/twitter.

By the way, if you want to do quick calculations, we have a version of this calculator hosted on Google Docs (you will have to make a copy of the Google sheet into your own account before you can make any changes to it).

For all the people looking to do this calculation without the trouble of going through sheets or documents, we created a simple-to-use A/B testing duration calculator.

Download Free: A/B Testing Guide

How does the calculator work?

Ah! The million dollar calculator. Explaining how it works is beyond the scope of this post as it is too technical (needs a separate post). But, if you have got the stomach for it, below is a gist of how we calculate the number of visitors needed to get significant results.

how does the a/b testing duration calculator work?

The graph above is taken from an excellent book called Statistical Rules of Thumb.

Luckily, the chapter on estimating sample size is available to download freely. Another excellent source to get more information on sample size estimation for A/B testing is Microsoft’s paper: Controlled Experiments on the Web: Survey and Practical Guide.
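One widely cited rule of thumb from that book is n ≈ 16 * p * (1 - p) / d^2 observations per variation (for roughly 95% confidence and 80% power, two-sided), where p is the average of the two conversion rates and d is the absolute difference you want to detect. The sketch below shows how a duration estimate follows from it; the downloadable Excel calculator may use different exact settings, so treat this only as the gist:

```python
# Rough gist of the duration estimate, using the rule-of-thumb sample size
# n ≈ 16 * p * (1 - p) / d^2 per variation (~95% confidence, 80% power,
# two-sided). The downloadable Excel calculator may use different settings,
# so treat the output as a ballpark figure only.
import math

def days_to_run(baseline_cr, detectable_lift, variations, daily_visitors):
    p1 = baseline_cr
    p2 = baseline_cr * (1 + detectable_lift)
    p_bar = (p1 + p2) / 2
    n_per_variation = 16 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return math.ceil(n_per_variation * variations / daily_visitors)

# Example: 2% baseline conversion, detect a 20% relative lift, 2 variations,
# 1,000 visitors per day -> about 44 days under these assumptions.
print(days_to_run(0.02, 0.20, variations=2, daily_visitors=1000))
```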

Hope you like the A/B test duration calculator and that it helps your testing endeavours.

A/B Test Statistical Significance Calculator [Free Excel]
https://vwo.com/blog/ab-testing-significance-calculator-spreadsheet-in-excel/

The statistics of A/B testing results can be confusing unless you know the exact formulas. Earlier, we published an article on the mathematics of A/B testing, and we also have an A/B test statistical significance calculator on our website to check whether your results are significant.

The calculator provides an interface for you to calculate your A/B test’s statistical significance but does not give you the formulas used for calculating it. The article, on the other hand, provides an introduction to A/B testing statistics and talks about the math that goes behind A/B split testing and the importance of statistical significance.

Download Free: A/B Testing Guide

VWO's A/B testing solution helped retail company Greene improve their revenue by almost 60%. However, A/B tests can be tricky to execute and interpret. So, unless you believe in predicting A/B test results using Indian astrology, this blog will tell you the math behind easily calculating the statistical significance of your tests.

The ‘what’, ‘why’ and ‘how’ of statistical significance

Before we move to complex statistical significance formulas, let’s first understand what it is, why it is important, and how to ensure that your tests conclude with statistical significance.

What is statistical significance?

Statistical significance is nothing but the probability that the gap between the conversion rates of a chosen variation and the control is not due to random chance but to a well-planned, data-backed process. In this data-backed process, you first gather insights on how users interact with your website and then use the gathered data to formulate a scientific testing hypothesis.

Your significance level also reflects your confidence level as well as your risk tolerance.

For instance, if you run an A/B test at 80% significance, you can be 80% confident while declaring the winner that the results are not the product of a random hunch or chance. Conversely, 80% significance also means there is a 20% chance that you may be wrong.

Why is statistical significance important?

For A/B testing to be successful, the test results should be statistically significant. You cannot tell for certain how future visitors will react to your website. All you can do is observe the next few visitors, record their behavior, statistically analyze it, and based on that, suggest and make changes to optimize the experience of the users who follow. A/B testing allows you to battle this uncertainty and improve your website’s user experience, provided every step is planned with each variable in play in mind: total website traffic, sample traffic, test duration, and so on. A good example of this is the Germany-based company Dachfenster-rollo.de, which improved its conversion rate by 33% by A/B testing its user experience.

Your marketing team’s quest for exact predictions about future visitors, and the inherent uncertainty in making such predictions, is what necessitates statistical significance. It is also important because it serves as a source of confidence, assuring you that the changes you make have a positive impact on your business goals.

How to ensure the statistical significance of a test?

Statistical significance depends on 2 variables:

  • The number of visitors, i.e., your sample size.
  • The number of conversions for both control and variation(s).

To ensure that your A/B tests conclude with statistical significance, plan your testing program keeping both these variables in mind. Use our free A/B test significance calculator to know your test’s significance level.

Download Free: A/B Testing Guide

How to calculate statistical significance in an Excel sheet with A/B testing formulas?

We have put together a FREE spreadsheet that details exactly how to calculate statistical significance in Excel. You just need to enter the number of visitors and conversions for the control and the variation(s). The Excel calculator then automatically shows you the significance, p-value, z-value, and other relevant metrics for any kind of A/B split test (including AdWords). And, to complement our article on the mathematics of A/B testing and our free A/B test statistical significance calculator, we also share the A/B testing significance formulas used in the Excel sheet for calculating test result significance.
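For readers who prefer code to spreadsheet cells, here is a hedged Python sketch of a pooled two-proportion z-test, which is the standard way to derive a z-value and p-value like the ones the spreadsheet reports. The exact cell formulas in the sheet may differ slightly, and the function name and visitor numbers below are illustrative.

```python
# A hedged sketch of a pooled two-proportion z-test for A/B test significance.
from math import sqrt
from scipy.stats import norm

def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    p_a = conversions_a / visitors_a                    # control conversion rate
    p_b = conversions_b / visitors_b                    # variation conversion rate
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se                                # z-value
    p_value = 2 * norm.sf(abs(z))                       # two-sided p-value
    return z, p_value

z, p = ab_significance(1000, 100, 1000, 130)            # 10% vs. 13% conversion
print(f"z = {z:.2f}, p = {p:.3f}, significant at 95%: {p <= 0.05}")
```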

statistical significance calculator in excel sheet

Click here to download the A/B testing significance calculator (Excel sheet).

Please feel free to share the file with your friends and colleagues or post it on your blog and social media handles.

PS: If you want to do quick calculations, we also have a version of this A/B testing significance calculator hosted on Google Sheets. (You will have to make a copy of the Google sheet into your own Google account before you can make any changes to it.)

At VWO, we believe that understanding the statistics behind A/B testing should not be your headache; your tool should take care of it. If you’d like to try VWO SmartStats, which offers intuitive and intelligent reports for your A/B tests, take a guided free trial.


Frequently asked questions

What is a statistically significant p-value?

The p-value, or probability value, is a statistical measurement that helps determine the validity of a hypothesis based on observed data. Typically, a p-value of 0.05 or lower is accepted as statistically significant: it indicates strong evidence against the null hypothesis and support for the alternative hypothesis.

What is p-value 0.05 in Excel?

A p-value of 0.05 is the commonly accepted threshold for statistical significance in Excel, as elsewhere. It signifies that, if the null hypothesis were true, there would be only a 5% chance of observing a result at least this extreme. If the p-value is less than or equal to 0.05, it serves as evidence against the null hypothesis and supports the alternative hypothesis. For instance, suppose you have two data sets, A and B, and you test the statistical difference between their means. By calculating statistical significance in Excel, you get a p-value of 0.03. With this result, you can conclude that the data gives you strong evidence to reject the null hypothesis and that there is a significant difference between data sets A and B.
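To make that example concrete, here is a small, hypothetical illustration in Python (using scipy rather than Excel). The data sets A and B below are made up purely for demonstration.

```python
# A hypothetical illustration: compare the means of two made-up data sets
# A and B with a two-sample t-test and read off the p-value.
from scipy import stats

a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 13.8, 12.2]   # hypothetical data set A
b = [13.9, 14.2, 13.1, 14.8, 13.6, 14.1, 13.3, 14.5]   # hypothetical data set B

t_stat, p_value = stats.ttest_ind(a, b)                 # two-sample t-test
print(f"p-value = {p_value:.3f}")
if p_value <= 0.05:
    print("Reject the null hypothesis: the means differ significantly.")
else:
    print("Not enough evidence to reject the null hypothesis.")
```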

How do you explain p-value to non-statisticians?

Simply put, the p-value can be thought of as a “probability of chance”. It quantifies the likelihood of getting the observed results by random chance, assuming that there is no actual difference between the means of the two data sets. A lower p-value means that the results are less likely to be due to chance and more likely to indicate a meaningful effect.

However, the interpretation of the p-value must be considered along with other factors like sample size, test duration, the context of the research, and so on to reach statistically significant conclusions.
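One intuitive way to convey this to non-statisticians is a simulation: pretend the control and variation share the same true conversion rate, replay the experiment many times, and count how often a gap as large as the observed one shows up purely by chance. The sketch below is illustrative, and all the numbers in it are hypothetical.

```python
# An illustrative simulation of "probability of chance": assume control and
# variation have the SAME true conversion rate, replay the test many times,
# and count how often a gap as large as the observed one appears by accident.
import numpy as np

def simulated_p_value(visitors, shared_rate, observed_lift, trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    conv_a = rng.binomial(visitors, shared_rate, trials)   # simulated control
    conv_b = rng.binomial(visitors, shared_rate, trials)   # simulated variation
    gaps = np.abs(conv_b - conv_a) / visitors
    return float(np.mean(gaps >= observed_lift))

# Observed: 10% vs. 13% on 1,000 visitors each (a 3-point gap, pooled rate 11.5%).
print(simulated_p_value(1000, 0.115, 0.03))   # close to the analytic two-sided p-value
```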

How to calculate statistically significant p-value?

While you can use the A/B testing significance formula in Excel, we suggest you try our A/B test statistical significance calculator. Using this free calculator, you get accurate results without spending too much time obtaining statistical significance. In fact, you can spend the saved time on other critical activities like hypothesis formulation, test result analysis, and user behavior research. After all, why worry when you can pick the right tool to do the job on your behalf?

What You Really Need To Know About The Mathematics Of A/B Split Testing https://vwo.com/blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/ https://vwo.com/blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/#comments Tue, 26 Jan 2010 16:26:47 +0000 https://vwo.com/blog/?p=60 Recently, I published an A/B split testing case study where an eCommerce store reduced its bounce rate by 20%. Some blog readers were worried about the statistical significance of the results. Their main concern was that 125-150 visitors per variation is not enough to produce reliable results. This concern is a typical by-product of having only a superficial knowledge of the statistics that power A/B (and multivariate) testing. I’m writing this post to provide an essential primer on the mathematics of A/B split testing so that you never jump to a conclusion about the reliability of test results simply on the basis of the number of visitors.

Download Free: A/B Testing Guide

What exactly goes behind A/B split testing?

Imagine your website as a black box containing balls of two colors (red and green) in unequal proportions. Every time a visitor arrives on your website, he takes out a ball from that box: if it is green, he makes a purchase. If the ball is red, he leaves the website. This way, essentially, that black box decides the conversion rate of your website.

A key point to note here is that you cannot look inside the box and count the number of balls of each color to determine the true conversion rate. You can only estimate the conversion rate based on the balls you see coming out of the box. Because the conversion rate is an estimate (or a guess), you always have a range for it, never a single value. For example, mathematically, the way you describe such a range is:

“Based on the information I have, 95% of the time the conversion rate of my website ranges from 4.5% to 7%.”

As you would expect, with more visitors, you get to observe more balls. Hence, your range gets narrower, and your estimate starts approaching the true conversion rate.

The maths of A/B split testing

Mathematically, each visit is a binomial trial, which is a fancy way of saying it can have two possible outcomes: conversion or non-conversion. The underlying probability of conversion, let’s call it p, is the true conversion rate of your website. Our job is to estimate the value of p, and for that we do n trials (or observe n visits to the website). After observing those n visits, we calculate what fraction of them resulted in a conversion. That fraction (which we represent from 0 to 1 instead of 0% to 100%) is the observed conversion rate of your website.

Now imagine that you repeat this experiment multiple times. It is very likely that, purely due to chance, you will calculate a slightly different value of p every time. Having all these (different) values of p, you get a range for the conversion rate (which is what we want for the next step of the analysis). To avoid doing repeated experiments, statistics has a neat trick in its toolbox: a concept called standard error, which tells you how much deviation from the observed conversion rate (p) you can expect if the experiment were repeated many times. The smaller the deviation, the more confident you can be about your estimate of the true conversion rate. For a given conversion rate (p) and number of trials (n), the standard error is calculated as:

Standard Error (SE) = Square root of (p * (1-p) / n)

Without going into much detail, to get a 95% range for the conversion rate, multiply the standard error by 2 (or 1.96, to be precise). In other words, you can be 95% confident that your true conversion rate lies within the range p ± 2 * SE.

(In VWO, when we show the conversion rate range in reports, we show it for 80%, not 95%. So we multiply standard error by 1.28).
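As a quick illustration, here is a small Python sketch of the range calculation just described. The visitor and conversion counts are made up, chosen so the output lands near the 4.5%-7% example above.

```python
# A small sketch of the conversion rate range: p +/- z * SE.
from math import sqrt

def conversion_rate_range(visitors, conversions, z=1.96):   # z = 1.96 for 95%, 1.28 for 80%
    p = conversions / visitors                               # observed conversion rate
    se = sqrt(p * (1 - p) / visitors)                        # standard error
    return p - z * se, p + z * se

low, high = conversion_rate_range(1000, 58)                  # 58 conversions from 1,000 visits
print(f"95% range: {low:.1%} to {high:.1%}")                 # about 4.4% to 7.2%
```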

Apart from the standard error, while doing A/B testing you also need to take Type I and Type II errors into consideration.

Download Free: A/B Testing Guide

What does it have to do with reliability of results?

In addition to calculating a conversion rate range for the website (the control), in an A/B split test we also calculate a range for each of its variations. Because we have already established (with 95% confidence) that the true conversion rate lies within its range, all we have to observe now is whether the range of the control overlaps with the range of the variation. If there is no overlap, the variation is definitely better (or worse, if the variation has a lower conversion rate) than the control. It is that simple.

As an example, suppose the control conversion rate has a range of 6.5% ± 1.5% (i.e., 5% to 8%) and a variation has a range of 9% ± 1% (i.e., 8% to 10%). The ranges do not overlap, so you can be confident about the reliability of the results.
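Here is a tiny Python sketch of that overlap check, using the same illustrative numbers (working in percentage points for readability).

```python
# A tiny sketch of the overlap check between two conversion rate ranges.
def ranges_overlap(low_a, high_a, low_b, high_b):
    """True if the two conversion rate ranges overlap."""
    return max(low_a, low_b) < min(high_a, high_b)

control = (6.5 - 1.5, 6.5 + 1.5)      # 5.0% to 8.0%
variation = (9.0 - 1.0, 9.0 + 1.0)    # 8.0% to 10.0%
print(ranges_overlap(*control, *variation))   # False -> the variation wins
```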

Do you call all that math simple?

Okay, not really simple, but it is definitely intuitive. To save yourself the trouble of doing all the math, either use a tool like VWO Testing, which automatically does all the number crunching for you, or, if you are running a test manually (such as for AdWords), use our free A/B split test significance calculator.

So, what is the take-home lesson here?

Always, always, always use an A/B split testing calculator to determine the significance of results before jumping to conclusions. Sometimes you may discount significant results as non-significant solely on the basis of the number of visitors; at other times, you may think results are significant because of a large number of visitors when in fact they are not.

You really want to avoid both scenarios, don’t you?
