A/B testing is the closest thing marketing has to a truth serum — and most teams are still making decisions based on gut feel instead. I've watched a single headline test on a landing page lift conversions 37%. I've also watched teams run tests with 200 visitors and declare a winner. The tool is only as good as the discipline behind it.
A/B testing (also called split testing) is a controlled experiment where you compare two versions of a marketing asset — Version A (the control) and Version B (the variant) — to determine which performs better against a specific metric. You randomly split your audience so each group sees only one version, then measure the difference in outcomes like conversion rate, click-through rate, revenue per visitor, or engagement.
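In practice, the "random split" is usually implemented as a deterministic hash of a user identifier rather than a coin flip on every page load, so a returning visitor keeps seeing the same version. A minimal sketch of that idea in Python, assuming a hypothetical string `user_id` and experiment name:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "headline-test") -> str:
    """Deterministically assign a user to 'A' (control) or 'B' (variant).

    Hashing the user ID together with the experiment name gives a stable
    50/50 split: the same user always sees the same version, and different
    experiments get independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to a number from 0 to 99
    return "A" if bucket < 50 else "B"

# The assignment is stable across repeated visits by the same user
print(assign_variant("user-42"))
print(assign_variant("user-43"))
```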
The method comes from randomized controlled trials in clinical research. The logic is identical: isolate one variable, test it against a control, and let the data tell you what works. In marketing, you can test virtually anything — email subject lines, landing page layouts, pricing displays, CTA button colors, ad copy, checkout flows, even entire brand positioning angles.
What separates real A/B testing from just "trying stuff" is statistical rigor. You need a hypothesis, a sufficient sample size, a defined test duration, and a predetermined significance threshold (usually 95% confidence). Without these, you're just gambling with data.
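The sample-size step is plain arithmetic, and it helps to see why the calculators ask for a baseline rate and a minimum detectable effect. Here is a rough Python sketch of the standard two-proportion approximation; tools like Evan Miller's calculator do a calculation of this kind, though exact numbers vary slightly by formula:

```python
import math
from statistics import NormalDist

def required_sample_size(baseline_rate, min_relative_lift,
                         alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-proportion test.

    baseline_rate: current conversion rate, e.g. 0.04 for 4%
    min_relative_lift: smallest relative lift worth detecting, e.g. 0.10 for +10%
    alpha: significance threshold (0.05 corresponds to 95% confidence)
    power: probability of detecting the lift if it really exists
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Detecting a +10% relative lift on a 4% baseline needs roughly
# 39,000-40,000 visitors per arm at 95% confidence and 80% power.
print(required_sample_size(0.04, 0.10))
```

Note how quickly the requirement grows: halve the detectable lift and the required traffic roughly quadruples, which is why small sites struggle to test small effects.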
| Step | Action | Key Consideration |
|---|---|---|
| 1. Hypothesize | "Changing X will improve Y by Z%" | Be specific — vague tests produce vague results |
| 2. Calculate sample size | Use a power calculator (Optimizely, VWO, or Evan Miller's calculator) | Underpowered tests can't separate real lifts from noise |
| 3. Randomize | Split traffic 50/50 between control and variant | Ensure random assignment, not time-based splitting |
| 4. Run the test | Let it run for the full predetermined duration | Don't peek and call it early |
| 5. Analyze | Check statistical significance at 95%+ confidence | Look at the confidence interval, not just the point estimate (see the sketch after this table) |
| 6. Implement | Roll out the winner to 100% of traffic | Document learnings for institutional knowledge |
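For step 5, most testing tools report something equivalent to a two-proportion z-test. A minimal analysis sketch in Python, using illustrative numbers (4.0% vs. 4.4% conversion on 40,000 visitors per arm):

```python
import math
from statistics import NormalDist

def analyze_ab_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test plus a confidence interval on the lift.

    conv_a / conv_b: conversions in each arm; n_a / n_b: visitors in each arm.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the hypothesis test (H0: no difference)
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {"lift_abs": diff, "p_value": p_value, "ci_95": ci,
            "significant": p_value < alpha}

print(analyze_ab_test(conv_a=1600, n_a=40000, conv_b=1760, n_b=40000))
```

The confidence interval is the part worth reading closely: a result can be "significant" while the interval still spans everything from a trivial lift to a large one.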
| Company | What They Tested | Result | Impact |
|---|---|---|---|
| Amazon | One-click checkout vs. standard cart flow | One-click increased conversion significantly | Patented the feature — it became a core competitive advantage |
| Obama 2008 campaign | 24 combinations of hero image + CTA button | Winner outperformed original by 40.6% | Generated an estimated $60M in additional donations |
| HubSpot | Long-form vs. short-form landing pages for enterprise | Long-form increased qualified leads by 20% | Changed their entire landing page playbook for high-ACV products |
| Booking.com | Urgency messaging ("Only 2 rooms left!") | 12-17% lift in booking completion | Became a UX pattern across the entire travel industry |
| Netflix | Thumbnail images for content | Personalized thumbnails increased click-through by 20-30% | Now runs thousands of concurrent tests across 230M+ subscribers |
Calling tests too early. This is the cardinal sin. With a small sample, random variance looks like a real difference. A test that shows a "25% lift" after 500 visitors might show 0% after 5,000. Commit to a sample size before you start and don't touch the results until you hit it.
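A quick simulation makes the cost of peeking concrete. In the sketch below (illustrative numbers: both arms convert at an identical 4%, with ten interim looks), checking for significance after every batch "finds" a winner far more often than the 5% error rate the test nominally allows:

```python
import math
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value, as in the analysis sketch above."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / max(se, 1e-12)  # guard against zero SE
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(1)
TRUE_RATE = 0.04   # both versions convert at 4%: any "winner" is a false positive
BATCH = 500        # visitors per arm between peeks
PEEKS = 10         # number of interim looks at the data
RUNS = 1000        # simulated experiments

early_calls = patient_calls = 0
for _ in range(RUNS):
    conv_a = conv_b = 0
    would_stop_early = False
    for peek in range(1, PEEKS + 1):
        conv_a += sum(random.random() < TRUE_RATE for _ in range(BATCH))
        conv_b += sum(random.random() < TRUE_RATE for _ in range(BATCH))
        if p_value(conv_a, peek * BATCH, conv_b, peek * BATCH) < 0.05:
            would_stop_early = True  # a peeker declares a winner here
    early_calls += would_stop_early
    patient_calls += p_value(conv_a, PEEKS * BATCH, conv_b, PEEKS * BATCH) < 0.05

print(f"'Winners' found by peeking after every batch: {early_calls / RUNS:.0%}")
print(f"'Winners' found with one planned analysis:    {patient_calls / RUNS:.0%}")
```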
Testing too many variables at once. If you change the headline, image, CTA, and layout simultaneously, you can't know which change drove the result. Test one variable at a time (A/B test) or use multivariate testing if you have enough traffic to support it.
Ignoring practical significance. A test might be statistically significant (p < 0.05) but only show a 0.3% improvement. Is that worth the engineering effort to implement? Statistical significance and business significance are different things.
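One way to keep this honest is to translate any lift into business terms before deciding to ship it. A small sketch using hypothetical numbers (a 0.3% relative lift on a 4% baseline, 100,000 monthly visitors, $50 per conversion):

```python
def practical_impact(baseline_rate, relative_lift, monthly_visitors,
                     value_per_conversion):
    """Translate a relative conversion lift into monthly business impact."""
    extra_rate = baseline_rate * relative_lift
    extra_conversions = extra_rate * monthly_visitors
    return extra_conversions, extra_conversions * value_per_conversion

conv, revenue = practical_impact(0.04, 0.003, monthly_visitors=100_000,
                                 value_per_conversion=50)
print(f"{conv:.0f} extra conversions/month, ~${revenue:,.0f} extra revenue/month")
```

With these inputs the "significant" result is worth about a dozen conversions a month, which is the number to weigh against the engineering cost, not the p-value.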
Not accounting for external factors. Running a test during Black Friday and comparing it to normal traffic will produce misleading results. Segment your analysis and watch for seasonal, day-of-week, and promotional period effects.
Testing low-impact elements. Button color tests are the meme of A/B testing for a reason. Test things that matter: value propositions, pricing structures, offer framing, page layouts, and positioning angles. Test big, not small.
Conversion rate optimization is A/B testing's primary domain. Every conversion rate improvement project should be backed by test data, not opinions.
A/B testing helps determine optimal positioning by testing different value propositions and messaging angles against real audience behavior rather than focus group opinions.
Penetration pricing vs. price skimming decisions can be informed by price sensitivity tests — showing different price points to different segments and measuring price elasticity in real time.
ROMI improves directly when A/B testing eliminates underperforming creative and optimizes high-performing variants.