A/B Test Calculator
Calculate A/B test statistical significance (p-value, z-score) or required sample size. Two-sided z-test for two proportions. No registration required.
Variant A (control)
Variant B (test)
Verdict
Not significant — collect more data
Confidence: 84.71% · p-value: 0.1529
Conversion rate A
10%
Conversion rate B
12%
Uplift
+20%
z-score
1.429
Two-proportion z-test formula
z = (p₂ − p₁) / √(p̄ · (1 − p̄) · (1/n₁ + 1/n₂))
p̄ is the pooled conversion rate across both groups. The two-sided p-value is computed via the standard normal distribution.
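The formula above can be sketched in a few lines of Python. This is an illustrative implementation (the function name and inputs are our own), using the pooled rate p̄ and the standard normal CDF via `math.erf`; it reproduces the example numbers shown on this page (10% vs 12% conversion on 1,000 visitors per variant).

```python
from math import sqrt, erf

def z_test(n1, c1, n2, c2):
    """Two-sided two-proportion z-test.
    n1, c1 — visitors and conversions in A (control);
    n2, c2 — visitors and conversions in B (test)."""
    p1, p2 = c1 / n1, c2 / n2
    p_pool = (c1 + c2) / (n1 + n2)                    # pooled rate p̄
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Standard normal CDF: Φ(x) = (1 + erf(x / √2)) / 2; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = z_test(1000, 100, 1000, 120)  # 10% vs 12% → z ≈ 1.429, p ≈ 0.1529
```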
How to use the calculator
One tool covers both jobs of a product analyst: planning a test before it starts and judging significance after it ends.
Pick a mode
Use the toggle at the top to switch between Significance (analyzing collected data) and Sample size (planning a future test).
Enter data
For significance — visitors and conversions for each variant. For sample size — baseline conversion, MDE, significance level, and statistical power.
Read the result
You get a verdict with p-value and z-score, or the exact number of users you need to allocate to each test variant.
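For the sample-size mode, the planning step can be sketched as follows. This is an illustrative implementation of the standard two-proportion sample-size formula (function name and defaults are our own), using `statistics.NormalDist` for the normal quantiles:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size(baseline, mde, alpha=0.05, power=0.80):
    """Per-group sample size for a two-proportion test.
    baseline — control conversion rate (e.g. 0.10);
    mde — relative minimum detectable effect (0.10 means +10% uplift)."""
    p1 = baseline
    p2 = baseline * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # statistical power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = sample_size(0.10, 0.10)  # ≈ 14,750 users per variant
```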
Why use this calculator
Two tools in one
Plan the sample size before the test. Check statistical significance afterwards. No need to switch between services.
Transparent formulas
We use the classic two-proportion z-test and the standard sample-size formula for two proportions. No black boxes — math is shown on the page.
Works for any binary metric
Conversion rate, CTR, retention, button clicks — anything measured as "happened / didn't happen" per unique user.
FAQ about A/B tests
What p-value is considered significant?
Traditionally p < 0.05: if the variants were actually identical, a difference this large would appear by chance less than 5% of the time. In sensitive areas (medicine, finance) people use a stricter threshold of 0.01.
What is MDE?
Minimum Detectable Effect — the smallest improvement the test is powered to detect. If the baseline is 10% and MDE = 10% (relative), the test is designed to catch any change to 11% or higher. A smaller MDE requires a disproportionately larger sample size.
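The "disproportionately larger" part follows from the math: required sample size grows roughly with 1/MDE², so halving the MDE quadruples the sample. A quick numeric sketch using Lehr's rule of thumb for α = 0.05 and 80% power (an approximation, using the baseline rate for the variance):

```python
def approx_n(p, mde):
    """Lehr's rule of thumb (alpha=0.05, power=80%): n ≈ 16·p·(1−p) / Δ²,
    where Δ is the absolute difference to detect. Approximation only."""
    delta = p * mde                     # relative MDE → absolute difference
    return 16 * p * (1 - p) / delta ** 2

n_10 = approx_n(0.10, 0.10)  # MDE 10% → ≈ 14,400 per variant
n_05 = approx_n(0.10, 0.05)  # MDE 5%  → ≈ 57,600 per variant (4× more)
```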
What is statistical power?
The probability of correctly detecting an effect if it actually exists. The industry default is 80%: a test with this power will miss a real improvement 20% of the time. For high-stakes decisions use 90%.
Can I stop the test early if it's already significant?
No. Peeking at intermediate results dramatically inflates the false positive rate — your real p-value will be much higher than the one shown. Run the test to its planned N and only then look at the result. If you need early stopping, use sequential testing or Bayesian methods.