Convertilo

A/B Test Calculator

Calculate A/B test statistical significance (p-value, z-score) or required sample size. Two-sided z-test for two proportions. No registration required.

Variant A (control)

Variant B (test)

Verdict

Not significant — collect more data

Confidence: 84.71% · p-value: 0.1529

Conversion rate A

10%

Conversion rate B

12%

Uplift

+20%

z-score

1.429

Two-proportion z-test formula

z = (p₂ − p₁) / √(p̄ · (1 − p̄) · (1/n₁ + 1/n₂))

p̄ is the pooled conversion rate across both groups. The two-sided p-value is computed via the standard normal distribution.
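The z-test above fits in a few lines of Python (a minimal sketch, not the calculator's actual code; function and variable names are illustrative):

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for two proportions with pooled variance."""
    p1, p2 = conv_a / n_a, conv_b / n_b
    p_bar = (conv_a + conv_b) / (n_a + n_b)            # pooled rate p̄
    se = sqrt(p_bar * (1 - p_bar) * (1 / n_a + 1 / n_b))
    z = (p2 - p1) / se
    # Standard normal CDF via erf, then two-sided p-value
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Reproduces the example above: 10% vs 12% with 1000 users per variant
z, p = two_proportion_z_test(100, 1000, 120, 1000)
print(round(z, 3), round(p, 4))   # 1.429 0.1529
```

With 1000 users per variant, a 10% → 12% lift is not yet significant at the 0.05 level, which is exactly the verdict shown above.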

How to use the calculator

One tool covers both jobs of a product analyst: planning a test before it starts and judging significance after it ends.

Pick a mode

Use the toggle at the top to switch between Significance (analyzing collected data) and Sample size (planning a future test).

Enter data

For significance — visitors and conversions for each variant. For sample size — baseline conversion, MDE, significance level, and statistical power.

Read the result

You get a verdict with p-value and z-score, or the exact number of users you need to allocate to each test variant.
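The sample-size mode can be sketched with a common form of the two-proportion sample-size formula (a sketch under standard assumptions; the exact constants the calculator uses may differ slightly):

```python
from math import ceil

def sample_size_per_variant(baseline, relative_mde,
                            z_alpha=1.96,    # two-sided alpha = 0.05
                            z_beta=0.8416):  # power = 80%
    """Users needed in EACH variant to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detect a +10% relative lift from a 10% baseline at alpha = 0.05, power = 80%
print(sample_size_per_variant(0.10, 0.10))   # 14749
```

So a test hoping to catch a 10% → 11% improvement needs roughly 15,000 users per variant before it even starts.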

Why use this calculator

Two tools in one

Plan the sample size before the test. Check statistical significance afterwards. No need to switch between services.

Transparent formulas

We use the classic two-proportion z-test and the standard sample-size formula for two proportions. No black boxes — math is shown on the page.

Works for any binary metric

Conversion rate, CTR, retention, button clicks — anything measured as "happened / didn't happen" per unique user.

FAQ about A/B tests

What p-value is considered significant?

Traditionally p < 0.05: if there were truly no difference between the variants, a result this extreme would occur less than 5% of the time. In sensitive areas (medicine, finance) a stricter threshold of 0.01 is used.

What is MDE?

Minimum Detectable Effect — the smallest improvement the test will be able to detect at the chosen power. If the baseline is 10% and the relative MDE is 10%, the test is powered to detect a lift to 11% or higher. A smaller MDE requires a disproportionately larger sample size.
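The "disproportionately larger" part follows directly from the standard sample-size formula for two proportions: required n scales as 1/Δ², so halving the detectable effect roughly quadruples the sample. A quick illustration (numbers are for the formula as sketched here, not necessarily the calculator's exact output):

```python
from math import ceil

def n_needed(p1, p2, z_alpha=1.96, z_beta=0.8416):
    # n grows as 1 / (p2 - p1)^2: smaller effects need far more data
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

baseline = 0.10
for mde in (0.20, 0.10, 0.05):   # relative MDE at a 10% baseline
    n = n_needed(baseline, baseline * (1 + mde))
    print(f"MDE {mde:.0%}: {n:,} users per variant")
```

Going from a 20% relative MDE to 5% raises the per-variant sample from a few thousand to tens of thousands.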

What is statistical power?

The probability of correctly detecting an effect if it actually exists. The industry default is 80%: a test with this power will miss a real improvement 20% of the time. For high-stakes decisions use 90%.

Can I stop the test early if it's already significant?

No. Peeking at intermediate results dramatically inflates the false positive rate — your real p-value will be much higher than the one shown. Run the test to its planned N and only then look at the result. If you need early stopping, use sequential testing or Bayesian methods.
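The inflation is easy to demonstrate with a Monte Carlo sketch (illustrative simulation, not part of the calculator): two identical variants — an A/A test with no real effect — checked at ten interim looks trigger a false "significant" far more often than the nominal 5%.

```python
import random
from math import erf, sqrt

def p_value(c1, n1, c2, n2):
    """Two-sided two-proportion z-test p-value (pooled)."""
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (c2 / n2 - c1 / n1) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(42)
TRIALS, N, LOOKS, RATE = 1000, 2000, 10, 0.10
fixed = peeked = 0
for _ in range(TRIALS):
    ca = cb = 0
    stopped_early = False
    for k in range(1, N + 1):
        ca += random.random() < RATE      # variant A conversions
        cb += random.random() < RATE      # variant B: identical true rate
        if k % (N // LOOKS) == 0 and p_value(ca, k, cb, k) < 0.05:
            stopped_early = True          # "significant" at an interim look
    peeked += stopped_early
    fixed += p_value(ca, N, cb, N) < 0.05
print(f"false positives at fixed N:   {fixed / TRIALS:.1%}")   # near the nominal 5%
print(f"false positives with peeking: {peeked / TRIALS:.1%}")  # several times higher
```

Checking only once at the planned N keeps the false-positive rate at the nominal level; stopping at the first interim "win" multiplies it.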