A plain-language decision guide for choosing the right test for comparing groups — without needing a statistics degree.

Choosing the wrong statistical test is one of the most common issues flagged in peer review. This guide walks through the decision the way a working researcher actually makes it: how many groups there are, whether the data is roughly normally distributed, and how large the sample is.

Note: this is a practical orientation, not a substitute for statistical advice on your specific study. When in doubt, consult a statistician.

The three questions that decide your test

1. How many groups are you comparing?

Two groups → t-test family or Mann-Whitney.
Three or more groups → ANOVA family or Kruskal-Wallis.

2. Is your data roughly normally distributed?

Approximately normal → parametric tests (t-test, ANOVA).
Not normal / skewed / has outliers / ordinal → non-parametric tests (Mann-Whitney, Kruskal-Wallis).

3. How big is your sample?

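Very small samples (a handful of observations per group) give you too little information to check normality reliably, so a non-parametric test is often the safer choice. With larger samples, t-tests and ANOVA are fairly robust to moderate departures from normality.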
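One common way to check the normality question in Python is the Shapiro-Wilk test. The sketch below assumes SciPy is installed. The helper name `looks_normal`, the 0.05 threshold, and both data sets are illustrative choices, not something this guide prescribes.

```python
from statistics import NormalDist

from scipy import stats


def looks_normal(values, alpha=0.05):
    """Return True when Shapiro-Wilk does not reject normality at `alpha`."""
    statistic, p_value = stats.shapiro(values)
    return bool(p_value > alpha)


# Made-up data for illustration only.
# Evenly spaced normal quantiles: about as bell-shaped as 20 points can be.
symmetric = [NormalDist(10, 2).inv_cdf((i + 0.5) / 20) for i in range(20)]
# Doubling values: strongly right-skewed.
skewed = [2 ** i for i in range(12)]

print("symmetric looks normal:", looks_normal(symmetric))
print("skewed looks normal:", looks_normal(skewed))
```

Treat the result as one input alongside a histogram or Q-Q plot: a non-significant result on a tiny sample is weak evidence of normality.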

The common tests, and when to use each

Two groups, normal data → Welch's t-test. Use it to compare two independent groups (e.g. control vs treated) with roughly normal data. Welch's t-test is a better default than Student's t-test because it does not assume equal variances between groups, and unequal variances are common in real experiments.
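As a minimal sketch, assuming SciPy is available, Welch's t-test is `scipy.stats.ttest_ind` with `equal_var=False` (the function runs Student's t-test when that argument is left at its default). The `control` and `treated` values are made up for illustration.

```python
from scipy import stats

# Made-up measurements for two independent groups (illustration only).
control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1]
treated = [6.2, 7.9, 5.8, 8.4, 6.9, 7.5]

# equal_var=False requests Welch's t-test instead of Student's t-test.
welch = stats.ttest_ind(control, treated, equal_var=False)

print(f"t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

The statistic is negative here only because the first group has the lower mean; swapping the argument order flips the sign but not the p-value.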

Two groups, non-normal data → Mann-Whitney U test. The non-parametric counterpart. Use it when the data is skewed, ordinal, has outliers, or the sample is too small to assume normality. It works on ranks: it asks whether values in one group tend to be larger than values in the other, rather than comparing means.
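A hedged sketch of the same comparison with `scipy.stats.mannwhitneyu`, assuming SciPy is installed. The data is invented and includes one deliberate outlier (19.0) to show that the rank-based test is not thrown off by it.

```python
from scipy import stats

# Made-up data; the treated group contains one extreme value.
control = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6, 1.3]
treated = [2.9, 3.4, 2.2, 19.0, 2.7, 3.1, 2.5]

# Two-sided test: are values in one group systematically larger?
mwu = stats.mannwhitneyu(control, treated, alternative="two-sided")

print(f"U = {mwu.statistic}, p = {mwu.pvalue:.4f}")
```

Because only the ordering of values matters, replacing 19.0 with 190.0 would leave the result unchanged.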

Three or more groups, normal data → one-way ANOVA. Compares the means of three or more groups in a single test. If ANOVA finds a significant difference, follow up with a post-hoc test (e.g. Tukey's HSD) to find which groups differ. Don't run a series of pairwise t-tests instead, because each additional test raises the chance of a false positive.
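The omnibus-then-post-hoc sequence might look like this in Python. This is a sketch that assumes a reasonably recent SciPy: `scipy.stats.tukey_hsd` was added in SciPy 1.8, so older installs will not have it. Group names and values are invented.

```python
from scipy import stats

# Made-up data: the first two groups are similar, the third is shifted up.
placebo = [5.1, 4.9, 5.3, 5.0, 5.2]
low_dose = [5.2, 5.0, 5.4, 4.8, 5.1]
high_dose = [7.0, 6.8, 7.3, 7.1, 6.9]

# Step 1: omnibus test. Does any group mean differ?
anova = stats.f_oneway(placebo, low_dose, high_dose)
print(f"ANOVA: F = {anova.statistic:.1f}, p = {anova.pvalue:.2g}")

# Step 2: post-hoc test, run only if the omnibus test is significant.
if anova.pvalue < 0.05:
    tukey = stats.tukey_hsd(placebo, low_dose, high_dose)
    # tukey.pvalue[i, j] is the adjusted p-value for group i vs group j.
    print(f"placebo vs low_dose:  p = {tukey.pvalue[0, 1]:.3f}")
    print(f"placebo vs high_dose: p = {tukey.pvalue[0, 2]:.2g}")
```

With this data the omnibus test is significant, and the post-hoc step attributes the difference to the high-dose group rather than to the placebo and low-dose pair.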

Three or more groups, non-normal data → Kruskal-Wallis. The non-parametric counterpart to one-way ANOVA. Use it for three or more groups when normality doesn't hold. Follow up with an appropriate post-hoc test (e.g. Dunn's).
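For the non-parametric case, a minimal sketch with `scipy.stats.kruskal`, assuming SciPy is installed. The data is invented and deliberately skewed by two extreme values. SciPy itself does not ship Dunn's test, so the post-hoc step is left out here.

```python
from scipy import stats

# Made-up, skewed data for three independent groups (illustration only).
group_a = [1.1, 1.4, 1.2, 1.9, 1.3, 1.6]
group_b = [2.4, 2.1, 2.9, 2.2, 2.6, 12.0]
group_c = [4.8, 5.5, 4.1, 30.0, 5.1, 4.4]

# Omnibus rank-based test across all three groups.
kw = stats.kruskal(group_a, group_b, group_c)

print(f"H = {kw.statistic:.2f}, p = {kw.pvalue:.4f}")
```

A significant result here says only that at least one group differs; a post-hoc test such as Dunn's is still needed to say which.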

Paired or repeated measurements (the same subjects measured twice, such as before and after treatment) need paired tests: the paired t-test, or the Wilcoxon signed-rank test as the non-parametric option. These are a different family from the independent-group tests above.
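A short sketch of both paired tests, assuming SciPy is installed. The `before` and `after` readings are invented, and position `i` in each list is assumed to belong to the same subject.

```python
from scipy import stats

# Made-up readings; index i in both lists is the same subject.
before = [120, 132, 128, 141, 135, 126, 138, 130]
after = [114, 127, 125, 133, 128, 124, 129, 126]

# Parametric option: paired t-test on the within-subject differences.
paired_t = stats.ttest_rel(before, after)

# Non-parametric option: Wilcoxon signed-rank test on the same pairs.
signed_rank = stats.wilcoxon(before, after)

print(f"paired t-test: t = {paired_t.statistic:.2f}, p = {paired_t.pvalue:.4f}")
print(f"Wilcoxon:      W = {signed_rank.statistic}, p = {signed_rank.pvalue:.4f}")
```

Passing the same two lists to an independent-group test would ignore the pairing and usually lose power, which is why the paired versions are treated as a separate family.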

A quick decision summary

2 groups, normal → Welch's t-test
2 groups, not normal → Mann-Whitney U
3+ groups, normal → one-way ANOVA (+ post-hoc)
3+ groups, not normal → Kruskal-Wallis (+ post-hoc)
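The summary above can be written as a small lookup function. This is only an illustration of the decision logic for independent groups; the name `choose_test` and its arguments are hypothetical, and paired designs are not covered.

```python
def choose_test(n_groups, is_normal):
    """Map the number of groups and the normality judgment to a test name."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        return "Welch's t-test" if is_normal else "Mann-Whitney U"
    if is_normal:
        return "one-way ANOVA (+ post-hoc)"
    return "Kruskal-Wallis (+ post-hoc)"


print(choose_test(2, is_normal=True))
print(choose_test(4, is_normal=False))
```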

Common mistakes

Running multiple t-tests instead of ANOVA for 3+ groups — this inflates your false-positive rate. Use the omnibus test, then post-hoc.
Assuming normality without checking — especially with small samples.
Using Student's t-test out of habit when the group variances may differ; Welch's is the safer default.
Reporting a significant ANOVA without post-hoc — the omnibus test tells you that groups differ, not which.
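To see why multiple t-tests inflate false positives, the rough calculation below estimates the chance of at least one false positive when every pair of groups is tested at α = 0.05. It assumes the tests are independent, which pairwise comparisons on shared groups are not, so read the numbers as an approximation. The function name is illustrative.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of >= 1 false positive across all pairwise tests."""
    n_comparisons = comb(n_groups, 2)  # number of distinct group pairs
    return 1 - (1 - alpha) ** n_comparisons


for k in (3, 4, 5):
    print(f"{k} groups, {comb(k, 2)} comparisons: "
          f"{familywise_error_rate(k):.1%}")
```

Under this approximation the rate is about 14% for three groups, 26% for four, and 40% for five, well above the nominal 5%.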

Doing it without hand-calculating

You can run these tests in R, Python, GraphPad Prism, or SPSS. If you'd rather see the test alongside the figure, FigureGuild computes the appropriate test as you build the chart — Welch's t-test for two groups, one-way ANOVA with post-hoc for three or more, and Mann-Whitney / Kruskal-Wallis for non-parametric data — with the statistics computed locally from your data, not estimated by AI.

Try it free at figureguild.com.

Final thought

Most of the time, three questions get you to the right test: how many groups, is the data normal, and is the sample large enough to judge. Match the test to those, use the non-parametric option when normality is doubtful, and always follow a significant omnibus test with the right post-hoc. When the stakes are high, confirm with a statistician.

FigureGuild builds publication-grade charts from your data with the right statistical test built in. Free to try.