What Is a Paired Samples T-Test?
A paired samples t-test is a statistical test used to compare two related measurements. It is commonly used when the same participants, patients, samples, or matched subjects are measured twice.
Unlike an independent samples t-test, which compares two separate groups, the paired samples t-test focuses on the change within each pair. The two values are linked because they come from the same individual or from a carefully matched pair.
Typical examples include measurements taken before and after treatment, scores under two experimental conditions, or laboratory values measured at baseline and follow-up.
Why Is This Test Useful?
In many studies, researchers are interested in whether a condition changes after an intervention. For example, a clinician may measure blood pressure before and after a new medication, or a psychologist may measure anxiety scores before and after a therapy program.
Because the same individuals are measured twice, each person serves as their own comparison. This is useful because people can differ greatly from one another at baseline. Some patients naturally have higher blood pressure, some students naturally perform better on exams, and some participants may respond differently to treatment.
By comparing each person with themselves, the paired samples t-test removes much of the variability caused by individual differences. This often makes the test more sensitive than comparing two unrelated groups.
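The gain in sensitivity can be illustrated with a short Python sketch. The blood pressure values are invented for illustration, and SciPy is an assumed dependency that this article does not itself name. The same numbers are analysed once as paired data and once, incorrectly, as two independent groups.

```python
from scipy import stats

# Hypothetical systolic blood pressure (mmHg) for 8 patients; values are made up.
before = [120, 135, 150, 165, 180, 128, 142, 171]
drops = [4, 5, 3, 6, 4, 5, 3, 6]  # each patient's own reduction
after = [b - d for b, d in zip(before, drops)]

# Paired analysis: works on the within-patient differences.
paired = stats.ttest_rel(after, before)

# Independent analysis of the same numbers: ignores the pairing, so the large
# spread between patients swamps the small but consistent change.
unpaired = stats.ttest_ind(after, before)

print(f"paired:   t = {paired.statistic:.2f}, p = {paired.pvalue:.4g}")
print(f"unpaired: t = {unpaired.statistic:.2f}, p = {unpaired.pvalue:.4g}")
```

With data like these, the paired test flags the change while the independent test does not, because baseline differences between patients are far larger than the change within each patient.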
How Does the Test Work?
The paired samples t-test begins by calculating a difference score for each pair.
For example, in a before-and-after study, the difference might be:
Difference = After value − Before value
After calculating a difference for every subject, the test examines whether the average difference is significantly different from zero.
If the treatment or intervention has no effect, the average difference should be close to zero. If the average difference is clearly above or below zero, this suggests that a real change may have occurred.
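The test produces a t-statistic, which is the mean difference divided by its standard error (the standard deviation of the difference scores divided by the square root of the number of pairs). A larger absolute t-value means that the average change is large relative to the variation in individual changes. The statistic is evaluated against a t-distribution with n − 1 degrees of freedom, where n is the number of pairs.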
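The calculation can be written out by hand. The following is a minimal sketch using only the Python standard library; the function name `paired_t_statistic` and the measurements are hypothetical, chosen purely for illustration.

```python
import math
import statistics

# Hypothetical before/after measurements for 8 subjects (illustrative only).
before = [140, 152, 138, 147, 160, 155, 149, 143]
after = [135, 147, 136, 140, 152, 150, 147, 138]


def paired_t_statistic(before, after):
    """Return (mean difference, SD of differences, t-statistic, degrees of freedom)."""
    if len(before) != len(after):
        raise ValueError("paired data need two sequences of equal length")
    diffs = [a - b for a, b in zip(after, before)]  # After - Before
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD (n - 1 in the denominator)
    se_diff = sd_diff / math.sqrt(n)       # standard error of the mean difference
    t_stat = mean_diff / se_diff
    return mean_diff, sd_diff, t_stat, n - 1


mean_diff, sd_diff, t_stat, df = paired_t_statistic(before, after)
print(f"mean difference = {mean_diff:.3f}, SD = {sd_diff:.3f}")
print(f"t({df}) = {t_stat:.3f}")
```

Because the differences are taken as After − Before, a decrease produces a negative mean difference and a negative t-value. Reversing the subtraction flips the sign but not the magnitude.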
The test also provides a p-value. The p-value is the probability of observing a mean difference at least as extreme as the one in the data if there were truly no change in the population.
A commonly used significance level is 0.05. If the p-value is below 0.05, the result is usually considered statistically significant.
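In practice the t-statistic and p-value are usually obtained from statistical software. As one possible sketch, assuming SciPy is available (the article does not prescribe a tool), `scipy.stats.ttest_rel` runs the test on two paired sequences. The data are hypothetical.

```python
from scipy import stats

# Hypothetical before/after measurements for 8 subjects (illustrative only).
before = [140, 152, 138, 147, 160, 155, 149, 143]
after = [135, 147, 136, 140, 152, 150, 147, 138]

alpha = 0.05  # the conventional significance level mentioned above

# ttest_rel(a, b) tests whether the mean of (a - b) differs from zero,
# so this call uses the After - Before convention.
result = stats.ttest_rel(after, before)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.5f}")
if result.pvalue < alpha:
    print("Statistically significant at the 0.05 level")
else:
    print("Not statistically significant at the 0.05 level")
```

By default the p-value returned here is two-sided, which matches a hypothesis of "any change" rather than a change in one specific direction.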
What Does the Result Mean?
A statistically significant paired samples t-test suggests that the average value changed between the two related measurements.
For example, if patients have significantly lower pain scores after treatment than before treatment, the result supports the idea that the treatment is associated with reduced pain.
However, statistical significance does not automatically mean that the change is clinically or practically important. A small change may be statistically significant in a large study but may not be meaningful in real-world practice.
For this reason, researchers should also consider the size of the change, confidence intervals, and effect size. A common effect size for paired data is Cohen’s d calculated using the difference scores: the mean difference divided by the standard deviation of the differences.
A non-significant result means that the data do not provide strong evidence of a change. It does not prove that there is no change at all. The study may have had too few participants, large variability in the differences, or limited statistical power.
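The quantities that complement the p-value can be computed from the same difference scores. The sketch below is illustrative: the function name `paired_effect_summary` and the data are hypothetical, SciPy is an assumed dependency used only for the critical t-value, and Cohen's d is taken as the mean difference divided by the standard deviation of the differences.

```python
import math
import statistics
from scipy import stats

# Hypothetical before/after measurements for 8 subjects (illustrative only).
before = [140, 152, 138, 147, 160, 155, 149, 143]
after = [135, 147, 136, 140, 152, 150, 147, 138]


def paired_effect_summary(before, after, confidence=0.95):
    """Mean difference, its confidence interval, and Cohen's d for paired data."""
    diffs = [a - b for a, b in zip(after, before)]  # After - Before
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    se_diff = sd_diff / math.sqrt(n)
    # Two-sided critical value from the t-distribution with n - 1 df.
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    ci = (mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)
    cohens_d = mean_diff / sd_diff  # effect size based on the difference scores
    return {"mean_diff": mean_diff, "ci": ci, "cohens_d": cohens_d}


summary = paired_effect_summary(before, after)
low, high = summary["ci"]
print(f"mean difference = {summary['mean_diff']:.2f}")
print(f"95% CI = ({low:.2f}, {high:.2f})")
print(f"Cohen's d = {summary['cohens_d']:.2f}")
```

A confidence interval that excludes zero agrees with a significant test at the matching level, while its width shows how precisely the change has been estimated.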
Key Assumptions
Several assumptions should be considered before using a paired samples t-test.
1. Paired or matched observations
Each value in one condition must be linked to one value in the other condition. This usually means the same subject is measured twice, such as before and after treatment.
It can also apply to matched pairs, such as twins, matched patients, or paired biological samples.
2. Continuous outcome variable
The measured outcome should be continuous (measured on an interval or ratio scale), such as blood pressure, body weight, test score, biomarker concentration, reaction time, or symptom score.
3. Normally distributed difference scores
The paired samples t-test assumes that the difference scores are approximately normally distributed.
Importantly, the two original sets of scores do not each need to be perfectly normal. What matters most is whether the differences between paired values are reasonably normal.
When the number of pairs is moderate or large, the test is generally more robust to mild departures from normality.
4. No extreme outliers in the differences
Extreme difference scores can strongly affect the mean difference and distort the result. It is good practice to inspect the differences before performing the test.
If the difference scores are highly skewed or contain serious outliers, a non-parametric alternative such as the Wilcoxon signed-rank test may be more appropriate.
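Assumptions 3 and 4 can be screened with a few lines of code. This is a rough sketch rather than a fixed procedure: the difference scores are hypothetical, the 1.5 × IQR outlier rule and the Shapiro-Wilk test are common conventions chosen here as assumptions, and SciPy is an assumed dependency.

```python
import statistics
from scipy import stats


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical difference scores (after - before); values are made up.
diffs = [-5.1, -4.3, -2.2, -7.4, -8.0, -5.6, -1.9, -4.8, -6.2, -3.5]

outliers = iqr_outliers(diffs)

# Shapiro-Wilk: the null hypothesis is that the differences are normally distributed.
shapiro = stats.shapiro(diffs)

# Wilcoxon signed-rank test on the same differences, as a non-parametric alternative.
wilcoxon = stats.wilcoxon(diffs)

print("outliers:", outliers)
print(f"Shapiro-Wilk p = {shapiro.pvalue:.3f}")
print(f"Wilcoxon signed-rank p = {wilcoxon.pvalue:.4f}")
```

A formal normality test has little power with few pairs, so a histogram or Q-Q plot of the differences is a useful companion to these numbers.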
When Should You Use It?
Use a paired samples t-test when:
- The same subjects are measured twice, or subjects are matched into pairs.
- The outcome is continuous.
- Each observation in one condition has a natural match in the other condition.
- You want to test whether the average change is different from zero.
Common examples include:
- Comparing blood pressure before and after treatment.
- Comparing pain scores before and after surgery.
- Comparing exam scores before and after training.
- Comparing laboratory values at baseline and follow-up.
- Comparing two methods measured on the same samples.
When Should You Not Use It?
A paired samples t-test should not be used when the two groups are completely separate and unrelated. In that situation, an independent samples t-test is more appropriate.
If there are three or more related measurements, such as baseline, 1-month follow-up, and 3-month follow-up, a repeated measures ANOVA or a mixed-effects model may be needed.
If the outcome is categorical, such as improved/not improved or positive/negative, other methods such as McNemar’s test may be more suitable.
If the paired differences are clearly non-normal or ordinal, the Wilcoxon signed-rank test may be considered.
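The guidance in this section can be summarised as a small lookup. The function below is a hypothetical helper, not an established rule set: its name, parameters, and category labels are invented for illustration, and it covers only the situations named above.

```python
def suggest_test(paired, n_measurements, outcome, diffs_roughly_normal=True):
    """Map a simple study design to the test suggested in the text.

    paired: True if each observation has a natural match in the other condition.
    n_measurements: number of related measurements per subject or pair.
    outcome: "continuous", "ordinal", or "categorical" (labels are assumptions).
    diffs_roughly_normal: judgment about the paired differences.
    """
    if not paired:
        return "independent samples t-test"
    if n_measurements >= 3:
        return "repeated measures ANOVA or mixed-effects model"
    if outcome == "categorical":
        return "McNemar's test"
    if outcome == "ordinal" or not diffs_roughly_normal:
        return "Wilcoxon signed-rank test"
    return "paired samples t-test"


print(suggest_test(paired=True, n_measurements=2, outcome="continuous"))
print(suggest_test(paired=True, n_measurements=3, outcome="continuous"))
print(suggest_test(paired=True, n_measurements=2, outcome="categorical"))
```

Real designs often need more nuance than a lookup can capture, so this is best read as a checklist of the questions to ask.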
A Simple Example
Suppose a researcher wants to know whether a 12-week exercise program reduces resting heart rate. She measures the resting heart rate of 20 participants before the program and again after the program.
Before training, the average resting heart rate is 74 beats per minute. After training, the average resting heart rate is 68 beats per minute.
For each participant, the researcher calculates the difference as the before value minus the after value, so that a decrease in heart rate gives a positive difference. She then performs a paired samples t-test and obtains:
- t = 4.21
- p = 0.0005
Because the p-value is less than 0.05, the result is statistically significant. The researcher concludes that resting heart rate decreased significantly after the exercise program.
However, she should also report the mean difference, standard deviation of the differences, confidence interval, and effect size to describe how large the change was.
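The figures in this example can be cross-checked from the reported values alone. The sketch below assumes a two-sided test and uses SciPy (an assumed dependency) for the t-distribution. The standard deviation and effect size it prints are implied by the reported t-value, mean change, and sample size; they are not additional results from the study.

```python
import math
from scipy import stats

# Values taken from the example above.
n_pairs = 20
mean_before = 74.0   # beats per minute
mean_after = 68.0    # beats per minute
t_stat = 4.21

df = n_pairs - 1

# For paired data, the mean of the differences equals the difference of the means.
mean_diff = mean_before - mean_after

# Two-sided p-value for the reported t-statistic.
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# Rearranging t = mean_diff / (sd_diff / sqrt(n)) gives the implied SD of the differences.
implied_sd_diff = mean_diff * math.sqrt(n_pairs) / t_stat

# Cohen's d on the difference scores, equivalent to t / sqrt(n).
implied_d = mean_diff / implied_sd_diff

print(f"df = {df}, p = {p_two_sided:.4f}")
print(f"implied SD of differences = {implied_sd_diff:.2f} bpm")
print(f"implied Cohen's d = {implied_d:.2f}")
```

The positive t-value for a decrease in heart rate corresponds to differences taken as before minus after.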
How to Report the Result
A paired samples t-test result can be reported like this:
“The mean value after treatment was significantly different from the mean value before treatment, t(df) = value, p = value.”
For example:
“Resting heart rate was significantly lower after the exercise program than before the program, t(19) = 4.21, p = 0.0005.”
A more complete report may include:
- Sample size or number of pairs.
- Mean and standard deviation at each time point.
- Mean difference.
- Standard deviation of the differences.
- t-statistic and degrees of freedom.
- p-value.
- Confidence interval.
- Effect size.
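When many results are reported, a small formatting helper keeps the notation consistent. The function `format_t_result` below is a hypothetical convenience. The number of decimals and the floor below which the p-value is shown as an inequality are assumptions, so adjust them to the target journal's style.

```python
def format_t_result(t_stat, df, p_value, p_floor=0.0001):
    """Format a t-test result in the 't(df) = value, p = value' pattern."""
    if p_value < p_floor:
        p_text = f"p < {p_floor}"
    else:
        p_text = f"p = {p_value:.4f}"
    return f"t({df}) = {t_stat:.2f}, {p_text}"


sentence = (
    "Resting heart rate was significantly lower after the exercise program "
    "than before the program, " + format_t_result(4.21, 19, 0.0005) + "."
)
print(sentence)
```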
Summary
The paired samples t-test is a useful method for comparing two related measurements. It is especially common in before-and-after studies, repeated measurements, matched-pair designs, and crossover studies.
Its main advantage is that it accounts for the natural pairing between observations. By focusing on within-subject changes, it can often detect effects more efficiently than methods designed for independent groups.
To use the test correctly, researchers should confirm that the observations are paired, examine the distribution of difference scores, check for extreme outliers, and interpret the p-value together with effect size and practical importance.