# Bias of Sample Variance

## Theorem

Let $X_1, X_2, \ldots, X_n$ form a random sample from a population with mean $\mu$ and variance $\sigma^2$.

Let:

$\displaystyle \bar X = \frac 1 n \sum_{i \mathop = 1}^n X_i$

Then:

$\displaystyle \hat {\sigma^2} = \frac 1 n \sum_{i \mathop = 1}^n \paren {X_i - \bar X}^2$

is a biased estimator of $\sigma^2$, with:

$\displaystyle \operatorname{bias} \paren {\hat {\sigma ^2}} = -\frac {\sigma^2} n$
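
This can be illustrated numerically. The following sketch (assuming NumPy, a normal population, and arbitrarily chosen values of $n$ and $\sigma^2$; the theorem itself holds for any population with finite variance) averages the "divide by $n$" estimator over many simulated samples and compares the result with $\sigma^2 \paren {n - 1} / n$:

```python
import numpy as np

# Monte Carlo illustration (a sketch, not part of the proof): for samples of
# size n from a normal population with variance sigma^2, the average of the
# "divide by n" estimator over many replications should sit near
# sigma^2 * (n - 1) / n, so its bias should be close to -sigma^2 / n.
rng = np.random.default_rng(0)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
sigma2_hat = samples.var(axis=1, ddof=0)  # divide by n, as in the theorem

print("mean of estimator:", sigma2_hat.mean())           # approx 3.2
print("theoretical value:", sigma2 * (n - 1) / n)         # 3.2
print("empirical bias:   ", sigma2_hat.mean() - sigma2)   # approx -0.8 = -sigma2 / n
```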

## Proof

By the definition of a biased estimator, $\hat {\sigma^2}$ is a biased estimator of $\sigma^2$ if and only if:

$\displaystyle \expect {\hat {\sigma^2} } \ne \sigma^2$

We have:

$\displaystyle \begin{aligned}
\expect {\hat {\sigma^2} } &= \expect {\frac 1 n \sum_{i \mathop = 1}^n \paren {X_i - \bar X}^2} \\
&= \expect {\frac 1 n \sum_{i \mathop = 1}^n \paren {\paren {X_i - \mu} - \paren {\bar X - \mu} }^2} && \text{writing } X_i - \bar X = \paren {X_i - \mu} - \paren {\bar X - \mu} \\
&= \expect {\frac 1 n \sum_{i \mathop = 1}^n \paren {\paren {X_i - \mu}^2 - 2 \paren {\bar X - \mu} \paren {X_i - \mu} + \paren {\bar X - \mu}^2} } && \text{Square of Difference} \\
&= \expect {\frac 1 n \sum_{i \mathop = 1}^n \paren {X_i - \mu}^2 - \frac 2 n \paren {\bar X - \mu} \sum_{i \mathop = 1}^n \paren {X_i - \mu} + \frac 1 n \paren {\bar X - \mu}^2 \sum_{i \mathop = 1}^n 1} && \text{Summation is Linear}
\end{aligned}$
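
The expansion step can be checked symbolically for a small concrete sample size; a sketch assuming SymPy and $n = 3$:

```python
import sympy as sp

# Symbolic sanity check of the expansion step for a concrete n (here n = 3):
# sum_i (X_i - Xbar)^2  ==  sum_i (X_i - mu)^2
#                           - 2 (Xbar - mu) sum_i (X_i - mu)
#                           + n (Xbar - mu)^2
n = 3
X = sp.symbols(f"X1:{n + 1}")  # X1, X2, X3
mu = sp.Symbol("mu")
Xbar = sp.Rational(1, n) * sum(X)

lhs = sum((Xi - Xbar) ** 2 for Xi in X)
rhs = (sum((Xi - mu) ** 2 for Xi in X)
       - 2 * (Xbar - mu) * sum(Xi - mu for Xi in X)
       + n * (Xbar - mu) ** 2)

print(sp.simplify(lhs - rhs))  # prints 0: the two sides agree identically
```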

We have that:

$\displaystyle \begin{aligned}
\frac 1 n \sum_{i \mathop = 1}^n \paren {X_i - \mu} &= \frac 1 n \sum_{i \mathop = 1}^n X_i - \frac n n \mu && \text{as } \sum_{i \mathop = 1}^n \mu = n \mu, \text{ since } \mu \text{ does not depend on } i \\
&= \bar X - \mu && \text{Definition of } \bar X
\end{aligned}$
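
A quick numeric check of this identity (a sketch assuming NumPy, with an arbitrary sample and an arbitrary $\mu$):

```python
import numpy as np

# Check that (1/n) * sum_i (X_i - mu) equals Xbar - mu for an arbitrary
# sample x and an arbitrary constant mu.
x = np.array([2.0, 5.0, 7.0, 11.0])
mu = 3.0

print(np.mean(x - mu))   # left-hand side
print(x.mean() - mu)     # right-hand side: same value
```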

So, substituting $\displaystyle \sum_{i \mathop = 1}^n \paren {X_i - \mu} = n \paren {\bar X - \mu}$ and $\displaystyle \sum_{i \mathop = 1}^n 1 = n$ into the expression above:

$\displaystyle \begin{aligned}
\expect {\hat {\sigma^2} } &= \expect {\frac 1 n \sum_{i \mathop = 1}^n \paren {X_i - \mu}^2 - 2 \paren {\bar X - \mu}^2 + \paren {\bar X - \mu}^2} \\
&= \frac 1 n \expect {\sum_{i \mathop = 1}^n \paren {X_i - \mu}^2} - \expect {\paren {\bar X - \mu}^2} && \text{Linearity of Expectation Function} \\
&= \frac 1 n \sum_{i \mathop = 1}^n \expect {\paren {X_i - \mu}^2} - \var {\bar X} && \text{Linearity of Expectation Function, Definition of Variance, as } \expect {\bar X} = \mu \\
&= \frac 1 n \sum_{i \mathop = 1}^n \var {X_i} - \frac {\sigma^2} n && \text{Definition of Variance, Variance of Sample Mean} \\
&= \frac n n \sigma^2 - \frac {\sigma^2} n && \var {X_i} = \sigma^2, \ \sum_{i \mathop = 1}^n 1 = n \\
&= \sigma^2 - \frac {\sigma^2} n \\
&\ne \sigma^2
\end{aligned}$

So $\hat {\sigma^2}$ is a biased estimator of $\sigma^2$.
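
The Variance of Sample Mean result invoked above, $\var {\bar X} = \sigma^2 / n$, can likewise be illustrated by simulation; a sketch assuming NumPy and a normal population:

```python
import numpy as np

# Simulation of the "Variance of Sample Mean" step: for i.i.d. observations
# with variance sigma^2, the sample mean Xbar has variance close to sigma^2 / n.
rng = np.random.default_rng(1)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
xbar = samples.mean(axis=1)

print("empirical Var(Xbar):  ", xbar.var())   # approx 0.8
print("theoretical sigma^2/n:", sigma2 / n)   # 0.8
```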

Further, by the definition of bias:

$\displaystyle \operatorname{bias} \paren {\hat {\sigma^2} } = \expect {\hat {\sigma^2} } - \sigma^2 = \sigma^2 - \frac {\sigma^2} n - \sigma^2 = -\frac {\sigma^2} n$

$\blacksquare$