Definition:Chi-Squared Test/Goodness of Fit

From ProofWiki

Definition

The chi-squared test for goodness of fit is a test of goodness of fit of observations to some theoretical probability distribution.

Let $n \in \Z_{>0}$.

Let a value $x_i$ for $i \in \set {1, 2, \ldots, n}$ be expected to occur $E_i$ times.

Let $x_i$ actually occur $O_i$ times.

Then the statistic:

$\ds \chi^2 = \sum_{i \mathop = 1}^n \dfrac {\paren {O_i - E_i}^2} {E_i}$

has, if the hypothesised distribution is correct, approximately a $\chi$-squared distribution with $n - p$ degrees of freedom, where $p$ is the number of quantities estimated from the data and used to compute the $E_i$ (including the total number of observations).


Significantly high values of $\chi^2$ lead to the rejection of the hypothesised distribution.
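
As an illustration only, the statistic and its degrees of freedom can be sketched in Python. The function names chi_squared_statistic and degrees_of_freedom are hypothetical, chosen for this sketch, and the degrees of freedom follow the $n - p$ convention of this page.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over i of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n, p):
    """Return n - p, where p counts the quantities estimated from the data."""
    if p >= n:
        raise ValueError("p must be smaller than n")
    return n - p
```

A large value returned by chi_squared_statistic, judged against the $\chi$-squared distribution with degrees_of_freedom(n, p) degrees of freedom, counts against the hypothesised distribution.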


Modifications are needed if some $E_i$ are small, that is, if several values of $E_i$ are $5$ or less.
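
This page does not say which modification to use. One common choice, assumed here purely for illustration, is to pool adjacent classes until every pooled expected frequency exceeds $5$. The function name pool_small_classes and the threshold parameter are hypothetical.

```python
def pool_small_classes(observed, expected, threshold=5):
    """Merge adjacent classes until each pooled expected count exceeds threshold.

    This is one possible modification, not necessarily the one intended
    by the definition above.
    """
    pooled_observed, pooled_expected = [], []
    acc_observed, acc_expected = 0, 0
    for o, e in zip(observed, expected):
        acc_observed += o
        acc_expected += e
        if acc_expected > threshold:
            pooled_observed.append(acc_observed)
            pooled_expected.append(acc_expected)
            acc_observed, acc_expected = 0, 0
    # Fold any leftover classes into the last pooled class.
    if acc_expected > 0:
        if pooled_expected:
            pooled_observed[-1] += acc_observed
            pooled_expected[-1] += acc_expected
        else:
            pooled_observed.append(acc_observed)
            pooled_expected.append(acc_expected)
    return pooled_observed, pooled_expected
```

Pooling reduces the number of classes $n$, so the degrees of freedom must be recomputed from the pooled classes.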


Continuous Distribution

The $\chi$-squared test for goodness of fit can be adapted for grouped data from a continuous probability distribution when grouped data are the only data available.

However, if individual observations are available, they should not be arbitrarily grouped together simply so that the test can be applied, because the outcome of the test is not independent of the choice of class intervals.

Some groupings may lead to a significant value for the $\chi$-squared statistic, while others may not.
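
The dependence on the class intervals can be seen in a small sketch. The $20$ observations below are invented for illustration, and the hypothesised distribution is assumed to be uniform on $[0, 1)$, so that equal-width classes all have the same expected frequency. The helper name grouped_statistic is hypothetical.

```python
def grouped_statistic(values, classes):
    """Chi-squared statistic for values in [0, 1) against a uniform
    distribution, using the given number of equal-width classes."""
    observed = [0] * classes
    for x in values:
        # Index of the class interval containing x.
        observed[min(int(x * classes), classes - 1)] += 1
    expected = len(values) / classes
    return sum((o - expected) ** 2 / expected for o in observed)


# Invented data: clustered near 0 and 1, sparse in the middle.
values = [0.02, 0.05, 0.08, 0.11, 0.14, 0.17, 0.20, 0.23, 0.30, 0.40,
          0.60, 0.70, 0.77, 0.80, 0.83, 0.86, 0.89, 0.92, 0.95, 0.98]

statistic_two = grouped_statistic(values, 2)   # classes [0, 0.5), [0.5, 1)
statistic_four = grouped_statistic(values, 4)  # four classes of width 0.25
print(statistic_two, statistic_four)
```

With two classes the observed counts match the expected counts exactly and the statistic is $0$, while with four classes the same observations give a statistic of about $7.2$.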


Examples

Cast of Dice

Let $D$ be a die which we want to test for fairness.

Let $D$ be cast $96$ times.

Then:

$x_i \in \set {1, 2, 3, 4, 5, 6}$

If $D$ is fair, then for each $i$, the number of times we expect to observe face $i$ of $D$ is:

$E_i = 96 \times \dfrac 1 6 = 16$

Suppose in our trial, the number of times each face comes up is shown in the table below:

$\begin{array}{rcl}
O_1 & = & 14 \\
O_2 & = & 19 \\
O_3 & = & 11 \\
O_4 & = & 21 \\
O_5 & = & 12 \\
O_6 & = & 19
\end{array}$

Then:

$\begin{array}{rcl}
\ds \chi^2 & = & \ds \sum_{i \mathop = 1}^6 \dfrac {\paren {O_i - E_i}^2} {E_i} \\
 & = & \ds \dfrac {\paren {14 - 16}^2} {16} + \dfrac {\paren {19 - 16}^2} {16} + \dfrac {\paren {11 - 16}^2} {16} + \dfrac {\paren {21 - 16}^2} {16} + \dfrac {\paren {12 - 16}^2} {16} + \dfrac {\paren {19 - 16}^2} {16} \\
 & = & \ds \dfrac 4 {16} + \dfrac 9 {16} + \dfrac {25} {16} + \dfrac {25} {16} + \dfrac {16} {16} + \dfrac 9 {16} \\
 & = & \ds \dfrac {88} {16} \\
 & = & 5.5
\end{array}$

The expectation of $16$ is computed from the data, namely from the total of $96$ casts, so $p = 1$ and there are $6 - 1 = 5$ degrees of freedom.



The value $\chi^2 = 5.5$ is less than $11.07$, the critical value at the $5 \%$ level for $5$ degrees of freedom, so it is not significant at that level, and the hypothesis that $D$ is fair is not rejected.
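
The arithmetic of this example can be checked with a short Python sketch. The variable names are chosen for illustration, and the critical value $11.07$ is taken from the text above rather than computed.

```python
# Observed number of times each face of the die came up.
observed = [14, 19, 11, 21, 12, 19]

casts = sum(observed)                      # 96 casts in total
expected = [casts / 6] * len(observed)     # 16 per face if the die is fair

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees = len(observed) - 1                # one quantity (the total) comes from the data
critical_value = 11.07                     # 5% level, 5 degrees of freedom (from the text)

reject_fairness = chi_squared > critical_value
print(chi_squared, degrees, reject_fairness)  # prints: 5.5 5 False
```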



Also see

  • Results about the $\chi$-squared test can be found here.

