Wilcoxon Two Independent Samples Test
{From the Institute of Phonetic Sciences (IFA): http://www.fon.hum.uva.nl/}
Note: this test is identical to the Mann-Whitney U test for two independent samples.
Characteristics:
A most useful test to see whether the values in two samples differ in
size. It resembles the
Median-Test in scope, but it is much more sensitive. In fact, for
large numbers it is almost as sensitive as the
Two Sample Student t-test (about 95% as efficient when the values are
Normally distributed). For small numbers with unknown distributions this
test can be even more sensitive than the Student t-test.
As it is only on rare occasions that we do know that values are Normally
distributed, this test is usually to be preferred over the Student t-test.
H0:
The populations from which the two samples are taken have identical median
values. More precisely, the two populations have identical distributions.
Assumptions:
None really, apart from the independence of the two samples.
Scale:
Ordinal.
Procedure:
Rank order all N = m + n values from both samples (of sizes m
and n) combined. Sum the ranks of the smaller sample (Wsmallest). This
value is used to determine the level of significance.
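The ranking step can be sketched in Python as follows. This is a minimal illustration, not the code behind this page: the function name `rank_sum_smallest` is made up for the example, and giving tied values the mean of the ranks they share is an assumption, since the procedure above does not say how ties are handled.

```python
def rank_sum_smallest(a, b):
    """Return (W, m, n): the rank sum of the smaller sample and both sizes.

    Hypothetical helper for illustration; m is the size of the smaller
    sample, n the size of the other one.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    m, n = len(small), len(large)
    # Pool all N = m + n values, remembering which sample each came from
    # (0 = smaller sample, 1 = larger sample).
    pooled = sorted([(v, 0) for v in small] + [(v, 1) for v in large])
    w = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values that starts at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # The run occupies ranks i+1 .. j; assumption: ties share the mean rank.
        mean_rank = (i + 1 + j) / 2
        w += mean_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return w, m, n


w, m, n = rank_sum_smallest([1.1, 2.3, 4.0], [0.5, 3.2, 5.1, 6.7])
print(w, m, n)
```

In this made-up data set the smaller sample holds ranks 2, 3 and 5 of the seven pooled values, so Wsmallest = 10 with m = 3 and n = 4.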
Level of Significance:
Look up the level of significance in a table using Wsmallest, m and n.
Calculating the exact level of significance is based on enumerating all possible
assignments of the ranks to the two samples. This is computationally demanding if
both n and m are larger than 7.
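A brute-force version of the exact calculation might look like the sketch below. It rests on several assumptions that the text does not settle: there are no ties (so the ranks are simply 1..N), the test is two-sided, and the two-sided level is taken as twice the smaller tail. The function name `exact_p_value` is hypothetical.

```python
from itertools import combinations
from math import comb


def exact_p_value(w, m, n):
    """Two-sided exact level of significance for rank sum w (no ties assumed)."""
    total = comb(m + n, m)          # number of ways to pick m ranks out of N
    lower = upper = 0
    # Enumerate every possible set of m ranks for the smaller sample.
    for ranks in combinations(range(1, m + n + 1), m):
        s = sum(ranks)
        if s <= w:
            lower += 1              # arrangements at least as low as observed
        if s >= w:
            upper += 1              # arrangements at least as high as observed
    # Assumption: two-sided p is twice the smaller tail, capped at 1.
    return min(1.0, 2 * min(lower, upper) / total)


print(exact_p_value(10, 3, 4))
```

For Wsmallest = 10, m = 3 and n = 4 there are 35 arrangements, 11 of which have a rank sum of 10 or less, giving a two-sided level of 22/35.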
Approximation:
If m>10 and n>10,
Z = ( Wsmallest - 0.5 - m * ( m + n + 1 ) / 2 ) / sqrt( m * n * ( m + n + 1 ) / 12 )
is approximately Normally distributed, where m is the size of the smallest sample.
(Use Wsmallest - 0.5 if Wsmallest > m * ( m + n + 1 ) / 2, the expected value
of Wsmallest; else use Wsmallest + 0.5)
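The approximation can be sketched as below, with two assumptions labelled in the code: the continuity correction moves Wsmallest half a unit towards its expected value m*(m+n+1)/2 (and is skipped when Wsmallest equals that value), and the level of significance is two-sided. The function name `normal_approximation` is made up for this example; m must be the size of the sample whose ranks were summed.

```python
from math import erfc, sqrt


def normal_approximation(w, m, n):
    """Return (Z, two-sided p) for rank sum w of the sample of size m."""
    mean = m * (m + n + 1) / 2                # expected value of Wsmallest
    sd = sqrt(m * n * (m + n + 1) / 12)       # its standard deviation (no ties)
    # Assumption: shift w by 0.5 towards the mean (continuity correction).
    if w > mean:
        corrected = w - 0.5
    elif w < mean:
        corrected = w + 0.5
    else:
        corrected = w
    z = (corrected - mean) / sd
    # Two-sided tail probability of the standard Normal distribution.
    p = erfc(abs(z) / sqrt(2))
    return z, p


z, p = normal_approximation(120, 12, 12)
print(round(z, 3), round(p, 3))
```

With m = n = 12 the expected rank sum is 150 and the standard deviation is sqrt(300), so Wsmallest = 120 gives Z = -29.5 / sqrt(300).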
Remarks:
In this example, exact probabilities are calculated for m <= 10 or n
<= 10. If both are larger than 7 this can take more time than is available
within this system (the number of calculations grows as N!/(m!*n!), with
N!=N*(N-1)*(N-2)*...*1). Therefore, if it is anticipated that the calculations
take too much time, the Normal approximation is used. However, the resulting
values are unreliable and this will be indicated with a *. You are advised to
check the level of significance in a table.
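To get a feeling for how quickly N!/(m!*n!) grows, the count can be tabulated with a few lines of Python. This only illustrates the formula above; the helper name `arrangements` is hypothetical and says nothing about how the calculation on this page is actually implemented.

```python
from math import comb


def arrangements(m, n):
    """Number of ways to divide N = m + n ranks over the two samples."""
    # comb(N, m) equals N! / (m! * n!)
    return comb(m + n, m)


for size in (5, 7, 8, 10, 20):
    print(size, arrangements(size, size))
```

For two samples of 7 values there are 3432 arrangements to check, for two samples of 10 already 184756, and for two samples of 20 more than 10^11.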
For m > 10 and n > 10 the Normal approximation is used. A Perl
script of the test is available here. A minimalist Windows version
(with dosperl interpreter) is available here (<500 kB).