Faster Methods for Computing Randomization Tests in the Presence of
Repeated Data Values
Dominique Roelants van Baronaigien,
Department of Computer Science,
University of Victoria, Canada.
Frank Ruskey,
Department of Computer Science,
University of Victoria, Canada.
William R. Engels,
Department of Genetics,
University of Wisconsin, USA.
Abstract:
Let C(k ;
n0,n1,...,nt)
denote the set of all cardinality k sub-multisets of the
multiset consisting of ni occurrences of
i for i = 0,1,...,t.
For example, C(2;2,1,1) = { {0,0}, {0,1}, {0,2}, {1,2} }.
The elements C(k ;
n0,n1,...,nt)
are called combinations of a multiset.
We develop an algorithm that generates all combinations of
a multiset in time O( C(k ;
n0,n1,...,nt) )
as long as 0 < k <
n0 + · · · + nt.
No faster algorithm is possible.
We then give two applications of our algorithm.
First, it is used to significantly improve the performance of
randomization tests in the presence of repeated data values.
Secondly, it is used to generate all contingency tables with fixed
marginals in time proportional to the number of such
tables; theoretically this is the optimal running time.
-
The postscript file is ???,??? bytes,
the dvi file is ??,??? bytes.
-
Submitted to the journal Computational Statistics.
-
The algorithms have been implemented and are used as the basis of
some of the online combinations of multiset generation algorithms of
COS,
the Combinatorial Object Server, in the section on
permutations and combinations of a multiset.
Back to list of publications.