Faster Methods for Computing Randomization Tests in the Presence of Repeated Data Values

Dominique Roelants van Baronaigien, Department of Computer Science, University of Victoria, Canada.
Frank Ruskey, Department of Computer Science, University of Victoria, Canada.
William R. Engels, Department of Genetics, University of Wisconsin, USA.


Let C(k ; n0,n1,...,nt) denote the set of all cardinality k sub-multisets of the multiset consisting of ni occurrences of i for i = 0,1,...,t. For example, C(2;2,1,1) = { {0,0}, {0,1}, {0,2}, {1,2} }. The elements C(k ; n0,n1,...,nt) are called combinations of a multiset. We develop an algorithm that generates all combinations of a multiset in time O( C(k ; n0,n1,...,nt) ) as long as 0 < k < n0 + · · · + nt. No faster algorithm is possible. We then give two applications of our algorithm. First, it is used to significantly improve the performance of randomization tests in the presence of repeated data values. Secondly, it is used to generate all contingency tables with fixed marginals in time proportional to the number of such tables; theoretically this is the optimal running time.

Back to list of publications.