Faster Methods for Computing Randomization Tests in the Presence of
Repeated Data Values
Dominique Roelants van Baronaigien, 
  Department of Computer Science, 
  University of Victoria, Canada. 
Frank Ruskey, 
  Department of Computer Science, 
  University of Victoria, Canada. 
William R. Engels, 
  Department of Genetics, 
  University of Wisconsin, USA. 
Abstract:
Let C(k ;
  n0,n1,...,nt) 
  denote the set of all cardinality k sub-multisets of the
  multiset consisting of ni occurrences of
  i for i = 0,1,...,t.
For example, C(2;2,1,1) = { {0,0}, {0,1}, {0,2}, {1,2} }.
The elements C(k ;
  n0,n1,...,nt) 
  are called combinations of a multiset.
We develop an algorithm that generates all combinations of
  a multiset in time O( C(k ;
  n0,n1,...,nt) )
  as long as 0 < k < 
  n0 + · · · + nt.   
No faster algorithm is possible.  
We then give two applications of our algorithm.  
First, it is used to significantly improve the performance of 
  randomization tests in the presence of repeated data values.  
Secondly, it is used to generate all contingency tables with fixed
  marginals in time proportional to the number of such
  tables; theoretically this is the optimal running time.
-  
  The postscript file is ???,??? bytes,
  the dvi file is ??,??? bytes.
- 
  Submitted to the journal Computational Statistics.
- 
  The algorithms have been implemented and are used as the basis of 
  some of the online combinations of multiset generation algorithms of 
  COS, 
  the Combinatorial Object Server, in the section on 
  
  permutations and combinations of a multiset.
Back to list of publications.