Efficient Codon Optimization with Motif Engineering

It is now common to add protein coding genes into cloning vectors for expression within non-native host organisms. Codon optimization supports translational efficiency of the desired protein product, by exchanging codons which are rarely found in the host organism with more frequently observed codons. Motif engineering, such as removal of restriction enzyme recognition sites or addition of immuno-stimulatory elements, is also often necessary. We present an algorithm for optimizing codon bias of a gene with respect to a well motivated measure of bias, while simultaneously performing motif engineering. The measure is the previously studied codon adaptation index, which favors the use, in the gene to be optimized, of the most abundant codons found in the host genome. We demonstrate the efficiency and effectiveness of our algorithm on the GENCODE dataset and provide a guarantee that the solution found is always optimal.

An Algorithm for Road Coloring

A coloring of edges of a finite directed graph turns the graph into a finite-state automaton. The synchronizing word of a deterministic automaton is a word in the alphabet of colors (considered as letters) of its edges that maps the automaton to a single state. A coloring of edges of a directed graph of uniform outdegree (constant outdegree of any vertex) is synchronizing if the coloring turns the graph into a deterministic finite automaton possessing a synchronizing word. The road coloring problem is the problem of synchronizing coloring of a directed finite strongly connected graph of uniform outdegree if the greatest common divisor of the lengths of all its cycles is one. The problem posed in 1970 has evoked noticeable interest among the specialists in the theory of graphs, automata, codes, symbolic dynamics as well as among the wide mathematical community. A polynomial time algorithm of O(n^{3}) complexity in the worst case and quadratic in the majority of studied cases for the road coloring of the considered graph is presented below. The work is based on the recent positive solution of the road coloring problem. The algorithm was implemented in the freeware package TESTAS.

Periods in Partial Words: An Algorithm

Partial words are finite sequences over a finite alphabet that may contain some holes. A variant of Fine and Wilf's theorem states that any partial word with h holes, having periods p, q and length at least the so-denoted L(h, p, q) has also the greatest common divisor of p and q, gcd(p,q), as a period. In this paper, we associate a graph with each p- and q-periodic word, and study two types of vertex connectivity on such a graph: modified degree connectivity and r-set connectivity where r = q mod p. As a result, we give an algorithm for computing L(h, p, q) in the general case.

Parameterized Longest Previous Factor

The longest previous factor (LPF) problem is defined for traditional strings exclusively from the constant alphabet Σ. A parameterized string (p-string) is a sophisticated string composed of symbols from a constant
alphabet Σ and a parameter alphabet Π. We generalize the LPF problem to the parameterized longest previous factor (pLPF) problem defined for p-strings. Subsequently, we present a linear time solution to construct the pLPF array. Given our pLPF algorithm, we show how to construct the pLCP (parameterized longest common prefix) array in linear time. Our algorithm is further exploited to construct the standard LPF and LCP arrays all in linear time.

p-Suffix Sorting as Arithmetic Coding

The challenge of direct parameterized suffix sorting (p-suffix sorting) for a parameterized string (p-string) is the dynamic nature of parameterized suffixes (p-suffixes). In this work, we propose transformative approaches to direct p-suffix sorting by generating and sorting lexicographically numeric fingerprints and arithmetic codes that correspond to individual p-suffixes. Our algorithm to p-suffix sort via fingerprints is the first theoretical linear time algorithm for p-suffix sorting for non-binary parameter alphabets, which assumes that each code is represented by a practical integer. We eliminate the key problems of fingerprints by introducing an algorithm that exploits the ordering of arithmetic codes to sort p-suffixes in linear time on average.

A Unifying Property for Distribution-Sensitive Priority Queues

A priority queue is presented that supports insert in worst-case constant time, and delete, find-min, delete-min, and decrease-key on element x in worst-case
O( lg ( min{w_{x}, q_{x}}+2))
time, where w_{x} (respectively q_{x}) is the number of elements that were inserted after x (respectively before x)
and are still present at the time when operating on x. Our priority queue then has both the working-set and the queueish properties, and more strongly it satisfies these properties in the worst-case sense. We also argue that these bounds are the best possible with respect to the considered measures. Moreover, we modify our priority queue to satisfy a new unifying property ---the time-finger property--- which encapsulates both the working-set and the queueish properties. In addition, we prove that the working-set property is asymptotically equivalent to the unified bound (which is the minimum per operation among the static finger, static optimality, and the working-set bounds). This latter result is of tremendous interest by itself as it had gone unnoticed since the introduction of such bounds by Sleater and Tarjan. Together, these results indicate that our priority queue also satisfies the static-finger, the static-optimality and the unified bounds.

Two Constant-Factor-Optimal Realizations of Adaptive Heapsort

In this paper we introduce two priority queues, which are theoretically efficient. For both priority queues, insert requires O(1) amortized time, and extract-min requires O(lg n) worst-case time and involves at most lg n + O(1) element comparisons. One priority queue is based on weak heaps (array-based) and the other on weak queues (pointer-based). The main idea is to temporarily store the inserted elements in a buffer, and once it is full move its elements to the main queue using an efficient bulk-insertion procedure. By employing the new priority queues in adaptive heapsort, we guarantee that the formula for the number of element comparisons performed by the algorithm is optimal up to the constant factor of the high-order term. We denote such performance as constant-factor optimality. In contrast to earlier constant-factor-optimal sorting algorithms, which are mainly of theoretical interest, adaptive heapsort relying on the developed priority queues is practically workable.

Cancelled.

Quasi-Cyclic Codes over GF(13)

Let d_{q}(n,k) be the maximum possible minimum Hamming distance of a linear [n,k] code over GF(q). Tables of best known linear codes exist for all fields up to q=9. In this paper, linear codes
over GF(13) are constructed for k up to 6.
The codes constructed are from the class of quasi-cyclic codes. In addition, the minimum distance of the extended quadratic residue code of length 88 is determined.

Testing Monotone Read-Once Functions

A checking test for a monotone read-once function f depending
essentially on all its n variables is a set of vectors M distinguishing f from all other monotone read-once functions of the same variables. We describe an inductive procedure for obtaining individual lower and upper bounds on the minimal number of vectors T(f) in a checking test for any function f. The task of deriving the exact value of T(f) is reduced to a combinatorial optimization problem related to graph connectivity. We show that for almost all functions f expressible by read-once conjunctive or disjunctive normal forms T(f) is
asymptotically equal to n / ln n. For several classes of functions our results give the exact value of T(f).

Generating All Simple Convexly-Drawable Polar Symmetric 6-Venn Diagrams

An *n-Venn diagram* consists of n curves drawn in the plane in such a way that each of the
2^{n} possible intersections of the interiors and exteriors of the curves forms a connected non-empty region. A Venn diagram is *convexly-drawable* if it can be drawn with all curves convex and it is *simple* if at most two curves intersect at any point. A Venn diagram is called *polar symmetric* if its stereographic projection about the infinite outer face is isomorphic to the projection about the innermost face. We outline an algorithm that shows there are exactly 406 simple convexly drawable polar-symmetric 6-Venn diagrams.

Enumerating Tatami Mat Arrangements of Square Grids

We prove that the number of monomer-dimer tilings of an n x n square grid, with m < n monomers in which no four tiles
meet at any point is m 2^{m}+(m+1) 2^{m+1}, when m and n have the same parity. In addition, we present a new proof of the result that there are n 2^{n-1} such tilings with n monomers, which divides the tilings into n classes of
size 2^{n-1}. The sum of these over all m ≤ n has the closed form 2 ^{n-1}(3n-4)+2 and, curiously,
this is equal to the sum of the squares of all parts in all compositions of n.

Ranking and Loopless Generation of k-ary Dyck Words in Cool-lex Order

A binary string B of length n = k t is a *k-ary Dyck word*
if it contains t copies of 1, and the number of 0s in every prefix of B is at most k-1 times the number of 1s.
We provide two loopless algorithms for generating k-ary Dyck words in cool-lex order: (1) The first requires two index variables and assumes k is a constant; (2) The second requires t index variables and works for any k.
We also efficiently rank k-ary Dyck words in cool-lex order.
Our results generalize the “coolCat” algorithm by Ruskey and Williams (Generating balanced parentheses and binary trees by prefix shifts in CATS 2008) and provide the first loopless and ranking applications of the general cool-lex Gray code by Ruskey, Sawada, and Williams (Binary bubble languages and cool-lex order under review).

The Rand and Block Index of Pairs of Set Partitions

The *Rand index* of two set partitions is the number of pairs {x, y} such that there is a block in one partition
containing both x and y, but x and y are in different blocks in the other partition. Let R(n,k) denote the number of distinct (unordered) pairs of partitions of n that have Rank index k. For fixed k we prove that R(n,k) can be expressed as
Σ_{j} c_{k,j} (n choose j) B_{n-j}.
where c_{k,j} is a non-negative integer and B_{n} is a Bell number. For fixed k we prove that there is a constant
K_{n} such that R(n, (n choose 2)-k) can be expressed as a polynomial of degree 2k in n for all n ≥ K_{n}.
This polynomial is explicitly determined for 0 ≤ k ≤ 11. The *block index* of two set partitions is the number of elements that are not in common blocks. We give formulae and asymptotics based on N(n), the number of pairs of partitions with no blocks in common. We develop an O(n) algorithm for computing the block index.

Stable Sets of Threshold-based Cascades on the Erdös-Rényi Random Graphs

Consider the following *reversible cascade* on the Erdös-Rényi
random graph G(n,p). In round zero, a set of vertices, called the seeds, are active. In round t ∈ Ζ^{+}, a non-isolated vertex is activated or deactivated, respectively, if at least or
less than a ρ ∈ (0,1] fraction of its neighboring vertices are active in round t-1.
An *irreversible cascade* is defined similarly except that active vertices cannot be deactivated. A set of vertices, S, is said to be *stable* if no vertex will
ever change its state, from active to inactive or vice versa, once the set of active vertices equals S. For both the reversible and the irreversible cascades, we show that for any constant ε > 0,
all p ∈ [(1+ ε)(ln(e/ρ))/n,1] and with probability
1-n^{-Ω(1)}, every stable set of G(n,p) has size O(⌈ρ n⌉)
or n-O(⌈ρ n⌉).