US 7081839 B2 Abstract A method and apparatus are disclosed that compress an input string to an equivalent word relative to a noncommutation graph. The disclosed compression system compresses an input string in a manner that an equivalent string is produced upon decompression. The disclosed compression algorithms are based upon normal forms. First, a normal form of the interchange class containing the source output string is produced. Thereafter, a grammar-based lossless data compression scheme (or another compression scheme) is applied to the normal form. Upon decompression, the compressed string produces an equivalent string. A normal form generation process is employed to compute the lexicographic normal form or the Foata normal form of an interchange class from one of its members, using only a single pass over the data.
Claims(20) 1. A method for compressing an input string, comprising the steps of:
generating a lexicographic normal form from said input string, using only a single pass over said input string, wherein said input string has symbols belonging to a partially commutative alphabet; and
applying a compression scheme to said lexicographic normal form.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
employing a stack corresponding to each vertex v∈V, where w is a word over an alphabet V;
processing symbols of w from right to left;
upon seeing a letter u, pushing a u on its stack and a marker on the stacks corresponding to symbols which are adjacent to u in a noncommutation graph G; and
once the entire word has been processed, using said stacks to determine said lexicographic normal form for an interchange class containing the word.
7. A method for compressing an input string, comprising the steps of:
generating a Foata normal form from said input string, wherein said input string has symbols belonging to a partially commutative alphabet; and
applying a compression scheme to said Foata normal form.
8. The method of
9. The method of
10. The method of
11. The method of
12. The method of
employing a stack corresponding to each vertex v∈V, where w is a word over an alphabet V;
processing symbols of w from right to left;
upon seeing a letter u, pushing a u on its stack and a marker on the stacks corresponding to symbols which are adjacent to u in a noncommutation graph G; and
once the entire word has been processed, using said stacks to determine said Foata normal form for an interchange class containing the word.
13. A compression system, comprising:
a memory; and
a processor operatively coupled to said memory, said processor configured to:
generate a normal form from an input string, using only a single pass over said input string, wherein said input string has symbols belonging to a partially commutative alphabet; and
apply a compression scheme to said normal form.
14. The compression system of
15. The compression system of
16. The compression system of
17. The compression system of
18. The compression system of
19. The compression system of
20. The compression system of
employ a stack corresponding to each vertex v∈V, where w is a word over an alphabet V;
process symbols of w from right to left;
upon seeing a letter u, push a u on its stack and a marker on the stacks corresponding to symbols which are adjacent to u in the noncommutation graph G; and
once the entire word has been processed, use said stacks to determine said normal form for an interchange class containing the word.
Description The present invention relates generally to data compression techniques, and more particularly, to methods and apparatus for compressing an input string in a manner that an equivalent string relative to a noncommutation graph is produced upon decompression. The ordering of events is fundamental to the study of the dynamic behavior of a system. In a sequential process, it is natural to use strings of symbols over some alphabet to specify the temporal ordering of events. The symbols may, for example, correspond to the states, commands, or messages in a computation. J. Larus, “Whole Program Paths,” ACM SIGPLAN Conf. Prog. Lang. Des. Implem., 259–69 (May, 1999), applies a lossless data compression algorithm known as “Sequitur” to the sequence of events or signals determining the control flow or operations of a program's execution. Sequitur is an example of a family of data compression algorithms known as grammar-based codes that take a string of discrete symbols and produce a set of hierarchical rules that rewrite the string as a context-free grammar that is capable of generating only the string. These codes have an advantage over other compression schemes in that they offer insights into the hierarchical structure of the original string. J. Larus demonstrated that the grammar which is output from Sequitur can be exploited to identify performance tuning opportunities via heavily executed subsequences of operations. The underlying premise in using lossless data compression for this application is the existence of a well-defined linear ordering of events in time. A partial ordering of events is a more accurate model for concurrent systems, such as multiprocessor configurations, distributed systems and communication networks, which consist of a collection of distinct processes that communicate with one another or synchronize at times but are also partly autonomous. 
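To make the grammar-based coding idea concrete, the following toy sketch, which is an illustrative simplification and not Sequitur itself (the function names and the greedy digram-replacement strategy are assumptions), repeatedly replaces the most frequent repeated digram in a string with a fresh nonterminal, yielding a small context-free grammar that regenerates exactly the original string:

```python
from collections import Counter

def digram_grammar(s):
    """Greedy digram replacement: a toy stand-in for a grammar-based code."""
    rules = {}
    seq = list(s)
    next_id = 0
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        digram, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        name = "R%d" % next_id
        next_id += 1
        rules[name] = digram
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
                out.append(name)   # replace the digram with the rule name
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(seq, rules):
    """Invert the grammar: expand rule names back to terminal symbols."""
    out = []
    for sym in seq:
        if sym in rules:
            out.extend(expand(list(rules[sym]), rules))
        else:
            out.append(sym)
    return out

seq, rules = digram_grammar("abcabcabc")
assert "".join(expand(seq, rules)) == "abcabcabc"
```

Sequitur additionally maintains digram uniqueness and rule utility incrementally in a single pass; the sketch only conveys the hierarchical-rule structure that J. Larus exploits for performance tuning.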
These complex systems permit independence of some events occurring in the individual processes while others must happen in a predetermined order. Noncommutation graphs are used for one model of concurrent systems. To extend Larus' ideas to concurrent systems, a technique is considered for compressing an input string in a manner that an equivalent string relative to a noncommutation graph is produced upon decompression. The compression of program binaries is important for the performance of software delivery platforms. Program binaries are files whose content must be interpreted by a program or hardware processor that knows how the data inside the file is formatted. M. Drinić and D. Kirovski, “PPMexe: PPM for Compressing Software,” Proc. IEEE Data Compression Conf., 192–201 (March 2002), discloses a compression mechanism for program binaries that explores the syntax and semantics of the program to achieve improved compression rates. They also compress data relative to a noncommutation graph. The disclosed compression algorithm employs the generic paradigm of prediction by partial matching (PPM). While the disclosed compression algorithm performs well for many applications, it introduces certain inefficiencies in terms of compression and delays. A need therefore exists for a more efficient algorithm for compressing an input string given a set of equivalent words derived from a noncommutation graph. A further need exists for a decompression technique that reproduces a string that is equivalent to the original string. Generally, a method and apparatus are provided for compressing an input string relative to a noncommutation graph. The disclosed compression system compresses an input string in a manner that an equivalent string is produced upon decompression. The disclosed compression algorithms are based upon normal forms (i.e., a canonical representation of an interchange or equivalence class).
Generally, the disclosed compression process can be decomposed into two parts. First, a normal form of the interchange class containing the source output string is produced. Thereafter, a grammar-based lossless data compression scheme (or another compression scheme) is applied to the normal form. Upon decompression, the compressed string produces an equivalent string. A normal form generation process is employed to compute the lexicographic normal form or the Foata normal form of an interchange class from one of its members, using only a single pass over the data. A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings. The present invention provides compression algorithms based upon variations of a standard notion in trace theory known as normal forms. A normal form is a canonical representation of an interchange class. The 1978 Lempel-Ziv data compression scheme (LZ '78), described, for example, in J. Ziv and A. Lempel, “Compression of Individual Sequences Via Variable-Rate Coding,” IEEE Trans. Inform. Theory IT-24, 530–36 (1978), can be viewed as an example of a grammar-based code. LZ '78 asymptotically compresses the output of an ergodic source to the source entropy with probability 1. J. C. Kieffer and E.-H. Yang, “Grammar-Based Codes: A New Class of Universal Lossless Source Codes,” IEEE Trans. Inform. Theory 46, 737–54 (2000), defines the notion of an irreducible grammar transform and demonstrates that any grammar-based code that uses an irreducible grammar transform is also universal in the sense that it almost surely asymptotically compresses the output of an ergodic source to the source entropy. In the illustrative embodiments described herein, any universal grammar-based lossless data compression scheme may be employed. While it is unknown if Sequitur is a universal compression technique, J. C.
Kieffer and E.-H. Yang offer a modification of Sequitur that is provably universal. Two examples are discussed for which the codes of the present invention attain a new graph entropy referred to herein as the interchange entropy. In both cases, it is assumed for simplicity that the original source string was the output of a discrete, memoryless source; the analysis can be extended to finite-state, unifilar Markov sources, as would be apparent to a person of ordinary skill. In one instance, the dependence relation on the source alphabet is a complete k-partite graph and in the other case, the noncommutation graph contains at least one vertex which is adjacent to all others. For a further discussion of interchange entropy, see S. A. Savari, “Concurrent Processes and the Interchange Entropy,” Proc. of IEEE International Symposium on Information Theory, (Yokohama, Japan, July 2003); S. A. Savari, “On Compressing Interchange Classes of Events in a Concurrent System,” Proc. of IEEE Data Compression Conference, (Snowbird, Utah, March 2003); or S. A. Savari, “Compression of Words Over A Partially Commutative Alphabet,” Information Sciences (IS) Seminar, Cal. Tech., Aug. 27, 2003, each incorporated by reference herein. Dependence Relations Trace theory is a known approach to extending the notions and results pertaining to strings in order to treat the partial ordering of event occurrences in concurrent systems. The idea is to combine the sequence of atomic actions observed by a single witness of a concurrent system with a labeled and undirected dependence relation or noncommutation graph specifying which actions can be executed independently or concurrently. Two words with symbols over a vertex set V are congruent or equivalent with respect to a noncommutation graph G if each can be obtained from the other through a process of interchanging consecutive letters that are nonadjacent vertices in G.
For example, if the noncommutation graph G is given by a-b-c-d then the two words ddbca and bdadc are congruent. There are two special cases of the dependence relation which are standard in information theory. When G is the complete graph on the vertex set V, i.e., when there is an edge connecting every pair of vertices, every word over V is congruent only to itself. At the other extreme, if G is the empty graph on the vertex set, i.e., if no two vertices are adjacent, then two words are congruent if and only if the number of occurrences of each symbol in V is the same for both words. The equivalence classes on words are frequently called type classes or composition classes in the information theory literature and rearrangement classes or abelian classes in combinatorics. A congruence class of words for an arbitrary noncommutation graph G is often referred to as a trace because such classes represent traces of processes, i.e., the sequence of states traversed by the process from initialization to termination, in nonsequential systems. Because the word trace has numerous connotations, the term interchange class is used herein to refer to an equivalence class of words. Motivated by the success of J. Larus in applying lossless data compression algorithms to a string of events in a sequential system, R. Alur et al., “Compression of Partially Ordered Strings,” 14th Int'l Conf. on Concurrency Theory (CONCUR 2003), (Sep. 3, 2003), introduce a compression problem where it is only necessary to reproduce a string which is in the same interchange class as the original string. R. Alur et al. describe some compression schemes for the congruence class of a string that in the best cases can be exponentially more succinct than the optimal grammar-based representation of the corresponding string. This compression problem also appears in the compression of executable code.
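The congruence in the example above can be verified mechanically. The sketch below is a brute-force illustration (exponential in general; the function name is an assumption) that explores all interchanges of consecutive letters that are nonadjacent in G:

```python
def congruent(u, v, edges):
    """Breadth-first search over interchanges of commuting adjacent letters."""
    adj = {frozenset(e) for e in edges}
    seen, frontier = {u}, [u]
    while frontier:
        nxt = []
        for w in frontier:
            for i in range(len(w) - 1):
                a, b = w[i], w[i + 1]
                # consecutive letters may be interchanged iff nonadjacent in G
                if a != b and frozenset((a, b)) not in adj:
                    s = w[:i] + b + a + w[i + 2:]
                    if s not in seen:
                        seen.add(s)
                        nxt.append(s)
        frontier = nxt
    return v in seen

# G = a-b-c-d, so the pairs {a,c}, {a,d} and {b,d} commute:
edges = [("a", "b"), ("b", "c"), ("c", "d")]
assert congruent("ddbca", "bdadc", edges)
assert not congruent("ab", "ba", edges)  # a and b do not commute
```

The search enumerates the entire interchange class of the first word, which is feasible only for short inputs; Theorem 2.1 below gives the efficient characterization via type classes and edge projections.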
As previously indicated, executable code or program “binaries” are files whose content must be interpreted by a program or hardware processor which knows exactly how the data inside the file is formatted in order to utilize it. One of the techniques given in M. Drinić and D. Kirovski for this compression application is “instruction rescheduling,” in which instructions can be reordered if the decompressed program is execution-isomorphic to the original. Interchange Entropy The present invention considers this compression problem from an information theoretic perspective. A new generalization of Kolmogorov-Chaitin complexity referred to as the interchange complexity is proposed and a version of the subadditive ergodic theorem is used to provide sufficient conditions on probabilistic sources so that an extension of the asymptotic equipartition property to interchange classes holds. The average number of bits per symbol needed to represent an interchange class is referred to as the interchange entropy. The interchange entropy is a functional on a graph with a probability distribution on its vertex set. For memoryless sources, there are two earlier graph entropies which have received considerable attention. The Körner graph entropy, described, for example, in J. Körner, “Coding of an Information Source Having Ambiguous Alphabet and the Entropy of Graphs,” in Proc. 6th Prague Conf. on Information Theory, 411–25 (1973); or G. Simonyi, “Graph Entropy: A Survey,” in L. Lovász, P. Seymour, and W. Cook, ed., DIMACS Vol. 20 on Special Year on Combinatorial Optimization, 399–441 (1995), has been found to have applications in network information theory, characterization of perfect graphs, and lower bounds on perfect hashing, Boolean formulae size and sorting. Chromatic entropy was defined in connection with certain parallel-computing models in R. B. Boppana, “Optimal Separation Between Concurrent-Write Parallel Machines,” in Proc. 21st Ann. ACM Symp.
Theory Comp., 320–26 (1989) and demonstrated in N. Alon and A. Orlitsky, “Source Coding and Graph Entropies,” IEEE Trans. Inform. Theory 42, 1329–339 (1996), to be linked to the expected number of bits required by a transmitter to convey information to a receiver who has some related data. As discussed below, the interchange entropy has some properties in common with these other graph entropies. The compression algorithms of the present invention can asymptotically achieve the interchange entropy for a large collection of dependence alphabets. R. Alur et al., referenced above, propose three methodologies for encoding a string given a partial order on the source alphabet. The first approach is to attempt to find a string equivalent to the source output string which compresses well. R. Alur et al. and M. Drinić and D. Kirovski put an alphabetical ordering on the symbols and sort the letters of the source output string to produce the equivalent string which is minimal under this ordering. The other algorithms of this variety simultaneously determine the equivalent string and a grammar-based code for it. These algorithms appear not to be easily amenable to an information theoretic analysis. The second class of procedures put forward in R. Alur et al. involves projections of the string onto subsets of the alphabet. A projection of a string σ on alphabet V onto a subalphabet A The asymptotic equipartition property is central to the study of lossless data compression. It states that most long sequences from a discrete and finite alphabet ergodic source are typical in the sense that their mean self-information per symbol is close to the entropy of the source. A consequence of this result is that the average number of bits per symbol required to losslessly encode the output of an ergodic source is asymptotically bounded from below by the binary entropy of the source.
In order to find a counterpart for this lossy compression problem, the least amount of information is considered about an individual string that must be described in order to reproduce another string within the same interchange class. The appropriate framework for this discussion is algorithmic information theory. For a finite length string x over the vertex set V, C(x) denotes the Kolmogorov complexity of x, and we refer to M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2d Ed., §2.1, 107, (Springer, New York, 1997), for the basic properties of C(x). Let V* be the set of all finite words from V and |V| denote the cardinality of V. The interchange complexity of uv≡ The following result is one way to characterize the equivalence of two strings with respect to a noncommutation graph G: Theorem 2.1 (D. Perrin, “Words Over a Partially Commutative Alphabet,” in A. Apostolico and Z. Galil, ed., Combinatorial Algorithms on Words, NATO ASI Series, Volume F12, 329–40 (Springer, Berlin, 1985)): For any subset A of the vertex set V and any word w over V, let π Since Theorem 2.1 specifies the necessary and sufficient conditions for two words to be congruent with respect to a non-commutation graph G, the interchange class containing a string can be completely determined by any element of the interchange class which can be used to provide the type class and edge projections. Conversely, given the type class and edge projections of an interchange class, it is possible to use knowledge of these to produce a word in the interchange class for a noncommutation graph G as follows. If G is the empty graph, then it is straightforward to use the type class to reconstruct a word consistent with the type. If G is not the empty graph, the type class is used to determine the number of appearances of any symbol which commutes with every other symbol in V. The symbols appearing in the edge projections remain.
The leftmost symbol in each projection is initially a possibility for the next symbol in our word. If there are any two symbols, say u and v, among these which do not commute, then the projection onto edge {u,v} determines which symbol appears first in the projection, and the other is removed from the set of possible next symbols. This procedure is iterated until the set of possible next symbols contains no pair of symbols which are adjacent in G. Any symbol from this set can be chosen as the next letter. If symbol u is chosen, then the leftmost u is removed from every edge projection onto u and its neighbors in G. This algorithm is repeated until every edge projection is empty. It follows that C Suppose there are words u,v,w,x ∈ V* with u≡ Let l(u) denote the length of word u ∈ V*. C For a word u Theorem 2.2 (N. G. de Bruijn and P. Erdős, “Some Linear and Some Quadratic Recursion Formulas I,” Indag. Math. 13, 374–82 (1952)): Suppose φ is a positive and nondecreasing function that satisfies Hence equation (2) and Theorem 2.2 imply that the asymptotic per symbol information content needed to convey a word equivalent to the original bound is well-defined. More specifically: Proposition 2.3: For any word u Next, a probabilistic version of Proposition 2.3 is found. The appropriate frame of reference is subadditive ergodic theory. The following theorem is utilized: Theorem 2.4 (Y. Derriennic, “Un Théorème Ergodique Presque Sous-Additif,” Ann. Prob. 11, 669–77 (1983)): Let X -
- 1) X_{0,n}≦X_{0,m}+X_{m,n}+A_{m,n}.
- 2) X_{m,n} is stationary, i.e., the joint distributions of X_{m,n} are the same as the joint distributions of X_{m+1,n+1}, and ergodic.
- 3) E[X_{0,1}]<∞ and for each n, E[X_{0,n}]≧c_{0}n with c_{0}>−∞.
- 4) A_{m,n}≧0 and lim_{n→∞}E[A_{0,n}/n]=0.

Then
- 1) X_{0,n}/n converges almost surely as n→∞ to lim_{n→∞}E[X_{0,n}]/n.
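The word-reconstruction procedure from edge projections described earlier, which repeatedly chooses a next letter that heads every edge projection it participates in, can be sketched as follows. For simplicity the sketch assumes every symbol is an endpoint of at least one edge of G; the function name and data layout are illustrative assumptions:

```python
from collections import deque

def reconstruct(projections):
    """Rebuild some member of an interchange class from its edge projections.

    projections: dict mapping an edge (u, v) of G to the projection of the
    word onto {u, v}.  Assumes every symbol lies on at least one edge of G.
    """
    proj = {frozenset(e): deque(p) for e, p in projections.items()}
    incident = {}
    for e in proj:
        for u in e:
            incident.setdefault(u, []).append(e)
    word = []
    while any(proj.values()):
        for u in incident:
            ps = [proj[e] for e in incident[u]]
            # u may be the next letter iff some u remains and u heads every
            # nonempty projection onto an edge incident to u
            if any(u in p for p in ps) and all(p[0] == u for p in ps if p):
                word.append(u)
                for p in ps:
                    if u in p:
                        p.remove(u)  # deque.remove deletes the leftmost u
                break
    return "".join(word)

# Edge projections of ddbca for G = a-b-c-d:
w = reconstruct({("a", "b"): "ba", ("b", "c"): "bc", ("c", "d"): "ddc"})
```

Any output is congruent to ddbca; which member of the interchange class is produced depends on the order in which eligible letters are tried.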
Theorem 2.4 is applied to the output of two very broad categories of sources. A discrete source is said to be stationary if its probabilistic specification is independent of a time origin and ergodic if it cannot be separated into two or more different persisting modes of behavior. A more precise definition of a discrete, stationary, and ergodic source can be found in R. G. Gallager, Information Theory and Reliable Communication, §3.5 (Wiley, New York, 1968). A unifilar Markov source with finite alphabet V and finite set of states S is defined by specifying for each state s ∈ S and letter v ∈ V -
- 1) the probability p_{s,v} that the source emits v from state s;
- 2) the unique next state S[s,v] after v is output from state s.

Given any initial state s_{0} ∈ S, these rules inductively specify both the probability P(σ|s_{0}) that any given source string σ ∈ V* is emitted and the resulting state S[s_{0},σ] after σ is output. For the null string Ø and each state s ∈ S, the convention is that P(Ø|s)=1. It is assumed that the source has a single recurrent class of states; i.e., for each pair of states s and r, there is a non-null string σ ∈ V* such that P(σ|s)>0 and S[s,σ]=r. The class of unifilar Markov sources is fairly general and includes, for each l≧1, the group of sources for which each output depends statistically only on the l previous output symbols. The following result is obtained:
Theorem 2.5 (A.E.P. for interchange classes): Let U Unless otherwise specified, it is assumed hereafter that we have probabilistic sources P in which the random variables n It is generally considered to be difficult to determine or even bound the limiting constants obtained by a subadditivity argument. For the present problem, there are two straightforward approaches to bounding H The moment generating function for the number of interchange classes for words of a given length was shown to be equal to the inverse of the Möbius polynomial corresponding to a function of G. Recently, a formula for the dominant term in the asymptotic expansion of the number of traces was provided in M. Goldwurm and M. Santini, “Clique Polynomials Have a Unique Root of Smallest Modulus,” Information Processing Letters 75(3), 127–132 (2000). In the special case where G is the empty graph, it is well known that the number of type classes of length n for a vertex set V with cardinality |V| is at most (n+1)^{|V|}. The characterization of interchange classes by type class and edge projections provided in Theorem 2.1 implies that the interchange entropy is monotonic, subadditive, and for memoryless sources satisfies two special cases of additivity under vertex substitution. Let E denote the edge set of a graph. Proposition 2.6 (Monotonicity): If F and G are two graphs on the same vertex set and the respective edge sets satisfy E(F) ⊆ E(G), then H(F,P)≦H(G,P). Proposition 2.7 (Subadditivity): Let F and G be two graphs on the same vertex set V and define F∪G to be the graph on V with edge set E(F)∪E(G). For any word x, C The concept of substitution of a graph F for a vertex v in a disjoint graph G is described in G. Simonyi, §3. The idea is that v and the edges in G with v as an endpoint are removed and every vertex of F is connected to those vertices of G that were adjacent to v.
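The substitution operation just described can be written down directly; the edge-list representation and the function name below are illustrative assumptions:

```python
def substitute(g_edges, v, f_vertices, f_edges):
    """Substitute graph F for vertex v of G: remove v and its incident
    edges, then connect every vertex of F to the former neighbors of v."""
    neighbors = {y for e in g_edges if v in e for y in e} - {v}
    new_edges = [e for e in g_edges if v not in e]
    new_edges += list(f_edges)
    new_edges += [(f, n) for f in f_vertices for n in neighbors]
    return new_edges

# Substitute the single edge x-y for vertex b of the path a-b-c:
edges = substitute([("a", "b"), ("b", "c")], "b", ["x", "y"], [("x", "y")])
assert {frozenset(e) for e in edges} == {
    frozenset(p) for p in [("x", "y"), ("x", "a"), ("x", "c"), ("y", "a"), ("y", "c")]
}
```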
This notion can be extended to a property of Körner graph entropy known as “additivity of substitution.” The concept does not hold in general for the interchange entropy, but there are two special cases which apply. The first one is concerned with graphs consisting of more than one connected component. Proposition 2.8: Let the subgraphs G
An example illustrates that Proposition 2.8 fails in general to hold for the output of sources with memory. Suppose that V={a, b, c, d}, G=a-b c-d, and the source is an order-1 Markov chain with P(c|a)=P(d|b)=1, P(a|c)=P(b|c)=P(a|d)=P(b|d)=0.5. Assume that the first symbol is equally likely to be an a or a b. In other words, the source outputs two symbols at a time independently with half being ac and the other half being bd. It is easy to verify that the entropy of the original source is 0.5 bits per symbol. Next suppose F=a-b c-d. In order to represent a word congruent to the source output with respect to F, the projection of the string onto the subalphabet {a, b} must be precisely characterized. Note that this projection looks like the output of a binary, memoryless source with P(a)=P(b)=0.5. Since half of the symbols from the original string appear in the projection, it follows that H A second example of additivity of substitution for the interchange entropy is considered assuming the original source string is the output of a memoryless source. Proposition 2.9: Let F be a graph consisting of two vertices x and y and an edge connecting them, let G be a graph with vertex set disjoint from F, and let v be a vertex of G. Form the graph G For discrete memoryless sources, the exact expression is obtained for H Theorem 2.10: Assume a discrete, memoryless source with probability distribution P on vertex set V. Suppose V is of the form V=V
Theorem 2.10 leads to the following property of the interchange entropy for the output from a discrete, memoryless source. Corollary 2.11: Assume a discrete, memoryless source with probability distribution P on vertex set V(G). If G is not the complete graph on V(G), then H The example following Proposition 2.8 illustrates that it is possible for a source with memory to satisfy H An example illustrates some of the results in this section. Suppose the noncommutation graph G is a-b-c and P(a)=P(b)=P(c)=⅓. A simple upper bound for H
The following section considers some universal compression algorithms for the problem of representing interchange classes and begins with a discussion of normal forms. There are two types of normal forms which are frequently discussed in the trace theory literature. One of these is known as the lexicographic normal form and was first considered in A. V. Anisimov and D. E. Knuth, “Inhomogeneous Sorting,” Int. J. Comp. Inform. Sci. 8, 255–260 (1979). The other normal form is called the Foata normal form, described in P. Cartier and D. Foata, “Problèmes Combinatoires de Commutation et Réarrangements,” Lecture Notes in Mathematics 85 (Springer, Berlin, 1969). In order to compute either normal form, a total ordering on the vertex set V must be given. The lexicographic normal form of an interchange class is the unique word in the interchange class which is minimal with respect to the lexicographic ordering. Continuing the example considered in the introduction, assume a noncommutation graph G is given by a-b-c-d and suppose that a<b<c<d. The lexicographic normal form of the interchange class containing the two words ddbca and bdadc is baddc. It has been shown that a necessary and sufficient condition for a word w to be the lexicographic normal form of an interchange class is that for all factorizations w=xvyuz such that u and v are commuting symbols in V with u<v; x and z are possibly empty words over V, and y is a non-empty word over V, there exists a letter of y which does not commute with u. In order to define the Foata normal form, the notion of finite non-empty subsets of pairwise independent letters is needed. Define the set F by F={F Each F ∈ F is called an elementary step and it can be converted into a type class denoted by [F] consisting of words which are products of all of the elements of F. The Foata normal form of an interchange class c is the unique string of elementary steps φ_{1}φ_{2} . . . φ_{r} such that -
- c=[φ_{1}][φ_{2}] . . . [φ_{r}];
- for each 1≦i<r and each letter u ∈ φ_{i+1}, there exists a letter v ∈ φ_{i} either satisfying v=u or such that u and v are adjacent in the noncommutation graph G.
The number of elementary steps r in the Foata normal form is a measure of the parallel execution time associated with an interchange class. P. Cartier and D. Foata were the first to establish that the Foata normal form is well-defined and there are many proofs of this result. To return to the previous example, when the noncommutation graph G is given by a-b-c-d, it follows that F={{a},{b},{c},{d},{a,c},{a,d},{b,d}} and the Foata normal form for the interchange class containing the words ddbca and bdadc is {b,d},{a,d},{c}. An algorithm (as well as exemplary pseudocode) to compute both the lexicographic normal form and the Foata normal form of an interchange class from one of its members was provided in D. Perrin, “Words Over a Partially Commutative Alphabet,” in A. Apostolico and Z. Galil, ed., Combinatorial Algorithms on Words, NATO ASI Series, Volume F12, 329–340 (Springer, Berlin, 1985), incorporated by reference herein. -
- To obtain the lexicographic normal form: At each step the next letter of the normal form is the minimum letter u, with respect to the lexicographic ordering, which is currently at the top of some stack. The letter u is popped from its stack, and a marker is popped from each stack corresponding to a vertex v ∈ V which is adjacent to u in G. This procedure is iterated until every stack is empty.
- To derive the Foata normal form: At each step the members of the next elementary step are those letters which are on the tops of stacks. These letters are popped from their stacks, and for each member u of the elementary step a marker is also popped from each stack corresponding to a letter v ∈ V which does not commute with u. This procedure is iterated until every stack is empty.
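The two read-off procedures above, together with the single right-to-left stack-building pass, can be sketched as follows (the function names are illustrative assumptions; the example reproduces the normal forms computed earlier for ddbca under G given by a-b-c-d with a<b<c<d):

```python
MARK = object()  # marker recording that an adjacent letter passed by

def build_stacks(w, vertices, edges):
    """One right-to-left pass: each letter u is pushed on its own stack,
    and a marker is pushed on the stack of every vertex adjacent to u."""
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    stacks = {v: [] for v in vertices}
    for u in reversed(w):
        stacks[u].append(u)
        for v in adj[u]:
            stacks[v].append(MARK)
    return stacks, adj

def lex_normal_form(w, vertices, edges):
    stacks, adj = build_stacks(w, vertices, edges)
    out = []
    while any(stacks.values()):
        # smallest letter currently exposed on top of its own stack
        u = min(v for v in vertices if stacks[v] and stacks[v][-1] == v)
        stacks[u].pop()
        for v in adj[u]:
            stacks[v].pop()          # remove u's marker from each neighbor
        out.append(u)
    return "".join(out)

def foata_normal_form(w, vertices, edges):
    stacks, adj = build_stacks(w, vertices, edges)
    steps = []
    while any(stacks.values()):
        # next elementary step: all letters currently on top of stacks
        step = {v for v in vertices if stacks[v] and stacks[v][-1] == v}
        for u in step:
            stacks[u].pop()
        for u in step:
            for v in adj[u]:
                stacks[v].pop()      # remove u's marker from each neighbor
        steps.append(step)
    return steps

edges = [("a", "b"), ("b", "c"), ("c", "d")]
assert lex_normal_form("ddbca", "abcd", edges) == "baddc"
assert foata_normal_form("ddbca", "abcd", edges) == [{"b", "d"}, {"a", "d"}, {"c"}]
```

Both read-offs consume the same stacks, so either normal form is obtained from a single pass over the input word, as the claims recite.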
Resuming the preceding example, when the dependence relation G is a-b-c-d and the original word is ddbca, the resulting stacks are shown in Given these notions of normal forms, there are three categories of techniques that will be considered for transforming a source output string before a universal grammar-based lossless data compression scheme is applied. The first of these selects a total ordering on the vertex set V and finds the lexicographic normal form of the interchange class containing the source output string. Observe that for every pair of symbols u and v with u<v which commute in G, the lexicographic normal form derived from a word never contains the substring vu. The other two categories of processing the source output string are based upon the Foata normal form. Let F For the last transformation, a superalphabet V The transformations defined above can be used for any noncommutation graph G. It is mentioned in passing that when G is not connected, the option is available of finding its components, projecting the original string onto each subalphabet consisting of the vertices of a component, and proceeding to use any of the three categories of normal form representations listed above for mapping the projections of the original string. Combining Normal Forms and Irreducible Grammar-Based Codes The normal form can be viewed as the string which is the output of an auxiliary source. In general, the auxiliary source is not ergodic. For example, suppose one has a binary source which is not necessarily ergodic emitting the digits 0 and 1 and the digits commute. As discussed above, the interchange entropy of this source is zero.
If the lexicographic order 0<1 is selected and the binary string contains l zeroes and m ones, then its lexicographic normal form is a run of l zeroes followed by a run of m ones, its Foata normal form is min{l,m} copies of the string 01 concatenated with l−m zeroes if l>m or m−l ones if m>l, and the final transformation is min{l,m} copies of the auxiliary symbol V To illustrate another difficulty, the example following Proposition 2.8 is again considered. Suppose once more that V={a,b,c,d}, G=a-b c-d, and the source is an order-1 Markov chain with P(c|a)=P(d|b)=1, P(a|c)=P(b|c)=P(a|d)=P(b|d)=0.5. As discussed above, H(G,P)=H(P)=0.5 bits per symbol. Next assume that the total ordering of the vertex set is a<b<c<d and begin to process a source output string by converting it into its lexicographic normal form. Then, for a string of length Two instances are demonstrated below for which the auxiliary source is Markov with a countably infinite state space and which has the property that the auxiliary source entropy is equal to the original source's interchange entropy. Since a universal grammar-based code compresses an ergodic source to its entropy, the combined codes of the present invention compress a source to the interchange entropy in these special cases. First consider dependence relations which are complete k-partite graphs. As in the section entitled “Interchange Complexity and Interchange Entropy,” the vertex set V is represented by V An auxiliary source is specified which captures both the mapping into lexicographic normal form and the first transformation into Foata normal form. It is assumed that each phrase from the original source is converted to a string which is the unique designated representative for the type class for that phrase. The auxiliary source is then a countably infinite Markov chain where the state at any time consists of the suffix of the designated representative phrase beginning with the current symbol.
While within a phrase, the auxiliary source has no uncertainty in the transition from one state to the next; i.e., there is a single possible transition, which occurs with probability 1. All of the uncertainty resides in the transition from the final letter in a phrase to the first state corresponding to the next phrase, and these transition probabilities depend only on the vertex subset associated with the current phrase. Let H

Theorem 4.1: Assume a discrete memoryless source with probability distribution P on the vertex set of a complete k-partite graph K

Consider the third transformation of the original source into an auxiliary source. In this case, the superalphabet V

Theorem 4.2: The entropy H

Consider the case where the noncommutation graph contains at least one vertex which is adjacent to all others. Let V
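In the complete k-partite case, symbols within one part commute with each other while symbols in different parts do not, so the source string splits into maximal single-part runs (phrases), and sorting each run produces the unique designated representative of its type class. The phrase-by-phrase conversion can be sketched as follows; the bipartite example and all names are illustrative.

```python
from itertools import groupby

def phrase_representatives(word, part_of):
    """Split `word` into maximal runs drawn from a single part of a complete
    k-partite noncommutation graph, and replace each run (phrase) by the
    sorted designated representative of its type class."""
    runs = ["".join(g) for _, g in groupby(word, key=lambda u: part_of[u])]
    return ["".join(sorted(run)) for run in runs]

# Complete bipartite example with parts {a, b} and {c, d}: a and b commute,
# c and d commute, but cross-part pairs do not.
parts = {"a": 0, "b": 0, "c": 1, "d": 1}
print(phrase_representatives("badccab", parts))  # -> ['ab', 'ccd', 'ab']
```

Concatenating the sorted phrases yields the lexicographic normal form for this graph family, and the sequence of phrase representatives is exactly the designated-representative process described above.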
Next, consider auxiliary sources (viewed as a countably infinite Markov chain where the state at any time consists of the suffix of the present designated representative phrase beginning with the current symbol). While within a phrase, there is a single possible transition from one state to the next, which occurs with probability 1. All of the uncertainty lies in the transition from the final letter in a phrase to the first state marking the beginning of the next phrase, and these transitions are independent and identically distributed. In order to compute the entropy H

Consider the case where the original source is a finite-state, unifilar Markov source and the dependency graph is either a complete k-partite graph or a graph where at least one vertex is adjacent to all of the others. In this case, the interchange class of the phrases, combined with some information about the state of the original process at the beginning and end of the phrases, forms a countably infinite state, ergodic Markov chain. The exact states of the original process at the beginning and end of the phrases are not necessary. For example, in the complete k-partite case if V

As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks, or memory cards) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel).
Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic medium or height variations on the surface of a compact disk. The computer systems and servers described herein each contain a memory that will configure associated processors to implement the methods, steps, and functions disclosed herein. The memories could be distributed or local and the processors could be distributed or singular. The memories could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the term “memory” should be construed broadly enough to encompass any information able to be read from or written to an address in the addressable space accessed by an associated processor. With this definition, information on a network is still within a memory because the associated processor can retrieve the information from the network.

It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.