US 20070040710 A1 Abstract An enumerator employs “indexing volumes” as the add-on values used to compute indexes for n-item ordered sets such as symbol sequences. Each indexing volume is associated with a different class into which the allowed ordered sets are partitioned. The indexing volumes all equal or exceed the number of ordered sets that belong to their respective classes. Additionally, the indexing volumes are quantized such that each volume V equals wr^{s}, where r is an integer greater than unity, s is a non-negative integer, and w is a positive integer whose resolution is less than that required for some set counts. As a result, the addition operations used to compute the indexes can be performed with limited precision, and storage requirements for the add-on values can be relatively modest. By storing fewer than all the volumes needed but computing the remainder from those that are stored, the storage requirement can be reduced further. Claims(3) 1. In an enumerative encoder, an index-computation circuit that:
A) for a plurality of symbol populations (i, k), where k is the number of occurrences of a given binary symbol in a binary sequence of length i, contains respective pre-stored volume values B(i, k) such that B(i, k) is an integer greater than or equal to the sum of every indexing volume associated with a sequence-length-(i−1) symbol population that is a predecessor of the symbol population (i, k) and B(i, k)=2^{s}·w, where s is a non-negative integer, w is a positive integer less than 2^{h−1}, and, for some sequence whose length is less than some length n, h is the number of binary digits in the smallest quotient that results from evenly dividing the sequence count of that symbol population by a positive-integer power of 2;
B) computes an index I(a_{1}a_{2} . . . a_{n}) for an n-symbol sequence (a_{1}a_{2} . . . a_{n}) by computing indexes I(a_{1}a_{2} . . . a_{t}) for successive values of t in accordance with I(a_{1}a_{2} . . . a_{t})=I(a_{1}a_{2} . . . a_{t−1})+b_{t}·B(t−1, k_{t}), where k_{t} is the number of occurrences of the given symbol in a_{1}a_{2} . . . a_{t}, b_{t} equals zero if a_{t} has one of the symbol values, b_{t} equals one if a_{t} has the other of the symbol values, B(t−1, k_{t}) is obtained for some values of t by fetching the pre-stored value of B(t−1, k_{t}), and B(t−1, k_{t}) is computed in accordance with B(t−1, k_{t})=B(t−2, k_{t})+B(t−2, k_{t}−1) for other values of t; and
C) generates an output from the index thus computed.
2. A storage medium containing machine instructions readable by a computer system to configure it as an entropy encoder that:
A) for a plurality of symbol populations (i, k), where k is the number of occurrences of a given binary symbol in a binary sequence of length i, contains respective pre-stored volume values B(i, k) such that B(i, k) is an integer greater than or equal to the sum of every indexing volume associated with a sequence-length-(i−1) symbol population that is a predecessor of the symbol population (i, k) and B(i, k)=2^{s}·w, where s is a non-negative integer, w is a positive integer less than 2^{h−1}, and, for some sequence whose length is less than some length n, h is the number of binary digits in the smallest quotient that results from evenly dividing the sequence count of that symbol population by a positive-integer power of 2;
B) computes an index I(a_{1}a_{2} . . . a_{n}) for an n-symbol sequence (a_{1}a_{2} . . . a_{n}) by computing indexes I(a_{1}a_{2} . . . a_{t}) for successive values of t in accordance with I(a_{1}a_{2} . . . a_{t})=I(a_{1}a_{2} . . . a_{t−1})+b_{t}·B(t−1, k_{t}), where k_{t} is the number of occurrences of the given symbol in a_{1}a_{2} . . . a_{t}, b_{t} equals zero if a_{t} has one of the symbol values, b_{t} equals one if a_{t} has the other of the symbol values, B(t−1, k_{t}) is obtained for some values of t by fetching the pre-stored value of B(t−1, k_{t}), and B(t−1, k_{t}) is computed in accordance with B(t−1, k_{t})=B(t−2, k_{t})+B(t−2, k_{t}−1) for other values of t; and
C) generates from the index thus computed an output that represents an entropy code for the n-symbol sequence.
3. A method of entropy encoding comprising:
A) for a plurality of symbol populations (i, k), where k is the number of occurrences of a given binary symbol in a binary sequence of length i, storing in a computer system respective pre-stored volume values B(i, k) such that B(i, k) is an integer greater than or equal to the sum of every indexing volume associated with a sequence-length-(i−1) symbol population that is a predecessor of the symbol population (i, k) and B(i, k)=2^{s}·w, where s is a non-negative integer, w is a positive integer less than 2^{h−1}, and, for some sequence whose length is less than some length n, h is the number of binary digits in the smallest quotient that results from evenly dividing the sequence count of that symbol population by a positive-integer power of 2; and
B) employing the computer system to:
i) compute an index I(a_{1}a_{2} . . . a_{n}) for an n-symbol sequence (a_{1}a_{2} . . . a_{n}) by computing indexes I(a_{1}a_{2} . . . a_{t}) for successive values of t in accordance with I(a_{1}a_{2} . . . a_{t})=I(a_{1}a_{2} . . . a_{t−1})+b_{t}·B(t−1, k_{t}), where k_{t} is the number of occurrences of the given symbol in a_{1}a_{2} . . . a_{t}, b_{t} equals zero if a_{t} has one of the symbol values, b_{t} equals one if a_{t} has the other of the symbol values, B(t−1, k_{t}) is obtained for some values of t by fetching the pre-stored value of B(t−1, k_{t}), and B(t−1, k_{t}) is computed in accordance with B(t−1, k_{t})=B(t−2, k_{t})+B(t−2, k_{t}−1) for other values of t; and
ii) generate from the index thus computed an output that represents an entropy code for the n-symbol sequence.
Description
The present application is a continuation of commonly assigned copending U.S. patent application Ser. No. 11/015,894, which was filed on Dec. 17, 2004, by Ratko V. Tomic for Fast, Practically Optimal Entropy Coding and claimed the benefit of U.S. Provisional Patent Application Ser. No. 60/603,464, which was filed on Aug. 20, 2004, by Ratko V. Tomic for a Fast, Practically Optimal Entropy Encoder, and of U.S. Provisional Patent Application Ser. No. 60/606,681, which was filed on Sep. 2, 2004, by Ratko V. Tomic for a Fast, Practically Optimal Entropy Encoder, all of which are hereby incorporated by reference.
1. Field of the Invention
The present invention concerns algorithmically indexing ordered sets. It is particularly, but not exclusively, applicable to entropy encoding.
2. Background Information
Data compression usually includes multiple phases, where the initial phases are more dependent on the specific data source. The initial phases typically identify the source-specific higher-level regularities and convert them into more-generic forms. The final output of this higher-level processing is a sequence of symbols in which higher-level, domain- or source-specific regularities have been re-expressed as simple, generic (quantitative) regularities, such as a highly skewed distribution of the produced symbols (picturesquely described as a “concentration of energy” when the statistical imbalances vary across the output sequence). The task of the entropy coder is to transform these simple regularities into fewer bits of data. Optimal encoding is quantified as the message entropy, i.e., as the minimum number of bits per message averaged over all the messages from a given source. In the case of a source with a finite number of M distinct messages, all equally probable, the entropy H (per message) is log M. More often, though, messages' probabilities are not equal.
A common entropy-coding scenario is the one in which messages are sequences of symbols selected from an alphabet A of R symbols a This value is less than log M if the probabilities are not equal, so some savings can result when some messages are encoded in fewer bits than others. Taking advantage of this fact is the goal of entropy coding. The two types of general entropy-coding algorithms that are most popular currently are Huffman coding and arithmetic coding. The Huffman algorithm assigns to each symbol as a unique bit string whose length is approximately log(1/p The principal weakness of the Huffman code is its sub-optimality in the case of more-general probabilities (those not of the form 1/2 A second important weakness of the Huffman code is that its coding overhead increases, both in speed and memory usage, when the adaptive version of the algorithm is used to track varying symbol probabilities. For sufficiently variable sources, moreover, even adaptive Huffman algorithm cannot build up statistics accurate enough to reach coding optimality over short input-symbol spans. In contrast to Huffman coding, arithmetic coding does not have the single-bit-per-symbol lower bound. As a theoretical, albeit impractical, method, arithmetic coding goes back to Claude Shannon's seminal 1948 work. It is based on the idea that the cumulative message probability can be used to identify the message. Despite minor improvements over the decades, its fatal drawback was the requirement that its arithmetic precision be of the size of output data, i.e., divisions and multiplications could have to handle numbers thousands of bits long. It remained a textbook footnote and an academic curiosity until 1976, when an IBM researcher (J. 
Rissanen, “Generalised Kraft Inequality and Arithmetic Coding,” Theoretically, the slow-adaptability problem that these two popular entropy-encoding techniques share can be overcome by a relatively obscure compression technique known as “enumerative coding.” The roots of enumerative coding extend farther into the past than modern information theory, going back to the enumerative combinatorics of the Nineteenth and early Twentieth Centuries. And using combinatorial objects for ranking, as conventional enumerative encoding does, had actually been part of common computer-programming folklore for over a decade in 1966, when Lynch (T. J. Lynch, “Sequence Timecoding for Data Compression,” Conceptually, enumerative encoding lists all messages that meet a given criterion and optimally encodes one such message as an integer representing the message's index/rank within that list. In words, an example would be, “Among the 1000-bit sequences that contain precisely forty-one ones (and the rest zeros), the sequence that this code represents is the one with whose pattern we associate index 371.” That is, the example encoding includes both an identification of the source sequence's symbol population (41 ones out of 1000 in the example) and an index (in that case, 371) representing the specific source sequence among all those that have the same symbol population. Since the number of patterns for a given population can be quite large, it would not be practical to arrive at a significant-length sequence's pattern index by storing associations between indexes and patterns in a look-up table. Instead, one would ordinarily arrive at any given source pattern's index algorithmically, and the index-determining algorithm would typically be based on the value that the sequence represents. 
In accordance with one such indexing approach, for example, the prior example may alternatively be expressed in words as, “The sequence that this code represents is the 371 Consider the seven-bit sequence 1001010, for example, i.e., one of the sequences that has three ones out of seven bits. The task is to determine an index that uniquely specifies this sequence from among all that have the same population, i.e., from among all seven-bit sequences that have three ones and four zeros. In accordance with an indexing scheme in which indexes increase with the sequence's value and the more-significant bits are those to the left, the index can be computed by considering each one-valued bit in turn as follows. Since the example sequence's first bit is a one, we know that its value exceeds that of all same-population sequences in which all three ones are in the remaining six bits, so the index is at least as large as the number of combinations of three items chosen from six, i.e., 6!/(3!·3!), and we start out with that value. Out of all same-population sequences that similarly start with a one bit, the fact that the example sequence has a one in the fourth bit position indicates that its index exceeds those in which both remaining ones are somewhere in the last three bit positions, so the index is at least as large as the result of adding the number of such sequences to the just-mentioned number in which all three are in the last six positions. By following that reasoning, the index I can be determined in accordance with:
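The reasoning just described for the sequence 1001010 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `lex_index` and `brute_force_rank` are hypothetical and do not appear in the application, and the brute-force ranking is included solely to cross-check the combinatorial shortcut.

```python
from itertools import combinations
from math import comb


def lex_index(bits: str) -> int:
    """Rank of `bits` among equal-length strings with the same number of ones,
    ordered by binary value (leftmost bit most significant)."""
    n = len(bits)
    ones_left = bits.count("1")
    index = 0
    for pos, bit in enumerate(bits):
        if bit == "1":
            remaining = n - pos - 1
            # Same-population strings that agree so far but hold a 0 here must
            # fit all `ones_left` ones into the remaining positions; every one
            # of them has a smaller value than `bits`.
            index += comb(remaining, ones_left)
            ones_left -= 1
    return index


def brute_force_rank(bits: str) -> int:
    """Cross-check: list every same-population string in value order."""
    n, k = len(bits), bits.count("1")
    same_population = sorted(
        "".join("1" if i in ones else "0" for i in range(n))
        for ones in combinations(range(n), k)
    )
    return same_population.index(bits)


# For 1001010 the add-on terms are C(6,3) + C(3,2) + C(1,1) = 20 + 3 + 1.
print(lex_index("1001010"), brute_force_rank("1001010"))
```

For the example sequence both functions give 24, a value that fits in five bits.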
Now, that index requires five bits, and it would take three bits to specify the population value, so the resultant eight bits exceeds the length of the (seven-bit) source sequence. But it is apparent that the comparison of the source-sequence length with the index length would be more favorable for a more-skewed population in a longer sequence. And the number of bits required for the “side information” that specifies the population increases only as the logarithm of the sequence length. Over a group of such sequences, moreover, that side information can itself be compressed. So the resultant code length approaches source entropy as the source-sequence length becomes large. The combinatorial values used as “add-on” terms in the index calculation can be expensive to compute, of course, but in practice they would usually be pre-computed once and then simply retrieved from a look-up table. And it is here that enumerative coding's theoretical advantage over, say, arithmetic coding is apparent. Just as combinatorial values are successively added to arrive at the conventional enumerative code, successive “weight” values are added together to produce an arithmetic code. And arithmetic coding's weights can be pre-computed and retrieved from a look-up table, as enumerative coding's combinatorial values can. In arithmetic coding, though, the values of such add-on terms are based on an assumption of the overall sequence's statistics, and the arithmetic code's length will approach the source sequence's theoretical entropy value only if statistics of the source sequence to be encoded are close to those assumed in computing the add-on terms. To the extent that source statistics vary, the look-up table's contents have to be recomputed if near-optimal compression is to be achieved, and this imposes a heavy computational burden if the source statistics vary rapidly. 
In contrast, enumerative coding's table-value computation is not based on any assumption about the sequence's overall statistics, so it can approach theoretical entropy without the computational expense of adapting those values to expected statistics. Enumerative coding has nonetheless enjoyed little use as a practical tool. The reason why can be appreciated by again considering the example calculation above. The sequence length in that example was only seven, but the lengths required to make encoding useful are usually great enough to occupy many machine words. For such sequences, the partial sums in the calculation can potentially be that long, too. The calculation's addition steps therefore tend to involve expensive multiple-word-resolution additions. Also, the table sizes grow as N Arithmetic coding once suffered from the same drawback, but the Rissanen approach mentioned above solved the problem. Basically, Rissanen employed add-on values that could be expressed as limited-precision floating-point numbers. For example, the resolution might be so limited that all of each add-on value's bits are zeros except the most-significant ones and that the length of the “mantissa” that contains all of the ones is short enough to fit in, say, half a machine word. Even if such an add-on value's fixed-point expression would be very long and that value is being added to a partial sum that potentially is nearly as long, the resolution of the machine operation used to implement that addition can be small, since the change, if any, in the partial sum occurs only in a few most-significant bits. Rissanen recognized that add-on values meeting such resolution limitations could result in a decodable output if the total of the symbol probabilities assumed in computing them is less than unity by a great enough difference and the values thus computed are rounded up to meet the resolution criterion. 
(The difference from unity required of the symbol-probability total depends on the desired resolution limit.) Still, the best-compression settings of modern implementations require multiplications on the encoder and divisions on the decoder for each processed symbol, so they are slower than a static Huffman coder, especially on the decoder side. (The particular degree of the speed penalty depends on the processor.) By some evaluations, moreover, the arithmetic coder compresses even less effectively than the Huffman coder when its probability tables fail to keep up with the source probabilities or otherwise do not match them. I have recognized that an expedient somewhat reminiscent of Rissanen's can be used to reduce the computation cost of enumerative encoding in a way that retains its general applicability and sacrifices little in compression ratio. I have recognized, that is, that such a result can come from replacing the conventional combinatorial values with limited-resolution substitutes. Now, there is no straightforward way of applying the Rissanen approach to enumerative coding. As was explained above, the tactic Rissanen used to produce decodable output was to reduce the assumed symbol probabilities on which his add-on-value computations were based, whereas the computation of conventional enumerative coding's add-on values is not based on assumed probabilities. And straightforward rounding of the conventional combinatorial values to lower-resolution substitutes does not in general produce decodable results: more than one source sequence of the same symbol population can produce the same index. So, although substituting limited-resolution add-on values for conventional ones has been tried before in enumerative coding, previous approaches to using short-mantissa substitutes for conventional combinatorial values were restricted to source sequences that are constrained in ways that most source sequences are not. 
They have therefore been proposed for only a few niche applications. But I have recognized that these limitations can be overcome by using what I refer to as “quantized indexing.” In quantized indexing, gaps are left in the sequence of possible indexes: for a given symbol population, that is, the index values used to identify some sequences having that population will sometimes exceed certain values not so used. I leave gaps in such a way that the add-on values used to compute the indexes can be expressed in low-resolution representations that can be added in low-resolution operations and can require relatively little storage space. As will be seen below, such add-on values can readily be so chosen as to comply with the “pigeonhole principle” i.e., to result in decodable indexes by employing a “bottom-up” approach to add-on-value computation, i.e., by deriving add-on values for longer sequences' symbol populations from those for smaller sequences. The invention description below refers to the accompanying drawings, of which: Before we consider ways in which the present invention can be implemented, we will briefly consider a typical environment in which an entropy encoder may be used. The entropy encoder may be a constituent of a composite encoder Despite the differential operations, there is usually some skew in the resultant output's symbol distribution, and it is at this point that the entropy coding For the sake of explanation, it is convenient to represent the operations as In that drawing, a computer system In any event, the ROM Of course, few computer systems that implement the present invention's teachings will be arranged in precisely the manner that To introduce those teachings, we will start by returning to conventional enumerative encoding and describing it in accordance with a conceptual framework that helps present certain of the present invention's aspects. 
Of special interest are binary sources, i.e., sources whose outputs are sequences of the symbols 0 and 1, since most other types of data sources can be reduced to this canonical source. We will map such sequences to paths on a square lattice depicted in We digress here to point out that references in the discussion below to 0 bits and 1 bits in the sequence to be encoded are arbitrary; 0 refers to one of the two possible bit values and 1 to the other, independently of what arithmetic meaning they are accorded outside of the encoding operation. Also, although it is advantageous if the sequence-bit value to which we refer as 1 occurs less frequently in the sequence than the value to which we refer as 0, there is no such requirement. Before we compute the index of a particular path, we will examine how many different paths (constructed by our mapping rule) there are from point A to point B. For all edge points (x,0) or (0,y) the path counts for the neighbors at (x,−1) or (−1,y) are 0 since these neighbors cannot be reached by our lattice-walk rules. (The only valid steps are right or down.) And we define the origin (0,0)'s path count as 1 (corresponding to the path of 0 steps) in order to avoid separate equations for the edge-point path counts (which are always 1). Eq. (2) enables us to compute the path counts for all (x, y) points along an n-step front (the points along the line x+y=n) from path counts of the points on the (n−1)-step front. Since we already have the path counts for the two-step front, we will propagate them, as Having found the path count N(B)≡N(5,3)=56, we know that numbers in the range [0 . . . 55] are sufficient to guarantee a unique numeric index to every distinct path to point B. To arrive at a specific numeric classification of the paths to B, we will adopt a divide-and-conquer strategy, splitting the problem into smaller sub-problems until the sub-problems become non-problems. Following the hint of the Eq. 
(2), we notice that the fifty-six paths reaching point B (after 8 steps) consist of thirty-five paths arriving from B's left neighbor B If we had an indexing scheme U In summary, we can construct indexing for the eight-step paths to B from the seven-step indexing by directly reusing the seven-step index for the paths coming from the left neighbor B We can follow this approach for any given path, moving back along the path, while accumulating the full index offset by adding the left neighbor's path count whenever the next back-step is going up, and reducing in each step the unknown residual index to the next-lower order. Eventually, we will reach an edge point (x=0 or y=0), where the path counts are 1. Since this single path is indexed by a single value 0, that completes our residual index reduction. The resulting index of the full path is thus the accumulated sum of the index offsets alone. The numbers circled in Since the index reduction described above is the foundation of enumerative coding and the springboard for the new approach described below, we will rewrite it symbolically for a general point B This is merely a concise symbolic restatement of the earlier conclusion about the reuse of the previous order index I Although (5) could be used to backtrack visually along the path (as in With this identification, the path counts being summed in (5) become:
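The path-count propagation and the index reduction described above can be sketched as follows. The sketch assumes that Eq. (2) is the usual lattice recurrence N(x,y)=N(x−1,y)+N(x,y−1), which is consistent with the counts 35+21=56 quoted in the text, and it takes x as the number of 0 bits (steps right) and y as the number of 1 bits (steps down). The names `path_counts` and `path_index` are hypothetical. The walk is performed forward rather than by backtracking; the accumulated sum is the same.

```python
from itertools import combinations


def path_counts(max_x, max_y):
    """Path counts N(x, y): N(0, 0) = 1, off-lattice neighbors count as 0,
    and every other point sums its left and upper neighbors."""
    counts = [[0] * (max_y + 1) for _ in range(max_x + 1)]
    counts[0][0] = 1
    for x in range(max_x + 1):
        for y in range(max_y + 1):
            if x or y:
                left = counts[x - 1][y] if x > 0 else 0
                above = counts[x][y - 1] if y > 0 else 0
                counts[x][y] = left + above
    return counts


def path_index(bits, counts):
    """Sum the left neighbor's path count at every step taken for a 1 bit."""
    x = y = index = 0
    for bit in bits:
        if bit:
            y += 1
            if x > 0:  # N(-1, y) is 0, so nothing is added on the edge x = 0
                index += counts[x - 1][y]
        else:
            x += 1
    return index


COUNTS = path_counts(5, 3)
print(COUNTS[4][3], COUNTS[5][2], COUNTS[5][3])  # left neighbor, upper neighbor, B

# Every eight-step path to B = (5, 3) should receive a distinct index in [0, 55].
indexes = sorted(
    path_index([1 if i in ones else 0 for i in range(8)], COUNTS)
    for ones in combinations(range(8), 3)
)
print(indexes == list(range(56)))
```

Because each N(x−1,y) equals the binomial coefficient C(x+y−1, y), the same sum can be taken directly over binomial coefficients without storing the lattice.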
The only non-zero contributions to the sum (5) come from those i for which b The encoding proceeds by scanning the input data until the jth instance of a bit set to 1 is found at some zero-based bit index n Some boundary cases of interest are strings consisting of all zeroes (k=0) or all ones (k=n). Since the path counts in these cases are C(n,0)=1 and C(n,n)=1, the number of bits for the path index is log(C)=log(1)=0; i.e., no compressed bits are transmitted. If the block size n is pre-defined, the only data sent are the count of 1's, which is 0 or n. The decoder starts with the received index I, the count of 1's (the value k) and the known (e.g., pre-arranged) total number of expanded bits n. If the special boundary cases k=0 and k=n have been handled separately, the decoding proceeds as the Sliding Window Enumerative Coding Having now examined conventional enumerative encoding in detail (and described a self-sufficient way of implementing it), we are now ready to consider one way to practice the invention. To motivate the main constructs of that approach, we will revisit the conventional enumerative-coding results from the In addition to the index I, the decoder needs to know in advance where the end-point B was, i.e., what the source sequence's symbol population was, so more data (the side information) needs to be sent. Since there is a constraint x+y=n and in this example the two sides have agreed to a common value of n, the decoder can infer the symbol population simply from the count of 1's (the y coordinate). For our block size of 8 bits, the count of 1's could be any number from 0 to 8, spanning a range of 9 values, so it takes log(9)=3.17 bits on average to send the side information. This is more than half of the “main” compressed data size, and it makes the total compressed size 8.98 bits. That is, the “compressed” data's size exceeds even that of the uncompressed data. By using Eq. 
(1), we can compute the entropy of a binary source that produces 3/8=37.5% 1's and 5/8=62.5% 0's for a block of 8 bits and obtain: H(3/8,8)=5 log(8/5)+3 log(8/3)=7.64 bits. Although our “main” compressed data, the bit-string index, had used only 5.81 bits, which is less than the entropy of 7.64 bits, the side information's overhead (the 3.17 bits) turned the encoding into a net data expansion. If we were to use blocks larger than eight bits, the compression would improve, because the side information grows slowly, only as log(n), i.e., much more slowly than the (linearly increasing) entropy. For example, for a block size of 256 bits instead of 8 bits and the same fraction of 3/8 for 1's, the side-information overhead is at most 8.01 bits, and the index would use 240.1 bits, yielding the net compressed output of 248.1 bits (or about 245 bits if the side information itself is being compressed, as could be done in a case in which there is a larger number of blocks and the number of 1's follows a Gaussian distribution), while the entropy in this case is 244.3 bits. If the block size is 256 bits or above and the side information is itself compressed, enumerative coding compresses at practically the entropy rate (i.e., 245 bits vs. 244.3 bits). To quantify the output properties of enumerative coding beyond the illustrative examples, we need to examine the general case of the path index (6)-(7). The size (in bits) of the path index I We can express the bit counts above in terms of the corresponding probabilities through p(1)≡p=k/n and p(0)≡q=(n−k)/n, which transforms (9) into:
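The block-size figures quoted above can be checked numerically. The sketch below is illustrative only: the function name `block_figures` is hypothetical, logarithms are taken to base 2, and the last returned value assumes that the approximation in question is the Stirling form n[p log(1/p)+q log(1/q)] − ½ log(2πnpq), as the discussion of the entropy term and the logarithmic correction suggests.

```python
from math import comb, log2, pi


def block_figures(n, k):
    """Return (entropy, index size, side-information size, Stirling estimate),
    all in bits, for an n-bit block that contains k ones."""
    p, q = k / n, (n - k) / n
    entropy = n * (p * log2(1 / p) + q * log2(1 / q))
    index_bits = log2(comb(n, k))      # size of the path index
    side_bits = log2(n + 1)            # count of ones, sent uncompressed
    stirling = entropy - 0.5 * log2(2 * pi * n * p * q)
    return entropy, index_bits, side_bits, stirling


for n, k in [(8, 3), (256, 96)]:
    entropy, index_bits, side_bits, stirling = block_figures(n, k)
    print(f"n={n}: entropy={entropy:.2f} index={index_bits:.2f} "
          f"side={side_bits:.2f} total={index_bits + side_bits:.2f} "
          f"stirling={stirling:.2f}")
```

For the eight-bit block this gives 7.64, 5.81 and 3.17 bits, and for the 256-bit block roughly 244.3, 240.1 and 8.01 bits, in agreement with the figures in the text. The Stirling estimate is within about 0.05 bit of the exact index size at n=8 and within about 0.002 bit at n=256.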
Comparing (10) with the entropy (1) for a two-symbol alphabet (R=2 in (1)) reveals that n[p log(1/p)+q log(1/q)] is this n-bit string's entropy. The second term (which is logarithmic in n) is a small negative correction, which reduces the size of the path count N(n−k,k) to a value slightly below the source entropy. This is the effect exhibited by the earlier numeric examples. The reduction is: ½ log(2π npq)=½ log(2πqk) bits. Since the bit cost of sending k, the count of 1's (or 0's if 0 is the less frequent symbol) is log(k) bits (if sent uncompressed), the reduction in (10) is around half the bit cost of sending k, so the total output (path index plus side information) exceeds the entropy by ½ log(2π npq). Another redundancy, not explicit in (10), is one that becomes more significant for smaller blocks. It is the fractional-bit-rounding loss, which results from the fact that the compressed data can be sent only in whole numbers of bits. From the example of the (8,3) block, the index is a number in the range [0 . . . 55], so it contains log(56)=5.81 bits of information. A six-bit number is required to transmit that index, but a number of that size can represent a larger range, i.e., [0 . . . 63], so sending the index wastes the unused eight values of the range [56 . . . 63]. In terms of bits, this is a waste of 6−5.81=0.19 bits, or about 3.3% of every 5.81-bit index sent. In summary, enumerative coding is optimal to within the ½ log(2π npq)/n of the source entropy (per input symbol), so the block size n is the key controllable factor that determines the degree of the optimality. Another ingredient in (10) that affects the optimality (but to a lesser degree) is the product pq, but that is the property of the source and not of the encoder. For lower-entropy sources (pq→0), the degree of optimality is higher than for higher-entropy sources (p, q→½). Although Eq. 
(10) demonstrates the appeal of larger blocks, it also shows why they cannot be achieved in a straightforward way (such as the way that To introduce our solution for both problems, we need to examine more closely the arithmetic of the enumerative encoder. We will reuse the example from The indicated additions illustrate the growth of entropy as the coding progresses. The self-sufficiency property of colex indexing (7) implies that any add to the existent sum increases the size (in bits) of the sum by the entropy of the symbol that triggered the add. Roughly speaking, since the adds occur on encountering bit=1 (the less frequent of the two symbols), the running entropy has to increase by more than one bit for each add, so the add-on terms almost always have to be at least of the size of the existent sum. We can see this pattern, as (11) above demonstrates. A further heuristic observation is that the bulk of the entropy production occurs at the leading (the most-significant) bits of the sum. Although carry propagation in the lower bits can lengthen the sum, that happens only rarely. (The probability of such an occurrence drops exponentially with the distance d of the bit from the sum's leading edge.) So the activity in the lower bits, far away from the leading edge, seems to be of little importance except that it expands the required arithmetic precision to the output-data size. Now, that unfortunate result would be eliminated if the add-on terms' resolutions were limited. (We will say that the resolution in radix r of a value N is h if h is the number of radix-r digits in the smallest quotient that results from dividing N evenly by a non-negative-integer power of r.) It could be eliminated, that is, if the conventional add-on terms N(x,y) (which are by (7) binomial coefficients) were replaced with values V(x,y) that could be expressed as floating-point numbers whose mantissas are short. 
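The definition of resolution given above, and the idea of a short-mantissa substitute, can be illustrated with a brief sketch. The function names `resolution` and `round_up_to_resolution` and the parameter `m` (the permitted number of mantissa bits) are hypothetical labels chosen for this illustration.

```python
def resolution(value: int, radix: int = 2) -> int:
    """Number of radix-r digits in the smallest quotient obtained by dividing
    `value` evenly by a non-negative-integer power of the radix."""
    if value <= 0 or radix < 2:
        raise ValueError("value must be positive and radix at least 2")
    while value % radix == 0:      # strip the tail of zero digits
        value //= radix
    digits = 0
    while value:                   # count the digits that remain
        value //= radix
        digits += 1
    return digits


def round_up_to_resolution(value: int, m: int) -> int:
    """Smallest integer >= value whose binary resolution is at most m bits."""
    shift = max(value.bit_length() - m, 0)
    mantissa = -((-value) >> shift)    # ceiling of value / 2**shift
    return mantissa << shift


print(resolution(56), resolution(35))      # 56 = 111000b, 35 = 100011b
print(round_up_to_resolution(35, 3))       # 101000b
```

Here 56 already has a three-bit resolution, whereas 35 has a six-bit resolution and is replaced by 40 when only three mantissa bits are allowed.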
It turns out, though, that simply rounding the conventional path-count values to such low-resolution values does not work; the resultant indexes are not always unique. But I have recognized that the add-on values can be selected in a way that both satisfies the short-mantissa requirement and produces a decodable result and still achieve nearly the same degree of compression that the conventional binomial values do. A way in which this can be achieved can readily be understood by returning to Note that in principle the “rounding up” can be any operation that results in an appropriate-resolution value greater than or equal to the value for which it is being substituted; it need not be the lowest such value. Indeed, the rounding-up operation can be performed even for values whose resolution is already low enough. In practice, though, it will ordinarily be preferable to employ the lowest such value. In the discussion that follows we will therefore assume an embodiment that observes that restriction. In that discussion it will be convenient to take an object-oriented-programming perspective and treat the add-on values during their computation in this embodiment of the invention as instances of a “sliding-window-integer” (“SW integer”) class of data objects. This class's data member takes the form of a floating-point number (although not typically one represented in accordance with, say, the IEEE 754 standard). Additionally, this class will include method members. The method members perform what we will refer to as “sliding-window arithmetic,” which implements the above-mentioned rounding uniquely and is used to compute further add-on values (but not the resultant indexes). Before we describe SW arithmetic in detail, we need to examine the requirements that arise from the add-on-values' computation. We also need to assess how feasible using them for enumeration is in the first place, especially for arbitrary-length input blocks. 
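A minimal sketch of one way to select such add-on values is given below. It assumes the bottom-up rule that each volume V(x,y) is the sum of its two predecessors' volumes, V(x−1,y)+V(x,y−1), rounded up to at most m mantissa bits, with V(0,0)=1 and off-lattice volumes taken as 0. The helper names (`round_up`, `quantized_volumes`, `quantized_index`, `all_indexes_unique`) and the use of plain Python integers in place of SW integers are assumptions made for illustration, not the embodiment's actual data structures.

```python
from itertools import combinations


def round_up(value: int, m: int) -> int:
    """Smallest integer >= value that fits an m-bit mantissa times a power of 2."""
    shift = max(value.bit_length() - m, 0)
    return -((-value) >> shift) << shift


def quantized_volumes(n: int, m: int) -> dict:
    """Build V(x, y) for all x + y <= n from shorter sequences' volumes."""
    volumes = {(0, 0): 1}
    for x in range(n + 1):
        for y in range(n + 1 - x):
            if (x, y) != (0, 0):
                total = volumes.get((x - 1, y), 0) + volumes.get((x, y - 1), 0)
                volumes[x, y] = round_up(total, m)
    return volumes


def quantized_index(bits, volumes) -> int:
    """Same walk as conventional indexing, with V in place of path counts."""
    x = y = index = 0
    for bit in bits:
        if bit:
            y += 1
            index += volumes.get((x - 1, y), 0)
        else:
            x += 1
    return index


def all_indexes_unique(n: int, m: int) -> bool:
    """Brute-force check that every n-bit sequence of each population gets a
    distinct index that is smaller than that population's volume."""
    volumes = quantized_volumes(n, m)
    for k in range(n + 1):
        seen = set()
        for ones in combinations(range(n), k):
            bits = [1 if i in ones else 0 for i in range(n)]
            index = quantized_index(bits, volumes)
            if index in seen or index >= volumes[n - k, k]:
                return False
            seen.add(index)
    return True


print(all_indexes_unique(10, 3))
```

The check succeeds because sequences ending in 0 reuse indexes below V(x−1,y), while sequences ending in 1 are offset by V(x−1,y) and stay below V(x−1,y)+V(x,y−1), which never exceeds V(x,y). With a mantissa wide enough to hold every value exactly, the volumes reduce to the conventional binomial coefficients.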
Initially, we assume only the properties of the SW integers without which they would not be useful at all. Their defining structural feature is the formal separation of the significant digits (the window or the mantissa) from the tail of zeros (specified as the shift or the binary exponent for the window). We can express this feature as follows:
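As a minimal sketch of this separation, assuming the value of an SW integer is W = w·2^s with a mantissa w of at most m bits followed by s zero bits (consistent with the summary of SW-integer properties given later in this description), the structure could be modeled as follows. The names `SWInt` and `sw_round_up` are hypothetical and are introduced only for illustration; the round-up helper picks the lowest admissible value, which is the restriction assumed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SWInt:
    """Hypothetical sliding-window integer: value = w * 2**s, w fits in m bits."""
    w: int  # mantissa (the "window" of significant bits)
    s: int  # shift (number of trailing zero bits)
    m: int  # maximum mantissa width in bits

    def __post_init__(self):
        if self.w < 0 or self.s < 0 or self.w.bit_length() > self.m:
            raise ValueError("mantissa must be non-negative and fit in m bits")

    def to_int(self) -> int:
        """Expand to an ordinary (arbitrary-precision) integer."""
        return self.w << self.s


def sw_round_up(value: int, m: int) -> SWInt:
    """Smallest SW integer with an m-bit mantissa that is >= value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    excess = value.bit_length() - m
    if excess <= 0:                      # already fits in the window
        return SWInt(value, 0, m)
    w = (value + (1 << excess) - 1) >> excess   # ceiling division by 2**excess
    if w.bit_length() > m:               # rounding overflowed the window
        w >>= 1
        excess += 1
    return SWInt(w, excess, m)


print(sw_round_up(0b10111, 4))           # SWInt(w=12, s=1, m=4), i.e. 0b11000
```

The overflow branch handles cases such as 31 with m=4, where the ceiling produces a five-bit mantissa that has to be renormalized to 8·2^2 = 32.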
We will now examine how large an SW integer's shift s and mantissa size m need to be to represent the binomials in (7). In the high-entropy range of p, i.e., where p=q=½, Equation (10) shows that log(C(n,n/2))≦n, so the binomial uses at most n bits. From (12a) it then follows that shift s requires no more than ┌log(n−m)┐ bits. The mantissa size m is a parameter that affects the path index's size (and therefore the compression ratio). For SW integers to represent n Using the packed format for the add-on-value tables results in the entry size for the binomial tables of 2·┌log(n)┐ bits, which is ½ n/log(n) times as small as the tables that conventional enumerative coding would require. For example, the tables would be 50 times as small for n=1024, or 293 times as small for n=4096. And the speed advantage of using low-resolution add-on values is at least twice that great. This speed estimate is based on the assumption that the new terms being added in accordance with Equation (7) are roughly of the same size or slightly larger than the partial sums to which they are being added, i.e. that they exhibit the pattern shown in (11). A case that would cause a problem would be a situation in which the cumulative sum is large (e.g. roughly of the size of compressed data), while the term being added is comparatively small. In such an instance, the add-on term's leading digits could be far behind the sum's, and carry propagation could require the adds to proceed across the entire gap between the addends' leading digits. Our earlier preliminary argument against this type of occurrence was based on the growth of the instantaneous entropy. I.e., since the adds occur only when a less-frequent symbol is encountered, they have to increase the entropy of the output by more than a single bit, so the add-on term has to be at least as large as the partial sum to which it is added. 
However, the ratio of 1's and 0's can change over the span of a block: what was the less-frequent symbol initially may later become the more-frequent one overall. To clarify the potential carry-propagation problem, we will examine the individual adds in (7) more closely. The sum of interest is the one at point B (Note that (15) assumes k<n. Otherwise, k=n→x=n−k=0, so B would be on the left edge and there would be no left neighbor B In most implementations, the index calculations will be so arranged that the additions occur on the less-frequently occurring symbol, which in these discussions is assumed to be 1. Since k is the number of 1's and (n−k) is the number of 0's up to any point B, Equation (15)'s tighter inequality, i.e., r<k/(n−k), means that r<1 for all points at which the cumulative count of 0's dominates the count of 1's. In such path regions the add-on terms are greater than the current sum, as entropy considerations and example (11) already indicated. If the index computation is performed in order of increasing volume values, then a key implication of Equation (15) concerns the compressed data's buffering and output. Since it is only the SW integer's m-bit mantissa w that is being added to the (machine) integer, and since the (SW-integer) add-on terms in (15) will never need to be added to any bit positions more than log(n) bits from the end of the output buffer, no bits farther back than the distance d=m+┌log(n)┐=(2 ┌log(n)┐+1) bits from the current sum's leading bit will change any more. So those bits can be output immediately while the encoding progresses. Also, the output buffer can be very small; a d-bit buffer would suffice. These are features that conventional enumerative coding lacks. In view of this carry-propagation analysis, it is likely that most index-computation circuits that employ the present invention's teachings will perform the limited-precision additions corresponding to those of Eq. 
(7)'s (unlimited-precision) additions in the order of increasing j in that equation, i.e., will sequence add-on-term addition from the smaller ones to the larger ones. For the proposed SW-integer add-on terms, this implies that the additions in (7) will go from smaller shift values of s to larger ones (which is a binary digit position for the mantissa as shown in (12a)). This ordering plays the same role as the analogous rule in elementary arithmetic that the additions of multi-digit numbers advance from the least-significant digits toward the more-significant; if they proceeded the other way, carries would propagate in the direction opposite from that in which the additions do, and this would necessitate backtracking to fix up the carry in the digits already left behind. So most embodiments will probably observe the n-sequencing rule and thereby avail themselves of the resultant efficiency advantage. However, it may be important in some circumstances not only for the compressed bits to be sent incrementally with minimum coding delay but also for the decoder to be able to decode the incoming bits as they arrive, without waiting for the block completion. Embodiments that operate in such circumstances may violate the n-sequencing rule. The coding would proceed from n Quantized Indexing Enumeration Having established the properties of SW integers that show their storage and computation advantages, we now turn in detail to the more-basic question: does the above-mentioned approach to selecting SW-integer replacements for the binomials in (7) result in output that is decodable and that can be nearly optimal? Perusal of the steps that lead to the binomials in (7) reveals that recurrence (2) is the step that fixed the choice to binomials, and it correctly specifies path counts. But the real objective there was to construct a path-indexing approach; the path counts were merely a tool used at that stage to limit the size of the indexing space. 
The connection between the path counts and the indexing space's size needs to be loosened if SW integers are used for the enumeration. To make the distinctions between those concepts more precise, we define a separate quantity, the indexing volume V(x,y) at a point (x,y), as the size of indexing space reserved for the paths reaching (x,y). In these terms, conventional enumerative coding's largely unstated assumption, which we will call tight indexing, is:
We will drop this constraint. Instead:
That is, we will require that volumes be proper SW integers with mantissa size m. The arguments s and w in W(w,s,m) are themselves functions of x and y, i.e. w=w(x,y) and s=s(x,y), while m is chosen to produce the application-specific best compromise between the compression optimality and table size. In most embodiments it will be a constant for a given block size n. For reasons that will become apparent, it will usually satisfy the condition m=m(n)>log(n)+1 and m(n)→log(n)+1 for n→∞. Of course, the path counts are still relevant; the pigeonhole principle requires that the volumes be at least as large as the number of paths to be indexed. But instead of complying with (16), which imposes the pigeonhole-principle constraints as tightly as possible and all at once, we will phase in these constraints on the volumes gradually, leaving enough room to continue satisfying the requirement that the volumes remain SW integers (for a given mantissa size m). To express the rest of the formula for generating volumes for any point (x,y), we will need to extend the SW arithmetic to the case SW+SW→SW. Since adding the integer forms of two SW numbers can result in more significant bits than the maximum allowed m (e.g. if their shifts s differ significantly), we will need some rules for turning the excess nonzero bits to zero. Keeping in mind the generator for the path counts (2) and that the pigeonhole principle limits us from below, i.e., that in addition to keeping the result as a proper SW integer, we need to maintain V≧N throughout, the smallest resulting volume that can satisfy both requirements will be an SW integer whose mantissa w results from rounding up the result of the addition to the nearest larger SW integer. We therefore introduce the following rounding rule. To add W We now apply this SW-addition rule to one way of computing volumes of the type that can be used to practice the invention. 
In most embodiments, the volumes V(x,y) for the boundary points (x, 0) and (0,y) will be set to 1. The volume values for the remaining points will then be determined in accordance with:
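A minimal sketch of this volume computation follows, assuming the recurrence has the same shape as the path-count recurrence (2), i.e. V(x,y) is the SW-rounded sum of V(x−1,y) and V(x,y−1), and assuming that rounding picks the smallest value whose mantissa fits in m bits. The helper names `sw_round_up` and `volume_table` are hypothetical. The loop at the end checks the pigeonhole requirement V≧N against the exact binomial path counts.

```python
from math import comb


def sw_round_up(value: int, m: int) -> int:
    """Smallest integer >= value whose significant bits fit in an m-bit window."""
    excess = value.bit_length() - m
    if excess <= 0:
        return value
    w = (value + (1 << excess) - 1) >> excess   # ceiling of value / 2**excess
    return w << excess                          # also correct when w == 2**m


def volume_table(size: int, m: int):
    """V[x][y] for 0 <= x, y <= size, with boundary volumes set to 1."""
    V = [[1] * (size + 1) for _ in range(size + 1)]
    for x in range(1, size + 1):
        for y in range(1, size + 1):
            # SW addition: exact sum of the two predecessors, then round up.
            V[x][y] = sw_round_up(V[x - 1][y] + V[x][y - 1], m)
    return V


V = volume_table(4, 3)
print(V[4][2], comb(6, 2))   # 16 15: the volume exceeds the path count by one

# Pigeonhole check: every volume is at least the exact path count C(x+y, y).
for x in range(5):
    for y in range(5):
        assert V[x][y] >= comb(x + y, y)
```

With a three-bit mantissa the first point at which rounding makes a difference in this small table is (4,2), where the exact sum 15 needs four significant bits and is raised to 16.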
Recall that by (17) the volumes are SW integers. Therefore, although (18) appears the same as its counterpart (2), the addition in (18) is SW addition, so the resultant V(x,y) can sometimes be greater than the conventional, non-SW sum of V(x−1,y) and V(x,y−1). In contrast, the result N(x, y) in (2) was always exactly equal to the conventional sum N(x−1,y)+N(x,y−1). By using (17) and (18), the index volume V(x, y) can be computed for all lattice points of interest. By (18), the index-space size V(x,y) for any point (x,y) satisfies the pigeonhole principle, so each point has enough index space (and, because of the rounding in (18), generally more than enough) to enumerate all the paths arriving there from the two predecessor points (i.e., the neighbors to the left and above). Therefore, the index-reduction recursion (4) (and its expanded form (5)) will apply as is (i.e., without further rounding), becoming:
Eq. (6), which identifies N(x,y) as binomials C(n, k), will not apply, since the volumes V are not exact binomials, so the counterpart of the final Eq. (7) will retain the volumes from (20). To switch from the coordinate parameters (x,y) to the direct bit-string parameters (n, k), as we did in (7), we will define coefficients B(n, k)=V(x,y), where n=x+y and k=y. This leads from (20) to a counterpart of Eq. (7):
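The index computation with quantized coefficients can be sketched as follows, assuming the step form stated in claim 1, I(a_1 . . . a_t)=I(a_1 . . . a_{t−1})+b_t·B(t−1, k_t), with B(n,k) taken as 0 for k>n and 1 on the boundaries k=0 and k=n. The function names `build_B`, `encode_index` and `decode_index` are hypothetical, and the decoder shown is just one straightforward inversion of the sum, not necessarily the routine of the figures.

```python
from itertools import product


def sw_round_up(value: int, m: int) -> int:
    """Smallest integer >= value whose significant bits fit in an m-bit window."""
    excess = value.bit_length() - m
    if excess <= 0:
        return value
    return ((value + (1 << excess) - 1) >> excess) << excess


def build_B(n_max: int, m: int):
    """B[n][k] = V(n-k, k); entries with k > n stay 0."""
    B = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    for n in range(n_max + 1):
        B[n][0] = B[n][n] = 1
        for k in range(1, n):
            B[n][k] = sw_round_up(B[n - 1][k] + B[n - 1][k - 1], m)
    return B


def encode_index(bits, B):
    """Exact (unrounded) sum of add-on values, one per 1 bit."""
    index, k = 0, 0
    for t, bit in enumerate(bits, start=1):
        if bit:
            k += 1
            index += B[t - 1][k]
    return index, k


def decode_index(index, n, k, B):
    """Recover the n-bit sequence from its index and its count k of ones."""
    bits = []
    for t in range(n, 0, -1):
        if k > 0 and index >= B[t - 1][k]:
            bits.append(1)
            index -= B[t - 1][k]
            k -= 1
        else:
            bits.append(0)
    return bits[::-1]


B = build_B(10, 4)
for seq in product((0, 1), repeat=10):
    idx, ones = encode_index(seq, B)
    assert idx < B[10][ones]                       # index fits the reserved volume
    assert decode_index(idx, 10, ones, B) == list(seq)
```

The exhaustive loop confirms, for this small block size, that every sequence is recovered from its index and count of ones, which is the decodability property that the volume construction is meant to guarantee.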
Encoding and decoding procedures employing quantized indexing can therefore be performed largely as in FIGS. Of course, quantized indexing does impose a cost. Of all the sequences that have a given symbol population, there is one that results in the highest index: each symbol population is associated with a maximum index. In a quantized-indexing scheme, a symbol population's maximum index is often greater than the number of sequences that have that symbol population. If I To obtain a general redundancy estimate for a given mantissa size m (or to find the value of m that keeps the redundancy below some specified value), we will first note that, for a given block size n and count k of 1's, the index size in bits has to be log(C(n,k)) for tight coding or log(B(n,k)) for quantized coding, independently of the sum (7) or (21) obtained. This is true because the index-calculation recurrences guarantee that the index for any path is smaller than the reserved indexing space C(n,k) or B(n,k). (For brevity we here assume a fixed-to-variable output-pacing scheme. Variable-to fixed or variable-to variable schemes would actually produce marginally better results.) Therefore, to assess the quantized-indexing scheme's redundancy in comparison with tight indexing for any particular (n,k) pair, it is necessary only to find how much larger than the tight-indexing space C(n,k) the quantized-indexing space B(n,k) can become We will therefore examine the error generation and propagation in the volume-table-computation formula given by (17) and (18), which includes the SW+SW=SW rounding rule. Since w is at least 2 To halve the maximum number of added bits c, for example, we need to increase mantissa width by one bit. Similarly, if we wish to double the block size n, we also need to increase mantissa width by one bit. Eq. 
(22) also gives the maximum number of extra bits for a given block size n and mantissa width m as c=n/2 It is likely that in most embodiments the volume values for will be so selected as to limit the redundancy to a single bit or less. Actually computing B(n, k) for all block sizes n up to 16384 with the mantissa length m set to ┌log(n)┐+1 (i.e., with the mantissa length specified by (22) for a maximum error c no more than a single bit) yields a maximum redundancy of 0.5 bit per block and an average redundancy (over all k) of 0.3 bit/block. Both figures remained roughly constant over the tested range of n. So embodiments can be designed to limit redundancy to a single bit and still violate the sufficient but not necessary constraint set forth in Equation (22). Now, the add-on values will not in all embodiments be so chosen as to restrict the redundancy to a single bit. Some, for example, may permit up to two bits of redundancy, and some may permit more. But few if any embodiments will be so designed as to permit c to exceed n/8. Most designs will likely restrict c to less than n/12 or n/16—indeed, to less than n/24 or n/32. Although the description so far has concentrated on embodiments that apply the present invention's teachings to a binary alphabet, their applicability is not so limited; as will be explained below, they can be applied to larger alphabets. Before we turn to such alphabets, though, we will consider Now, it is conceivable that some encoders that use the present invention's teachings will bring them into play only in certain circumstances. For example, they may use them only in cases where the number of symbol sequences that share the received sequence's symbol count is high. 
In view of machine-architecture considerations, for example, a “high” symbol count may be, say, 2 In any event, , As was explained above, a sequence's index in most embodiments is the same as its prefix's index if that sequence differs from that prefix by only the addition of a terminal zero: an add-on term is added only if the current bit is a one. Now, it was mentioned above that in some embodiments the entropy encoder's output may not simply be the result of that addition; in embodiments that compute the index from the large-add-on-value end first, for example, the encoder may add extra, carry-accumulator bits into the output so that the decoder can begin decoding before it receives all of the code's bits. So Particularly since in enumerative coding the add-on values do not need to depend on expected symbol statistics and therefore do not have to be recomputed as statistics change, the add-on values will usually have been pre-computed and stored before actual index computation. So As was explained above, the additions that block The predictor may, for example, base its prediction on higher-level information, such as what the corresponding bit's value was in a previous image frame's data. Or it may use lower-level information, such as what the immediately previous bit was. In many embodiments that use this expedient, the basis of the prediction may simply be knowledge of which bit predominates in the input block. (Although that requires accumulating a whole block before starting to encode, the resultant delay will be essentially “free” in many applications because other latencies will mask it.) That is, if 1's pre-dominate, the predictor output will simply be a 1, so the index-computation operation will see the complementary sequence, in which 0's predominate. 
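The simplest predictor mentioned above, one that outputs whichever bit predominates in the block, can be sketched as follows. The function names and the one-bit `flip` flag (which would have to reach the decoder as side information) are assumptions made for illustration.

```python
def complement_if_ones_dominate(bits):
    """Return (flip, transformed_bits) so that 1's never outnumber 0's."""
    ones = sum(bits)
    flip = 1 if 2 * ones > len(bits) else 0
    # XOR with the predicted bit: the index computation then sees a
    # sequence in which 1 is the less-frequent symbol.
    return flip, [b ^ flip for b in bits]


def restore(flip, bits):
    """Inverse transform applied after decoding."""
    return [b ^ flip for b in bits]


flip, mapped = complement_if_ones_dominate([1, 1, 0, 1])
print(flip, mapped)                      # 1 [0, 0, 1, 0]
assert restore(flip, mapped) == [1, 1, 0, 1]
```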
With the exception of the add-on values that it employs, the index-computation circuit Rather than access all of the big integer's words Methods for Reducing the Table Size If the mantissa size m is set to ┌log(n)┐+1 and the packed format is used for the entries B(n,k) in the add-on-value table, the size of each entry (which contains the mantissa w and the shift s) will be 2 ┌log(n)┐ bits. The full table up to a given n needs to hold T(n)=n In the situations where the memory is very limited or large block sizes are needed, the Pascal-triangle like recurrences (18) offer a flexible choice for trading speed off for reduction in lookup-table-memory size. One can cut memory size in half, for example, by omitting from storage the table entries for every second value of n and computing the unstored values on the fly in accordance with
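A minimal sketch of this memory-for-speed trade follows, assuming that an unstored row n is recomputed from the stored row n−1 by the same rounded recurrence that generated the full table, B(n,k)=B(n−1,k)⊕B(n−1,k−1). The names `build_even_rows` and `lookup_B` are hypothetical, and only the halving case (every second row omitted) is shown.

```python
def sw_round_up(value: int, m: int) -> int:
    """Smallest integer >= value whose significant bits fit in an m-bit window."""
    excess = value.bit_length() - m
    if excess <= 0:
        return value
    return ((value + (1 << excess) - 1) >> excess) << excess


def build_even_rows(n_max: int, m: int):
    """Compute every row of B(n, k) but keep only the rows with even n."""
    row = [1]
    stored = {0: row}
    for n in range(1, n_max + 1):
        row = [1] + [sw_round_up(row[k] + row[k - 1], m)
                     for k in range(1, n)] + [1]
        if n % 2 == 0:
            stored[n] = row
    return stored


def lookup_B(n: int, k: int, stored, m: int) -> int:
    """Fetch B(n, k) if its row is stored; otherwise recompute it on the fly."""
    if k < 0 or k > n:
        return 0
    if n % 2 == 0:
        return stored[n][k]
    if k == 0 or k == n:
        return 1
    prev = stored[n - 1]                 # the stored row just below n
    return sw_round_up(prev[k] + prev[k - 1], m)


rows = build_even_rows(16, 5)
print(sorted(rows))                      # only even n are kept
print(lookup_B(7, 3, rows, 5))           # recomputed from row 6
```

Because the on-the-fly value is produced by the same rounding rule that produced the stored rows, encoder and decoder obtain identical coefficients whether a row is fetched or recomputed.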
More generally, for a required size-reduction factor r, we can skip r−1 rows out of every r rows by using (18a) applied (r−1) times to its own terms. For the worst case this yields the following coefficient computation:
In most practical embodiments that use such skipping, the reduction factors r in (18c) will be relatively small in comparison with n so that the multiplications of the small add-on terms can be performed by adds and shifts or a multiplication table that has r Multi-Block Coding As is apparent from the decoding routine set forth in Most simply, the side information for each block can be sent in a separate, dedicated field for each block. Another approach is to reduce redundancy by employing a field that contains a code for combinations of population values and initial index bytes. Since the value of an index is, on the average, one half the maximum index for the same population, a field dedicated to the index will usually have some redundancy, but that redundancy will almost always be limited to the most-significant byte or two. And there will usually be some redundancy in a field that gives the side information separately. So the overall redundancy can be reduced by employing a code for the combination of the population-indicator value and the values of the index's most-significant byte or bytes. Another approach is particularly applicable to arrangements that use an entropy limit on the input sequence rather than a length limit and therefore provide an output whenever the input sequence reaches a symbol population on what would be a somewhat-hyperbolic front of symbol populations in Alternatively, or in addition, the sequence of side-information values for a sequence of blocks can be taken together and encoded. For example, successive counts of 1's in Fixed-to-Variable coding can be encoded by using the multi-alphabet encoder presently to be described. Since for a stationary source these counts satisfy the binomial distribution (which becomes approximately Gaussian for npq>9 or Poisson for n>100 and p<0.05), variable-length codes for these counts can be constructed directly. 
In any event, since the present invention makes it practical to employ enumerative coding on long blocks and the side information per symbol diminishes as O(log(n)/n), the problem of efficiently encoding the side information tends not to be important. Multi-Alphabet Sources There were two approaches proposed for generalizing the enumerative coding to non-binary alphabet A A recently introduced approach (L. Öktehm, The same type of problem eliminates a simpler scheme, one consisting of merely representing the q symbols from A Having identified the most-common pitfalls of multi-alphabet encoding, we take as our starting point the correct multinomial generalization of the binary source. We will consider a sequence of n symbols S Since we will need flexibility in the way we expand a multinomial into binomials selection of the expansion form we show below (by using the example of q=4 and denoting k It is clear from (32) that instead of inserting the redundant factors (k The combinatorial interpretation of the factorizations such as (32)-(33) (including the 4!=24 variants obtained by permuting the symbol labels 1,2,3,4) is that the multinomial enumeration is equivalent to the various chains of binomial enumerations. To apply this equivalence to the multi-alphabet-enumeration problem, we will interpret these binomial chains in terms of the multi-alphabet reduction as splitting the alphabet into two subsets, then splitting each of the two subsets containing more than two symbols into a further pair of subsets, until every final subset contains no more than two different symbols. This transforms a non-binary sequence into multiple binary sequences. We will now show an example of how to represent the described splits in the form of strings of binary digits and how to encode these strings using the binary encoder without increasing the entropy of the output above the entropy of the original multi-alphabet string. 
For the q=4 decompositions of (32-33) we will use alphabet A The first binomial factor in (32), C((k The reductions shown in More generally, any reduction of an alphabet A Step 1: Replace all symbols a Step 2: Split Plane 2 into two fragments, Plane 2.0 and Plane 2.1, so that each bit of Plane 2.0 has a zero bit above itself in Plane 1 and each bit of the Plane 2.1 has a one bit above itself in Plane 1. I.e., Plane 2.1 is the sequence of the second bits of codes t Step 3: Split Plane 3 into fragments by using the Plane 2 fragments as a template. I.e., form Plane 3.0 and Plane 3.1 so that each bit of Plane 3 is assigned to Plane 3.0 if the bit above is in Plane 2.0 and to Plane 3.1 if the bit above is in Plane 2.1. (There may be fewer bits in Plane 3 than in Plane 2). Split each of these Plane 3.f fragments (for f=0,1) into 2 fragments: Plane 3.f.0 and Plane 3.f1 according to the value of the bit above in Plane 2.f Thus Plane 3.f Step 4: Following down from Plane 3, split Plane (k+1) using the already fragmented Plane k.f Step 5: The process terminates when an empty bit plane is reached (after Plane z has been partitioned). At the termination, the total number of fragments from all planes will be n In summary, a binary sequence to be encoded is formed for each j-bit sequence of prefix-code bits such that the high-radix-alphabet sequence includes more than one symbol whose prefix code is longer than j bits and begins with that j-bit sequence. The sequence to be encoded consists of, for each digit in the high-radix sequence that equals such a symbol, the (j+1)st bit of the prefix code for that digit's symbol. (Here we consider every prefix code to begin with the degenerate, j=0 bit sequence: the prefix-code-bit sequence formed for j=0 contains a bit for every digit in the high-radix-alphabet sequence.) And a separate index is computed for every binary sequence thereby formed. 
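The summary above can be sketched in a few lines, assuming each symbol has been assigned a prefix code given as a string of '0' and '1' characters. The function name `split_into_fragments`, the dictionary keyed by the shared j-bit prefix, and the example code table are all hypothetical. For simplicity the sketch forms a fragment for every proper prefix that actually occurs, which coincides with the rule above when the code is compact and every symbol is present.

```python
def split_into_fragments(symbols, codes):
    """Map each j-bit prefix to the sequence of (j+1)st code bits.

    The fragment keyed by '' is Plane 1; keys of length 1 are the
    Plane-2 fragments, keys of length 2 the Plane-3 fragments, etc.
    """
    fragments = {}
    for sym in symbols:
        code = codes[sym]
        for j, bit in enumerate(code):
            # This digit contributes its (j+1)st bit to the fragment
            # shared by all symbols whose codes begin with code[:j].
            fragments.setdefault(code[:j], []).append(int(bit))
    return fragments


codes = {"a": "0", "b": "10", "c": "110", "d": "111"}   # illustrative code table
frags = split_into_fragments("abacadab", codes)
for prefix, bits in sorted(frags.items()):
    print(repr(prefix), bits)
# ''   [0, 1, 0, 1, 0, 1, 0, 1]
# '1'  [0, 1, 1, 0]
# '11' [0, 1]
```

For this four-symbol alphabet the split yields three fragments, and their total length (14 bits) equals the total length of the prefix codes of the eight input symbols. Each fragment would then be indexed separately by the binary encoder.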
Before we describe the fragment compression and decompression in the general case, we will discuss the selection of the prefix codes. In Since our construction from the prefix codes shows in step 1 that the total number of (uncompressed) bits produced is the same as the total length of the S When Huffman codes are to be constructed to produce the prefixes, the initial symbol order in the conventional Huffman-construction pair-merge steps is preferably so ordered as to result in favoring the production of 0's over 1's in the bit-plane fragments: the Huffman construction should systematically place larger-probability symbols to the left (or to the right, depending on convention). Some embodiments may use codes other than Huffman codes. Some may use the quicker Shannon-Fano codes. Others may use slice codes. The slice codes may, for example, be based on fixed-code-length tables or use ┌log(1/p)┐ Shannon code lengths based on a quick, low-precision-integer log-function implementation. Any other prefix code can be used, too, but it is best that it be compact, i.e., that the Kraft inequality become an equality. As a general rule, lower-entropy sources would save more working space for uncompressed bit arrays by using better codes. For high-entropy sources and alphabets where q=2 Encoding When binary-partition steps 1-5 have been completed, there are q−1 bit-plane fragments to compress. If coding optimality is to be achieved, each fragment will need to be compressed separately. Typically, Plane 1 (which is always a single fragment) is sent first, followed by the compressed fragments of Plane 2, and so on, until the q−1 fragments have been sent. We digress at this point to note that the hierarchical-set-partitioning methods used in wavelet image coding separate the wavelet coefficients into bit planes for entropy coding. This is similar to Step 1 above. But those methods encode each bit plane as a whole across the bit-plane fragments (and use an arithmetic coder to do it). 
This generates redundancy due to the binomial inequality C(x+y,a+b)≧C(x,a)C(y,b). Independently of whether they use enumerative encoding, therefore, image coding applications can benefit from the bit-plane-fragment-aligned encoding described above, preferably with Huffman or other compact codes used to guide the fragmentation. The encoder and the decoder have to agree, of course, on the codes used for the partitioning. Since sending the symbol counts k Decoding The decoder receives the compressed data as well as side information from which it can infer the counts k From the expanded Plane 1 and the prefix-code tables, the decoder can establish the size of Plane 2. If no prefix codes are of length 1, then Plane 2 also has n bits. If there is a code of length 1, Plane 2's size is n minus the count k If Plane 2 has two fragments, on the other hand, the sizes of the fragments are computed from already-known counts of 1's and 0's in Plane 1. From the prefix-code tables, we also know which codes belong to which fragment of Plane 2. (Those whose prefix codes begin with a 1 fall into Plane 2.1, and those that begin with a 0 fall into Plane 2.0.) So the code counts k Once they have been expanded, the Plane-2 fragment(s) thereby obtained are interleaved as the layout of 1's and 0's in Plane 1 dictates: for each source-sequence digit, the second bit of the prefix code that will specify that digit's symbol is concatenated with that first prefix code's first bit. Having fully expanded Plane 1 and Plane 2, the decoder can infer the layouts of all Plane-3 fragments. I.e., it can determine the number of fragments and the source-sequence digits to which their bits correspond. It can also compute the symbol counts and therefore the number of 1's for each Plane-3 fragment from the code tables and the known symbol counts for the Plane-2 fragments just as it determined those for the Plane-2 fragments from the Plane-1 fragments. 
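The interleaving of already-expanded fragments can be sketched as follows. This is a simplification of the plane-by-plane procedure described here: it assumes that all fragments have already been decompressed and are keyed by the prefix-code bits they extend, and it rebuilds each digit's prefix code one bit at a time. The function name `merge_fragments`, the code table, and the fragment contents are illustrative assumptions.

```python
def merge_fragments(fragments, codes, n):
    """Rebuild an n-symbol sequence from its expanded bit-plane fragments."""
    symbol_of = {code: sym for sym, code in codes.items()}
    cursors = {prefix: iter(bits) for prefix, bits in fragments.items()}
    out = []
    for _ in range(n):
        prefix = ""
        # Follow Plane 1, then the Plane-2 fragment selected by that bit,
        # and so on, until the accumulated bits form a complete prefix code.
        while prefix not in symbol_of:
            prefix += str(next(cursors[prefix]))
        out.append(symbol_of[prefix])
    return out


codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
fragments = {
    "": [0, 1, 0, 1, 0, 1, 0, 1],   # Plane 1
    "1": [0, 1, 1, 0],              # Plane 2.1
    "11": [0, 1],                   # Plane 3.1.1
}
print("".join(merge_fragments(fragments, codes, 8)))   # abacadab
```

Processing the digits in order consumes each fragment's bits in exactly the order in which the layout of the planes above it dictates, so the result matches what the plane-by-plane interleaving would produce.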
From the known sizes and counts of 1's, the decoder can decode the Plane-3 fragments and therefore add the third bits to the source sequence's prefix codes. This procedure continues until all digits' prefix codes are completed and their symbols thereby determined. General Symbol-Population-Based Sliding-Window Quantized Indexing Sliding-window integers and quantized-indexing enumeration can be applied to other encoding and indexing problems. To appreciate their applicability, it helps to distill the essential properties of SW integers into the following elements:
- 1. SW integer W=W(w,s,m) is a variable-size/extendible integer (large integer) n bits wide, with the number of significant bits limited to m. (The mantissa width m will be less than the block size n, and it will typically be on the order of log n.) The significant bits of W are contained in an m-bit mantissa w. The bits of W that follow the mantissa are a sequence of s zero bits (n=m+s). The symbolic form of W is shown in (12), and its expanded form is shown in (12a).
- 2. Arithmetic and relational operators (such as +,−,>,=,<) applied to SW integers depend on the destination operand. The general pattern of this dependency is as follows (with ⊗ denoting any of the operators and ⊕ denoting SW addition):
- a) SW/LargeInt⊗SW→LargeInt: The SW operand or operands behave here as large integers in the form (12a), with the key distinction that the operations now have complexity O(log(n)) instead of O(n) (which is characteristic of the corresponding operations on regular large integers). Since the large integers have extendable precision, there is no precision loss in these operations, as there can be in floating-point operations.
- b) SW/LargeInt⊕SW/LargeInt→SW: Any SW operands on the left side are expanded to LargeInt form (12a), and the operation is carried out by the regular large-integer rules (with, if m is so chosen that w always fits into a machine word, O(1) complexity). The resulting large integer L is rounded up to the nearest SW integer whose mantissa length does not exceed the assumed mantissa-length limit m. If we denote SW rounding of x as {x}_{SW}, the sum rule is symbolically: SW⊕SW={SW+SW}_{SW}→SW.
- c) SW/LI⊕SW/LI⊕ . . . ⊕SW/LI→SW: The rounding addition defined in (b) is not associative: the result of adding several addends in successive two-addend additions of that type is not in general independent of the order in which those additions occur. So, for three or more addends, we define addition as being performed with delayed SW rounding: the SW operands are expanded to large-integer format, the sum is computed exactly, and only the final result is rounded up:
SW⊕SW⊕SW⊕ . . . ⊕SW={SW+SW+SW+ . . . }_{SW}→SW
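The order dependence noted in (c), and the delayed-rounding convention that removes it, can be illustrated with a short sketch. The function names `sw_round_up`, `sw_add` and `sw_sum` are hypothetical, the operands are arbitrary small values chosen for illustration, and rounding is assumed to pick the smallest value whose mantissa fits in m bits.

```python
def sw_round_up(value: int, m: int) -> int:
    """Smallest integer >= value whose significant bits fit in an m-bit window."""
    excess = value.bit_length() - m
    if excess <= 0:
        return value
    return ((value + (1 << excess) - 1) >> excess) << excess


def sw_add(a: int, b: int, m: int) -> int:
    """Two-addend SW addition: exact sum, then round up."""
    return sw_round_up(a + b, m)


def sw_sum(addends, m: int) -> int:
    """Multi-addend SW addition with delayed rounding (one final round-up)."""
    return sw_round_up(sum(addends), m)


a, b, c, m = 0b10010, 0b101, 0b11, 4
left = sw_add(sw_add(a, b, m), c, m)     # rounds twice
right = sw_add(a, sw_add(b, c, m), m)    # no rounding needed
print(bin(left), bin(right))             # 0b11100 0b11010
print(bin(sw_sum([a, b, c], m)))         # 0b11010
```

With delayed rounding the result no longer depends on how the addends are grouped, since the only rounding step is applied to the exact total.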
To present a general form of Quantized Indexing, we will start with multi-alphabet enumeration and extend it by dropping the requirement that all symbols in a string S The lattice points (vectors, R-tuples) are labeled as M=(x A string S In words, Eq. (40) says that the point M Eq. (41) says that the symbol b Eq. (42) gives the total number of steps n to reach some point M The mixed-radix conditions 0≦b The unconstrained R-dimensional lattice paths correspond to the fixed radix integer representation (the radix is then R and all the ranges are same: R The general enumeration of the R-dimensional lattice paths is based on the same reasoning that led to Eq. (2): the path count for some n-step point M The Iverson's selector [M Eqs. (44-45) yield the multinomial coefficients of Eq. (32) for the path counts if the lattice paths are unconstrained. Otherwise, although (44) will not generally result in a closed-form expression for the N(M), it can still be used as a formula for computing n-step path counts from the known (n−1)-step path counts and problem-specific constraints [M:k]. The computation would start with the one-step points, where all one-step path counts are either 1, for the allowed first-step directions k, or 0 for the disallowed ones, as illustrated by Eq. (43)). It would then use (44) to compute from these values the path counts for all two-step points, then use the two-step path counts to compute the three-step path counts, etc. The reasoning that led to the (tight) index-reduction recurrence (4) applies in this more-general setting as well. 
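The step-by-step path-count computation described above can be sketched as follows, assuming a lattice point is a tuple of R non-negative coordinates and that the problem-specific selector [M:k] is supplied as a predicate `allowed(point, k)` returning whether a step in direction k into `point` is permitted. Both the function name `path_count` and the predicate interface are assumptions made for illustration.

```python
from functools import lru_cache


def path_count(point, allowed=lambda p, k: True):
    """Number of allowed lattice paths from the origin to `point`."""
    R = len(point)

    @lru_cache(maxsize=None)
    def N(p):
        if sum(p) == 0:
            return 1                     # the empty path ends at the origin
        total = 0
        for k in range(R):
            # Sum over the (n-1)-step neighbors from which p is reachable.
            if p[k] > 0 and allowed(p, k):
                total += N(p[:k] + (p[k] - 1,) + p[k + 1:])
        return total

    return N(tuple(point))


print(path_count((1, 1, 1)))             # 6, the multinomial 3!/(1!1!1!)
print(path_count((2, 2)))                # 6, the binomial C(4, 2)
# Constrained example: only points with x <= y may be entered.
print(path_count((2, 2), lambda p, k: p[0] <= p[1]))   # 2
```

In the unconstrained case the counts reduce to multinomial coefficients; with a constraint there is generally no closed form, but the same recurrence still produces the counts.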
Namely, if we have an indexing for the (n−1)-step paths, then we can construct an index for the n-step paths to some point M This construction follows the same method of avoiding the n-step index collisions that Eq. (4) uses: as it visits the alternatives (the one-step neighbors of M), it keeps track of the index space reserved for the alternatives traversed so far and uses this reserved space as the offset to separate the current neighbor's path index from those of the already-visited alternatives. Thus, for the paths arriving via the first neighbor M By using (46) itself, the residual index I To extend the quantized-indexing method to the more-general enumeration given in (47), we will define the volume of a lattice point M (denoting it as V(M)) as the size of the indexing “space,” i.e., the interval of consecutive integers reserved for enumeration of paths ending in M. The tight-indexing assumption used in deducing Eqs. (46-47) is then expressed as:
This assumption represents the selection of the absolutely minimum volumes consistent with the pigeonhole principle for all points M. General quantized indexing reduces the tight constraints on volumes imposed by (48) and constructs volumes that simultaneously satisfy the pigeonhole principle (but less tightly than (48)) and the requirements of computational efficiency. Because of the latter consideration, most embodiments will adopt a requirement that all volumes V(M) be represented by the sliding-window integers with the same boundary conditions as the path counts (44-45):
We digress briefly to note that volume-value storage could theoretically be reduced by employing variable mantissa widths, using smaller mantissas for shorter prefix lengths. But the mantissa length needed to keep redundancy to, say, less than a single bit varies only logarithmically with the number of steps to reach M. In most practical implementations of SW-based encoders, any table-space savings that result from varying mantissa length m would be outweighed in most cases by the efficiencies that result from using fixed table-entry sizes that align with addressing granularities. For blocks with n≦256, therefore, some implementations may use eight bits to store w (possibly without storing the implicit leading 1), while sixteen bits will be typical for 256<n≦2 Since the SW integers W operate as regular integers for W<2 To make clear the distinction between exact and rounded SW arithmetic (the latter being used only to compute volumes), we have been using the symbol “⊕” instead of “+” to denote the rounded additions defined for SW+SW→SW. Some of the properties of the ⊕ operations in SW integers are:
For example, if we use a four-bit mantissa and set (in binary) a=10010, b=101, c=11, the left side of (51c) evaluates as follows: the exact sum 10010+101=10111 rounds up to 11000, so 10010⊕101=11000; then the exact sum 11000+11=11011 rounds up to 11100, so 11000⊕11=11100. For the right-hand side of (51c) we have 101⊕11=1000 and 10010⊕1000=11010≠11100, both sums being exact. The two rounding-up steps on the left-hand side each added 1, while no rounding up occurred on the right-hand side, so the right side's result is smaller by 2 than the left side's result. The lack of associativity precludes immediate extension of the general path-count recurrence (44) to the analogous equation in terms of volumes, since the multi-term summation in (44) would become ambiguous if the exact path counts were replaced with volumes. So we need to adopt a convention for repeatably performing multiple-addend additions. To sort out the choices, we step back to the context of the problem. The objective was to compute volumes that can guarantee a decodable index, i.e., volumes large enough to comply with the pigeonhole principle. The path IDs are assigned by Eq. (47), which performs exact arithmetic. We notice that its inner sum (over index k for the (n−1)-step neighbors M Since quantized indexing uses exact arithmetic for the index computation (encoding), the reasoning that led to the index computation (47) holds as is for the volumes, with the exception that in order to separate the lower-order indices from different neighbors, we will offset their indices with their volumes rather than their path counts. The resulting general enumeration formula for quantized indexing is therefore obtained by a simple replacement of path counts N(M) in (47) with volumes V(M):
Unlike the volume recurrence (52), which performs delayed SW rounding on the result of the sum, the index computation in (53) performs exact arithmetic. As noted in the multi-alphabet discussion, general sums of type (52) will require table sizes exponential in alphabet size. That consideration led us to introduce a binary reduction procedure so that we could use more-compact SW binomial tables to perform encoding. But significant reductions in table sizes may occur in the presence of strong domain-specific path constraints [M:k], which could eliminate most of the terms in (52). To provide useful examples of applying quantized indexing in accordance with (52-53), we will look to the high-entropy limit, i.e., to cases in which symbols are either uniformly distributed or have rapidly varying frequencies averaging over longer runs to the same uniform distribution. In the binary case, the result of enumerative encoding is, as was explained above, at best no better than no encoding at all and at worst ½ log(k) above the entropy. But we will examine ways in which quantized indexing can afford advantages for some non-binary sources and can do so without excessively large volume tables.

Encoding Fixed-Radix Sequences Where R≠2^{s}

In the first example we will consider an unconstrained high-entropy source with some fixed radix (alphabet size) R. The n-symbol entropy is H We will start with tight enumeration, i.e., with the path counts given in (44) and indexes given in (47). Absent constraints, we can remove the Iverson's selector from (44). We will then simplify the sum in (44), which runs over different path counts N(M We refer to this largest path count as N_{n}=R^{n} (54) With the path counts approximated by (54), we can turn to index computation via (47). Since we are assuming no path constraints, we replace Iverson's selectors in (47) with 1.
Using the approximate path counts N For the binary-alphabet case, R=2, that encoding is simply the uncompressed sequence of the input bits (as expected from the introductory conclusions implied by (10)). The same is true for any alphabet with R=2 But the same is not true of non-radix-2 The problem with (55) for non-power-of-2 alphabets is that, if we want to use (55) to encode the input sequence b Slower than this, but still much faster than the optimal method of Eq. (55), is a method based on slice codes. This method is equivalent to a Huffman code obtained for a uniform symbol distribution, except that the Huffman tree need not be computed explicitly. The method involves encoding the U=2 To illustrate the numbers above, let the alphabet have the three symbols a, b, and c so that R=3. This means that r=2, i.e., 2 Now we will instead use the quantized-indexing method of Eqs. 52-53. For the unconstrained case, we can remove the Iverson's selectors from those equations. For the volume computations of Eq. 52, we will make the same simplification we made for the tight-indexing method. That is, we will replace all volumes of the R neighbors M In this construction, volumes V We will now estimate the SW parameters of V(w,s,m). Since the largest numbers represented are of the size R Calculations with this choice of m for n to 2 The high-entropy-limit index computation applies (53) with the constraint selector removed and precomputed volumes V As in binary entropy coding, we can trade table space off for extra adds. In this case we can compute volumes on the fly by using n log(R) low-precision adds and shifts, requiring no volume tables and avoiding all multiplies in (57) since the partial sums of b

Encoding Permutations

Permutations of n numbers 0, 1, 2, . . . , n−1 can be encoded in a factorial radix. Given permutation P In this scheme d To compute the entropy of D The problem with (58) is that the encoding requires O(n) high-precision multiplications and additions.
The required precision is H(D To apply quantized indexing to this problem, we first examine the tight enumerative coding of D But, if the substitutions that we applied to tight indexing to yield Eq. 59 are instead applied to the quantized-indexing recurrence Eq. 52, the result is:
The quantized-indexing recurrence thus has not led to a closed-form solution like Eq. 59, which resulted from the tight-indexing recurrence. Instead, it has yielded a low-complexity formula that requires only O(log(n))-bit multiplications instead of the O(n log(n))-bit multiplications that Eq. 59 requires. The volumes V The SW parameters for V The index computation works exactly as in the fixed-radix case, i.e., in accordance with Eq. 57, and the same arithmetic-complexity considerations apply, with the exception that the input-digit size is log(n) rather than log(R). The same computational shortcuts apply as well. To illustrate performance, let n=16. The entropy H(D The results are even better for larger n. With n=1000, for example, the entropy is log(1000!)=8529.4 bits. The uncompressed data occupy 1000·(10 bits/symbol)=10,000 bits, or 17% above entropy. Slice codes use 8973 bits, or 5.2% above the entropy. Quantized indexing with an eleven-bit mantissa encodes D

Encoding General Mixed-Radix Sequences

The general case of a mixed-radix sequence D The entropy of the D As in the previous examples, the full-precision enumerator given by Eqs. 44 and 47 reproduces the optimum representation set forth in Eq. 62 and the corresponding entropy given by Eq. 63, but its computation requirements tend to be onerous. So we use the quantized-indexing encoder set forth in Eqs. 52-53 in the high-entropy limit to compute volumes and index. Applying Eq. 43's mixed-radix constraints, i.e., [M In some applications the volumes may be pre-computed into tables of size O(n log(n)) bits and used in the index computation of Eq. 57, where n low-precision (O(log(n))- and O(log(R))-bit-operand) multiplies (or an equivalent using O(log(R)) shifts and adds) are performed.
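The table-based variant just described can be illustrated with a minimal Python sketch. It rests on two assumptions, since the corresponding equations are not reproduced here: that the volume recurrence reduces to V_0=1, V_t=roundup(R_t·V_{t−1}) with rounding up to an m-bit mantissa, and that the index of Eq. 57 has the form I=Σ d_t·V_{t−1}. All function names are hypothetical. Python integers are arbitrary-precision, so the sketch shows only the numeric behavior, not the low-precision arithmetic a real coder would exploit.

```python
def sw_round_up(x, m):
    """Round x up to the nearest value w * 2**s whose mantissa w fits in m bits."""
    s = max(x.bit_length() - m, 0)   # number of low-order bits to drop
    w = -(-x // (1 << s))            # ceiling division keeps the result >= x
    return w << s                    # if w reaches 2**m the value is a power of 2


def volumes(radices, m):
    """Assumed recurrence: V_0 = 1, V_t = sw_round_up(R_t * V_{t-1}, m)."""
    vols = [1]
    for r in radices:
        vols.append(sw_round_up(r * vols[-1], m))
    return vols


def encode(digits, radices, m):
    """Assumed index form: I = sum of d_t * V_{t-1}, computed exactly."""
    vols = volumes(radices, m)
    return sum(d * v for d, v in zip(digits, vols))


def decode(index, radices, m):
    """Recover the digits from the most significant one down."""
    vols = volumes(radices, m)
    digits = []
    for v in reversed(vols[:-1]):
        d, index = divmod(index, v)
        digits.append(d)
    return digits[::-1]
```

Decoding works because each volume is at least R_t times its predecessor, so the partial index contributed by the first t−1 digits is always smaller than V_{t−1}. For a factorial-radix input one would pass `radices = list(range(1, n + 1))`; for a fixed radix, `radices = [R] * n`.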
Alternatively, as noted in the fixed-radix discussion, a potentially slightly slower method, performing O(n log(R)) low-precision adds, can be used to compute the volumes (64) and the index (57) without requiring any tables or multiplies.

General Quantized Indexing

The foregoing examples are specific applications of a quantized-indexing approach that can be employed for enumeration generally. Generic enumerative tasks of interest here are to compute a unique numeric index for a given “arrangement” A and to reconstruct the arrangement A from a given numeric index. As used here, an arrangement is an ordered set of elements, i.e., the set elements can be labeled as the first, second, . . . nth elements, which we will call items. Although the existence of an order relation allows us to represent arrangements symbolically as abstract sequences of item symbols, the order relations may be implicit in many concrete enumerative tasks, such as encoding trees, graphs, networks, printed-circuit-board layouts, etc. The order relation may be implicit through, e.g., some formula or state machine, and the instances of the arrangements may be represented as some mix of indicators of computational rules along with their control values and data. So the enumerator does not necessarily receive the representation of an arrangement instance as a sequence of symbols representing the items. As enumerator inputs, arrangements are more general than sequences of item symbols. In particular, the enumerator may never need to compute symbolic values for the items in order to execute enumerative procedures, since these item values occur in these formulas only as abstract control parameters (e.g., for neighbor scan), and these controls may in practice be implemented more directly and more efficiently by some application-specific arrangement space-traversal and -generation procedures that use whatever form the instance data may have.
We will denote an arrangement of n items as A An enumerative space for a given enumerator is a set of arrangements that the enumerator can process. The corresponding indexing space is a range of numbers that the computed index may have. For tight (exact) enumerators the index range is compact: the possible indexes for a given enumerative space are 0, 1, 2, . . . , N−1, where N is the number of possible arrangements in that (possibly constituent) enumerative space. Now, the set that encompasses all arrangements that a given enumerator can process may be partitioned into separate constituent enumerative spaces, as the sequence spaces in the examples above were partitioned in accordance with symbol population; that is, there may be separate, independent indexing spaces for each constituent enumerative space. For tight enumerators this implies that each of these index spaces restarts its indexing range at 0. To characterize an enumerative space's structure, we will label the set of all arrangements A_{n} containing n items as F Enumerators usually partition into constituent enumerative spaces the composite enumerative space that encompasses all arrangements a given enumerator can process, and the encoded output will contain the partition indicator and the index within that partition. (Either of these two output components may be implicit; e.g., if the count of 1's is 0 or n in a binary coder, no index has to be sent.) Partitioning is advantageous whenever the source produces arrangements with non-uniform probabilities, since the probabilities within the resultant constituent spaces are usually more uniform. In fixed-to-variable coding, for example, this leads to non-uniform, and therefore compressible, partition indicators. Partitioning may be done for reasons other than just to reduce the size of the encoded output.
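The split of the output into a partition indicator and an index within the partition can be illustrated for the binary case with a tight enumerator. The sketch below takes the count of 1's as the partition indicator and accumulates the index with the exact binomial add-on value C(t−1, k_t) for each 1 at position t, where k_t is the number of 1's seen through position t. The function names are hypothetical, and the sketch uses exact binomials rather than quantized volumes.

```python
from math import comb


def encode_binary(bits):
    """Return (k, index): the count of 1's and the tight index within that class."""
    k = 0
    index = 0
    for t, bit in enumerate(bits, start=1):
        if bit:
            k += 1
            index += comb(t - 1, k)   # add-on value for a 1 at position t
    return k, index


def decode_binary(n, k, index):
    """Rebuild the n-bit sequence from its class k and its index within the class."""
    bits = [0] * n
    for t in range(n, 0, -1):
        # The bit at position t is 1 exactly when the add-on value fits.
        if k > 0 and index >= comb(t - 1, k):
            bits[t - 1] = 1
            index -= comb(t - 1, k)
            k -= 1
    return bits
```

For a class with k ones in n bits the indexes run from 0 to C(n, k)−1, so each constituent index space restarts at 0. When k is 0 or n the class holds a single arrangement, the index is always 0, and it need not be sent.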
Partitioning may, for example, speed up or simplify encoding and/or decoding, reduce coder-table sizes, expose some data features not apparent before encoding, or facilitate some type of processing of the encoded data. For instance, for sources with slowly varying or stationary non-uniform symbol probabilities, each optimal partition would be associated with a symbol count, which all arrangements in that class would have. In other applications, each partition may be associated with a respective symbol-value sum. More generally, each partition may be associated with a respective value of some application-defined arrangement classifier v(A). In terms of n-step fronts, partitioning separates the arrangements belonging to the n-step front F We will denote the number of arrangements belonging to a point M as G≡N(M); thus, in explicit set notation, we can express all the arrangements belonging to a point M as M≡{A(g): g=1,2, . . . G}≡{A(g)}. Our notation for an item addition or removal to or from an arrangement extends naturally to the addition or removal of an item to or from a point. Adding item <a> to a point M With this background, we present the general concept of quantized indexing in terms of the way in which it can be used to improve existing (or, for that matter, not-yet-designed) enumerators. Conceptually, the first step is a “bottom-up” description of the existing enumerator. Specifically, the relationship between the index I and the arrangement A The improved enumerator is based on replacing the N(M A quantized approximation for which Q(x)≧x for all x will be called an expanding quantization.
And, for a given application, we will say that the quantization Y is a streamlined quantization, or that Y is a streamlined format (SF) of X, and denote the mapping y=Q(x) as y=SF(x), if in that application additive-arithmetic operations (additions, subtractions, comparisons) and/or storage requirements of one or more numbers x from X become on average more efficient when one or more of these numbers x are replaced by their quantized approximations y=Q(x). For this purpose, the average is taken over all instances in which the additive arithmetic and/or storage of numbers from X occurs in that application, and it is weighted by the instance-usage frequency and any application-specific importance weights. Efficiency, too, is an application-specific criterion for this purpose. For storage, a more-compact representation is commonly considered more efficient, and arithmetic operations are commonly considered more efficient if they execute faster and/or use less working memory, registers, or power. Quantization is often employed in the context of evaluating some complex expression. When a quantized value V is to be derived from a complex sequence of computations E, one may use minimum quantization: perform the entire sequence E and quantize only the final result before assigning it to V, i.e., V=Q(E). An alternative is to quantize one or more of computation E's intermediate results and continue the computation on the quantized results, performing the last quantization on the final result. At the opposite extreme from minimum quantization V=Q(E) is maximum quantization: quantizing every intermediate result in the computation of E. We will call any non-maximum quantization of E a delayed quantization of E or, when quantization is performed through rounding operations, delayed rounding. Computational complexity and working-storage requirements will tend to differ among different quantization alternatives, as will numeric results.
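The difference in numeric results between minimum and maximum quantization can be seen in a small Python sketch. It takes Q to be rounding up to an m-bit mantissa, in the manner of the sliding-window rounding described earlier; that choice of Q and the function names are assumptions made for illustration. The addends are the four-bit-mantissa values a=10010, b=101, c=11 used in the non-associativity example.

```python
def sw_round_up(x, m):
    """Q(x): round x up to the nearest value w * 2**s whose mantissa w fits in m bits."""
    s = max(x.bit_length() - m, 0)   # number of low-order bits to drop
    w = -(-x // (1 << s))            # ceiling division, so Q(x) >= x (expanding)
    return w << s


def minimum_quantization(addends, m):
    """Add exactly, then quantize only the final result: V = Q(E)."""
    return sw_round_up(sum(addends), m)


def maximum_quantization(addends, m):
    """Quantize every intermediate sum."""
    total = 0
    for a in addends:
        total = sw_round_up(total + a, m)
    return total


addends = [0b10010, 0b101, 0b11]          # 18, 5, 3; exact sum is 26
delayed = minimum_quantization(addends, 4)
eager = maximum_quantization(addends, 4)
```

Here `delayed` is 26 (binary 11010) and `eager` is 28 (binary 11100). Both are at least the exact sum, as an expanding quantization requires, but the delayed result is tighter, at the cost of carrying the unrounded sum through the computation.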
When fewer intermediate quantizations are employed, the result tends to be more accurate (in expanding quantization, greater accuracy implies more-compact encoding) at the expense of greater arithmetic complexity and working-storage requirements, especially when the quantization is of the streamlined-format type. So choosing among the alternatives involves balancing those factors. We will use notation {E} Most commonly used quantization approaches are limited-precision quantizations: the number of significant digits in the numbers y used to approximate numbers x is capped, typically to some value less than the maximum number of significant digits in x. The sliding-window-integer arithmetic described above, for example, employs one type of limited-precision quantization. In limited-precision quantization the significant-digit limit is usually, but not always, imposed on y as expressed in binary, i.e., on the number of binary digits. Three common variants of limited-precision quantization are rounding up, in which the y value used to approximate x is the smallest y value greater than or equal to x: y=⌈x⌉; rounding down, in which the y value used to approximate x is the largest y value less than or equal to x: y=⌊x⌋; and rounding to the nearest, in which the y value used to approximate x is the y value nearest to x: y=⌊x⌉. Rounding up is a special case of expanding quantization. Another kind of limited-precision quantization is least-digit quantization, in which the least significant digit of numbers y in some number base r is restricted to a fixed value. The base r may vary with x and may depend on the maximum or minimum x in X. A common convention in this type of quantization is to set the least significant digit to 0. Having now defined what is meant by quantization, we are ready to describe how to arrive at the improved enumerator from the base enumerator described by Eqs. 65-68.
First, we select a streamlined expanding quantization function SF(X) for the numbers in set X, where X contains at least all add-on values required in the quantization operations set forth below and all the sum values in Eqs. 71 and 72 below. Then we substitute V(M) values for the base enumerator's N(M) values as follows:
As was exemplified by the radix and permutation coders described above, the dependence on the volume V(M By using the present invention's teachings, encoders can, for all practical purposes, compress to the entropy of the source. By using them, I have been able to produce a prototype coder whose performance was superior to what I believe is a state-of-the-art arithmetic coder. The compression improvement in comparison with the arithmetic coder varied from only a couple of percent when the input data were almost incompressible to nearly 50% for the most-compressible input data, i.e., for precisely the kind of data on which arithmetic coders have demonstrated the strongest performance advantage over Huffman coders. In execution speed, the prototype showed an even greater advantage over the arithmetic coder against which it was tested, running from 20% to 1800% faster, with the lowest gains again being for nearly incompressible data (which a production version would probably pass without encoding, tagging it as “uncompressed”). Additionally, since encoders that employ the present invention's teachings employ an approach that is predominantly combinatorial rather than probabilistic, they can compress at a near-entropy level without accumulating or maintaining symbol statistics. So they do not suffer, as Huffman and arithmetic coders do, from poor adaptability to quickly changing sources, or, more generally, from the large compression-ratio drop that results when the coder-assumed probability distribution fails to match the source distribution. Such situations often occur, for instance, when data are compressed in very small chunks such as those encountered in the incremental state updates used by video and audio codecs. The present invention therefore constitutes a significant advance in the art.