WO2002061948A2 - Lossless and near-lossless source coding for multiple access networks - Google Patents

Lossless and near-lossless source coding for multiple access networks

Info

Publication number
WO2002061948A2
Authority
WO
WIPO (PCT)
Prior art keywords
code
lossless
partition
optimal
coding
Prior art date
Application number
PCT/US2002/003146
Other languages
French (fr)
Other versions
WO2002061948A3 (en)
Inventor
Qian Zhao
Michelle Effros
Original Assignee
California Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by California Institute Of Technology filed Critical California Institute Of Technology
Priority to AU2002253893A priority Critical patent/AU2002253893A1/en
Publication of WO2002061948A2 publication Critical patent/WO2002061948A2/en
Publication of WO2002061948A3 publication Critical patent/WO2002061948A3/en

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • the present invention relates to the implementation of lossless and near-lossless source coding for multiple access networks.
  • Source coding, also known as data compression, treats the problem of efficiently representing information for data transmission or storage.
  • Data compression has a wide variety of applications.
  • compression is used to reduce the amount of data transferred between the sources and the destinations.
  • the reduction in data transmitted decreases the time needed for transmission and increases the overall amount of data that can be sent.
  • fax machines and modems all use compression algorithms so that we can transmit data many times faster than otherwise possible.
  • the Internet uses many compression schemes for fast transmission; the images and videos we download from some bulletin boards are usually in a compressed format:
  • data compression allows us to store more information on our limited storage space by efficiently representing the data. For example, digital cameras use image compression schemes to store more photos on their memory cards, DVDs use video and audio compression schemes to store movies on portable disks, we could also utilize text compression schemes to reduce the size of text files on computer hard disks.
  • data is represented by a stream of binary digits called bits (e.g., 0 and 1).
  • An encoder encodes the data into a stream with a smaller number of bits.
  • an image file to be sent across a computer network may originally be represented by 40,000 bits.
  • the encoded data is sent to the destination where a decoder decodes the data.
  • the 10,000 bits are received and decoded to give a reconstructed image.
  • the reconstructed image may be identical to or different from the original image.
  • For MP3 audio files, people use special audio compression schemes to compress the music and store it on compact discs or in the memory of MP3 players. For example, 700 minutes of MP3 music could be stored on a 650MB CD that normally stores 74 minutes of music without MP3 compression. To listen to the music, we use MP3 players or MP3 software to decode the compressed music files, and get the reconstructed music that usually has worse quality than the original music. When transmitting digital data from one part of a computer network to another, it is often useful to compress the data to make the transmission faster. In certain networks, known as multiple access networks, current compression schemes have limitations. The issues associated with such systems can be understood by a review of data transmission, compression schemes, and multiple access networks.
  • Lossless compression techniques involve no loss of information.
  • the original data can be recovered exactly from the losslessly compressed data.
  • text compression usually requires the reconstruction to be identical to the original text, since very small differences may result in very different meanings.
  • computer files, medical images, bank records, military data, etc. all need lossless compression.
  • Lossy compression techniques involve some loss of information. If data have been compressed using lossy compression, the original data cannot be recovered exactly from the compressed data. Lossy compression is used where some sacrifice in reconstruction fidelity is acceptable in light of the higher compression ratios of lossy codes. For example, in transmitting or storing video, exact recovery of the video data is not necessary. Depending on the required quality of the reconstructed video, various amounts of information loss are acceptable. Lossy compression is widely used in Internet browsing, video, image and speech transmission or storage, personal communications, etc. One way to measure the performance of a compression algorithm is to measure the rate
  • the rate is the average number of bits used per symbol, Σx P(x)·l(x), where l(x) is the length of the codeword assigned to symbol x and P(x) is the probability of x.
  • Another way is to measure the distortion, i.e., the average difference between the original data and the reconstruction.
  • a fixed- length code uses the same number of bits to represent each symbol in the alphabet.
  • ASCII code is a fixed-length code: it uses 7 bits to represent each letter.
  • the codeword for the letter a is 1100001 and that for the letter A is 1000001, etc.
  • a variable-length code does not require that all codewords have the same length, so we may use different numbers of bits to represent different symbols. For example, we may use shorter codewords for more frequent symbols and longer codewords for less frequent symbols; thus, on average, we use fewer bits per symbol.
  • Morse code is an example of a variable-length code for the English alphabet. It uses a single dot (·) to represent the most frequent letter E, and four symbols, dash, dash, dot, dash (− − · −), to represent the much less frequent letter Q. Non-singular, Uniquely Decodable, Instantaneous, and Prefix-free Codes
  • a non-singular code assigns a distinct codeword to each symbol in the alphabet.
  • a non- singular code provides us with an unambiguous description of each single symbol.
  • a non-singular code does not promise an unambiguous description.
  • the first code assigns identical codewords to both symbol '1 ' and symbol '2', and thus is a singular code.
  • the second code is a non-singular code, however, the binary description of the sequence '12' is '110', which is the same as the binary description of sequence '113' and that of symbol '4' . Thus we cannot uniquely decode those sequences of symbols.
  • a uniquely decodable code is one where no two sequences of symbols have the same binary description. That is to say, any encoded sequence in a uniquely decodable code has only one possible source sequence producing it. However, one may need to look at the entire encoded bit string before determining even the first symbol from the corresponding source sequence.
  • the third code in Table 1 is an example of a uniquely decodable code for the source alphabet. On receiving encoded bit ' 1', one cannot determine which of the three symbols '1', '2', '3' is transmitted until future bits are received.
  • An instantaneous code is one that can be decoded without referring to future codewords.
  • the third code is not instantaneous since the binary description of symbol ' 1' is the prefix of the binary description of symbols '2' and '3', and the description of symbol '2' is also the prefix of the description of symbol '3'.
  • a prefix code is always an instantaneous code; since the end of a codeword is always immediately recognizable, it can separate the codewords without looking at future encoded symbols.
  • An instantaneous code is also a prefix code, except for the case of multiple access source code where instantaneous code does not need to be prefix free (we will talk about this later).
  • the fourth code in Table 1 gives an example of an instantaneous code that has the prefix free property.
  • the set of instantaneous codes is a subset of the set of uniquely decodable codes, which is a subset of the set of non-singular codes.
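  • As an illustration of these code classes, the following sketch checks whether a candidate symbol-to-codeword assignment is non-singular and prefix-free (and hence instantaneous). The example codes are hypothetical placeholders, not the actual codes of Table 1, which is not reproduced here.

    # Sketch: classify hypothetical codes as non-singular and/or prefix-free.
    def is_non_singular(code):
        """A code is non-singular if all codewords are distinct."""
        return len(set(code.values())) == len(code)

    def is_prefix_free(code):
        """No codeword may be a prefix of another codeword."""
        words = sorted(code.values())
        return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

    singular = {"1": "0", "2": "0", "3": "10", "4": "11"}      # two identical codewords
    ambiguous = {"1": "1", "2": "10", "3": "100", "4": "000"}  # non-singular, not prefix-free
    prefix = {"1": "0", "2": "10", "3": "110", "4": "111"}     # prefix-free, instantaneous
    for name, c in [("singular", singular), ("ambiguous", ambiguous), ("prefix", prefix)]:
        print(name, is_non_singular(c), is_prefix_free(c))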
  • the codeword of a symbol can be obtained by traversing from the root of the tree to the node representing that symbol. Each branch on the path contributes a bit ('0' from each left branch and ' 1 ' from each right branch) to the codeword.
  • in a prefix code the codewords always reside at the leaves of the tree.
  • in a non-prefix code some codewords reside at internal nodes of the tree.
  • the decoding process is made easier with the help of the tree representation.
  • the decoder starts from the root of the tree. Upon receiving an encoded bit, the decoder chooses the left branch if the bit is '0' or the right branch if the bit is ' 1 '. This process continues until the decoder reaches a tree node representing a codeword. If the code is a prefix code, the decoder can then immediately determine the corresponding symbol.
  • each single symbol ('1', '2', '3', '4') is assigned a codeword.
  • This code is called a block code with block length n (or coding dimension n).
  • a Huffman code is the optimal (shortest average length) prefix code for a given distribution. It is widely used in many compression schemes. The Huffman procedure is based on the following two observations for optimal prefix codes. In an optimal prefix code:
  • symbols that occur more frequently are assigned codewords that are no longer than those of less frequent symbols; and the two longest codewords have the same length and differ only in the last bit; they correspond to the two least probable symbols.
  • the Huffman code design proceeds as follows. First, we sort the symbols in the alphabet according to their probabilities. Next we connect the two least probable symbols in the alphabet to a single node. This new node (representing a new symbol) and all the other symbols except for the two least probable symbols in the original alphabet form a reduced alphabet; the probability of the new symbol is the sum of the probabilities of its offspring (i.e., the two least probable symbols). Then we sort the nodes according to their probabilities in the reduced alphabet and apply the same rule to generate a parent node for the two least probable symbols in the reduced alphabet. This process continues until we reach a single node (i.e., the root). The codeword of a symbol can be obtained by traversing from the root of the tree to the leaf representing that symbol. Each branch on the path contributes a bit ('0' from each left branch and '1' from each right branch) to the codeword.
  • the fourth code in Table 1 is a Huffman code for the example alphabet. The procedure of how we build it is shown in Figure 2A.
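  • A minimal sketch of this construction is shown below; the probabilities are illustrative placeholders rather than those of Table 1, and ties between equal weights are broken arbitrarily.

    # Huffman code construction: repeatedly merge the two least probable symbols.
    import heapq
    from itertools import count

    def huffman_code(pmf):
        """Return a dict mapping each symbol to its binary codeword."""
        tie = count()                                    # breaks ties between equal weights
        heap = [(p, next(tie), {sym: ""}) for sym, p in pmf.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)              # least probable subtree
            p1, _, c1 = heapq.heappop(heap)              # second least probable subtree
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(tie), merged))
        return heap[0][2]

    pmf = {"1": 0.5, "2": 0.25, "3": 0.15, "4": 0.10}
    code = huffman_code(pmf)
    rate = sum(pmf[s] * len(w) for s, w in code.items())
    print(code, "average rate:", rate, "bits per symbol")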
  • the entropy of source X is defined as H(X) = −Σx p(x) log p(x).
  • the entropy is the lowest rate at which the source can be losslessly compressed.
  • the rate R of the Huffman code for source X is bounded below by the entropy H(X) of the source and bounded above by the entropy plus one bit, i.e., H(X) ≤ R < H(X) + 1.
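  • The entropy and the resulting bound are easy to evaluate numerically; the p.m.f. below is an illustrative placeholder.

    # Entropy H(X) = -sum_x p(x) log2 p(x); for the Huffman rate R of the same
    # source the bound H(X) <= R < H(X) + 1 holds.
    import math

    pmf = {"1": 0.5, "2": 0.25, "3": 0.15, "4": 0.10}
    H = -sum(p * math.log2(p) for p in pmf.values() if p > 0)
    print("H(X) =", round(H, 4), "bits per symbol")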
  • Arithmetic Code Arithmetic code is another, increasingly popular, entropy code that is used widely in many compression schemes. For example, it is used in the compression standard JPEG 2000.
  • Huffman code rate is 1.85 bits per symbol.
  • Table 2 gives an example of a Huffman code for the corresponding extended alphabet with block length two; the resulting rate is 1.8375 bits per symbol showing performance improvement.
  • Huffman coding is not a good choice for coding long blocks of symbols, since in order to assign codeword for a particular sequence with length n, it requires calculating the probabilities of all sequences with length n, and constructing the complete Huffman coding tree (equivalent of assigning codewords to all sequences with length n).
  • Arithmetic coding is a better scheme for block coding; it assigns codeword to a particular sequence with length n without having to generate codewords for all sequences with length n. Thus it is a low complexity , high dimensional coding scheme.
  • the subintervals for {X^k x_(k+1)} are ordered subintervals of A with lengths proportional to p(x_(k+1)).
  • Figure 2B shows how to determine the interval for sequence '132'. Once the interval [0.3352, 0.3465] is determined for '132', we can use binary code to describe the mid-point 0.34085 to sufficient accuracy as the binary representation for sequence '132'.
  • Rate R is then bounded as H(X^n)/n ≤ R ≤ H(X^n)/n + 2/n.
  • thus R is arbitrarily close to the source entropy when the coding dimension n is arbitrarily large.
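  • The interval-refinement step can be sketched as follows; the p.m.f. and the subinterval ordering are illustrative placeholders, so the resulting interval differs from the [0.3352, 0.3465] example above.

    # Arithmetic coding: successively subdivide [0, 1) in proportion to symbol
    # probabilities and describe the mid-point of the final interval.
    def arithmetic_interval(sequence, pmf, order):
        low, high = 0.0, 1.0
        for sym in sequence:
            width = high - low
            cum = 0.0
            for s in order:                       # fixed ordering of subintervals
                if s == sym:
                    low, high = low + cum * width, low + (cum + pmf[s]) * width
                    break
                cum += pmf[s]
        return low, high

    pmf = {"1": 0.5, "2": 0.3, "3": 0.2}
    low, high = arithmetic_interval("132", pmf, order=["1", "2", "3"])
    print("interval:", (low, high), "mid-point:", (low + high) / 2)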
  • a multiple access network is a system with several transmitters sending information to a single receiver.
  • a multiple access system is a sensor network, where a collection of separately located sensors sends correlated information to a central processing unit.
  • Multiple access source codes yield efficient data representation for multiple access systems when cooperation among the transmitters is not possible.
  • An MASC can also be used in data storage systems, for example, archive storage systems where information stored at different times is independently encoded but all information can be decoded together if this yields greater efficiency.
  • in FIG. 3A two correlated information sequences {Xi} and {Yi} are drawn i.i.d. according to a joint probability mass function p(x, y).
  • near-lossless MASCs The interest in near-lossless MASCs is inspired by the discontinuity in the achievable rate region associated with going from near-lossless to truly lossless coding. For example, if p(x,y)>0 for all (x,y) pairs in the product alphabet, then the optimal instantaneous lossless MASC achieves rates bounded below by H(X) and H(Y) in its descriptions of X and Y, giving a total rate bounded below by H(X)+H(Y). In contrast, the rate of a near-lossless MASC is bounded below by H(X, Y), which may be much smaller than H(X)+H(Y). This example demonstrates that the move from lossless coding to near-lossless coding can give very large rate benefits.
  • MASC on the contrary, takes advantage of the correlation among the sources; it uses independent encoding and joint decoding for the sources. (Joint encoding is prohibited because of the isolated locations of the source encoders or some other reasons.)
  • Witsenhausen, Al Jabri, and Yan treat the problem as a side information problem, where both encoder and decoder know X, and the goal is to describe Y using the smallest average rate possible while maintaining the unique decodability of Y given the known value of X.
  • neither Witsenhausen's nor Al Jabri's approach is optimal in this scenario, as shown by Yan.
  • Yan and Berger find a necessary and sufficient condition for the existence of a lossless instantaneous code with a given set of codeword lengths for Y when the alphabet size of X is two. Unfortunately their approach fails to yield a necessary and sufficient condition for the existence of a lossless instantaneous code when the alphabet size for X is greater than two.
  • Pradhan and Ramchandran tackle the lossless MASC code design problem when source Y is guaranteed to be at most a prescribed Hamming distance from source X. Methods for extending this approach to design good codes for more general p.m.f.s p(x,y) are unknown.
  • Embodiments of the invention present implementations for multiple access source coding (MASC).
  • the invention provides a solution for independently encoding individual sources and for decoding multiple source data points from the individually encoded streams in a single decoder.
  • the invention provides a way to separately encode samples from data source x and data source y - using no collaboration between the encoders and requiring no knowledge of y by the encoder of x or vice versa - and a way to decode data pairs (x, y) using the individual encoded data streams for both x and y.
  • the algorithmic description includes methods for encoding, decoding, and code design for an arbitrary p.m.f. p(x,y) in each of the above four scenarios.
  • One embodiment of the present invention provides a solution that partitions the source code into optimal partitions and then finds a matched code that is optimal for the given partition, in accordance with the aforementioned definition of the class of algorithms.
  • the source alphabet is examined to find combinable symbols and to create subsets of combinable symbols. These subsets are then partitioned into optimal groups and joined in a list. The successful groups from the list are then used to create complete and non-overlapping partitions of the alphabet. For each complete and non-overlapping partition, an optimal matched code is generated. The partition whose matched code provides the best rate is selected.
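  • A highly simplified sketch of this flow appears below. For brevity it restricts the search to partitions made of 1-level groups only (the full method also builds multi-level groups arranged in trees), it assumes that two symbols are combinable when no x has positive probability with both of them, and it scores each candidate partition by the Huffman rate of its group probabilities. The joint p.m.f. is an illustrative placeholder.

    # Brute-force sketch: enumerate partitions of Y into mutually combinable
    # 1-level groups and keep the partition whose group Huffman rate is lowest.
    import heapq
    from itertools import count

    def combinable(y1, y2, pxy):
        """Assumed test: y1 and y2 never occur with the same x."""
        return all(pxy.get((x, y1), 0) == 0 or pxy.get((x, y2), 0) == 0
                   for x, _ in pxy)

    def set_partitions(items):
        """Enumerate all partitions of a list of symbols."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for partial in set_partitions(rest):
            for i in range(len(partial)):
                yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
            yield [[first]] + partial

    def huffman_rate(weights):
        """Average codeword length of a Huffman code for the given weights."""
        if len(weights) < 2:
            return 0.0
        tie = count()
        heap = [(w, next(tie)) for w in weights]
        heapq.heapify(heap)
        rate = 0.0
        while len(heap) > 1:
            w0, _ = heapq.heappop(heap)
            w1, _ = heapq.heappop(heap)
            rate += w0 + w1                       # each merge adds one bit to its members
            heapq.heappush(heap, (w0 + w1, next(tie)))
        return rate

    pxy = {("x0", "a"): 0.2, ("x0", "c"): 0.2, ("x1", "b"): 0.3,
           ("x1", "d"): 0.1, ("x2", "a"): 0.1, ("x2", "d"): 0.1}
    ys = sorted({y for _, y in pxy})
    py = {y: sum(p for (x, yy), p in pxy.items() if yy == y) for y in ys}
    best = None
    for part in set_partitions(ys):
        if all(combinable(y1, y2, pxy)
               for g in part for y1 in g for y2 in g if y1 < y2):
            r = huffman_rate([sum(py[y] for y in g) for g in part])
            if best is None or r < best[0]:
                best = (r, part)
    print("best 1-level partition:", best[1], "rate:", round(best[0], 4))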
  • the matched code can be a Huffman code, an arithmetic code or any other existing form of lossless code.
  • Embodiments of the present invention can be used to provide lossless and near- lossless compression for a general compression solution for environments where multiple encoders encode information to be decoded by a single decoder or for environments where one or more encoders encode information to be decoded by a single decoder to which side information is available.
  • Figure 1 shows the binary trees for the second to the fourth code in Table 1.
  • Figure 2A illustrates an example Huffman code building process.
  • Figure 2B illustrates an example sequence determination process for Arithmetic coding.
  • Figure 3A shows an example MASC configuration.
  • Figure 3B shows the achievable rate region of multiple access source coding according to the work of Slepian-Wolf.
  • Figure 4 is a flow diagram of an embodiment of the present invention.
  • Figure 5 is a flow diagram of an embodiment of finding combinable symbols of the present invention.
  • Figure 6 is a flow diagram of an embodiment for building a list of groups.
  • Figure 7 is a flow diagram for constructing optimal partitions.
  • Figure 8 is a flow diagram of an embodiment for constructing a partition tree and labeling each node within the tree.
  • Figure 9 is a block diagram of a side-information joint decoder embodiment of the invention.
  • Figures 10A - 10D illustrate node labeling and coding using the present invention.
  • Figure 11 is a flow diagram illustrating Huffman code word generation using the present invention.
  • Figures 12A - 12C illustrate arithmetic coding using the present invention.
  • Figure 13 illustrates a flow chart for a general coding scheme for an alternate algorithm embodiment.
  • Figure 14 shows a comparison of three partition trees generated from the various embodiments of the present invention.
  • Figure 15 is a graph of general lossless and near-lossless MASC results.
  • Figure 16 is diagram showing how two groups are combined according to one embodiment of the invention.
  • Figure 17 is a flow diagram for generating matched code according to an embodiment of the present invention.
  • Figure 18 is a flow diagram for building matched codes that approximate the optimal length function according to another embodiment of the present invention.
  • Embodiments of the present invention relate to the implementation of lossless and near- lossless source coding for multiple access networks.
  • numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that embodiments of the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
  • the invention provides a general data compression scheme for encoding and decoding of data from multiple sources that have been encoded independently.
  • the invention can also be implemented in a side-information environment where one of the data sources is known to the decoder.
  • the invention is a general solution for multiple data sources, the invention is described by an example of a two data source network.
  • the present invention is described herein by way of example with two data sources X and Y that provide data streams x1, x2, x3, ..., xn and y1, y2, ..., yn, respectively, to dedicated encoders.
  • the streams are provided to a single decoder that can produce decoded data pairs (xn, yn).
  • a lossless instantaneous MASC for joint source (X,Y) consists of two encoders γX : X → {0, 1}* and γY : Y → {0, 1}* and a decoder γ⁻¹ : {0, 1}* × {0, 1}* → X × Y.
  • dedicated encoder γX encodes data source X, which has alphabet X, into strings of 0's and 1's
  • a second dedicated encoder γY does the same for data source Y, which has alphabet Y.
  • a single decoder γ⁻¹ recovers X and Y from the encoded data streams γX(x) and γY(y).
  • the error probability is Pe = Pr(γ⁻¹(γX(X), γY(Y)) ≠ (X, Y)).
  • Pe is the probability of a discrepancy between the decoded pair and the original pair (X, Y).
  • the present invention provides coding schemes for the extension of Huffman coding to MASCs for optimal lossless coding and for near-lossless coding, and for the extension of arithmetic coding to MASCs for low complexity, high dimension lossless coding and for near-lossless coding.
  • the embodiments of the invention are described with respect to two environments, one, lossless side-information coding, where one of the data sources is known to the decoder, and another environment, the general case, where neither of the sources must be independently decodable.
  • Figure 4 is a flow diagram that describes one embodiment of the invention.
  • the alphabet of symbols generated by the sources is obtained. These symbols are organized into combinable subsets of symbols at step 402. These subsets are such that there is no ambiguity between subsets as will be explained below.
  • the subsets are formed into optimal groups. These optimal groups are listed at step 404. The groups are used to find and define optimal partitions at step 405 that are complete and non-overlapping trees of symbols.
  • the successful partitions are used to generate matched codes at step 406, using either arithmetic or Huffman codes. One skilled in the art will recognize that lossless codes other than Huffman and arithmetic can be utilized as well.
  • the partition whose matched code has the best rate is selected and used for the MASC solution.
  • One embodiment of the present invention presents an implementation for lossless side- information source coding.
  • This problem is a special case of the general lossless MASC problem.
  • In a general MASC, the decoder has to decode both sources (i.e., X and Y) without knowing either one.
  • in the side-information application, one of the data sources is known to the decoder. The goal is to find an optimal way to encode one of the data sources given that the other source is known.
  • Figure 9 shows an example side-information multiple access network.
  • Side-information X is perfectly known to the decoder 902 (or losslessly described using an independent code on X), and the aim is to describe Y efficiently using an encoder 901 that does not know X.
  • This scenario describes MASCs where γX encodes X using a traditional code for p.m.f. {p(x) : x ∈ X}
  • the code γY is a lossless instantaneous code for Y given X, or a lossless instantaneous side-information code.
  • the side-information as shown in the figure comes from an external source to decoder 902.
  • This external source can come from a wide variety of places. For example it is possible that the decoder already has embedded side information within it. Another example is that the external source is a data stream from another encoder similar to encoder 901.
  • FIG. 5 is a flow diagram that describes the operation of finding combinable symbols and creating subsets of step 402. This example is directed to finding the combinable symbols of Y data.
  • a symbol y is obtained and at step 502 we find the set Cy = {z ∈ Y : z can be combined with y under p(x, y)}. Symbols in set Cy can be combined with y.
  • at step 504 we find the nonempty subsets of each set Cy.
  • the nonempty subsets of set Cy for symbol a0 are {a1}, {a4}, {a7}, {a1, a4}, {a1, a7}, {a4, a7}, and {a1, a4, a7}.
  • at step 505 it is determined whether each set Cy has been checked.
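  • The sketch below walks through this flow for a small example, again under the assumption (a placeholder, as in the earlier sketch) that z can be combined with y when no x has positive probability with both; the joint p.m.f. is illustrative.

    # Build C_y for each y and list its nonempty subsets (steps 501-505).
    from itertools import combinations

    def build_Cy(pxy, xs, ys):
        support = {y: {x for x in xs if pxy.get((x, y), 0) > 0} for y in ys}
        return {y: {z for z in ys if z != y and not (support[y] & support[z])}
                for y in ys}

    def nonempty_subsets(s):
        s = sorted(s)
        return [set(c) for r in range(1, len(s) + 1) for c in combinations(s, r)]

    pxy = {("x0", "a"): 0.2, ("x0", "c"): 0.2, ("x1", "b"): 0.3,
           ("x1", "d"): 0.1, ("x2", "a"): 0.1, ("x2", "d"): 0.1}
    xs = sorted({x for x, _ in pxy})
    ys = sorted({y for _, y in pxy})
    for y, Cy in build_Cy(pxy, xs, ys).items():
        print(y, "C_y =", Cy, "nonempty subsets:", nonempty_subsets(Cy))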
  • FIG. 6 is a flow diagram of the group generation 403 and list making steps 404 of Figure 4. At step 601 the nonempty subsets for a set Cy generated by step 402 of Figure 4 are obtained.
  • the optimal partition is found for each nonempty subset.
  • a root is added to the optimal partition to create an optimal group.
  • for any p(x,y), (y) is a special case of a 1-level group.
  • the tree representation T(G) for 1-level group G is a single node representing all members of G.
  • an M-level group is also called a multi-level group.
  • the tree representation for any 1 -level group is a single node.
  • the members of C(R) are {a0, a2, a6, a7}; the members of G2 are {a0, a2, a4, a6, a7}.
  • G2 is a 2-level group since symbol a4 can be combined with each of a0, a2, a6, a7.
  • the tree representation T(G2) is a 2-level tree.
  • the tree root has three children, each of which is a single node.
  • the root T(a1) of the three-level group has three children: the first two children are nodes T(a0) and T(a6); the third child is a 2-level tree with root node T(a2) and children T(a4) and T(a5).
  • the tree representation T(G3) is a 3-level tree.
  • the partition design procedure for groups is recursive, solving for optimal partitions on sub-alphabets in the solution of the optimal partition on Y.
  • for each Y' ⊆ Y, the procedure begins by making a list L(Y') of all (single- or multi-level) groups that can appear in an optimal partition of Y' for p(x, y).
  • at step 700 we initialize i equal to 1.
  • at step 702 we add the i-th group from L(Y') to the partition under construction.
  • at step 703 we check to see whether the j-th group overlaps a group already in the partition.
  • at step 706 we check to see if the partition is complete. If not, increment j at step 704 and return to step 703. If the partition is complete, then see if i indexes the last group in L(Y') at step 707. If so, make a list of successful partitions at step 708. If not, then increment i.
  • The tree representation of a partition is called a partition tree.
  • construct a tree representation for each Gi; then link the roots of all T(Gi), i ∈ {1, ..., m}, to a single node, which is defined as the root r of the partition tree. A partition tree is not necessarily a regular k-ary tree; the number of children at each node depends on the specific multi-level group.
  • the algorithm systematically builds a partition, adding one group at a time from L(Y') to set P(Y') until P(Y') is a complete partition.
  • Figure 10A gives an example of a partition tree from the example of Table 1.
  • the partition P(Y) contains two groups: one is a 1-level group (a3, a6) and the other is a 3-level group consisting of root node T(a7) with the remaining symbols arranged in groups below it.
  • the branches of a partition are labeled.
  • n 1 ⁇ ) when it is clear from the context that we are talking about the node rather than the 1-level group at that node (e.g. n e ) rather than 2[n) e Kf(y)).
  • n's children are labeled as n1, n2, ..., nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n.
  • partition tree for Figure 10A appears in Figure 10B.
  • the node probability q(n) of a 1-level group n with n ∈ T(P(Y)) is the sum of the probabilities of that group's members.
  • the subtree probability Q(n) of the 1-level group at node n ∈ T(P(Y)) is the sum of the probabilities of n's members and descendants.
  • the root node is labeled "r” and the first level below, comprising a pair of children nodes, is numbered “1" and "2" from left to right as per the convention described above.
  • the concatenation convention and left to right convention results in the three children nodes being labeled "21", “22", and “23” respectively. Accordingly, the children at root "23" are labeled "231" and "232".
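  • The labelling convention and the node and subtree probabilities can be computed mechanically. In the sketch below a node is a pair (symbols in its 1-level group, list of children); the tree and the symbol probabilities are illustrative placeholders, not the partition of Figure 10A.

    # Label each node by concatenated child indices and compute q(n) and Q(n).
    def label_and_weigh(node, label, p, out):
        symbols, children = node
        q = sum(p[s] for s in symbols)                      # node probability q(n)
        Q = q + sum(label_and_weigh(child, label + str(k + 1), p, out)
                    for k, child in enumerate(children))    # subtree probability Q(n)
        out[label if label else "r"] = (q, Q)
        return Q

    p = {"a0": 0.2, "a1": 0.1, "a2": 0.2, "a3": 0.15,
         "a4": 0.15, "a5": 0.1, "a6": 0.1}
    tree = ([], [(["a3", "a6"], []),
                 (["a0"], [(["a1"], []),
                           (["a2"], [(["a4"], []), (["a5"], [])])])])
    labels = {}
    label_and_weigh(tree, "", p, labels)
    for lab, (q, Q) in sorted(labels.items()):
        print(lab, "q =", round(q, 3), "Q =", round(Q, 3))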
  • the present invention determines the optimal partitions by generating matched code for each partition.
  • the partition whose matched code has the best rate (of compression) is the partition to use for the MASC solution.
  • a partition tree is constructed for each partition. (Note that this step is described above).
  • the order of descendants is fixed and numbered from left to right.
  • the node at each level is labeled with a concatenation vector.
  • n's children are labeled as n1, n2, ..., nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n.
  • the labeled partition tree for Figure 10A appears in Figure 10B.
  • a matched code is generated for the partition. This matched code can be generated, for example, by Huffman coding or Arithmetic coding.
  • a matched code for a partition is defined as follows.
  • a matched code for P(Y) is a binary code such that for any node n ∈ T(P(Y)), symbols y1, y2 ∈ n receive the same codeword, the codeword of n is a prefix of the codeword of any symbol y3 descending from n, and codewords of nodes not on a common root-to-node path satisfy the prefix condition.
  • FIG. 17 shows how a matched code is generated according to one embodiment of the invention.
  • the process begins at the root of the tree.
  • the prefix code for each node's offspring is designed.
  • the ancestors' codewords are concatenated to form the resulting matched code.
  • a partition specifies the prefix and equivalence relationships in the binary descriptions of y € y.
  • a matched code is any code with those properties.
  • the above definitions enforce the condition that for any matched code, y1, y2 ∈ Ax for some x ∈ X implies that the descriptions of y1 and y2 satisfy the prefix condition.
  • Theorem 1 establishes the equivalence of matched codes and lossless side-information codes.
  • Theorem 1 Code γY is a lossless instantaneous side-information code for p(x,y) if and only if γY is a matched code for some partition P(Y) for p(x,y).
  • a matched code for partition V(y) is a lossless instantaneous side-information code for Y. This proof follows from the definition of a matched code. In a matched code for partition V(y) , only symbols that can be combined can be assigned codewords that violate the prefix condition, thus only symbols that can be combined are indistinguishable using the matched code description. Since symbols 21 and 2/2 can be combined only if
  • partition y ⁇ describes a matched code for VQ?).
  • path jj downward from the root of the tree (here '0' and T correspond to left and
  • each non-root node in T represents a 1-level group.
  • The binary description of any internal node n ∈ T is the prefix of the descriptions of its descendants. Thus for γY to be prefix free on Ax for each x ∈ X, it must be possible to combine n with any of its descendants to ensure lossless decoding. Thus n and its descendants form a multi-level group, whose root R is the 1-level group represented by n. In this case, C(R) is the set of (possibly multi-level) groups descending from n in T.
  • Theorem 2 provides a method of calculating the optimal length function.
  • There are three strategies for building matched codes that approximate the optimal length function of Theorem 2.
  • Figure 18 shows the process of building matched codes. At step 1801 the process begins at root. Then at 1802 one of three strategies is used (Shannon / Huffman / Arithmetic code) for code design for each node's immediate offsprings based on their normalized subtree probabilities. At 1803 the ancestors' codewords for each node are concatenated.
  • the step from n to nk is described using a Shannon code with alphabet {1, ..., K(n)} and p.m.f. {Q(nj) / Σi Q(ni) : j = 1, ..., K(n)};
  • at step 1101 we begin at the root node and design a Huffman code on the set of nodes descending from the root, according to their subtree probabilities, i.e., nodes {(a3, a6), (a7)} with the corresponding p.m.f.
  • the codeword of node n is the concatenation of the codewords of all nodes traversed in moving from the root to node n in T.
  • the codewords for this example are shown in Figure 10C.
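  • A sketch of the per-node Huffman construction and the concatenation of ancestor codewords is given below; the partition tree and probabilities are the same illustrative placeholders used in the earlier sketch, not the partition of Figure 10A.

    # Matched Huffman code: design a Huffman code on each node's children using
    # their subtree probabilities, then concatenate codewords along the path.
    import heapq
    from itertools import count

    def huffman(weights):
        """weights: dict key -> weight; returns dict key -> codeword."""
        if len(weights) == 1:
            return {k: "" for k in weights}
        tie = count()
        heap = [(w, next(tie), {k: ""}) for k, w in weights.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w0, _, c0 = heapq.heappop(heap)
            w1, _, c1 = heapq.heappop(heap)
            merged = {k: "0" + v for k, v in c0.items()}
            merged.update({k: "1" + v for k, v in c1.items()})
            heapq.heappush(heap, (w0 + w1, next(tie), merged))
        return heap[0][2]

    def subtree_prob(node, p):
        symbols, children = node
        return sum(p[s] for s in symbols) + sum(subtree_prob(c, p) for c in children)

    def matched_codewords(node, p, prefix="", out=None):
        out = {} if out is None else out
        symbols, children = node
        if symbols:                                   # every non-root 1-level group
            out[tuple(symbols)] = prefix
        if children:
            branch = huffman({k: subtree_prob(c, p) for k, c in enumerate(children)})
            for k, child in enumerate(children):
                matched_codewords(child, p, prefix + branch[k], out)
        return out

    p = {"a0": 0.2, "a1": 0.1, "a2": 0.2, "a3": 0.15,
         "a4": 0.15, "a5": 0.1, "a6": 0.1}
    tree = ([], [(["a3", "a6"], []),
                 (["a0"], [(["a1"], []),
                           (["a2"], [(["a4"], []), (["a5"], [])])])])
    for group, word in matched_codewords(tree, p).items():
        print(group, "->", word)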
  • data sequence Yn is represented by an interval of the [0, 1) line.
  • the encoder describes Yn by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals.
  • the encoder finds the interval for yn recursively, by first breaking [0, 1) into intervals corresponding to all possible values of y1, then breaking the interval for the observed Y1 into subintervals corresponding to all possible values of Y1 y2, and so on.
  • (here P(n) is defined to equal 1 for the unique node r at depth 0). Refining the interval for sequence Y^(i-1) to find the subinterval for Y^i involves finding the 1-level group n ∈ P(Y) such that yi is a member of n.
  • Figure 12B shows these intervals.
  • the intervals of some symbols overlap in the matched arithmetic code.
  • the intervals associated with symbols a4 and a5 subdivide the interval associated with symbol a2 in the previous example.
  • the decoder can uniquely distinguish between symbols with overlapping intervals to correctly decode Y n using its side information about X n .
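  • The same disambiguation idea is easiest to see with codeword-based matched codes: because the codewords of the symbols that can occur with a given x are prefix-free, the decoder needs to scan only those candidates against the head of the bit stream. The codewords and p.m.f. below are hypothetical placeholders chosen to be consistent with that property.

    # Decode one Y symbol per known side-information symbol x from a matched,
    # non-prefix-free codeword stream.
    def decode_with_side_info(bits, xs, code, pxy):
        decoded, pos = [], 0
        for x in xs:
            candidates = [y for (xx, y), p in pxy.items() if xx == x and p > 0]
            for y in candidates:
                if bits.startswith(code[y], pos):   # at most one candidate matches
                    decoded.append(y)
                    pos += len(code[y])
                    break
        return decoded

    # 'a' and 'b' never occur with the same x, so their codewords may violate
    # the prefix condition; likewise 'c' and 'd'.
    code = {"a": "0", "b": "01", "c": "1", "d": "11"}
    pxy = {("x0", "a"): 0.2, ("x0", "c"): 0.2, ("x1", "b"): 0.3,
           ("x1", "d"): 0.1, ("x2", "a"): 0.1, ("x2", "d"): 0.1}
    stream = code["a"] + code["b"] + code["d"]      # encodes a, b, d
    print(decode_with_side_info(stream, ["x0", "x1", "x2"], code, pxy))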
  • H(p1) > H(p2) does not imply R(H)(p1) > R(H)(p2).
  • P1(Y) is a better partition for Huffman coding while P2(Y) is better for arithmetic coding.
  • G0 denotes the 1-level group at some node n0 in T(Gj).
  • Group 0" modifies Gj by replacing 0 o with 1-level group (I,G Q ) and adding the descendants
  • Figure 16 shows the subtree probabilities associated with combining Gi with Gj at G0. Let the resulting new group be Gk.
  • ⁇ lj represents the portion of the average rate unchanged by the combination of Gi
  • Theorem 4 does not hold for matched Huffman coding.
  • Theorem 5 shows a weaker result that does apply in Huffman coding.
  • T(P(y)) respectively.
  • the decoder can determine that it has reached the end of a single stage description if and only if the matched code is itself instantaneous.) If either of the nodes reached is empty, then the decoder knows that it must read more of the description; thus we assume, without loss of generality, that n, ⁇ and n y axe not empty.
  • let Tx and Ty be the subtrees descending from nx and ny (including nx and ny respectively).
  • at each step before the decoding halts, one (or more) of the conditions (A), (B), and (C) must be satisfied.
  • Lemma 6 Partition pair (P(X), P(Y)) for p(x, y) yields a lossless instantaneous MASC if and only if for any x, x' ∈ X such that {γX(x), γX(x')} does not satisfy the prefix condition, {γY(y) : y ∈ Ax ∪ Ax'} satisfies the prefix condition.
  • condition (A) is satisfied, but condition (D) is violated.
  • MASC prefix condition is likewise violated.
  • γX(x) is the prefix of γX(x') and γY(y) is the prefix of γY(y').
  • Optimality of a matched code for partition V(y) is independent of whether P y) is used in a side-information code or an MASC.
  • our optimal matched code design methods from lossless side-information coding apply here as well, giving optimal matched Shannon, Huffman, and arithmetic codes for any partition pair (P(X) , V(y) ) for p(x, y) that satisfies the MASC prefix condition.
  • V(y)) is optimal for use in a matched Huffinan MASC on p(x , y) if (El ⁇ XX), E- (£ ° ,(Y))
  • Lemma 7 we again restrict our attention to partitions with no empty nodes except for the root. The proof of this result does not follow immediately from that of the corresponding result for side-information codes.
  • Lemma 6 whether or not two symbols can be combined for one alphabet is a function of the partition on the other alphabet. Thus we must here show not only that removing empty nodes does not increase the expected rate associated with the optimal code for a given partition but also that it does not further restrict the family of partitions allowed on the other alphabet.
  • X' is named £-p (X i ) .
  • MASC ((lx, 1Y) , 1 ⁇ ) a near-lossless instantaneous MASC for P e ⁇ e if
  • Theorem 6 gives the near-lossless MASC prefix property. Recall that the notation
  • Theorem 6 Partition pair (P(X), P(Y)) can be used in a near-lossless instantaneous MASC on p(x,y) if and only if both of the following properties are satisfied:
  • If either condition (A) or condition (B) is not satisfied, then there exist symbols x, x' ∈ X and y, y' ∈ Y, such that y, y' ∈ Ax ∪ Ax', and one of the following is true:
  • a decoding error occurs if and only if there is more than
  • the decoder reconstructs the symbols as the arg max of p(x, y) over the set of symbol pairs consistent with the received descriptions.
  • any 1-level group G ⊆ Y is a legitimate group in near-lossless side-information coding of Y given X.
  • the error penalty of a multi-level group equals the sum of the error penalties of the 1-level groups it contains.
  • a 1-level group 0 - ⁇ -V is a legitimate group for a general near-lossless MASC given V(X) if for any y,j/ , y and y 1 do not both belong to -U A-, for
  • near-lossless MASC design is to consider all combinations of 1-level groups that yield an error within the allowed error limits, in each case design the optimal lossless code for the reduced alphabet that treats each such 1-level group G as a single symbol ig (xg . X if ⁇ G ⁇ > 1) or yg (yg $ -V if ⁇ G ⁇ > 1),
  • X Xf) ⁇ xi, . . ., x m ⁇ c U ⁇ g ⁇ and y and p.m.f.
  • if GY = (y1, ..., yk), then the error penalty is computed from the joint p.m.f. as sketched below.
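  • The penalty can be evaluated directly from the joint p.m.f.; the sketch below assumes (consistent with the arg max decoding rule quoted above, though this exact formula is an assumption) that for each x every occurrence of a group member other than the most probable one is decoded in error. The p.m.f. is an illustrative placeholder.

    # Error penalty of a candidate 1-level group in near-lossless
    # side-information coding of Y given X.
    def error_penalty(group, pxy, xs):
        penalty = 0.0
        for x in xs:
            probs = [pxy.get((x, y), 0.0) for y in group]
            penalty += sum(probs) - max(probs)   # everything but the arg max errs
        return penalty

    pxy = {("x0", "a"): 0.2, ("x0", "c"): 0.2, ("x1", "b"): 0.3,
           ("x1", "d"): 0.1, ("x2", "a"): 0.1, ("x2", "d"): 0.1}
    xs = sorted({x for x, _ in pxy})
    # 'a' and 'd' both occur with x2, so grouping them is not lossless;
    # the penalty is the probability mass lost to the arg max rule.
    print("error penalty of (a, d):", error_penalty(["a", "d"], pxy, xs))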
  • a one-level group can be combined with the root of a distinct (one- or multi-level) group if and only if
  • Two (one- or multi-level) groups can be made descendants of a single root if and only if the groups hold adjacent positions in the ordering. 4.
  • the group formed by combining two symbols or two groups occupies the position associated with those symbols or groups in the alphabet ordering. Given that only adjacent symbols can be combined, there is no ambiguity in the position of a group.
  • G[i, j] can reside at the root of G[j+1, k].
  • G[i, j] , /( w[i, + 1, k] ) equals £[z, 7] 's best rate; when G[i, j] and [ + 1, k] must be
  • w[i, i + L] = min over j ∈ {i, ..., i+L−1} of f(w[i, j], w[j+1, i+L]).
  • Figure 13 illustrates the process in the alternate algorithm embodiment.
  • an ordering of the alphabet is fixed.
  • the variables weight, group, etc
  • L is set to 1.
  • i is set to 1.
  • L and i are counter variables for the loop starting at box 1305, which iterates through the ordering and progressively creates larger combination out of adjacent groups until an optimal code for the ordering is obtained.
  • the current combination (i, j, i+L) is formed.
  • the function f for the combination is also determined at this point.
  • the weight and grouping of the current combination are determined.
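  • The structure of this loop is that of an interval dynamic programme over the fixed ordering, sketched below. The merge cost used here (cost of the two halves plus the total probability of the run) is only a placeholder standing in for the function f, which in the algorithm above depends on the matched-code rate and the combinability constraints.

    # Interval DP: combine progressively longer runs of adjacent positions,
    # keeping the best split point j for each run [i, i+L].
    def interval_dp(weights):
        n = len(weights)
        prefix = [0.0]
        for w in weights:
            prefix.append(prefix[-1] + w)
        run = lambda i, k: prefix[k + 1] - prefix[i]    # probability of run [i, k]
        cost = [[0.0] * n for _ in range(n)]
        split = [[None] * n for _ in range(n)]
        for L in range(1, n):
            for i in range(n - L):
                k = i + L
                best = None
                for j in range(i, k):                   # split into [i, j] and [j+1, k]
                    c = cost[i][j] + cost[j + 1][k] + run(i, k)
                    if best is None or c < best:
                        best, split[i][k] = c, j
                cost[i][k] = best
        return cost[0][n - 1], split

    weights = [0.3, 0.2, 0.2, 0.2, 0.1]                 # illustrative ordered marginals
    total_cost, _ = interval_dp(weights)
    print("cost of the best combination for this ordering:", round(total_cost, 3))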
  • the algorithm may be used in a number of different ways.
  • the code designer may simply fix the ordering, either to a choice that is believed to be good or to a randomly chosen value, and simply use the code designed for that order. For example, since only symbols that are adjacent can be combined, the designer may choose an ordering that gives adjacent positions to many of the combinable symbols.
  • the designer may consider multiple orderings, finding the optimal code for each ordering and finally using the ordering that gives the best expected performance.
  • the designer may also choose a first ordering O at random, find the best code £(C.) for
  • G( m+ ) is guaranteed to be at least as good
  • Table 5 gives another example of the joint probability of source X and Y, with
  • Huffman code for Y subject to the constraints imposed by ordering {a1, a2, a3, a4, a5} on Y.
  • This section shows optimal coding rates for lossless side-information MASCs, lossless general MASCs, and near-lossless general MASCs for the example of Table 3. We achieve these results by building the optimal partitions and matched codes for each scenario, as discussed in earlier sections. Both Huffman and arithmetic coding rates are included.
  • H(X) and RH(X) are the optimal and Huffman rates for source X when X is coded independently.
  • Figure 15 shows general lossless and lossy MASC results.
  • the optimal lossless MASC gives significant performance improvement with respect to independent coding of X and Y but does not achieve the Slepian- Wolf region.
  • error probability 0.01 which equals min. p(-c, y), i.e. the smallest error probability that may result in different rate region than in
  • the achievable rate region is greatly improved over lossless coding, showing the benefits of near-lossless coding.
  • error probability 0.04 we get approximately to the Slepian- Wolf region for this example.
  • Table 7 gives examples of a few randomly chosen orderings' Huffman code rates and arithmetic code rates.
  • a multiple access source code is a source code designed for the following network configuration: a pair of correlated information sequences {Xi} and {Yi} is drawn i.i.d. according to joint probability mass function (p.m.f.) p(x, y); the encoder for each source operates without knowledge of the other source; the decoder jointly decodes the encoded bit streams from both sources.
  • the work of Slepian and Wolf describes all rates achievable by MASCs of infinite coding dimension (n → ∞) and asymptotically negligible error probabilities (Pe → 0). In this paper, we consider the properties of optimal MASCs for practical coding applications.
  • a multiple access network is a system with several transmitters sending information to a single receiver.
  • One example of a multiple access system is a sensor network, where a collection of separately located sensors sends correlated information to a central processing unit.
  • MASCs yield efficient data representations for multiple access systems when cooperation among the transmitters is not possible.
  • Figure 1 (a) An MASC and (b) the Slepian- Wolf achievable rate region
  • Pradhan and Ramchandran solve the lossless MASC code design problem when source Y is guaranteed to be at most a prescribed Hamming distance from source X. Methods for extending this approach to design good codes for more general p.m.f.s p(x,y) are unknown.
  • the first problem involves losslessly describing source Y when source X is treated as side information known perfectly to the decoder for source Y but unknown to the encoder for source Y.
  • the solution to this problem is applicable both to the problem of coding with side information and to a special case of the MASC problem.
  • the MASC application arises in applications where source X is losslessly described using a traditional, independent lossless code (e.g., a Huffman code matched to the marginal p.m.f. of X).
  • Section II generalizes the Huffman code design algorithm to the scenario where X is known to the decoder of Y but unknown to the encoder of Y. Section II also treats the problem of arithmetic code design for the same scenario, allowing low complexity, high dimension entropy MASCs.
  • the solution to the side information problem considered in Section II yields codes that minimize Ry at the expense of a high Rx (when X is treated as side information) and codes that minimize Rx at the expense of a high Ry (when Y is treated as side information).
  • the general MASC problem considered in Section III relaxes the assumption that one of the two sources should be independently decodable to find the lossless MASC with the best possible tradeoff between Rx and Ry. In this case, we consider all codes with which an independently encoded X and Y can be jointly and instantaneously decoded with probability of error zero.
  • the goal of the code design is to find the code that minimizes λRX + (1 − λ)RY for an arbitrary value of λ ∈ [0, 1].
  • the two side information codes correspond to special cases (λ ∈ {0, 1}) of the generalized problem.
  • the result is a family of codes with intermediate values of Rx and Ry.
  • Section IV treats the near-lossless MASC problem.
  • the problem is to design the code that minimizes λRX + (1 − λ)RY over all instantaneously decodable MASCs with probability of error no greater than Pe.
  • λ ∈ [0, 1] and Pe ∈ [0, 1] are arbitrary constants.
  • We here generalize the lossless Huffman and arithmetic MASC algorithms for near-lossless coding.
  • Section V contains experimental results. The key contributions of the paper are summarized in Section VI.
  • a lossless instantaneous MASC for joint source (X, Y) consists of two encoders γX : X → {0, 1}* and γY : Y → {0, 1}* and a decoder γ⁻¹ : {0, 1}* × {0, 1}* → X × Y.
  • the decoder can correctly reconstruct yi by reading only the first
  • the code Ty is a lossless instantaneous code for Y given X or a lossless instantaneous side-information code.
  • Lemma 1 Code γY is a lossless instantaneous side-information code for Y given X if and only if for each x ∈ X, y, y' ∈ Ax implies that γY(y) and γY(y') satisfy the prefix condition.
  • the optimal encoder γY is the one that losslessly describes Y with the smallest expected rate.
  • Lemma 1 demonstrates that instantaneous coding in a side-information MASC requires only that {γY(y) : y ∈ Ax} be prefix-free for each x ∈ X and not that {γY(y) : y ∈ Y} be prefix-free, as would be required for instantaneous coding if no side-information were available to the decoder.
  • the collection G = (y1, ..., ym) is called a 1-level group for p(x,y) if each pair of distinct members yi, yj ∈ G can be combined under p(x,y).
  • (y) is a special case of a 1-level group.
  • the tree representation T(G) for 1-level group Q is a single node representing all members of G-
  • members of all G' ∈ C(R) are called members of C(R)
  • members of R and C(R) are called members of G.
  • T(R) is the root of T(G) and the parent of all subtrees T(G') for G' ∈ C(R).
  • R is a 1-level group
  • C(R) is a set of groups of M − 1 or fewer levels, at least one of which is an (M − 1)-level group.
  • T(R) is the root of T(G) and the parent of all subtrees T(G') for G' ∈ C(R).
  • an M-level group is also called a multi-level group.
  • C(R) = {(a0), (a2, a7), (a6)}.
  • the members of C(R) are {a0, a2, a6, a7}; the members of G2 are {a0, a2, a4, a6, a7}.
  • G2 is a 2-level group since symbol a4 can be combined with each of a0, a2, a6, a7, and (a0), (a2, a7), (a6) are 1-level groups under p.m.f. p(x, y).
  • the tree representation T(G2) is a 2-level tree.
  • the tree root has three children, each of which is a single node.
  • the root T(a1) of the three-level group has three children: the first two children are nodes T(a0) and T(a6); the third child is a 2-level tree with root node T(a2) and children T(a4) and T(a5).
  • the tree representation of a partition is called a partition tree.
  • a partition tree is not necessarily a regular fe-ary tree; the number of children at each node depends on the specific multi-level group.
  • Figure 2 (a) Partition tree T(P(Y)); (b) labels for T(P(Y)); (c) matched code for P(Y); (d) Combining groups in partition {(a0), ((a2) : {(a4), (a5)}), ((a7) : {(a1), (a3)}), (a6)}.
  • the node probability q(n) of a 1-level group n with n ∈ T(P(Y)) is the sum of the probabilities of that group's members.
  • the subtree probability Q(n) of the 1-level group at node n G T(V(y)) is the sum of probabilities of n's members and descendants.
  • q(23) = pY(a1) and Q(22) = pY(a2) + pY(a4) + pY(a5).
  • γY(y) describes the path in T(P(Y)) from r to T(G); the path description is a concatenated list of step descriptions, where the step from n to nk, k ∈ {1, ..., K(n)}, is described using a prefix-code on {1, ..., K(n)}.
  • An example of a matched code for the partition of Figure 2(a) appears in Figure 2(c), where the codeword for each node is indicated in parentheses.
  • a partition specifies the prefix and equivalence relationships in the binary descriptions of y G y,- a matched code is any code with those properties.
  • Theorem 1 Code ⁇ y is a lossless instantaneous side-information code for p(x, y) if and only if ⁇ y is a matched code for some partition V(y) for p(x,y).
  • a matched code for partition V(y) is a lossless instantaneous side- information code for Y.
  • This proof follows from the definition of a matched code.
  • the decoder can decode the value of X and then losslessly decodes the value of Y using the instantaneous code on c .
  • n G 7 The binary description of any internal node n G 7 " is the prefix of the descriptions of its descendants.
  • n and its descendants form a multi-level group, whose root 72- is the 1-level group represented by n.
  • C(72.) is the set of (possibly multi-level) groups descending from n in T.
  • the set of codewords descending from the same node satisfies the prefix condition.
  • T is a partition tree for some partition V(y) for p(x, y) and ⁇ f is a matched code for v(y).
  • l* y) (nk) l* ny) (n) for all n G T(V(y)) and fe G ⁇ 1, . . . , ⁇ C(n) ⁇ if those lengths are all integers.
  • Step 1 Design a Huffman code on the set of nodes descending from the root of T, according to their subtree probabilities, i.e. nodes {(a3, a6), (a7)} with the corresponding p.m.f.
  • Step 2 For each subsequent tree node n with i-T(n) > 0, consider as a new set, and do Huffman code design on this set, with p.m.f.
  • T be the partition tree of V(y)-
  • the codelength of a node n G T is denoted by l(n).
  • the average length J for V (y) is
  • data sequence Y n is represented by an interval of the [0, 1) line.
  • Y n by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals.
  • the encoder finds the interval for yn recursively, by first breaking [0, 1) into intervals corresponding to all possible values of y1 (see Figure 3(a)), then breaking the interval for the observed Y1 into subintervals corresponding to all possible values of Y1 y2, and so on.
  • Figure 3 Dividing the unit interval in (a) traditional arithmetic coding and (b) matched arithmetic coding for partition T(y) of Figure 2(a). (c) Matched arithmetic coding for sequence 0 7 0 3 0 ⁇ 1 0 2 .
  • the subintervals for {Y^k y_(k+1)} are ordered subintervals of A with lengths proportional to p(y_(k+1)).
  • the intervals of some symbols overlap in the matched arithmetic code.
  • the intervals associated with symbols a4 and a5 subdivide the interval associated with symbol a2 in the previous example.
  • These overlapping intervals correspond to the situation where one symbol's description is the prefix of another symbol's description in matched Huffman coding.
  • the decoder can uniquely distinguish between symbols with overlapping intervals to correctly decode Y n using its side information about X n .
  • Given a partition P(Y), let lH and l* be the Huffman and optimal description lengths respectively for P(Y).
  • P(y) is optimal for matched Huffman side-information coding on p(x, y) if ElX (Y) ⁇ El j ,, j Y) for any other partition V'(y) for p(x, y) (and therefore, by Theorems 1 and 3, ElX (Y) ⁇ El(Y) where I is the description length for any other instantaneous lossless side-information code onp(-c, y)).
  • T(y) is optimal for matched arithmetic side- information coding on p(x,y) if Elp, y Y) ⁇ El?p,,y Y) for any other partition V'(y) for p(x,y).
  • Group G* modifies Gj by replacing Q 0 with 1-level group (I, Go) and adding the descendants of / (in addition to the descendants of Go) as descendants of (I, Go) in T(G*)-
  • V(y) — ⁇ Gi, ⁇ ⁇ ⁇ , G m ⁇ be a partition of y under p(x, y).
  • Gi G V(y) can be combined with Gj G V(y) at Go, where Go s the 1-level group at some node n 0 of (Gj)- Let V*(y) be the resulting partition.
  • Si ⁇ Jjx - - -ji '. l ⁇ i ⁇ M ⁇ (i.e. the set of nodes on the path to n 0 , excluding node J);
  • S 2 ⁇ n G T(Gj) • n is the sibling of node s, s G «S ⁇ ;
  • S3 («-> ⁇ U ⁇ J ⁇ )n ⁇ n 0 ⁇ c (i.e. the set of nodes on the path to no, excluding node n 0 ).
  • Q a and q n denote the subtree and node probabihties respectively
  • Figure 4 Combining two groups (Gi and Gj) into one group.
  • -T j Q j iog(Q J + Q j ) + ⁇ Q nk iog ⁇ + - Q O r + ⁇ Q "* lo ⁇ nfceSi ⁇ ⁇ Q"*n ⁇ + ' Q ⁇ l ' ⁇ ⁇ xk Si ⁇ " ⁇ Q Q n n + fc Ql
  • Theorem 4 does not hold for matched Huffman coding.
  • Theorem 5 shows a weaker result that does apply in Huflman coding.
  • the partition design procedure can be recursive, solving for optimal partitions on sub-alphabets in the solution of the optimal partition on y.
  • the procedure begins by making a hst Cy of all (single- or multi-level) groups that can appear in an optimal partition T(y') of _V for p(x,y) given the above properties of optimal partitions.
  • y G _V' we wish to add to the Ust all groups that have y as one member of the root, and some subset of _V as members.
  • the optimal partition is the partition whose optimal code gives the lowest expected rate.
  • a lower complexity higher memory algorithm is achieved by recursively building optimal matched codes for the partial partitions and ruling out partial partitions for which another partial partition on the same alphabet yields a lower rate.
  • V(y) used in lossless side-information coding is replaced by a pair of partitions (V(X),T > (y))-
  • V(X) and V(y) describe the prefix ⁇ md equivalence relationships for descriptions ⁇ (x) : x G X ⁇ and ⁇ (y) ⁇ y G y ⁇ , respectively.
  • Tx and T y be the subtrees descending from n. ⁇ and ny (including n and ny respectively). (The subtree descending from a leaf node is simply that node.) For inst ⁇ mtaneous coding, one of the following conditions must hold:
  • X G Tx or n is a leaf implies that Y 6 ny, and Y " G 7y or n # is a leaf impUes that l e n ⁇ ;
  • condition (A) the decoder recognizes that it has reached the end of ⁇ (X) and ⁇ (Y).
  • condition (B) the decoder recognizes that it has not reached the end of ⁇ y(Y) and reads the next stage description, traversing the described path in T(T(y)) to node n y ' with subtree 7y.
  • Condition (C) similarly leads to a new node n' x and subtree T x - If none of these conditions holds, then the decoder cannot determine whether to continue reading one or both of the descriptions, and the code cannot be instantaneous.
  • the decoder continues traversing T(V(X)) and T(P(y)) until it determines the 1-level groups n ⁇ r and ny with X G n and Y G ny. At each step before the decoding halts, one (or more) of the conditions (A), (B), and (C) must be satisfied.
  • Lemma 6 Partition pair (P(X), P(Y)) for p(x, y) yields a lossless instantaneous MASC if and only if for any x, x' ∈ X such that {γX(x), γX(x')} does not satisfy the prefix condition, {γY(y) : y ∈ Ax ∪ Ax'} satisfies the prefix condition.
  • {γX(x) : x ∈ By ∪ By'} satisfies the prefix condition.
  • condition (A) is satisfied, but condition (D) is violated.
  • one of the following must happen: (a) the decoder determines that Y ∈ ny, but cannot determine whether or not X ∈ nx; (b) the decoder determines that X ∈ nx, but cannot determine whether or not Y ∈ ny; (c) the decoder cannot determine whether or not Y ∈ ny or whether or not X ∈ nx.
  • we use nx and ny to denote the nodes of the partition trees satisfying x ∈ nx and y ∈ ny.
  • x, x' ∈ X and y, y' ∈ Y satisfy y, y' ∈ Ax ∪ Ax' (or equivalently x, x' ∈ By ∪ By'), but γX(x) and γX(x') do not satisfy the prefix condition, and γY(y) and γY(y') do not satisfy the prefix condition; i.e. the MASC prefix condition is violated.
  • one of the following must hold:
  • γX(x) is the prefix of γX(x') and γY(y) is the prefix of γY(y').
  • Optim-dity of a matched code for partition V(y) is independent of whether V(y) is used in a side-information code or an MASC.
  • our optimal matched code design methods from Section II apply here as well, giving optimal matched Shannon, Huffman, and arithmetic codes for any partition pair (P(X), P(Y)) for p(x, y) that satisfies the MASC prefix condition.
  • (V(X),V(y)) is optimal for use in a matched Huffman MASConp(x, y) if (ElX ⁇ X), El ⁇ ,, (Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X x CV.
  • (V(X), V(y)) is optimal for use in a matched arithmetic MASC on p(x, y) if EVp ⁇ ) X), El j> ⁇ y ⁇ (Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X x CV.
  • l ⁇ and V p denote the Huffm ⁇ -n and optimal description lengths respectively for partition V, and Huffm ⁇ in coding is optimal over aU codes on a fixed alphabet.
  • Mated codes e.g., Huffm ⁇ -n coding on X and arithmetic coding on Y
  • ⁇ -re also possible within this framework.
  • while the lower convex hull of the rate region of interest is achievable through time sharing, we describe the lower boundary of achievable rates rather than the convex hull of that region in order to increase the richness of points that can be achieved without time sharing.
  • This region describes points that minimize the rate needed to describe Y subject to a fixed constraint on the rate needed to describe X or vice versa.
  • the regions are not identical since the curves they trace are not convex. Their convex hulls are, of course, identical.
  • in Lemma 7 we again restrict our attention to partitions with no empty nodes except for the root. The proof of this result does not follow immediately from that of the corresponding result for side-information codes.
  • by Lemma 6, whether or not two symbols can be combined for one alphabet is a function of the partition on the other alphabet. Thus we must here show not only that removing empty nodes does not increase the expected rate associated with the optimal code for a given partition but also that it does not further restrict the family of partitions allowed on the other alphabet.
  • γ_X(x) is any matched code for P(X).
  • every node except for the root is non-empty, and K(n) ≠ 1).

Abstract

Embodiments of the invention present implementations for multiple access source coding (MASC). One embodiment presents an implementation directed at the lossless side-information case of MASC. Another embodiment gives an implementation of the general case of MASC. One embodiment is a near-lossless implementation of MASC. In a two dimensional example, the invention provides a way to decode (303) data pairs (x, y) from encoded individual data streams x (301) and y (302). The present invention provides a solution that partitions the source code into optimal partitions and then finds a matched code that is optimal for the given partition. Embodiments of the present invention use optimal Shannon, Huffman and arithmetic codes for the matched codes. Another embodiment of the present invention gives a method of finding near-lossless multiple access source codes.

Description

BACKGROUND OF THE INVENTION
This application claims priority from provisional applications numbered 60/265,402 filed January 30, 2001 and 60/301,609 filed June 27, 2001.
1. FIELD OF THE INVENTION
The present invention relates to the implementation of lossless and near-lossless source coding for multiple access networks.
2. BACKGROUND ART
Source coding
Source coding, also known as data compression, treats the problem of efficiently representing information for data transmission or storage.
Data compression has a wide variety of applications. In the area of data transmission, compression is used to reduce the amount of data transferred between the sources and the destinations. The reduction in data transmitted decreases the time needed for transmission and increases the overall amount of data that can be sent. For example, fax machines and modems all use compression algorithms so that we can transmit data many times faster than otherwise possible. The Internet uses many compression schemes for fast transmission; the images and videos we download from some bulletin boards are usually in a compressed format: In the area of data storage, data compression allows us to store more information on our limited storage space by efficiently representing the data. For example, digital cameras use image compression schemes to store more photos on their memory cards, DVDs use video and audio compression schemes to store movies on portable disks, we could also utilize text compression schemes to reduce the size of text files on computer hard disks.
In many electronic and computer applications, data is represented by a stream of binary digits called bits (e.g., 0 and 1). Here is an example overview of the steps involved in compressing data for transmission. The compression begins with the data itself at the sender. An encoder encodes the data into a stream with a smaller number of bits. For example, an image file to be sent across a computer network may originally be represented by 40,000 bits. After the encoding the number of bits is reduced to 10,000. In the next step, the encoded data is sent to the destination where a decoder decodes the data. In the example, the 10,000 bits are received and decoded to give a reconstructed image. The reconstructed image may be identical to or different from the original image.
Here is another example of the steps involved in compressing data for storage. In making
MP3 audio files, people use special audio compression schemes to compress the music and store them on the compact discs or on the memory of MP3 players. For example, 700 minutes of MP3 music could be stored on a 650MB CD that normally stores 74 minutes of music without MP3 compression. To listen to the music, we use MP3 players or MP3 software to decode the compressed music files, and get the reconstructed music that usually has worse quality than the original music. When transmitting digital data from one part of a computer network to another, it is often useful to compress the data to make the transmission faster. In certain networks, known as multiple access networks, current compression schemes have limitations. The issues associated with such systems can be understood by a review of data transmission, compression schemes, and multiple access networks.
Lossless and Lossy Compression
There are two types of compression, lossless and lossy. Lossless compression techniques involve no loss of information. The original data can be recovered exactly from the losslessly compressed data. For example, text compression usually requires the reconstruction to be identical to the original text, since very small differences may result in very different meanings. Similarly, computer files, medical images, bank records, military data, etc., all need lossless compression.
Lossy compression techniques involve some loss of information. If data have been compressed using lossy compression, the original data cannot be recovered exactly from the compressed data. Lossy compression is used where some sacrifice in reconstruction fidelity is acceptable in light of the higher compression ratios of lossy codes. For example, in transmitting or storing video, exact recovery of the video data is not necessary. Depending on the required quality of the reconstructed video, various amounts of information loss are acceptable. Lossy compression is widely used in Internet browsing, video, image and speech transmission or storage, personal communications, etc. One way to measure the performance of a compression algorithm is to measure the rate
(average length) required to represent a single sample, i.e. R = Σ_x P(x) l(x), where l(x) is the length of the codeword for symbol x and P(x) is the probability of x. Another way is to measure the distortion, i.e., the average difference between the original data and the reconstruction.
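The rate computation is easy to make concrete. A minimal Python sketch (using the probabilities and the instantaneous code of Table 1 below, whose codeword lengths are 1, 2, 3 and 3 bits):

```python
# Sketch: expected rate R = sum_x P(x) * l(x) in bits per symbol.
# Probabilities and codeword lengths follow the fourth code of Table 1.
P = {'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}
length = {'1': 1, '2': 2, '3': 3, '4': 3}

R = sum(P[x] * length[x] for x in P)
print(R)  # 1.85 bits per symbol for this code
```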
Fixed-length Code
A fixed-length code uses the same number of bits to represent each symbol in the alphabet. For example, ASCII code is a fixed-length code: it uses 7 bits to represent each letter. The codeword for letter a is 1100001, that for letter A is 1000001, etc.
Variable-length Code
A variable-length code does not require that all codewords have the same length, thus we may use different numbers of bits to represent different symbols. For example, we may use shorter codewords for more frequent symbols, and longer codewords for less frequent symbols; thus on average we could use fewer bits per symbol. Morse code is an example of a variable-length code for the English alphabet. It uses a single dot (.) to represent the most frequent letter E, and four symbols: dash, dash, dot, dash (--.-) to represent the much less frequent letter Q.
Non-singular, Uniquely Decodable, Instantaneous, Prefix-free Codes
Table 1. Classes of Codes
Symbols   P(X)    Singular   Non-singular, but not uniquely decodable   Uniquely decodable, but not instantaneous   Instantaneous
1         0.45    0          1                                          1                                           1
2         0.25    0          10                                         10                                          01
3         0.1     1          0                                          100                                         001
4         0.2     10         110                                        000                                         000
A non-singular code assigns a distinct codeword to each symbol in the alphabet. A non-singular code provides us with an unambiguous description of each single symbol. However, if we wish to send a sequence of symbols, a non-singular code does not promise an unambiguous description. For the example given in Table 1, the first code assigns identical codewords to both symbol '1' and symbol '2', and thus is a singular code. The second code is a non-singular code; however, the binary description of the sequence '12' is '110', which is the same as the binary description of sequence '113' and that of symbol '4'. Thus we cannot uniquely decode those sequences of symbols.
We define uniquely decodable codes as follows. A uniquely decodable code is one where no two sequences of symbols have the same binary description. That is to say, any encoded sequence in a uniquely decodable code has only one possible source sequence producing it. However, one may need to look at the entire encoded bit string before determining even the first symbol from the corresponding source sequence. The third code in Table 1 is an example of a uniquely decodable code for the source alphabet. On receiving encoded bit ' 1', one cannot determine which of the three symbols '1', '2', '3' is transmitted until future bits are received.
An instantaneous code is one that can be decoded without referring to future codewords. The third code is not instantaneous since the binary description of symbol '1' is the prefix of the binary description of symbols '2' and '3', and the description of symbol '2' is also the prefix of the description of symbol '3'. We call a code a prefix code if no codeword is a prefix of any other codeword. A prefix code is always an instantaneous code; since the end of a codeword is always immediately recognizable, the decoder can separate the codewords without looking at future encoded symbols. An instantaneous code is also a prefix code, except in the case of multiple access source codes, where an instantaneous code does not need to be prefix free (we will discuss this later). The fourth code in Table 1 gives an example of an instantaneous code that has the prefix-free property.
The nesting of these definitions is: the set of instantaneous codes is a subset of the set of uniquely decodable codes, which is a subset of the set of non-singular codes.
Tree Representation
We can always construct a binary tree to represent a binary code. We draw a tree that starts from a single node (the root) and has a maximum of two branches at each node. The two branches correspond to '0' and ' 1 ' respectively. (Here, we adopt the convention that the left branch corresponds to '0' and the right branch corresponds to ' 1 ' .) The binary trees for the second to the fourth code in Table 1 are shown in trees 100, 101 and 102 of Figure 1 respectively.
The codeword of a symbol can be obtained by traversing from the root of the tree to the node representing that symbol. Each branch on the path contributes a bit ('0' from each left branch and ' 1 ' from each right branch) to the codeword. In a prefix code, the codewords always reside at the leaves of the tree. In a non-prefix code, some codewords will reside at the internal nodes of the tree.
For prefix codes, the decoding process is made easier with the help of the tree representation. The decoder starts from the root of the tree. Upon receiving an encoded bit, the decoder chooses the left branch if the bit is '0' or the right branch if the bit is ' 1 '. This process continues until the decoder reaches a tree node representing a codeword. If the code is a prefix code, the decoder can then immediately determine the corresponding symbol.
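The tree-based decoding just described can be sketched in a few lines of Python; this is an illustrative sketch only, using the fourth (instantaneous) code of Table 1 and a codeword table in place of an explicit tree:

```python
# Sketch: instantaneous decoding of a prefix code, scanning bits left to right.
code = {'1': '1', '2': '01', '3': '001', '4': '000'}   # fourth code of Table 1
inverse = {w: s for s, w in code.items()}              # codeword -> symbol

def decode(bits):
    symbols, current = [], ''
    for b in bits:
        current += b                # follow one branch per received bit
        if current in inverse:      # reached a codeword node: emit and restart at root
            symbols.append(inverse[current])
            current = ''
    return ''.join(symbols)

print(decode('101001000'))  # -> '1234'
```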
Block Code
In the example given in Table 1, each single symbol (T, '2', '3', '4') is assigned a codeword. We can also group the symbols into blocks of length n, treat each block as a super symbol in the extended alphabet, and assign each super symbol a codeword. This code is called a block code with block length n (or coding dimension n). Table 2 below gives an example of a block code with block length n=2 for the source alphabet given in Table 1. Table 2
Block of Symbols Probability Code
11 0.2025 00
12 0.1125 010
13 0.045 10010
14 0.09 1000
21 0.1125 111
22 0.0625 1101
23 0.025 11001
24 0.05 0111
31 0.045 10110
32 0.025 101110
33 0.01 110001
34 0.02 110000
41 0.09 1010
42 0.05 0110
43 0.02 101111
44 0.04 10011
Huffman Code
A Huffman code is the optimal (shortest average length) prefix code for a given distribution. It is widely used in many compression schemes. The Huffman procedure is based on the following two observations for optimal prefix codes. In an optimal prefix code:
1. Symbols with higher probabilities have codewords no longer than symbols with lower probabilities.
2. The two longest codewords have the same length and differ only in the last bit; they correspond to the two least probable symbols.
Thus the two leaves corresponding to the two least probable symbols are offspring of the same node.
The Huffman code design proceeds as follows. First, we sort the symbols in the alphabet according to their probabilities. Next we connect the two least probable symbols in the alphabet to a single node. This new node (representing a new symbol) and all the other symbols except for the two least probable symbols in the original alphabet form a reduced alphabet; the probability of the new symbol is the sum of the probabilities of its offspring (i.e. the two least probable symbols). Then we sort the nodes according to their probabilities in the reduced alphabet and apply the same rule to generate a parent node for the two least probable symbols in the reduced alphabet. This process continues until we get a single node (i.e. the root). The codeword of a symbol can be obtained by traversing from the root of the tree to the leaf representing that symbol. Each branch on the path contributes a bit ('0' from each left branch and '1' from each right branch) to the codeword.
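This pairing-and-merging procedure is easy to express in code. A sketch (standard heap-based Huffman construction; the particular 0/1 assignments may differ from those in Table 1, but the codeword lengths are the same):

```python
import heapq

def huffman_code(prob):
    """Sketch: binary Huffman code for a dict of symbol probabilities."""
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)               # least probable node
        p2, _, c2 = heapq.heappop(heap)               # second least probable node
        merged = {s: '0' + w for s, w in c1.items()}  # new parent: prepend branch bits
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_code({'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}))
# codeword lengths 1, 2, 3, 3 -> expected rate 1.85 bits per symbol
```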
The fourth code in Table 1 is a Huffman code for the example alphabet. The procedure of how we build it is shown in Figure 2A.
Entropy Code
The entropy of source X is defined as: H(X) = -Σ_x p(x) log p(x). Given a probability model, the entropy is the lowest rate at which the source can be losslessly compressed.
The rate R of the Huffman code for source X is bounded below by the entropy H(X) of the source and bounded above by the entropy plus one bit, i.e., H(X) ≤ R < H(X) + 1. Consider a data sequence X^n = (X_1, X_2, X_3, ..., X_n) where each element of the sequence is independently and identically generated. If we code sequence X^n using a Huffman code, the resulting rate (average length per symbol) satisfies: H(X^n)/n ≤ R < (H(X^n) + 1)/n. Thus when the block length (or coding dimension) n is arbitrarily large, the achievable rate is arbitrarily close to the entropy H(X). We call this kind of code an 'entropy code', i.e., a code whose rate is arbitrarily close to the entropy when the coding dimension is arbitrarily large.
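As a quick numerical check (a sketch assuming the Table 1 probabilities), the entropy can be computed and compared against the Huffman rate of 1.85 bits per symbol quoted below:

```python
from math import log2

P = {'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}    # Table 1 probabilities
H = -sum(p * log2(p) for p in P.values())         # H(X) = -sum p(x) log2 p(x)
print(round(H, 3))                                # about 1.815 bits
# The Huffman rate of 1.85 bits satisfies H(X) <= R < H(X) + 1.
```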
Arithmetic Code
Arithmetic code is another, increasingly popular, entropy code that is used widely in many compression schemes. For example, it is used in the compression standard JPEG-2000.
We can achieve efficient coding by using long blocks of source symbols. For example, for the alphabet given in Table 1, its Huffman code rate is 1.85 bits per symbol. Table 2 gives an example of a Huffman code for the corresponding extended alphabet with block length two; the resulting rate is 1.8375 bits per symbol, showing a performance improvement. However, Huffman coding is not a good choice for coding long blocks of symbols, since in order to assign a codeword to a particular sequence of length n, it requires calculating the probabilities of all sequences of length n and constructing the complete Huffman coding tree (equivalent to assigning codewords to all sequences of length n). Arithmetic coding is a better scheme for block coding; it assigns a codeword to a particular sequence of length n without having to generate codewords for all sequences of length n. Thus it is a low complexity, high dimensional coding scheme.
In arithmetic coding, a unique identifier is generated for each source sequence. This
identifier is then assigned a unique binary code. In particular, data sequence X^n is represented by an interval of the [0,1) line. We describe X^n by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. This mid-point is the identifier for X^n. We find the interval for x^n recursively, by first breaking [0,1) into intervals corresponding to all possible values of X_1, then breaking the interval for the observed X_1 = x_1 into subintervals corresponding to all possible values of X_2, and so on. Given the interval A ⊂ [0,1) for x^k for some 0 ≤ k < n (the interval for x^0 is [0,1)), the subintervals for {x^k x_{k+1}} are ordered subintervals of A with lengths proportional to p(x_{k+1}).
For the alphabet given in Table 1, Figure 2B shows how to determine the interval for sequence '132'. Once the interval [0.33525, 0.3465) is determined for '132', we can use a binary code to describe the mid-point 0.340875 to sufficient accuracy as the binary representation for sequence '132'.
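The interval refinement for '132' can be reproduced with a few lines of Python; this sketch assumes the subintervals are ordered by symbol index, as in Figure 2B:

```python
# Sketch: arithmetic-coding interval for the sequence '132', Table 1 probabilities.
P = {'1': 0.45, '2': 0.25, '3': 0.1, '4': 0.2}
order = ['1', '2', '3', '4']                      # ordered subintervals

low, high = 0.0, 1.0
for s in '132':
    width = high - low
    cum = 0.0
    for t in order:                               # locate the subinterval for s
        if t == s:
            low, high = low + cum * width, low + (cum + P[t]) * width
            break
        cum += P[t]

print(low, high, (low + high) / 2)   # about 0.33525, 0.3465, 0.340875
```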
In arithmetic coding, the description length of data sequence x^n is
l(x^n) = ⌈ log2( 1 / p_X(x^n) ) ⌉ + 1,
where p_X(x^n) is the probability of x^n; this ensures the intervals corresponding to different codewords are disjoint and the code is prefix free. Thus the average rate per symbol for the arithmetic code is
R = (1/n) Σ_{x^n} p(x^n) l(x^n). Rate R is then bounded as:
H(X^n)/n ≤ R < (H(X^n) + 2)/n,
which shows R is arbitrarily close to the source entropy when the coding dimension n is arbitrarily large.
Multiple Access Networks
A multiple access network is a system with several transmitters sending information to a single receiver. One example of a multiple access system is a sensor network, where a collection of separately located sensors sends correlated information to a central processing unit. Multiple access source codes (MASCs) yield efficient data representation for multiple access systems when cooperation among the transmitters is not possible. An MASC can also be used in data storage systems, for example, archive storage systems where information stored at different times is independently encoded but all information can be decoded together if this yields greater efficiency.
In the MASC configuration (also known as the Slepian- Wolf configuration) depicted in
Figure 3A, two correlated information sequences {X_i}_{i=1}^∞ and {Y_i}_{i=1}^∞ are drawn i.i.d. (independently and identically distributed) according to joint probability mass function (p.m.f.) p(x,y). The encoder for each source operates without knowledge of the other source. The decoder receives the encoded bit streams from both sources. The rate region for this configuration is plotted in Figure 3B. This region describes the rates achievable in this scenario for sufficiently large coding dimension and decoding error probability P_e^(n) approaching zero as the coding dimension grows. Making these ideas applicable in practical network communications scenarios requires MASC design algorithms for finite dimensions. We consider two coding scenarios: first, we consider lossless (P_e^(n) = 0) MASC design for applications where perfect data reconstruction is required; second, we consider near-lossless (P_e^(n) is small but non-zero) code design for use in
lossy MASCs.
The interest in near-lossless MASCs is inspired by the discontinuity in the achievable rate region associated with going from near-lossless to truly lossless coding. For example, if p(x,y) > 0 for all (x,y) pairs in the product alphabet, then the optimal instantaneous lossless MASC achieves rates bounded below by H(X) and H(Y) in its descriptions of X and Y, giving a total rate bounded below by H(X)+H(Y). In contrast, the total rate of a near-lossless MASC is bounded below by H(X,Y), which may be much smaller than H(X)+H(Y). This example demonstrates that the move from lossless coding to near-lossless coding can give very large rate benefits. While nonzero error probabilities are unacceptable for some applications, they are acceptable on their own for some applications and within lossy MASCs in general (assuming a suitably small error probability). In lossy MASCs, a small increase in the error probability increases the code's expected distortion without causing catastrophic failure.
MASC versus Traditional Compression
To compress the data used in a multiple access network using conventional methods, people do independent coding on the sources, i.e., the two sources X and Y are independently encoded by the two senders and independently decoded at the receiver. This approach is convenient, since it allows for direct application of traditional compression techniques to a wide variety of multiple access system applications. However, this approach is inherently flawed because it disregards the correlation between the two sources.
MASC on the contrary, takes advantage of the correlation among the sources; it uses independent encoding and joint decoding for the sources. (Joint encoding is prohibited because of the isolated locations of the source encoders or some other reasons.)
For lossless coding, the rates achieved by the traditional approach (independent encoding and decoding) are bounded below by H(X) and H(Y) for the two sources respectively, i.e.
R_X ≥ H(X), R_Y ≥ H(Y), and R_X + R_Y ≥ H(X) + H(Y). The rates achieved by an MASC are bounded as follows: R_X ≥ H(X|Y), R_Y ≥ H(Y|X) and R_X + R_Y ≥ H(X,Y). When X and Y are correlated, H(X) > H(X|Y), H(Y) > H(Y|X) and H(X) + H(Y) > H(X,Y). Thus, MASCs can
generally achieve better performance than the traditional independent coding approach.
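These entropy bounds are straightforward to evaluate for any joint p.m.f. A short sketch (the 2x2 joint distribution used here is hypothetical, chosen only to make the gap between H(X)+H(Y) and H(X,Y) visible):

```python
from math import log2

# Hypothetical joint p.m.f. for illustration only.
p = {('x0', 'y0'): 0.4, ('x0', 'y1'): 0.1,
     ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4}

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

px, py = {}, {}
for (x, y), q in p.items():
    px[x] = px.get(x, 0) + q
    py[y] = py.get(y, 0) + q

H_XY = H(p)                      # joint entropy H(X,Y)
H_X, H_Y = H(px), H(py)          # marginal entropies
print(H_X + H_Y, H_XY)           # 2.0 vs. about 1.72: MASC needs less total rate
print(H_XY - H_Y, H_XY - H_X)    # H(X|Y) and H(Y|X)
```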
Prior Attempts
A number of prior art attempts have been made to provide optimal codes for multiple access networks. Examples include H. S. Witsenhausen, "The Zero-Error Side Information Problem And Chromatic Numbers," IEEE Transactions on Information Theory, 22:592-593, 1976; A. Kh. Al Jabri and S. Al-Issa, "Zero-Error Codes For Correlated Information Sources," in Proceedings of Cryptography, pages 17-22, Cirencester, UK, December 1997; S. S. Pradhan and K. Ramchandran, "Distributed Source Coding Using Syndromes (DISCUS): Design And Construction," in Proceedings of the Data Compression Conference, pages 158-167, Snowbird, UT, March 1999, IEEE; and Y. Yan and T. Berger, "On Instantaneous Codes For Zero-Error Coding Of Two Correlated Sources," in Proceedings of the IEEE International Symposium on Information Theory, page 344, Sorrento, Italy, June 2000, IEEE.
Witsenhausen, Al Jabri, and Yan treat the problem as a side information problem, where both encoder and decoder know X, and the goal is to describe Y using the smallest average rate possible while maintaining the unique decodability of Y given the known value of X. Neither Witsenhausen nor Al Jabri is optimal in this scenario, as shown in Yan. Yan and Berger find a necessary and sufficient condition for the existence of a lossless instantaneous code with a given set of codeword lengths for Y when the alphabet size of X is two. Unfortunately their approach fails to yield a necessary and sufficient condition for the existence of a lossless instantaneous code when the alphabet size for X is greater than two. Pradhan and Ramchandran tackle the lossless MASC code design problem when source Y is guaranteed to be at most a prescribed Hamming distance from source X. Methods for extending this approach to design good codes for more general p.m.f.s p(x,y) are unknown.
SUMMARY OF THE INVENTION
Embodiments of the invention present implementations for multiple access source coding (MASC). The invention provides a solution for independently encoding individual sources and for decoding multiple source data points from the individually encoded streams in a single decoder. In a two source example, the invention provides a way to separately encode samples from data source x and data source y - using no collaboration between the encoders and requiring no knowledge of y by the encoder of x or vice versa - and a way to decode data pairs (x, y) using the individual encoded data streams for both x and y.
Embodiments of the present invention disclosed herein include algorithms for:
1. optimal lossless coding in multiple access networks (the extension of Huffman coding to MASCs);
2. low complexity, high dimension lossless coding in multiple access networks (the extension of arithmetic coding to MASCs);
3. optimal near-lossless coding in multiple access networks (the extension of the Huffman MASC algorithm for an arbitrary non-zero probability of error);
4. low complexity, high dimensional near-lossless coding in multiple access networks (the extension of the arithmetic MASC algorithm for an arbitrary nonzero probability of error). The algorithmic description includes methods for encoding, decoding, and code design for an arbitrary p.m.f. p(x,y) in each of the above four scenarios.
Other embodiments of the present invention are codes that give (a) identical descriptions and/or (b) descriptions that violate the prefix condition to some symbols. Nonetheless, the codes described herein guarantee unique decodability in lossless codes or near-lossless codes with P_e < ε (ε fixed at code design in "near-lossless" codes). Unlike prior art which only discusses properties (a) and (b), the present invention gives codes that yield both types of descriptions. The present invention also gives a definition of the class of algorithms that can be used to generate the codes with properties (a) and (b).
One embodiment of the present invention provides a solution that partitions the source code into optimal partitions and then finds a matched code that is optimal for the given partition, in accordance with the aforementioned definition of the class of algorithms. In one embodiment the source alphabet is examined to find combinable symbols and to create subsets of combinable symbols. These subsets are then partitioned into optimal groups and joined in a list. The successful groups from the list are then used to create complete and non-overlapping partitions of the alphabet. For each complete and non-overlapping partition, an optimal matched code is generated. The partition whose matched code provides the best rate is selected. In one embodiment, the matched code can be a Huffman code, an arithmetic code or any other existing form of lossless code.
Embodiments of the present invention can be used to provide lossless and near- lossless compression for a general compression solution for environments where multiple encoders encode information to be decoded by a single decoder or for environments where one or more encoders encode information to be decoded by a single decoder to which side information is available.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features, aspects and advantages of the present invention will become better understood with regard to the following description, appended claims and accompanying drawings where:
Figure 1 shows the binary trees for the second to the fourth code in Table 1.
Figure 2A illustrates an example Huffman code building process.
Figure 2B illustrates an example sequence determination process for Arithmetic coding.
Figure 3A shows an example MASC configuration.
Figure 3B shows the achievable rate region of multiple access source coding according to the work of Slepian-Wolf.
Figure 4 is a flow diagram of an embodiment of the present invention.
Figure 5 is a flow diagram of an embodiment of finding combinable symbols of the present invention.
Figure 6 is a flow diagram of an embodiment for building a list of groups.
Figure 7 is a flow diagram for constructing optimal partitions.
Figure 8 is a flow diagram of an embodiment for constructing a partition tree and labeling of each node within the tree.
Figure 9 is a block diagram of a side-information joint decoder embodiment of the invention.
Figures 10A - 10D illustrate node labeling and coding using the present invention.
Figure 11 is a flow diagram illustrating Huffman code word generation using the present invention.
Figures 12A — 12C illustrate arithmetic coding using the present invention.
Figure 13 illustrates a flow chart for a general coding scheme for an alternate algorithm embodiment.
Figure 14 shows a comparison of three partition trees generated from the various embodiments of the present invention.
Figure 15 is a graph of general lossless and near-lossless MASC results.
Figure 16 is a diagram showing how two groups are combined according to one embodiment of the invention.
Figure 17 is a flow diagram for generating matched code according to an embodiment of the present invention.
Figure 18 is a flow diagram for building matched codes that approximate the optimal length function according to another embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention relate to the implementation of lossless and near- lossless source coding for multiple access networks. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that embodiments of the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
The invention provides a general data compression scheme for encoding and decoding of data from multiple sources that have been encoded independently. The invention can also be implemented in a side-information environment where one of the data sources is known to the decoder. Although the invention is a general solution for multiple data sources, the invention is described by an example of a two data source network.
The present invention is described herein and by way of example with two data sources X and Y that provide data stream x1, x2, x3, ..., xn and data stream y1, y2, y3, ..., yn respectively to dedicated encoders. The streams are provided to a single decoder that can produce decoded data pairs (xn, yn). Before describing embodiments of the invention, a summary of notations used in
the example of the MASC problem is provided. Notations in MASC
In describing the multiple access source coding (MASC) problem, we consider finite-alphabet memoryless data sources X and Y with joint probability mass function p(x,y) on alphabet X × Y. We use p_X(x) and p_Y(y) to denote the marginals of p(x,y) with respect to X and Y. (The subscripts are dropped when they are obvious from the argument, giving p_X(x) = p(x) and p_Y(y) = p(y).) A lossless instantaneous MASC for joint source (X,Y) consists of two encoders γ_X : X → {0,1}* and γ_Y : Y → {0,1}* and a decoder γ^(-1) : {0,1}* × {0,1}* → X × Y. Here a first dedicated encoder γ_X encodes data source X, which has alphabet X, into strings of 0's and 1's (bits). A second dedicated encoder γ_Y does the same for data source Y, which has alphabet Y. Then a single decoder γ^(-1) recovers X and Y from the encoded data streams. γ_X(x) and γ_Y(y) denote the binary descriptions of x and y, and the probability of decoding error is P_e = Pr(γ^(-1)(γ_X(X), γ_Y(Y)) ≠ (X,Y)). P_e is the probability of occurrence of a discrepancy between the decoded data and the original data. Here, we focus on instantaneous codes, where for any input sequences x_1, x_2, x_3, ... and y_1, y_2, y_3, ... with p(x_1, y_1) > 0, the instantaneous decoder reconstructs (x_1, y_1) by reading only the first |γ_X(x_1)| bits from γ_X(x_1)γ_X(x_2)γ_X(x_3)... and the first |γ_Y(y_1)| bits from γ_Y(y_1)γ_Y(y_2)γ_Y(y_3)... (without prior knowledge of these lengths).
The present invention provides coding schemes for the extension of Huffman coding to
MASCs (for optimal lossless coding and for near-lossless coding), the extension of arithmetic coding to MASCs (for low complexity, high dimension lossless coding and for near-lossless coding). The embodiments of the invention are described with respect to two environments, one, lossless side-information coding, where one of the data sources is known to the decoder, and another environment, the general case, where neither of the sources must be independently decodable.
To further describe this embodiment of the present invention, we begin by developing terminology for describing, for a particular code, which symbols from Y have binary descriptions that are identical and which have binary descriptions that are prefixes of each other. This embodiment of the present invention defines a "group" for which codes can be designed to describe its nested structure instead of designing codes for symbols. The invention also defines partitions, which are optimal for a particular coding scheme (Huffman coding or arithmetic coding). Finally, the invention describes matched codes which satisfy particular properties for partitions and coding schemes. The goal in code design in the present application is to find the code that minimizes λR_X + (1 - λ)R_Y for an arbitrary value of λ ∈ [0,1]. The result is codes with intermediate values of R_X and R_Y. In some cases the goal is to design a code that minimizes λR_X + (1 - λ)R_Y with probability of error no greater than P_e.
Figure 4 is a flow diagram that describes one embodiment of the invention. At step 401 the alphabet of symbols generated by the sources is obtained. These symbols are organized into combinable subsets of symbols at step 402. These subsets are such that there is no ambiguity between subsets as will be explained below. At step 403 the subsets are formed into optimal groups. These optimal groups are listed at step 404. The groups are used to find and define optimal partitions at step 405 that are complete and non-overlapping trees of symbols. The successful partitions are used to generate matched codes at step 406, using either arithmetic or Huffman codes. One skilled in the art will recognize that lossless codes other than Huffman and arithmetic can be utilized as well. At step 407, the partition whose matched code has the best rate is selected and used for the MASC solution.
Lossless Side-Information Coding
One embodiment of the present invention presents an implementation for lossless side- information source coding. This problem is a special case of the general lossless MASC problem. (In a general MASC, the decoder has to decode both sources (i.e. X and Y) without knowing either one). By contrast, in the side-information application, one of data sources is known to the decoder. The goal is to find an optimal way to encode one of the data sources given the other source is known.
The invention will be described first in connection with a lossless, side-information
MASC solution. Later we describe other embodiments of the invention for a lossless general MASC solution, and embodiments for near-lossless side-information and general MASC solutions.
Figure 9 shows an example side-information multiple access network. Side-information X is perfectly known to the decoder 902 (or losslessly described using an independent code on X), and the aim is to describe Y efficiently using an encoder 901 that does not know X. This scenario describes MASCs where γ_X encodes X using a traditional code for p.m.f. {p(x)}_{x∈X} and encoder γ_Y encodes Y assuming that the decoder decodes X before decoding Y. In this case, if the decoder 902 can correctly reconstruct y_1 by reading only the first |γ_Y(y_1)| bits of the description of the Y data stream γ_Y(y_1)γ_Y(y_2)γ_Y(y_3)... from encoder 901 (without prior knowledge of these lengths), then the code γ_Y is a lossless instantaneous code for Y given X, or a lossless instantaneous side-information code. Note that the side-information as shown in the figure comes from an external source to decoder 902. This external source can come from a wide variety of places. For example it is possible that the decoder already has embedded side information within it. Another example is that the external source is a data stream from another encoder similar to encoder 901.
given side information X is: for each x € X, y. 1 S -Λ. implies that γ(y ) and fγ(y') satisfy the prefix condition (that is, neither binary codeword is a prefix of the other codeword), where Ax = {y e y : p(x, y) > 0}.
It is important to note that instantaneous coding in a side-information MASC requires only that {ηγ(y) : y e Λ_} be prefix-free for each -c G A and not that {ηy(y) ■ y £ y} be prefix-free, as would be required for instantaneous coding if no side-information were available to the decoder. This is because once the decoder knows X, it eliminates all y'^Ax (since y'^Λx
implies p(X, y') = 0). Since all codewords for y £ E satisfy the prefix condition, the decoder can use its knowledge of to instantaneously decode 7.
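This condition can be verified mechanically for a candidate code. A sketch (the joint p.m.f. and the code below are hypothetical, chosen only so that the prefix condition holds within each A_x):

```python
# Sketch: check that {gamma_Y(y) : y in A_x} is prefix-free for every x.
p = {('x0', 'y0'): 0.3, ('x0', 'y1'): 0.2,     # hypothetical p(x,y)
     ('x1', 'y2'): 0.3, ('x1', 'y3'): 0.2}
gamma_Y = {'y0': '0', 'y1': '1', 'y2': '0', 'y3': '1'}   # y0/y2 share a codeword

def prefix_free(words):
    words = sorted(words)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def lossless_side_info(p, code):
    for x in {x for (x, _) in p}:
        A_x = [y for (xx, y), q in p.items() if xx == x and q > 0]
        if not prefix_free([code[y] for y in A_x]):
            return False
    return True

print(lossless_side_info(p, gamma_Y))   # True: knowing X removes the ambiguity
```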
Thus the optimal code may violate the prefix condition either by giving identical descriptions to two symbols (having two y symbols be encoded by the same codeword: γ_Y(y) = γ_Y(y') for some y ≠ y'), or by giving one symbol a description that is a proper prefix of the description of some other symbols. We write γ_Y(y) ⪯ γ_Y(y') if the description of y is a prefix of the description of y', where y ≠ y', and γ_Y(y) ≺ γ_Y(y') if γ_Y(y) is a proper prefix of γ_Y(y'), meaning we disallow the case of γ_Y(y) = γ_Y(y').
Invention Operation
We will illustrate the operation of the present invention with the data set of Table 3, which gives a sample joint probability distribution for sources X and Y, with alphabets X = Y = {a0, a1, ..., a7}.
Table 3
p(x,y)

x \ y   a0      a1      a2      a3      a4      a5      a6      a7
a0      0.04    0       0.15    0       0       0       0       0
a1      0       0.04    0       0.05    0.06    0       0       0
a2      0.04    0       0.05    0       0       0       0.01    0
a3      0.02    0       0       0.06    0       0.01    0       0
a4      0       0.05    0       0       0.05    0.02    0       0
a5      0       0.1     0       0       0       0.03    0.06    0
a6      0       0       0       0       0       0       0.02    0.05
a7      0       0       0       0       0       0       0.01    0.08
At step 402 of Figure 4 we find combinable symbols and create subsets of these combinable symbols. Figure 5 is a flow diagram that describes the operation of finding combinable symbols and creating subsets of step 402. This example is directed to finding the combinable symbols of Y data.
Symbols y1, y2 ∈ Y can be combined under p(x,y) if p(x, y1)p(x, y2) = 0 for each x ∈ X. At step 501 of Figure 5, a symbol y is obtained and at step 502 we find the set C_y = {z ∈ Y : z can be combined with y under p(x,y)}. Symbols in set C_y can be combined with symbol y but do not need to be combinable with each other. For example, the set C_y for a0 is {a1, a4, a7} (note that a1 and a4 need not be combinable with each other).
In checking combinability, the first symbol a0 is examined and compared to symbols a1 - a7. a0 is combinable with a1 because p(x, a0) p(x, a1) = 0 for all x ∈ X. However, a0 is not combinable with a2 because p(x, a0) p(x, a2) > 0 for x = a0 and x = a2. At step 503 it is determined if each y symbol has been checked and a set C_y has been generated. If not, the system returns to step 501 and repeats for the next y symbol. If all y symbols have been checked at step 503, all of the sets C_y have been generated. Using the example of Table 3, the generated sets C_y for each symbol are shown below in Table 4.
Table 4
y     C_y
a0    a1, a4, a7
a1    a0, a2, a7
a2    a1, a3, a4, a5, a7
a3    a2, a6, a7
a4    a0, a2, a6, a7
a5    a2, a7
a6    a3, a4
a7    a0, a1, a2, a3, a4, a5
Continuing with Figure 5, at step 504 we find the nonempty subsets for each set C_y. For example, the nonempty subsets for the set C_y of symbol a0 are {a1}, {a4}, {a7}, {a1, a4}, {a1, a7}, {a4, a7}, and {a1, a4, a7}. At step 505 it is determined if each set C_y has been checked. If not, the system checks the next set C_y at step 504. If all sets C_y have been checked, the process ends at step 506.
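This combinability search and subset enumeration can be reproduced directly from the joint p.m.f. A sketch (using the Table 3 values as given above; it regenerates the C_y sets of Table 4):

```python
from itertools import chain, combinations

# Sketch: C_y = {z : p(x,y) * p(x,z) = 0 for every x}, computed from Table 3.
p = {  # p[x] is the row of p(x, .) over y = a0..a7
 'a0': [0.04, 0, 0.15, 0, 0, 0, 0, 0],
 'a1': [0, 0.04, 0, 0.05, 0.06, 0, 0, 0],
 'a2': [0.04, 0, 0.05, 0, 0, 0, 0.01, 0],
 'a3': [0.02, 0, 0, 0.06, 0, 0.01, 0, 0],
 'a4': [0, 0.05, 0, 0, 0.05, 0.02, 0, 0],
 'a5': [0, 0.1, 0, 0, 0, 0.03, 0.06, 0],
 'a6': [0, 0, 0, 0, 0, 0, 0.02, 0.05],
 'a7': [0, 0, 0, 0, 0, 0, 0.01, 0.08]}
Y = ['a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7']

def combinable(y, z):
    j, k = Y.index(y), Y.index(z)
    return all(p[x][j] * p[x][k] == 0 for x in p)

C = {y: [z for z in Y if z != y and combinable(y, z)] for y in Y}
print(C['a0'])                       # ['a1', 'a4', 'a7'], as in Table 4

def nonempty_subsets(s):             # step 504: all nonempty subsets of C_y
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

print(list(nonempty_subsets(C['a0'])))   # the seven subsets listed above
```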
Groups
We call symbols y, y' ∈ Y "combinable" if there exists a lossless instantaneous side-information code in which γ_Y(y) ⪯ γ_Y(y'). If we wish to design a code with γ_Y(y) = γ_Y(y'), then we join those symbols together in a "1-level group." If we wish to give one 1-level group a binary description that is a proper prefix of the binary description of other 1-level groups, then we build a "2-level group." These ideas generalize to M-level groups with M ≥ 2. Figure 6 is a flow diagram of the group generation step 403 and list making step 404 of Figure 4. At step 601 the nonempty subsets for a set C_y generated by step 402 of Figure 4 are obtained. At step 602 the optimal partition is found for each nonempty subset. At step 603 a root is added to the optimal partition to create an optimal group. For example, for an optimal partition of a subset of the set C_{a0} of a0, a0 is added as the root of this optimal partition. This optimal group is added to a list L_Y at step 604. At step 605 it is determined if all sets have been checked. If not, the system returns to step 601 and gets the nonempty subsets of the next set. If so, the process ends at step 606. After the operation of the steps of Figure 6, we have a list L_Y that contains optimal groups.
The mathematical and algorithmic representations of the flow diagrams of Figures 4, 5, and 6 are presented here. Symbols y1, y2 ∈ Y can be combined under p(x,y) if p(x, y1)p(x, y2) = 0 for each x ∈ X. The collection G = (y1, ..., ym) is called a 1-level group for p(x,y) if each pair of distinct members y_i, y_j ∈ G can be combined under p(x,y). For any y ∈ Y and any p(x,y), (y) is a special case of a 1-level group. The tree representation T(G) for 1-level group G is a single node representing all members of G.
A 2-level group for p(x,y), denoted by G = (R : C(R)), comprises a root R and its children C(R), where R is a 1-level group, C(R) is a set of 1-level groups, and for each G' ∈ C(R), each pair y1 ∈ R and y2 ∈ G' can be combined under p(x,y). Here members of all G' ∈ C(R) are called members of C(R), and members of R and C(R) are called members of G. In the tree representation T(G) for G, T(R) is the root of T(G) and the parent of all subtrees T(G') for G' ∈ C(R). These ideas generalize to M-level groups. For each subsequent M > 2, an M-level group for p(x,y) is a pair G = (R : C(R)) such that for each G' ∈ C(R), each pair y1 ∈ R and y2 ∈ G' can be combined under p(x,y). Here R is a 1-level group and C(R) is a set of groups of M-1 or fewer levels, at least one of which is an (M-1)-level group. The members of R and C(R) together comprise the members of G = (R : C(R)). Again, T(R) is the root of T(G) and the parent of all subtrees T(G') for G' ∈ C(R). For any M > 1, an M-level group is also called a multi-level group.
We use the probability mass function (p.m.f.) in Table 3, with X = Y = {a0, a1, ..., a6, a7}, to illustrate these concepts. For this p.m.f., (a0, a4, a7) is one example of a 1-level group since p(x, a0)p(x, a4) = 0, p(x, a0)p(x, a7) = 0 and p(x, a4)p(x, a7) = 0 for all x ∈ X. (This is seen in Table 4 as the entries for C_{a0}.) The pair (a4, a7), a subset of (a0, a4, a7), is a distinct 1-level group for p(x,y). The tree representation for any 1-level group is a single node.
An example of a 2-level group for p(x,y) is G2 = ((a4) : {(a0), (a2, a7), (a6)}). In this case the root node R = (a4) and C(R) = {(a0), (a2, a7), (a6)}. The members of C(R) are {a0, a2, a6, a7}; the members of G2 are {a0, a2, a4, a6, a7}. Here G2 is a 2-level group since symbol a4 can be combined with each of a0, a2, a6, a7, and (a0), (a2, a7), (a6) are 1-level groups under p.m.f. p(x,y). The tree representation T(G2) is a 2-level tree. The tree root has three children, each of which is a single node.
An example of a 3-level group for p(x,y) is G3 = ((a7) : {(a0), (a1), ((a2) : {(a4), (a5)})}). In T(G3), the root T(a7) of the three-level group has three children: the first two children are nodes T(a0) and T(a1); the third child is a 2-level tree with root node T(a2) and children T(a4) and T(a5). The tree representation T(G3) is a 3-level tree.
Optimal Groups
The partition design procedure for groups is recursive, solving for optimal partitions on sub-alphabets in the solution of the optimal partition on Y. For any alphabet Y' ⊆ Y, the procedure begins by making a list L_Y' of all (single- or multi-level) groups that can appear in an optimal partition P(Y') of Y' for p(x,y). The list is initialized as L_Y' = {(y) : y ∈ Y'}.
For each symbol y ∈ Y', we wish to add to the list all groups that have y as one member of the root, and some subset of Y' as members. To do that, we find the set C_y = {z ∈ Y' : z can be combined with y under p(x,y)}. For each non-empty subset S ⊆ C_y such that L_Y' does not yet contain a group with elements S ∪ {y}, we find the optimal partition P(S) of S for p(x,y). We construct a new multi-level group G with elements S ∪ {y} by adding y to the empty root of T(P(S)) if P(S) contains more than one group, or to the root of the single group in T(P(S)) otherwise. Notice that y can be the prefix of any symbol in S. Since y can be combined with all members of S ∪ {y}, y must reside at the root of the optimal partition of S ∪ {y}; thus G is optimal not only among all groups in {G' : members of G' are S ∪ {y} and y is at the root of G'} but among all groups in {G' : members of G' are S ∪ {y}}. Group G is added to the list L_Y', and the process continues.
After this is accomplished, the list of optimal groups (step 404 of Figure 4) has been accomplished. Optimal Partitions Design
After the list of optimal groups has been created, it is used to create optimal (complete and non-overlapping) partitions. (A more thorough partition definition will be introduced in a later section titled "Optimal Partition: Definition and Properties." ) Complete and non- overlapping means that all symbols are included but none are included more than once.
Referring to Figure 7, the steps for accomplishing this are shown. At step 700 we initialize i equal to 1. At step 701 we initialize an empty partition P', j = i + 1. At step 702 we add the i-th group from L_Y' to P'. At step 703 we check to see if the j-th group overlaps or is combinable with existing groups in P'. If so, we increment j and return to step 703. If not, the j-th group is added to P' at step 705. At step 706 we check to see if P' is complete. If not, increment j at step 704 and return to step 703. If P' is complete, then see if i is the last group in L_Y' at step 707. If so, make a list of successful partitions at step 708. If not, then increment i and return to step 701.
The operations of Figure 7 are performed mathematically as follows. A partition P(Y) on Y for p.m.f. p(x,y) is a complete and non-overlapping set of groups. That is, P(Y) = {G1, G2, ..., Gm} satisfies ∪_{i=1}^m G_i = Y and G_j ∩ G_k = ∅ for any j ≠ k, where each G_i ∈ P(Y) is a group for p(x,y), and G_j ∪ G_k and G_j ∩ G_k refer to the union and intersection respectively of the members of G_j and G_k. The tree representation of a partition is called a partition tree. The partition tree T(P(Y)) for partition P(Y) = {G1, G2, ..., Gm} is built as follows: first, construct the tree representation for each G_i; then, link the roots of all T(G_i), i ∈ {1, ..., m} to a single node, which is defined as the root r of T(P(Y)).
A partition tree is not necessarily a regular k-ary tree; the number of children at each node depends on the specific multi-level group.
After constructing the above list of groups, we recursively build the optimal partition of Y' for p(x,y). If any group G ∈ L_Y' contains all of the elements of Y', then P(Y') = {G} is the optimal partition on Y'. Otherwise, the algorithm systematically builds a partition, adding one group at a time from L_Y' to set P(Y') until P(Y') is a complete partition. For G ∈ L_Y' to be added to P(Y'), it must satisfy: (1) G ∩ G' = ∅; and (2) G and G' cannot be combined (see Theorem 4 for arithmetic or Theorem 5 for Huffman coding) for all G' ∈ P(Y').
Figure 10A gives an example of a partition tree from the example of Table 3. In this case the partition P(Y) = {(a3, a6), G3}. This indicates that the root node has two children: one is a 1-level group T(a3, a6), and the other is the 3-level group consisting of root node T(a7) with children T(a0), T(a1), and a subtree in which T(a2) is the root for its children T(a4) and T(a5).
As a prelude to generating matched code for optimal partitions, the branches of a partition are labeled. We label the branches of a partition tree as follows. For any 1-level group G at depth d in T(P(Y)), let n describe the d-step path from root r to node T(G) in T(P(Y)). We refer to G by describing this path. Thus T(n) = T(G). For notational simplicity, we sometimes substitute n for T(n) when it is clear from the context that we are talking about the node rather than the 1-level group at that node (e.g. n ∈ T(P(Y)) rather than T(n) ∈ T(P(Y))). To make the path descriptions unique, we fix an order on the descendants of each node and number them from left to right. Thus n's children are labeled as n1, n2, ..., nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n. The labeled partition tree for Figure 10A appears in Figure 10B.
The node probability q(n) of a 1-level group n with n ∈ T(P(Y)) is the sum of the probabilities of that group's members. The subtree probability Q(n) of the 1-level group at node n ∈ T(P(Y)) is the sum of probabilities of n's members and descendants. In Figure 10B, q(23) = p_Y(a2) and Q(23) = p_Y(a2) + p_Y(a4) + p_Y(a5).
Referring to Figure 10B, the root node is labeled "r" and the first level below, comprising a pair of children nodes, is numbered "1" and "2" from left to right as per the convention described above. For the children of the root at number "2", the concatenation convention and left to right convention results in the three children nodes being labeled "21", "22", and "23" respectively. Accordingly, the children at root "23" are labeled "231" and "232".
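A small sketch of these two quantities for the labeled tree of Figure 10B (the p_Y values below are the Y-marginals computed from Table 3; the node labels follow the convention just described):

```python
# Sketch: node probability q(n) and subtree probability Q(n) for the
# partition {(a3,a6), ((a7) : {(a0), (a1), ((a2) : {(a4), (a5)})})}.
pY = {'a0': 0.10, 'a1': 0.19, 'a2': 0.20, 'a3': 0.11,
      'a4': 0.11, 'a5': 0.06, 'a6': 0.10, 'a7': 0.13}   # marginals of Table 3

tree = {            # node label -> (members, child labels)
    'r':   ([], ['1', '2']),
    '1':   (['a3', 'a6'], []),
    '2':   (['a7'], ['21', '22', '23']),
    '21':  (['a0'], []),
    '22':  (['a1'], []),
    '23':  (['a2'], ['231', '232']),
    '231': (['a4'], []),
    '232': (['a5'], []),
}

def q(n):           # members only
    return sum(pY[y] for y in tree[n][0])

def Q(n):           # members plus all descendants
    return q(n) + sum(Q(c) for c in tree[n][1])

print(q('23'), Q('23'))   # 0.20 and 0.20 + 0.11 + 0.06 = 0.37
```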
Matched Code Generation
After creating partitions, the present invention determines the optimal partitions by generating matched code for each partition. The partition whose matched code has the best rate (of compression) is the partition to use for the MASC solution. These steps are described in Figure 8.
Referring to Figure 8, at step 801 a partition tree is constructed for each partition. (Note that this step is described above.) At step 802 the order of descendants is fixed and numbered from left to right. At step 803, the node at each level is labeled with a concatenation vector. Thus n's children are labeled as n1, n2, ..., nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n. The labeled partition tree for Figure 10A appears in Figure 10B. At step 804 a matched code is generated for the partition. This matched code can be generated, for example, by Huffman coding or arithmetic coding.
A matched code for a partition is defined as follows. A matched code γ_Y for partition P(Y) is a binary code such that for any node n ∈ T(P(Y)) and symbols y1, y2 ∈ n and y3 ∈ nk, k ∈ {1, ..., K(n)}: (1) γ_Y(y1) = γ_Y(y2); (2) γ_Y(y1) ≺ γ_Y(y3); (3) {γ_Y(nk) : k ∈ {1, ..., K(n)}} is prefix-free. We here focus on codes with a binary channel alphabet {0,1}. The extension to codes with other finite channel alphabets is straightforward, and the present invention is not limited to a binary channel alphabet. (We use γ_Y(n) interchangeably with γ_Y(y) for any y ∈ n.) If symbol y ∈ Y belongs to 1-level group G, then γ_Y(y) describes the path in T(P(Y)) from r to T(G); the path description is a concatenated list of step descriptions, where the step from n to nk, k ∈ {1, ..., K(n)} is described using a prefix code on {1, ..., K(n)}. An example of a matched code for the partition of Figure 10A appears in Figure 10C, where the codeword for each node is indicated in parentheses. Figure 17 shows how a matched code is generated according to one embodiment of the invention. In step 1701, the process begins at the root of the tree. Then at 1702, the prefix code for each node's offspring is designed. Finally at 1703 the ancestors' codewords are concatenated to form the resulting matched code.
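As a sketch of steps 1701-1703, the codeword of each symbol can be assembled by concatenating the step descriptions along its path from the root. The per-node step codes below are placeholders chosen only to be prefix-free among siblings; they are not the particular assignments of Figure 10C:

```python
# Sketch: build matched codewords by concatenating ancestors' step descriptions.
tree = {            # node label -> (members, child labels), as in Figure 10B
    'r':   ([], ['1', '2']),
    '1':   (['a3', 'a6'], []),
    '2':   (['a7'], ['21', '22', '23']),
    '21':  (['a0'], []),
    '22':  (['a1'], []),
    '23':  (['a2'], ['231', '232']),
    '231': (['a4'], []),
    '232': (['a5'], []),
}
step_code = {'r': ['0', '1'], '2': ['00', '01', '1'], '23': ['0', '1']}  # illustrative

def assign(node='r', word=''):
    codes = {y: word for y in tree[node][0]}     # all symbols in this 1-level group
    for k, child in enumerate(tree[node][1]):
        codes.update(assign(child, word + step_code[node][k]))
    return codes

print(assign())
# e.g. a3, a6 -> '0'; a7 -> '1'; a0 -> '100'; a2 -> '11'; a4 -> '110'; a5 -> '111'
```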
In the above framework, a partition specifies the prefix and equivalence relationships in the binary descriptions of y ∈ Y. A matched code is any code with those properties. The above definitions enforce the condition that for any matched code, y1, y2 ∈ A_x for some x ∈ X implies that γ_Y(y1) is neither identical to nor a prefix of γ_Y(y2); that is, γ_Y violates the prefix property only when knowing X eliminates all possible ambiguity.
Theorem 1 establishes the equivalence of matched codes and lossless side-information codes.
Theorem 1 Code γ_Y is a lossless instantaneous side-information code for p(x,y) if and only if γ_Y is a matched code for some partition P(Y) for p(x,y). Proof: First we prove that a matched code for partition P(Y) is a lossless instantaneous side-information code for Y. This proof follows from the definition of a matched code. In a matched code for partition P(Y), only symbols that can be combined can be assigned codewords that violate the prefix condition, thus only symbols that can be combined are indistinguishable using the matched code description. Since symbols y1 and y2 can be combined only if p(x, y1)p(x, y2) = 0 for all x ∈ X, then for each x ∈ X, the matched code's set of codewords for A_x = {y ∈ Y : p(x,y) > 0} is prefix free. Thus the decoder can decode the value of X and then losslessly decode the value of Y using the instantaneous code on A_x.
Next we prove that a lossless instantaneous side-information code γ_Y must be a matched code for some partition P(Y) on Y for p(x,y). That is, given γ_Y, it is always possible to find a partition P(Y) such that {γ_Y(y) : y ∈ Y} describes a matched code for P(Y).
Begin by building a binary tree T_B corresponding to γ_Y as follows. Initialize T_B as a fixed-depth binary tree with depth max_{y∈Y} |γ_Y(y)|. For each y ∈ Y, label the tree node reached by following path γ_Y(y) downward from the root of the tree (here '0' and '1' correspond to left and right branches respectively in the binary tree). Call a node in T_B empty if it does not represent any codeword in {γ_Y(y) : y ∈ Y} and it is not the root of T_B; all other nodes are non-empty. When it is clear from the context, the description of a codeword is used interchangeably with the description of the non-empty node representing it.
Build partition tree T from binary tree T_B by removing all empty nodes except for the root as follows. First, prune from the tree all empty nodes that have no non-empty descendants. Then, working from the leaves to the root, remove all empty nodes except for the root by attaching the children of each such node directly to the parent of that node. The root is left unchanged. In T:
(1) All symbols that are represented by the same codeword in γ_Y reside at the same node of T. Since γ_Y is a lossless instantaneous side-information code, any y1, y2 at the same node in T can be combined under p(x,y). Hence each non-root node in T represents a 1-level group.
(2) The binary description of any internal node n ∈ T is the prefix of the descriptions of its descendants. Thus for γ_Y to be prefix free on A_x for each x ∈ X, it must be possible to combine n with any of its descendants to ensure lossless decoding. Thus n and its descendants form a multi-level group, whose root R is the 1-level group represented by n. In this case, C(R) is the set of (possibly multi-level) groups descending from n in T.
(3) The set of codewords descending from the same node satisfies the prefix condition. Thus T is a partition tree for some partition P(Y) for p(x,y), and γ_Y is a matched code for P(Y). □
Given an arbitrary partition P(Y) for p(x,y), we wish to design the optimal matched code for P(Y). In traditional lossless coding, the optimal description lengths are l*(y) = -log p(y) for all y ∈ Y if those lengths are all integers. Theorem 2 gives the corresponding result for lossless side-information codes on a fixed partition P(Y).
Theorem 2 Given partition P(Y) for p(x,y), the optimal matched code for P(Y) has description lengths l*(r) = 0 and
l*(nk) = l*(n) - log2( Q(nk) / Σ_{j=1}^{K(n)} Q(nj) )
for all n ∈ T(P(Y)) and k ∈ {1, ..., K(n)} if those lengths are all integers. Here l*(n) = l implies |γ_Y(y)| = l for all symbols y ∈ Y that are in 1-level group n.
Proof: For each internal node n ∈ I(T(P(Y))), the codewords {γ_Y(n_k) : k ∈ {1, ..., K(n)}} share a common prefix and satisfy the prefix condition. Deleting the common prefix from each codeword in {γ_Y(n_k) : k = 1, ..., K(n)} yields a collection of codeword suffixes that also satisfy the prefix condition. Thus if l(n) is the description length for n, then the collection of lengths {l(n_k) - l(n) : k = 1, ..., K(n)} satisfies the Kraft Inequality:

Σ_{k=1}^{K(n)} 2^{-(l(n_k) - l(n))} ≤ 1.

(Here l(r) = 0 by definition.) We wish to minimize the expected length

l(P(Y)) = Σ_n q(n) l(n)

of the matched code over all l(n) that satisfy

Σ_{k=1}^{K(n)} 2^{-(l(n_k) - l(n))} = 1 for all n ∈ I(P(Y)) = {n ∈ T(P(Y)) : K(n) > 0}.

(We here neglect the integer constraint on code lengths.) If u(n) = 2^{-l(n)}, then

l(P(Y)) = Σ_n q(n) log(1/u(n))

and u(·) must satisfy

Σ_{k=1}^{K(n)} u(n_k) = u(n) for all n ∈ I(P(Y)).

Since l(P(Y)) is a convex function of u(·), the constrained minimization can be posed as an unconstrained minimization using the Lagrangian

J = Σ_n q(n) log(1/u(n)) + Σ_{n ∈ I(P(Y))} λ(n) ( u(n) - Σ_{k=1}^{K(n)} u(n_k) ).

Differentiating with respect to u(n_k) and setting the derivative to 0, we get

-q(n_k)/u(n_k) log e + λ(n_k) - λ(n) = 0, if n_k is an internal node;
-q(n_k)/u(n_k) log e - λ(n) = 0, if n_k is a leaf node.    (1)

First consider all n_k's at the lowest level of the tree that have the same parent n. We have

q(n_k)/u(n_k) log e = -λ(n), k = 1, ..., K(n), with Σ_{k=1}^{K(n)} u(n_k) = u(n).    (2)

Thus we get

u(n_k) = q(n_k) u(n) / Σ_{j=1}^{K(n)} q(n_j) = Q(n_k) u(n) / (Q(n) - q(n)), k = 1, ..., K(n),

giving

λ(n) = -(Q(n) - q(n)) log e / u(n).    (3)

Other nodes at the lowest level are processed in the same way.

Now fix some n_1 two levels up from the tree bottom, and consider any node n_1k.

Case 1: If n_1k has children that are at the lowest level of the tree, then by (1),

-q(n_1k)/u(n_1k) log e + λ(n_1k) - λ(n_1) = 0.    (4)

Substituting (3) into (4) gives

-q(n_1k)/u(n_1k) log e - (Q(n_1k) - q(n_1k)) log e / u(n_1k) - λ(n_1) = 0,    (5)

that is

Q(n_1k)/u(n_1k) log e = -λ(n_1).    (6)

Case 2: If n_1k has no children, then by (1),

q(n_1k)/u(n_1k) log e = -λ(n_1),    (7)

which is the same as (6), since Q(n_1k) = q(n_1k) for a leaf node.

Considering all such n_1k, k = 1, ..., K(n_1), we have

Q(n_1k)/u(n_1k) log e = -λ(n_1), k = 1, ..., K(n_1), with Σ_{k=1}^{K(n_1)} u(n_1k) = u(n_1),

which is the same problem as (2) and is solved in the same manner.

Continuing in this way (from the bottom to the top of T(P(Y))), we finally obtain

u(n_k) = Q(n_k) u(n) / (Q(n) - q(n)), k = 1, ..., K(n), for all n ∈ I(P(Y)).    (8)

Setting l*(n_k) = -log u(n_k) = l*(n) - log2( Q(n_k)/(Q(n) - q(n)) ) completes the proof. □
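The length function of Theorem 2 can be evaluated directly on a partition tree. The following minimal Python sketch is purely illustrative (the Node class, its field names, and the example probabilities are assumptions, not part of the disclosure); it ignores the integer constraint on code lengths.

```python
# Sketch: optimal matched-code description lengths of Theorem 2,
# l*(child) = l*(n) + log2( (Q(n) - q(n)) / Q(child) ), ignoring integrality.
from math import log2

class Node:
    def __init__(self, prob=0.0, children=None):
        self.prob = prob                  # q(n): probability of this 1-level group
        self.children = children or []    # child nodes in the partition tree

def subtree_prob(node):
    """Q(n): probability of node n plus all of its descendants."""
    return node.prob + sum(subtree_prob(c) for c in node.children)

def optimal_lengths(root):
    """Return {node: l*(n)} with l*(root) = 0."""
    lengths = {root: 0.0}
    stack = [root]
    while stack:
        n = stack.pop()
        rest = subtree_prob(n) - n.prob   # Q(n) - q(n): mass below node n
        for child in n.children:
            lengths[child] = lengths[n] + log2(rest / subtree_prob(child))
            stack.append(child)
    return lengths

# Tiny usage example with made-up probabilities (root is the empty root).
leaf_a, leaf_b = Node(0.1), Node(0.2)
root = Node(0.0, [Node(0.3, [leaf_a, leaf_b]), Node(0.4)])
for node, length in optimal_lengths(root).items():
    print(round(length, 3))
```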
Thus, Theorem 2 provides a method of calculating the optimal length function. We now present three strategies for building matched codes that approximate the optimal length function of Theorem 2. Figure 18 shows the process of building matched codes. At step 1801 the process begins at the root. Then at step 1802 one of three strategies (Shannon, Huffman, or arithmetic code design) is applied to each node's immediate offspring based on their normalized subtree probabilities. At step 1803 the ancestors' codewords are concatenated to each node's codeword.
For any node n with K(n) > 0, the first matched code γ_Y^(S) describes the step from n to n_k using a Shannon code with alphabet {1, ..., K(n)} and p.m.f. {Q(n_k)/Σ_{j=1}^{K(n)} Q(n_j)}_{k=1}^{K(n)}; the resulting description lengths are l^(S)(r) = 0 and

l^(S)(n_k) = l^(S)(n) + ⌈log2( Σ_{j=1}^{K(n)} Q(n_j) / Q(n_k) )⌉.

Codes γ_Y^(H) and γ_Y^(A) replace the Shannon codes of γ_Y^(S) with Huffman and arithmetic codes, respectively, matched to the same p.m.f.s.
Matched Huffman Coding
As an example, build the matched Huffman code for the partition in Figure 10A, working from the top to the bottom of the partition tree T. A flow diagram illustrating the steps of this process is shown in Figure 11. At step 1101 we begin at the root node and design a Huffman code on the set of nodes descending from T's root according to their subtree probabilities, i.e. nodes {(a3, a6), (a7)} with p.m.f.

{p_Y(a3) + p_Y(a6), p_Y(a7) + p_Y(a0) + p_Y(a1) + p_Y(a2) + p_Y(a4) + p_Y(a5)} = {.21, .79};

a Huffman code for these two branches is {0, 1}. Referring to Figure 10C we see the calculated codewords for the two nodes below the root node (given in parentheses) are 0 and 1.

At step 1102, for each subsequent tree node n with K(n) > 0, consider {n_k}_{k=1}^{K(n)} as a new set, and design a Huffman code on this set with p.m.f. {Q(n_k)/Σ_{j=1}^{K(n)} Q(n_j)}_{k=1}^{K(n)}. We first design a Huffman code for group (a7)'s children {(a0), (a1), (a2)} according to p.m.f.

{p_Y(a0)/Q, p_Y(a1)/Q, (p_Y(a2) + p_Y(a4) + p_Y(a5))/Q} = {.10/Q, .19/Q, .37/Q},

where Q = p_Y(a0) + p_Y(a1) + p_Y(a2) + p_Y(a4) + p_Y(a5) = .66; a Huffman code for this set of branches is {00, 01, 1}. Then we design Huffman code {0, 1} for groups {(a4), (a5)} with p.m.f. {p_Y(a4)/(p_Y(a4) + p_Y(a5)), p_Y(a5)/(p_Y(a4) + p_Y(a5))} = {.11/.17, .06/.17}. The full codeword for any node n is the concatenation of the codewords of all nodes traversed in moving from root r(T) to node n in T. The codewords for this example are shown in Figure 10C.
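A compact Python sketch of this top-down construction follows. The dictionary-based tree layout, the helper names, and the probabilities attached to each node (inferred from the worked example above) are illustrative assumptions; the resulting codewords agree with the branch codes derived above up to the arbitrary 0/1 labeling within each Huffman code.

```python
# Sketch: matched Huffman coding on a partition tree. At each node, design a
# Huffman code over the children's subtree probabilities, then prepend the
# parent's codeword.
import heapq

def huffman_codes(probs):
    """Bit strings of a plain Huffman code for the given probabilities."""
    if len(probs) == 1:
        return ['']
    heap = [(p, i, None, None) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    nxt = len(probs)
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, (a[0] + b[0], nxt, a, b))
        nxt += 1
    codes = [''] * len(probs)
    def walk(node, prefix):
        if node[2] is None:
            codes[node[1]] = prefix
        else:
            walk(node[2], prefix + '0')
            walk(node[3], prefix + '1')
    walk(heap[0], '')
    return codes

def Q(node):
    """Subtree probability Q(n)."""
    return node['prob'] + sum(Q(c) for c in node['children'])

def matched_huffman(node, prefix=''):
    """Codeword for every node: sibling Huffman code on subtree probabilities,
    concatenated with the parent's codeword."""
    out = {node['name']: prefix}
    kids = node['children']
    if kids:
        for child, suffix in zip(kids, huffman_codes([Q(c) for c in kids])):
            out.update(matched_huffman(child, prefix + suffix))
    return out

# Partition of Figure 10A; probabilities inferred from the example above.
tree = {'name': 'root', 'prob': 0.0, 'children': [
    {'name': '(a3,a6)', 'prob': 0.21, 'children': []},
    {'name': '(a7)', 'prob': 0.13, 'children': [
        {'name': '(a0)', 'prob': 0.10, 'children': []},
        {'name': '(a1)', 'prob': 0.19, 'children': []},
        {'name': '(a2)', 'prob': 0.20, 'children': [
            {'name': '(a4)', 'prob': 0.11, 'children': []},
            {'name': '(a5)', 'prob': 0.06, 'children': []}]}]}]}
print(matched_huffman(tree))
```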
Any "matched Huffman code'" TyX* is shown to be optimal by Theorem 3.
Theorem 3 Given a partition VQ?), a matched Huffman code for VQ achieves the optimal expected rate over all matched codes for VQA) . Proof: Let 7 be the partition tree of TQ . The codelength of a node n G 1 is denoted by -(n).
The average length I for VQ is
ϊ = Igτ9(n)-(n) = (Q(fc)Z(fc) +Δ7(fc)
where for each fc G {l, ..., K(r)}, Al(k) = ∑kn Tq(kή) (- (fcn) - - (fc)) .
Note that l^k-ι Q k)l k) and {Δ-(fc)} can be minimized independently. Thus
min - = min min ΔJ(fc) .
Figure imgf000048_0001
In matched Huffman coding, working from the top to the bottom of the partition tree, we
first minimize 2-^ Q(fc)'(fc) over all integer lengths -(fc) by employing Huffman codes on Q(k).
We then minimize each Δ-(fc) over all integer length codes by similarly breaking each down layer by layer and minimizing the expected length at each layer. D
Matched Arithmetic Coding
In traditional arithmetic coding (with no side information), the description length of data sequence y^n is l(y^n) = ⌈-log p_Y(y^n)⌉ + 1, where p_Y(y^n) is the probability of y^n. In designing the matched arithmetic code γ_Y^(A) for a given partition P(Y), we use the decoder's knowledge of x^n to decrease the description length of y^n. The following example, illustrated in Figures 12B and 12C, demonstrates the techniques of matched arithmetic coding for the partition given in Figure 10A.

In traditional arithmetic coding, as shown in Figure 12A, data sequence Y^n is represented by an interval of the [0, 1) line. We describe Y^n by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. We find the interval for y^n recursively, by first breaking [0, 1) into intervals corresponding to all possible values of y_1, then breaking the interval for the observed Y_1 = y_1 into subintervals corresponding to all possible values of Y_1 y_2, and so on. Given the interval A ⊆ [0, 1) for Y^k for some 0 ≤ k < n (the interval for Y^0 is [0, 1)), the subintervals for {Y^k y_{k+1}} are ordered subintervals of A with lengths proportional to p_Y(y_{k+1}).
In matched arithmetic coding for partition P(Y), as shown in Figure 12B, we again describe Y^n by describing the mid-point of a recursively constructed subinterval of [0, 1). In this case, however, if the 1-level group containing Y_i resides at depth d_0 of T(P(Y)), we break [0, 1) into intervals corresponding to nodes in B = {n : (K(n) = 0 ∧ d(n) ≤ d_0) ∨ (K(n) > 0 ∧ d(n) = d_0)}. The interval for each n ∈ B with parent n_0 has length proportional to

p^(A)(n) = p^(A)(n_0) Q(n) / (Q(n_0) - q(n_0))

(here p^(A)(r) is defined to equal 1 for the unique node r at depth 0). Refining the interval for sequence Y^{i-1} to find the subinterval for Y^i involves finding the 1-level group n ∈ P(Y) such that Y_i ∈ n and using d(n) to calculate the appropriate p^(A) values and break the current interval accordingly. We finally describe Y^n by describing the center of its corresponding subinterval to an accuracy sufficient to distinguish it from its neighboring subintervals. To ensure unique decodability,

l^(A)(y^n) = ⌈-log p^(A)(y^n)⌉ + 1,

where p^(A)(y^n) is the length of the subinterval corresponding to string y^n. Given a fixed partition P(Y), for each y ∈ Y denote the node where symbol y resides by n(y), and let n_p(y) represent the parent of node n(y). Then

l^(A)(y^n) = ⌈-log Π_{i=1}^n p^(A)(n(y_i))⌉ + 1 ≤ Σ_{i=1}^n l*(y_i) + 2,

where l*(·) is the optimal length function specified in Theorem 2. Thus the description length l^(A)(y^n) in coding data sequence y^n using a 1-dimensional matched arithmetic code satisfies (1/n) l^(A)(y^n) ≤ (1/n) Σ_{i=1}^n l*(y_i) + 2/n, giving a normalized description length arbitrarily close to the optimum for n sufficiently large. We deal with floating point precision issues using the same techniques applied to traditional arithmetic codes.
As an example, again consider the p.m.f. of Table 3 and the partition of Figure 10A. If Y_1 ∈ {a3, a6, a7}, [0, 1) is broken into subintervals [0, .21) for group (a3, a6) and [.21, 1) for group (a7), since

p^(A)((a3, a6)) = p^(A)(r) Q((a3, a6)) / (Q(r) - q(r)) = .21
p^(A)((a7)) = p^(A)(r) Q((a7)) / (Q(r) - q(r)) = .79.

If Y_1 ∈ {a0, a1, a2}, [0, 1) is broken into subintervals [0, .21) for group (a3, a6), [.21, .33) for group (a0), [.33, .56) for group (a1), and [.56, 1) for group (a2), since

p^(A)((a0)) = p^(A)((a7)) Q((a0)) / (Q((a7)) - q((a7))) = .79(.10/.66) = .12
p^(A)((a1)) = p^(A)((a7)) Q((a1)) / (Q((a7)) - q((a7))) = .79(.19/.66) = .23
p^(A)((a2)) = p^(A)((a7)) Q((a2)) / (Q((a7)) - q((a7))) = .79(.37/.66) = .44.

Finally, if Y_1 ∈ {a4, a5}, [0, 1) is broken into subintervals [0, .21) for group (a3, a6), [.21, .33) for group (a0), [.33, .56) for group (a1), [.56, .84) for group (a4), and [.84, 1) for group (a5), since

p^(A)((a4)) = p^(A)((a2)) Q((a4)) / (Q((a2)) - q((a2))) = .44(.11/(.37 - .2)) = .2847
p^(A)((a5)) = p^(A)((a2)) Q((a5)) / (Q((a2)) - q((a2))) = .44(.06/(.37 - .2)) = .1553.

Figure 12B shows these intervals.

Figure 12C shows the recursive interval refinement procedure for Y^5 = (a7 a3 a4 a1 a2). Symbol Y_1 = a7 gives interval [.21, 1) of length .79 (indicated by the bold line). Symbol Y_2 = a3 refines the above interval to the interval [.21, .3759) of length .21 · .79 = .1659. Symbol Y_3 = a4 refines that interval to the interval [.3024, .3500) of length .28 · .1659 = .0472. This procedure continues until finally we find the interval [0.3241, 0.3289).

Notice that the intervals of some symbols overlap in the matched arithmetic code. For example, the intervals associated with symbols a4 and a5 subdivide the interval associated with symbol a2 in the previous example. These overlapping intervals correspond to the situation where one symbol's description is the prefix of another symbol's description in matched Huffman coding. Again, for any legitimate partition P(Y), the decoder can uniquely distinguish between symbols with overlapping intervals to correctly decode Y^n using its side information about X^n.
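The interval computation of a single refinement step can be sketched as follows, reusing the Q helper and the example tree from the matched Huffman sketch above. The function names are illustrative assumptions, and the printed breakpoints are the unrounded versions of the rounded values quoted in the example.

```python
# Sketch: one refinement step of the matched arithmetic code. The interval is
# split among the nodes of B = {n : (K(n)=0 and d(n) <= d0) or (K(n)>0 and
# d(n)=d0)}, with lengths proportional to
# p^(A)(n) = p^(A)(parent) * Q(n) / (Q(parent) - q(parent)).
def refine(interval, root, d0):
    lo, hi = interval
    width = hi - lo
    pieces = []
    def collect(node, depth, weight):
        kids = node['children']
        if depth == d0 or not kids:          # node belongs to B
            pieces.append((node['name'], weight))
            return
        for c in kids:
            collect(c, depth + 1, weight * Q(c) / (Q(node) - node['prob']))
    for c in root['children']:
        collect(c, 1, Q(c))                  # children of the empty root: weight Q(c)
    total = sum(w for _, w in pieces)        # equals 1 when the interval is [0, 1)
    out, cur = {}, lo
    for name, w in pieces:
        out[name] = (cur, cur + width * w / total)
        cur += width * w / total
    return out

# First step of the worked example: Y1 lies in a depth-2 group (d0 = 2).
print(refine((0.0, 1.0), tree, 2))
```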
Optimal Partitions: Definitions and Properties
The above describes optimal Shannon, Huffman, and arithmetic codes for matched lossless side-information coding with a given partition P(Y). The partition yielding the best performance remains to be found. Here we describe finding optimal partitions for Huffman and arithmetic coding.

Given a partition P(Y), let l^(H)_{P(Y)} and l*_{P(Y)} be the Huffman and optimal description lengths respectively for P(Y). We say that P(Y) is optimal for matched Huffman side-information coding on p(x, y) if E l^(H)_{P(Y)}(Y) ≤ E l^(H)_{P'(Y)}(Y) for any other partition P'(Y) for p(x, y) (and therefore, by Theorems 1 and 3, E l^(H)_{P(Y)}(Y) ≤ E l(Y) where l is the description length for any other instantaneous lossless side-information code on p(x, y)). We say that P(Y) is optimal for matched arithmetic side-information coding on p(x, y) if E l*_{P(Y)}(Y) ≤ E l*_{P'(Y)}(Y) for any other partition P'(Y) for p(x, y).

Some properties of optimal partitions follow. Lemma 2 demonstrates that there is no loss of generality associated with restricting our attention to partitions P(Y) for which the root is the only empty internal node. Lemma 3 shows that each subtree of an optimal partition tree is an optimal partition on the sub-alphabet it describes. Lemmas 2 and 3 hold under either of the above definitions of optimality. Lemma 4 implies that an optimal partition for matched Huffman coding is not necessarily optimal for arithmetic coding, as shown in Corollary 1. Properties specific to optimal partitions for Huffman coding or optimal partitions for arithmetic coding follow.
Lemma 2 There exists an optimal partition P*(Y) for p(x, y) for which every node except for the root of P*(Y) is non-empty and no node has exactly one child.

Proof: If any non-root node n of partition P(Y) is empty, then removing n, so that {n_k}_{k=1}^{K(n)} descend directly from n's parent, gives new partition P'(Y). Any matched code on P(Y), including the optimal matched code on P(Y), is a matched code on P'(Y). If n has exactly one child, then combining n and its child yields a legitimate partition P'(Y); the optimal matched code for P'(Y) yields expected rate no worse than that of the optimal matched code for P(Y). □
Lemma 3 If T_1, ..., T_m are the subtrees descending from any node n in optimal partition P*(Y) for p(x, y), then the tree in which {T_1, ..., T_m} descend from an empty root is identical to T(P'(Y')), where P'(Y') is an optimal partition of Y' = ∪_{i=1}^m Y_{T_i} for p(x, y).

Proof: Since the matched code's description can be broken into a description of n followed by a matched code on {T_1, ..., T_m}, and the corresponding description lengths add, the partition described by T(P*(Y)) cannot be optimal unless the partition described by {T_1, ..., T_m} is. □
Lemma 4 Let p_1 and p_2 denote two p.m.f.s for alphabets Y_1 and Y_2 respectively, and use H(p) and R^(H)(p) to denote the entropy and expected Huffman coding rate, respectively, for p.m.f. p. Then H(p_1) > H(p_2) does not imply R^(H)(p_1) ≥ R^(H)(p_2).

Proof: The following example demonstrates this property. Let p_1 = {0.5, 0.25, 0.25} and p_2 = {0.49, 0.49, 0.02}; then H(p_1) = 1.5 and H(p_2) = 1.12. However, the rate of the Huffman code for p_1 is 1.5, while that for p_2 is 1.51. □

Corollary 1 The optimal partitions for matched Huffman side-information coding and matched arithmetic side-information coding are not necessarily identical.

Proof: The following example demonstrates this property. Let alphabet Y = {b_0, b_1, b_2, b_3, b_4} have marginal p.m.f. {0.49, 0.01, 0.25, 0.24, 0.01}, and suppose that P_1(Y) = {(b_0, b_1), (b_2), (b_3, b_4)} and P_2(Y) = {(b_0), (b_2, b_3), (b_1, b_4)} are partitions of Y for p(x, y). The node probabilities of P_1(Y) and P_2(Y) are p_1 = {0.5, 0.25, 0.25} and p_2 = {0.49, 0.49, 0.02}, respectively. By the proof of Lemma 4, P_1(Y) is a better partition for Huffman coding while P_2(Y) is better for arithmetic coding. □
In the arguments that follow, we show that there exist pairs of groups (G_i, G_j) such that G_i ∩ G_j = ∅ but G_i and G_j cannot both descend from the root of an optimal partition. This result is derived by showing conditions under which there exists a group G* that combines the members of G_i and G_j and for which replacing {G_i, G_j} with {G*} in P(Y) guarantees a performance improvement.

The circumstances under which combined groups guarantee better performance than separate groups differ for arithmetic and Huffman codes. Theorems 4 and 5 treat the two cases in turn. The following definitions are needed to describe those results.
We say that 1-level groups G_1 and G_2 (or nodes n(G_1) and n(G_2)) can be combined under p(x, y) if each pair y_1 ∈ G_1, y_2 ∈ G_2 can be combined under p(x, y). If G_i, G_j ∈ P(Y), so that G_i and G_j extend directly from the root r of T(P(Y)), nodes I and J are the roots of T(G_i) and T(G_j), and G_0 denotes the 1-level group at some node n_0 in T(G_j), we say that G_i can be combined with G_j at n_0 if (1) I can be combined with n_0 and each of n_0's descendants in T(G_j), and (2) n_0 and each of n_0's ancestors in T(G_j) can be combined with I and each of I's descendants in T(G_i). The result of combining G_i with G_j at n_0 is a new group G*. Group G* modifies G_j by replacing G_0 with 1-level group (I, G_0) and adding the descendants of I (in addition to the descendants of G_0) as descendants of (I, G_0) in T(G*). Figure 10D shows an example where groups G_i = ((a2) : {(a4), (a5)}) and G_j = ((a7) : {(a1), (a3)}) of partition P(Y) = {(a0), G_i, G_j, (a6)} combine at (a2). The modified partition is P*(Y) = {(a0), G*, (a6)}, where G* = ((a2, a7) : {(a1), (a3), (a4), (a5)}).
Lemma 5 For any constant A > 0, the function f(x) = x log(1 + A/x) is monotonically increasing in x for all x > 0.

Proof: The first-order derivative of f(x) is f'(x) = log(1 + A/x) - A/(x + A). Let u = A/x and g(u) = f'(x)|_{x=A/u} = log(1 + u) - u/(u + 1); then u ≥ 0 and g(0) = 0. The first-order derivative of g(u) is g'(u) = u/(u + 1)^2. For any u > 0, g'(u) > 0, thus g(u) > 0. So for any x > 0, f'(x) > 0; that is, f(x) is monotonically increasing in x. □
Theorem 4 Let P(Y) = {G_1, ..., G_m} be a partition of Y under p(x, y). Suppose that G_i ∈ P(Y) can be combined with G_j ∈ P(Y) at G_0, where G_0 is the 1-level group at some node n_0 of T(G_j). Let P*(Y) be the resulting partition. Then E l*_{P*(Y)}(Y) ≤ E l*_{P(Y)}(Y).

Proof: Let n_0, n_1, ..., n_M = J denote the nodes on the path in T(G_j) from n_0 up to the root J, so that n_0's parent is n_1. Let S_1 be the set of nodes on this path excluding node J, let S_2 = {n ∈ T(G_j) : n is the sibling of some node in S_1}, and let S_3 be the set of nodes on this path excluding node n_0. For any node n ∈ T(P(Y)), let Q_n and q_n denote the subtree and node probabilities respectively of node n in T(P(Y)), and let ΔQ_n denote the portion of Q_n that excludes q_n and the subtree probability of n's child on the path to n_0. Figure 16 shows the subtree probabilities associated with combining G_i with G_j at G_0; let the resulting new group be G_k.

Note that the sum of the subtree probabilities of G_i and G_j equals the subtree probability of G_k, and thus the optimal average rates of the groups in P(Y) ∩ {G_i, G_j}^c are not changed by the combination. Thus if (l_I, l_J) and (l'_I, l'_J) are the optimal average rates for (G_i, G_j) in P(Y) and P*(Y) respectively, then (l_I - l'_I) + (l_J - l'_J) gives the total rate cost of using partition P(Y) rather than partition P*(Y). Expanding l_I and l'_I as sums of terms of the form Q log(·) over the nodes involved in the combination and cancelling the portion of the average rate unchanged by the combination, the remaining difference is bounded below by zero using the fact that x log(1 + c/x) is monotonically increasing in x > 0 for any constant c > 0 (Lemma 5); hence l_I - l'_I ≥ 0. Similarly, cancelling the portion of l_J unchanged by the combination and again applying the monotonicity of x log(1 + c/x) gives l_J - l'_J ≥ 0. Since the optimal rates of G_i and G_j do not increase after combining, we have the desired result. □
Unfortunately, Theorem 4 does not hold for matched Huffman coding. Theorem 5 shows a result that does apply in Huffman coding.
Theorem 5 Given partition P(Y) of Y on p(x, y), if G_i, G_j ∈ P(Y) satisfy (1) G_i is a 1-level group and (2) G_i can be combined with G_j at root J of T(G_j) to form partition P*(Y), then E l^(H)_{P*(Y)}(Y) ≤ E l^(H)_{P(Y)}(Y).

Proof: Let γ_Y denote the matched Huffman code for P(Y), and use c_I and c_J to denote this code's binary descriptions for nodes I and J. The binary description for any symbol in G_i equals c_I (γ_Y(y) = c_I for each y ∈ G_i), while the binary description for any symbol in G_j has prefix c_J (γ_Y(y) = c_J α(y) for each y ∈ G_j, where α is a matched code for G_j). Let c be the shorter of c_I and c_J. Since γ_Y is a matched Huffman code for P(Y) and P*(Y) is a partition of Y on p(x, y),

γ*_Y(y) = c          if y ∈ G_i,
γ*_Y(y) = c α(y)     if y ∈ G_j,
γ*_Y(y) = γ_Y(y)     otherwise,

is a matched code for P*(Y). Further, |c| ≤ |c_I| and |c| ≤ |c_J| imply that the expected length of γ*_Y(Y) is less than or equal to the expected length of γ_Y(Y) (but perhaps greater than the expected length of the matched Huffman code for P*(Y)). □

General Lossless Instantaneous MASCs: Problem Statement, Partition Pairs, and Optimal Matched Codes
We here drop the side-information coding assumption that X (or Y) can be decoded independently and consider MASCs in the case where it may be necessary to decode the two symbol descriptions together. Here, the partition P(Y) used in lossless side-information coding is replaced by a pair of partitions (P(X), P(Y)). As in side-information coding, P(X) and P(Y) describe the prefix and equivalence relationships for descriptions {γ_X(x) : x ∈ X} and {γ_Y(y) : y ∈ Y}, respectively. Given constraints on (P(X), P(Y)) that are both necessary and sufficient to guarantee that a code with the prefix and equivalence relationships described by (P(X), P(Y)) yields an MASC that is both instantaneous and lossless, Theorem 1 generalizes easily to this coding scenario, so every general instantaneous lossless MASC can be described as a matched code on P(X) and a matched code on P(Y) for some (P(X), P(Y)) satisfying the appropriate constraints.
In considering partition pairs (P(X), P(Y)) for use in lossless instantaneous MASCs, it is necessary but not sufficient that each be a legitimate partition for side-information coding on its respective alphabet. (If P(Y) fails to uniquely describe Y when the decoder knows X exactly, then it must certainly fail for joint decoding as well. The corresponding statement for P(X) also holds. These conditions are, however, insufficient in the general case, because complete knowledge of X may be required for decoding with P(Y) and vice versa.) Necessary and sufficient conditions for (P(X), P(Y)) to give an instantaneous MASC and necessary and sufficient conditions for (P(X), P(Y)) to give a lossless MASC follow.

For (P(X), P(Y)) to yield an instantaneous MASC, the decoder must recognize when it reaches the end of γ_X(X) and γ_Y(Y). The decoder proceeds as follows. We think of a matched code on P as a multi-stage description, with each stage corresponding to a level in T(P). Starting at the roots of T(P(X)) and T(P(Y)), the decoder reads the first-stage descriptions of γ_X(X) and γ_Y(Y), traversing the described paths from the roots to nodes n_x and n_y in partitions T(P(X)) and T(P(Y)) respectively. (The decoder can determine that it has reached the end of a single-stage description if and only if the matched code is itself instantaneous.) If either of the nodes reached is empty, then the decoder knows that it must read more of the description; thus we assume, without loss of generality, that n_x and n_y are not empty. Let T_x and T_y be the subtrees descending from n_x and n_y (including n_x and n_y respectively). (The subtree descending from a leaf node is simply that node.) For instantaneous coding, one of the following conditions must hold:

(A) X ∈ T_x or n_y is a leaf implies that Y ∈ n_y, and Y ∈ T_y or n_x is a leaf implies that X ∈ n_x;
(B) X ∈ T_x implies that Y ∉ n_y;
(C) Y ∈ T_y implies that X ∉ n_x.

Under condition (A), the decoder recognizes that it has reached the end of γ_X(X) and γ_Y(Y). Under condition (B), the decoder recognizes that it has not reached the end of γ_Y(Y) and reads the next stage description, traversing the described path in T(P(Y)) to node n'_y with subtree T'_y. Condition (C) similarly leads to a new node n'_x and subtree T'_x. If none of these conditions holds, then the decoder cannot determine whether to continue reading one or both of the descriptions, and the code cannot be instantaneous. The decoder continues traversing T(P(X)) and T(P(Y)) until it determines the 1-level groups n_x and n_y with X ∈ n_x and Y ∈ n_y. At each step before the decoding halts, one (or more) of the conditions (A), (B), and (C) must be satisfied.
For (P(X), P(Y)) to give a lossless MASC, for any (x, y) ∈ X × Y with p(x, y) > 0, following the above procedure on (γ_X(x), γ_Y(y)) must lead to final nodes (n_x, n_y) that satisfy:

(D) (x, y) ∈ n_x × n_y, and for any other x' ∈ n_x and y' ∈ n_y, p(x, y') = p(x', y) = p(x', y') = 0.

The following lemma gives a simplified test for determining whether partition pair (P(X), P(Y)) yields a lossless instantaneous MASC. We call this test the MASC prefix condition. Lemma 6 reduces to Lemma 1 when either P(X) = {{x} : x ∈ X} or P(Y) = {{y} : y ∈ Y}. In either of these cases, the general MASC problem reduces to the side-information problem described above.
Lemma 6 Partition pair (P(X), P(Y)) for p(x, y) yields a lossless instantaneous MASC if and only if for any x, x' ∈ X such that {γ_X(x), γ_X(x')} does not satisfy the prefix condition, {γ_Y(y) : y ∈ A_x ∪ A_x'} satisfies the prefix condition, and for any y, y' ∈ Y such that {γ_Y(y), γ_Y(y')} does not satisfy the prefix condition, {γ_X(x) : x ∈ B_y ∪ B_y'} satisfies the prefix condition. Here B_y = {x ∈ X : p(x, y) > 0}.

Proof: First, we show that if lossless instantaneous MASC decoding fails, then the MASC prefix condition must be violated. If lossless instantaneous MASC decoding fails, then there must be a time in the decoding procedure at which we decode to nodes (n_x, n_y) with subtrees T_x and T_y, but one of the following occurs:
(1) none of the conditions (A), (B), or (C) is satisfied;
(2) condition (A) is satisfied, but condition (D) is violated.

In case (1), one of the following must happen: (a) the decoder determines that Y ∈ n_y but cannot determine whether or not X ∈ n_x; (b) the decoder determines that X ∈ n_x but cannot determine whether or not Y ∈ n_y; (c) the decoder cannot determine whether or not Y ∈ n_y or whether or not X ∈ n_x. If (a) occurs, then there must exist y, y' ∈ n_y, x ∈ n_x, and x' ∈ T_x ∩ n_x^c with p(x, y)p(x', y) > 0 or p(x, y)p(x', y') > 0, which means x, x' ∈ B_y ∪ B_y'. If (b) occurs, then there must exist x, x' ∈ n_x, y ∈ n_y, and y' ∈ T_y ∩ n_y^c with p(x, y)p(x, y') > 0 or p(x, y)p(x', y') > 0, which means y, y' ∈ A_x ∪ A_x'. If (c) occurs, then there must exist x ∈ n_x, x' ∈ T_x ∩ n_x^c, y ∈ n_y, and y' ∈ T_y ∩ n_y^c with p(x, y)p(x', y') > 0 or p(x', y)p(x, y') > 0, which means y, y' ∈ A_x ∪ A_x'. Thus in subcases (a), (b), and (c) of case (1) the MASC prefix condition is violated.

In case (2), assume the true values of (X, Y) are (x, y); then one of the following must occur: (a) we decode Y = y but cannot decode X; (b) we decode X = x but cannot decode Y; (c) we can decode neither X nor Y. If (a) occurs, then there must exist an x' ∈ n_x with p(x', y) > 0, which means x, x' ∈ B_y. If (b) occurs, then there must exist a y' ∈ n_y with p(x, y') > 0, which means y, y' ∈ A_x. If (c) occurs, then there must exist x' ∈ n_x and y' ∈ n_y with p(x', y') > 0 or p(x, y') > 0 or p(x', y) > 0, which means x, x' ∈ B_y ∪ B_y' or y, y' ∈ A_x ∪ A_x'. Thus in subcases (a), (b), and (c) of case (2) the MASC prefix condition is likewise violated.

Next, we show that if the MASC prefix condition is violated, then we cannot achieve a lossless instantaneous MASC. Here we use n_x and n_y to denote the nodes of the partition tree satisfying x ∈ n_x and y ∈ n_y. We assume symbols x, x' ∈ X and y, y' ∈ Y satisfy y, y' ∈ A_x ∪ A_x' and x, x' ∈ B_y ∪ B_y', but γ_X(x) and γ_X(x') do not satisfy the prefix condition and γ_Y(y) and γ_Y(y') do not satisfy the prefix condition; i.e. the MASC prefix condition is violated. Then one of the following must hold:

(1) γ_X(x) = γ_X(x') and γ_Y(y) = γ_Y(y');
(2) γ_X(x) = γ_X(x') and γ_Y(y) is the prefix of γ_Y(y');
(3) γ_Y(y) = γ_Y(y') and γ_X(x) is the prefix of γ_X(x');
(4) γ_X(x) is the prefix of γ_X(x') and γ_Y(y) is the prefix of γ_Y(y').

In case (1), there must be a time in the decoding procedure at which the decoder stops at (n_x, n_y) and determines that X ∈ n_x and Y ∈ n_y. However, since y, y' ∈ A_x ∪ A_x', all of the following are possible given X ∈ n_x and Y ∈ n_y: (a) y ∈ A_x ∩ A_x'^c and y' ∈ A_x' ∩ A_x^c; (b) y ∈ A_x ∩ A_x'^c and y' ∈ A_x ∩ A_x'; (c) y, y' ∈ A_x ∩ A_x'. Thus the decoder cannot determine which of the following symbols was described: (x, y), (x, y'), (x', y) or (x', y').

In case (2), there must be a time in the decoding procedure at which the decoder reaches (n_x, n_y) and determines that X ∈ n_x. However, as in case (1), all three possibilities can occur, and the decoder does not have extra information to determine whether or not Y ∈ n_y.

In case (3), there must be a time in the decoding procedure at which the decoder reaches (n_x, n_y) and determines that Y ∈ n_y. However, as in case (1), all three possibilities can occur, and the decoder does not have extra information to determine whether or not X ∈ n_x.

In case (4), there must be a time in the decoding procedure at which the decoder reaches (n_x, n_y) and needs to determine whether or not X ∈ n_x and whether or not Y ∈ n_y. However, again as in case (1), all three possibilities can occur, and the decoder does not have extra information to instantaneously decode. □
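The MASC prefix condition of Lemma 6 lends itself to a direct, brute-force check of a candidate code pair. The short Python sketch below transcribes the lemma's test; the dictionary representation of the codes and of the joint p.m.f. is an illustrative assumption.

```python
# Sketch: check the MASC prefix condition for codeword maps gamma_x, gamma_y
# (dicts symbol -> bit string) and joint p.m.f. p (dict {(x, y): probability}).
def violates_prefix(a, b):
    """True if one codeword is a prefix of the other (equality included)."""
    return a.startswith(b) or b.startswith(a)

def prefix_free(codewords):
    cw = list(codewords)
    return all(not violates_prefix(cw[i], cw[j])
               for i in range(len(cw)) for j in range(i + 1, len(cw)))

def satisfies_masc_prefix(gamma_x, gamma_y, p):
    A = {x: {y for (xx, y), pr in p.items() if xx == x and pr > 0} for x in gamma_x}
    B = {y: {x for (x, yy), pr in p.items() if yy == y and pr > 0} for y in gamma_y}
    xs, ys = list(gamma_x), list(gamma_y)
    for i, x in enumerate(xs):
        for xp in xs[i + 1:]:
            if violates_prefix(gamma_x[x], gamma_x[xp]):
                # Y codewords over A_x union A_x' must be prefix free
                if not prefix_free([gamma_y[y] for y in A[x] | A[xp]]):
                    return False
    for i, y in enumerate(ys):
        for yp in ys[i + 1:]:
            if violates_prefix(gamma_y[y], gamma_y[yp]):
                # X codewords over B_y union B_y' must be prefix free
                if not prefix_free([gamma_x[x] for x in B[y] | B[yp]]):
                    return False
    return True
```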
Optimality of a matched code for partition P(Y) is independent of whether P(Y) is used in a side-information code or an MASC. Thus our optimal matched code design methods from lossless side-information coding apply here as well, giving optimal matched Shannon, Huffman, and arithmetic codes for any partition pair (P(X), P(Y)) for p(x, y) that satisfies the MASC prefix condition.

Optimal Partition Properties

Given a partition pair (P(X), P(Y)) that satisfies the MASC prefix condition, (P(X), P(Y)) is optimal for use in a matched Huffman MASC on p(x, y) if (E l^(H)_{P(X)}(X), E l^(H)_{P(Y)}(Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X × Y. Similarly, (P(X), P(Y)) is optimal for use in a matched arithmetic MASC on p(x, y) if (E l*_{P(X)}(X), E l*_{P(Y)}(Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X × Y. Again l^(H) and l* denote the Huffman and optimal description lengths respectively for a partition P, and Huffman coding is optimal over all codes on a fixed alphabet. (Mixed codes, e.g., Huffman coding on X and arithmetic coding on Y, are also possible within this framework.) While the lower convex hull of the rate region of interest is achievable through time sharing, we describe the lower boundary of achievable rates rather than the convex hull of that region in order to increase the richness of points that can be achieved without time sharing. This region describes points that minimize the rate needed to describe Y subject to a fixed constraint on the rate needed to describe X or vice versa. The regions are not identical since the curves they trace are not convex. Their convex hulls are, of course, identical.
Using Lemma 7, we again restrict our attention to partitions with no empty nodes except for the root. The proof of this result does not follow immediately from that of the corresponding result for side-information codes. By Lemma 6, whether or not two symbols can be combined for one alphabet is a function of the partition on the other alphabet. Thus we must here show not only that removing empty nodes does not increase the expected rate associated with the optimal code for a given partition but also that it does not further restrict the family of partitions allowed on the other alphabet.

Lemma 7 For each partition pair (P(X), P(Y)) that achieves performance on the lower boundary of the achievable rate region, there exists a partition pair (P*(X), P*(Y)) achieving the same rate performance as (P(X), P(Y)), for which every node except for the roots of P*(X) and P*(Y) is non-empty and no node has exactly one child.

Proof: Case 1: If any non-root node n of partition P(X) is empty, then we remove n, so that {n_k}_{k=1}^{K(n)} descend directly from n's parent. Case 2: If any node n has exactly one child n1, then we combine n and n1 to form 1-level group (n, n1) with {n1_k}_{k=1}^{K(n1)} descending directly from (n, n1). In both cases, the rate of the new partition does not increase and the prefix condition among P(X)'s non-empty nodes is unchanged; thus the set of symbols of Y that can be combined likewise remains the same by Lemma 6. □
Partition Design
By Lemma 6, whether or not two symbols can be combined in a general MASC is a function of the partition on the other alphabet. Fixing one partition before designing the other allows us to fix which symbols of the second alphabet can and cannot be combined and thereby simplifies the search for legitimate partitions on the second alphabet. In the discussion that follows, we fix P(X) and then use a variation on the partition search algorithm of lossless side-information coding to find the best P(Y) for which (P(X), P(Y)) yields an instantaneous lossless MASC. Traversing all P(X) allows us to find all partitions with performances on the lower boundary of the achievable rate region.

To simplify the discussion that follows, we modify the terminology used in lossless side-information coding to restrict our attention from all partitions on Y to only those partitions P(Y) for which (P(X), P(Y)) satisfies the MASC prefix condition given a fixed P(X). In particular, using Lemma 6, symbols y and y' can be combined given P(X) if and only if there does not exist an x, x' ∈ X such that {γ_X(x), γ_X(x')} violates the prefix condition and y, y' ∈ A_x ∪ A_x'. (Here γ_X(x) is any matched code for P(X).) Equivalently, y and y' can be combined given P(X) if for each pair x, x' ∈ X such that {γ_X(x), γ_X(x')} violates the prefix condition, (p(x, y) + p(x', y))(p(x, y') + p(x', y')) = 0. Given this new definition, the corresponding definitions for M-level groups, partitions on Y, and matched codes for partitions on Y for a fixed P(X) follow immediately.
Next consider the search for the optimal partition on Y given a fixed partition P(X). We use P*(Y | P(X)) to denote this partition. The procedure used to search for P*(Y | P(X)) is almost identical to the procedure used to search for the optimal partition in side-information coding. First, we determine which symbols from Y can be combined given P(X). In this case, for each node n ∈ T(P(X)), if T_n is the subtree of T(P(X)) with root n, then for each n' ∈ T_{n_k} with k ∈ {1, ..., K(n)}, symbols y, y' ∈ A_n ∪ A_n' cannot be combined given P(X). Here A_n = {y : y ∈ A_x, x ∈ n}. Traversing the tree from top to bottom yields the full list of pairs of symbols that cannot be combined given P(X). All pairs not on this list can be combined given P(X). Given this list, we construct a list of groups and recursively build the optimal partition P*(Y | P(X)) using the approach described in an earlier section.

Given a method for finding the optimal partition P*(Y | P(X)) for a fixed partition P(X), we next need a means of listing all partitions P(X). (Note that we really wish to list all P(X), not only those that would be optimal for side-information coding. As a result, the procedure for constructing the list of groups is slightly different from that in lossless side-information coding.) For any alphabet X' ⊆ X, the procedure begins by making a list L_{X'} of all (single- or multi-level) groups that may appear in a partition of X' for p(x, y) satisfying Lemma 7 (i.e. every node except for the root is non-empty, and K(n) ≠ 1). The list is initialized as L_{X'} = {(x) : x ∈ X'}. For each symbol x ∈ X' and each non-empty subset S ⊆ {z ∈ X : z can be combined with x under p(x, y)}, we find the set of partitions {P(S)} of S for p(x, y); for each P(S), we add x to the empty root of T(P(S)) if P(S) contains more than one group, or to the root of the single group in P(S) otherwise; then we add the resulting new group to L_{X'} if L_{X'} does not yet contain the same group.

After constructing the above list of groups, we build a collection of partitions of X' made of groups on that list. If any group G ∈ L_{X'} contains all of the elements of X', then {G} is a complete partition. Otherwise, the algorithm systematically builds a partition, adding one group at a time from L_{X'} to set P(X') until P(X') is a complete partition. For G ∈ L_{X'} to be added to P(X'), it must satisfy G ∩ G' = ∅ for all G' ∈ P(X'). The collection of partitions for X' is named L_{P(X')}.
We construct the optimal partition P*(Y | P(X)) for each P(X) ∈ L_{P(X)} and choose those partition pairs (P(X), P(Y)) that minimize the expected rate needed to describe Y given a fixed constraint on the expected rate needed to describe X (or vice versa).

Near-Lossless Instantaneous Multiple Access Source Coding: Problem Statement, Partition Pairs, and Optimal Matched Codes
Finally, we generalize the MASC problem from lossless instantaneous side-information and general MASCs to near-lossless instantaneous side-information and general MASCs. For any fixed ε > 0, we call MASC ((γ_X, γ_Y), γ^{-1}) a near-lossless instantaneous MASC for P_e ≤ ε if ((γ_X, γ_Y), γ^{-1}) yields instantaneous decoding with P_e = Pr(γ^{-1}(γ_X(X), γ_Y(Y)) ≠ (X, Y)) ≤ ε. For instantaneous decoding in a near-lossless MASC, we require that for any input sequences x_1, x_2, x_3, ... and y_1, y_2, y_3, ... with p(x_1, y_1) > 0, the instantaneous decoder reconstructs some reproduction of (x_1, y_1) by reading only the first |γ_X(x_1)| bits from γ_X(x_1)γ_X(x_2)γ_X(x_3)... and the first |γ_Y(y_1)| bits from γ_Y(y_1)γ_Y(y_2)γ_Y(y_3)... (without prior knowledge of these lengths). That is, we require that the decoder correctly determines the length of the description of each (x, y) with p(x, y) > 0 even when it incorrectly reconstructs the values of x and y. This requirement disallows decoding error propagation problems caused by loss of synchronization at the decoder.
Theorem 6 gives the near-lossless MASC prefix property. Recall that the notation γ_Y(y) ≺ γ_Y(y') means that γ_Y(y) is a proper prefix of γ_Y(y'), disallowing γ_Y(y) = γ_Y(y').

Theorem 6 Partition pair (P(X), P(Y)) can be used in a near-lossless instantaneous MASC on p(x, y) if and only if both of the following properties are satisfied:

(A) for any x, x' ∈ X such that γ_X(x) ≺ γ_X(x'), {γ_Y(y) : y ∈ A_x ∪ A_x'} is prefix free;
(B) for any x, x' ∈ X such that γ_X(x) = γ_X(x'), {γ_Y(y) : y ∈ A_x ∪ A_x'} is free of proper prefixes.

Proof: If either condition (A) or condition (B) is not satisfied, then there exist symbols x, x' ∈ X and y, y' ∈ Y such that y, y' ∈ A_x ∪ A_x' and one of the following is true:

(1) γ_X(x) = γ_X(x') and γ_Y(y) ≺ γ_Y(y'); (2) γ_Y(y) = γ_Y(y') and γ_X(x) ≺ γ_X(x'); (3) γ_X(x) ≺ γ_X(x') and γ_Y(y) ≺ γ_Y(y').

In any of these cases, the decoder cannot determine where to stop decoding one or both of the binary descriptions, by an argument like that in Lemma 6. The result is a code that is not instantaneous.
For the decoder to be unable to recognize when it has reached the end of γ_X(X) and γ_Y(Y), one of the following must occur: (1) the decoder determines that X ∈ n_x, but cannot determine whether or not Y ∈ n_y; (2) the decoder determines that Y ∈ n_y, but cannot determine whether or not X ∈ n_x; (3) the decoder cannot determine whether or not X ∈ n_x or Y ∈ n_y. Following the argument used in Lemma 6, each of these cases leads to a violation of either (A) or (B) (or both). □

Thus the near-lossless prefix property differs from the lossless prefix property only in allowing γ_X(x) = γ_X(x') and γ_Y(y) = γ_Y(y') when y, y' ∈ A_x ∪ A_x'. In near-lossless side-information coding of Y given X this condition simplifies as follows. For any y, y' ∈ Y for which there exists an x ∈ X with p(x, y)p(x, y') > 0, γ_Y(y) ≺ γ_Y(y') is disallowed (as in lossless coding) but γ_Y(y) = γ_Y(y') is allowed (this was disallowed in lossless coding). In this case, giving y and y' descriptions with γ_Y(y) ≺ γ_Y(y') would leave the decoder no means of determining whether to decode |γ_Y(y)| bits or |γ_Y(y')| bits. (The decoder knows only the value of x, and both p(x, y) and p(x, y') are nonzero.) Giving y and y' descriptions with γ_Y(y) = γ_Y(y') allows instantaneous (but not error free) decoding; the decoder decodes to the symbol with the given description that maximizes p(·|x). In the more general case, if (G^(X), G^(Y)) are the 1-level groups described by (γ_X(X), γ_Y(Y)), the above conditions allow instantaneous decoding of the descriptions of G^(X) and G^(Y). A decoding error occurs if and only if there is more than one pair (x, y) ∈ G^(X) × G^(Y) with p(x, y) > 0. In this case, the decoder reconstructs the symbols as arg max_{(x, y) ∈ G^(X) × G^(Y)} p(x, y).
Decoding Error Probability and Distortion Analysis
As discussed above, the benefit of near-lossless coding is a potential savings in rate. The cost of that improvement is the associated error penalty, which we quantify here.

By Lemma 6, any 1-level group G ⊆ Y is a legitimate group in near-lossless side-information coding of Y given X. The minimal penalty for a code with γ_Y(y) = γ_Y(y') for all y, y' ∈ G is

P_e(G) = Σ_{x ∈ X} ( Σ_{y ∈ G} p(x, y) - max_{y ∈ G} p(x, y) ).

This minimal error penalty is achieved by decoding the description of G to ŷ = arg max_{y' ∈ G} p(x, y') when X = x. Multi-level group G = (R : C(R)) is a legitimate group for side-information coding of Y given X if and only if for any x ∈ X and y ∈ R, y' ∈ C(R) implies p(x, y)p(x, y') = 0. In this case, the error penalty of G is the sum of the error penalties of the 1-level groups it contains. Thus for any partition P(Y) satisfying the near-lossless MASC prefix property,

P_e(P(Y)) = Σ_{G ∈ P(Y)} P_e(G),

where the sum runs over the 1-level groups of P(Y).
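The minimal penalty P_e(G) above can be evaluated directly from the joint p.m.f. In the short Python sketch below, the function name and the dictionary representation of p(x, y) are illustrative assumptions.

```python
# Sketch: minimal error penalty of a 1-level group G in near-lossless
# side-information coding of Y given X, i.e.
# P_e(G) = sum_x [ sum_{y in G} p(x, y) - max_{y in G} p(x, y) ].
def group_error_penalty(G, p):
    """G: iterable of Y-symbols; p: dict {(x, y): probability}."""
    xs = {x for (x, _) in p}
    penalty = 0.0
    for x in xs:
        row = [p.get((x, y), 0.0) for y in G]
        penalty += sum(row) - max(row)       # mass lost to the argmax decoder
    return penalty
```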
Similarly, given a partition P(X), a 1-level group G ⊆ Y is a legitimate group for a general near-lossless MASC given P(X) if for any y, y' ∈ G, y and y' do not both belong to A_x ∪ A_x' for any x, x' such that γ_X(x) ≺ γ_X(x'). A multi-level group G = (R : C(R)) on Y is a legitimate group for a general near-lossless MASC if R and all members of C(R) are legitimate, and for any y ∈ R and y' ∈ C(R), y and y' do not both belong to A_x ∪ A_x' for any x, x' such that γ_X(x) is a prefix of γ_X(x').

For any pair of nodes n_x ∈ T(P(X)) and n_y ∈ T(P(Y)), the minimal penalty for (n_x, n_y) is

P_e(n_x, n_y) = Σ_{(x, y) ∈ n_x × n_y} p(x, y) - max_{(x, y) ∈ n_x × n_y} p(x, y).

Decoding the descriptions of n_x and n_y to arg max_{(x, y) ∈ n_x × n_y} p(x, y) gives this minimal error penalty. Thus the minimal penalty for using partition pair (P(X), P(Y)) satisfying the near-lossless MASC prefix property is

P_e(P(X), P(Y)) = Σ_{n_x ∈ T(P(X))} Σ_{n_y ∈ T(P(Y))} P_e(n_x, n_y).

Since near-lossless coding may be of most interest for use in lossy coding, probability of error may not always be the most useful measure of performance in a near-lossless code. In lossy codes, the increase in distortion caused by decoding errors more directly measures the impact of the error. We next quantify this impact for a fixed distortion measure d(a, â) ≥ 0. If d is the Hamming distortion, then the distortion analysis is identical to the error probability analysis.

In side-information coding of Y given X, the minimal distortion penalty for 1-level group G is

D(G) = Σ_{x ∈ X} min_{ŷ ∈ G} Σ_{y ∈ G} p(x, y) d(y, ŷ).

This value is achieved when the description of G is decoded to arg min_{ŷ ∈ G} Σ_{y ∈ G} p(x, y) d(y, ŷ) when X = x. Thus for any partition P(Y) satisfying the near-lossless MASC prefix property, the distortion penalty associated with using this near-lossless code rather than a lossless code is

D(P(Y)) = Σ_{G ∈ P(Y)} D(G).

In general near-lossless MASC coding, the corresponding distortion penalty for any partition pair (P(X), P(Y)) that satisfies the near-lossless MASC prefix property is

D(P(X), P(Y)) = Σ_{n_x ∈ T(P(X))} Σ_{n_y ∈ T(P(Y))} min_{(x̂, ŷ) ∈ n_x × n_y} Σ_{(x, y) ∈ n_x × n_y} p(x, y) [ d(x, x̂) + d(y, ŷ) ].
Partition Design
In near-lossless coding, any combination of symbols creates a legitimate 1-level group G (with some associated error P_e(G) or distortion D(G)). Thus one way to approach near-lossless MASC design is to consider all combinations of 1-level groups that yield an error within the allowed error limits, in each case design the optimal lossless code for the reduced alphabet that treats each such 1-level group G as a single symbol x_g (x_g ∉ X if |G| > 1) or y_g (y_g ∉ Y if |G| > 1), and finally choose the combination of groups that yields the lowest expected rates. Considering all combinations of groups that meet the error criterion guarantees an optimal solution, since any near-lossless MASC can be described as a lossless MASC on a reduced alphabet that represents each lossy 1-level group by a single symbol.

For example, given a 1-level group G = (x_1, ..., x_m) ⊆ X, we can design a near-lossless MASC with error probability P_e(G) by designing a lossless MASC for alphabets X̃ = (X ∩ {x_1, ..., x_m}^c) ∪ {x_g} and Y and p.m.f.

p̃(x, y) = p(x, y)                   if x ∈ X̃ \ {x_g},
p̃(x, y) = Σ_{i=1}^m p(x_i, y)       if x = x_g.

Thus designing a near-lossless MASC for p(x, y) that uses only one lossy group G is equivalent to designing a lossless MASC for the probability distribution p̃(x, y), where the matrix describing p̃(x, y) can be obtained by removing from the matrix describing p(x, y) the rows for symbols x_1, ..., x_m ∈ G and adding a row for x_g. The row associated with x_g equals the sum of the rows removed. Similarly, building a near-lossless MASC using 1-level group G ⊆ Y is equivalent to building a lossless MASC for a p.m.f. in which we remove the columns for all y ∈ G and include a column that equals the sum of those columns.
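This row-merging step can be sketched as follows; the merged-symbol placeholder name and the dictionary p.m.f. layout are illustrative assumptions, and the analogous column merge is obtained by swapping the roles of x and y.

```python
# Sketch: reduced p.m.f. in which the rows of a lossy 1-level group G of
# X-symbols are replaced by a single merged row.
def merge_rows(p, G, merged_name='xg'):
    """p: dict {(x, y): probability}; G: set of X-symbols to merge."""
    reduced = {}
    for (x, y), pr in p.items():
        key = (merged_name if x in G else x, y)
        reduced[key] = reduced.get(key, 0.0) + pr   # merged row = sum of removed rows
    return reduced
```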
Multiple (non-overlapping) 1-level groups in X or Y can be treated similarly. In using groups G_1, G_2 ⊆ X, the error probability adds, but in using groups G_x ⊆ X and G_y ⊆ Y the effect on the error probability is not necessarily additive. For example, if G_x = (x_1, ..., x_m) and G_y = (y_1, ..., y_k), then the error penalty is

P_e(G_x, G_y) = Σ_{y ∉ C} ( Σ_{x ∈ R} p(x, y) - max_{x ∈ R} p(x, y) ) + Σ_{x ∉ R} ( Σ_{y ∈ C} p(x, y) - max_{y ∈ C} p(x, y) ) + ( Σ_{(x, y) ∈ R × C} p(x, y) - max_{(x, y) ∈ R × C} p(x, y) ),

where R = {x_1, ..., x_m} and C = {y_1, ..., y_k}. Since using just G_x gives

P_e(G_x) = Σ_{y ∈ Y} ( Σ_{x ∈ R} p(x, y) - max_{x ∈ R} p(x, y) )

and using just G_y gives

P_e(G_y) = Σ_{x ∈ X} ( Σ_{y ∈ C} p(x, y) - max_{y ∈ C} p(x, y) ),

we have

P_e(G_x, G_y) = P_e(G_x) + P_e(G_y) - δ(G_x, G_y), where

δ(G_x, G_y) = Σ_{y ∈ C} ( Σ_{x ∈ R} p(x, y) - max_{x ∈ R} p(x, y) ) + Σ_{x ∈ R} ( Σ_{y ∈ C} p(x, y) - max_{y ∈ C} p(x, y) ) - ( Σ_{(x, y) ∈ R × C} p(x, y) - max_{(x, y) ∈ R × C} p(x, y) )    (9)

is not necessarily equal to zero. Generalizing the above results to multiple groups G_{x,1}, ..., G_{x,M} and G_{y,1}, ..., G_{y,K} corresponding to row and column sets {R_1, R_2, ..., R_M} and {C_1, C_2, ..., C_K} respectively gives total error penalty

P_e({G_{x,1}, G_{x,2}, ..., G_{x,M}}, {G_{y,1}, G_{y,2}, ..., G_{y,K}})

= Σ_{i=1}^M P_e(G_{x,i}) + Σ_{j=1}^K P_e(G_{y,j}) - Σ_{i=1}^M Σ_{j=1}^K δ(G_{x,i}, G_{y,j}).    (10)

Here

P_e({G_{x,1}, G_{x,2}, ..., G_{x,M}}, {G_{y,1}, G_{y,2}, ..., G_{y,K}}) ≥ max{ Σ_{i=1}^M P_e(G_{x,i}), Σ_{j=1}^K P_e(G_{y,j}) }.
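Rather than expanding formulas (9) and (10), the total error penalty can also be computed directly from the arg max decoding rule, as in the following sketch; the representation of groups and of the p.m.f. is an illustrative assumption, and the groups within each alphabet are assumed non-overlapping as in the text.

```python
# Sketch: total error penalty of using row groups Gx,1..Gx,M and column
# groups Gy,1..Gy,K, computed from the blockwise argmax decoding rule.
def total_error_penalty(p, row_groups, col_groups):
    """p: dict {(x, y): probability}; ungrouped symbols act as singletons."""
    xs = {x for (x, _) in p}
    ys = {y for (_, y) in p}
    gx = {x: i for i, G in enumerate(row_groups) for x in G}
    gy = {y: j for j, G in enumerate(col_groups) for y in G}
    for x in xs:
        gx.setdefault(x, ('single', x))
    for y in ys:
        gy.setdefault(y, ('single', y))
    # Probability mass and largest entry of each (row-group, column-group) block.
    block_sum, block_max = {}, {}
    for (x, y), pr in p.items():
        key = (gx[x], gy[y])
        block_sum[key] = block_sum.get(key, 0.0) + pr
        block_max[key] = max(block_max.get(key, 0.0), pr)
    # Each block contributes (its mass - its largest entry) to the penalty.
    return sum(block_sum[k] - block_max[k] for k in block_sum)
```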
Using these results, we give our code design algorithm as follows.

In near-lossless coding of source X given side information Y, we first make a list L_{X,ε} of all lossy 1-level groups of X that result in error at most ε (the given constraint). (The earlier described lossless MASC design algorithm will find all zero-error 1-level groups.) Then a subset S_{X,ε} of L_{X,ε} whose groups are non-overlapping and result in error at most ε is a combination of lossy 1-level groups with total error at most ε. For each S_{X,ε}, we obtain the reduced alphabet X̃ and p.m.f. p̃(x, y) by representing each group G ∈ S_{X,ε} by a single symbol x_g as described earlier. Then we perform lossless side-information code design for X on p̃(x, y). After all subsets S_{X,ε} are traversed, we can find the lowest rate for coding X that results in error at most ε. Near-lossless coding of Y with side information X can be performed in a similar fashion.

To design general near-lossless MASCs of both X and Y, we first make a list L_{X,ε} of all 1-level groups of X that result in error at most ε, and a list L_{Y,ε} of all 1-level groups of Y that result in error at most ε. (We include zero-error 1-level groups here, since using two zero-error 1-level groups G_x ⊆ X and G_y ⊆ Y together may result in a non-zero error penalty.) Second, we make a list L_{S_{X,ε}} = {S_{X,ε} ⊆ L_{X,ε} : S_{X,ε} is non-overlapping, P_e(S_{X,ε}) ≤ ε} of all combinations of 1-level groups of X that yield an error at most ε, and a list L_{S_{Y,ε}} = {S_{Y,ε} ⊆ L_{Y,ε} : S_{Y,ε} is non-overlapping, P_e(S_{Y,ε}) ≤ ε} of all combinations of 1-level groups of Y that yield an error at most ε. (We include ∅ in the lists to include side-information coding in general coding.) Then for each pair (S_{X,ε}, S_{Y,ε}), we calculate the corresponding δ value and the total error penalty using formulas (9) and (10). If the total error penalty is no more than ε, we obtain the reduced alphabets X̃, Ỹ and p.m.f. p̃(x, y) described by (S_{X,ε}, S_{Y,ε}), then perform lossless MASC design on p̃(x, y). After all pairs (S_{X,ε}, S_{Y,ε}) ∈ L_{S_{X,ε}} × L_{S_{Y,ε}} are traversed, we can trace out the lower boundary of the achievable rate region.
An Alternative Algorithm Embodiment
We next describe an alternative method of code design. The following notation is useful to the description of that algorithm.
The approach described below assumes a known collection of decisions on which symbols of Y can be combined. If we are designing a side-information code, these decisions arise from the assumption that source X is known perfectly to the decoder and thus the conditions described in the section "Lossless Side-Information Coding" apply. If we are designing a code for Y given an existing code for X, these conditions arise from the MASC prefix condition in Lemma 6. The algorithm also relies on an ordering of the alphabet Y denoted by Y = {y_1, y_2, ..., y_N}. Here N = |Y| is the number of symbols in Y, and for any 1 ≤ i < j ≤ N, symbol y_i is placed before symbol y_j in the chosen ordering. Any ordering of the original alphabet is allowed. The ordering choice restricts the family of codes that can be designed. In particular, the constraints imposed by the ordering are as follows:
1. Two symbols can be combined into a one-level group if and only if
(a) they are combinable
(b) they hold adjacent positions in the ordering.
2. A one-level group can be combined with the root of a distinct (one- or multi-level) group if and only if
(a) the combination meets the conditions for combinability
(b) the groups hold adjacent positions in the ordering.
3. Two (one- or multi-level) groups can be made descendants of a single root if and only if the groups hold adjacent positions in the ordering.

4. The group formed by combining two symbols or two groups occupies the position associated with those symbols or groups in the alphabet ordering. Given that only adjacent symbols can be combined, there is no ambiguity in the position of a group.
We discuss methods for choosing the ordering below.
Finally, we define a function f used in the code design. For any i ≤ j, let G[i, j] denote the group that occupies positions from i to j, and let w[i, j] denote its best (minimal) expected rate. When the algorithm begins, only G[i, i] = (y_i) and w[i, i] = 0 are known, for i ∈ {1, 2, ..., N}. The values of G[i, j] and w[i, j] for each i < j are set as the algorithm runs. The value of G[1, N] when the algorithm is completed is the desired code on the full alphabet. For any p ∈ (0, 1), let H(p, 1 - p) = -p log p - (1 - p) log(1 - p). Finally, for any i ≤ j < k, let c[i, j, k] be defined as follows:

c[i, j, k] = 0 if w[i, j] = 0 and G[i, j] can be combined with the root of G[j + 1, k];
c[i, j, k] = 1 if w[i, j] > 0, w[j + 1, k] = 0, and G[j + 1, k] can be combined with the root of G[i, j];
c[i, j, k] = 2 otherwise.

The value of c[i, j, k] describes whether the two adjacent groups G[i, j] and G[j + 1, k] must be siblings under an empty root (when c[i, j, k] = 2) or one group can reside at the root of the other group (when c[i, j, k] = 0, G[i, j] can reside at the root of G[j + 1, k]; when c[i, j, k] = 1, G[j + 1, k] can reside at the root of G[i, j]). We cannot calculate c[i, j, k] until G[i, j] and G[j + 1, k] have been calculated.
The value of f(w[i, j], w[j + 1, k]) is the rate of group G[i, k] when we use groups G[i, j] and G[j + 1, k] to construct G[i, k]. When G[i, j] can reside at the root of G[j + 1, k], f(w[i, j], w[j + 1, k]) equals G[j + 1, k]'s best rate; when G[j + 1, k] can reside at the root of G[i, j], f(w[i, j], w[j + 1, k]) equals G[i, j]'s best rate; when G[i, j] and G[j + 1, k] must be siblings, f(w[i, j], w[j + 1, k]) equals w°[i, j, k]. The best rate of G[i, k] is the minimal value of f(w[i, j], w[j + 1, k]) over all j ∈ {i, i + 1, ..., i + L - 1}. The function f(w[i, j], w[j + 1, k]) is calculated as follows:

f(w[i, j], w[j + 1, k]) = w[j + 1, k]     if c[i, j, k] = 0,
f(w[i, j], w[j + 1, k]) = w[i, j]         if c[i, j, k] = 1,
f(w[i, j], w[j + 1, k]) = w°[i, j, k]     if c[i, j, k] = 2.

Here

w°[i, j, k] = w[i, j] + w[j + 1, k] + P[i, k]    in Huffman coding,
w°[i, j, k] = w[i, j] + w[j + 1, k] + P[i, k] H( P[i, j]/P[i, k], P[j + 1, k]/P[i, k] )    in arithmetic coding,

where P[i, k] = Σ_{l=i}^{k} p_Y(y_l) denotes the total probability of the symbols in positions i through k.
Given the above definitions, we use the following algorithm for code design.

1. Choose an order for alphabet Y. In this step, we simply choose one of the |Y|!/2 distinct orderings of the symbols in Y. (An ordering and its reversal are identical for our purposes.)

2. Initialize w[i, i] = 0 and G[i, i] = (y_i) for all i ∈ {1, 2, ..., N}.

3. For each L ∈ {1, 2, ..., N - 1}:

a. For each i ∈ {1, 2, ..., N - L}, set

w[i, i + L] = min_{j ∈ {i, i+1, ..., i+L-1}} f( w[i, j], w[j + 1, i + L] ).

b. Let j* = arg min_{j ∈ {i, i+1, ..., i+L-1}} f( w[i, j], w[j + 1, i + L] ); then set

G[i, i + L] = G[i, j*] combined with the root of G[j* + 1, i + L]    if c[i, j*, i + L] = 0,
G[i, i + L] = G[j* + 1, i + L] combined with the root of G[i, j*]    if c[i, j*, i + L] = 1,
G[i, i + L] = G[i, j*] and G[j* + 1, i + L] siblings under an empty root    if c[i, j*, i + L] = 2.

When the above procedure is complete, G[1, N] is an optimal code subject to the constraints imposed by ordering {y_1, y_2, ..., y_N}, and w[1, N] gives its expected description length.
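The dynamic program above can be sketched compactly for Huffman-style costs. In the Python sketch below, the data layout and helper names are illustrative assumptions; the combinability test, the use of w[i, j] = 0 to detect a 1-level span, and the sibling penalty P[i, k] follow the description above, and the group structure G[i, j] of step 3b could be tracked alongside w in the same loop.

```python
# Sketch: rate of the optimal code for a fixed ordering of Y (Huffman costs).
from itertools import product

def combinable(y1, y2, p, xs):
    """Symbols y1, y2 can be combined iff no x has p(x,y1) p(x,y2) > 0."""
    return all(p.get((x, y1), 0) * p.get((x, y2), 0) == 0 for x in xs)

def optimal_code_rate(order, p):
    xs = {x for (x, _) in p}
    pY = {y: sum(pr for (x, yy), pr in p.items() if yy == y) for y in order}
    N = len(order)
    P = lambda i, k: sum(pY[order[l]] for l in range(i, k + 1))   # mass of span i..k
    w = {(i, i): 0.0 for i in range(N)}
    for L in range(1, N):
        for i in range(N - L):
            k = i + L
            best = float('inf')
            for j in range(i, k):
                cross_ok = all(combinable(order[a], order[b], p, xs)
                               for a, b in product(range(i, j + 1), range(j + 1, k + 1)))
                if w[(i, j)] == 0 and cross_ok:        # c = 0: left span joins right root
                    cost = w[(j + 1, k)]
                elif w[(j + 1, k)] == 0 and cross_ok:  # c = 1: right span joins left root
                    cost = w[(i, j)]
                else:                                  # c = 2: siblings under empty root
                    cost = w[(i, j)] + w[(j + 1, k)] + P(i, k)
                best = min(best, cost)
            w[(i, k)] = best
    return w[(0, N - 1)]
```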
Figure 13 illustrates the process in the alternate algorithm embodiment. At box 1301, an ordering of the alphabet is fixed. Then at box 1302, the variables (weight, group, etc.) are initialized. At box 1303, L is set to 1. At box 1304, i is set to 1. L and i are counter variables for the loop starting at box 1305, which iterates through the ordering and progressively creates larger combinations out of adjacent groups until an optimal code for the ordering is obtained. At box 1305, the current combination (i, j, i+L) is checked for combinability. The function f for the combination is also determined at this point. At box 1306 the weight and grouping of the current combination are determined. At box 1307, it is determined whether i ≤ N-L. If it is, then the process increments i at box 1310 and returns to box 1305. If not, it proceeds to box 1308, where a determination of whether L ≤ N-1 is made. If it is, then the process increments L and returns to box 1304. If not, the loop is complete and the process terminates at box 1309. The optimal code and rate have been obtained.
The algorithm may be used in a number of different ways.
1. The code designer may simply fix the ordering, either to a choice that is believed to be good or to a randomly chosen value, and simply use the code designed for that order. For example, since only symbols that are adjacent can be combined, the designer may choose an ordering that gives adjacent positions to many of the combinable symbols.
2. Alternatively, the designer may consider multiple orderings, finding the optimal code for each ordering and finally using the ordering that gives the best expected performance.
3. The designer may also choose a first ordering O_1 at random and find the best code G(O_1) for this ordering; then for each m ∈ {1, 2, ..., M}, the designer could permute ordering O_m using one or more of the permutation operations described below to find an ordering O_{m+1}. For the given permutation operations, G(O_{m+1}) is guaranteed to be at least as good as G(O_m), since O_{m+1} is consistent with G(O_m). This solution involves running the design algorithm M+1 times. The value of M can be chosen to balance performance and complexity concerns. Here we list four methods to derive a new ordering from an old ordering, such that the new ordering's performance is guaranteed to be at least as good as that of the old ordering. Suppose the old ordering O_m is {y_1, ..., y_N}.

(a) Let G[i, j] and G[j+1, k] (i ≤ j < k) be any two subtrees descending from the same parent in G(O_m). The new ordering O_{m+1} is {y_1, ..., y_{i-1}, y_{j+1}, ..., y_k, y_i, ..., y_j, y_{k+1}, ..., y_N}.

(b) Let G[i, j] be the root of subtree G[i, k] (i ≤ j < k) in G(O_m). The new ordering O_{m+1} is {y_1, ..., y_{i-1}, y_{j+1}, ..., y_k, y_i, ..., y_j, y_{k+1}, ..., y_N}.

(c) Let G[i, j] be the root of subtree G[k, j] (k < i ≤ j) in G(O_m). The new ordering O_{m+1} is {y_1, ..., y_{k-1}, y_i, ..., y_j, y_k, ..., y_{i-1}, y_{j+1}, ..., y_N}.

(d) Suppose the subroot G[i, j] in G(O_m) is a one-level group with more than one symbol. Any permutation of the sub-ordering {y_i, ..., y_j} results in a new ordering.

4. Any combination of random choices and permutations of the ordering can be used.

5. The designer may also be willing to try all orderings to find the optimal code.

Here we note that trying all orderings guarantees the optimal performance. Choosing a sequence of orderings at random gives performance approaching the optimal performance in probability.
Huffman Coding Example for the New Algorithm
Table 5 gives another example of the joint probability of sources X and Y, with X = Y = {a_1, a_2, a_3, a_4, a_5}. Suppose X is given as side information; we now find the optimal Huffman code for Y subject to the constraints imposed by ordering {a_1, a_2, a_3, a_4, a_5} on Y.

Table 5 (the joint p.m.f. p(x, y) for X, Y ∈ {a_1, ..., a_5} appears in the original figure)
Initialize: w[i, i] = 0, G[i, i] = (a_i), i ∈ {1, ..., 5}.

L = 1: a_1 and a_2 are combinable, so w[1, 2] = 0, G[1, 2] = (a_1, a_2);
a_2 and a_3 are combinable, so w[2, 3] = 0, G[2, 3] = (a_2, a_3);
a_3 and a_4 are not combinable, so w[3, 4] = w°[3, 3, 4] = 0.4, G[3, 4] = (( ) : {(a_3), (a_4)});
a_4 and a_5 are not combinable, so w[4, 5] = w°[4, 4, 5] = 0.6, G[4, 5] = (( ) : {(a_4), (a_5)}).

L = 2:
i = 1: c[1, 1, 3] = 0 (since w[1, 1] = 0 and G[1, 1] = (a_1) can be combined with the root of G[2, 3] = (a_2, a_3)), so f(w[1, 1], w[2, 3]) = w[2, 3] = 0, which is the minimal value.
Thus w[1, 3] = 0, G[1, 3] = (a_1, a_2, a_3).
i = 2: c[2, 2, 4] = 0 (since w[2, 2] = 0 and G[2, 2] = (a_2) can be combined with the root of G[3, 4] = (( ) : {(a_3), (a_4)})), so f(w[2, 2], w[3, 4]) = w[3, 4] = 0.4;
c[2, 3, 4] = 2 (since w[2, 3] = 0 but G[2, 3] = (a_2, a_3) cannot be combined with the root of G[4, 4] = (a_4)), so f(w[2, 3], w[4, 4]) = w°[2, 3, 4] = w[2, 3] + w[4, 4] + P[2, 4] = 0.5.
So w[2, 4] = 0.4, G[2, 4] = ((a_2) : {(a_3), (a_4)}).
i = 3: c[3, 3, 5] = 2, f(w[3, 3], w[4, 5]) = w°[3, 3, 5] = w[4, 5] + P[3, 5] = 1.35;
c[3, 4, 5] = 2 (since w[3, 4] > 0, w[5, 5] = 0, but G[5, 5] = (a_5) cannot be combined with the root of G[3, 4] = (( ) : {(a_3), (a_4)})), so f(w[3, 4], w[5, 5]) = w°[3, 4, 5] = w[3, 4] + P[3, 5] = 1.15.
So w[3, 5] = 1.15, G[3, 5] = (( ) : {(( ) : {(a_3), (a_4)}), (a_5)}).

L = 3:
i = 1: c[1, 1, 4] = 0, f(w[1, 1], w[2, 4]) = w[2, 4] = 0.4;
c[1, 2, 4] = 0, f(w[1, 2], w[3, 4]) = w[3, 4] = 0.4;
c[1, 3, 4] = 2, f(w[1, 3], w[4, 4]) = w°[1, 3, 4] = w[1, 3] + P[1, 4] = 0.65.
So w[1, 4] = 0.4, G[1, 4] = ((a_1, a_2) : {(a_3), (a_4)}).
i = 2: c[2, 2, 5] = 2, f(w[2, 2], w[3, 5]) = w°[2, 2, 5] = w[3, 5] + P[2, 5] = 2;
c[2, 3, 5] = 2, f(w[2, 3], w[4, 5]) = w°[2, 3, 5] = w[4, 5] + P[2, 5] = 1.45;
c[2, 4, 5] = 2, f(w[2, 4], w[5, 5]) = w°[2, 4, 5] = w[2, 4] + P[2, 5] = 1.25.
So w[2, 5] = 1.25, G[2, 5] = (( ) : {((a_2) : {(a_3), (a_4)}), (a_5)}).

L = 4:
i = 1: c[1, 1, 5] = 0, f(w[1, 1], w[2, 5]) = w[2, 5] = 1.25;
c[1, 2, 5] = 2, f(w[1, 2], w[3, 5]) = w°[1, 2, 5] = w[3, 5] + P[1, 5] = 2.15;
c[1, 3, 5] = 2, f(w[1, 3], w[4, 5]) = w°[1, 3, 5] = w[4, 5] + P[1, 5] = 1.6;
c[1, 4, 5] = 2, f(w[1, 4], w[5, 5]) = w°[1, 4, 5] = w[1, 4] + P[1, 5] = 1.4.
So w[1, 5] = 1.25, G[1, 5] = ((a_1) : {((a_2) : {(a_3), (a_4)}), (a_5)}).

Thus the optimal Huffman code subject to the constraints imposed by ordering {a_1, a_2, a_3, a_4, a_5} on Y is G[1, 5] = ((a_1) : {((a_2) : {(a_3), (a_4)}), (a_5)}), with rate w[1, 5] = 1.25 bits.
Experimental Results
This section shows optimal coding rates for lossless side-information MASCs, lossless general MASCs, and near-lossless general MASCs for the example of Table 3. We achieve these results by building the optimal partitions and matched codes for each scenario, as discussed in earlier sections. Both Huffman and arithmetic coding rates are included.
Table 6 below gives the side-information results for the example of Table 3.
Table 6
Here H(X) and R_H(X) are the optimal and Huffman rates for source X when X is coded independently. The bracketed triples in Table 6 give the optimal and Huffman results, respectively, for traditional coding of Y, for the side-information coding of Y based on the results of Jabri and Al-Issa, and for our side-information coding of Y. The partition trees achieving these results are shown in Figure 14. The rate achievable in coding Y using side-information X is approximately half that of an ordinary Huffman code and 90% that of the result from [2].
Figure 15 shows general lossless and lossy MASC results. The optimal lossless MASC gives a significant performance improvement with respect to independent coding of X and Y but does not achieve the Slepian-Wolf region. By allowing error probability 0.01 (which equals min p(x, y), i.e., the smallest error probability that may result in a rate region different from that of lossless coding), the achievable rate region is greatly improved over lossless coding, showing the benefits of near-lossless coding. By allowing error probability 0.04, we get approximately to the Slepian-Wolf region for this example.
For the joint probability distribution given in Table 3 of the "Invention Operation" section, we perform the alternative algorithm embodiment (described in the last section) on several orderings of the alphabet Y = {a0, a1, a2, a3, a4, a5, a6, a7} (with X given as side-information).
For Huffman coding, many orderings achieve the optimal performance (1.67 bits per symbol); examples include the orderings (a0, a1, a3, a6, a2, a4, a5, a7), (a3, a6, a0, a4, a2, a5, a1, a7), (a4, a6, a0, a1, a3, a5, a7, a2), and (a7, a2, a3, a5, a4, a6, a1, a0).
For arithmetic coding, again, many orderings achieve the optimal performance (1.53582 bits per symbol); examples include the orderings (a0, a4, a1, a5, a2, a7, a3, a6), (a1, a5, a2, a0, a4, a7, a6, a3), (a5, a1, a2, a4, a0, a7, a6, a3), and (a6, a3, a4, a0, a2, a5, a1, a7).
Table 7 below gives examples of a few randomly chosen orderings' Huffman code rates and arithmetic code rates.
Table 7
Thus, an implementation of lossless and near-lossless source coding for multiple access networks is described in conjunction with one or more specific embodiments. The invention is defined by the claims and their full scope of equivalents.
The paper titled "Lossless and Near-Lossless Source Coding for Multiple Access Networks" by the inventors is attached as Appendix A.
Lossless and Near-Lossless Source Coding for Multiple Access Networks*
Qian Zhao and Michelle Effros
Abstract
A multiple access source code (MASC) is a source code designed for the following network configuration: a pair of correlated information sequences {X_i}_{i=1}^∞ and {Y_i}_{i=1}^∞ is drawn i.i.d. according to joint probability mass function (p.m.f.) p(x, y); the encoder for each source operates without knowledge of the other source; the decoder jointly decodes the encoded bit streams from both sources. The work of Slepian and Wolf describes all rates achievable by MASCs of infinite coding dimension (n → ∞) and asymptotically negligible error probabilities (P_e^(n) → 0). In this paper, we consider the properties of optimal MASCs for practical coding applications. In considering codes for practical applications, we move from Slepian and Wolf's performance bounds on codes with infinite coding dimension and asymptotically negligible error probability to find properties of codes with a finite coding dimension (n = n_0) and both lossless (P_e^(n) = 0) and near-lossless (P_e^(n) < ε for some fixed ε > 0) performance. The penalty in total rate associated with going from P_e = 0 to P_e < ε for ε arbitrarily small can be extremely large for some sources, and thus near-lossless MASCs are useful as entropy codes within lossy MASCs. Our central results include generalizations of Huffman and arithmetic codes to attain the corresponding optimal MASCs for any source p.m.f., any coding dimension, and any bound on the probability of error. Experimental results comparing the optimal achievable rate region to the Slepian-Wolf rate region are included.
Index Terms: Slepian-Wolf, optimal lossless and near-lossless coding, network source coding, Huffman and arithmetic code design, Wyner-Ziv.
*Q. Zhao (qianz@z.caltech.edu) and M. Effros (effros@z.caltech.edu) are with the Department of Electrical Engineering, MC 136-93, California Institute of Technology, Pasadena, CA 91125. This material is based upon work supported by NSF under Award No. CCR-9909026 and by the Caltech Lee Center for Advanced Networking.
I Introduction
A multiple access network is a system with several transmitters sending information to a single receiver. One example of a multiple access system is a sensor network, where a collection of separately located sensors sends correlated information to a central processing unit. MASCs yield efficient data representations for multiple access systems when cooperation among the transmitters is not possible.
In the MASC configuration (also known as the Slepian-Wolf configuration) depicted in Figure 1(a), two correlated information sequences {X_i}_{i=1}^∞ and {Y_i}_{i=1}^∞ are drawn i.i.d. according to joint p.m.f. p(x, y). The encoder for each source operates without knowledge of the other source. The decoder receives the encoded bit streams from both sources. The rate region for this configuration [1] is plotted in Figure 1(b). This region describes the rates achievable in this scenario with coding dimension n → ∞ and probability of decoding error P_e^(n) → 0. Making these ideas applicable in practical network communications scenarios requires MASC design algorithms for finite dimensions. We consider two coding scenarios: first, we consider lossless (P_e = 0) MASC design for applications where perfect data reconstruction is required; second, we consider near-lossless (P_e < ε) code design for use in lossy MASCs.
The interest in near-lossless MASCs is inspired by the discontinuity in the achievable rate region associated with going from near-lossless to truly lossless coding [2]. For example, if p(x, y) > 0 for all (x, y) ∈ X × Y, then the optimal instantaneous lossless MASC achieves rates bounded below by H(X) and H(Y) in its descriptions of X and Y, giving a total rate bounded below by H(X) + H(Y). This example demonstrates that the move from lossless coding to near-lossless coding can give arbitrarily large rate benefits. While nonzero error probabilities are unacceptable for some applications, they are acceptable on their own for other applications and within lossy MASCs in general (assuming a suitably small error probability). In lossy MASCs, a small increase in the error probability increases the code's expected distortion without causing catastrophic failure.
Prior works on practical lossless MASCs include [2, 3, 4, 5]. References [2, 3, 5] treat the problem as a side information problem, where both encoder and decoder know X, and the goal is to describe Y using the smallest average rate possible while maintaining the unique decodability of Y given the known value of X. Neither [2] nor [3] is optimal in this scenario, as shown in [5]. In [5], Yan and Berger find a necessary and sufficient condition for the existence of a lossless instantaneous code with a given set of codeword lengths for Y when the alphabet size of X is two.
Figure 1: (a) An MASC and (b) the Slepian-Wolf achievable rate region
Unfortunately, their approach fails to yield a necessary and sufficient condition for the existence of a lossless instantaneous code when the alphabet size for X is greater than two. In [4], Pradhan and Ramchandran solve the lossless MASC code design problem when source Y is guaranteed to be at most a prescribed Hamming distance from source X. Methods for extending this approach to design good codes for more general p.m.f.s p(x, y) are unknown.
In this work, we describe in detail algorithms for (1) optimal lossless coding in multiple access networks (the extension of Huffman coding to MASCs); (2) low complexity, high dimension lossless coding in multiple access networks (the extension of arithmetic coding to MASCs); (3) optimal near-lossless coding in multiple access networks (the extension of the Huffman MASC algorithm to an arbitrary non-zero probability of error); and (4) low complexity, high dimension near-lossless coding in multiple access networks (the extension of the arithmetic MASC algorithm to an arbitrary non-zero probability of error). The algorithmic description includes methods for encoding, decoding, and code design for an arbitrary p.m.f. p(x, y) in each of the above four scenarios. Parts of the description given here were presented in [6, 7].
We introduce our algorithms through the examination of a sequence of increasingly general coding problems. The first problem involves losslessly describing source Y when source X is treated as side information known perfectly to the decoder for source Y but unknown to the encoder for source Y. The solution to this problem is applicable both to the problem of coding with side information and to a special case of the MASC problem. The MASC application arises in applications where source X is losslessly described using a traditional, independent lossless code (e.g., a Huffman code matched to the marginal p.m.f. of X; this code achieves expected rate R_X satisfying H(X) ≤ R_X < H(X) + 1), and we wish to encode Y at the lowest possible incremental expected rate R_Y given the decoder's knowledge of X. The rate minimizing code when X is unknown to both the encoder for Y and the decoder for Y is the Huffman code matched to the marginal p.m.f. on Y; this Huffman code achieves an expected rate R_Y satisfying H(Y) ≤ R_Y < H(Y) + 1. The rate minimizing code design algorithm when X is known to both the encoder for Y and the decoder for Y designs a family of Huffman codes - one for each conditional p.m.f. on Y given a particular value of x ∈ X; this family of Huffman codes achieves an expected rate R_Y satisfying H(Y|X) ≤ R_Y < H(Y|X) + 1. Section II generalizes the Huffman code design algorithm to the scenario where X is known to the decoder of Y but unknown to the encoder of Y. Section II also treats the problem of arithmetic code design for the same scenario, allowing low complexity, high dimension entropy MASCs.
The solution to the side information problem considered in Section II yields codes that minimize R_Y at the expense of a high R_X (when X is treated as side information) and codes that minimize R_X at the expense of a high R_Y (when Y is treated as side information). The general MASC problem considered in Section III relaxes the assumption that one of the two sources should be independently decodable to find the lossless MASC with the best possible tradeoff between R_X and R_Y. In this case, we consider all codes with which an independently encoded X and Y can be jointly and instantaneously decoded with probability of error zero. The goal of the code design is to find the code that minimizes λR_X + (1 − λ)R_Y for an arbitrary value of λ ∈ [0, 1]. The two side information codes correspond to special cases (λ ∈ {0, 1}) of the generalized problem. The result is a family of codes with intermediate values of R_X and R_Y. As in the first scenario, we generalize both Huffman and arithmetic codes for application to this MASC scenario.
Finally, Section IV treats the near-lossless MASC problem. In this case, the problem is to design the code that minimizes λR_X + (1 − λ)R_Y over all instantaneously decodable MASCs with probability of error no greater than P_e. Both λ ∈ [0, 1] and P_e ∈ [0, 1] are arbitrary constants. We here generalize the lossless Huffman and arithmetic MASC algorithms for near-lossless coding.
Throughout this paper, we explore the properties of optimal codes and discuss constructive algorithms for finding optimal codes. We treat both Huffman and arithmetic MASCs for all scenarios. The encoding and decoding complexity of the proposed optimal MASCs is comparable to the encoding and decoding complexities of traditional (single-sender, single-receiver) Huffman and arithmetic codes. Unfortunately, the complexities of the constructive code design algorithms are high. High design complexities are apparently unavoidable for optimal lossless and near-lossless MASCs since the optimal code design problem is NP-hard [8]. Design of fast algorithms to approximate the optimal solution is a topic of ongoing research.
Section V contains experimental results. The key contributions of the paper are summarized in Section VI.
II Lossless Side-Information Coding
A Problem Statement
We consider finite-alphabet memoryless sources X and Y with joint probability mass function p(x, y) on alphabet X × Y. We use p_X(x) and p_Y(y) to denote the marginals of p(x, y) with respect to X and Y. (The subscripts are dropped when they are obvious from the argument, giving p(x) = p_X(x) and p(y) = p_Y(y).) A lossless instantaneous MASC for joint source (X, Y) consists of two encoders γ_X : X → {0, 1}* and γ_Y : Y → {0, 1}* and a decoder γ^{-1} : {0, 1}* × {0, 1}* → X × Y. Here γ_X(x) and γ_Y(y) are the binary descriptions of x and y, and the probability of decoding error is P_e = Pr(γ^{-1}(γ_X(X), γ_Y(Y)) ≠ (X, Y)). This section treats lossless coding, where P_e = 0. Further, we concentrate exclusively on instantaneous codes, where for any input sequences x_1, x_2, x_3, ... and y_1, y_2, y_3, ... with p(x_1, y_1) > 0, the instantaneous decoder reconstructs (x_1, y_1) by reading only the first |γ_X(x_1)| bits from γ_X(x_1)γ_X(x_2)γ_X(x_3)... and the first |γ_Y(y_1)| bits from γ_Y(y_1)γ_Y(y_2)γ_Y(y_3)... (without prior knowledge of these lengths).
When X is perfectly known to the decoder (or losslessly described using an independent code on X), the problem reduces to the side information problem, and the aim is to describe Y efficiently using an encoder that does not know X. This scenario describes MASCs where γ_X encodes X using a traditional code for p.m.f. {p(x)}_{x∈X} and γ_Y encodes Y assuming that the decoder decodes X before decoding Y. In this case, if the decoder can correctly reconstruct y_1 by reading only the first |γ_Y(y_1)| bits of the Y encoder output γ_Y(y_1)γ_Y(y_2)γ_Y(y_3)..., the code γ_Y is a lossless instantaneous code for Y given X or a lossless instantaneous side-information code. We focus on the side information problem in this section. We treat the general MASC problem in Section III.
One class of lossless instantaneous codes aimed at the side information problem is introduced in [3, 5]. In those papers, one of the sources, say X, is encoded using a prefix code γ_X that is optimal for the marginal p.m.f. {p(x)}_{x∈X}. The other source Y is encoded using a code γ_Y that satisfies the property that for each x ∈ X, y_1, y_2 ∈ A_x = {y ∈ Y : p(x, y) > 0} implies that γ_Y(y_1) and γ_Y(y_2) satisfy the prefix condition. (That is, neither binary codeword is a prefix of the other binary codeword.) The decoder begins by losslessly reconstructing X. Given X, the decoder eliminates all y' ∉ A_X (since y' ∉ A_X implies p(X, y') = 0). Since all codewords for y ∈ A_X satisfy the prefix condition, the decoder can use its knowledge of X to instantaneously decode Y.
The above prefix condition of γ_Y on A_x for each x ∈ X is both necessary and sufficient for γ_Y to be a lossless instantaneous code for Y given side information X, as shown in Lemma 1.
Lemma 1 Code γ_Y is a lossless instantaneous side-information code for Y given X if and only if for each x ∈ X, y, y' ∈ A_x implies that γ_Y(y) and γ_Y(y') satisfy the prefix condition.
Proof: Necessity: if there exists some x ∈ X and some y, y' ∈ A_x such that γ_Y(y) is a prefix of γ_Y(y'), then the codewords for y and y' cannot be instantaneously distinguished when X = x.¹ Sufficiency: since X is assumed to be known to the decoder, the decoder is guaranteed a prefix-free code on the set of eligible symbols A_X ⊆ Y. □
The optimal encoder γ_Y is the one that losslessly describes Y with the smallest expected rate. Lemma 1 demonstrates that instantaneous coding in a side-information MASC requires only that {γ_Y(y) : y ∈ A_x} be prefix-free for each x ∈ X and not that {γ_Y(y) : y ∈ Y} be prefix-free, as would be required for instantaneous coding if no side-information were available to the decoder. Thus the optimal code may violate the prefix condition either by giving identical descriptions to two symbols (γ_Y(y) = γ_Y(y') for some y ≠ y') or by giving one symbol a description that is a proper prefix of the description of some other symbol (we write γ_Y(y) ≼ γ_Y(y') if the description of y is a prefix of the description of y', and γ_Y(y) ≺ γ_Y(y') if γ_Y(y) is a proper prefix of γ_Y(y'), meaning we disallow γ_Y(y) = γ_Y(y')). Both [3] and [5] allow γ_Y(y) = γ_Y(y') when y ∈ A_x implies y' ∉ A_x for all x ∈ X. However, [3] requires that distinct codewords be prefix-free, ruling out the optimal solution for many p(x, y).
This paper introduces the first constructive algorithm for building optimal MASCs for sources with arbitrary alphabets X and Y and arbitrary p.m.f. p(x, y). Since X and Y are arbitrary, we can let X and Y be extension alphabets (X = X_0^n and Y = Y_0^n for base alphabets X_0 and Y_0). Thus our optimal code design algorithm yields an optimal design algorithm for X_0^n and Y_0^n for any n. Before introducing this algorithm, we consider the properties of optimal codes. The following definitions and notations are useful for that discussion.
¹We here assume without loss of generality that X is chosen to include only symbols for which p(x) > 0.
B Groups, Partitions, and Matched Codes: Definitions and Properties
We begin by developing terminology for describing, for a particular code, which symbols from Y have binary descriptions that are identical and which have binary descriptions that are prefixes of each other. We call symbols y, y' ∈ Y "combinable" if there exists a lossless instantaneous side-information code in which γ_Y(y) = γ_Y(y'). If we wish to design a code with γ_Y(y) = γ_Y(y'), then we join those symbols together in a "1-level group." If we wish to give one 1-level group a binary description that is a proper prefix of the binary description of other 1-level groups, then we build a "2-level group." These ideas generalize to M-level groups with M > 2. We define these terms carefully below, ruling out constructions that cannot yield lossless side-information codes. These definitions allow us to design codes for the nested descriptions of groups rather than the description of symbols.
Symbols y_1, y_2 ∈ Y can be combined under p(x, y) if p(x, y_1)p(x, y_2) = 0 for each x ∈ X. The collection G = (y_1, ..., y_m) is called a 1-level group for p(x, y) if each pair of distinct members y_i, y_j ∈ G can be combined under p(x, y). For any y ∈ Y and any p(x, y), (y) is a special case of a 1-level group. The tree representation T(G) for 1-level group G is a single node representing all members of G.
A 2-level group for p(x, y), denoted by G = (R : C(R)), comprises a root R and its children C(R), where R is a 1-level group, C(R) is a set of 1-level groups, and for each G' ∈ C(R), each pair y_1 ∈ R and y_2 ∈ G' can be combined under p(x, y). Here members of all G' ∈ C(R) are called members of C(R), and members of R and C(R) are called members of G. In the tree representation T(G) for G, T(R) is the root of T(G) and the parent of all subtrees T(G') for G' ∈ C(R).
For each subsequent M > 2, an M-level group for p(x, y) is a pair G = (R : C(R)) such that for each G' ∈ C(R), each pair y_1 ∈ R and y_2 ∈ G' can be combined under p(x, y). Here R is a 1-level group and C(R) is a set of groups of M − 1 or fewer levels, at least one of which is an (M − 1)-level group. The members of R and C(R) together comprise the members of G = (R : C(R)). Again, T(R) is the root of T(G) and the parent of all subtrees T(G') for G' ∈ C(R). For any M > 1, an M-level group is also called a multi-level group.
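A minimal sketch of the combinability test and of a group data structure follows, assuming the joint p.m.f. is supplied as a Python dictionary p[(x, y)]; the class and function names are illustrative only and are not part of the paper.

```python
def can_combine(y1, y2, p, X):
    """y1 and y2 can be combined under p(x, y) iff p(x, y1) * p(x, y2) == 0
    for every x in X (no value of x leaves both symbols possible)."""
    return all(p.get((x, y1), 0.0) * p.get((x, y2), 0.0) == 0.0 for x in X)

class Group:
    """A group: a 1-level root (tuple of symbols sharing one codeword) plus
    child sub-groups whose codewords extend the root's codeword."""
    def __init__(self, root, children=()):
        self.root = tuple(root)
        self.children = list(children)

    def members(self):
        out = list(self.root)
        for child in self.children:
            out.extend(child.members())
        return out
```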
We use the p.m.f. in Table 1, with X = Y = {a0, a1, ..., a6, a7}, to illustrate these concepts. For this p.m.f., (a0, a4, a7) is one example of a 1-level group since p(x, a0)p(x, a4) = 0, p(x, a0)p(x, a7) = 0, and p(x, a4)p(x, a7) = 0 for all x ∈ X. The pair (a4, a7), a subset of (a0, a4, a7), is a distinct 1-level group for p(x, y). The tree representation for any 1-level group is a single node.
Table 1: A sample p.m.f. on alphabet X × Y with X = Y = {a0, a1, ..., a6, a7}.
An example of a 2-level group for p(x, y) is G_2 = ((a4) : {(a0), (a2, a7), (a6)}). In this case R = (a4) and C(R) = {(a0), (a2, a7), (a6)}. The members of C(R) are {a0, a2, a6, a7}; the members of G_2 are {a0, a2, a4, a6, a7}. Here G_2 is a 2-level group since symbol a4 can be combined with each of a0, a2, a6, a7, and (a0), (a2, a7), (a6) are 1-level groups under p.m.f. p(x, y). The tree representation T(G_2) is a 2-level tree. The tree root has three children, each of which is a single node. An example of a 3-level group for p(x, y) is G_3 = ((a7) : {(a0), (a1), ((a2) : {(a4), (a5)})}). In T(G_3), the root T(a7) of the three-level group has three children: the first two children are nodes T(a0) and T(a1); the third child is a 2-level tree with root node T(a2) and children T(a4) and T(a5).
A partition P(Y) on Y for p.m.f. p(x, y) is a complete and non-overlapping set of groups. That is, P(Y) = {G_1, G_2, ..., G_m} satisfies ∪_{i=1}^m G_i = Y and G_j ∩ G_k = ∅ for any j ≠ k, where each G_i ∈ P(Y) is a group for p(x, y), and G_j ∪ G_k and G_j ∩ G_k refer to the union and intersection, respectively, of the members of G_j and G_k. The tree representation of a partition is called a partition tree. The partition tree T(P(Y)) for partition P(Y) = {G_1, G_2, ..., G_m} is built as follows: first, construct the tree representation for each G_i; then, link the roots of all T(G_i), i ∈ {1, ..., m}, to a single node, which is defined as the root r of T(P(Y)). A partition tree is not necessarily a regular k-ary tree; the number of children at each node depends on the specific multi-level group. Figure 2(a) gives a partition tree for partition P(Y) = {(a3, a6), G_3}.
We label the branches of a partition tree as follows. For any 1-level group G at depth d in T(P(Y)), let n describe the d-step path from root r to node T(G) in T(P(Y)). We refer to G
Figure 2: (a) Partition tree T(P(Y)); (b) labels for T(P(Y)); (c) matched code for P(Y); (d) combining groups in partition {(a0), ((a2) : {(a4), (a5)}), ((a7) : {(a1), (a3)}), (a6)}.
by describing this path. Thus T(n) = T(G). For notational simplicity, we sometimes substitute n for T(n) when it is clear from the context that we are talking about the node rather than the 1-level group at that node (e.g., n ∈ T(P(Y)) rather than T(n) ∈ T(P(Y))). To make the path descriptions unique, we fix an order on the descendants of each node and number them from left to right. Thus n's children are labeled as n1, n2, ..., nK(n), where nk is a vector created by concatenating k to n and K(n) is the number of children descending from n. The labeled partition tree for Figure 2(a) appears in Figure 2(b).
The node probability q(n) of a 1-level group n with n ∈ T(P(Y)) is the sum of the probabilities of that group's members. The subtree probability Q(n) of the 1-level group at node n ∈ T(P(Y)) is the sum of the probabilities of n's members and descendants. In Figure 2(b), q(23) = p_Y(a2) and Q(23) = p_Y(a2) + p_Y(a4) + p_Y(a5).
A matched code γ_Y for partition P(Y) is a binary code² such that for any node n ∈ T(P(Y)) and symbols y_1, y_2 ∈ n and y_3 ∈ nk, k ∈ {1, ..., K(n)}: (1) γ_Y(y_1) = γ_Y(y_2); (2) γ_Y(y_1) ≺ γ_Y(y_3); (3) {γ_Y(nk) : k ∈ {1, ..., K(n)}} is prefix-free. (We use γ_Y(n) interchangeably with γ_Y(y) for any y ∈ n.) If symbol y ∈ Y belongs to 1-level group G, then γ_Y(y) describes the path in T(P(Y)) from r to T(G); the path description is a concatenated list of step descriptions, where the step from n to nk, k ∈ {1, ..., K(n)}, is described using a prefix-code on {1, ..., K(n)}. An example of a matched code for the partition of Figure 2(a) appears in Figure 2(c), where the codeword for each node is indicated in parentheses.
In the above framework, a partition specifies the prefix and equivalence relationships in the binary descriptions of y ∈ Y; a matched code is any code with those properties. Our definitions enforce the condition that for any matched code, y_1, y_2 ∈ A_x for some x ∈ X implies that γ_Y(y_1) and γ_Y(y_2) satisfy the prefix condition; that is, γ_Y violates the prefix property only when knowing X eliminates all possible ambiguity. Theorem 1 establishes the equivalence of matched codes and lossless side-information codes.
²We here focus on codes with binary channel alphabet {0, 1}. The extension to codes with other finite channel alphabets is straightforward.
Theorem 1 Code γ_Y is a lossless instantaneous side-information code for p(x, y) if and only if γ_Y is a matched code for some partition P(Y) for p(x, y).
Proof: First we prove that a matched code for partition P(Y) is a lossless instantaneous side-information code for Y. This proof follows from the definition of a matched code. In a matched code for partition P(Y), only symbols that can be combined can be assigned codewords that violate the prefix condition; thus only symbols that can be combined are indistinguishable using the matched code description. Since symbols y_1 and y_2 can be combined only if p(x, y_1)p(x, y_2) = 0 for all x ∈ X, for each x ∈ X the matched code's set of codewords for A_x = {y ∈ Y : p(x, y) > 0} is prefix-free. Thus the decoder can decode the value of X and then losslessly decode the value of Y using the instantaneous code on A_X.
Next we prove that a lossless instantaneous side-information code γ_Y must be a matched code for some partition P(Y) on Y for p(x, y). That is, given γ_Y, it is always possible to find a partition P(Y) on Y for p(x, y) such that N = {γ_Y(y) : y ∈ Y} describes a matched code for P(Y).
Begin by building a binary tree T_B corresponding to N as follows. Initialize T_B as a fixed-depth binary tree with depth max_{y∈Y} |γ_Y(y)|. For each y ∈ Y, label the tree node reached by following path γ_Y(y) downward from the root of the tree (here '0' and '1' correspond to left and right branches respectively in the binary tree). Call a node in T_B empty if it does not represent any codeword in N and it is not the root of T_B; all other nodes are non-empty. When it is clear from the context, the description of a codeword is used interchangeably with the description of the non-empty node representing it.
Build partition tree T from binary tree T_B by removing all empty nodes except for the root as follows. First, prune from the tree all empty nodes that have no non-empty descendants. Then, working from the leaves to the root, remove all empty nodes except for the root by attaching the children of each such node directly to the parent of that node. The root is left unchanged. In T:
(1) All symbols that are represented by the same codeword in N reside at the same node of T. Since γ_Y is a lossless instantaneous side-information code, any y_1, y_2 at the same node in T can be combined under p(x, y). Hence each non-root node in T represents a 1-level group.
(2) The binary description of any internal node n ∈ T is the prefix of the descriptions of its descendants. Thus for γ_Y to be prefix-free on A_x for each x ∈ X, it must be possible to combine n with any of its descendants to ensure lossless decoding. Thus n and its descendants form a multi-level group, whose root R is the 1-level group represented by n. In this case, C(R) is the set of (possibly multi-level) groups descending from n in T. (3) The set of codewords descending from the same node satisfies the prefix condition.
Thus T is a partition tree for some partition P(Y) for p(x, y), and N is a matched code for P(Y). □
Using Theorem 1, we break the problem of lossless side-information code design into two parts: partition design and matched code design. We begin with the second part.
C Matched Code Design: Optimal Shannon, Huffman, and Arithmetic Codes
Given an arbitrary partition P(Y) for p(x, y), we wish to design the optimal matched code for P(Y). In traditional lossless coding, the optimal description lengths are l*(y) = −log p(y) for all y ∈ Y if those lengths are all integers. Theorem 2 gives the corresponding result for lossless side-information codes on a fixed partition P(Y).
Theorem 2 Given partition P(Y) for p(x, y), the optimal matched code for P(Y) has description lengths l*_{P(Y)}(r) = 0 and
l*_{P(Y)}(nk) = l*_{P(Y)}(n) + log( ∑_{j=1}^{K(n)} Q(nj) / Q(nk) )
for all n ∈ T(P(Y)) and k ∈ {1, ..., K(n)}, if those lengths are all integers. Here l*_{P(Y)}(n) = l implies l*_{P(Y)}(y) = l for all symbols y ∈ Y that are in 1-level group n.
Proof: For each internal node n ∈ T(P(Y)), the codewords {γ_Y(nk) : k ∈ {1, ..., K(n)}} share a common prefix and satisfy the prefix condition. Deleting the common prefix from each codeword in {γ_Y(nk) : k = 1, ..., K(n)} yields a collection of codeword suffixes that also satisfy the prefix condition. Thus if l_{P(Y)}(n) is the description length for n, then the collection of lengths {l_{P(Y)}(nk) − l_{P(Y)}(n) : k = 1, ..., K(n)} satisfies the Kraft inequality: ∑_{k=1}^{K(n)} 2^{−(l_{P(Y)}(nk) − l_{P(Y)}(n))} ≤ 1. (Here l_{P(Y)}(r) = 0 by definition.) We wish to minimize the expected length
l(P(Y)) = ∑_{n ∈ T(P(Y))} q(n) l_{P(Y)}(n)
of the matched code over all l_{P(Y)}(n) that satisfy
∑_{k=1}^{K(n)} 2^{−(l_{P(Y)}(nk) − l_{P(Y)}(n))} = 1, for all n ∈ I(P(Y)) = {n ∈ T(P(Y)) : K(n) > 0}.
(We here neglect the integer constraint on code lengths.) If u(n) = 2^{−l_{P(Y)}(n)}, then l(P(Y)) = −∑_{n ∈ T(P(Y))} q(n) log u(n) and u(n) must satisfy
∑_{k=1}^{K(n)} u(nk) = u(n), for all n ∈ I(P(Y)).
Since l(P(Y)) is a convex function of u(n), the constrained minimization can be posed as an unconstrained minimization using the Lagrangian
J = −∑_{n ∈ T(P(Y))} q(n) log u(n) + ∑_{n ∈ I(P(Y))} λ(n) ( ∑_{k=1}^{K(n)} u(nk) − u(n) ).
Differentiating with respect to u(nk) and setting the derivative to 0, we get
−q(nk)/u(nk) log e + λ(nk) − λ(n) = 0, if nk is an internal node;
−q(nk)/u(nk) log e − λ(n) = 0, if nk is a leaf node.   (1)
First consider all nk's at the lowest level of the tree that have the same parent n. We have
q(nk)/u(nk) log e = Q(nk)/u(nk) log e = −λ(n), k = 1, ..., K(n);
∑_{k=1}^{K(n)} u(nk) = u(n).   (2)
Thus we get
u(nk) = ( Q(nk) / ∑_{j=1}^{K(n)} Q(nj) ) u(n), k = 1, ..., K(n),   (3)
giving
λ(n) = −( ∑_{j=1}^{K(n)} Q(nj) ) log e / u(n).
Other nodes at the lowest level are processed in the same way.
Now fix some n_1 two levels up from the tree bottom, and consider any node n_1k. Case 1: If n_1k has children that are at the lowest level of the tree, then by (1),
−q(n_1k)/u(n_1k) log e + λ(n_1k) − λ(n_1) = 0.   (4)
Substituting (3) into (4) gives
−q(n_1k)/u(n_1k) log e − ( ∑_j Q(n_1kj) / u(n_1k) ) log e − λ(n_1) = −Q(n_1k)/u(n_1k) log e − λ(n_1) = 0,   (5)
that is,
Q(n_1k)/u(n_1k) log e = −λ(n_1).   (6)
Case 2: If n = n_1k has no children, then by (1),
q(n_1k)/u(n_1k) log e = Q(n_1k)/u(n_1k) log e = −λ(n_1),
which is the same as (6).
Considering all such n_1k, k = 1, ..., K(n_1), we have
Q(n_1k)/u(n_1k) log e = −λ(n_1), k = 1, ..., K(n_1);
∑_{k=1}^{K(n_1)} u(n_1k) = u(n_1),   (7)
which is the same problem as (2) and is solved in the same manner.
Continuing in this way (from the bottom to the top of T(P(Y))), we finally obtain
u(nk) = ( Q(nk) / ∑_{j=1}^{K(n)} Q(nj) ) u(n), for all k = 1, ..., K(n) and all n ∈ I(P(Y)).   (8)
Setting l*_{P(Y)}(nk) = −log u(nk) completes the proof. □
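As a small illustration of Theorem 2's length assignment, the sketch below computes the ideal (non-integer) lengths on a partition tree. The node interface (callables Q for subtree probability and children for the child list, e.g. built from the Group class sketched earlier) is an assumption for illustration, not the paper's notation.

```python
import math

def ideal_lengths(root, Q, children):
    """Optimal non-integer matched-code lengths from Theorem 2 (a sketch):
    l*(root) = 0 and l*(nk) = l*(n) + log2( sum_j Q(nj) / Q(nk) )."""
    lengths = {id(root): 0.0}
    stack = [root]
    while stack:
        n = stack.pop()
        kids = children(n)
        if not kids:
            continue
        total = sum(Q(c) for c in kids)       # sum_j Q(nj)
        for c in kids:
            lengths[id(c)] = lengths[id(n)] + math.log2(total / Q(c))
            stack.append(c)
    return lengths
```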
We present three strategies for building matched codes that approximate the optimal length function of Theorem 2. For any node n with K(n) > 0, the first matched code γ^S_{P(Y)} describes the step from n to nk using a Shannon code with alphabet {1, ..., K(n)} and p.m.f. {Q(nk)/∑_{j=1}^{K(n)} Q(nj)}_{k=1}^{K(n)}; the resulting description lengths are l^S_{P(Y)}(r) = 0 and l^S_{P(Y)}(nk) = l^S_{P(Y)}(n) + ⌈log(∑_{j=1}^{K(n)} Q(nj)/Q(nk))⌉. Codes γ^H_{P(Y)} and γ^A_{P(Y)} replace the Shannon codes of γ^S_{P(Y)} with Huffman and arithmetic codes, respectively, matched to the same p.m.f.s.
As a simple example, we build the matched Huffman code for the partition in Figure 2(a). We work from the top to the bottom of the partition tree T. Step 1: Design a Huffman code on the set of nodes descending from T's root, according to their subtree probabilities, i.e., nodes {(a3, a6), (a7)} with p.m.f. {p_Y(a3) + p_Y(a6), p_Y(a7) + p_Y(a0) + p_Y(a1) + p_Y(a2) + p_Y(a4) + p_Y(a5)} = {.21, .79}; a Huffman code for these two branches is {0, 1}. Step 2: For each subsequent tree node n with K(n) > 0, consider the children {nk : k = 1, ..., K(n)} as a new set, and do Huffman code design on this set with p.m.f. {Q(nk)/∑_{j=1}^{K(n)} Q(nj)}_{k=1}^{K(n)}. We first design a Huffman code for group (a7)'s children {(a0), (a1), (a2)} according to p.m.f. {p_Y(a0)/Q, p_Y(a1)/Q, (p_Y(a2) + p_Y(a4) + p_Y(a5))/Q} = {.1/Q, .19/Q, .37/Q}, where Q = p_Y(a0) + p_Y(a1) + p_Y(a2) + p_Y(a4) + p_Y(a5) = .66; a Huffman code for this set of branches is {00, 01, 1}. Then we design Huffman code {0, 1} for groups {(a4), (a5)} with p.m.f. {p_Y(a4)/(p_Y(a4) + p_Y(a5)), p_Y(a5)/(p_Y(a4) + p_Y(a5))} = {.11/.17, .06/.17}. The full codeword for any node n is the concatenation of the codewords of all nodes traversed in moving from root T(r) to node n in T. The codewords for this example appear in Figure 2(c). Any "matched Huffman code" designed in this way is optimal by Theorem 3.
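The top-down procedure just described can be sketched as follows. This is an illustrative sketch only: it assumes nodes expose a children list and a subtree-probability callable (as in the Group sketch above), and Huffman ties are broken arbitrarily, so the specific codewords may differ from Figure 2(c) while the expected length is the same.

```python
import heapq
from itertools import count

def huffman_codes(weights):
    """Standard binary Huffman code for the given positive weights.
    Returns one bit string per weight (empty string if there is one weight)."""
    if len(weights) == 1:
        return [""]
    tick = count()
    heap = [(w, next(tick), i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tick), (t1, t2)))
    codes = [""] * len(weights)
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

def matched_huffman(node, subtree_prob, prefix="", out=None):
    """Assign codewords to every node of a partition tree, top-down: at each
    node, a Huffman code on the children's subtree probabilities is appended
    to the parent's codeword (sketch of the matched Huffman construction)."""
    if out is None:
        out = {}
    out[id(node)] = prefix
    if node.children:
        step = huffman_codes([subtree_prob(c) for c in node.children])
        for child, bits in zip(node.children, step):
            matched_huffman(child, subtree_prob, prefix + bits, out)
    return out
```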
Theorem 3 Given a partition V(y), a matched Huffman code for V(y) achieves the optimal expected rate over all matched codes for V(y).
Proof: Let T be the partition tree of P(Y). The codelength of a node n ∈ T is denoted by l(n). The average length l̄ for P(Y) is
l̄ = ∑_{k=1}^{K(r)} ( Q(k)l(k) + Δl(k) ),
where for each k ∈ {1, ..., K(r)}, Δl(k) = ∑_{kn ∈ T} q(kn)(l(kn) − l(k)).
Note that ∑_{k=1}^{K(r)} Q(k)l(k) and {Δl(k)} can be minimized independently. Thus
min l̄ = min ∑_{k=1}^{K(r)} Q(k)l(k) + ∑_{k=1}^{K(r)} min Δl(k).
In matched Huffman coding, working from the top to the bottom of the partition tree, we first minimize ∑_{k=1}^{K(r)} Q(k)l(k) over all integer lengths l(k) by employing Huffman codes on Q(k). We then minimize each Δl(k) over all integer length codes by similarly breaking each down layer by layer and minimizing the expected length at each layer. □
In traditional arithmetic coding (with no side-information), the description length of data sequence y^n is l(y^n) = ⌈−log p_Y(y^n)⌉ + 1, where p_Y(y^n) is the probability of y^n. In designing the matched arithmetic code of y^n for a given partition P(Y), we use the decoder's knowledge of x^n to decrease the description length of y^n. The following example, illustrated in Figure 3, demonstrates the techniques of matched arithmetic coding for the partition given in Figure 2(a).
In traditional arithmetic coding, data sequence Y^n is represented by an interval of the [0, 1) line. We describe Y^n by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. We find the interval for y^n recursively, by first breaking [0, 1) into intervals corresponding to all possible values of y_1 (see Figure 3(a)), then breaking the interval for the observed Y_1 into subintervals corresponding to all possible values of Y_1 y_2, and so
Figure 3: Dividing the unit interval in (a) traditional arithmetic coding and (b) matched arithmetic coding for partition P(Y) of Figure 2(a). (c) Matched arithmetic coding for sequence a7 a3 a4 a1 a2.
on. Given the interval A ⊆ [0, 1) for Y^k for some 0 ≤ k < n (the interval for Y^0 is [0, 1)), the subintervals for {Y^k y_{k+1}} are ordered subintervals of A with lengths proportional to p(y_{k+1}).
In matched arithmetic coding for partition P(Y), we again describe Y^n by describing the midpoint of a recursively constructed subinterval of [0, 1). In this case, however, if Y_i ∈ n_0 at depth d(n_0) = d_0 in T(P(Y)), we break [0, 1) into intervals corresponding to nodes in B = {n : (K(n) = 0 ∧ d(n) ≤ d_0) ∨ (K(n) > 0 ∧ d(n) = d_0)}. The interval for each n ∈ B with parent n_0 has length proportional to
p^(A)(n) = p^(A)(n_0) Q(n) / ∑_{k=1}^{K(n_0)} Q(n_0 k) = p^(A)(n_0) Q(n) / (Q(n_0) − q(n_0))
(here p^(A)(n) is defined to equal 1 for the unique node r at depth 0). Refining the interval for sequence Y^{i−1} to find the subinterval for Y^i involves finding the 1-level group n ∈ P(Y) such that Y_i ∈ n and using d(n) to calculate the appropriate p^(A) values and break the current interval accordingly. We finally describe Y^n by describing the center of its corresponding subinterval to an accuracy sufficient to distinguish it from its neighboring subintervals. To ensure unique decodability,
l^(A)(y^n) = ⌈−log p^(A)(y^n)⌉ + 1,
where p^(A)(y^n) is the length of the subinterval corresponding to string y^n. Given a fixed partition P(Y), for each y ∈ Y denote the node where symbol y resides by n(y), and let n_0(y) represent the parent of node n(y). Then
l^(A)(y^n) = ⌈−log p^(A)(y^n)⌉ + 1 = ⌈∑_{i=1}^n −log p^(A)(n(y_i))⌉ + 1 ≤ ∑_{i=1}^n l*(y_i) + 2,
where l*(·) is the optimal length function specified in Theorem 2. Thus the description length l^(A)(y^n) in coding data sequence y^n using a 1-dimensional "matched arithmetic code" γ^A_{P(Y)} satisfies (1/n) l^(A)(y^n) ≤ (1/n) ∑_{i=1}^n l*(y_i) + 2/n, giving a normalized description length arbitrarily close to the optimum for n sufficiently large. We deal with floating point precision issues using the same techniques applied to traditional arithmetic codes.
As an example, again consider the p.m.f. of Table 1 and the partition of Figure 2(a). If Y_1 ∈ {a3, a6, a7}, [0, 1) is broken into subintervals [0, .21) for group (a3, a6) and [.21, 1) for group (a7) since p^(A)((a3, a6)) = Q((a3, a6)) = .21 and p^(A)((a7)) = Q((a7)) = .79.
If Y_1 ∈ {a0, a1, a2}, [0, 1) is broken into subintervals [0, .21) for group (a3, a6), [.21, .33) for group (a0), [.33, .56) for group (a1), and [.56, 1) for group (a2) since
p^(A)((a0)) = p^(A)((a7)) Q((a0)) / (Q((a7)) − q((a7))) = .79 × .1 / (.79 − .13) = .12,
p^(A)((a1)) = p^(A)((a7)) Q((a1)) / (Q((a7)) − q((a7))) = .79 × .19 / (.79 − .13) = .23,
p^(A)((a2)) = p^(A)((a7)) Q((a2)) / (Q((a7)) − q((a7))) = .79 × .37 / (.79 − .13) = .44.
Finally, if Y_1 ∈ {a4, a5}, [0, 1) is broken into subintervals [0, .21) for group (a3, a6), [.21, .33) for group (a0), [.33, .56) for group (a1), [.56, .84) for group (a4), and [.84, 1) for group (a5) since
p^(A)((a4)) = p^(A)((a2)) Q((a4)) / (Q((a2)) − q((a2))) = .44 × .11 / .17 = .2847,
p^(A)((a5)) = p^(A)((a2)) Q((a5)) / (Q((a2)) − q((a2))) = .44 × .06 / .17 = .1553.
Figure 3(b) shows these intervals.
Figure 3(c) shows the recursive interval refinement procedure for Y^5 = (a7 a3 a4 a1 a2). Symbol Y_1 = a7 gives interval [0.21, 1) of length .79 (indicated by the bold line). Symbol Y_2 = a3 refines the above interval to the interval [.21, .3759) of length .21 × .79 = .1659. Symbol Y_3 = a4 refines that interval to the interval [.3024, .3500) of length .2847 × .1659 = .0472. This procedure continues until finally we find the interval [0.3241, 0.3289).
Notice that the intervals of some symbols overlap in the matched arithmetic code. For example, the intervals associated with symbols 04 and 05 subdivide the interval associated with symbol 02 in the previous example. These overlapping intervals correspond to the situation where one symbol's description is the prefix of another symbol's description in matched Huffman coding. Again, for any legitimate partition V(y), the decoder can uniquely distinguish between symbols with overlapping intervals to correctly decode Yn using its side information about Xn.
D Optimal Partitions: Definitions and Properties
The preceding discussion describes optimal Shannon, Huffman, and arithmetic codes for matched lossless side-information coding with a given partition P(Y). The partition yielding the best performance remains to be found. We focus on Huffman and arithmetic coding.
Given a partition P(Y), let l^H_{P(Y)} and l*_{P(Y)} be the Huffman and optimal description lengths, respectively, for P(Y). We say that P(Y) is optimal for matched Huffman side-information coding on p(x, y) if E l^H_{P(Y)}(Y) ≤ E l^H_{P'(Y)}(Y) for any other partition P'(Y) for p(x, y) (and therefore, by Theorems 1 and 3, E l^H_{P(Y)}(Y) ≤ E l(Y), where l is the description length for any other instantaneous lossless side-information code on p(x, y)). We say that P(Y) is optimal for matched arithmetic side-information coding on p(x, y) if E l*_{P(Y)}(Y) ≤ E l*_{P'(Y)}(Y) for any other partition P'(Y) for p(x, y).
Some properties of optimal partitions follow. Lemma 2 demonstrates that there is no loss of generality associated with restricting our attention to partitions P(Y) for which the root is the only empty internal node. Lemma 3 shows that each subtree of an optimal partition tree is an optimal partition on the sub-alphabet it describes. Lemmas 2 and 3 hold under either of the above definitions of optimality. Lemma 4 implies that an optimal partition for matched Huffman coding is not necessarily optimal for arithmetic coding, as shown in Corollary 1. Properties specific to optimal partitions for Huffman coding or optimal partitions for arithmetic coding follow.
Lemma 2 There exists an optimal partition P*(Y) for p(x, y) for which every node except for the root of P*(Y) is non-empty and no node has exactly one child.
Proof: If any non-root node n of partition P(Y) is empty, then removing n, so that n's children descend directly from n's parent, gives new partition P'(Y). Any matched code on P(Y), including the optimal matched code on P(Y), is a matched code on P'(Y). If n has exactly one child, then combining n and its child yields a legitimate partition P'(Y); the optimal matched code for P'(Y) yields expected rate no worse than that of the optimal matched code for P(Y). □
Lemma 3 If T_1, ..., T_m are the subtrees descending from any node n in optimal partition P*(Y) for p(x, y), then the tree where {T_1, ..., T_m} descend from an empty root is identical to T(P*(Y')), where P*(Y') is an optimal partition of the sub-alphabet Y' = ∪_{i=1}^m T_i for p(x, y).
Proof: Since the matched code's description can be broken into a description of n followed by a matched code on {T_1, ..., T_m}, and the corresponding description lengths add, the partition described by T(P(Y)) cannot be optimal unless the partition described by {T_1, ..., T_m} is. □
Lemma 4 Let p_1 and p_2 denote two p.m.f.s for alphabets Y_1 and Y_2 respectively, and use H(p) and R_H(p) to denote the entropy and expected Huffman coding rate, respectively, for p.m.f. p. Then H(p_1) ≥ H(p_2) does not imply R_H(p_1) ≥ R_H(p_2).
Proof: The following example demonstrates this property. Let p_1 = {0.5, 0.25, 0.25} and p_2 = {0.49, 0.49, 0.02}; then H(p_1) = 1.5 and H(p_2) = 1.12. However, the rate of the Huffman tree for p_1 is 1.5, while that for p_2 is 1.51. □
Corollary 1 The optimal partitions for matched Huffman side-information coding and matched arithmetic side-information coding are not necessarily identical.
Proof: The following example demonstrates this property. Let alphabet Y = {b0, b1, b2, b3, b4} have marginal p.m.f. {0.49, 0.01, 0.25, 0.24, 0.01}, and suppose that P_1(Y) = {(b0, b1), (b2), (b3, b4)} and P_2(Y) = {(b0), (b2, b3), (b1, b4)} are partitions of Y for p(x, y). The node probabilities of P_1(Y) and P_2(Y) are p_1 = {0.5, 0.25, 0.25} and p_2 = {0.49, 0.49, 0.02}, respectively. By the proof of Lemma 4, P_1(Y) is a better partition for Huffman coding while P_2(Y) is better for arithmetic coding. □
In the arguments that follow, we show that there exist pairs of groups (G_I, G_J) such that G_I ∩ G_J = ∅ but G_I and G_J cannot both descend from the root of an optimal partition. This result is derived by showing conditions under which there exists a group G* that combines the members of G_I and G_J and for which replacing {G_I, G_J} with {G*} in P(Y) guarantees a performance improvement. The circumstances under which "combined" groups guarantee better performance than separate groups differ for arithmetic and Huffman codes. Theorems 4 and 5 treat the two cases in turn. The following definitions are needed to describe those results.
We say that 1-level groups G_1 and G_2 (or nodes T(G_1) and T(G_2)) can be combined under p(x, y) if each pair y_1 ∈ G_1, y_2 ∈ G_2 can be combined under p(x, y).
If G_I, G_J ∈ P(Y), so that G_I and G_J extend directly from the root r of T(P(Y)) and nodes I and J are the roots of T(G_I) and T(G_J), and G_0 denotes the 1-level group at some node n_0 in T(G_J), we say that G_I can be combined with G_J at n_0 if (1) I can be combined with n_0 and each of n_0's descendants in T(G_J), and (2) n_0 and each of n_0's ancestors in T(G_J) can be combined with I and each of I's descendants in T(G_I). The result of combining G_I with G_J at n_0 is a new group G*. Group G* modifies G_J by replacing G_0 with 1-level group (I, G_0) and adding the descendants of I (in addition to the descendants of G_0) as descendants of (I, G_0) in T(G*). Figure 2(d) shows an example where group ((a7) : {(a1), (a3)}) is combined with group ((a2) : {(a4), (a5)}) at (a2) in partition P(Y) = {(a0), ((a2) : {(a4), (a5)}), ((a7) : {(a1), (a3)}), (a6)}. The modified partition is P*(Y) = {(a0), G*, (a6)}, where G* = ((a2, a7) : {(a1), (a3), (a4), (a5)}).
Lemma 5 For any constant A > 0, the function f(x) = x log(1 + A/x) is monotonically increasing in x for all x > 0.
Proof: The first-order derivative of f(x) is f'(x) = log(1 + A/x) − A/(x + A). Let u = A/x and g(u) = log(1 + u) − u/(1 + u), so that f'(x) = g(u) and g(0) = 0. The first-order derivative of g(u) is g'(u) = u/(u + 1)². For any u > 0, g'(u) > 0, thus g(u) > 0. So for any x > 0, f'(x) > 0; that is, f(x) is monotonically increasing in x. □
Theorem 4 Let P(Y) = {G_1, ..., G_m} be a partition of Y under p(x, y). Suppose that G_I ∈ P(Y) can be combined with G_J ∈ P(Y) at G_0, where G_0 is the 1-level group at some node n_0 of T(G_J). Let P*(Y) be the resulting partition. Then E l*_{P*(Y)}(Y) ≤ E l*_{P(Y)}(Y).
Proof: Let n_0 = Jj_1...j_M, so that n_0's parent is Jj_1...j_{M−1}. Define S_1 = {Jj_1...j_i : 1 ≤ i ≤ M} (i.e., the set of nodes on the path to n_0, excluding node J); S_2 = {n ∈ T(G_J) : n is the sibling of node s, s ∈ S_1}; S_3 = (S_1 ∪ {J}) ∩ {n_0}^c (i.e., the set of nodes on the path to n_0, excluding node n_0). For any node n ∈ T(P(Y)), let Q_n and q_n denote the subtree and node probabilities respectively
Figure 4: Combining two groups (Gi and Gj) into one group.
of node n in T(P(Y)), and define ΔQ_n = Q_n − q_n = ∑_j Q_{nj}. Then Figure 4 shows the subtree probabilities associated with combining G_I with G_J at G_0. Let the resulting new group be G*.
Note that the sum of the subtree probabilities of G_I and G_J equals the subtree probability of G*, and thus the optimal average rates of the groups in P(Y) ∩ {G_I, G_J}^c are not changed by the combination. Thus if (l_I, l_J) and (l*_I, l*_J) are the optimal average rates for (G_I, G_J) in P(Y) and P*(Y), respectively, then Δl_I + Δl_J = (l_I − l*_I) + (l_J − l*_J) gives the total rate cost of using partition P(Y) rather than partition P*(Y). Here
- = Qi log Q,+ + ΔJj
Figure imgf000112_0002
V K{l)
Qi + Quk Q/fe
-lτ = Qjlog f (QJ + + 0QJ) π + ∑Qι* log + Δ-j nkζSi Qi + ΔQn /
Figure imgf000112_0003
AQi + ΔQ no
Figure imgf000112_0004
QI + QJ Qi + Qjή Qι + Qap Qi + Q*
= Qi log
Qi + AQj Qi + AQj Qi + Δ B QJ
Figure imgf000112_0005
Ql + Qn
= Q/iogπ + Q,1og(l+^)-Δg,log(1+^), ne53 Qi + ΔQ, where Δ-j represents the portion of the average rate unchanged by the combination of Gi and Gj- It follows that Δli > 0 since logrjne53( + Qn)/(Qι + ΔQn) > 0, and since xlog(l + c/x) is monotonically increε-sing in x > 0 and c > 0 implies that
ΛQ,b6 (l + ) < Aβ,--. (l + t) < Q,bg (l + ) .
Ill Similarly, using Δ. j as the portion of Ij unchanged by the combination,
- = QjlogQj A- ∑ nfceSiUS.
Figure imgf000113_0001
-Tj = Qjiog(QJ + Qj) + ∑ Qnk iog ^+-QOr +Q"* loε nfceSi Δ ~Q"*nβ + ' Q ^l ' ^ τxk Si ^ "^Q Qnn +fc Ql
+ + Alj
Figure imgf000113_0002
Figure imgf000113_0003
Thus Δl_J ≥ 0 by the monotonicity of x log(1 + c/x). Since the optimal rates of G_I and G_J both decrease after combining, we have the desired result. □
Unfortunately, Theorem 4 does not hold for matched Huffman coding. Theorem 5 shows a weaker result that does apply in Huffman coding.
Theorem 5 Given partition P(Y) of Y on p(x, y), if G_I, G_J ∈ P(Y) satisfy: (1) G_I is a 1-level group, and (2) G_I can be combined with G_J at root J of T(G_J) to form partition P*(Y), then E l^H_{P*(Y)}(Y) ≤ E l^H_{P(Y)}(Y).
Proof: Let α denote the matched Huffman code for P(Y), and use α_I and α_J to denote this code's binary descriptions for nodes I and J. The binary description for any symbol in G_I equals α_I (α(y) = α_I for each y ∈ G_I), while the binary description for any symbol in G_J has prefix α_J (α(y) = α_J α'(y) for each y ∈ G_J, where α' is a matched Huffman code for G_J). Let α_min be the shorter of α_I and α_J. Since α is a matched Huffman code for P(Y) and P*(Y) is a partition of Y on p(x, y), the code α* with α*(y) = α_min for y ∈ G_I, α*(y) = α_min α'(y) for y ∈ G_J, and α*(y) = α(y) otherwise is a matched code for P*(Y). Further, |α_min| ≤ |α_I| and |α_min| ≤ |α_J| imply that the expected length of α*(Y) is less than or equal to the expected length of α(Y) (but perhaps greater than the expected length of the matched Huffman code for P*(Y)). □
E Partition Design
We next consider techniques for finding the optimal partition. These techniques may be used to find the optimal partition for Huffman coding or the optimal partition for arithmetic coding by applying the appropriate optimality criterion. While the partition design problem is NP-hard [8], we can gain significant efficiency improvements in the search process by taking advantage of the above-described properties of optimal partitions.
By Lemma 3, the partition design procedure can be recursive, solving for optimal partitions on sub-alphabets in the solution of the optimal partition on Y. For any alphabet Y' ⊆ Y, the procedure begins by making a list L_{Y'} of all (single- or multi-level) groups that can appear in an optimal partition P(Y') of Y' for p(x, y) given the above properties of optimal partitions. The list is initialized as L_{Y'} = {(y) : y ∈ Y'}. For each symbol y ∈ Y', we wish to add to the list all groups that have y as one member of the root and some subset of Y' as members. To do that, we find the set C_y = {z ∈ Y' : z can be combined with y under p(x, y)}. For each non-empty subset S ⊆ C_y such that L_{Y'} does not yet contain a group with elements S ∪ {y}, we find the optimal partition P(S) of S for p(x, y). We construct a new multi-level group G with elements S ∪ {y} by adding y to the empty root of T(P(S)) if P(S) contains more than one group, or to the root of the single group in P(S) otherwise. Notice that y can be the prefix of any symbol in S. Combining this observation with Lemma 3 gives the optimality of G among all groups in {G' : members of G' are S ∪ {y} and y is at the root of G'}. Since y can be combined with all members of S ∪ {y}, y must reside at the root of the optimal partition of S ∪ {y}; thus G is optimal not only among all groups in {G' : members of G' are S ∪ {y} and y is at the root of G'} but among all groups in {G' : members of G' are S ∪ {y}}. Group G is added to the list L_{Y'}, and the process continues.
After constructing the above list of groups, we recursively build the optimal partition of Y' for p(x, y). If any group G ∈ L_{Y'} contains all of the elements of Y', then P(Y') = {G} is the optimal partition on Y'. Otherwise, the algorithm systematically builds a partition, adding one group at a time from L_{Y'} to set P(Y') until P(Y') is a complete partition. For G ∈ L_{Y'} to be added to P(Y'), it must satisfy, for all G' ∈ P(Y'): (1) G ∩ G' = ∅, and (2) G and G' cannot be combined (see Theorem 4 for arithmetic or Theorem 5 for Huffman coding). For each complete partition, we find the rate of the optimal code on P(Y'). The optimal partition is the partition whose optimal code gives the lowest expected rate. A lower complexity, higher memory algorithm is achieved by recursively building optimal matched codes for the partial partitions and ruling out partial partitions for which another partial partition on the same alphabet yields a lower rate.
III General Lossless Instantaneous MASCs
A Problem Statement, Partition Pairs, and Optimal Matched Codes
We here drop the side-information coding assumption that X (or Y) can be decoded independently and consider MASCs in the case where it may be necessary to decode the two symbol descriptions together. Here, the partition P(Y) used in lossless side-information coding is replaced by a pair of partitions (P(X), P(Y)). As in side-information coding, P(X) and P(Y) describe the prefix and equivalence relationships for descriptions {γ_X(x) : x ∈ X} and {γ_Y(y) : y ∈ Y}, respectively. Given constraints on (P(X), P(Y)) that are both necessary and sufficient to guarantee that a code with the prefix and equivalence relationships described by (P(X), P(Y)) yields an MASC that is both instantaneous and lossless, Theorem 1 generalizes easily to this coding scenario, so every general instantaneous lossless MASC can be described as a matched code on P(X) and a matched code on P(Y) for some (P(X), P(Y)) satisfying the appropriate constraints.
In considering partition pairs (P(X), P(Y)) for use in lossless instantaneous MASCs, it is necessary but not sufficient that each be a legitimate partition for side information coding on its respective alphabet. (If P(Y) fails to uniquely describe Y when the decoder knows X exactly, then it must certainly fail for joint decoding as well. The corresponding statement for P(X) also holds. These conditions are, however, insufficient in the general case, because complete knowledge of X may be required for decoding with P(Y) and vice versa.) Necessary and sufficient conditions for (P(X), P(Y)) to give an instantaneous MASC and necessary and sufficient conditions for (P(X), P(Y)) to give a lossless MASC follow.
For (P(X), P(Y)) to yield an instantaneous MASC, the decoder must recognize when it reaches the end of γ_X(X) and γ_Y(Y). The decoder proceeds as follows. We think of a matched code on P as a multi-stage description, with each stage corresponding to a level in T(P). Starting at the roots of T(P(X)) and T(P(Y)), the decoder reads the first-stage descriptions of γ_X(X) and γ_Y(Y), traversing the described paths from the roots to nodes n_x and n_y in partitions T(P(X)) and T(P(Y)) respectively. (The decoder can determine that it has reached the end of a single stage description if and only if the matched code is itself instantaneous.) If either of the nodes reached is empty, then the decoder knows that it must read more of the description; thus we assume, without loss of generality, that n_x and n_y are not empty. Let T_x and T_y be the subtrees descending from n_x and n_y (including n_x and n_y respectively). (The subtree descending from a leaf node is simply that node.) For instantaneous coding, one of the following conditions must hold:
(A) X ∈ T_x or n_x is a leaf implies that Y ∈ n_y, and Y ∈ T_y or n_y is a leaf implies that X ∈ n_x;
(B) X ∈ T_x implies that Y ∉ n_y;
(C) Y ∈ T_y implies that X ∉ n_x.
Under condition (A), the decoder recognizes that it has reached the end of γ_X(X) and γ_Y(Y). Under condition (B), the decoder recognizes that it has not reached the end of γ_Y(Y) and reads the next stage description, traversing the described path in T(P(Y)) to node n_y' with subtree T_y'. Condition (C) similarly leads to a new node n_x' and subtree T_x'. If none of these conditions holds, then the decoder cannot determine whether to continue reading one or both of the descriptions, and the code cannot be instantaneous. The decoder continues traversing T(P(X)) and T(P(Y)) until it determines the 1-level groups n_x and n_y with X ∈ n_x and Y ∈ n_y. At each step before the decoding halts, one (or more) of the conditions (A), (B), and (C) must be satisfied.
For (P(X), P(Y)) to give a lossless MASC, for any (x, y) ∈ X × Y with p(x, y) > 0, following the above procedure on (η_X(x), η_Y(y)) must lead to final nodes (n_X, n_Y) that satisfy:
(D) (x, y) ∈ n_X × n_Y, and for any other x' ∈ n_X and y' ∈ n_Y, p(x, y') = p(x', y) = p(x', y') = 0.
The following lemma gives a simplified test for determining whether partition pair (P(X), P(Y)) yields a lossless instantaneous MASC. We call this test the MASC prefix condition. Lemma 6 reduces to Lemma 1 when either P(X) = {{x} : x ∈ X} or P(Y) = {{y} : y ∈ Y}. In either of these cases, the general MASC problem reduces to the side-information problem of Section II.
Lemma 6 Partition pair (P(X), P(Y)) for p(x, y) yields a lossless instantaneous MASC if and only if for any x, x' ∈ X such that {η_X(x), η_X(x')} does not satisfy the prefix condition, {η_Y(y) : y ∈ A_x ∪ A_x'} satisfies the prefix condition. (The statement that for any y, y' ∈ Y such that {η_Y(y), η_Y(y')} does not satisfy the prefix condition, {η_X(x) : x ∈ B_y ∪ B_y'} satisfies the prefix condition is equivalent. Here B_y = {x ∈ X : p(x, y) > 0}.) Proof: First, we show that if lossless instantaneous MASC decoding fails, then the MASC prefix condition must be violated. If lossless instantaneous MASC decoding fails, then there must be a time in the decoding procedure at which we decode to nodes (n_X, n_Y) with subtrees T_X and T_Y, but one of the following occurs:
(1) none of the conditions (A), (B), or (C) is satisfied;
(2) condition (A) is satisfied, but condition (D) is violated.
In case (1), one of the following must happen: (a) the decoder determines that Y ∈ n_Y but cannot determine whether or not X ∈ n_X; (b) the decoder determines that X ∈ n_X but cannot determine whether or not Y ∈ n_Y; (c) the decoder cannot determine whether or not Y ∈ n_Y or whether or not X ∈ n_X. If (a) occurs, then there must exist y, y' ∈ n_Y, x ∈ n_X, and x' ∈ T_X \ n_X with p(x, y)p(x', y) > 0 or p(x, y)p(x', y') > 0, which means x, x' ∈ B_y ∪ B_y'. If (b) occurs, then there must exist x, x' ∈ n_X, y ∈ n_Y, and y' ∈ T_Y \ n_Y with p(x, y)p(x, y') > 0 or p(x, y)p(x', y') > 0, which means y, y' ∈ A_x ∪ A_x'. If (c) occurs, then there must exist x ∈ n_X, x' ∈ T_X \ n_X, y ∈ n_Y, and y' ∈ T_Y \ n_Y with p(x, y)p(x', y') > 0 or p(x', y)p(x, y') > 0, which means y, y' ∈ A_x ∪ A_x'. Thus in subcases (a), (b), and (c) of case (1) the MASC prefix condition is violated.
In case (2), assume the true values of (X, Y) are (x, y); then one of the following must occur: (a) we decode Y = y but cannot decode X; (b) we decode X = x but cannot decode Y; (c) we can decode neither X nor Y. If (a) occurs, then there must exist an x' ∈ n_X with p(x', y) > 0, which means x, x' ∈ B_y. If (b) occurs, then there must exist a y' ∈ n_Y with p(x, y') > 0, which means y, y' ∈ A_x. If (c) occurs, then there must exist x' ∈ n_X and y' ∈ n_Y with p(x', y') > 0 or p(x, y') > 0 or p(x', y) > 0, which means x, x' ∈ B_y ∪ B_y' or equivalently y, y' ∈ A_x ∪ A_x'. Thus in subcases (a), (b), and (c) of case (2) the MASC prefix condition is likewise violated.
Next, we show that if the MASC prefix condition is violated, then we cannot achieve a lossless instantaneous MASC. Here we use n_x and n_y to denote the nodes of the partition trees satisfying x ∈ n_x and y ∈ n_y. We assume symbols x, x' ∈ X and y, y' ∈ Y satisfy y, y' ∈ A_x ∪ A_x' (or equivalently x, x' ∈ B_y ∪ B_y'), but η_X(x) and η_X(x') do not satisfy the prefix condition, and η_Y(y) and η_Y(y') do not satisfy the prefix condition; i.e., the MASC prefix condition is violated. Then one of the following must hold:
(1) η_X(x) = η_X(x') and η_Y(y) = η_Y(y');
(2) η_X(x) = η_X(x') and η_Y(y) is a prefix of η_Y(y'); (3) η_Y(y) = η_Y(y') and η_X(x) is a prefix of η_X(x');
(4) η_X(x) is a prefix of η_X(x') and η_Y(y) is a prefix of η_Y(y').
In case (1), there must be a time in the decoding procedure at which the decoder stops at (n_x, n_y) and determines that X ∈ n_x and Y ∈ n_y. However, since y, y' ∈ A_x ∪ A_x', all of the following are possible given X ∈ n_x and Y ∈ n_y: (a) y ∈ A_x \ A_x' and y' ∈ A_x' \ A_x; (b) y ∈ A_x' \ A_x and y' ∈ A_x \ A_x'; (c) y, y' ∈ A_x ∩ A_x'. Thus the decoder cannot determine which of the following symbol pairs was described: (x, y), (x, y'), (x', y), or (x', y').
In case (2), there must be a time in the decoding procedure at which the decoder reaches (n_x, n_y) and determines that X ∈ n_x. However, as in case (1), all three possibilities can occur, and the decoder has no extra information with which to determine whether or not Y ∈ n_y.
In case (3), there must be a time in the decoding procedure at which the decoder reaches (n_x, n_y) and determines that Y ∈ n_y. However, as in case (1), all three possibilities can occur, and the decoder has no extra information with which to determine whether or not X ∈ n_x.
In case (4), there must be a time in the decoding procedure at which the decoder reaches (n_x, n_y) and needs to determine whether or not X ∈ n_x and whether or not Y ∈ n_y. However, again as in case (1), all three possibilities can occur, and the decoder does not have the extra information needed to decode instantaneously. □
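For illustration only, the Lemma 6 test lends itself to a direct computational check. The following sketch is not part of the original disclosure; it assumes dictionary-based codeword maps and joint p.m.f. (all names are illustrative), and it assumes that every symbol with positive probability has a codeword.

```python
# Minimal sketch of the Lemma 6 (MASC prefix condition) test.
# code_x, code_y: dicts mapping symbols to binary strings, e.g. {'a0': '01', ...}
# p: dict mapping (x, y) pairs to probabilities.

def is_prefix(a, b):
    """True if string a is a prefix of string b (equality included)."""
    return b.startswith(a)

def violates_prefix_condition(words):
    """True if some codeword in `words` is a prefix of (or equal to) another."""
    words = list(words)
    return any(is_prefix(w1, w2)
               for i, w1 in enumerate(words)
               for j, w2 in enumerate(words) if i != j)

def support_A(p, x):
    """A_x = {y : p(x, y) > 0}."""
    return {y for (xx, y), prob in p.items() if xx == x and prob > 0}

def satisfies_masc_prefix_condition(code_x, code_y, p):
    """Lemma 6: whenever eta_X(x), eta_X(x') violate the prefix condition,
    the Y codewords of A_x union A_x' must satisfy it."""
    xs = list(code_x)
    for i, x in enumerate(xs):
        for xp in xs[i + 1:]:
            if violates_prefix_condition([code_x[x], code_x[xp]]):
                ys = support_A(p, x) | support_A(p, xp)
                if violates_prefix_condition(code_y[y] for y in ys):
                    return False
    return True
```

A pair of partitions can thus be screened by checking any matched codes built on them; by Lemma 6 the outcome depends only on the prefix and equivalence relationships, not on the particular matched code chosen.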
Optimality of a matched code for partition P(Y) is independent of whether P(Y) is used in a side-information code or an MASC. Thus our optimal matched code design methods from Section II apply here as well, giving optimal matched Shannon, Huffman, and arithmetic codes for any partition pair (P(X), P(Y)) for p(x, y) that satisfies the MASC prefix condition.
B Optimal Partition Properties
Given a partition pair (P(X), P(Y)) that satisfies the MASC prefix condition, (P(X), P(Y)) is optimal for use in a matched Huffman MASC on p(x, y) if (E l^H_{P(X)}(X), E l^H_{P(Y)}(Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X × Y. Similarly, (P(X), P(Y)) is optimal for use in a matched arithmetic MASC on p(x, y) if (E l*_{P(X)}(X), E l*_{P(Y)}(Y)) sits on the lower boundary of the rates achievable by a lossless MASC on alphabet X × Y. Again l^H_P and l*_P denote the Huffman and optimal description lengths respectively for partition P, and Huffman coding is optimal over all codes on a fixed alphabet. (Mixed codes (e.g., Huffman coding on X and arithmetic coding on Y) are also possible within this framework.) While the lower convex hull of the rate region of interest is achievable through time sharing, we describe the lower boundary of achievable rates rather than the convex hull of that region in order to increase the richness of points that can be achieved without time sharing. This region describes points that minimize the rate needed to describe Y subject to a fixed constraint on the rate needed to describe X or vice versa. The regions are not identical since the curves they trace are not convex. Their convex hulls are, of course, identical.
Using Lemma 7, we again restrict our attention to partitions with no empty nodes except for the root. The proof of this result does not follow immediately from that of the corresponding result for side-information codes. By Lemma 6, whether or not two symbols can be combined for one alphabet is a function of the partition on the other alphabet. Thus we must here show not only that removing empty nodes does not increase the expected rate associated with the optimal code for a given partition but also that it does not further restrict the family of partitions allowed on the other alphabet.
Lemma 7 For each partition pair (P(X), P(Y)) that achieves performance on the lower boundary of the achievable rate region, there exists a partition pair (P*(X), P*(Y)) achieving the same rate performance as (P(X), P(Y)), for which every node except for the roots of P*(X) and P*(Y) is non-empty and no node has exactly one child.
Proof: Case 1: If any non-root node n of partition P(X) is empty, then we remove n, so that n's children descend directly from n's parent. Case 2: If any node n has exactly one child n1, then we combine n and n1 to form 1-level group (n, n1), with the children of n1 descending directly from (n, n1). In both cases, the rate of the new partition does not increase, and the prefix condition among P(X)'s nonempty nodes is unchanged; thus by Lemma 6 the set of symbols of Y that can be combined likewise remains the same. □
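The clean-up step used in this proof can be sketched as follows, under an assumed tree representation in which each node holds a (possibly empty) 1-level group `symbols` and a list of `children`; neither the data structure nor the function name comes from the patent, and this sketch applies Case 2 only to non-root nodes.

```python
def normalize(node, is_root=True):
    """Return an equivalent partition tree with no empty non-root nodes
    and no non-root node having exactly one child (cf. Lemma 7)."""
    # Normalize children first, splicing out empty ones (Case 1) by
    # promoting their children to the current node.
    children = []
    for child in node['children']:
        child = normalize(child, is_root=False)
        if child['symbols']:
            children.append(child)
        else:
            children.extend(child['children'])
    node = {'symbols': list(node['symbols']), 'children': children}
    # Case 2: merge a lone child into its parent's 1-level group.
    if not is_root and len(node['children']) == 1:
        only = node['children'][0]
        node = {'symbols': node['symbols'] + only['symbols'],
                'children': only['children']}
    return node
```

Because only empty nodes are removed and only parent/child groups are merged, the prefix and equivalence relationships among the remaining non-empty nodes are preserved, which is exactly what the proof requires.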
C Partition Design
By Lemma 6, whether or not two symbols can be combined in a general MASC is a function of the partition on the other alphabet. Fixing one partition before designing the other allows us to fix which symbols of the second alphabet can and cannot be combined and thereby simplifies the search for legitimate partitions on the second alphabet. In the discussion that follows, we fix P(X) and then use a variation on the partition search algorithm of Section II to find the best P(Y) for which (P(X), P(Y)) yields an instantaneous lossless MASC. Traversing all P(X) allows us to find all partitions with performance on the lower boundary of the achievable rate region.
To simplify the discussion that follows, we modify the terminology used in Section II to restrict our attention from all partitions on Y to only those partitions P(Y) for which (P(X), P(Y)) satisfies the MASC prefix condition given a fixed P(X). In particular, using Lemma 6, symbols y and y' can be combined given P(X) if and only if there does not exist a pair x, x' ∈ X such that η_X(x) is a prefix of η_X(x') (equality included) and y, y' ∈ A_x ∪ A_x'. (Here η_X(x) is any matched code for P(X).) Equivalently, y and y' can be combined given P(X) if for each pair x, x' ∈ X such that η_X(x) is a prefix of η_X(x'), (p(x, y) + p(x', y))(p(x, y') + p(x', y')) = 0. Given this new definition, the corresponding definitions for k-level groups, partitions on Y, and matched codes for partitions on Y for a fixed P(X) follow immediately.
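The combinability test just stated can be evaluated directly from a codeword map for P(X) and the joint p.m.f. The sketch below is illustrative only; the dictionary layout and function name are assumptions, and the pairs x, x' are taken over all of X including x = x', a reading under which the test also recovers the side-information requirement p(x, y)p(x, y') = 0.

```python
# Can symbols y and y' of the Y alphabet be combined given a fixed P(X)?
# Per the condition above: for every pair x, x' with eta_X(x) a prefix of
# eta_X(x') (equality included), (p(x,y)+p(x',y)) * (p(x,y')+p(x',y')) == 0.

def can_combine_given_PX(code_x, p, y, yp):
    for x in code_x:
        for xp in code_x:
            if not code_x[xp].startswith(code_x[x]):
                continue  # eta_X(x) is not a prefix of eta_X(x')
            left = p.get((x, y), 0.0) + p.get((xp, y), 0.0)
            right = p.get((x, yp), 0.0) + p.get((xp, yp), 0.0)
            if left * right > 0:
                return False
    return True
```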
Next consider the search for the optimal partition on Y given a fixed partition P(X). We use P*(Y|P(X)) to denote this partition. The procedure used to search for P*(Y|P(X)) is almost identical to the procedure used to search for the optimal partition in side-information coding. First, we determine which symbols from Y can be combined given P(X). In this case, for each node n ∈ T(P(X)), if T_n is the subtree of T(P(X)) with root n, then for each node n' descending from n in T_n, symbols y, y' ∈ A_n ∪ A_n' cannot be combined given P(X). Here A_n = {y : y ∈ A_x, x ∈ n}. Traversing the tree from top to bottom yields the full list of pairs of symbols that cannot be combined given P(X). All pairs not on this list can be combined given P(X). Given this list, we construct a list of groups and recursively build the optimal partition P*(Y|P(X)) using the approach described in Section II.
Given a method for finding the optimal partition P*(Y|P(X)) for a fixed partition P(X), we next need a means of listing all partitions P(X). (Note that we really wish to list all P(X), not only those that would be optimal for side-information coding. As a result, the procedure for constructing the list of groups is slightly different from that in Section II.) For any alphabet X' ⊆ X, the procedure begins by making a list C_X' of all (single- or multi-level) groups that may appear in a partition of X' for p(x, y) satisfying Lemma 7 (i.e., every node except for the root is non-empty, and K(n) ≠ 1). The list is initialized as C_X' = {(x) : x ∈ X'}. For each symbol x ∈ X' and each non-empty subset S ⊆ {z ∈ X' : z can be combined with x under p(x, y)}, we find the set of partitions {P(S)} of S for p(x, y); for each P(S), we add x to the empty root of T(P(S)) if P(S) contains more than one group, or to the root of the single group in P(S) otherwise; then we add the resulting new group to C_X' if C_X' does not yet contain the same group.
After constructing the above list of groups, we build a collection of partitions of X' made of groups on that list. If any group G ∈ C_X' contains all of the elements of X', then {G} is a complete partition. Otherwise, the algorithm systematically builds a partition, adding one group at a time from C_X' to set P(X') until P(X') is a complete partition. For G ∈ C_X' to be added to P(X'), it must satisfy G ∩ G' = ∅ for all G' ∈ P(X'). The collection of partitions for X' is denoted C_{P(X')}.
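A compact illustration of the enumeration step: given the candidate groups (here represented only by the sets of symbols they cover), build every complete, non-overlapping partition of X'. This is a hypothetical rendering of the procedure; the actual algorithm carries the full multi-level group structure rather than flat symbol sets.

```python
# Enumerate complete partitions of alphabet_x from a candidate group list.
# Each group is represented here by a frozenset of the symbols it covers.

def complete_partitions(alphabet_x, groups):
    alphabet_x = frozenset(alphabet_x)
    results = []

    def extend(partial, covered):
        if covered == alphabet_x:
            results.append(list(partial))
            return
        for g in groups:
            if g & covered:            # groups in a partition must not overlap
                continue
            # canonical-order pruning so each partition is produced once
            if partial and min(g) < min(partial[-1]):
                continue
            extend(partial + [g], covered | g)

    extend([], frozenset())
    return results

# Example with hypothetical groups:
# complete_partitions({'a', 'b', 'c'},
#                     [frozenset({'a'}), frozenset({'b'}), frozenset({'c'}),
#                      frozenset({'a', 'b'})])
# yields the partitions {{'a'},{'b'},{'c'}} and {{'a','b'},{'c'}}.
```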
We construct the optimal partition P*(Y|P(X)) for each P(X) ∈ C_{P(X)} and choose those partition pairs (P(X), P(Y)) that minimize the expected rate needed to describe Y given a fixed constraint on the expected rate needed to describe X (or vice versa). The computational complexity of this search is large, but the design problem is NP-hard [8], and thus low-complexity optimal search algorithms are not available.
IV Near-Lossless Instantaneous Multiple Access Source Coding
A Problem Statement, Partition Pairs, and Optimal Matched Codes
Finally, we generalize the MASC problem from lossless instantaneous side-information and general MASCs to near-lossless instantaneous side-information and general MASCs. For any fixed ε > 0, we call MASC ((η_X, η_Y), η^{-1}) a near-lossless instantaneous MASC for P_e ≤ ε if ((η_X, η_Y), η^{-1}) yields instantaneous decoding with P_e = Pr(η^{-1}(η_X(X), η_Y(Y)) ≠ (X, Y)) ≤ ε. For instantaneous decoding in a near-lossless MASC, we require that for any input sequences x_1, x_2, x_3, ... and y_1, y_2, y_3, ... with p(x_1, y_1) > 0 the instantaneous decoder reconstructs some reproduction of (x_1, y_1) by reading no more and no less than the first |η_X(x_1)| bits from η_X(x_1)η_X(x_2)η_X(x_3)... and the first |η_Y(y_1)| bits from η_Y(y_1)η_Y(y_2)η_Y(y_3)... (without prior knowledge of these lengths). That is, we require that the decoder correctly determines the length of the description of each (x, y) with p(x, y) > 0 even when it incorrectly reconstructs the values of x and y. This requirement disallows decoding error propagation problems caused by loss of synchronization at the decoder.
Theorem 6 gives the near-lossless MASC prefix property. Recall that the notation η_Y(y) ≺ η_Y(y') means that η_Y(y) is a proper prefix of η_Y(y'), disallowing η_Y(y) = η_Y(y').
Theorem 6 Partition pair (P(X), P(Y)) can be used in a near-lossless instantaneous MASC on p(x, y) if and only if both of the following properties are satisfied:
(A) for any x, x' ∈ X such that η_X(x) ≺ η_X(x'), {η_Y(y) : y ∈ A_x ∪ A_x'} is prefix free;
(B) for any x, x' ∈ X such that η_X(x) = η_X(x'), {η_Y(y) : y ∈ A_x ∪ A_x'} is free of proper prefixes.
Proof: If either condition (A) or condition (B) is not satisfied, then there exist symbols x, x' ∈ X and y, y' ∈ Y such that y, y' ∈ A_x ∪ A_x' and one of the following is true: (1) η_X(x) = η_X(x') and η_Y(y) ≺ η_Y(y'); (2) η_Y(y) = η_Y(y') and η_X(x) ≺ η_X(x'); (3) η_X(x) ≺ η_X(x') and η_Y(y) ≺ η_Y(y'). In any of these cases, the decoder cannot determine where to stop reading one or both of the binary descriptions, by an argument like that in Lemma 6. The result is a code that is not instantaneous.
For the decoder to be unable to recognize when it has reached the end of η_X(X) and η_Y(Y), one of the following must occur: (1) the decoder determines that X ∈ n_X but cannot determine whether or not Y ∈ n_Y; (2) the decoder determines that Y ∈ n_Y but cannot determine whether or not X ∈ n_X; (3) the decoder cannot determine whether or not X ∈ n_X or Y ∈ n_Y. Following the argument used in Lemma 6, each of these cases leads to a violation of either (A) or (B) (or both). □
Thus the near-lossless prefix property differs from the lossless prefix property only in allowing η_X(x) = η_X(x') and η_Y(y) = η_Y(y') when y, y' ∈ A_x ∪ A_x'. In near-lossless side-information coding of Y given X this condition simplifies as follows. For any y, y' ∈ Y for which there exists an x ∈ X with p(x, y)p(x, y') > 0, η_Y(y) ≺ η_Y(y') is disallowed (as in lossless coding) but η_Y(y) = η_Y(y') is allowed (this was disallowed in lossless coding). In this case, giving y and y' descriptions with η_Y(y) ≺ η_Y(y') would leave the decoder no means of determining whether to decode |η_Y(y)| bits or |η_Y(y')| bits. (The decoder knows only the value of x, and both p(x, y) and p(x, y') are nonzero.) Giving y and y' descriptions η_Y(y) = η_Y(y') allows instantaneous (but not error-free) decoding, and the decoder decodes to the symbol with the given description that maximizes p(·|x). In the more general case, if G^(1) and G^(2) are the 1-level groups described by (η_X(X), η_Y(Y)), the above conditions allow instantaneous decoding of the descriptions of G^(1) and G^(2). A decoding error occurs if and only if there is more than one pair (x, y) ∈ G^(1) × G^(2) with p(x, y) > 0. In this case, the decoder reconstructs the symbols as
(x̂, ŷ) = argmax_{(x, y) ∈ G^(1) × G^(2)} p(x, y).
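For illustration, this reconstruction rule is a one-line maximization; in the sketch below (not the patent's notation) p is an assumed dictionary of joint probabilities and the groups are plain iterables of symbols.

```python
# Decode a pair of 1-level groups to the jointly most probable symbol pair.
def reconstruct(p, group_x, group_y):
    return max(((x, y) for x in group_x for y in group_y),
               key=lambda xy: p.get(xy, 0.0))
```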
B Decoding Error Probability and Distortion Analysis
As discussed in Section I, the benefit of near-lossless coding is a potential savings in rate. The cost of that improvement is the associated error penalty, which we quantify here.
By Lemma 6, any 1-level group G ⊆ Y is a legitimate group in near-lossless side-information coding of Y given X. The minimal penalty for a code with η_Y(y) = η_Y(y') for all y, y' ∈ G is
P_e(G) = Σ_{x ∈ X} [ Σ_{y ∈ G} p(x, y) − max_{y ∈ G} p(x, y) ].
This minimal error penalty is achieved by decoding the description of G to ŷ = argmax_{y' ∈ G} p(x, y') when X = x. Multi-level group G = (R : C(R)) is a legitimate group for side-information coding of Y given X if and only if for any x ∈ X and y ∈ R, y' ∈ C(R) implies p(x, y)p(x, y') = 0. In this case,
P_e(G) = Σ_{n ∈ T(G)} P_e(n).
That is, the error penalty of a multi-level group equals the sum of the error penalties of the 1-level groups it contains. Thus for any partition P(Y) satisfying the near-lossless MASC prefix property,
P_e(P(Y)) = Σ_{n ∈ T(P(Y))} P_e(n).
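These penalties translate directly into a computation. The following sketch (illustrative names and a dict-based p.m.f., not the patent's notation) evaluates P_e for a 1-level group of Y and for a collection of disjoint 1-level groups in side-information coding of Y given X, matching the formulas above.

```python
# Error penalty of a 1-level group G of Y in side-information coding of Y given X:
# P_e(G) = sum over x of [ sum_{y in G} p(x, y) - max_{y in G} p(x, y) ].

def penalty_1level(p, alphabet_x, group_y):
    total = 0.0
    for x in alphabet_x:
        column = [p.get((x, y), 0.0) for y in group_y]
        total += sum(column) - max(column)
    return total

# For a partition of Y, the penalty is the sum over its (disjoint) lossy
# 1-level groups; symbols kept in singleton groups contribute zero.
def penalty_partition(p, alphabet_x, one_level_groups):
    return sum(penalty_1level(p, alphabet_x, g) for g in one_level_groups)
```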
Similarly, given a partition P(X), a 1-level group G ⊆ Y is a legitimate group for a general near-lossless MASC given P(X) if for any y, y' ∈ G, y and y' do not both belong to A_x ∪ A_x' for any x, x' such that η_X(x) ≺ η_X(x'). A multi-level group G = (R : C(R)) on Y is a legitimate group for a general near-lossless MASC if all members of C(R) are legitimate and, for any y ∈ R and y' ∈ C(R), y and y' do not both belong to A_x ∪ A_x' for any x, x' such that η_X(x) is a prefix of η_X(x').
For any pair of nodes n_X ∈ T(P(X)) and n_Y ∈ T(P(Y)), the minimal penalty for (n_X, n_Y) is
P_e(n_X, n_Y) = Σ_{(x, y) ∈ n_X × n_Y} p(x, y) − max_{(x, y) ∈ n_X × n_Y} p(x, y).
Decoding the description of n_X and n_Y to argmax_{x ∈ n_X, y ∈ n_Y} p(x, y) gives this minimal error penalty. Thus the minimal penalty for using partition pair (P(X), P(Y)) satisfying the near-lossless MASC prefix property is
P_e(P(X), P(Y)) = Σ_{n_X ∈ T(P(X))} Σ_{n_Y ∈ T(P(Y))} P_e(n_X, n_Y).
Since near-lossless coding may be of most interest for use in lossy coding, probability of error may not always be the most useful measure of performance in a near-lossless code. In lossy codes, the increase in distortion caused by decoding errors more directly measures the impact of the error.
We next quantify this impact for a fixed distortion measure d(a, â) ≥ 0. If d is the Hamming distortion, then the distortion analysis is identical to the error probability analysis.
In side-information coding of Y given X, the minimal distortion penalty for 1-level group G is
D(G) = Σ_{x ∈ X} min_{ŷ ∈ G} Σ_{y ∈ G} p(x, y) d(y, ŷ).
This value is achieved when the description of G is decoded to argmin_{ŷ ∈ G} Σ_{y ∈ G} p(x, y) d(y, ŷ) when X = x. Thus for any partition P(Y) satisfying the near-lossless MASC prefix property, the distortion penalty associated with using this near-lossless code rather than a lossless code is
D(P(Y)) = Σ_{n ∈ T(P(Y))} D(n).
In general near-lossless MASC coding, the corresponding distortion penalty for any partition pair (P(X), P(Y)) that satisfies the near-lossless MASC prefix property is
D(P(X), P(Y)) = Σ_{n_X ∈ T(P(X))} Σ_{n_Y ∈ T(P(Y))} min_{x̂ ∈ n_X, ŷ ∈ n_Y} Σ_{x ∈ n_X, y ∈ n_Y} p(x, y) [d(x, x̂) + d(y, ŷ)].
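The distortion penalties can be evaluated in the same fashion as the error probabilities. The sketch below follows the formulas above directly, under the same assumed dict-based p.m.f. and a caller-supplied distortion function d; all names are illustrative.

```python
# Distortion penalty of a 1-level group G of Y in side-information coding:
# D(G) = sum over x of min_{yhat in G} sum_{y in G} p(x, y) * d(y, yhat).
def distortion_1level(p, alphabet_x, group_y, d):
    return sum(min(sum(p.get((x, y), 0.0) * d(y, yhat) for y in group_y)
                   for yhat in group_y)
               for x in alphabet_x)

# Distortion penalty of a pair of 1-level groups (n_X, n_Y) in a general MASC:
# min over (xhat, yhat) of sum_{(x, y)} p(x, y) * (d(x, xhat) + d(y, yhat)).
def distortion_pair(p, group_x, group_y, d):
    return min(sum(p.get((x, y), 0.0) * (d(x, xhat) + d(y, yhat))
                   for x in group_x for y in group_y)
               for xhat in group_x for yhat in group_y)
```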
C Partition Design
In near-lossless coding, any combination of symbols creates a legitimate 1-level group G (with some associated error P_e(G) or distortion D(G)). Thus one way to approach near-lossless MASC design is to consider all combinations of 1-level groups that yield an error within the allowed error limits, in each case design the optimal lossless code for the reduced alphabet that treats each such 1-level group as a single symbol x_G (x_G ∉ X if |G| > 1), and finally choose the combination of groups that yields the lowest expected rates. Considering all combinations of groups that meet the error criterion guarantees an optimal solution, since any near-lossless MASC can be described as a lossless MASC on a reduced alphabet that represents each lossy 1-level group by a single symbol.
For example, given a 1-level group G = (x_1, ..., x_m) ⊆ X, we can design a near-lossless MASC with error probability P_e(G) by designing a lossless MASC for alphabets X̃ = (X ∩ {x_1, ..., x_m}^c) ∪ {x_G} and Y and p.m.f.
p̃(x, y) = p(x, y) for x ∈ X ∩ {x_1, ..., x_m}^c,   p̃(x_G, y) = Σ_{i=1}^{m} p(x_i, y).
Thus designing a near-lossless MASC for p(x, y) that uses only one lossy group G is equivalent to designing a lossless MASC for the probability distribution p̃(x, y), where the matrix describing p̃(x, y) can be obtained by removing from the matrix describing p(x, y) the rows for symbols x_1, ..., x_m ∈ G and adding a row for x_G. The row associated with x_G equals the sum of the rows removed. Similarly, building a near-lossless MASC using 1-level group G ⊆ Y is equivalent to building a lossless MASC for a p.m.f. in which we remove the columns for all y ∈ G and include a column that equals the sum of those columns.
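The row-merging step described here is mechanical. A small sketch follows (dict-based p.m.f.; the merged-symbol label is an arbitrary placeholder chosen by the caller, not part of the patent):

```python
# Build the reduced p.m.f. obtained by merging the rows of a lossy 1-level
# group G of X into a single new symbol x_G.  Column merging for a group of
# Y is symmetric.
def merge_rows(p, group_x, merged_label):
    group_x = set(group_x)
    reduced = {}
    for (x, y), prob in p.items():
        key = (merged_label, y) if x in group_x else (x, y)
        reduced[key] = reduced.get(key, 0.0) + prob
    return reduced

# Example: merge_rows(p, {'x1', 'x2'}, 'xG') sums the 'x1' and 'x2' rows.
```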
Multiple (non-overlapping) 1-level groups in X or Y can be treated similarly. In using groups G_1, G_2 ⊆ X, the error probability adds, but in using groups G_X ⊆ X and G_Y ⊆ Y the effect on the error probability is not necessarily additive. For example, if G_X = (x_1, ..., x_m) and G_Y = (y_1, ..., y_k), then the error penalty is
P_e(G_X, G_Y) = Σ_{y ∈ Y − C} [ Σ_{x ∈ R} p(x, y) − max_{x ∈ R} p(x, y) ] + Σ_{x ∈ X − R} [ Σ_{y ∈ C} p(x, y) − max_{y ∈ C} p(x, y) ] + [ Σ_{x ∈ R} Σ_{y ∈ C} p(x, y) − max_{x ∈ R, y ∈ C} p(x, y) ],
where R = {x_1, ..., x_m} and C = {y_1, ..., y_k}. Since using just G_X gives
P_e(G_X) = Σ_{y ∈ Y} [ Σ_{x ∈ R} p(x, y) − max_{x ∈ R} p(x, y) ],
and using just G_Y gives
P_e(G_Y) = Σ_{x ∈ X} [ Σ_{y ∈ C} p(x, y) − max_{y ∈ C} p(x, y) ],
we have P_e(G_X, G_Y) = P_e(G_X) + P_e(G_Y) − δ(G_X, G_Y), where
δ(G_X, G_Y) = Σ_{x ∈ R} Σ_{y ∈ C} p(x, y) + max_{x ∈ R, y ∈ C} p(x, y) − Σ_{y ∈ C} max_{x ∈ R} p(x, y) − Σ_{x ∈ R} max_{y ∈ C} p(x, y)    (9)
is not necessarily equal to zero. Generalizing the above results to multiple groups G_{X,1}, ..., G_{X,M} and G_{Y,1}, ..., G_{Y,K} corresponding to row and column sets {R_1, R_2, ..., R_M} and {C_1, C_2, ..., C_K} respectively gives total error penalty
P_e({G_{X,1}, G_{X,2}, ..., G_{X,M}}, {G_{Y,1}, G_{Y,2}, ..., G_{Y,K}}) = Σ_{i=1}^{M} P_e(G_{X,i}) + Σ_{j=1}^{K} P_e(G_{Y,j}) − Σ_{i=1}^{M} Σ_{j=1}^{K} δ(G_{X,i}, G_{Y,j}).    (10)
Here
P_e({G_{X,1}, G_{X,2}, ..., G_{X,M}}, {G_{Y,1}, G_{Y,2}, ..., G_{Y,K}}) ≥ max{ Σ_{i=1}^{M} P_e(G_{X,i}), Σ_{j=1}^{K} P_e(G_{Y,j}) }.
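Formulas (9) and (10) can also be checked against a direct computation of the penalty: once the row groups and column groups are merged, each merged cell of the reduced matrix contributes its probability mass minus its largest entry. The sketch below (assumed dict-based p.m.f. and plain-set groups, not the patent's code) computes the total penalty this way, which by the derivation above should agree with (10).

```python
# Direct total error penalty for non-overlapping row groups Sx (subsets of X)
# and column groups Sy (subsets of Y): treat every uncovered symbol as its own
# singleton block, then sum (mass - max) over each merged cell.

def total_penalty(p, alphabet_x, alphabet_y, Sx, Sy):
    def blocks(alphabet, groups):
        covered = set().union(*groups) if groups else set()
        return list(groups) + [{a} for a in alphabet if a not in covered]

    penalty = 0.0
    for bx in blocks(alphabet_x, Sx):
        for by in blocks(alphabet_y, Sy):
            cell = [p.get((x, y), 0.0) for x in bx for y in by]
            penalty += sum(cell) - max(cell)
    return penalty
```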
Using these results, we give our code design algorithm as follows.
In near-lossless coding of source X given side information Y, we first make a list C_{X,e} of all lossy 1-level groups of X that result in error at most e (the given constraint). (The earlier-described lossless MASC design algorithm will find all zero-error 1-level groups.) Then any subset S_{X,e} of C_{X,e} whose groups are non-overlapping and together result in error at most e is a combination of lossy 1-level groups with total error at most e. For each S_{X,e}, obtain the reduced alphabet X̃ and p.m.f. p̃(x, y) by representing each group G ∈ S_{X,e} by a single symbol x_G as described earlier. Then perform lossless side-information code design of X̃ on p̃(x, y). After all subsets S_{X,e} are traversed, we can find the lowest rate for coding that results in error at most e. Near-lossless coding of Y with side information X can be performed in a similar fashion.
To design general near-lossless MASCs of both X and Y, we first make a list C_{X,e} of all 1-level groups of X that result in error at most e, and a list C_{Y,e} of all 1-level groups of Y that result in error at most e. (We include zero-error 1-level groups here, since using two zero-error 1-level groups G_X ⊆ X and G_Y ⊆ Y together may result in a non-zero error penalty.) Second, we make a list C_{S_{X,e}} = {∅} ∪ {S_{X,e} ⊆ C_{X,e} : S_{X,e} is non-overlapping, P_e(S_{X,e}) ≤ e} of all combinations of 1-level groups of X that yield an error at most e, and a list C_{S_{Y,e}} = {∅} ∪ {S_{Y,e} ⊆ C_{Y,e} : S_{Y,e} is non-overlapping, P_e(S_{Y,e}) ≤ e} of all combinations of 1-level groups of Y that yield an error at most e. (We include ∅ in the lists so that side-information coding is included in general coding.) Then for each pair (S_{X,e}, S_{Y,e}), we calculate the corresponding δ values and the total error penalty using formulas (9) and (10). If the total error penalty is no more than e, we obtain the reduced alphabets X̃, Ỹ and p.m.f. p̃(x, y) described by (S_{X,e}, S_{Y,e}), then perform lossless MASC design on p̃(x, y). After all pairs (S_{X,e}, S_{Y,e}) ∈ C_{S_{X,e}} × C_{S_{Y,e}} are traversed, we can trace out the lower boundary of the achievable rate region.
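For illustration, the outer loop of this exhaustive search can be sketched as below, reusing the total_penalty function above. It is deliberately brute force, matching the exhaustive enumeration described in the text; the candidate lists and the tolerance name eps are assumptions.

```python
from itertools import combinations

# Enumerate pairs of non-overlapping group combinations whose total penalty,
# computed directly, stays within eps.  cand_x and cand_y are lists of sets.
def feasible_pairs(p, alphabet_x, alphabet_y, cand_x, cand_y, eps):
    def combos(cands):
        out = [[]]                     # the empty combination covers pure
        for r in range(1, len(cands) + 1):   # side-information coding
            for subset in combinations(cands, r):
                if all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                    out.append(list(subset))
        return out

    for sx in combos(cand_x):
        for sy in combos(cand_y):
            if total_penalty(p, alphabet_x, alphabet_y, sx, sy) <= eps:
                yield sx, sy
```

Each feasible pair then defines a reduced p.m.f. (via row and column merging) on which the lossless MASC design of Section III is run; the lowest resulting rates trace out the achievable region.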
V Experimental Results
This section shows optimal coding rates for lossless side-information MASCs, lossless general MASCs, and near-lossless general MASCs for the example of Table 1. We achieve these results by building the optimal partitions and matched codes for each scenario, as discussed in Sections II, III, and IV. Both Huffman and arithmetic coding rates are included. We also compare the side-information results to the results of [3].
Table 2 gives the side-information results. Here H(X) and R_H(X) are the optimal and Huffman rates for source X when X is coded independently. We use [H(Y), R′_{SI,A}(Y), R_{SI,A}(Y)] and [R_H(Y), R′_{SI,H}(Y), R_{SI,H}(Y)] to denote the optimal and Huffman results respectively for [traditional, [3] side-information, and our side-information] coding on Y. The partition trees achieving these results are shown in Figure 5. The rate achievable in coding Y using side information X is approximately half that of an ordinary Huffman code and 90% that of [3].
[Table 2 (reproduced as an image in the original): Lossless side-information coding results for the example of Table 1.]
[Figure 5 (reproduced as an image in the original): Partition trees for the side-information codes of Table 2, panels (a)-(c).]
Figure 6 shows general lossless and lossy MASC results. The optimal lossless MASC gives significant performance improvement with respect to independent coding of X and Y but does not achieve the Slepian-Wolf region. By allowing error probability 0.01 (which equals max_{x,y} p(x, y), i.e., the smallest error probability that may yield a rate region different from that of lossless coding), the achievable rate region is greatly improved over lossless coding, showing the benefits of near-lossless coding. By allowing error probability 0.04, we get approximately to the Slepian-Wolf region for this example.
VI Summary
This paper demonstrates that the optimal lossless and near-lossless MASC design problems can be broken into two sub-problems: partition design and matched code design. The partition of an MASC describes the prefix and equivalence relationships for the code's binary descriptions. We give necessary and sufficient conditions on these partitions for instantaneous and lossless (or near-lossless) decoding and describe a variety of properties of the optimal partition that decrease the complexity associated with the search for the optimal partition. We demonstrate the relationship between optimal matched codes and traditional (single-sender, single-receiver) source codes, and use this relationship to give optimal matched code design algorithms. When combined, these results characterize lossless and near-lossless side-information and general MASCs and yield a means of searching for the optimal codes of those types for an arbitrary source p.m.f. p(x, y). Experimental results based on this algorithm are consistent with the theory of MASCs and demonstrate its feasibility in optimal code design on small alphabets. While the computational complexity of the search algorithms described in this paper is high, the problem is NP-hard [8], and thus low-complexity optimal algorithms are not available. Low-complexity approximations to the optimal design procedure for general distributions are the topic of ongoing research.
[Figure 6 (reproduced as an image in the original): General lossless and near-lossless MASC results, comparing the MASC codes built by our algorithm with independent coding of X and Y (Huffman coding with n = 1 and optimal performance for each) and with the Slepian-Wolf rate region.]
References
[1] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. IEEE Transactions on Information Theory, IT-19(4):471-480, July 1973.
[2] H. S. Witsenhausen. The zero-error side information problem and chromatic numbers. IEEE Transactions on Information Theory, 22:592-593, 1976.
[3] A. Kh. Al Jabri and S. Al-Issa. Zero-error codes for correlated information sources. In Proceedings of Cryptography, pages 17-22, Cirencester, UK, December 1997.
[4] S. S. Pradhan and K. Ramchandran. Distributed source coding using syndromes (DISCUS): design and construction. In Proceedings of the Data Compression Conference, pages 158-167, Snowbird, UT, March 1999. IEEE.
[5] Y. Yan and T. Berger. On instantaneous codes for zero-error coding of two correlated sources. In Proceedings of the IEEE International Symposium on Information Theory, page 344, Sorrento, Italy, June 2000. IEEE.
[6] Q. Zhao and M. Effros. Optimal code design for lossless and near-lossless source coding in multiple access networks. In Proceedings of the Data Compression Conference, pages 263-272, Snowbird, UT, March 2001. IEEE.
[7] Q. Zhao and M. Effros. Lossless source coding for multiple access networks. In Proceedings of the IEEE International Symposium on Information Theory, Washington, DC, June 2001. IEEE. Submitted October 2000.
[8] P. Koulgi, E. Tuncel, S. Regunathan, and K. Rose. Minimum redundancy zero-error source coding with side information. In Proceedings of the IEEE International Symposium on Information Theory, Washington, DC, USA, June 2001. IEEE.

Claims

1. A method for encoding and decoding first and second data streams comprising: encoding said first data stream using a first encoder to produce a first encoded data stream; encoding said second data stream using a second encoder to produce a second encoded data stream; providing said first and second encoded data streams to a receiver; decoding said first and second encoded data streams using a single decoder.
2. The method of claim 1 wherein said encoding and decoding are lossless.
3. The method of claim 1 wherein said encoding and decoding are near-lossless.
4. The method of claim 1 wherein said receiver is provided one of said first and second data streams as side-information.
5. The method of claim 4 wherein encoding of said second stream satisfies a prefix condition and said prefix condition is satisfied for a code γ_Y for Y given X when for each x ∈ X, and each y, y' ∈ A_x, the description of y is not a prefix of the description of y'.
6. The method of claim 5 wherein said code γ_Y is a matched code.
7. The method of claim 6 wherein said code γ_Y is an instantaneous, side-information matched code for p(x, y) when γ_Y is a matched code for some partition P(Y) for p(x, y).
8. A method of generating code comprising: obtaining an alphabet of symbols generated by a data source; identifying combinable symbols of said alphabet and generating subsets of combinable symbols; identifying optimal partitions of said subsets of symbols to generate a list of groups; and using said list of groups to generate partitions of the full alphabet.
9. The method of claim 8 further comprising determining a matched code for each partition.
10. The method of claim 8 further comprising selecting a partition whose matched code has a best rate.
11. The method of claim 8 wherein said matched code comprises a Huffman code.
12. The method of claim 8 wherein said matched code comprises an arithmetic code.
13. The method of claim 8 wherein symbols y_1, y_2 ∈ Y can be combined under p(x, y) if p(x, y_1)p(x, y_2) = 0 for each x ∈ X.
14. The method of claim 13 wherein for each symbol a set Cy is generated.
15. The method of claim 13 further including the step of identifying all non-empty subsets for each set Cy.
16. The method of claim 8 wherein a partition is complete and non-overlapping if P(Y) = {G_1, G_2, ..., G_m} satisfies ∪_{i=1}^{m} G_i = Y and G_j ∩ G_k = ∅ for any j ≠ k, where each G_i ∈ P(Y) is a group for p(x, y), and G_j ∪ G_k and G_j ∩ G_k refer to the union and intersection respectively of the members of G_j and G_k.
17. The method of claim 8 wherein said coding scheme is a lossless coding scheme.
18. The method of claim 8 wherein said coding scheme is a near-lossless coding scheme.
19. The method of claim 8 wherein said coding scheme is a side-information, lossless coding scheme.
20. The method of claim 8 wherein said coding scheme is a side-information, near- lossless coding scheme.
21. A method of generating code for X and Y comprising: generating a partition pair P(X) and P(Y) such that each partition is a legitimate partition for a side-information, lossless decoding scheme; and identifying said partition pair as a legitimate partition pair for general lossless decoding if the two descriptions together give enough information to decode X and Y uniquely.
22. The method of claim 21 wherein said partition pair is a legitimate partition pair when for any x, x' ∈ X such that {η_X(x), η_X(x')} does not satisfy the prefix condition, {η_Y(y) : y ∈ A_x ∪ A_x'} satisfies the prefix condition.
23. The method of claim 21 wherein said partition pair is a legitimate partition pair when for any y, y' ∈ Y such that {η_Y(y), η_Y(y')} does not satisfy the prefix condition, {η_X(x) : x ∈ B_y ∪ B_y'} satisfies the prefix condition.
24. A method for generating a MASC code comprising: generating instantaneous code by generating subtrees T_X and T_Y descending from nodes n_X and n_Y (including n_X and n_Y respectively).
25. The method of claim 24 further comprising satisfying one of the following conditions: (A) X ∈ T_X or n_Y is a leaf implies that Y ∈ n_Y, and Y ∈ T_Y or n_X is a leaf implies that X ∈ n_X; (B) X ∈ T_X implies that Y ∉ n_Y; (C) Y ∈ T_Y implies that X ∉ n_X.
26. The method of claim 25 wherein said instantaneous code is lossless when generating code such that for any (x, y) ∈ X × Y with p(x, y) > 0, final nodes (n_X, n_Y) are generated that satisfy: (D) (x, y) ∈ n_X × n_Y and for any other x' ∈ n_X and y' ∈ n_Y, p(x, y') = p(x', y) = p(x', y') = 0.
27. A method of generating code comprising: obtaining an alphabet of symbols generated by a data source; and determining which of said symbols can have identical code descriptions and which symbols cannot have identical code descriptions.
28. The method of claim 27 further including determining which of said symbols can have code descriptions for which one symbol's code description is a prefix of another symbol's code description.
29. A method of generating code for data sources X and Y having data rates R_X and R_Y respectively, comprising: generating a code that minimizes λR_X + (1 − λ)R_Y for an arbitrary value of λ.
30. The method of claim 29 wherein λ ∈ [0, 1].
31. A method for encoding and decoding a plurality of data streams comprising: encoding said plurality of data streams using a plurality of encoders to produce a plurality of encoded data streams; providing said plurality of encoded data streams to a receiver; decoding said plurality of encoded data streams using a single decoder.
32. The method of claim 31 wherein said encoding and decoding are lossless.
33. The method of claim 31 wherein said encoding and decoding are near-lossless.
34. The method of claim 31 wherein said decoding is accomplished using side- information.
35. A method of designing codes comprising: obtaining an alphabet of symbols generated by a data source; ordering said alphabet of symbols; identifying restrictions of a class of codes based on said ordering of said alphabet; and designing code for said restricted class for said ordering of said alphabet.
36. The method of claim 35 wherein said restrictions include a requirement that symbols be adjacent symbols.
37. The method of claim 35 further including the step of selecting an ordering of said alphabet based on generating code for a plurality of orderings.
38. The method of claim 37 wherein an ordering is selected based on a best rate resulting from one of said orderings.
PCT/US2002/003146 2001-01-30 2002-01-30 Lossless and near-lossless source coding for multiple access networks WO2002061948A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002253893A AU2002253893A1 (en) 2001-01-30 2002-01-30 Lossless and near-lossless source coding for multiple access networks

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US26540201P 2001-01-30 2001-01-30
US60/265,402 2001-01-30
US30160901P 2001-06-27 2001-06-27
US60/301,609 2001-06-27

Publications (2)

Publication Number Publication Date
WO2002061948A2 true WO2002061948A2 (en) 2002-08-08
WO2002061948A3 WO2002061948A3 (en) 2003-09-18

Family

ID=26951184

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/003146 WO2002061948A2 (en) 2001-01-30 2002-01-30 Lossless and near-lossless source coding for multiple access networks

Country Status (3)

Country Link
US (1) US7187804B2 (en)
AU (1) AU2002253893A1 (en)
WO (1) WO2002061948A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100840757B1 (en) * 2006-02-28 2008-06-23 노키아 코포레이션 Huffman coding and decoding

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10218541A1 (en) 2001-09-14 2003-04-24 Siemens Ag Context-adaptive binary arithmetic video coding, e.g. for prediction error matrix spectral coefficients, uses specifically matched context sets based on previously encoded level values
US20030236793A1 (en) * 2002-06-19 2003-12-25 Ericsson Inc. Compressed prefix tree structure and method for traversing a compressed prefix tree
US7415243B2 (en) 2003-03-27 2008-08-19 Honda Giken Kogyo Kabushiki Kaisha System, method and computer program product for receiving data from a satellite radio network
US8041779B2 (en) 2003-12-15 2011-10-18 Honda Motor Co., Ltd. Method and system for facilitating the exchange of information between a vehicle and a remote location
US7849149B2 (en) 2004-04-06 2010-12-07 Honda Motor Co., Ltd. Method and system for controlling the exchange of vehicle related messages
US7818380B2 (en) 2003-12-15 2010-10-19 Honda Motor Co., Ltd. Method and system for broadcasting safety messages to a vehicle
EP1751713A1 (en) * 2004-05-18 2007-02-14 Koninklijke Philips Electronics N.V. Image processing system for automatic segmentation of a 3-d tree-like tubular surface of an object, using 3-d deformable mesh models
US7643788B2 (en) 2004-09-22 2010-01-05 Honda Motor Co., Ltd. Method and system for broadcasting data messages to a vehicle
US7660475B2 (en) * 2004-12-22 2010-02-09 Ntt Docomo, Inc. Method and apparatus for coding positions of coefficients
US7148821B2 (en) * 2005-02-09 2006-12-12 Intel Corporation System and method for partition and pattern-match decoding of variable length codes
US7256716B2 (en) * 2005-03-01 2007-08-14 The Texas A&M University System Data encoding and decoding using Slepian-Wolf coded nested quantization to achieve Wyner-Ziv coding
US7295137B2 (en) * 2005-03-01 2007-11-13 The Texas A&M University System Data encoding and decoding using Slepian-Wolf coded nested quantization to achieve Wyner-Ziv coding
US7779326B2 (en) * 2005-03-01 2010-08-17 The Texas A&M University System Multi-source data encoding, transmission and decoding using Slepian-Wolf codes based on channel code partitioning
US8140261B2 (en) 2005-11-23 2012-03-20 Alcatel Lucent Locating sensor nodes through correlations
EP1950684A1 (en) * 2007-01-29 2008-07-30 Accenture Global Services GmbH Anonymity measuring device
EP1965497B8 (en) * 2007-03-01 2011-10-05 Sisvel Technology Srl Distributed arithmetic coding method
US7668653B2 (en) 2007-05-31 2010-02-23 Honda Motor Co., Ltd. System and method for selectively filtering and providing event program information
US8099308B2 (en) 2007-10-02 2012-01-17 Honda Motor Co., Ltd. Method and system for vehicle service appointments based on diagnostic trouble codes
US8230217B2 (en) * 2008-10-13 2012-07-24 International Business Machines Corporation Method and system for secure collaboration using slepian-wolf codes
US8627483B2 (en) * 2008-12-18 2014-01-07 Accenture Global Services Limited Data anonymization based on guessing anonymity
US8131770B2 (en) * 2009-01-30 2012-03-06 Nvidia Corporation System, method, and computer program product for importance sampling of partitioned domains
US8682910B2 (en) 2010-08-03 2014-03-25 Accenture Global Services Limited Database anonymization for use in testing database-centric applications
US11128935B2 (en) * 2012-06-26 2021-09-21 BTS Software Solutions, LLC Realtime multimodel lossless data compression system and method
US9325639B2 (en) 2013-12-17 2016-04-26 At&T Intellectual Property I, L.P. Hierarchical caching system for lossless network packet capture applications
US10158738B2 (en) * 2014-12-22 2018-12-18 Here Global B.V. Optimal coding method for efficient matching of hierarchical categories in publish-subscribe systems
US11528490B2 (en) 2018-04-13 2022-12-13 Zhejiang University Information preserving coding and decoding method and device
CN109982086B (en) * 2019-04-10 2020-12-08 上海兆芯集成电路有限公司 Image compression method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5142283A (en) * 1989-07-28 1992-08-25 International Business Machines Corporation Arithmetic compression coding using interpolation for ambiguous symbols
US5534861A (en) * 1993-04-16 1996-07-09 International Business Machines Corporation Method and system for adaptively building a static Ziv-Lempel dictionary for database compression
US5751859A (en) * 1995-06-14 1998-05-12 Lucent Technologies Inc. Compression of text images by soft pattern matching
US5764374A (en) * 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
US5818877A (en) * 1996-03-14 1998-10-06 The Regents Of The University Of California Method for reducing storage requirements for grouped data values
US5848195A (en) * 1995-12-06 1998-12-08 Intel Corporation Selection of huffman tables for signal encoding
US5959560A (en) * 1997-02-07 1999-09-28 Said; Amir Data compression via alphabet partitioning and group partitioning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6771831B2 (en) * 2001-11-16 2004-08-03 California Institute Of Technology Data compression method and system using globally optimal scalar quantization

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5142283A (en) * 1989-07-28 1992-08-25 International Business Machines Corporation Arithmetic compression coding using interpolation for ambiguous symbols
US5534861A (en) * 1993-04-16 1996-07-09 International Business Machines Corporation Method and system for adaptively building a static Ziv-Lempel dictionary for database compression
US5751859A (en) * 1995-06-14 1998-05-12 Lucent Technologies Inc. Compression of text images by soft pattern matching
US5848195A (en) * 1995-12-06 1998-12-08 Intel Corporation Selection of huffman tables for signal encoding
US5764374A (en) * 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
US5818877A (en) * 1996-03-14 1998-10-06 The Regents Of The University Of California Method for reducing storage requirements for grouped data values
US5959560A (en) * 1997-02-07 1999-09-28 Said; Amir Data compression via alphabet partitioning and group partitioning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100840757B1 (en) * 2006-02-28 2008-06-23 노키아 코포레이션 Huffman coding and decoding

Also Published As

Publication number Publication date
WO2002061948A3 (en) 2003-09-18
US20020176494A1 (en) 2002-11-28
US7187804B2 (en) 2007-03-06
AU2002253893A1 (en) 2002-08-12

Similar Documents

Publication Publication Date Title
WO2002061948A2 (en) Lossless and near-lossless source coding for multiple access networks
Duhamel et al. Joint source-channel decoding: A cross-layer perspective with applications in video broadcasting
Jiang et al. Multiple description coding via polyphase transform and selective quantization
JP5313362B2 (en) High speed parsing of variable length fixed length code
US20050008238A1 (en) Adaptive variable length decoding method
TW200838323A (en) Memory efficient coding of variable length codes
US7944377B2 (en) Method, medium and apparatus for quantization encoding and de-quantization decoding using trellis
Sugiura et al. Optimal Golomb-Rice code extension for lossless coding of low-entropy exponentially distributed sources
Effros Distortion-rate bounds for fixed-and variable-rate multiresolution source codes
KR20220127261A (en) Concepts for coding neural network parameters
Bai et al. Distributed multiple description coding: principles, algorithms and systems
Shirani et al. An achievable rate-distortion region for multiple descriptions source coding based on coset codes
Zhang et al. Successive coding in multiuser information theory
ES2219589T3 (en) DATA COMPRESSION.
Xu et al. Layered Wyner–Ziv video coding for transmission over unreliable channels
US8751914B2 (en) Encoding method of TLDPC codes utilizing treillis representations of the parity check equations and associated encoding device
Maugey et al. Incremental coding for extractable compression in the context of massive random access
Hu et al. Progressive significance map and its application to error-resilient image transmission
Lakovic et al. An algorithm for construction of efficient fix-free codes
CN108809334B (en) Sequence determination method, device and equipment
Tseng et al. Construction of symmetrical reversible variable length codes using backtracking
US7193542B2 (en) Digital data compression robust relative to transmission noise
TW202320493A (en) Transition encoder and method for transition encoding with flexible word-size
Baccaglini et al. A flexible RD-based multiple description scheme for JPEG 2000
CN108737017A (en) The method, apparatus and communication equipment of information processing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP