US 20060039555 A1
Abstract
The present invention provides permutation instructions which can be used in software executed in a programmable processor for solving permutation problems in cryptography, multimedia and other applications. The permute instructions are based on a Benes network comprising two butterfly networks of the same size connected back-to-back. Intermediate sequences of bits are defined into which an initial sequence of bits from a source register is transformed. Each intermediate sequence of bits is used as input to a subsequent permutation instruction. Permutation instructions are determined for permuting the initial source sequence of bits into one or more intermediate sequences of bits until a desired sequence is obtained. The intermediate sequences of bits are determined by configuration bits. The permutation instructions form a permutation instruction sequence of at least one instruction. At most 2 lg r / m permutation instructions are used in the permutation instruction sequence, where r is the number of k-bit subwords to be permuted, and m is the number of network stages executed in one instruction. The permutation instructions can be used to permute k-bit subwords packed into an n-bit word, where k can be 1, 2, . . . , or n bits, and k*r=n.
Claims(13) 1-65. (canceled) 66. A system of performing an arbitrary permutation of a source sequence of bits in a programmable processor comprising:
means for defining an intermediate sequence of bits that said source sequence of bits is transformed into using butterfly network stages and inverse butterfly network stages; means for determining a permutation instruction for transforming said source sequence of bits into one or more intermediate sequence of bits until a desired sequence of bits is obtained, wherein each intermediate sequence of bits is used as input to the subsequent permutation instruction and the determined permutation instructions form a permutation instruction sequence and configuration bits are used in said permutation instruction for determining movement of said source sequence of bits in said source register to said intermediate sequence of bits or movement of said intermediate sequence of bits into a destination register or a source register; and means for storing said configuration bits and means for retrieving said stored configuration bits for use in said permutation instruction. 67. A system of performing an arbitrary permutation of a source sequence of bits in a programmable processor comprising:
means for defining an intermediate sequence of bits that said source sequence of bits is transformed into using Benes network stages and inverse butterfly network stages; means for determining a permutation instruction for transforming said source sequence of bits into one or more intermediate sequence of bits until a desired sequence of bits is obtained, wherein each intermediate sequence of bits is used as input to the subsequent permutation instruction and the determined permutation instructions form a permutation instruction sequence and configuration bits are used in said permutation instruction for determining movement of said source sequence of bits in said source register to said intermediate sequence of bits or movement of said intermediate sequence of bits into a destination register or a source register; and means for storing said configuration bits and means for retrieving said stored configuration bits for use in said permutation instruction. 68. A method of performing an arbitrary permutation of a source sequence of bits in a programmable processor comprising the steps of:
a. defining an intermediate sequence of bits that said source sequence of bits is transformed into using one or more network stages selected from the group consisting of Benes network stages, butterfly network stages, and inverse network stages; and
b. determining one or more permutation instructions for transforming said source sequence of bits into said intermediate sequence of bits, wherein configuration bits are used in said one or more permutation instructions for determining movement of said source sequence of bits in a source register to said intermediate sequence of bits or movement of said intermediate sequence of bits into a destination register or a second intermediate sequence of bits.
69. The method of repeating steps a. and b. using said determined intermediate sequence of bits from step b. as said source sequence of bits in step a. until a desired sequence of bits is obtained, the determined permutation instructions form a permutation instruction sequence.
70. The method of
71. The method of c. storing said configuration bits; and d. retrieving said stored configuration bits.
72. The method of determining a subsequent permutation instruction using said retrieved configuration of bits.
73. The method of d. storing a portion of said configuration bits; and e. retrieving said stored portions of said configuration bits.
74. The method of determining a subsequent permutation instruction using said retrieved configuration portion of said configuration bits.
75. The method of
76. The method of
77. The method of
Description
1. Field of the Invention
The present invention relates to a method and system for performing arbitrary permutations of a sequence of bits in a programmable processor by determining a permutation instruction based on butterfly networks.
2. Description of the Related Art
The need for secure information processing has increased with the increasing use of the public internet and wireless communications in e-commerce, e-business and personal use. Typical use of the internet is not secure. Secure information processing typically includes authentication of users and host machines, confidentiality of messages sent over public networks, and assurances that messages, programs and data have not been maliciously changed. Conventional solutions have provided security functions by using different security protocols employing different cryptographic algorithms, such as public key, symmetric key and hash algorithms.
For encrypting large amounts of data, symmetric key cryptography algorithms have been used, see Bruce Schneier, “Applied Cryptography”, 2nd Ed., John Wiley & Sons, Inc., 1996. These algorithms use the same secret key to encrypt and decrypt a given message, and encryption and decryption have the same computational complexity. In symmetric key algorithms, the cryptographic techniques of “confusion” and “diffusion” are synergistically employed. “Confusion” obscures the relationship between the plaintext (original message) and the ciphertext (encrypted message), for example, through substitution of arbitrary bits for bits in the plaintext. “Diffusion” spreads the redundancy of the plaintext over the ciphertext, for example through permutation of the bits of the plaintext block. Such bit-level permutations have the drawback of being slow when implemented with conventional instructions available in microprocessors and other programmable processors.
Bit-level permutations are particularly difficult for processors, and have been avoided in the design of new cryptography algorithms where fast software implementations are desired, for example in the Advanced Encryption Standard, as described in NIST, “Announcing Request for Candidate Algorithm Nominations for the Advanced Encryption Standard (AES)”, http://csrc.nist.gov/encryption/aes/pre-round1/aes
Conventional techniques have also used table lookup methods to implement fixed permutations. To achieve a fixed permutation of n input bits with one table lookup, a table with 2^n entries is needed.
Permutations are a requirement for fast processing of digital multimedia information, using subword-parallel instructions, more commonly known as multimedia instructions, as described in Ruby Lee, “Accelerating Multimedia with Enhanced Microprocessors”.
A few microprocessor architectures have subword rearrangement instructions. MIX and PERMUTE instructions have been implemented in the MAX-2 extension to the Precision Architecture RISC (PA-RISC) processor, see Ruby Lee, “Subword Parallelism in MAX-2”.
It is desirable to provide significantly faster and more economical ways to perform arbitrary permutations of n bits, without any need for table storage, which can be used for encrypting large amounts of data for confidentiality or privacy.
The present invention provides permutation instructions which can be used in software executed in a programmable processor for solving permutation problems in both cryptography and multimedia. For fast cryptography, bit-level permutations are used, whereas for multimedia, permutations on subwords of typically 8 bits or 16 bits are used. Permutation instructions of the present invention can be used to provide any arbitrary permutation of sixty-four 1-bit subwords in a 64-bit processor, i.e., a processor with 64-bit words, registers and datapaths, for use in fast cryptography.
The permutation instructions of the present invention can also be used for permuting subwords greater than 1 bit in size, for use in fast multimedia processing. For example, in addition to being able to permute sixty-four 1-bit subwords in a register, the permutation instructions and underlying functional unit can permute thirty-two 2-bit subwords, sixteen 4-bit subwords, eight 8-bit subwords, four 16-bit subwords, or two 32-bit subwords. The permutation instructions of the present invention can be added as new instructions to the Instruction Set Architecture of a conventional microprocessor, or they can be used in the design of new processors or coprocessors to be efficient for both cryptography and multimedia software.
Permutations are performed by constructing a Benes interconnection network. This is done by executing a certain number of stages of the Benes network with permute instructions. The permute instructions are performed by a circuit comprising Benes network stages. Intermediate sequences of bits are defined into which an initial sequence of bits from a source register is transformed. Each intermediate sequence of bits is used as input to a subsequent permutation instruction. Permutation instructions are determined for permuting the initial source sequence of bits into one or more intermediate sequences of bits until a desired sequence is obtained. The intermediate sequences of bits are determined by configuration bits. The permutation instructions form a permutation instruction sequence. At most lg n permutation instructions are used in the permutation instruction sequence, where lg denotes the base-2 logarithm. In an embodiment of the present invention, multibit subwords are permuted by eliminating pass-throughs in the Benes network. In a further embodiment of the invention, the method and system are scaled for performing permutations of 2n bits in which subwords are packed into two or more registers.
In this embodiment, at most 4 lg n + 2 instructions are used to permute 2n bits using n-bit words.
For a better understanding of the present invention, reference may be made to the accompanying drawings.
Reference will now be made in greater detail to a preferred embodiment of the invention, an example of which is illustrated in the accompanying drawings. Wherever possible, the same reference numerals will be used throughout the drawings and the description to refer to the same or like parts.
A Benes network can be used to perform permutations of n bits with edge-disjoint paths using intermediate states. The Benes network can be formed by connecting two butterfly networks of the same size back-to-back. An example of an 8-input Benes network is shown in
An n-input Benes network can be broken into 2 lg n stages, of which lg n are distinct. The number of nodes in each stage is n. A node is defined as a point in the network where the path selection for an input takes place. In each stage of a butterfly network, for every input, there is another input that shares the same two outputs with it. Such pairs of inputs can be referred to as “conflict inputs” and their corresponding outputs can be referred to as “conflict outputs”. Within one stage of the Benes network, the distance between the two members of every conflict pair is the same; this distance differs between different stages. In the implementation of method
In a preferred embodiment of the invention, the instruction format for the permutation instruction can be defined as:
wherein m
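The conflict-pair structure described above can be made concrete with a short Python sketch. It is illustrative only and rests on assumptions not stated in the text: positions are numbered 0 to n−1, the first butterfly uses distances n/2, n/4, ..., 1, and the second butterfly mirrors that order. The function names are hypothetical.

```python
def stage_distances(n):
    """Conflict-pair distance for each of the 2*lg(n) stages.

    Assumed ordering: n/2, n/4, ..., 1 for the first butterfly, then the
    mirror image for the second butterfly connected back-to-back.
    """
    lg = n.bit_length() - 1
    assert n >= 2 and (1 << lg) == n, "n must be a power of two"
    first = [n >> (s + 1) for s in range(lg)]
    return first + first[::-1]


def conflict_pairs(n, distance):
    """List the (input, conflict input) pairs of one stage.

    With power-of-two distances, the conflict input of position i is
    i XOR distance, so every pair in a stage is the same distance apart.
    """
    return [(i, i ^ distance) for i in range(n) if (i & distance) == 0]


if __name__ == "__main__":
    for d in stage_distances(8):
        print(d, conflict_pairs(8, d))
```

Under these assumptions an 8-input network has 2 lg 8 = 6 stages with lg 8 = 3 distinct distances, which agrees with the stage counts given above.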
The CROSS instruction can be added to the Instruction Set Architecture of conventional microprocessors, digital signal processors (DSPs), cryptographic processors, multimedia processors, media processors and programmable Systems-on-a-Chip (SOC), and can be used in developing processors or coprocessors for providing cryptography and multimedia operations. In particular, the CROSS instruction can permute sixty-four 1-bit subwords in a 64-bit processor for use in, for example, encryption and decryption processing using software. The CROSS instruction can also permute multi-bit subwords as described below, for example, thirty-two 2-bit subwords, sixteen 4-bit subwords, eight 8-bit subwords, four 16-bit subwords or two 32-bit subwords in a 64-bit processor for use for example in multimedia processing.
Each node
Left half of configuration bits R
During the first basic operation, node
Each of nodes
During the second basic operation, node
A method for implementing CROSS instructions to do arbitrary permutations is shown in
1. “Inputs” and “outputs” refer to the inputs and outputs of the current Benes network. Starting from the first input that is not configured, referred to as “current input”, set the “end input” to be the conflict input of the “current input”. If all “inputs” have already been configured, go to Step 4.
2a. Connect “current input” to the sub-network “sub1” that is on the same side as “current input”. Connect the output that has the same value as “current input”, to sub
2b. Connect “current input” to the sub-network “sub1” such that “sub1” is not “sub2”. Connect the output that has the same value as “current input”, to sub
3. Connect “current output” to sub-network “sub2” such that “sub2” is not “sub1”. Also connect the input that has the same value as “current output”, call it “input (current output)”, to “sub2”. If “input (current output)” is the same as “end input”, go back to Step 1. Otherwise set “current input” to the conflict input of “input (current output)” and go to Step 2b.
4. At this point, all the “inputs” and “outputs” have been connected to the two sub-networks. If the configuration of the two sub-networks is trivial, i.e. n=2, the configuration is done. Otherwise for each sub-network, treat it as a full Benes network and repeat the steps beginning at Step 1.
In block
For example, a Benes network configured for the permutation (abcdefgh)→(fabcedhg) is shown in
A schematic diagram of a method for permuting multi-bit subwords
The CROSS instruction can be used to permute subwords packed into more than one register. If a register is n bits, two registers are 2n bits. The CROSS instructions can be used for 2n-bit permutations by using an instruction such as the SHIFT PAIR instruction in PA-RISC, as described in Ruby Lee, “Precision Architecture”,
In block
In block
In another embodiment of the invention, two or more different butterfly stages are combined in one stage of the implementation.
In an alternate embodiment of using system
In an alternate embodiment using system
The CROSS instruction, in any of the above described embodiments, can be used by itself, rather than in a sequence of instructions. A single CROSS instruction generates only a subset of all possible permutations. A permutation performed by a single CROSS instruction can be reversed by reversing the order of the stages used in the CROSS instruction, with the configuration bits for each stage being the same as for the original permutation. For example, the permutation achieved by CROSS,
Horizontal and vertical track counts and transistor counts have been calculated for a circuit implementation of the CROSS instruction based on the Benes network of the present invention and are compared to a circuit implementation of a crossbar network for 8-bit and 64-bit permutations in Table 2.
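Returning to the configuration method of Steps 1 to 4 above, the following Python sketch shows one way such a recursive configuration can be computed and checked. It is a minimal sketch under stated assumptions, not a transcription of the patented method: it starts each loop from an output rather than an input, uses a hypothetical encoding of n/2 configuration bits per stage (0 passes a conflict pair straight through, 1 swaps it), and assumes stage distances n/2, ..., 1, 1, ..., n/2. All function names are hypothetical.

```python
def stage_distances(n):
    """Assumed stage order: n/2, ..., 1 followed by 1, ..., n/2."""
    lg = n.bit_length() - 1
    assert n >= 2 and (1 << lg) == n, "n must be a power of two"
    first = [n >> (s + 1) for s in range(lg)]
    return first + first[::-1]


def apply_stage(data, distance, bits):
    """Apply one stage: bit 1 swaps a conflict pair, bit 0 passes it through."""
    out = list(data)
    lows = [i for i in range(len(data)) if (i & distance) == 0]
    for bit, i in zip(bits, lows):
        if bit:
            out[i], out[i | distance] = data[i | distance], data[i]
    return out


def apply_network(data, stages):
    """Run data through every stage in order."""
    for distance, bits in zip(stage_distances(len(data)), stages):
        data = apply_stage(data, distance, bits)
    return list(data)


def configure(perm):
    """Return per-stage bits such that output[j] == input[perm[j]]."""
    n = len(perm)
    if n == 1:
        return []  # a single wire needs no configuration
    half = n // 2
    inverse = [0] * n
    for j, i in enumerate(perm):
        inverse[i] = j
    in_sub = [None] * n   # sub-network (0 or 1) chosen for each input
    out_sub = [None] * n  # sub-network (0 or 1) feeding each output
    for start in range(n):
        j = start
        while out_sub[j] is None:
            out_sub[j] = 0
            i = perm[j]
            in_sub[i] = 0              # the input wanted at j uses sub 0
            in_sub[i ^ half] = 1       # its conflict input must use sub 1
            k = inverse[i ^ half]
            out_sub[k] = 1             # so the output wanting it uses sub 1
            j = k ^ half               # and that output's conflict uses sub 0
    first = [in_sub[i] for i in range(half)]
    last = [out_sub[j] for j in range(half)]
    subs = []
    for s in (0, 1):
        sub_perm = [perm[p if out_sub[p] == s else p + half] % half
                    for p in range(half)]
        subs.append(configure(sub_perm))  # treat each half as a full network
    inner = [a + b for a, b in zip(subs[0], subs[1])]
    return [first] + inner + [last]


if __name__ == "__main__":
    src = list("abcdefgh")
    perm = [src.index(c) for c in "fabcedhg"]
    stages = configure(perm)
    out = apply_network(src, stages)
    print("".join(out))  # fabcedhg
    # Same bits with the stage order reversed undo the permutation.
    print("".join(apply_network(out, stages[::-1])))  # abcdefgh
```

For the 8-input example this produces six stages of four bits each. If two stages are executed per instruction, as the lg n bound suggests, the six stages would correspond to three instructions. Because each stage in this model only swaps or passes pairs, applying the same bits with the stage order reversed restores the input, which mirrors the reversal property stated above.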
The numbers in Table 2 are computed as follows: For the CROSS instruction implementation, the following relationships are used,
Transistors = 2n lg n × 4 = 8n lg n
The 2n horizontal tracks come from the 2 input lines in each node. The number of horizontal tracks is composed of two parts: n/2 configuration lines per stage for the 2 lg n stages, and the number of data tracks needed between adjacent stages, which is 2×(2n−2) in total. The 8n lg n transistors come from 4 transistors in each cell for 2n lg n cells.
For implementation of an 8-input crossbar network as shown in
From these equations, it is shown that when n is large, the CROSS instruction yields the smaller size. As shown in Table 2, the CROSS circuit implementation yields a much smaller transistor count and reasonable track counts for permutations of 64 bits. Accordingly, it yields a more area-efficient implementation. Control logic circuits for generating the configuration signals, which are more complex for the crossbar than for CROSS, were not counted.
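The CROSS relationships quoted above can be evaluated for the 8-bit and 64-bit cases with a short Python sketch. It only restates the arithmetic given in the text (n cells per stage, 2 lg n stages, 4 transistors per cell, n/2 configuration lines per stage, 2×(2n−2) data tracks); the function and key names are hypothetical, no assumption is made about which track count is horizontal and which is vertical, and the crossbar figures are not reproduced.

```python
def cross_circuit_counts(n):
    """Evaluate the CROSS circuit relationships for an n-bit permutation."""
    lg = n.bit_length() - 1
    assert n >= 2 and (1 << lg) == n, "n must be a power of two"
    stages = 2 * lg                    # 2 lg n stages in the Benes network
    cells = n * stages                 # n nodes per stage -> 2n lg n cells
    return {
        "stages": stages,
        "cells": cells,
        "transistors": 4 * cells,      # 4 transistors per cell -> 8n lg n
        "input_line_tracks": 2 * n,    # 2 input lines in each node
        # n/2 configuration lines per stage plus 2*(2n - 2) data tracks
        "config_and_data_tracks": (n // 2) * stages + 2 * (2 * n - 2),
    }


if __name__ == "__main__":
    for n in (8, 64):
        print(n, cross_circuit_counts(n))
```

The transistor count grows as n lg n, which is the basis for the statement that the CROSS circuit is the smaller one when n is large.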
Table 3 shows a comparison of the number of instructions needed for permutations of a 64-bit word with different subword sizes for method
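The bound of at most 2 lg r / m instructions given in the abstract can be evaluated for a 64-bit word at each subword size. The Python sketch below is illustrative only and does not reproduce Table 3. It assumes m = 2 stages per instruction, a value inferred from the stated bound of lg n instructions for n 1-bit subwords, and rounds up when 2 lg r is not a multiple of m; the function name is hypothetical.

```python
def max_permute_instructions(n_bits, k_bits, m_stages=2):
    """Upper bound 2*lg(r)/m on instructions for r = n/k subwords.

    m_stages=2 is an assumption inferred from the lg n bound in the text.
    """
    r = n_bits // k_bits
    assert k_bits * r == n_bits, "k must divide n"
    lg_r = r.bit_length() - 1
    assert (1 << lg_r) == r, "r must be a power of two"
    return -(-2 * lg_r // m_stages)    # integer ceiling of 2*lg(r)/m


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32):
        print(f"{64 // k:2d} subwords of {k:2d} bits: "
              f"at most {max_permute_instructions(64, k)} instructions")
```

Under this assumption the bound shrinks by one instruction each time the subword size doubles, since lg r drops by one.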
It is to be understood that the above-described embodiments are illustrative of only a few of the many possible specific embodiments which can represent applications of the principles of the invention. Numerous and varied other arrangements can be readily devised in accordance with these principles by those skilled in the art without departing from the spirit and scope of the invention.