Publication number | US7043494 B1 |

Publication type | Grant |

Application number | US 10/353,723 |

Publication date | May 9, 2006 |

Filing date | Jan 28, 2003 |

Priority date | Jan 28, 2003 |

Fee status | Paid |

Inventors | Deepali Joshi, Ajit Shelat, Amit Phansalkar, Sundar Iyer, Ramana Kompella, George Varghese |

Original Assignee | Pmc-Sierra, Inc. |

Abstract

A combined hash table/bucket trie technique facilitates fast, deterministic, memory-efficient exact match look-ups on extremely large tables. A limited number of hash keys which collide on the same location can be stored in the hash table. If further keys collide on the same location, a bucket trie is formed, the colliding keys are stored in the trie, and trie traversal information is stored in the hash table. Regardless of the number of buckets in the trie, an input key need only be compared with the keys in one bucket to detect a stored key identical to the input key or conclude that no stored key is identical to the input key.

Claims (17)

1. A look-up method comprising:

providing a data structure comprising:

(a) a hash table containing up to H cells, each one of said cells having a cell type indicator containing one of a first cell type value and a second cell type value; and

(b) an overflow bucket table containing up to B_{max} buckets;

wherein:

(i) any one of said cells containing said first cell type value in said cell type indicator further comprises:

(1) k key storage fields, each one of said k key storage fields for storing one of said keys;

(2) k flow index fields, each one of said k flow index fields for storing flow index information for a key stored in a corresponding one of said k key storage fields;

(ii) any one of said cells containing said second cell type value in said cell type indicator further comprises:

(1) n-1 key bit position fields, each one of said key bit position fields for storing a bit position value representative of a bit position in any one of said keys;

(2) n bucket index fields, each one of said bucket index fields for storing an index value representative of an address displacement of a corresponding one of said buckets within said overflow bucket table;

(iii) any one of said buckets further comprises:

(1) m key storage fields, each one of said m key storage fields for storing one of said keys;

(2) m flow index fields, each one of said m flow index fields for storing flow index information for a key stored in a corresponding one of said m key storage fields;

(iv) any particular one of said cells having said second cell type value in said type field corresponds to up to n of said buckets organized in a trie having up to n-1 intermediate nodes and up to n leaf nodes, each one of said intermediate nodes corresponding to one of said key bit position fields of said particular one of said cells, said bit position value stored in said corresponding one of said key bit position fields for selecting a binary digit at a corresponding bit position in any one of said keys, said binary digit for determining a branch direction at said intermediate node corresponding to said one of said key bit position fields;

(v) each one of said up to n of said buckets organized in said trie corresponds to one of said leaf nodes; and,

(vi) H, k, B_{max}, n and m are integers, and

determining whether said data structure contains a key which exactly matches an input key K.

2. A look-up method as defined in claim 1 , further comprising:

(a) applying a hash function F(·) to said input key K to produce a hash index I=F(K) corresponding to one of said hash table cells;

(b) if said cell type indicator of said one of said hash table cells contains said first cell type value, comparing said input key K with each key stored in said one of said hash table cells to determine whether any one of said keys stored in said one of said hash table cells exactly matches said input key K;

(c) if none of said keys stored in said one of said hash table cells exactly matches said input key K, terminating said look-up method by indicating that said data structure does not contain a key which exactly matches said input key K;

(d) if one of said keys stored in said one of said hash table cells exactly matches said input key K, terminating said look-up method by indicating which one of said keys stored in said one of said hash table cells exactly matches said input key K;

(e) if said cell type indicator of said one of said hash table cells contains said second cell type value, traversing said trie intermediate nodes, commencing at a root node of said trie, by branching in said branch directions determined by said bit position values stored in said key bit position fields corresponding to said respective intermediate nodes until one of said leaf nodes corresponding to one of said buckets is reached;

(f) comparing said input key K with each key stored in said one of said buckets to determine whether any one of said keys stored in said one of said buckets exactly matches said input key K;

(g) if none of said keys stored in said one of said buckets exactly matches said input key K, terminating said look-up method by indicating that said data structure does not contain a key which exactly matches said input key K; and,

(h) if one of said keys stored in said one of said buckets exactly matches said input key K, terminating said look-up method by indicating which one of said keys stored in said one of said buckets exactly matches said input key K.

3. A look-up method as defined in claim 2, wherein said hash function F(·) is selected so that, for any input key K, F(K) produces one of H hash indices.

4. A look-up method as defined in claim 3 , wherein said hash function F(·) is an H3 hash function.

5. A look-up method as defined in claim 3 , wherein said hash function F(·) is a cyclic redundancy check hash function.

6. A look-up method as defined in claim 2 , further comprising:

(a) if one of said keys stored in said one of said hash table cells exactly matches said input key K, terminating said look-up method by retrieving from said one of said hash table cells containing said key exactly matching said input key K said flow index information for said stored key which exactly matches said input key K; and,

(b) if one of said keys stored in said one of said buckets exactly matches said input key K, terminating said look-up method by retrieving from said one of said buckets containing said key exactly matching said input key K said flow index information for said stored key which exactly matches said input key K.

7. A look-up method as defined in claim 3 , wherein said trie is a balanced trie.

8. A look-up method as defined in claim 3 , further comprising storing said data structure in an auxiliary memory device not containing stored instructions for performing said look-up method.

9. A look-up method as defined in claim 8 , further comprising performing said method by programmable logic operations executed by and stored in a processor electronically coupled to said auxiliary memory device.

10. A look-up method as defined in claim 9 , further comprising:

(a) storing said hash table in a first group of storage locations within said auxiliary memory device;

(b) storing said overflow bucket table in a second group of storage locations within said auxiliary memory device;

(c) transferring data from said first group of storage locations to said processor during a first burst operation; and,

(d) transferring data from said second group of storage locations to said processor during a second burst operation immediately following said first burst operation.

11. A look-up method as defined in claim 9 , further comprising storing said hash table and said overflow bucket table within said auxiliary memory device to facilitate performance of a data read operation on said hash table during a first clock cycle, followed by performance of a data read operation on said overflow bucket table during a second clock cycle immediately following said first clock cycle.

12. A look-up method as defined in claim 2 , wherein:

(a) said comparing of said input key K with each key stored in said one of said hash table cells further comprises simultaneously comparing said input key K with each key stored in said one of said hash table cells; and,

(b) said comparing of said input key K with each key stored in said one of said buckets further comprises simultaneously comparing said input key K with each key stored in said one of said buckets.

13. A look-up method as defined in claim 9 , wherein each data read operation performed on said auxiliary memory device retrieves a bounded amount of data from said auxiliary memory device irrespective of the number of said keys for which said hash function F(·) produces an identical one of said hash indices.

14. A look-up method as defined in claim 2 , wherein said integer number of keys is at least 1,000,000.

15. A look-up method as defined in claim 9 , wherein:

(a) said integer number of keys is at least 1,000,000; and,

(b) said auxiliary memory device has a random access time no greater than 30 nanoseconds.

16. A computer-readable medium encoded with a data structure for storing an integer number of keys, said computer-readable medium comprising:

(a) a hash table containing up to H cells, each one of said cells having a cell type indicator containing one of a first cell type value and a second cell type value; and

(b) an overflow bucket table containing up to B_{max} buckets;

wherein:

(i) any one of said cells containing said first cell type value in said cell type indicator further comprises:

(1) k key storage fields, each one of said k key storage fields for storing one of said keys;

(2) k flow index fields, each one of said k flow index fields for storing flow index information for a key stored in a corresponding one of said k key storage fields;

(ii) any one of said cells containing said second cell type value in said cell type indicator further comprises:

(1) n-1 key bit position fields, each one of said key bit position fields for storing a bit position value representative of a bit position in any one of said keys;

(2) n bucket index fields, each one of said bucket index fields for storing an index value representative of an address displacement of a corresponding one of said buckets within said overflow bucket table;

(iii) any one of said buckets further comprises:

(1) m key storage fields, each one of said m key storage fields for storing one of said keys;

(2) m flow index fields, each one of said m flow index fields for storing flow index information for a key stored in a corresponding one of said m key storage fields;

(iv) any particular one of said cells having said second cell type value in said type field corresponds to up to n of said buckets organized in a trie having up to n-1 intermediate nodes and up to n leaf nodes, each one of said intermediate nodes corresponding to one of said key bit position fields of said particular one of said cells, said bit position value stored in said corresponding one of said key bit position fields for selecting a binary digit at a corresponding bit position in any one of said keys, said binary digit for determining a branch direction at said intermediate node corresponding to said one of said key bit position fields;

(v) each one of said up to n of said buckets organized in said trie corresponds to one of said leaf nodes; and,

(vi) H, k, B_{max}, n and m are integers.

17. A computer-readable medium as defined in claim 16 , wherein said integer number of keys is at least 1,000,000.

Description

The invention provides a method of performing fast, deterministic, memory-efficient, exact match look-up operations on large tables such as TCP/IP flow tables containing millions of 104-bit {SIP, DIP, SP, DP, protocol} 5-tuple keys.

High speed networked packet switching communications systems commonly use table look-up operations to match a field or a set of fields against a table of entries. For example, Internet protocol (IP) routing operations commonly apply longest prefix match (LPM) comparison techniques to perform forwarding table look-ups. Access control list (ACL) filtering operations commonly apply masked match comparison techniques involving one or more fields to perform ACL table look-ups. Network address translation (NAT) operations also apply masked match comparison techniques involving one or more fields to perform NAT table look-ups. So-called “exact match” table look-ups, in which a multiple-field input key is matched against a table of keys to determine whether the table contains a key which exactly matches the input key, are commonly used in performing asynchronous transfer mode virtual path identifier and virtual circuit identifier (ATM VPI/VCI) table look-ups, multi-protocol label switching (MPLS) label look-ups, transmission control protocol/Internet protocol (TCP/IP) flow identification look-ups, etc.

As one example, TCP/IP flow identification involves a flow table look-up in which a 5-tuple key, consisting of a packet's source IP address (SIP), destination IP address (DIP), source port (SP), destination port (DP) and protocol fields, is used to identify packet “flow.” For example, the 5-tuple {192.1.4.5, 200.10.2.3, 21, 1030, tcp} corresponds to SIP 192.1.4.5, DIP 200.10.2.3, SP 21, DP 1030 and protocol tcp. An input 5-tuple key is compared with 5-tuple keys stored in a flow table to determine whether one of the keys stored in the table exactly matches the input key. Each key in the table is stored with other information which can be retrieved and used to obtain packet flow information if a stored key corresponding to the input key is located.
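The 104-bit key width follows from the field sizes: 32 + 32 + 16 + 16 + 8 bits. The sketch below packs the example 5-tuple into one integer key. It is a minimal Python illustration, not a layout the patent specifies. The field order (SIP in the most significant bits) and the helper name `pack_5tuple` are assumptions.

```python
import ipaddress

# Hypothetical layout (not specified by the patent): SIP occupies the most
# significant 32 bits, followed by DIP (32), SP (16), DP (16), protocol (8).
PROTOCOL_NUMBERS = {"tcp": 6, "udp": 17}  # IANA protocol numbers

def pack_5tuple(sip: str, dip: str, sp: int, dp: int, proto: str) -> int:
    """Pack a {SIP, DIP, SP, DP, protocol} 5-tuple into one 104-bit integer."""
    s = int(ipaddress.IPv4Address(sip))
    d = int(ipaddress.IPv4Address(dip))
    if not (0 <= sp < 1 << 16 and 0 <= dp < 1 << 16):
        raise ValueError("port numbers must fit in 16 bits")
    p = PROTOCOL_NUMBERS[proto]
    return (s << 72) | (d << 40) | (sp << 24) | (dp << 8) | p

key = pack_5tuple("192.1.4.5", "200.10.2.3", 21, 1030, "tcp")
print(key.bit_length() <= 104)  # True
print(hex(key))                 # 0xc0010405c80a02030015040606
```

With the fields packed this way, any two distinct 5-tuples give distinct integers. Exact matching then reduces to a single integer comparison.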

Flow tables can be very large. For example, a flow table suitable for use at an OC-48 line rate (2.4 Gbps) typically contains millions of entries. To accommodate such line rates the exact match table look-up operation must be extremely fast and it should be deterministic in the sense that the look-up operation should have a very high probability of successfully locating an input key in a very large table within a short finite interval on the order of a few nanoseconds.

A variety of hardware and software approaches have been used to perform exact match look-up operations on large tables, with varying degrees of efficacy. One hardware solution uses content addressable memory (CAM) or ternary content addressable memory (TCAM). CAM/TCAM devices facilitate extremely fast, parallel (i.e., simultaneous) look-up operations on all keys stored in a table. However, table size is limited by the relatively high power consumption of CAM/TCAM devices, and by the cost and complexity of apparatus incorporating the large number of CAM/TCAM devices required to contain even a modestly large table. Consequently, currently available CAM/TCAM devices are not well suited to exact match look-up operations on very large tables. Hardware-based trie/tree walking techniques are also impractical, since their performance is reduced by the requisite large number of memory accesses and by relatively long table update times. Single and multilevel hashing techniques are also commonly used to perform exact match look-up operations on large tables. Such techniques are constrained, however, by non-deterministic look-up (because hashing techniques normally use a linear walk to differentiate between collided entries), by inefficient use of memory and by relatively low performance.

The invention combines hash and trie techniques to facilitate fast, deterministic, memory-efficient exact match look-up operations on extremely large tables. For example, exact match look-ups can be cost-effectively performed on extremely large TCP/IP flow identification look-up tables at OC-192 line rates (10 Gbps). A limited number of hash keys which collide on the same location can be stored in the hash table. If further keys collide on the same location, a bucket trie is formed, the colliding keys are stored in the trie, and trie traversal information is stored in the hash table. Regardless of the number of buckets in the trie, an input key need only be compared with the keys in one bucket to detect a stored key identical to the input key or conclude that no stored key is identical to the input key. Look-up time is bounded by two memory burst accesses of fixed size.

Although the invention is of general application, a number of practical requirements are specifically satisfied, including exact match TCP/IP flow table look-ups using {SIP, DIP, SP, DP, protocol} 5-tuple keys. Arbitrarily large bit-size keys are accommodated, in that the invention is readily adaptable to key sizes ranging from a few bits to hundreds of bits. Extremely large look-up tables, having millions of entries that must be stored in high-density off-chip auxiliary memory devices, are accommodated. Look-up times are bounded and sufficiently deterministic to accommodate exact match look-up operations at OC-192 line rates using commercially available auxiliary memory devices. Dynamic table updates (key insertion and deletion) are relatively simple. The required hardware logic is relatively simple to implement and relatively small, thus minimizing power requirements and integrated circuit surface area. Although particularly well suited to exact match look-up operations on extremely large tables, the invention can also be advantageously applied to perform exact match look-up operations on smaller tables.

… 3B and 3C together provide a simplified flow chart depiction of the sequence of operations performed in storing keys.

… “type 0” hash table cell in accordance with the invention.

… “type 1” hash table cell in accordance with the invention.

… type 0 hash table cell containing two 6-bit keys. … type 0 cell into a type 1 cell associated with a bucket containing the two …

Throughout the following description, specific details are set forth in order to provide a more thorough understanding of the invention. However, the invention may be practiced without these particulars. In other instances, well known elements have not been shown or described in detail to avoid unnecessarily obscuring the invention. Accordingly, the specification and drawings are to be regarded in an illustrative, rather than a restrictive, sense.

Introduction

The object is to determine whether an input key is present in (i.e., exactly matches a key in) a table storing an extremely large number of keys, potentially millions. Keys are initially stored in a hash table having H cells, and each cell stores up to k keys. A hash function is applied to the input key, producing a hash index corresponding to a cell in the table. If that cell contains no more than k keys, a primary look-up procedure compares the keys in that cell with the input key to determine whether one of them exactly matches the input key.
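The sketch below shows the primary look-up. It is a minimal Python sketch that assumes integer keys, a deliberately tiny table and a CRC-32 based F(·); a cyclic redundancy check is one of the hash function families named in the claims. Only type 0 cells are modeled. `H`, `K_PER_CELL`, `insert_type0` and `primary_lookup` are hypothetical names.

```python
import zlib
from typing import Optional

KEY_BYTES = 13   # 104-bit keys (assumption: 5-tuple flow keys)
H = 8            # number of hash table cells (tiny, for illustration only)
K_PER_CELL = 2   # k: maximum number of keys per cell

def F(key: int) -> int:
    """Hash index in [0, H): CRC-32 of the key's bytes, reduced modulo H."""
    return zlib.crc32(key.to_bytes(KEY_BYTES, "big")) % H

# Each type 0 cell holds at most k (key, flow index) pairs.
table: list[list[tuple[int, int]]] = [[] for _ in range(H)]

def insert_type0(key: int, flow_index: int) -> bool:
    """Store a key in its cell; fail if the cell already holds k keys."""
    cell = table[F(key)]
    if len(cell) >= K_PER_CELL:
        return False          # more than k colliding keys: a trie would be needed
    cell.append((key, flow_index))
    return True

def primary_lookup(key: int) -> Optional[int]:
    """Compare the input key with the (at most k) keys in cell F(key)."""
    for stored_key, flow_index in table[F(key)]:
        if stored_key == key:
            return flow_index  # exact match: return that key's flow index
    return None                # no stored key exactly matches the input key
```

After `insert_type0(0x1234, 7)`, `primary_lookup(0x1234)` returns 7. A key that was never stored returns `None` after at most k comparisons.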

“Hash collisions” render hashing non-deterministic. For purposes of this invention, a hash collision occurs if more than k different input keys produce the same hash index. In such case, the cell's maximum storage capacity of k keys is insufficient to store all different “colliding” keys for which the hash function produces the same hash index. If a hash collision occurs, a deterministic trie mechanism is used to redistribute the colliding keys into “buckets.” Each bucket stores a maximum of m keys. A secondary look-up procedure identifies the bucket corresponding to the input key, and compares the keys stored in that bucket with the input key to determine whether one of the keys in that bucket exactly matches the input key. The hash function, the trie mechanism and their associated parameters can be selected to improve the efficiency of the secondary look-up procedure.

The invention is described in terms of these parameters:

K | input key |

T | maximum number of keys to be stored (application specific; the total memory key storage capacity M can be much greater than T) |

H | total number of cells in the hash table |

w_{H} | bit width of one hash table cell |

k | maximum number of keys storable in one hash table cell |

B_{max} | maximum number of buckets in the overflow bucket table |

w_{B} | bit width of one bucket |

m | maximum number of keys storable in one bucket |

n | maximum number of buckets associated with one hash table cell |

BSELx | key bit position containing the bit value used to determine the branch direction at trie node x |

BIDy | bucket index: (BIDy*w_{B}) + overflow bucket table's initial memory address = bucket y's initial memory address |

F(·) | hash function |

I_{i} | hash index produced by applying F(·) to key K_{i} |

I1, I2, . . . | flow indices stored with keys in hash table |

BI1, BI2, . . . | flow indices stored with keys in overflow bucket table |

B(·) | bucket selection function |

M | total memory requirement = (H*k) + (B_{max}*m) |

E | memory usage efficiency = T/M |
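The last two rows of the table can be checked with a quick calculation. The parameter values below are hypothetical. They were chosen only to show how M and E relate for a table sized for T = 1,000,000 keys, with M counted in key storage slots.

```python
def memory_requirement(H: int, k: int, B_max: int, m: int) -> int:
    """M = (H*k) + (B_max*m): total key storage slots in cells plus buckets."""
    return H * k + B_max * m

def efficiency(T: int, M: int) -> float:
    """E = T/M: fraction of the key storage slots needed for T keys."""
    return T / M

# Hypothetical sizing: 2^19 cells of k = 2 keys, 2^16 buckets of m = 4 keys.
M = memory_requirement(H=2**19, k=2, B_max=2**16, m=4)
print(M)                         # 1310720
print(efficiency(1_000_000, M))  # 0.762939453125
```

Increasing B_{max} gives more room for colliding keys but lowers E. This is the trade-off that choosing H, k, B_{max}, m and n has to balance.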

Look-Up Procedure

As shown in …, a hash function 10, denoted F(·), is applied to an input key K (block 200) to produce a hash index I=F(K) (block 202). F(·) is selected so that, for any K, F(K) produces one of H possible hash indices. This allows each hash index produced by F(·) to correspond to one of a total of H cells in hash table 12. The hash table cells are numbered from 0 to H-1 as shown in … Each cell is w_{H} bits wide and can store a maximum of k keys. (… cell 14 contains up to 2 keys.)

Each cell in hash table 12 includes a type field containing a value which indicates whether the cell is a “type 0” cell or a “type 1” cell. Type 0 cells directly store up to k keys, which a primary look-up procedure can retrieve directly from those cells. Type 1 cells do not store keys. Instead, each type 1 cell is associated with a trie. The trie (not shown in …) … B_{max} buckets in overflow bucket table 16, as required for storage of newly received keys. Each w_{B}-bit-wide bucket stores up to m keys. In place of keys, type 1 cells store bucket indices and key bit position identifiers, which a secondary look-up procedure uses to identify the bucket in which a key exactly matching the input key may be stored. Since each type 1 cell can correspond to a maximum of n buckets, one type 1 cell can indirectly store up to n*m keys.

The primary look-up procedure is indicated by heavy, solid line arrows 18, 20 in … If hash index I corresponds to a type 0 cell in hash table 12 (block 204, “type 0” output), such as cell 14, then key comparator 22 compares (block 206) the input key to the key(s) stored in that type 0 cell to determine whether any of them exactly matches the input key. If no stored key exactly matching the input key is located in that type 0 cell (block 206, “no” output), the primary look-up procedure terminates (block 208) by indicating that no key matching the input key K has been found. If a stored key exactly matching the input key is located (block 206, “yes” output), the primary look-up procedure terminates (block 210), for example by causing key comparator 22 to output a match flag and a flow index corresponding to the stored key which exactly matches the input key. Each key has a uniquely corresponding flow index. These flow indices are designated I1, I2, etc. for flow indices stored in hash table 12, and BI1, BI2, etc. for flow indices stored in overflow bucket table 16. They are stored with the keys in hash table 12 and in overflow bucket table 16, and each corresponds only to one of the keys, not to the hash index I and not to the aforementioned bucket index.

The secondary look-up procedure is indicated by heavy dashed line arrows 24, 26, 28, 30 in … If hash index I corresponds to a type 1 cell in hash table 12 (block 204, “type 1” output), such as cell 32, the procedure continues as follows (explained in more detail below). A bucket selection function 34, denoted B(·), uses the bits of the input key pointed to by the key bit position identifiers to determine the bucket number (1 out of n) in the trie associated with that cell. It then uses that number to select one of the n bucket indices stored in that cell, thereby identifying a bucket 36 in overflow bucket table 16 which may contain a key exactly matching the input key (blocks 212, 214). (… bucket 36 contains up to 4 keys.) Key comparator 22 then compares (block 216) the input key to the key(s) stored in that bucket to determine whether any of them exactly matches the input key. If no stored key exactly matching the input key is located in that bucket (block 216, “no” output), the secondary look-up procedure terminates (block 208) by indicating that no key matching the input key K has been found. If a stored key exactly matching the input key is located in that bucket (block 216, “yes” output), the secondary look-up procedure terminates (block 210), for example by causing key comparator 22 to output match flag and index values as previously mentioned.
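The trie walk performed by B(·) can be sketched as follows. This is a minimal Python model with three assumptions. The trie is kept as linked nodes rather than packed into a cell's BSEL and BID fields. Bit position 0 denotes the least significant key bit. `Node`, `bucket_select` and `secondary_lookup` are hypothetical names. However deep the trie is, the input key is compared only with the keys of the single bucket reached.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical in-memory model of a type 1 cell's trie: intermediate nodes hold
# a key bit position (BSEL), leaves hold a bucket index (BID). Packing these
# n-1 BSEL and n BID fields into one w_H-bit cell is not modeled here.
@dataclass
class Node:
    bsel: int                     # key bit position tested at this node
    zero: Union["Node", int]      # subtree, or bucket index, when the bit is 0
    one: Union["Node", int]       # subtree, or bucket index, when the bit is 1

def bucket_select(root: Union[Node, int], key: int) -> int:
    """B(.): branch on key bit BSEL at each node until a leaf is reached."""
    node = root
    while isinstance(node, Node):
        bit = (key >> node.bsel) & 1   # assumption: bit 0 is the LSB
        node = node.one if bit else node.zero
    return node                        # bucket index into the overflow table

def secondary_lookup(root: Union[Node, int], key: int,
                     buckets: dict[int, list[tuple[int, int]]]) -> Optional[int]:
    """Compare the input key only with the keys in the one selected bucket."""
    for stored_key, flow_index in buckets[bucket_select(root, key)]:
        if stored_key == key:
            return flow_index
    return None

# Example with 6-bit keys: a 3-bucket trie testing bit 5, then bit 2.
trie = Node(bsel=5, zero=0, one=Node(bsel=2, zero=1, one=2))
buckets = {0: [(0b000101, 10)], 1: [(0b100001, 11)], 2: [(0b100100, 12)]}
print(secondary_lookup(trie, 0b100100, buckets))  # 12
print(secondary_lookup(trie, 0b100000, buckets))  # None
```

The second query reaches bucket 1 and compares against one stored key only. It then reports no match without searching the other buckets.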

As explained by Knuth, “A trie is essentially an M-ary tree, whose nodes are M-place vectors with components corresponding to digits or symbols. Each node on level l represents the set of all keys that begin with a certain sequence of l symbols; the node specifies an M-way branch, depending on the (l+1)st symbol.” (See: D. E. Knuth, The Art of Computer Programming, Volume 3, Sorting and Searching, Addison-Wesley, 1973, page 481). Any one of a variety of well known trie structures can be used to organize the buckets corresponding to type 1 cells. Trie structures, and algorithms for trie construction, trie traversal, trie expansion and trie compression, etc. are well known to persons skilled in the art and therefore need not be described in detail. Non-exhaustive examples of suitable trie structures are provided below. The bucket selection function B(·) chosen for use in a particular embodiment of the invention will depend upon the trie structure selected.

A CAM 38 can be provided to handle situations in which neither hash table 12 nor overflow bucket table 16 can store any further keys. These are situations in which the hash index I=F(K) corresponds to a type 1 cell which corresponds to a trie having the maximum of n buckets, and in which every bucket corresponding to the bit values in the new key's bit positions identified by the type 1 cell's bit position identifiers already contains its maximum of m keys. Such situations can be minimized by appropriate selection of H, k, B_{max}, m and n. Given appropriate minimization of such situations by techniques familiar to persons skilled in the art, a small CAM 38 can store the relatively small number of “overflow” keys which cannot be stored in hash table 12 or overflow bucket table 16.

CAM 38 is optional, as indicated by its dashed line representation in … If CAM 38 is provided, the input key K is compared (block 218) to the keys stored in CAM 38 simultaneously with the application of F(·) to K to produce hash index I (block 202). If such comparison identifies a key stored in CAM 38 which exactly matches the input key K (block 218, “yes” output), the look-up procedure terminates (block 210), and match flag and index values are output as previously explained. Each key's uniquely corresponding index value is stored with the key in CAM 38.
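The sketch below shows how the CAM result combines with the table result. Sequential code cannot model the simultaneous compare, so this minimal Python sketch shows only which result is used. It assumes each key is stored in exactly one of hash table 12, overflow bucket table 16 or CAM 38. A dict stands in for CAM 38, and `lookup_with_cam` is a hypothetical name.

```python
from typing import Callable, Optional

def lookup_with_cam(key: int,
                    cam: dict[int, int],
                    table_lookup: Callable[[int], Optional[int]]) -> Optional[int]:
    """Return the flow index for key, or None if no stored key matches.

    In hardware the CAM compare (block 218) runs at the same time as the
    hash/bucket look-up. Assuming every key is stored in only one place, at
    most one of the two can match, so checking the CAM first is equivalent.
    """
    if key in cam:                 # CAM hit: exact match among overflow keys
        return cam[key]
    return table_lookup(key)       # otherwise use the hash table/bucket result
```

Passing the table look-up in as a callable keeps the sketch independent of how the hash table and bucket trie are represented.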

The look-up procedure is deterministic because the worst-case look-up time, which occurs when a type 1 cell is encountered, is bounded. In this worst case, the number of bits which must be fetched from hash table 12 and overflow bucket table 16 is fixed, as is the required number of compare operations. The bit width of each cell in hash table 12 and the bit width of each bucket in overflow bucket table 16 are preferably selected so that the full bit width of a single cell or bucket can be fetched at wire speed in one or more burst accesses to memory. Comparison operations can also be performed in parallel to further expedite the look-up procedure.
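To make the fixed worst-case fetch concrete, the sketch below computes w_{H} and w_{B} for one hypothetical parameter set. The following are all assumptions:

- a 24-bit flow index;
- k = 2, n = 4, m = 4 and B_{max} = 2^16;
- a cell must be wide enough for either the type 0 or the type 1 layout.

The point is only that the total does not depend on how many keys collide on a hash index.

```python
import math

# Hypothetical field widths; the patent leaves these to the implementer.
KEY_BITS = 104        # 5-tuple key
FLOW_INDEX_BITS = 24  # assumed flow index width
TYPE_BITS = 1         # type 0 / type 1 indicator

def cell_width(k: int, n: int, b_max: int) -> int:
    """w_H: wide enough for either the type 0 or the type 1 cell layout."""
    type0 = TYPE_BITS + k * (KEY_BITS + FLOW_INDEX_BITS)
    bsel_bits = math.ceil(math.log2(KEY_BITS))   # encodes bit position 0..103
    bid_bits = math.ceil(math.log2(b_max))       # encodes a bucket index
    type1 = TYPE_BITS + (n - 1) * bsel_bits + n * bid_bits
    return max(type0, type1)

def bucket_width(m: int) -> int:
    """w_B: m keys, each stored with its flow index."""
    return m * (KEY_BITS + FLOW_INDEX_BITS)

w_h, w_b = cell_width(k=2, n=4, b_max=2**16), bucket_width(m=4)
print(w_h, w_b, w_h + w_b)  # 257 512 769 (worst-case bits read per look-up)
```

In this example, the type 1 layout (86 bits) fits easily inside the 257-bit cell needed for two keys. The cell width is therefore set by k rather than by n.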

The invention is primarily concerned with the look-up operation of determining whether a key exactly matching an input key is stored in hash table 12 or in overflow bucket table 16. However, to further assist persons skilled in the art in understanding the invention, a brief explanation follows of one manner in which keys can be inserted into, or deleted from, hash table 12 or overflow bucket table 16. These explanations assume specific data structures for hash table 12 and overflow bucket table 16 and are not intended to encompass all possible embodiments of the invention.

Key Insertion

Initially (i.e., before any keys are stored in hash table 12 or in overflow bucket table 16) all cells in hash table 12 are type 0 cells. A new key K_{n} is inserted by applying F(·) to the new key, producing a hash index I_{n} corresponding to one of the cells (cell I_{n}) in hash table 12 (blocks 300, 302). If cell I_{n} is a type 0 cell (block 304, “type 0” output) which does not already contain its maximum of k keys (block 306, “no” output), then K_{n} is stored in cell I_{n} (block 308), which remains a type 0 cell, and the key insertion process concludes successfully (block 310).

If cell I_{n} is a type 0 cell (block 304, “type 0” output) which already contains its maximum of k keys (block 306, “yes” output), K_{n} cannot be stored in cell I_{n}. Instead, overflow bucket table 16 is checked (block 312) to determine whether it contains a bucket which is not already associated with other (type 1) cells in hash table 12. If overflow bucket table 16 contains no such bucket (block 312, “no” output), then K_{n} cannot be stored and the key insertion process concludes unsuccessfully unless CAM 38 is provided, as explained below with reference to … B_{max}, m and n. If overflow bucket table 16 contains such a bucket (block 312, “yes” output), then that bucket is allocated for use by cell I_{n} (block 316). The key data corresponding to the k keys stored in cell I_{n} is copied into that bucket (block 318), and K_{n} is also stored in that bucket (block 320). Cell I_{n} is then converted (block 322) from type 0 to type 1 in three steps: its type field is overwritten with the value “1”; a bucket index corresponding to the displacement of the allocated bucket from the start of overflow bucket table 16 is stored (block 324) in a predefined field of cell I_{n}; and predefined key bit position data fields of cell I_{n} are initialized. As explained below, this facilitates subsequent reading and writing operations involving the bucket. The key insertion process then concludes successfully (block 310).
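The sketch below covers insertion into a type 0 cell, including conversion to type 1 on overflow. It is a minimal Python sketch with these assumptions:

- k + 1 <= m, so that the k existing keys and K_{n} fit in one bucket;
- the overflow bucket table is a Python list whose positions serve as bucket indices;
- trie expansion is not modeled.

`Cell`, `insert` and the parameter values are hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from typing import Optional

H, K_PER_CELL, M_PER_BUCKET, B_MAX = 8, 2, 4, 16   # illustrative; k + 1 <= m

@dataclass
class Cell:
    cell_type: int = 0                                  # type 0 or type 1
    entries: list = field(default_factory=list)         # type 0: (key, flow index)
    bucket_index: Optional[int] = None                  # type 1: allocated bucket
    bit_positions: list = field(default_factory=list)   # type 1: BSEL fields

table = [Cell() for _ in range(H)]
buckets: list[list] = []      # overflow bucket table; list position = bucket index

def F(key: int) -> int:
    return zlib.crc32(key.to_bytes(13, "big")) % H

def insert(key: int, flow_index: int) -> bool:
    cell = table[F(key)]                                    # blocks 300-302
    if cell.cell_type == 0:
        if len(cell.entries) < K_PER_CELL:
            cell.entries.append((key, flow_index))          # block 308
            return True
        if len(buckets) >= B_MAX:
            return False                    # block 312 "no": CAM would be next
        buckets.append(cell.entries + [(key, flow_index)])  # blocks 316-320
        cell.cell_type = 1                                  # block 322
        cell.bucket_index = len(buckets) - 1                # block 324
        cell.entries, cell.bit_positions = [], []           # initialize fields
        return True
    # Type 1 cell. This sketch never expands the trie, so the cell's single
    # bucket is the one B(.) would select (block 326).
    bucket = buckets[cell.bucket_index]
    if len(bucket) < M_PER_BUCKET:
        bucket.append((key, flow_index))                    # blocks 328-330
        return True
    return False            # trie expansion (blocks 334-346) not modeled here
```

Inserting three keys that hash to the same cell leaves that cell as type 1 with no keys of its own. All three keys then sit in one newly allocated bucket.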

If cell I_{n} is a type 1 cell (block 304, “type 1” output), then B(·) is applied, as hereafter explained, to the bucket index and key bit position data stored in cell I_{n}. This identifies, within the trie associated with cell I_{n}, a bucket which is appropriate for storage of K_{n} (block 326). If the identified bucket does not already contain its maximum of m keys (block 328, “no” output), then K_{n} is stored in that bucket (block 330), and the key insertion process concludes successfully (block 332). The identified bucket may instead already contain its maximum of m keys (block 328, “yes” output) while the trie associated with cell I_{n} already has its maximum of n buckets, so that the trie cannot be expanded further (block 334, “no” output). In that case, the keys stored in those n buckets cannot be redistributed by adding another bucket to the trie to make room for K_{n}. The key insertion process therefore concludes unsuccessfully unless CAM 38 is provided, as explained below with reference to …

If the trie associated with cell I_{n }does not have its maximum of n buckets (block **334**, “yes” output), the key insertion process expands the trie associated with cell I_{n }in a manner which depends on the selected trie structure (see FIG. **9**).

Overflow bucket table **16** is checked (block **338**) to determine whether it contains a bucket which is not already associated with another type **1** cell in hash table **12**. If overflow bucket table **16** contains no such bucket (block **338**, “no” output) then K_{n }cannot be stored, and the key insertion process concludes unsuccessfully unless CAM **38** is provided, as explained below. If overflow bucket table **16** contains such a bucket (block **338**, “yes” output) then that bucket is allocated for use by cell I_{n }(block **340**) and is associated with one of the trie's leaf nodes (block **342**), the trie being expanded if necessary, as hereafter explained. K_{n }and the m keys already stored in the bucket identified in block **326** are then distributed, as hereafter explained, between the bucket identified in block **326** and the bucket allocated in block **340**, in accordance with a selected key bit position value. If m is even, as it usually is, that value preferably results in storage of m/2 of those keys in one of the two buckets and storage of the remaining (m/2)+1 keys in the other bucket (block **344**). Cell I_{n }is then updated (block **346**) by storing in other predefined fields of cell I_{n }the key bit position value selected to distribute the keys between the two buckets, and the address displacement by which the initial memory address location of the bucket allocated in block **340** is offset from overflow bucket table **16**'s initial memory address location. The key insertion process then concludes successfully (block **332**).

Returning to block **344**, persons skilled in the art will understand that there may not be a bit position value which results in the aforementioned preferred storage of m/2 keys in one of the two buckets and storage of the remaining (m/2)+1 keys in the other bucket. In such case, the best attainable key distribution may be achieved by selecting a key bit position value which results in storage of (m/2)−1 keys in one of the two buckets and storage of the remaining (m/2)+2 keys in the other bucket; or, which results in storage of (m/2)−2 keys in one of the two buckets and storage of the remaining (m/2)+3 keys in the other bucket; or, which results in some other unequal division of keys between the two buckets. The objective of the block **344** procedure is to select that key bit position value which minimizes the difference between the number of keys stored in each of the two buckets.
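The block **344** selection can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes keys are held as Python integers, that bit positions are numbered from 1 (least significant) up to the key width as in the 6-bit example below, and that ties are broken in favour of the more significant bit. The names `select_split_bit` and `distribute` are hypothetical.

```python
def select_split_bit(keys, width):
    """Return the bit position (1 = least significant, width = most significant)
    that divides `keys` between two buckets as evenly as possible, or None if
    no bit position separates them. Scanning starts at the most significant
    bit, so among equally good positions the most significant one wins."""
    best_pos, best_diff = None, None
    for pos in range(width, 0, -1):
        ones = sum((key >> (pos - 1)) & 1 for key in keys)
        zeros = len(keys) - ones
        if ones == 0 or zeros == 0:
            continue  # every key has the same digit here, so no split
        diff = abs(ones - zeros)
        if best_diff is None or diff < best_diff:
            best_pos, best_diff = pos, diff
    return best_pos


def distribute(keys, pos):
    """Split keys into (digit-0 bucket, digit-1 bucket) on bit position pos."""
    zero = [k for k in keys if not (k >> (pos - 1)) & 1]
    one = [k for k in keys if (k >> (pos - 1)) & 1]
    return zero, one


# The m + 1 = 5 keys K1..K5 from the 6-bit example below.
keys = [0b101001, 0b010100, 0b011011, 0b010101, 0b111010]
pos = select_split_bit(keys, 6)
bucket_0, bucket_4 = distribute(keys, pos)
```

For these five keys the most significant bit already gives a 3/2 split, so `pos` is 6 and the digit-1 bucket receives K1 and K5.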

If the trie associated with cell I_{n }does not have its maximum of n buckets (block **334**, “yes” output), and if the full bucket identified by the block **326** operation is at the lowest level of the trie, then the trie cannot be expanded by adding a level beneath the trie node associated with the identified bucket. In such case, the trie can be reorganized by redistributing the keys using any one of a variety of techniques familiar to persons skilled in the art. One simple but potentially slow approach is to rebuild the trie. Another, more complex approach is to attempt to optimize the existing trie. In terms of implementation, rebuilding a trie can be algorithmically simpler than optimizing an existing one. Trie rebuilding is also advantageous because it produces a balanced trie for the set of keys under consideration.

For example, assume that a total of N keys have collided on cell I_{n }(i.e. N keys are currently stored in the buckets which are currently in the trie associated with cell I_{n}). A simple trie-rebuilding approach can be implemented by iteratively considering K_{n }together with those N keys. During the first iteration the N+1 keys are examined bit-by-bit, commencing with the most significant bit, until a key bit position is located for which one group consisting of half (or approximately half) of the keys contains one binary digit value and a second group consisting of the remaining keys contains the opposite binary digit value. During each subsequent iteration, any group produced by the preceding iteration which contains more than m keys is similarly subdivided into two subgroups, by locating a key bit position for which one subgroup consisting of half (or approximately half) of the group's keys contains one binary digit value and a second subgroup consisting of the remaining keys contains the opposite binary digit value. This yields a plurality of key groups, each containing no more than m keys, which can then be stored in the buckets forming the rebuilt trie.
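The iterative rebuild can be expressed recursively. The sketch below is one possible reading of the description under stated assumptions: keys are distinct integers, bit positions are numbered from 1 (least significant), and the rebuilt trie is returned as nested tuples `(bit_position, zero_subtrie, one_subtrie)` whose leaves are Python lists standing in for buckets. It does not encode the result into BSEL/BID fields, and `rebuild_trie`, `split_bit` and `leaf_buckets` are hypothetical names.

```python
def split_bit(keys, width):
    """Most significant bit position giving the most even split, or None."""
    best = None
    for pos in range(width, 0, -1):
        ones = sum((k >> (pos - 1)) & 1 for k in keys)
        if 0 < ones < len(keys):
            diff = abs(2 * ones - len(keys))  # equals |ones - zeros|
            if best is None or diff < best[1]:
                best = (pos, diff)
    return None if best is None else best[0]


def rebuild_trie(keys, m, width):
    """Recursively halve the key set until every group fits in one bucket."""
    if len(keys) <= m:
        return list(keys)  # leaf: one bucket holding at most m keys
    pos = split_bit(keys, width)
    if pos is None:
        raise ValueError("duplicate keys cannot be separated")
    zero = [k for k in keys if not (k >> (pos - 1)) & 1]
    one = [k for k in keys if (k >> (pos - 1)) & 1]
    return (pos, rebuild_trie(zero, m, width), rebuild_trie(one, m, width))


def leaf_buckets(trie):
    """List the buckets (leaves) of a trie from left to right."""
    if isinstance(trie, list):
        return [trie]
    _, zero, one = trie
    return leaf_buckets(zero) + leaf_buckets(one)
```

Applied to the seven 6-bit keys K1 to K7 of the example below with m=4, this sketch splits first on bit position 4 and yields two buckets holding four and three keys.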

As indicated above, the key insertion procedure can produce a skewed trie structure with fewer than n buckets and with a full bucket identified by the block **326** operation at the lowest level of the trie. Rebuilding the trie in the foregoing manner rebalances the trie's structure. However, the trie rebuilding operation can be time consuming and is accordingly unsuited to usage during each key insertion operation in a high speed lookup implementation.

In any of the above cases in which the key insertion process concludes unsuccessfully, if optional CAM **38** has been provided (block **348**, “yes” output), and if sufficient storage space remains in CAM **38** (block **350**, “no” output), then K_{n }is stored in CAM **38** (block **352**), and the key insertion process concludes successfully (block **354**). If CAM **38** has not been provided (block **348**, “no” output), or if CAM **38** has been provided (block **348**, “yes” output) but insufficient storage space remains in CAM **38** (block **350**, “yes” output), then K_{n }cannot be stored and the key insertion process concludes unsuccessfully (block **356**). As mentioned above, unsuccessful situations of this sort can be minimized by appropriate selection of H, k, B_{max}, m and n.

Key Deletion

A previously stored (“old”) key K_{o }is deleted by applying F(·) to K_{o}, producing a hash index I_{o }corresponding to one of the cells (cell I_{o}) in hash table **12** (blocks **400**, **402**). If cell I_{o }is a type **0** cell (block **404**, “type **0**” output) then K_{o }is deleted from cell I_{o }(block **406**), which remains a type **0** cell, and the key deletion process concludes (block **408**). Simultaneously, if optional CAM **38** has been provided (block **410**, “yes” output), and if K_{o }is stored in CAM **38** (block **412**, “yes” output), then K_{o }is deleted from CAM **38** (block **414**), and the key deletion process concludes (block **408**). If cell I_{o }is a type **1** cell (block **404**, “type **1**” output) then B(·) is applied, as hereafter explained, to the key bit position and bucket address offset values stored in cell I_{o }to identify the bucket corresponding to K_{o }(block **416**). K_{o }is then deleted from that bucket (block **418**).

Deletion of K_{o }may necessitate subsequent compression of the trie to maintain its structure (see FIG. **9**). Trie compression can entail a variety of situations. For example, the block **418** K_{o }deletion operation may be performed on a bucket associated with a node having a parent node which is in turn associated with another bucket. If those two “node-related” buckets collectively contain fewer than m keys after K_{o }is deleted (block **420**, “yes” output) then the keys in those two buckets are merged (block **422**) into the bucket associated with the parent node. The bucket associated with the node beneath the parent node is then released (block **424**) for future use. If the parent node has a grandparent node, the trie can be traversed upwardly in similar repetitive fashion to further compress the trie until two node-related buckets which collectively contain m or more keys are located.

If deletion of K_{o }does not leave two node-related buckets which collectively contain fewer than m keys (block **420**, “no” output), or after the block **422** and **424** trie compression operations are performed, a test (block **426**) is made to determine whether the trie associated with cell I_{o }now has only one bucket. If the test result is positive (block **426**, “yes” output), and if that bucket contains k or fewer keys (block **428**, “yes” output), then those keys are copied (block **430**) from that bucket into cell I_{o}, cell I_{o }is converted to type **0** by overwriting its type field with the value “0” (block **432**), the bucket is released (block **434**) for future use, and the key deletion process concludes (block **436**).
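The compression and type conversion tests (blocks **420** through **432**) can be sketched on a simplified trie model. This is illustrative only: it assumes the trie is held as nested tuples `(bit_position, zero_subtrie, one_subtrie)` with Python lists as leaf buckets, rather than in the BSEL/BID fields of a type **1** cell, and it applies the “fewer than m keys” merge test bottom-up over the whole trie, which has the same effect as the upward traversal described above provided no other sibling pair was already mergeable. `delete_key`, `_remove` and `_compress` are hypothetical names.

```python
def _remove(trie, key):
    """Remove key from the bucket selected by the trie's branch bits."""
    if isinstance(trie, list):
        return [k for k in trie if k != key]
    pos, zero, one = trie
    if (key >> (pos - 1)) & 1:
        return (pos, zero, _remove(one, key))
    return (pos, _remove(zero, key), one)


def _compress(trie, m):
    """Merge sibling leaf buckets that together hold fewer than m keys (block 420)."""
    if isinstance(trie, list):
        return trie
    pos, zero, one = trie
    zero, one = _compress(zero, m), _compress(one, m)
    if isinstance(zero, list) and isinstance(one, list) and len(zero) + len(one) < m:
        return zero + one  # blocks 422-424: merge into one bucket, release the other
    return (pos, zero, one)


def delete_key(trie, key, m, k):
    """Delete key; return ("type0", keys) if the cell can revert to type 0
    (blocks 426-432), else ("type1", compressed_trie)."""
    trie = _compress(_remove(trie, key), m)
    if isinstance(trie, list) and len(trie) <= k:
        return "type0", trie
    return "type1", trie
```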

The following example illustrates operation of the invention in an exact match TCP/IP flow table look-up application involving 104-bit wide {SIP, DIP, SP, DP, protocol} 5-tuple keys.

Hash table **12** is configured such that each cell has a w_{H}=256 bit storage capacity. This permits storage of two 104-bit keys per cell, together with their 23-bit flow indices, hence k=2. The hash function F(·) is implemented as a so-called “H3” algorithm (see for example Ramakrishna et al., “Efficient Hardware Hashing Functions for High Performance Computers”, IEEE Transactions on Computers, December 1997, Vol. 46, No. 12, pp. 1378–1381). Overflow bucket table **16** is configured such that each bucket has a w_{B}=512 bit storage capacity. This permits storage of four 104-bit keys per bucket, hence m=4. The maximum number of buckets that can be associated with one hash cell is selected to be 8, hence n=8.
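An H3-class hash function computes the index as the XOR of one fixed random row per set key bit, which is cheap to build in hardware. The Python sketch below only illustrates that construction; the seed, the use of Python's `random` module to draw the rows, and the 22-bit index width are assumptions chosen for illustration, and `make_h3` is a hypothetical name.

```python
import random


def make_h3(key_bits, index_bits, seed=0):
    """Return an H3-style hash: XOR of random index_bits-wide rows, one per set key bit."""
    rng = random.Random(seed)
    rows = [rng.getrandbits(index_bits) for _ in range(key_bits)]

    def h3(key):
        index = 0
        for i in range(key_bits):
            if (key >> i) & 1:
                index ^= rows[i]  # include row i only when key bit i is 1
        return index

    return h3


# Hypothetical F(.) for 104-bit 5-tuple keys and a hash table of 2**22 cells.
F = make_h3(key_bits=104, index_bits=22, seed=2003)
```

Because every row is combined by XOR, such a function is linear over GF(2): F(a XOR b) equals F(a) XOR F(b), and F(0) is 0.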

The total number of cells, H, in hash table **12** is selected to be H=T/k=T/2, where T is the maximum number of keys which the application incorporating the invention may store in hash table **12** and overflow bucket table **16** combined. The total number of buckets, B_{max}, in overflow bucket table **16** is selected to be B_{max}=T/m=T/4. Counting one storage location per key slot, this requires total memory M=H*k+B_{max}*m=2T, yielding memory efficiency E=T/M of 50%. Other hash table configurations characterized by different values of H and k, and other overflow bucket table configurations characterized by different values of B_{max} and m, can be selected to trade memory efficiency E against the probability of successful key insertion, as described above.
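The sizing arithmetic can be checked with a short calculation. The sketch counts memory M in key storage locations (H*k cell slots plus B_{max}*m bucket slots), which is the reading under which M=2T, and also reports raw bits for comparison. Deriving k and m from the field widths assumes each key slot needs 104 key bits plus a 23-bit flow index and that 2 bits of every 256-bit word are reserved for the type field, as in the cell formats described next. `size_tables` is a hypothetical name.

```python
KEY_BITS, FLOW_INDEX_BITS, TYPE_BITS = 104, 23, 2
W_H, W_B = 256, 512

# Key slots per 256-bit word: (256 - 2) // (104 + 23) = 2.
SLOTS_PER_WORD = (W_H - TYPE_BITS) // (KEY_BITS + FLOW_INDEX_BITS)
k = SLOTS_PER_WORD                      # keys per hash cell
m = (W_B // W_H) * SLOTS_PER_WORD       # keys per bucket (two 256-bit segments)


def size_tables(T, k=k, m=m):
    """Return (H, B_max, M, E, total_bits) for tables sized to hold T keys."""
    H = T // k                    # hash table cells
    B_max = T // m                # overflow buckets
    M = H * k + B_max * m         # key storage locations
    E = T / M                     # memory efficiency
    total_bits = H * W_H + B_max * W_B
    return H, B_max, M, E, total_bits
```

For T=1024 this gives H=512 cells, B_{max}=256 buckets, M=2048 key locations and E=0.5.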

The w_{H}=256 bit wide type **0** hash table cell is formatted as follows. The leftmost two bits (bit positions **254** and **255**) indicate the cell type, and contain the value “0” if the cell is a type **0** hash table cell. Since, in this example, k=2 and each 5-tuple key is 104 bits wide, the type **0** cell has two 104-bit wide key fields, HK**1** and HK**2**. Key field HK**1** occupies bit positions **150** through **253** inclusive. Key field HK**2** occupies bit positions **23** through **126** inclusive. Two keys, with which key comparator **22** compares the input key as described above, can be stored in key fields HK**1** and HK**2** (i.e., a first key stored in HK**1** and a second key stored in HK**2**).

The type **0** cell also stores two 23-bit wide flow indices, I**1** and I**2**. As previously mentioned, these flow indices correspond only to their respectively corresponding keys, not to the hash index I and not to the aforementioned bucket index. Flow index I**1**, which occupies bit positions **127** through **149** inclusive, contains a value which is dynamically assigned when a key is stored in key field HK**1**. Flow index I**2**, which occupies bit positions **0** through **22** inclusive, contains a value which is dynamically assigned when a key is stored in key field HK**2**. 23 bits suffice to store over 8 million flow indices: since the all-ones value is reserved as described next, T is at most 2^{23}−1=8,388,607 in this example. The I**1**, I**2** fields are initialized by storing the hexadecimal value 7FFFFF (i.e., all 23 bits set to “1”) in them, this value being reserved for I**1**, I**2** fields corresponding to key fields in which valid keys have not yet been stored. Subsequently assigned (i.e. valid) I**1**, I**2** values are used to index into other data structures (not part of this invention and therefore not shown) containing information related to the keys stored in HK**1** and HK**2** respectively.
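The type **0** layout maps directly onto shift-and-mask operations on a 256-bit word. A minimal Python sketch, assuming a cell is held as a single Python integer with bit 0 as the least significant bit; the helper names `get_field`, `set_field`, `empty_type0_cell` and `type0_lookup` are hypothetical.

```python
KEY_BITS, IDX_BITS = 104, 23
INVALID_IDX = (1 << IDX_BITS) - 1  # 0x7FFFFF marks an empty key slot

# (lowest bit position, width) of each field in a type 0 cell.
TYPE0 = {
    "type": (254, 2),
    "HK1": (150, KEY_BITS),
    "I1": (127, IDX_BITS),
    "HK2": (23, KEY_BITS),
    "I2": (0, IDX_BITS),
}


def get_field(word, low, width):
    return (word >> low) & ((1 << width) - 1)


def set_field(word, low, width, value):
    mask = ((1 << width) - 1) << low
    return (word & ~mask) | ((value << low) & mask)


def empty_type0_cell():
    """Type field 0, both flow indices set to the reserved invalid value."""
    cell = set_field(0, *TYPE0["I1"], INVALID_IDX)
    return set_field(cell, *TYPE0["I2"], INVALID_IDX)


def type0_lookup(cell, key):
    """Compare key with HK1 and HK2; return its flow index or None."""
    for key_field, idx_field in (("HK1", "I1"), ("HK2", "I2")):
        idx = get_field(cell, *TYPE0[idx_field])
        if idx != INVALID_IDX and get_field(cell, *TYPE0[key_field]) == key:
            return idx
    return None
```

Checking the flow index before the key field means an empty slot whose key bits happen to be zero never matches an all-zero input key.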

The w_{H}=256 bit wide type **1** hash table cell is formatted as follows. The leftmost two bits (bit positions **254** and **255**) again indicate the cell type, and contain the value “1” if the cell is a type **1** hash table cell. Bit positions **232** through **253** inclusive are unused. The type **1** cell has seven 8-bit fields BSEL**0**, BSEL**1**, BSEL**2**, BSEL**3**, BSEL**4**, BSEL**5**, and BSEL**6**, occupying bit positions **176** through **231** inclusive. As will be explained, in this example, each type **1** cell is associated with a trie having up to n−1=7 nodes. BSEL**0** corresponds to node **0**, BSEL**1** corresponds to node **1**, etc. Each “BSEL” field can store an 8-bit value representative of a bit position within a key. The binary digit in the key's represented bit position determines one of two directions for branching from the corresponding node. Thus, if the value “9” is stored in the BSEL**3** field, then the binary value of the bit occupying the key's 9th bit position determines the branch direction at node **3**. The BSEL fields are initialized by storing the hexadecimal value FF (i.e., all 8 bits set to “1”) in them, this value being reserved for BSEL fields which do not yet contain valid key bit position identifiers.

The type **1** cell also has one 22-bit wide “BID” field for each one of the up to n=8 buckets that can be associated with the cell in this example. Specifically, the BID**0**, BID**1**, BID**2**, BID**3**, BID**4**, BID**5**, BID**6** and BID**7** fields occupy bit positions **0** through **175** inclusive. Each “BID” field stores a bucket index corresponding to the displacement of the associated bucket from the start of overflow bucket table **16**. This facilitates addressing of each bucket. For example, the initial memory address location of bucket **2** is obtained by adding B2OFFSET*w_{B }(since w_{B }is expressed in bits, this gives a bit offset, not a byte offset) to the (known) address of overflow bucket table **16**'s initial memory location, where B2OFFSET is the value stored in the BID**2** field. The BID fields are initialized by storing the hexadecimal value 3FFFFF (i.e., all 22 bits set to “1”) in them, this value being reserved for BID fields which do not yet contain valid offset values.
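The type **1** fields can be handled the same way. The sketch below assumes that BSEL**0** and BID**0** are the most significant fields of their groups (the text gives only the overall bit ranges, not the order within them) and that overflow bucket table **16** is byte-addressed, so a BID value is scaled by w_{B}/8 bytes; under bit addressing the scale would be w_{B}. All names are hypothetical.

```python
BSEL_BITS, BID_BITS, W_B = 8, 22, 512
INVALID_BSEL = (1 << BSEL_BITS) - 1  # 0xFF
INVALID_BID = (1 << BID_BITS) - 1    # 0x3FFFFF


def bsel_low(i):
    """Lowest bit of BSELi; BSEL0..BSEL6 fill bits 176..231 (BSEL0 highest, assumed)."""
    return 176 + (6 - i) * BSEL_BITS


def bid_low(j):
    """Lowest bit of BIDj; BID0..BID7 fill bits 0..175 (BID0 highest, assumed)."""
    return (7 - j) * BID_BITS


def new_type1_cell(b0_offset):
    """Cell as converted from type 0: type field 1, BID0 valid, all else invalid."""
    cell = 1 << 254  # the two-bit type field (bits 254-255) holds the value 1
    for i in range(7):
        cell |= INVALID_BSEL << bsel_low(i)
    for j in range(1, 8):
        cell |= INVALID_BID << bid_low(j)
    return cell | (b0_offset << bid_low(0))


def decode_type1(cell):
    """Return ([BSEL0..BSEL6], [BID0..BID7]) from a type 1 cell."""
    if ((cell >> 254) & 0b11) != 1:
        raise ValueError("not a type 1 cell")
    bsel = [(cell >> bsel_low(i)) & INVALID_BSEL for i in range(7)]
    bid = [(cell >> bid_low(j)) & INVALID_BID for j in range(8)]
    return bsel, bid


def bucket_address(table_base, bid_value, w_b=W_B):
    """Byte address of a bucket: table base plus BID * (w_B / 8) bytes."""
    return table_base + bid_value * (w_b // 8)
```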

Each w_{B}=512-bit wide bucket in overflow bucket table **16** has the same format as two consecutive type **0** hash table cells. Specifically, the leftmost two bits of each of the bucket's two consecutive 256-bit segments (bit positions **510** and **511**; and, bit positions **254** and **255**) are unused. Each 256-bit segment of the bucket has two 104-bit wide key fields, namely BK**1**, BK**2** in the first segment and BK**3**, BK**4** in the second segment. Key field BK**1** occupies bit positions **406** through **509** inclusive. Key field BK**2** occupies bit positions **279** through **382** inclusive. Key field BK**3** occupies bit positions **150** through **253** inclusive. Key field BK**4** occupies bit positions **23** through **126** inclusive. Four keys, with which key comparator **22** compares the input key as described above, can be stored in key fields BK**1**, BK**2**, BK**3** and BK**4** (i.e., a first key in BK**1**, a second key in BK**2**, a third key in BK**3** and a fourth key in BK**4**).

Each 256-bit segment of the bucket also stores two 23-bit wide flow indices, namely BI**1**, BI**2** in the first segment and BI**3**, BI**4** in the second segment. As previously mentioned, these flow indices correspond only to their respectively corresponding keys, not to the hash index I and not to the aforementioned bucket index. Flow index BI**1**, which occupies bit positions **383** through **405** inclusive, contains a value which is dynamically assigned to a key inserted into key field BK**1**. Flow index BI**2**, which occupies bit positions **256** through **278** inclusive, contains a value which is dynamically assigned to a key inserted into key field BK**2**. Flow index BI**3**, which occupies bit positions **127** through **149** inclusive, contains a value which is dynamically assigned to a key inserted into key field BK**3**. Flow index BI**4**, which occupies bit positions **0** through **22** inclusive, contains a value which is dynamically assigned to a key inserted into key field BK**4**. The BI**1**, BI**2**, BI**3**, BI**4** fields are initialized by storing the hexadecimal value 7FFFFF (i.e., all 23 bits set to “1”) in them, this value being reserved for BI**1**, BI**2**, BI**3**, BI**4** fields corresponding to key fields in which valid keys have not yet been stored. Subsequently assigned (i.e. valid) BI**1**, BI**2**, BI**3**, BI**4** values are used to index into other data structures (unrelated to this invention and therefore not shown) containing information related to the keys stored in BK**1**, BK**2**, BK**3**, and BK**4** respectively.
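Because each 256-bit segment of a bucket reuses the type **0** cell layout, the bucket's field ranges can be derived from the cell's by adding 256 for the upper segment; in particular, the 23-bit BI**2** field then occupies bit positions 256 through 278. A short Python check, with hypothetical names, confirms that the eight fields do not overlap and leave exactly the four type-field bits unused:

```python
def segment_fields(base):
    """Inclusive (low, high) bit ranges of one 256-bit segment starting at base."""
    return {
        "key_a": (base + 150, base + 253),
        "idx_a": (base + 127, base + 149),
        "key_b": (base + 23, base + 126),
        "idx_b": (base + 0, base + 22),
    }


upper, lower = segment_fields(256), segment_fields(0)
BUCKET_FIELDS = {
    "BK1": upper["key_a"], "BI1": upper["idx_a"],
    "BK2": upper["key_b"], "BI2": upper["idx_b"],
    "BK3": lower["key_a"], "BI3": lower["idx_a"],
    "BK4": lower["key_b"], "BI4": lower["idx_b"],
}

# Every bit covered by some field, and the bits left over.
used = [b for lo, hi in BUCKET_FIELDS.values() for b in range(lo, hi + 1)]
unused = sorted(set(range(512)) - set(used))
```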

Before continuing to explain the invention in the context of potentially unwieldy 104-bit keys, it is convenient to consider a simplified example assuming a key size of 6 bits, with hash table **12** configured to store two 6-bit keys per cell (k=2) and overflow bucket table **16** configured to store four 6-bit keys per bucket (m=4). Bit positions are numbered **1** (least significant) through **6** (most significant). Consider a type **0** hash table cell **800**A in which two keys K**1**, K**2** having binary values K**1**=101001 and K**2**=010100 are stored with their respective indices I**1**, I**2**. Assume that a third key K**3** having a binary value K**3**=011011 is to be stored and that a hash collision results, in that the selected hashing algorithm produces identical hash indices corresponding to cell **800**A for each of K**1**, K**2** and K**3**. K**3** cannot be stored in cell **800**A because cell **800**A already contains its maximum of k=2 keys.

The next available bucket in overflow bucket table **16** is allocated for use by cell **800**A as bucket “**0**”. Keys K**1**, K**2** are copied from cell **800**A into bucket **0** (with their respective indices I**1**, I**2**), and K**3** is also stored in bucket **0** (with its dynamically assigned index). Cell **800**A is then converted into a type **1** cell **800**B by replacing the value “0” in cell **800**A's type field with the value “1” and by storing the aforementioned “invalid” initialization values (all bits set to “1”) in each of cell **800**B's BSEL**0** through BSEL**6**, and BID**1** through BID**7** fields. Cell **800**B's BID**0** field is initialized by storing therein a bucket index value “B0OFFSET” representative of the address displacement by which bucket **0**'s initial memory address location is offset relative to overflow bucket table **16**'s initial memory address location. This facilitates addressing of bucket **0** and its contents in subsequent reading and writing operations.

Now assume that a fourth key K**4** having a binary value K**4**=010101 is to be stored and that a further hash collision results, in that the selected hashing algorithm produces identical hash indices corresponding to cell **800**B for each of K**1**, K**2**, K**3** and K**4**. Since only cell **800**B's BID**0** field contains a non-invalid value, it is apparent that only one bucket is associated with cell **800**B. The B0OFFSET value is retrieved from cell **800**B's BID**0** field and multiplied by w_{B }as aforesaid. The resultant product is added to the (predefined) address of bucket table **16**'s initial memory address location to obtain the address of bucket **0**'s initial memory address location. Bucket **0**'s contents are then retrieved. Since bucket **0** does not contain its maximum of m=4 keys, K**4** is stored in bucket **0**.

Assume now that a fifth key K**5** having a binary value K**5**=111010 is to be stored and that a further hash collision results, in that the selected hashing algorithm produces identical hash indices corresponding to cell **800**B for each of K**1**, K**2**, K**3**, K**4** and K**5**. Since only cell **800**B's BID**0** field contains a non-invalid value, it is apparent that only one bucket is associated with cell **800**B. The B0OFFSET value is again used to obtain the address of bucket **0**'s initial memory address location. Bucket **0**'s contents are then retrieved. Since bucket **0** contains its maximum of m=4 keys, K**5** cannot be stored in bucket **0**.

The next available bucket in overflow bucket table **16** is allocated for use by cell **800**B as bucket “**4**”; and, buckets **0** and **4** are organized in a trie **802** having a root node (“node **0**”) with buckets **0** and **4** each constituting one leaf of trie **802**. In this example, trie **802** is a balanced trie having a maximum of L=3 levels with 2^{l} buckets on level l. The buckets on level l are designated, from left to right, i*(2^{L}/2^{l}) for i=0, 1, . . . , 2^{l}−1; that is, 0, (2^{L}/2^{l}), 2*(2^{L}/2^{l}), . . . , (2^{l}−1)*(2^{L}/2^{l}). The 2^{1}=2 buckets on level l=1 are thus designated {0, 4}; the 2^{2}=4 buckets on level l=2 are designated {0, 2, 4, 6}; and, the 2^{3}=8 buckets on level l=3 are designated {0, 1, 2, 3, 4, 5, 6, 7}. More generally, the number of trie levels L depends on the number of bits allocated in each type **1** cell for storage of key bit position identifiers (BSELx) and bucket indices (BIDy), and on the number of bits required per BSEL field and per BID field.
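The designation rule can be written as a pair of small functions. In this hypothetical sketch, `level_buckets` lists the designations on a level, and `split_designations` encodes an assumption consistent with the worked example (bucket **0** splitting into **0** and **4** at the root, then into **0** and **2** one level down): when a node on level l is expanded, its 0-branch keeps the parent bucket's designation and its 1-branch adds 2^{L}/2^{l+1}.

```python
def level_buckets(level, L=3):
    """Designations of the 2**level buckets on a trie level: i * 2**L / 2**level."""
    step = 2 ** L // 2 ** level
    return [i * step for i in range(2 ** level)]


def split_designations(bucket, level, L=3):
    """Designations of the two buckets replacing `bucket` when a node on
    `level` is expanded: the 0-branch keeps the designation, the 1-branch
    adds 2**L / 2**(level + 1)."""
    return bucket, bucket + 2 ** L // 2 ** (level + 1)
```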

Cell **800**B's BID**4** field is updated by storing therein a bucket index value “B4OFFSET” representative of the address displacement by which bucket **4**'s initial memory address location is offset relative to overflow bucket table **16**'s initial memory address location. This facilitates addressing of bucket **4** and its contents in subsequent reading and writing operations.

The m=4 keys which fill bucket **0** (i.e. K**1**, K**2**, K**3**, K**4**) and K**5** are redistributed between bucket **0** and bucket **4** by storing m/2 of those keys in one of buckets **0** or **4** and storing the remaining (m/2)+1 keys in the other one of those buckets. This is achieved by examining the m+1 keys bit-by-bit, commencing with the most significant bit, until a bit position is located for which m/2 keys contain one binary digit value and for which the remaining m+1−m/2 keys contain the opposite binary digit value. In this example m=4, so the object is to locate a bit position for which m/2=2 keys contain “0” and for which the remaining m+1−m/2=3 keys contain “1”; or, conversely, a bit position for which 2 keys contain “1” and for which the remaining 3 keys contain “0”. As can be seen, K**1** and K**5** each contain “1” in their most significant bit position; and, K**2**, K**3** and K**4** each contain “0” in their most significant bit position. Accordingly, the most significant bit position (i.e. bit position **6** in this example) satisfies the foregoing objective.

Since K**2**, K**3** and K**4** each contain “0” in bit position **6** they are left in bucket **0**. Since K**1** and K**5** each contain “1” in bit position **6**, K**1** is copied into bucket **4** and then deleted from bucket **0**, and K**5** is stored in bucket **4**. The value “6” is stored in cell **800**B's BSEL**0** field, to denote the fact that bit position **6** must be used to determine how to branch from node **0** to either of buckets **0** or **4**. Specifically, if a key's bit position **6** contains the value “0” then the branch is made from node **0** to bucket **0** (the branch labelled “0”), whereas if the key's bit position **6** contains the value “1” then the branch is made from node **0** to bucket **4** (the branch labelled “1”).

Consider an alternative hypothetical situation in which the most significant bit of K**1**, K**2**, K**3** and K**4** contains “0” and in which the most significant bit of K**5** contains “1”. In this situation, the keys' most significant bit position does not satisfy the foregoing objective (i.e. no 2 keys have “0” in bit position **6** with the remaining 3 keys having “1” in bit position **6**; and, no 2 keys have “1” in bit position **6** with the remaining 3 keys having “0” in bit position **6**). Consequently, the next most significant bit (i.e. bit position **5**) of K**1**, K**2**, K**3**, K**4** and K**5** is examined to determine whether the values in bit position **5** satisfy the foregoing objective. If the values in bit position **5** satisfy the foregoing objective then those keys having the value “0” in bit position **5** remain in bucket **0**; those keys having the value “1” in bit position **5** are copied into bucket **4** and deleted from bucket **0**; K**5** is stored in bucket **0** if it has the value “0” in bit position **5** or stored in bucket **4** if it has the value “1” in bit position **5**; and, the value “5” is stored in cell **800**B's BSEL**0** field, to denote the fact that bit position **5** must be used to determine how to branch from node **0** to either of buckets **0** or **4**.

Reverting to the situation in which K**2**, K**3** and K**4** are stored in bucket **0** and K**1** and K**5** are stored in bucket **4**, assume that a sixth key K**6** having a binary value K**6**=010111 is to be stored and that a further hash collision results, in that the selected hashing algorithm produces identical hash indices corresponding to cell **800**B for each of K**1**, K**2**, K**3**, K**4**, K**5** and K**6**. Since cell **800**B's BID**0** and BID**4** fields contain non-invalid values, it is apparent that two buckets exist on trie **802**'s level l=1, namely bucket **0** and bucket **4**. Since cell **800**B's BSEL**0** field contains the value 6 it is apparent that bit position **6** must be used to determine how to branch from node **0** to either of buckets **0** or **4**. K**6**'s bit position **6** contains the value “0” so, as previously explained, the branch is made from node **0** to bucket **0**. The B0OFFSET bucket index value corresponding to bucket **0** is retrieved from cell **800**B's BID**0** field and used as previously explained to obtain the address of bucket **0**'s initial memory address location. Bucket **0**'s contents are then retrieved. Since bucket **0** does not contain its maximum of m=4 keys, K**6** is stored in bucket **0**.

Now suppose that a seventh key K**7** having a binary value K**7**=010110 is to be stored and that a further hash collision results, in that the selected hashing algorithm produces identical hash indices corresponding to cell **800**B for each of K**1**, K**2**, K**3**, K**4**, K**5**, K**6** and K**7**. As previously explained, the non-invalid values in cell **800**B's BID**0** and BID**4** fields make it apparent that there are two buckets on trie **802**'s level l=1, namely bucket **0** and bucket **4**; and, the value **6** in cell **800**B's BSEL**0** field makes it apparent that bit position **6** determines the branch direction from node **0** to either of buckets **0** or **4**. K**7**'s bit position **6** contains the value “0”, so the branch is made from node **0** to bucket **0**. The B0OFFSET bucket index value is retrieved from cell **800**B's BID**0** field and used to obtain the address of bucket **0**'s initial memory address location as previously explained. Bucket **0**'s contents are then retrieved. Since bucket **0** contains its maximum of m=4 keys, K**7** cannot be stored in bucket **0**. Bucket **4** does not contain its maximum of m=4 keys, but K**7** cannot be stored in bucket **4** because K**7**'s bit position **6** does not contain the value “1”. More particularly, the value “6” in cell **800**B's BSEL**0** field makes it apparent that bucket **4** can contain only keys having the value “1” in bit position **6**.

Trie **802** is accordingly expanded by adding a second node (“node **1**”) thereto. The next available bucket in overflow bucket table **16** is allocated for use by cell **800**B and designated bucket “**2**” as previously explained. Buckets **0** and **2** become the two leaves beneath node **1** of trie **802**. Cell **800**B's BID**2** field is updated by storing therein a value “B2OFFSET” representative of the address displacement by which bucket **2**'s initial memory address location is offset relative to overflow bucket table **16**'s initial memory address location. This facilitates addressing of bucket **2** and its contents in subsequent reading and writing operations. The m=4 keys which fill bucket **0** (i.e. K**2**, K**3**, K**4**, K**6**) and K**7** are redistributed between bucket **0** and bucket **2** by storing m/2 of those keys in one of buckets **0** or **2** and storing the remaining (m/2)+1 keys in the other one of those buckets. As previously explained, this is achieved by examining the m+1 keys bit-by-bit, commencing with the most significant bit, until a bit position is located for which m/2 keys contain one binary digit value and for which the remaining m+1−m/2 keys contain the opposite binary digit value. The most significant bit position which satisfies the foregoing objective for K**2**, K**3**, K**4**, K**6** and K**7** is bit position **2**.

Since K**2** and K**4** each contain “0” in bit position **2** they are left in bucket **0**. Since K**3**, K**6** and K**7** each contain “1” in bit position **2** they are copied into bucket **2**. K**3** and K**6** are then deleted from bucket **0**. The value “2” is stored in cell **800**B's BSEL**1** field, to denote the fact that bit position **2** must be used to determine how to branch from node **1** to either of buckets **0** or **2**. Specifically, if a key's bit position **2** contains the value “0” then the branch is made from node **1** to bucket **0** as indicated by the line labelled “0” beneath node **1**, whereas if the same key's bit position **2** contains the value “1” then the branch is made from node **1** to bucket **2** as indicated by the line labelled “1” beneath node **1**.

Now assume that a look-up operation involving an input key K_{i }having a binary value K_{i}=010111 is to be performed on a hash table and overflow bucket table data structure incorporating cell **800**B and trie **802** in the state just described, and that the selected hashing algorithm produces a hash index corresponding to cell **800**B for K_{i}. The value **6** in cell **800**B's BSEL**0** field makes it apparent that bit position **6** determines the branch direction from node **0**. K_{i}'s bit position **6** contains the value “0”, so the branch is made from node **0** to node **1**. The value **2** in cell **800**B's BSEL**1** field makes it apparent that bit position **2** determines the branch direction from node **1**. K_{i}'s bit position **2** contains the value “1”, so the branch is made from node **1** to bucket **2**. The B2OFFSET value is retrieved from cell **800**B's BID**2** field and used to obtain the address of bucket **2**'s initial memory address location as previously explained. Bucket **2**'s contents are then retrieved and compared with K_{i}, revealing that K_{i}=K**6**.
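The traversal just described can be replayed in a short Python sketch. It assumes a heap-style node numbering in which the 0-branch and 1-branch children of node x are nodes 2x+1 and 2x+2 (consistent with node **1** lying on node **0**'s 0-branch, but not stated in the text), bit positions numbered from 1 at the least significant bit, and placeholder flow index labels; the function names are hypothetical.

```python
INVALID_BSEL = 0xFF
L = 3  # maximum trie levels in this example


def bit_at(key, pos):
    """Binary digit at bit position pos (1 = least significant)."""
    return (key >> (pos - 1)) & 1


def select_bucket(bsel, key):
    """Follow BSEL fields from node 0 and return the reached bucket designation."""
    node, bucket = 0, 0
    for level in range(L):
        if bsel[node] == INVALID_BSEL:
            break  # no branching at this node: the current bucket is the leaf
        branch = bit_at(key, bsel[node])
        bucket += branch * (2 ** L // 2 ** (level + 1))
        node = 2 * node + 1 + branch  # assumed child numbering
    return bucket


def lookup(bsel, buckets, key):
    """Compare key with the keys of exactly one bucket."""
    for stored, flow_index in buckets[select_bucket(bsel, key)]:
        if stored == key:
            return flow_index
    return None


# Cell 800B after K7 is inserted: BSEL0 = 6, BSEL1 = 2, BSEL2..BSEL6 invalid.
BSEL_800B = [6, 2] + [INVALID_BSEL] * 5
BUCKETS_800B = {
    0: [(0b010100, "I2"), (0b010101, "I4")],                    # K2, K4
    2: [(0b011011, "I3"), (0b010111, "I6"), (0b010110, "I7")],  # K3, K6, K7
    4: [(0b101001, "I1"), (0b111010, "I5")],                    # K1, K5
}
```

Only the bucket reached by the walk is searched, so `lookup(BSEL_800B, BUCKETS_800B, 0b010111)` performs a single bucket comparison and finds K6.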

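Note that although trie **802** contains several buckets, K_{i }is compared with the keys stored in only one bucket. This holds regardless of the number of buckets associated with a type **1** cell and regardless of the number of trie levels.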

The selected structure of trie **802** for the foregoing example is a balanced trie having up to L=3 levels with 2^{l} buckets on level l. Although that structure shows bucket **0** on each of the trie's three levels, persons skilled in the art will understand that at any given time only one instance of any particular bucket will be associated with the trie as the trie is expanded or compressed in accordance with well known trie formation algorithms to insert or delete keys.

Look-Up Procedure (Generalized)

A more detailed explanation of the invention's exact match look-up operational sequence is now provided. The hash function F(·) is applied to an input key K (block **900**) to produce hash index I=F(K) (block **902**). If CAM **38** is provided, K is compared (block **904**) to the keys stored in CAM **38** simultaneously with the production of hash index I. If such comparison identifies a key stored in CAM **38** which exactly matches K (block **904**, “yes” output) then the flow index stored with that matching key is retrieved from CAM **38** (block **906**) and the look-up procedure concludes successfully (block **908**).

If K is not stored in CAM **38** (block **904**, “no” output) and if I=F(K) corresponds to a type **0** cell in hash table **12** (block **910**, “type **0**” output) then K is compared (block **912**) to the key(s) stored in that cell. If such comparison identifies a key stored in that cell which exactly matches K (block **914**, “yes” output) then the flow index stored with that matching key is retrieved from that cell (block **916**) and the procedure concludes successfully (block **908**). If such comparison does not identify a key stored in the type **0** cell corresponding to I=F(K) which exactly matches K (block **914**, “no” output), then the procedure concludes (block **918**) by indicating that no key matching the input key K is stored in hash table **12**, overflow bucket table **16** or CAM **38**.
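The CAM and type 0 branches of the procedure can be sketched in Python as below. The dictionaries standing in for CAM 38 and hash table 12, the `Type0Cell` class, and the toy hash function used in the test are all hypothetical. In hardware the CAM compare runs concurrently with hashing; here it simply runs first, which gives the same result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Type0Cell:
    keys: List[int] = field(default_factory=list)          # up to k stored keys
    flow_indices: List[int] = field(default_factory=list)  # one flow index per key


def lookup_type0(key: int, hash_fn: Callable[[int], int],
                 cam: Dict[int, int],
                 hash_table: Dict[int, Type0Cell]) -> Optional[int]:
    index = hash_fn(key)                # block 902: I = F(K)
    if key in cam:                      # block 904 (concurrent with hashing in hardware)
        return cam[key]                 # blocks 906/908
    cell = hash_table.get(index)        # assumes I selects a type 0 cell (block 910)
    if cell is None:
        return None
    for stored, flow in zip(cell.keys, cell.flow_indices):  # block 912
        if stored == key:               # block 914
            return flow                 # blocks 916/908
    return None                         # block 918: no match anywhere
```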

If K is not stored in CAM **38** (block **904**, “no” output) and if I=F(K) corresponds to a type **1** cell in hash table **12** (block **910**, “type **1**” output), then a “current node” counter x and a “current bucket” counter are each initialized to zero (block **920**). The value in the BSELx field (i.e. BSEL**0** if x=0) of the type **1** cell corresponding to I=F(K) is then retrieved (block **922**).

If the retrieved BSELx value is valid (block **924**, “no” output), the value of K's bit at the position given by that BSELx value is used to branch (block **926**) in one of two possible directions from the current node (i.e. node **0** if the current node counter still has its initial value of x=0). The current node counter x and the current bucket counter are then updated (block **928**) in a manner dependent upon the selected trie structure. The foregoing operations are then repeated, commencing with the block **922** operation, in which the BSELx field corresponding to the updated x value is retrieved.

If the retrieved BSELx value is invalid (block **924**, “yes” output) or if a trie leaf node is reached (BSEL fields are not provided for leaf nodes since there can be no branching from a leaf node) then K is compared (block **930**) to the key(s) stored in the current bucket (i.e. bucket **0** if the current bucket counter still has its initial value). If such comparison identifies a key stored in the current bucket which exactly matches K (block **932**, “yes” output) then the flow index stored with that matching key is retrieved from the current bucket (block **934**) and the procedure concludes successfully (block **936**). If such comparison does not identify a key stored in the current bucket which exactly matches K (block **932**, “no” output), then the procedure concludes (block **938**) by indicating that no key matching the input key K is stored in hash table **12**, overflow bucket table **16** or CAM **38**.
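The type 1 branch can be sketched in Python as follows. The text says only that the node and bucket counters are updated "in a manner dependent upon the selected trie structure", so the update rules here are assumptions: nodes are numbered heap-style (node x branches to node 2x+1 on a 0 bit and 2x+2 on a 1 bit) and the bucket counter gains `bit << depth` at each step. These rules were picked because they reproduce the worked example (node 0 to node 1 on a 0, node 1 to bucket 2 on a 1). `Type1Cell`, `select_bucket`, `lookup_type1`, and the bucket layout are hypothetical names and structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Bucket = List[Tuple[int, int]]  # hypothetical bucket layout: (key, flow index) pairs


@dataclass
class Type1Cell:
    bsel: List[Optional[int]]  # BSEL0, BSEL1, ...; None marks an invalid field
    bid: Dict[int, int]        # bucket number -> offset into the overflow table


def key_bit(key: int, position: int) -> int:
    # 1-based bit position counted from the least significant bit
    return (key >> (position - 1)) & 1


def select_bucket(key: int, cell: Type1Cell) -> int:
    x, bucket, depth = 0, 0, 0                      # block 920
    # Stop at an invalid BSEL field (block 924) or past the last node (a leaf).
    while x < len(cell.bsel) and cell.bsel[x] is not None:
        bit = key_bit(key, cell.bsel[x])            # blocks 922/926
        bucket |= bit << depth                      # block 928: bucket counter
        x = 2 * x + 1 + bit                         # block 928: node counter
        depth += 1
    return bucket


def lookup_type1(key: int, cell: Type1Cell,
                 overflow_table: Dict[int, Bucket]) -> Optional[int]:
    offset = cell.bid.get(select_bucket(key, cell))
    if offset is None:
        return None
    for stored, flow in overflow_table[offset]:     # block 930: one bucket read
        if stored == key:                           # block 932
            return flow                             # blocks 934/936
    return None                                     # block 938
```

With `bsel=[6, 2, None]`, matching cell 800B's BSEL0 and BSEL1 values, the key 0b010111 selects bucket 2, as in the worked example.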

The invention is much less computationally intensive than would be the case if all colliding keys were redistributed each time a collision is detected. For example, as explained above with reference to **0** and **2** even though seven keys collided on hash cell **800**B. Redistribution of all colliding keys each time a collision is detected would optimize usage of buckets, but requires considerable memory bandwidth and is difficult to implement in hardware. Although the invention provides less optimal bucket usage, it is possible to minimize situations in which buckets are unavailable for key storage, as previously mentioned. Besides reducing computational intensity, the invention can be implemented in hardware and does not require much memory bandwidth.

Look-Up Performance

200 MHz double data rate, fast-cycle random access memory (DDR FCRAM) integrated circuit devices available from Toshiba America Electronic Components, Inc., Irvine, Calif. under model no. TC59LM814/06CFT, arranged to form a 64-bit data interface, can be used to store hash table **12** and overflow bucket table **16**. Since these are double data rate devices, a 64-bit wide FCRAM data interface needs only one burst of four transfers, consuming two memory device clock cycles, to read or write a hash table cell, assuming each hash table cell is w_{H}=256 bits wide. Similarly, assuming each bucket is w_{B}=512 bits wide, two four-transfer bursts, each requiring two memory device clock cycles, can be used to read or write the contents of one 512-bit bucket. However, FCRAM device timing constraints permit access to a bank of such devices only once every five memory device clock cycles, so the best per look-up timing is 5 clock cycles, or 25 ns at 200 MHz.
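The burst and timing arithmetic in this paragraph can be checked in a few lines of Python. The constant names are illustrative only; the values are those given in the text.

```python
CLOCK_MHZ = 200          # FCRAM device clock
BUS_BITS = 64            # two devices side by side give a 64-bit interface
BURST_LEN = 4            # transfers per burst
W_H, W_B = 256, 512      # hash cell and bucket widths in bits
BANK_GAP_CYCLES = 5      # minimum cycles between accesses to one bank

cycle_ns = 1000 / CLOCK_MHZ                    # 5.0 ns per clock
bits_per_burst = BUS_BITS * BURST_LEN          # 256 bits per burst
cycles_per_burst = BURST_LEN // 2              # DDR: two transfers per clock
bursts_per_cell = W_H // bits_per_burst        # 1 burst for a hash cell
bursts_per_bucket = W_B // bits_per_burst      # 2 bursts for a bucket
best_lookup_ns = BANK_GAP_CYCLES * cycle_ns    # 25.0 ns

print(cycle_ns, bits_per_burst, cycles_per_burst,
      bursts_per_cell, bursts_per_bucket, best_lookup_ns)
```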

Worst-case look-up timing (i.e. reading both hash table **12** and overflow bucket table **16**) requires 6 clock cycles, so the worst-case look-up time is 30 ns, equivalent to a look-up rate of about 33M look-ups per second. By comparison, performing one look-up per packet at the OC-192 line rate of 10 Gbps requires a look-up rate of 25M look-ups per second, which is readily supported, with excess bandwidth remaining for dynamically updating the look-up table and refreshing memory. These performance estimates assume that hash table **12** and overflow bucket table **16** are stored in different banks of FCRAM devices, and that each bucket entry is split across two separate banks, the first bank containing the entry's first 256 bits and the second bank containing its remaining 256 bits. The aforementioned FCRAM integrated circuit devices each have four banks. An FCRAM integrated circuit device may have a 4-, 8-, 16- or 32-bit data interface; no FCRAM device having a 64-bit data interface is currently available, so the 64-bit interface is obtained by using two FCRAM devices. Since FCRAM devices do not support 8-transfer bursts, two 4-transfer bursts are required to read or write a bucket, as indicated above.
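The worst-case rate and its margin over the OC-192 requirement work out as follows. The text does not say what packet size underlies the 25M look-ups per second figure; dividing 10 Gbps by 25M look-ups per second gives 400 bits (50 bytes) per look-up, which is a derived number rather than one from the source.

```python
CLOCK_MHZ = 200
WORST_CASE_CYCLES = 6            # hash cell read plus bucket read
REQUIRED_RATE = 25e6             # look-ups/s at OC-192, as stated in the text
LINE_RATE_BPS = 10e9             # OC-192 line rate

cycle_ns = 1000 / CLOCK_MHZ
worst_case_ns = WORST_CASE_CYCLES * cycle_ns      # 30.0 ns
max_rate = 1e9 / worst_case_ns                    # about 33.3M look-ups/s
headroom = max_rate / REQUIRED_RATE               # about 1.33x
bits_per_lookup = LINE_RATE_BPS / REQUIRED_RATE   # 400 bits per look-up

print(f"{worst_case_ns} ns, {max_rate / 1e6:.1f}M/s, headroom {headroom:.2f}x, "
      f"{bits_per_lookup:.0f} bits per look-up")
```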

As will be apparent to those skilled in the art in the light of the foregoing disclosure, many alterations and modifications are possible in the practice of this invention without departing from the spirit or scope thereof. For example, the invention can be hardware implemented using field programmable gate array (FPGA) or application-specific integrated circuit (ASIC) technology. Instead of storing hash table **12** and overflow bucket table **16** in DDR FCRAM devices, one may store them in single data rate synchronous dynamic random access memory devices (SDRAM), reduced latency dynamic random access memory devices (RLDRAM), quadruple data rate static random access memory devices (QDR SRAM), etc. as appropriate to meet specific application requirements. Instead of using an H3 type hash function, other hash functions such as a cyclic redundancy check (CRC) type hash function can be used.
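The text names an H3 type hash function without defining it. In the H3 class of Carter and Wegman (cited among the non-patent references), each set bit of the key selects one row of a random bit matrix Q, and the hash value is the XOR of the selected rows, which makes it cheap to build from XOR gates. A minimal software sketch, in which `make_h3` and its parameters are hypothetical and Python's `random` module stands in for however Q would actually be chosen:

```python
import random


def make_h3(key_bits: int, index_bits: int, seed: int = 0):
    """Return an H3-style hash mapping key_bits-bit keys to index_bits-bit indices.
    Key bits above key_bits are ignored."""
    rng = random.Random(seed)
    # Matrix Q: one random index_bits-wide row per key bit.
    rows = [rng.getrandbits(index_bits) for _ in range(key_bits)]

    def h3(key: int) -> int:
        result = 0
        for i in range(key_bits):
            if (key >> i) & 1:       # key bit i selects row i of Q
                result ^= rows[i]
        return result

    return h3


h = make_h3(key_bits=32, index_bits=16, seed=1)
print(h(0x1234), h(0xF0F0))
```

Because H3 is linear over XOR, h(a XOR b) equals h(a) XOR h(b), and the all-zero key always hashes to 0.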

As another example, one may introduce a third type of hash cell capable of storing some (perhaps fewer than k) keys and also capable of storing enough bucket offset and key bit position selector information to accommodate a few buckets organized in a small trie. If the trie grows beyond a certain size, the third cell type can no longer hold sufficient bucket offset and key bit position selector information, at which point it can be converted to a type **1** cell. This alternate scheme uses buckets (and thus memory) more efficiently.

The dynamic table update procedure (i.e. key insertion and deletion) can be implemented in either software or hardware. The update procedure can be optimized as required to satisfy various speed and memory utilization trade-offs.

Type **1** hash cells can use different bit structures to accommodate different B(·) bucket selection functions using different trie structures such as skewed or multi-bit trie structures, and to accommodate additional buckets. For example, in a multi-bit trie two key bits may be required to select a branch direction from node **0** in order to reach one of the four second level nodes, whereas only one bit is required to select a branch direction from one of the second level nodes in order to reach one of the buckets. To accommodate such a trie, two BSEL fields in the type **1** hash cell are associated with node **0** (i.e. one BSEL field for each one of two key bit positions) and one BSEL field is associated with each of the remaining nodes. … 2^{n} buckets in a binary trie of this sort.
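Read as a trie whose root examines two key bits to pick one of four second-level nodes, each of which examines one more bit, the structure has eight buckets addressed with six BSEL fields (two for node 0 plus one for each of the four second-level nodes). The Python sketch below assumes that reading and a hypothetical bucket numbering (the two root bits followed by the leaf bit); `multibit_select` and the BSEL values in the test are illustrative only.

```python
from typing import List


def key_bit(key: int, position: int) -> int:
    # 1-based bit position counted from the least significant bit
    return (key >> (position - 1)) & 1


def multibit_select(key: int, bsel: List[int]) -> int:
    """bsel[0] and bsel[1]: the two key bit positions examined at node 0.
    bsel[2 + n]: the single key bit position examined at second-level node n."""
    if len(bsel) != 6:
        raise ValueError("expected 2 BSEL fields for node 0 plus 1 for each of 4 nodes")
    node = (key_bit(key, bsel[0]) << 1) | key_bit(key, bsel[1])  # 2 bits -> node 0..3
    bit = key_bit(key, bsel[2 + node])                            # 1 bit -> leaf
    return (node << 1) | bit                                      # bucket 0..7


bsel = [8, 7, 1, 2, 3, 4]  # hypothetical key bit positions
print(multibit_select(0b11000000, bsel), multibit_select(0b10000100, bsel))
```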

Patent Citations

Cited Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|

US4866634 * | Aug 10, 1987 | Sep 12, 1989 | Syntelligence | Data-driven, functional expert system shell |

US5089952 * | Oct 7, 1988 | Feb 18, 1992 | International Business Machines Corporation | Method for allowing weak searchers to access pointer-connected data structures without locking |

US5404488 * | Oct 1, 1993 | Apr 4, 1995 | Lotus Development Corporation | Realtime data feed engine for updating an application with the most currently received data from multiple data feeds |

US6034958 | Jul 11, 1997 | Mar 7, 2000 | Telefonaktiebolaget Lm Ericsson | VP/VC lookup function |

US6097725 | Apr 16, 1998 | Aug 1, 2000 | International Business Machines Corporation | Low cost searching method and apparatus for asynchronous transfer mode systems |

US6226710 | Nov 14, 1997 | May 1, 2001 | Utmc Microelectronic Systems Inc. | Content addressable memory (CAM) engine |

US6597661 * | Aug 25, 1999 | Jul 22, 2003 | Watchguard Technologies, Inc. | Network packet classification |

US6598051 * | Sep 19, 2000 | Jul 22, 2003 | Altavista Company | Web page connectivity server |

US6690667 * | Nov 30, 1999 | Feb 10, 2004 | Intel Corporation | Switch with adaptive address lookup hashing scheme |

US6701317 * | Sep 19, 2000 | Mar 2, 2004 | Overture Services, Inc. | Web page connectivity server construction |

US6754662 * | Dec 20, 2000 | Jun 22, 2004 | Nortel Networks Limited | Method and apparatus for fast and consistent packet classification via efficient hash-caching |

US6789156 * | Jul 25, 2001 | Sep 7, 2004 | Vmware, Inc. | Content-based, transparent sharing of memory units |

Non-Patent Citations

No. | Reference |
---|---|

1 | D. E. Knuth, The Art of Computer Programming, vol. 3, Sorting and Searching, §6.3, "Digital Searching," Addison-Wesley, 1973, pp. 481-491. |

2 | J. Carter and M. Wegman, "Universal Classes of Hash Functions," Journal of Computer and System Sciences, vol. 18, No. 2, pp. 143-154, 1979. |

3 | M. V. Ramakrishna, E. Fu, E. Bahcekapili, "Efficient Hardware Hashing Functions for High Performance Computers," IEEE Transactions on Computers, vol. 46, No. 12, Dec. 1997, pp. 1378-1381. |

4 * | Mishra, Priti et al., "Join Processing in Relational Databases," ACM Computing Surveys, vol. 24, No. 1, Mar. 1992. |

5 | Mukesh Singhal et al., "A Novel Cache Architecture to Support Layer Four Packet Classification at Memory Access Speeds," IEEE Infocom 2000, pp. 1445-1454. |

6 | TC59LM814/06CFT-50, Toshiba 200 MHz FCRAM datasheet, Nov. 30, 2001. |

Referenced by

Citing Patent | Filing date | Publication date | Applicant | Title |
---|---|---|---|---|

US7308526 * | Jun 2, 2004 | Dec 11, 2007 | Intel Corporation | Memory controller module having independent memory controllers for different memory types |

US7325013 * | Apr 22, 2004 | Jan 29, 2008 | Id3Man, Inc. | Database with efficient fuzzy matching |

US7382787 | Jun 20, 2002 | Jun 3, 2008 | Cisco Technology, Inc. | Packet routing and switching device |

US7418536 | Jan 4, 2006 | Aug 26, 2008 | Cisco Technology, Inc. | Processor having systolic array pipeline for processing data packets |

US7450438 | Apr 17, 2003 | Nov 11, 2008 | Cisco Technology, Inc. | Crossbar apparatus for a forwarding table memory in a router |

US7469241 * | Jun 17, 2005 | Dec 23, 2008 | Oracle International Corporation | Efficient data aggregation operations using hash tables |

US7469317 * | Sep 12, 2006 | Dec 23, 2008 | Alcatel Lucent | Method and system for character string searching |

US7512684 | Sep 30, 2004 | Mar 31, 2009 | Intel Corporation | Flow based packet processing |

US7515588 * | Mar 18, 2004 | Apr 7, 2009 | Intel Corporation | Method and apparatus to support a large internet protocol forwarding information base |

US7525904 | Apr 14, 2003 | Apr 28, 2009 | Cisco Technology, Inc. | Redundant packet routing and switching device and method |

US7536476 * | Dec 22, 2003 | May 19, 2009 | Cisco Technology, Inc. | Method for performing tree based ACL lookups |

US7620046 | Sep 30, 2004 | Nov 17, 2009 | Intel Corporation | Dynamically assigning packet flows |

US7680806 * | May 17, 2005 | Mar 16, 2010 | Cisco Technology, Inc. | Reducing overflow of hash table entries |

US7710991 | Apr 14, 2003 | May 4, 2010 | Cisco Technology, Inc. | Scalable packet routing and switching device and method |

US7730055 | Jun 23, 2008 | Jun 1, 2010 | Oracle International Corporation | Efficient hash based full-outer join |

US7769708 * | Aug 23, 2007 | Aug 3, 2010 | Auditude.Com, Inc. | Efficient fuzzy matching of a test item to items in a database |

US7796515 * | Apr 28, 2004 | Sep 14, 2010 | Hewlett-Packard Development Company, L.P. | Propagation of viruses through an information technology network |

US7797152 * | Feb 17, 2006 | Sep 14, 2010 | The United States Of America As Represented By The Director, National Security Agency | Method of database searching |

US7889712 | Dec 23, 2005 | Feb 15, 2011 | Cisco Technology, Inc. | Methods and apparatus for providing loop free routing tables |

US7944828 | Oct 1, 2009 | May 17, 2011 | Intel Corporation | Dynamically assigning packet flows |

US8010401 * | Jan 30, 2007 | Aug 30, 2011 | Intuit Inc. | Method and system for market research |

US8014282 | Jun 26, 2008 | Sep 6, 2011 | Intel Corporation | Hashing packet contents to determine a processor |

US8165302 * | Feb 12, 2007 | Apr 24, 2012 | Sony Corporation | Key table and authorization table management |

US8199644 | May 5, 2010 | Jun 12, 2012 | Lsi Corporation | Systems and methods for processing access control lists (ACLS) in network switches using regular expression matching logic |

US8266116 * | Aug 28, 2007 | Sep 11, 2012 | Broadcom Corporation | Method and apparatus for dual-hashing tables |

US8270399 | Oct 29, 2008 | Sep 18, 2012 | Cisco Technology, Inc. | Crossbar apparatus for a forwarding table memory in a router |

US8270401 | Apr 3, 2003 | Sep 18, 2012 | Cisco Technology, Inc. | Packet routing and switching device |

US8271635 | Jun 17, 2009 | Sep 18, 2012 | Microsoft Corporation | Multi-tier, multi-state lookup |

US8429143 * | Apr 25, 2008 | Apr 23, 2013 | International Business Machines Corporation | Methods and systems for improving hash table performance |

US8547837 | Apr 5, 2011 | Oct 1, 2013 | Intel Corporation | Dynamically assigning packet flows |

US8599853 | Apr 16, 2010 | Dec 3, 2013 | Wipro Limited | System and method for an exact match search using pointer based pipelined multibit trie traversal technique |

US8661160 | Aug 30, 2006 | Feb 25, 2014 | Intel Corporation | Bidirectional receive side scaling |

US8682940 * | Jul 2, 2010 | Mar 25, 2014 | At&T Intellectual Property I, L. P. | Operating a network using relational database methodology |

US8793257 * | May 13, 2010 | Jul 29, 2014 | Roger Frederick Osmond | Method for improving the effectiveness of hash-based data structures |

US8868926 * | Apr 6, 2012 | Oct 21, 2014 | Exablox Corporation | Cryptographic hash database |

US8909781 | Jul 26, 2012 | Dec 9, 2014 | Pi-Coral, Inc. | Virtual access to network services |

US8954411 * | Dec 21, 2011 | Feb 10, 2015 | Ebay Inc. | Method and system to facilitate a search of an information resource |

US8954550 * | Feb 13, 2008 | Feb 10, 2015 | Microsoft Corporation | Service dependency discovery in enterprise networks |

US9015198 | May 24, 2010 | Apr 21, 2015 | Pi-Coral, Inc. | Method and apparatus for large scale data storage |

US9047417 | Oct 29, 2012 | Jun 2, 2015 | Intel Corporation | NUMA aware network interface |

US9094237 | Aug 31, 2012 | Jul 28, 2015 | Cisco Technology, Inc. | Packet routing and switching device |

US9106584 | Sep 26, 2011 | Aug 11, 2015 | At&T Intellectual Property I, L.P. | Cloud infrastructure services |

US20040218615 * | Apr 28, 2004 | Nov 4, 2004 | Hewlett-Packard Development Company, L.P. | Propagation of viruses through an information technology network |

US20050187898 * | Aug 2, 2004 | Aug 25, 2005 | Nec Laboratories America, Inc. | Data Lookup architecture |

US20050207409 * | Mar 18, 2004 | Sep 22, 2005 | Naik Uday R | Method and apparatus to support a large internet protocol forwarding information base |

US20050234901 * | Apr 22, 2004 | Oct 20, 2005 | Caruso Jeffrey L | Database with efficient fuzzy matching |

US20050273564 * | Jun 2, 2004 | Dec 8, 2005 | Sridhar Lakshmanamurthy | Memory controller |

US20060067228 * | Sep 30, 2004 | Mar 30, 2006 | John Ronciak | Flow based packet processing |

US20060067349 * | Sep 30, 2004 | Mar 30, 2006 | John Ronciak | Dynamically assigning packet flows |

US20060075142 * | Sep 29, 2004 | Apr 6, 2006 | Linden Cornett | Storing packet headers |

US20060116989 * | Jun 17, 2005 | Jun 1, 2006 | Srikanth Bellamkonda | Efficient data aggregation operations using hash tables |

US20060117126 * | Jan 4, 2006 | Jun 1, 2006 | Cisco Technology, Inc. | Processing unit for efficiently determining a packet's destination in a packet-switched network |

US20060126640 * | Dec 14, 2004 | Jun 15, 2006 | Sood Sanjeev H | High performance Transmission Control Protocol (TCP) SYN queue implementation |

US20060265370 * | May 17, 2005 | Nov 23, 2006 | Cisco Technology, Inc. (A California Corporation) | Method and apparatus for reducing overflow of hash table entries |

US20070083914 * | Jul 26, 2006 | Apr 12, 2007 | Jonathan Griffin | Propagation of malicious code through an information technology network |

US20090204696 * | Feb 13, 2008 | Aug 13, 2009 | Ming Zhang | Service dependency discovery in enterprise networks |

US20100299333 * | May 13, 2010 | Nov 25, 2010 | Roger Frederick Osmond | Method for improving the effectiveness of hash-based data structures |

US20120005243 * | Jul 2, 2010 | Jan 5, 2012 | At&T Intellectual Property I, Lp | Operating a Network Using Relational Database Methodology |

US20120095975 * | | Apr 19, 2012 | Ebay Inc. | Method and system to facilitate a search of an information resource |

US20130268770 * | Apr 6, 2012 | Oct 10, 2013 | Tad Hunt | Cryptographic hash database |

EP2834943A4 * | Apr 8, 2013 | Sep 23, 2015 | Exablox Corp | Cryptographic hash database |

WO2015093870A1 * | Dec 18, 2014 | Jun 25, 2015 | Samsung Electronics Co., Ltd. | Method and device for managing data |

Classifications

U.S. Classification | 1/1, 707/E17.012, 707/999.101, 707/999.102, 707/999.103, 707/999.104, 707/999.01 |

International Classification | G06F17/30 |

Cooperative Classification | Y10S707/99945, Y10S707/99944, Y10S707/99942, Y10S707/99943, G06F17/30961 |

European Classification | G06F17/30Z1T |

Legal Events

Date | Code | Event | Description |
---|---|---|---|

Jan 28, 2003 | AS | Assignment | Owner name: PMC-SIERRA, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LYER, SUNDAR;REEL/FRAME:013717/0274 Effective date: 20021105 Owner name: PMC-SIERRA, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KOMPELLA, RAMANA;REEL/FRAME:013722/0665 Effective date: 20030117 Owner name: PMC-SIERRA, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JOSHI, DEEPALI;SHELAT, AJIT;PHANSALKAR, AMIT;REEL/FRAME:013719/0823 Effective date: 20021017 Owner name: PMC-SIERRA, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VARGHESE, GEORGE;REEL/FRAME:013717/0260 Effective date: 20021121 |

Oct 21, 2009 | FPAY | Fee payment | Year of fee payment: 4 |

Aug 6, 2013 | AS | Assignment | Owner name: BANK OF AMERICA, N.A., NORTH CAROLINA Free format text: SECURITY INTEREST IN PATENTS;ASSIGNORS:PMC-SIERRA, INC.;PMC-SIERRA US, INC.;WINTEGRA, INC.;REEL/FRAME:030947/0710 Effective date: 20130802 |

Oct 16, 2013 | FPAY | Fee payment | Year of fee payment: 8 |
