|Publication number||US20090323535 A1|
|Application number||US 12/555,801|
|Publication date||Dec 31, 2009|
|Priority date||Jun 27, 2003|
|Also published as||US7606161, US20040264380|
|Inventors||Mohan Kalkunte, Srinivas Sampath, Karagada Ramarao Kishore|
|Original Assignee||Broadcom Corporation|
This application is a continuation of, and claims priority under 35 U.S.C. §120 to, U.S. patent application Ser. No. 10/825,656, filed on Apr. 16, 2004 and titled Distributing Information Across Equal-Cost Paths in a Network, now U.S. Pat. No. ______, which itself claims priority from U.S. Provisional Patent Application Ser. No. 60/483,026, entitled “ECMP IN XGS” and filed on Jun. 27, 2003, and U.S. Provisional Patent Application Ser. No. 60/592,617, entitled “Distributing Information Across Equal-Cost Paths in a Network” and filed on Dec. 16, 2003. The contents of all of the above-referenced applications are hereby incorporated in their entirety by reference.
1. Field of the Invention
Certain embodiments of the present invention are directed to methods of distributing datagrams across a network. Certain other embodiments of the present invention are directed to devices for implementing such methods.
2. Description of the Related Art
Telecommunications systems typically distribute information, usually in the form of datagrams such as, but not limited to, data cells and packets, over networks that include network devices such as, but not limited to, hosts, servers, modules, nodes, and distribution devices such as, but not limited to, switches and routers. A small portion of a representative network is illustrated in
A second port 150, also among the plurality of ports, is one possible egress for the data that came into the router 100 as part of the first packet P1. In a Layer-3 switching environment, upon egress from the router 100, the source address (SA) of the packet is changed to the router MAC address of the router 100. The destination address (DA) of the packet is changed to the next-hop address (NHA) or, in the example illustrated in
A third port 160 is another potential egress port for the data that entered the router 100 through the first port 140. If the data egresses through the third port 160, it does so as a third datagram, illustrated as a third packet P3 in
Although many other factors frequently come into play, according to a simplified model, calculating the “cost” of a path in a network involves counting the number of “hops” that a datagram or packet has to make between a source and a destination. For example, in
According to this simplified model, since the router 100 relies exclusively upon the number of hops between it and the datagram or packet destination to determine the cost of a path, the router 100 can make no cost-based distinction between the path through the first host 110 and the path through the second host 120. Hence, in order to determine whether to forward the data in the first packet P1 as the second packet P2 or the third packet P3, the router 100 often makes use of an equal-cost multi-path (ECMP) algorithm, which is well known in the related art.
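The hop-count tie described above can be illustrated with a minimal sketch. The network is assumed to be given as an adjacency list, and the node names below are hypothetical stand-ins for the router and hosts in the figure.

```python
from collections import deque

def hop_count(adj, src, dst):
    """Breadth-first search: under the simplified model, the cost of a
    path is simply the number of hops from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None  # destination unreachable

# The router can reach the destination via either host:
adj = {
    "router100": ["host110", "host120"],
    "host110": ["dest"],
    "host120": ["dest"],
}
# Both paths cost 2 hops, so hop count alone cannot choose between them,
# which is exactly the situation the ECMP algorithm is meant to resolve.
```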
Unfortunately, although ECMP algorithms according to the related art are often useful for distributing traffic evenly over a set of equal-cost paths, ECMP algorithms according to the related art fail to account for general network traffic that also commonly flows through the various network devices along the equal-cost paths. Hence, in the partial network illustrated in
In addition to their general inability to account for general network traffic, ECMP algorithms according to the related art also typically require a considerable amount of distribution device resources. These resources take the form of time allocated to executing the algorithms and of hardware that is used and re-used while they run.
At least in view of the above, there is a need for methods that are capable of distributing datagram, data, and/or packet traffic across equal-cost paths in a network in a manner that reduces the possibility that certain network devices along the equal-cost paths will be overly burdened. There is also a need for devices capable of implementing such methods.
In addition, at least in view of the above, there is also a need for methods that reduce the amount of time spent by the distribution device in performing lookup algorithms and/or that reduce the amount of hardware that is used and re-used while performing such algorithms. Further, there is a need for devices capable of implementing such methods.
In order to address and/or overcome at least some of the above-discussed shortcomings of the related art, new devices and methods are provided. Some of these methods and devices are summarized below.
According to certain embodiments of the present invention, a first method of distributing data across a network is provided. According to this first method, a step of providing a distribution device configured to distribute packets of data across a set of equal-cost paths in a network is typically provided. According to this first method, distribution of the packets across the paths is usually based on at least one attribute of each of the packets.
According to certain other embodiments of the present invention, a second method of distributing data across the network is provided. According to this second method, a distribution device is normally provided, and this distribution device is generally configured to distribute a set of packets of data across a set of equal-cost paths in the network. According to this second method, each packet in the set of packets is typically distributed across the set of equal-cost paths according to a weighted distribution.
According to yet other embodiments of the present invention, a first data packet distribution device is provided. This distribution device typically includes a set of ports and a first distribution unit. The first distribution unit often includes a device logic. Usually, the first distribution unit is configured to use the device logic to distribute a packet of data entering the device through a first port among the set of ports to a second port among the set of ports. Normally, the device logic includes a first lookup unit that itself generally includes an acknowledgement unit for acknowledging whether multiple equal-cost paths exist, a first referencing unit for referencing a second lookup unit when multiple equal-cost paths do exist, and a second referencing unit for referencing a third lookup unit otherwise. Typically, the device logic also includes the second lookup unit that itself includes a second distribution unit for distributing the packet across the set of ports and a third referencing unit for referencing the third lookup unit. Also, the device logic commonly includes the third lookup unit, that itself usually includes a selection unit for selecting the second port.
In addition, certain embodiments of the present invention include a second device for distributing Internet Protocol (IP) packets across a network. The device generally includes a set of interface means for interfacing the device with the network. The device also routinely includes distribution means for distributing a set of IP packets entering the device through a first interface means in the set of interface means such that packets in the set of IP packets are distributed across all of the interface means in the set of interface means that are operably connected to equal-cost paths according to a weighted distribution.
For proper understanding of certain embodiments of the invention, reference should be made to the accompanying drawings, wherein:
As telecommunications networks grow and increase in speed, more and more information and/or data is distributed over the networks, often in the form of datagrams. Hence, it becomes more and more desirable to enhance network efficiency at every level. Below are described several representative methods and devices for enhancing network efficiency by more efficiently distributing datagrams such as, but not limited to, Internet Protocol (IP) packets, over a network.
According to certain embodiments of the present invention, distribution and/or routing methods for distributing datagrams, often in the form of IP packets, across a network are provided. These methods typically help alleviate traffic congestion across network devices and/or entire portions of the network. The network devices may include, for example, nodes, modules, routers, distribution devices, and links.
In order to illustrate how some of these methods operate, the representative packet 200 and the representative set of tables 210, 220, 230 illustrated in
As mentioned above, according to certain embodiments of the present invention, methods of distributing data across a network are provided. According to some of these methods, such as the first representative method 400 illustrated in
According to certain embodiments of the present invention, as shown in the second step 420 of the first representative method 400, the datagrams are distributed, over time, across each of the available equal-cost paths. In this second step the distribution is made as a function of at least one attribute of each of the datagrams.
According to a second representative method 500, the steps of which are illustrated in
Another representative embodiment of the present invention that illustrates how the packet attribute is used to choose from among a set of equal-cost paths is discussed below, and is explained with reference to the exemplary packet 200 and tables 210, 220, 230 illustrated in
The distribution device typically includes, or is operably connected to, a first distribution unit 310 that itself typically includes a device logic 320 and/or memory. Usually, the first distribution unit 310 is configured to make use of the device logic 320 and/or memory when distributing DATA contained in the first datagram D1 that enters the distribution device. The first datagram D1 typically enters through the first port 330 and is distributed to an egress port, such as a second port 340 or a third port 345, chosen from among a plurality of possible egress ports. If the second port 340 illustrated in
According to certain embodiments of the present invention, the device logic 320 includes a first lookup unit 350, which often stores the LPM Table 210 illustrated in
In operation, according to certain embodiments of the present invention, the distribution device or switch 300 performs a Longest Prefix match between the packet 200 and a portion of an entry in the LPM Table 210. Typically, only a single entry in the LPM Table 210 includes a portion that matches the Longest Prefix of the packet 200, regardless of whether or not multiple equal-cost paths exist in the network for the packet 200. This LPM Table 210 entry is referred to below as the “matched entry”.
Usually, a distribution device that includes and makes use of a first lookup unit 350 in which an LPM Table 210 is stored relies on the ECMP value included in the matched LPM Table entry to specify whether equal-cost paths exist. Normally, the ECMP value is a binary value that either specifies that multiple equal-cost paths exist or that they do not.
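The first lookup described above can be sketched as a longest-prefix match whose matched entry carries the binary ECMP value. The table layout and field names below are assumptions for illustration; the specification does not fix a concrete encoding.

```python
def lpm_lookup(lpm_table, dip):
    """Return the entry whose prefix is the longest match for dip.

    lpm_table: list of (prefix, prefix_len, entry) tuples, where entry is
    a dict carrying the ECMP value and the indices described in the text.
    Addresses and prefixes are 32-bit integers.
    """
    best, best_len = None, -1
    for prefix, plen, entry in lpm_table:
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF if plen else 0
        if (dip & mask) == (prefix & mask) and plen > best_len:
            best, best_len = entry, plen
    return best

table = [
    # 10.0.0.0/8: single path, so the entry points straight at the L3 Table.
    (0x0A000000, 8,  {"ecmp": False, "l3_index": 7}),
    # 10.10.0.0/16: multiple equal-cost paths, so the entry references
    # the ECMP Table instead (base pointer plus a path count).
    (0x0A0A0000, 16, {"ecmp": True, "ecmp_base": 4, "count": 2}),
]
matched = lpm_lookup(table, 0x0A0A0105)  # 10.10.1.5: the /16 entry wins
```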
As illustrated in
The ECMP Table 220 is usually only referenced when multiple equal-cost paths do exist. When the ECMP value in the LPM Table 210 indicates that no equal-cost paths are present, the ECMP Table 220 or, more generally, the second compilation of sets of instructions used by the second lookup unit 380, is normally not referenced. In such cases, instructions for referencing a third compilation of sets of instructions are used.
These instructions for referencing the third compilation of sets of instructions directly from the first compilation of sets of instructions are commonly included in the first compilation of sets of instructions and may, for example, take the form of an L3 Table Index contained in the matched entry of the LPM Table 210. The L3 Table Index may be stored in and/or used by a second referencing unit 365 in the first lookup unit 350.
Typically, the L3 Table Index is used to pick an entry from an L3 Table. The above-discussed ECMP Table may include both the L3 Table Index and the L3 Interface Index. The L3 Interface Index may be used to find the L3 Interface attribute that may eventually be used during datagram transmission but that is not typically relevant to selection from among equal cost paths. This operation is generally simply a direct index from the ECMP Table into the L3 Table at the L3 Table Index address.
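The direct index described above amounts to a plain array lookup, with no further search. The structures below are hypothetical and only illustrate the indexing relationship.

```python
# Each ECMP Table entry carries an L3 Table Index; resolving it is a
# direct index into the L3 Table at that address.
ecmp_table = [{"l3_index": 2}, {"l3_index": 5}]
l3_table = {
    2: {"egress_port": 340, "l3_interface": 1},
    5: {"egress_port": 345, "l3_interface": 2},
}

def resolve(ecmp_entry):
    # No search: the L3 Table Index is used verbatim as the address.
    return l3_table[ecmp_entry["l3_index"]]
```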
As stated above, according to certain embodiments of the present invention, the second compilation of sets of instructions takes the form of the ECMP Table 220 illustrated in
Sets of instructions for referencing the third compilation of sets of instructions are also commonly stored and/or used by a third referencing unit 385 in the second lookup unit 380. These sets of instructions commonly take the form of L3 Table Indices that point to a specific entry in the L3 Table 230. From the discussion above, it should be clear to one of skill in the art that, when multiple equal-cost paths are available, L3 Table Indices from the ECMP Table 220 may be used and that L3 Table Indices from the LPM Table 210 may be used in the absence of multiple equal-cost paths.
In the example illustrated in
One advantage of including multiple, identical entries in the ECMP Table 220 is that these multiple entries allow for the distribution of packet traffic across a set of equal-cost paths according to a weighted distribution over time. How such a distribution is achieved is discussed below.
The distribution of each packet in a set of packets across a set of equal-cost paths according to a weighted distribution is stipulated in the second step 620 of the third representative method 600 illustrated in
Returning now to the description of methods that make use of the devices such as the device illustrated in
More specifically, according to one representative example, before the ECMP Table Base Pointer is used to reference the ECMP Table 220, the ECMP Table Base Pointer first undergoes a mathematical transformation involving a packet attribute. The mathematical transformation of the ECMP Table Base Pointer may, for example, begin with the selection of a packet attribute, which is often chosen to be the SIP, since different packets passing through the distribution device are more likely to have unique sources and destinations. Then, according to certain embodiments and as shown in the fourth step 540 of the representative method illustrated in
Pursuant to the hashing of the packet attribute, the hash value may be mathematically manipulated further, usually by adding the hash value to the ECMP Table Base Pointer to generate a modified pointer into the ECMP Table 220. Because the hash value varies with the chosen packet attribute, different packets are likely to produce different modified pointers, each of which can reference a different ECMP Table entry or, when multiple identical ECMP Table entries are present, a different set of such identical entries. Since each ECMP Table entry references a single L3 Table entry, the packets will be distributed, based on the hashed attribute, over all of the equal-cost paths available, as specified in the fifth step 550 of the method illustrated in
It should be noted that, according to certain embodiments of the present invention, a user-programmable option may be included to determine a hash selection from the ECMP Table. When such an option is included, a hash of the Layer 3 source address (SIP) and Layer 3 destination address (DIP) is typically used. For example, the following function may be used to select an entry:
Entry_select=[hash_function(SIP,DIP) % sizeof(ECMP_entries)]
The above function typically concatenates SIP and DIP, both of which may be of 32 bits, to form, for example, a 64-bit quantity. A hash function is then generally applied to the 64-bit quantity. The result of the hash function may then be reduced modulo a divisor, which may be, for example, the number of equal-cost (ECMP) paths available. Since the result will typically have a value between 0 and one less than the number of ECMP paths available, this value may represent the ECMP entry chosen.
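The selection just described can be sketched concretely. The hardware hash function is not specified, so CRC-32 is used below purely as a stand-in; the 64-bit concatenation of SIP and DIP follows the text.

```python
import zlib

def select_ecmp_entry(sip, dip, num_paths):
    """Entry_select = hash_function(SIP, DIP) % number of ECMP entries.

    sip and dip are 32-bit addresses. Concatenating them yields a 64-bit
    quantity, which is hashed (CRC-32 here, as a placeholder for the
    unspecified hardware hash) and reduced modulo the path count.
    """
    key = ((sip & 0xFFFFFFFF) << 32) | (dip & 0xFFFFFFFF)  # 64-bit concat
    hashed = zlib.crc32(key.to_bytes(8, "big"))
    return hashed % num_paths  # a value in [0, num_paths)

# 192.168.0.1 -> 10.0.0.1, four equal-cost paths available:
idx = select_ecmp_entry(0xC0A80001, 0x0A000001, 4)
```

Because the same (SIP, DIP) pair always hashes to the same entry, all packets of a given flow follow one path, which avoids reordering within a flow while still spreading distinct flows across the paths.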
When different numbers of identical ECMP Table entries point to a first L3 Table entry and a second L3 Table entry, the above-mentioned weighted distribution of packets among the equal-cost paths becomes possible. For example, if the ECMP Table 220 contains nine identical entries pointing to a first L3 Table entry, which instructs that the packet be distributed to a first path via a first egress port, and only three identical entries pointing to a second L3 Table entry, which instructs that the packet be distributed to a second path via a second egress port, then the packet is three times as likely to be distributed to the first path as to the second path. Under such circumstances, over time, packet traffic is generally three times higher on the first path than on the second path, and the traffic is said to be distributed across the equal-cost paths according to a weighted distribution.
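The 9:3 weighting can be checked with a short simulation. The entry labels and the CRC-32 hash below are illustrative assumptions; under a reasonably uniform hash the hit counts should approach the 3:1 ratio.

```python
import zlib
from collections import Counter

# Nine ECMP entries point at a first L3 entry ("A", the first path),
# three at a second ("B", the second path).
ecmp_table = ["A"] * 9 + ["B"] * 3

def pick(sip, dip):
    key = ((sip & 0xFFFFFFFF) << 32) | (dip & 0xFFFFFFFF)
    return ecmp_table[zlib.crc32(key.to_bytes(8, "big")) % len(ecmp_table)]

# Simulate many flows with distinct source addresses:
counts = Counter(pick(0x0A000000 + i, 0xC0A80001) for i in range(12000))
# With a uniform hash, counts["A"] : counts["B"] should be roughly 3 : 1.
```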
Such a weighted distribution of traffic is especially beneficial when network modules are unevenly loaded with traffic from the rest of the network. For example, in
Another advantage of the above-described method has to do with the fact that, instead of using multiple LPM Table entries to establish the presence of equal-cost paths, an ECMP value is used. Hence, the LPM Table 210 according to certain embodiments of the present invention is typically of a small size. Therefore, the amount of time spent performing lookup algorithms is reduced. Further, because of the small LPM Table, there is generally a reduction in the amount of hardware, such as memory, that is used and/or re-used.
Next, according to the third step 730, a hashing function is performed on a packet attribute of the first packet, thereby obtaining a hashed result. According to the fourth step 740, the hashed result is divided by the COUNT value of the first entry of the LPM Table to obtain a remainder value. Then, according to the fifth step 750, the remainder value is used as an offset that is added to the ECMP Table Base Pointer found in the first entry of the LPM Table to generate a modified pointer.
Following the above-listed steps, sixth step 760 specifies that an ECMP Table entry be selected based on the modified pointer. According to the seventh step 770, a pointer in the selected ECMP Table entry is used to reference an entry in an L3 Table. According to certain embodiments, the L3 Table Index is used to reference an entry in the L3 Table. Finally, the eighth step 780 specifies that the packet be forwarded to the distribution device egress port specified in the selected L3 Table entry.
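The steps above can be sketched end to end. The table layouts and field names are assumptions made for illustration, and CRC-32 again stands in for the unspecified hash function.

```python
import zlib

# Hypothetical tables following the text: the matched LPM entry carries a
# COUNT value and an ECMP Table Base Pointer; each ECMP Table entry
# carries an L3 Table Index; each L3 Table entry names an egress port.
lpm_entry = {"ecmp": True, "count": 3, "ecmp_base": 4}
ecmp_table = {4: 10, 5: 11, 6: 12}   # base .. base+count-1 -> L3 index
l3_table = {10: {"port": 340}, 11: {"port": 345}, 12: {"port": 350}}

def forward(sip):
    # Step 730: hash a packet attribute (here the SIP) for a hashed result.
    hashed = zlib.crc32(sip.to_bytes(4, "big"))
    # Step 740: divide by the COUNT value, keeping the remainder.
    offset = hashed % lpm_entry["count"]
    # Step 750: add the remainder to the ECMP Table Base Pointer.
    modified_pointer = lpm_entry["ecmp_base"] + offset
    # Steps 760-770: select the ECMP Table entry, then use its L3 Table
    # Index to reference the L3 Table directly.
    l3_index = ecmp_table[modified_pointer]
    # Step 780: forward to the egress port named in the L3 Table entry.
    return l3_table[l3_index]["port"]
```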
If the above-discussed algorithm, or variations thereof, is performed on all packets entering a distribution device pursuant to a determination that the packets have multiple equal-cost paths available to them, then a weighted distribution of traffic over the paths can be obtained. According to certain embodiments of the present invention, as the presence or absence of equal-cost paths varies over time, the various tables used to perform the weighted distribution may be updated. Generally, this may be done according to a best-fit algorithm, as specified in step 630 of
In order to illustrate a fifth representative method 800 according to certain embodiments of the present invention,
In order to illustrate a sixth representative method 900 according to certain embodiments of the present invention,
In order to illustrate a seventh representative method 1000 according to certain embodiments of the present invention,
Following these steps, the fourth step 1040 specifies performing a hashing function on an attribute of the first packet to obtain a hashed result. Then, fifth step 1050 specifies dividing the hashed result by the first value, thereby obtaining a remainder value, and using the remainder value to obtain an offset. Next, sixth step 1060 specifies adding the offset to the first pointer to select the second set of instructions. According to this step, the second set of instructions typically includes a pointer to a third set of instructions in a third compilation of sets of instructions. Following the sixth step 1060, the seventh step 1070 specifies forwarding the first packet to a port designated in the third set of instructions. Finally, the eighth step 1080 specifies distributing each packet in the set of packets across the set of equal-cost paths.
One having ordinary skill in the art will readily understand that the invention, as discussed above, may be practiced with steps in a different order, and/or with hardware elements in configurations which are different than those which are disclosed. Therefore, although the invention has been described based upon these preferred embodiments, it would be apparent to those of skill in the art that certain modifications, variations, and alternative constructions would be apparent, while remaining within the spirit and scope of the invention. In order to determine the metes and bounds of the invention, therefore, reference should be made to the appended claims.
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US8094555 *||Nov 27, 2006||Jan 10, 2012||Cisco Technology, Inc.||Dynamic weighted-fair load-balancing|
|International Classification||G01R31/08, H04L12/56|
|Cooperative Classification||H04L45/00, H04L45/24, H04L45/12|
|European Classification||H04L45/24, H04L45/00, H04L45/12|