|Publication number||US7453884 B2|
|Application number||US 10/958,920|
|Publication date||Nov 18, 2008|
|Filing date||Oct 4, 2004|
|Priority date||Apr 25, 2000|
|Also published as||US7123620, US20050083936|
|Original Assignee||Cisco Technology, Inc.|
This application is a continuation application based on U.S. patent application Ser. No. 09/558,693, filed Apr. 25, 2000 now U.S. Pat. No. 7,123,620.
1. Field of the Invention
The present invention relates to data communication networks. More particularly, the present invention relates to a scalable apparatus and method for dynamic selection of explicit routes and dynamic rerouting in data communication networks and internetworks such as the Internet.
As is known to those of ordinary skill in the art, a network is a communication system that allows users to access resources on other computers and exchange messages with other users. A network is typically a data communication system that links two or more computers and peripheral devices. It allows users to share resources on their own systems with other network users and to access information on centrally located systems or systems that are located at remote offices. It may provide connections to the Internet or the networks of other organizations. The network typically includes a cable that attaches to network interface cards (“NICs”) in each of the devices within the network. Users may interact with network-enabled software applications to make a network request (such as to get a file or print on a network printer). The application may also communicate with the network software, which may then interact with the network hardware to transmit information to other devices attached to the network.
When a user 110 connects to a particular destination, such as a requested web page 120, the connection from the user 110 to the web page 120 is typically routed through several internetworking devices such as routers 130-A-130-I. Routers are typically used to connect similar and heterogeneous network segments into internetworks. For example, two LANs may be connected across a dialup, integrated services digital network (“ISDN”), or across a leased line via routers. Routers may also be found throughout the internetwork known as the Internet. End users may connect to a local Internet service provider (“ISP”) (not shown).
As shown in
Routers such as routers 130-A-130-I typically transfer information along data communication networks using formatted data packets. For example, when a “source” computer system (e.g., computer 110 in
When a router receives a data packet, it reads the data packet's destination address from the data packet header, and then transmits the data packet on the link leading most directly to the data packet's destination. Along the path from source to destination, a data packet may be transmitted along several links and pass through several routers, with each router on the path reading the data packet header and then forwarding the data packet on to the next “hop.”
To determine how data packets should be forwarded, each router is typically aware of the locations of the network's end systems (i.e., which routers are responsible for which end systems), the nature of the connections between the routers, and the states (e.g., operative or inoperative) of the links forming those connections. Using this information, each router can compute effective routes through the network and avoid, for example, faulty links or routers. A procedure for performing these tasks is generally known as a “routing algorithm.”
The interfaces 220 and 230 are typically provided as interface cards. Generally, they control the transmission and reception of data packets over the network, and sometimes support other peripherals used with router 130. Examples of interfaces that may be included in the low and medium speed interfaces 220 are a multiport communications interface 222, a serial communications interface 224, and a token ring interface 226. Examples of interfaces that may be included in the high speed interfaces 230 include a fiber distributed data interface (“FDDI”) 232 and a multiport Ethernet interface 234. Each of these interfaces (low/medium and high speed) may include (1) a plurality of ports appropriate for communication with the appropriate media, and (2) an independent processor, and in some instances (3) volatile RAM. The independent processors may control such communication intensive tasks as packet switching and filtering, and media control and management. By providing separate processors for the communication intensive tasks, this architecture permits the master CPU 210 to efficiently perform routing computations, network diagnostics, security functions, and other similar functions.
The low and medium speed interfaces are shown to be coupled to the master CPU 210 through a data, control, and address bus 240. High speed interfaces 230 are shown to be connected to the bus 240 through a fast data, control, and address bus 250, which is in turn connected to a bus controller 260. The bus controller functions are typically provided by an independent processor.
Although the system shown in
At a higher level of abstraction,
As each new data packet 340 arrives on an interface 310 k, it is written into a corresponding input interface queue 320 k, waiting for its turn to be processed. Scheduling logic 350 determines the order in which input interfaces 310 a-310 n should be “polled” to find out how many data packets (or equivalently, how many bytes of data) have arrived on a given interface 310 k since the last time that interface 310 k was polled. Scheduling logic 350 also determines the amount of data that should be processed from a given interface 310 k during each “polling round.”
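By way of illustration only, the polling behavior described above can be sketched as a simple round-robin loop with a per-interface byte budget. The function name `poll_round`, the `budget_bytes` parameter, and the budget policy itself are assumptions made for this sketch; the description does not prescribe any particular form of scheduling logic 350.

```python
from collections import deque

def poll_round(queues, budget_bytes):
    """Visit each input interface queue once, in order, and dequeue whole
    packets until the per-interface byte budget for this round is used up."""
    processed = []
    for name, queue in queues.items():
        remaining = budget_bytes
        # Take packets while the next one still fits in the remaining budget.
        while queue and len(queue[0]) <= remaining:
            packet = queue.popleft()
            remaining -= len(packet)
            processed.append((name, packet))
    return processed

# Hypothetical input interface queues keyed by interface label.
queues = {
    "310a": deque([b"a" * 400, b"b" * 400, b"c" * 400]),
    "310b": deque([b"d" * 900]),
}
first_round = poll_round(queues, budget_bytes=1000)
```

With a 1000-byte budget, two of the three 400-byte packets on the first interface are processed in the first round and the third waits for the next round, which is one way to keep a busy interface from starving the others.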
Regardless of the specific form of scheduling logic 350 used, when scheduling logic 350 determines that a particular data packet 340 i should be processed from a particular input interface queue 320 k, scheduling logic 350 transfers the data packet 340 i to subsequent portions of the networking device (shown as dashed block 355) for further processing. Eventually, data packet 340 i is written into one of a plurality of output queues 360 a-360 q, at the output of which the data packet 340 i is finally transmitted from the networking device through the corresponding output interface 370 a-370 q. Fundamentally, then, the packet forwarding component of a router performs the function of examining the source and destination address of each data packet and identifying one from among a plurality of output interfaces 370 a-370 q on which to transmit each data packet.
Still referring to
If the packet protocol is determined to be one that can be routed in step 420 (e.g., if it is an IP packet), then at step 435, the router first performs housekeeping functions known to those of ordinary skill in the art, and then the router “looks up” the destination IP address in its routing table to identify the appropriate router output interface (also called a “port”) on which to transmit the received packet. At step 440, the router determines whether the destination port is directly attached to the router. If so (i.e., if the destination port is another port on the router), then at step 445, the link layer header is added back to the packet with the original link layer destination address, and then at step 450 the reassembled link layer frame is transmitted through the port identified at step 435.
Otherwise, if at step 440 the router determines that the destination port is not directly attached to the router (indicating that another router hop needs to occur), the Media Access Control (“MAC”) address of the next hop may be added to the packet, and a new link layer header with this MAC destination address extracted from the routing table is added to the frame at step 455, and then at step 460 the frame is transmitted through the port identified at step 435.
With the integration of voice, video, and data traffic, the aggregate bandwidth requirement of applications is getting higher. More network infrastructures are being built with high-end routers and switches with a large number of interfaces. As a result, network connectivity is getting richer than ever before. Multiple selections of routes for any given source and destination pair are often available. However, as is known to those of ordinary skill in the art, existing routing protocols, such as the Open Shortest Path First (“OSPF”) protocol, the Routing Information Protocol (“RIP”), the Enhanced Interior Gateway Routing Protocol (“EIGRP”), and the Border Gateway Protocol (“BGP”) are essentially destination-based routing protocols, which means that all packets with the same destination are typically forwarded along a minimum-hop path to their destination.
With destination-based routing protocols, routes are typically calculated automatically at regular intervals by software in routing devices. To enable this type of dynamic routing, routing devices contain routing tables, which essentially consist of destination address/next hop pairs. As an example, an entry in a routing table (i.e., a routing table as used in step 435 of
Although networks are designed to match expected traffic load, the actual traffic load in data communication networks changes dynamically depending on the time of day, and from day to day as well. A special event (e.g., the final game of the soccer World Cup) can dramatically change the network traffic pattern. In addition, network devices can become disabled or may be added to a network, and these events can also change the network traffic pattern.
As is known to those of ordinary skill in the art, destination-based minimum-hop routing algorithms (such as OSPF, RIP, and BGP) typically select the “shortest path” to a destination based on a metric such as hop count. These protocols therefore have limited capacity to balance the network load. With these protocols, when the traffic load is concentrated, some links are heavily loaded, while others sit idle. Thus, when using these routing algorithms, a network cannot dynamically adjust its forwarding paths to avoid congested links.
Traffic engineering is an important network service that achieves network resource efficiency by directing certain traffic flows to travel through explicitly defined routes that are different from the default paths determined by the routing protocols (e.g., OSPF, EIGRP, or BGP). As is known to those of ordinary skill in the art, the key to successful traffic engineering is the ability to have a set of network mechanisms that supports both explicit routes and dynamic rerouting. Dynamic rerouting refers to the ability to periodically recalculate routes in a data communication network depending on network load or other factors.
Fundamentally, traffic engineering is a function of routing. However, existing destination-based routing protocols, such as the protocols already mentioned, make it very difficult—if not impossible—to support explicit routes.
One current approach proposed by the Internet Engineering Task Force (“IETF”) toward supporting explicit routes in traffic engineering applications is to use the multiprotocol label switching (“MPLS”) technique and the Resource Reservation Protocol (“RSVP”) to set up explicit routes. MPLS and RSVP are not discussed in detail in this document, so as not to overcomplicate the present discussion. For the purposes of the present discussion, however, suffice it to say that in both the MPLS and RSVP techniques, a technique known as “tag switching” can be used, wherein explicit routes are cached in Tag Information Base (“TIB”) entries. In general, tag switching techniques work as follows: at the edge of a tag-switched network, a tag is applied to each packet. A tag has only local significance. When a packet is received by a tag switch (e.g., a router or ATM switch with tag switching software), the switch performs a table look-up in the TIB. Each entry in the TIB consists of an incoming tag and one or more subentries of the form: outgoing tag, outgoing interface, outgoing link layer information. The tag switch replaces the tag in the packet with the outgoing tag and replaces the link layer information. The packet is then sent out on the given outgoing interface. It should be noted that the TIB is built at the same time that the routing tables are populated, not when the tag is needed for the first time, which allows flows to be switched starting with the first packet.
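For illustration, the TIB look-up and tag swap described above might be sketched as follows. The class names `TibSubentry` and `Packet`, the function `tag_switch`, and the dictionary representation of the TIB are hypothetical choices made for this sketch and are not taken from any tag switching implementation.

```python
from dataclasses import dataclass

@dataclass
class TibSubentry:
    out_tag: int            # outgoing tag
    out_interface: str      # outgoing interface
    link_layer_info: str    # outgoing link layer information

@dataclass
class Packet:
    tag: int
    link_layer_info: str
    payload: bytes

def tag_switch(tib, packet):
    """Look up the incoming tag in the TIB and return one
    (outgoing interface, rewritten packet) pair per subentry."""
    subentries = tib.get(packet.tag)
    if subentries is None:
        return []  # no entry for this tag; handling is outside this sketch
    return [
        (s.out_interface, Packet(s.out_tag, s.link_layer_info, packet.payload))
        for s in subentries
    ]

# Hypothetical TIB: incoming tag 17 maps to a single subentry.
tib = {17: [TibSubentry(out_tag=42, out_interface="if1", link_layer_info="mac-B")]}
result = tag_switch(tib, Packet(tag=17, link_layer_info="mac-A", payload=b"data"))
```

The look-up is an exact match on the incoming tag, so no address prefix comparison is involved at this stage.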
According to the tag switching techniques that can be implemented with MPLS and RSVP, at each node in a route, an incoming packet is forwarded along the explicit route to the next hop stored in the TIB whose index tag matches the tag in the packet header. However, the main problem with this approach is its inherent complexity. Both RSVP and MPLS are very complicated and computationally intensive protocols, and they must be extended in order to support explicit routes. Moreover, to support traffic engineering, a network must support both the RSVP and MPLS protocols, and this can significantly limit the scope of traffic engineering available. This is because new service models, such as differentiated services, do not require the hop-by-hop signaling provided by RSVP. In addition, this approach to supporting dynamic traffic engineering (i.e., making routing decisions on a “per-flow” basis) is inflexible and expensive. Finally, as is known to those of ordinary skill in the art, the current approach is not scalable because the size of the TIB that is required for a network with a given number of nodes grows exponentially as a function of the number of nodes.
The present invention provides a technique for scalable and dynamic rerouting that significantly reduces the complexity of traffic engineering as currently proposed in the IETF. As described herein, according to aspects of the present invention, a global path identifier is assigned to each explicit route in a data communication network. In one embodiment, this global path identifier is inserted in the optional field of an IP packet header, and is used in selecting the next hop by a router's forwarding engine. As another example, the global path identifier can be inserted as a label in MPLS systems. Explicit routes can be selected either by a policy server or by ingress routers. When encountering a new selected path, an ingress router sends an explicit object to downstream nodes of the path to set up explicit routes by caching the next hop in an Explicit Forwarding Information Base (“EFIB”) table in each router along the route. Two explicit routes that merge at a network node will share the same entry in the EFIB tables in all downstream nodes. Ingress routers maintain an Explicit Route Table (“ERT”) table that tracks the global path identifier associated with each flow through the data communication network. Multiple flows using the same path can be implemented by sharing the same global path identifier in the table. In case of sudden network load changes, rerouting can be performed by changing the global path identifier associated with those flows that need to be rerouted, and by then transmitting a new path object to downstream nodes.
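As a minimal sketch of the mechanism summarized above, the following models the EFIB held by each router and the ERT held by an ingress router as plain dictionaries. The names `Router`, `install_path`, and `forward`, the node labels, and the identifier values 1 and 2 are all hypothetical; `install_path` merely stands in for the transmission of a path object to downstream nodes.

```python
class Router:
    """A node holding an EFIB: global path identifier -> next-hop node name."""
    def __init__(self, name):
        self.name = name
        self.efib = {}

def install_path(routers, path_id, path):
    """Stand-in for sending a path object downstream: each node on the
    path caches its next hop under the path's global identifier."""
    for node, next_hop in zip(path, path[1:]):
        routers[node].efib[path_id] = next_hop

def forward(routers, ingress, path_id, max_hops=16):
    """Follow EFIB entries hop by hop and return the nodes visited."""
    visited = [ingress]
    node = ingress
    while path_id in routers[node].efib and len(visited) <= max_hops:
        node = routers[node].efib[path_id]
        visited.append(node)
    return visited

routers = {name: Router(name) for name in "ABCD"}
ert = {}  # Explicit Route Table at ingress A: flow -> global path identifier

# Hypothetical identifiers 1 and 2; the values are arbitrary in this sketch.
install_path(routers, 1, ["A", "B", "D"])
ert["flow-1"] = 1
before = forward(routers, "A", ert["flow-1"])

# Reroute: install the alternative path, then repoint the flow in the ERT.
install_path(routers, 2, ["A", "C", "D"])
ert["flow-1"] = 2
after = forward(routers, "A", ert["flow-1"])
```

In this sketch, rerouting touches only the ERT entry at the ingress router and the EFIB entries along the new path; several flows could share one identifier simply by mapping to the same value in `ert`.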
Compared with the existing approach, the technique according to aspects of the present invention is routing protocol independent, scalable, and dynamic, and it can support both class-based and flow-based explicit routes. These and other features and advantages of the present invention will be presented in more detail in the following specification of the invention and in the associated figures.
A global path identifier is assigned to each explicit route through a data communication network. This global path identifier can be obtained by performing the bitwise Exclusive OR function of all of the unique identifiers, such as IP addresses, of all the nodes along each explicit route. The global path identifier is inserted into each packet as the packet enters a network, and is used in selecting the next hop by a router's forwarding engine. Explicit routes can be selected either by a policy server or by ingress routers. When encountering a new selected path, an ingress router sends an explicit object to downstream nodes of the path to set up explicit routes by caching the next hop in an Explicit Forwarding Information Base (“EFIB”) table in each router along the route. Two explicit routes that merge at a network node will share the same entry in the EFIB tables in all downstream nodes. Ingress routers maintain an Explicit Route Table (“ERT”) table that tracks the global path identifier associated with each flow through the data communication network. Multiple flows using the same path can be implemented by sharing the same global path identifier in the table. In case of sudden network load changes, rerouting can be performed by changing the global path identifier associated with those flows that need to be rerouted, and by then transmitting a new path object to downstream nodes.
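The bitwise Exclusive OR computation mentioned above can be illustrated with a short sketch that assumes the unique identifiers are IPv4 addresses. The function name `global_path_id` and the example addresses are hypothetical.

```python
import ipaddress
from functools import reduce

def global_path_id(node_addresses):
    """XOR together the 32-bit integer forms of the IPv4 addresses of
    all nodes along an explicit route."""
    return reduce(
        lambda acc, addr: acc ^ int(ipaddress.IPv4Address(addr)),
        node_addresses,
        0,
    )

# Hypothetical three-node route.
route = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
path_id = global_path_id(route)  # 0x0A000000, i.e. 10.0.0.0 as an integer
```

Because XOR is commutative and associative, the result does not depend on the order in which the nodes are listed. How two distinct routes that happen to yield the same value would be distinguished is not addressed by this sketch.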
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the present description, serve to explain the principles of the invention.
In the drawings:
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and not in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons, having the benefit of this disclosure. Reference will now be made in detail to an implementation of the present invention as illustrated in the accompanying drawings. The same reference numbers will be used throughout the drawings and the following description to refer to the same or like parts.
As is known to those of ordinary skill in the art, internetworking is the process of establishing and maintaining communications between and transferring data among multiple local networks in a distributed network system.
The routers within a routing domain manage communications among local networks within their domain and communicate with each other using an intra-domain routing protocol. As is known to those of ordinary skill in the art, examples of such protocols are the IP Routing Information Protocol (“RIP”) and the International Standards Organization (“ISO”) Integrated Intermediate System-to-Intermediate System (“IS-IS”) routing protocol.
Domains 504-508 are connected to the backbone 502 through router nodes 510, 512 and 514, respectively. The routing protocols implemented in such routers are referred to as inter-domain routing protocols. An example of an inter-domain routing protocol is the Inter-domain Routing Protocol (“IDRP”), wherein in IP, the Exterior Gateway Protocol (“EGP”) and the Border Gateway Protocol (“BGP”) routing protocols are known to those of ordinary skill in the art. Although not shown in
Thus, the hierarchically-arranged distributed internetwork system 500 contains levels of subnetworks, each having an associated routing level. The lower routing level includes intra-domain routers 530-534 which manage individual links and nodes within their respective domains. The higher routing level includes inter-domain routers 510-514 which manage all the lower-level domains without addressing details internal to lower routing levels. Communications among these routers typically comprises an exchange (i.e., “advertising”) of routing information. This exchange occurs between routers at the same routing level (referred to as “peer” routers) as well as between routers at different routing levels.
In order to reduce design complexity, most networks are organized as a series of hardware and software levels, or “layers,” within each node. These layers interact to format data for transfer between, for example, a source node and a destination node communicating over the network. Specifically, predetermined services are performed on the data as it passes through each layer and the layers communicate with each other by means of predefined protocols. This layered design permits each layer to offer selected services to other layers using a standardized interface that shields those layers from the actual implementation details of the services.
As was mentioned earlier, the destination address typically stored in the header of a data packet can be used by the network layer protocols to determine the route for the packet. As is known to those of ordinary skill in the art, when a packet is received at the network layer, that layer examines the network layer header of the packet, determines the next hop that the packet should take based upon the destination address, appends a new network layer header onto the packet as necessary, and passes the modified packet to the data link layer responsible for the outgoing link associated with the next hop.
Since network layer addresses are hierarchical in nature, network layer protocols that perform routing functions at the same routing level of an internetwork make next hop determination based upon the same portion of the destination address referred to as the destination address prefix. The routing information exchanged by peer routers typically includes this destination address prefix.
In conventional routers, these destination address prefixes are stored in a forwarding information database (“FIB”). To determine the next hop, network layer protocols typically implement a longest-matching-address-prefix (“longest match”) algorithm that searches the forwarding database for an entry corresponding to the destination address located in the network layer header. The data structure of the forwarding database comprises a large number of “branches,” each representing a string of the hierarchical destination address fields. The branches of the database have different address field values and/or terminate with a different number of address fields. The longest match algorithm must traverse numerous branches during this address matching process to determine the next hop. Due to the hierarchical characteristics of destination addresses, there may be several destination address prefixes in the forwarding database that match at least a portion of a particular destination address. As a result, the longest address in the forwarding database must be found.
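By way of example, a longest match look-up over a branch-structured forwarding database can be sketched as a binary trie keyed on address bits. This is a simplified IPv4-only illustration; the class `PrefixTrie`, its methods, and the sample prefixes are assumptions of the sketch rather than the structure used by any particular router.

```python
import ipaddress

class PrefixTrie:
    """Binary trie in which each branch is one bit of an IPv4 prefix."""
    def __init__(self):
        self.root = {}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["next_hop"] = next_hop

    def lookup(self, address):
        """Walk the trie bit by bit, remembering the deepest next hop seen."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("next_hop")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("next_hop", best)
        return best

fib = PrefixTrie()
fib.insert("10.0.0.0/8", "router-A")
fib.insert("10.1.0.0/16", "router-B")
```

Here `fib.lookup("10.1.2.3")` matches both prefixes and returns the next hop of the longer one, while `fib.lookup("10.2.0.1")` matches only the /8 entry. The walk visits up to one branch per address bit, which illustrates why look-up cost grows with address length.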
As is known to those of ordinary skill in the art, the time required to search the forwarding database is proportional to the average length of the destination address in the database. Destination addresses often have a significant number of bits, particularly in large, hierarchically-arranged internetworks. Due to the length of the destination addresses, as well as the manner in which the longest match algorithm navigates the forwarding database, the address matching functions contribute significantly to the overhead associated with the packet forwarding process performed by network layer protocols, thereby adversely affecting the router's efficiency.
Although all routers perform similar tasks, there are some important hardware and software differences between the functions performed by intra-domain routers and inter-domain routers. Inter-domain routers, also known as “edge routers,” or “ingress routers,” are primarily concerned with network topology and maintenance, while intra-domain routers, also known as “core routers” are primarily devoted to packet forwarding.
As shown in
The routers include connectionless network layer protocols such as the connectionless network layer protocol (“CLNP”), the Internet Protocol (“IP”) network layer protocol and the Internetwork Packet Exchange (“IPX”) protocol. Connectionless network layer protocols exchange routing information with each other using neighbor greeting protocols such as the Address Resolution Protocol (“ARP”). This information is stored in the forwarding information database and used by the network layer protocols to determine paths through the internetwork 600.
Primarily, when an inter-domain router receives a packet from a previous hop inter- or intra-domain router, it determines the next hop inter- or intra-domain router to which to forward the packet using a packet forwarding system 636, of which many are known to those of ordinary skill in the art. Similarly, intra-domain routers forward packets between a previous hop inter-domain router and the end nodes, making next hop determination using the packet forwarding system 636. For example, as shown in
The present invention employs a novel approach to support dynamic selection of explicit routes and dynamic rerouting in the Internet. The proposed technique is simple and scalable. It can support both flow-based and class-based dynamic traffic engineering. The technique is independent of existing routing protocols, but uses the route selected by existing routing protocols as default routes. Hence, it can be used in both intra-domain (e.g., OSPF, RIP, and EIGRP) and inter-domain (e.g., BGP) routing.
Most packet-switched networks employ some form of shortest-path routing. The objective of shortest-path routing is to determine a least-cost path between a source and a destination. The cost can be hop count, delay, bandwidth, queue size, or combinations of these “metrics.” Shortest-path routing algorithms are divided into two classes: distance vector and link state algorithms. As is known to those of ordinary skill in the art, distance-vector algorithms are derived from the Bellman-Ford shortest-path algorithm, and base path selection on local estimates of path costs. These algorithms are usually implemented in a distributed asynchronous fashion. On the other hand, link-state algorithms are derived from Dijkstra's shortest-path algorithm, and are usually implemented in a replicated fashion (i.e., each node performs an independent route computation) and construct paths based on global estimates of individual link costs.
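As an illustration of the link-state case, the following is a compact sketch of Dijkstra's shortest-path computation as a single node might run it over its view of the topology. The adjacency-dictionary representation, the function name `dijkstra`, and the link costs are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return the least total cost from source to every reachable node.
    graph maps each node to a dict of {neighbor: non-negative link cost}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical four-node topology with per-link costs.
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
costs = dijkstra(topology, "A")
```

In this topology the least-cost route from A to D runs through B and C at a total cost of 4, even though the path through B alone has fewer hops, which shows how the chosen metric determines what "shortest" means.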
As packet-switched networks grow to accommodate an increasing user population, the amount of routing information that must be distributed, stored, and manipulated in these networks also grows. Research on routing in large packet-switched networks has focused on ways to reduce the quantity of routing information, without sacrificing the quality of a selected path. The majority of proposed and deployed solutions use algorithms for hierarchical clustering of topology information, abstraction of routing information relating to these clusters, and packet forwarding within the hierarchy.
Routing algorithms deployed in today's Internet focus on basic connectivity and typically support only one type of datagram service: best-effort service. Moreover, current Internet routing protocols, such as BGP, OSPF, and RIP, use “shortest path routing” (i.e., routing that is optimized for a single metric, such as administrative weight or hop count). These routing protocols are also “opportunistic,” which means that they use the current shortest path or route to a destination. In order to avoid loops, alternative paths with adequate but less than optimal cost cannot be used to route traffic. However, it should be noted that shortest path routing protocols do allow a router to alternate among several equal cost paths to a destination.
There are typically two methods to achieve high network throughput: conserving resources and balancing traffic load. To conserve resources, it is desirable to select paths that require the fewest possible hops, or which require the least amount of bandwidth. On the other hand, to balance the network load, it is desirable to select the least loaded paths. In practice, these two methods often conflict with each other, and tradeoffs must be made between load balancing and conserving resources.
Packet-switched networks may be subdivided into two general categories: virtual circuit networks and datagram networks. In a virtual circuit network (e.g., an Asynchronous Transfer Mode, or “ATM” network), a connection is set up before data transmission starts and is torn down after data transmission is completed. All packets for a connection are transmitted in sequence from the source to the destination. In such a network, state information is maintained at every node along the path of the virtual circuit. This state information can be made available to admission control, resource reservation, and Quality of Service (“QoS”) routing components of the network. In contrast, in a connectionless datagram network such as the Internet, all information needed to deliver a packet is carried in the packet header, and individual forwarding decisions are made as packets arrive in a switch or a router. No connection state information is maintained inside the network.
The concept of a connection still exists on the end-systems, but there is no explicit connection setup and tear down inside the network. It is possible that different packets of the same connection may be sent over different paths from the source to the destination.
Without state information inside the network, it is difficult for a network to provide end-to-end QoS guarantees. In order to provide these guarantees, the current trend is to maintain per-session state information or the aggregation of this state information in the network. State information can be maintained in two ways. Soft state refers to state that is periodically refreshed by the reservation protocol deployed in the network, so, when lost, the state will be automatically reinstated. For example, in the RSVP protocol mentioned earlier, two kinds of soft state information are maintained: path state and reservation state. Each data source periodically sends a path message that establishes or updates the path state, while each receiver periodically sends a reservation message that establishes or updates the reservation state. In contrast, hard state, such as the state maintained in an ATM network, is established at connection setup time and is maintained for the lifetime of the connection. Researchers have suggested maintaining per-flow state only at the edge of a network, while the network core maintains only aggregated state information.
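The soft-state idea can be illustrated with a small table whose entries persist only while they keep being refreshed. The class `SoftStateTable`, the fixed `lifetime`, and the explicit `now` arguments are assumptions of this sketch; expiry by timeout is one common way to realize soft state and is not spelled out in the passage above.

```python
class SoftStateTable:
    """Per-session state that survives only while it is refreshed."""
    def __init__(self, lifetime):
        self.lifetime = lifetime   # seconds a refresh keeps an entry alive
        self.expires_at = {}       # session -> time at which it goes stale

    def refresh(self, session, now):
        """Install or renew state, as a periodic path or reservation
        message would."""
        self.expires_at[session] = now + self.lifetime

    def expire(self, now):
        """Drop and return every session whose state was not refreshed in time."""
        stale = [s for s, t in self.expires_at.items() if t <= now]
        for s in stale:
            del self.expires_at[s]
        return stale

table = SoftStateTable(lifetime=30)
table.refresh("session-1", now=0)
table.refresh("session-2", now=0)
table.refresh("session-1", now=20)   # only session-1 is refreshed again
dropped = table.expire(now=35)
```

Because the next periodic message simply calls `refresh` again, state that was dropped or lost is reinstated without any separate recovery procedure, in contrast to hard state that lives from connection setup to teardown.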
As is known to those of ordinary skill in the art, the next generation of the Internet Protocol (known as “IPv6”) provides enhancements over the capabilities of the existing Internet Protocol version 4 (“IPv4”) service. Specifically, IPv6 adds capabilities to label packets belonging to particular traffic “flows.” A flow in IPv6 is a sequence of packets sent from a particular source to a particular destination. For example, the 24 bit Flow Label field in the IPv6 header may be used by a source to label those packets for which it requests special handling by the IPv6 routers, such as non-default quality of service or “real-time” service. The nature of that special handling might be conveyed to the routers by a control protocol such as RSVP, or by information within the packets comprising the flow (e.g., in a hop-by-hop option).
As used herein, the term “flow” refers to the concept of a virtual circuit in ATM networks and of a flow in IP networks. Thus, a “flow” can be a hard-state virtual circuit as in an ATM network, a soft-state flow as defined in IPv6, or a stateless connection such as a TCP connection in an IP network. A flow (also called a “session”) may use either a single path or multiple paths. For traffic requiring resource guarantees, it is assumed that the traffic source specifies its traffic characteristics and desired performance guarantees. The network performs admission control to decide if the traffic should be admitted. Once a flow is admitted, resources along its path are reserved through a reservation protocol, such as RSVP.
For the purposes of the present invention, there are two different types of flows: guaranteed flows and best-effort flows. Guaranteed flows require QoS guarantees in terms of delay, delay jitter, and bandwidth. In contrast, best-effort flows require no firm QoS guarantees but expect the network to deliver packets to their destinations as quickly as possible. For guaranteed flows, there are two different types of QoS requirements: bandwidth guarantees and delay guarantees. For best-effort flows, there are two subclasses: high-bandwidth traffic and low latency traffic.
Best-effort traffic has been and will continue to be the main traffic class in data networks for the foreseeable future. However, with the increasing data sizes used by applications and the increased network capacity, large data sets from a few megabytes to several gigabytes are often transmitted over the network. In contrast to traditional best-effort traffic, such as electronic mail, telnet, and small file transfers, this new class of applications can send data at a high burst rate and can make use of the high bandwidth available in the network. This new class of traffic is known as high-bandwidth traffic; traditional best-effort traffic is referred to as low-latency traffic.
Ideally, both high-bandwidth and low-latency traffic should be transmitted in the minimum possible time, but the factors that affect the elapsed time are different for each class of traffic. A low-latency message may consist of only a few packets, and the packets should be sent to their destination with the minimum per-packet end-to-end delay to achieve the minimum elapsed time. For high-bandwidth traffic, however, a message can consist of hundreds or thousands of packets. When sending these packets to their destinations, the available bandwidth on the path will dominate the elapsed time. Given the different traffic characteristics and the different dominating factors determining the performance of low-latency traffic and high-bandwidth traffic, the network should provide explicit support to optimize user satisfaction as well as network resource utilization efficiency for these two classes of traffic.
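The difference in dominating factors can be made concrete with a first-order estimate. The Python sketch below assumes that elapsed time is roughly one end-to-end delay plus the time needed to push all bits through the available bandwidth; this simple model, the function names, and the sample figures are illustrative assumptions rather than part of the disclosure.

```python
def elapsed_time(message_bits, per_packet_delay_s, available_bps):
    """First-order estimate: one path delay plus the time to transmit all bits."""
    return per_packet_delay_s + message_bits / available_bps


def bandwidth_share(message_bits, per_packet_delay_s, available_bps):
    """Fraction of the elapsed time spent on transmission rather than delay."""
    transmit = message_bits / available_bps
    return transmit / (per_packet_delay_s + transmit)


# A 1 KB message and a 1 GB transfer over the same 1 Mbit/s, 50 ms path.
small = bandwidth_share(8_000, 0.05, 1_000_000)
large = bandwidth_share(8_000_000_000, 0.05, 1_000_000)
```

Under these assumed figures, delay accounts for most of the elapsed time of the small message, while available bandwidth accounts for nearly all of the elapsed time of the large transfer.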
Among guaranteed traffic flows, a distinction may be made between those that require only bandwidth guarantees and those that also require delay guarantees. For a flow requiring bandwidth guarantees, the requested bandwidth may either be the minimum bandwidth or the average bandwidth. However, as is known to those of ordinary skill in the art, for a flow requiring delay guarantees, the QoS requirements also include delay, delay jitter, and loss. To provide such QoS guarantees, the network needs to reserve a certain amount of bandwidth at each node on the path from the source to the destination. In many cases, much more bandwidth than the average traffic rate may have to be reserved if the requested delay bound is tight and the traffic source is bursty. In this case, the reserved but unused bandwidth may be used opportunistically by other flows in the network. Ensuring that the link share for each flow corresponds to its requested QoS is the responsibility of specialized scheduling algorithms known to those of ordinary skill in the art, which are not discussed herein.
Having described various contexts in which aspects of the present invention may be implemented, one simplified embodiment of the invention is now described.
For the sake of simplicity, data flows through the network are shown as unidirectional arrows. In the simplified example shown in
As is well known to those of ordinary skill in the art, there are many different possible routes from each source S1-S4 to each destination D1-D2. Determining the optimal route from each source to each destination, based on various factors known to those of ordinary skill in the art, is the quintessential function performed by routing algorithms, such as the various routing algorithms that have been described in this document. The specific type or types of routing algorithms used in network 700 are not important in the context of the present invention. The only requirement is that some sort of routing algorithm is implemented, which can be used to determine the route from each source to each destination. Moreover, as mentioned earlier, these routes can be dynamically recalculated based on various factors known to those of ordinary skill in the art.
As an example, assume that, at any given point in time, whatever routing algorithms are being used in network 700 shown in
<S1, D1>: R1, R3, R8
<S2, D1>: R2, R3, R8
<S3, D1>: R2, R4, R3, R8
<S4, D1>: R2, R1, R3, R8
<S2, D2>: R2, R4, R6, R7
According to aspects of the present invention, each of the above explicit routes can be assigned a “global path identifier” (or “global path ID”) by performing a bit-wise Exclusive-Or (“XOR”) function of all the labels of the network segments along the route. Thus, the global path ID for <S1, D1> would be calculated by calculating the bit-wise XOR function of the network segments along the route defined by “R1, R3, R8,” as follows:
Global path ID<S1, D1> = 0001 XOR 0011 XOR 1000 = 1010
Similarly, the global path ID's for the remaining routes defined above are calculated as follows:
Global path ID<S2, D1> = 0010 XOR 0011 XOR 1000 = 1001
Global path ID<S3, D1> = 0010 XOR 0100 XOR 0011 XOR 1000 = 1101
Global path ID<S4, D1> = 0010 XOR 0001 XOR 0011 XOR 1000 = 1000
Global path ID<S2, D2> = 0010 XOR 0100 XOR 0110 XOR 0111 = 0111
Thus, even from this simplified example, it can be seen that global path ID's that are unique with a high degree of probability can be assigned to each route. As mentioned earlier, a practical implementation would use the IP address of each appropriate hop along the route instead of the simple 4-bit binary numbers used in the above example. As those of ordinary skill in the art will recognize, this approach can be used with binary numbers having any arbitrary number of bits.
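The calculations above can be reproduced with a short Python sketch. The 4-bit labels and the routes are those of the simplified example; the names `LABELS`, `ROUTES`, and `global_path_id` are introduced here for illustration only.

```python
from functools import reduce
from operator import xor

# 4-bit labels of the network segments used in the simplified example.
LABELS = {
    "R1": 0b0001, "R2": 0b0010, "R3": 0b0011, "R4": 0b0100,
    "R6": 0b0110, "R7": 0b0111, "R8": 0b1000,
}

# Explicit routes of the simplified example, keyed by (source, destination).
ROUTES = {
    ("S1", "D1"): ["R1", "R3", "R8"],
    ("S2", "D1"): ["R2", "R3", "R8"],
    ("S3", "D1"): ["R2", "R4", "R3", "R8"],
    ("S4", "D1"): ["R2", "R1", "R3", "R8"],
    ("S2", "D2"): ["R2", "R4", "R6", "R7"],
}


def global_path_id(route):
    """Bit-wise XOR of the labels of all segments along the route."""
    return reduce(xor, (LABELS[hop] for hop in route), 0)


for pair, route in ROUTES.items():
    print(pair, format(global_path_id(route), "04b"))
```

Because XOR is commutative, two routes that visit the same set of hops in a different order would receive the same ID, which is one reason uniqueness holds only with a high degree of probability.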
According to aspects of the present invention, each ingress router maintains an Explicit Route Table (“ERT”) which, given a source/destination pair, provides an ordered list of the network devices along the corresponding explicit route and a global path ID that is assigned to that route. An exemplary ERT is shown in
According to aspects of the present invention, in one embodiment, each core router in the network maintains an Explicit Forwarding Information Base (“EFIB”) instead of an ERT.
As those of ordinary skill in the art will recognize, due to mathematical properties of the XOR function, this calculation guarantees that whenever routes merge in the network, the global path ID's associated with the routes will also merge. In accordance with these properties, the outgoing global path ID can be easily obtained simply by performing a bit-wise XOR function of the incoming global path ID with the identifying number of the current node (e.g., the IP address of the appropriate port on the current router, in a practical implementation).
Thus, using the global path ID as defined above, once two routes merge, only a single entry in the EFIB is needed for all of the routes that merge at a given node. This is significant because the number of entries required in the EFIB is thus greatly reduced, resulting in a scalable solution. According to the present invention, the size of the EFIB is proportional to the number of different routes used, and is independent of the number of flows in the network.
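The merging property can be checked on the five routes of the simplified example. The Python sketch below assumes that every hop, including the first, XORs its own label into the incoming ID; the helper `hop_ids` and the representation of an EFIB entry as a (node, incoming ID) pair are illustrative assumptions.

```python
from functools import reduce
from operator import xor

# The five example routes, written as the 4-bit labels of their hops.
ROUTES = [
    [0b0001, 0b0011, 0b1000],           # R1, R3, R8
    [0b0010, 0b0011, 0b1000],           # R2, R3, R8
    [0b0010, 0b0100, 0b0011, 0b1000],   # R2, R4, R3, R8
    [0b0010, 0b0001, 0b0011, 0b1000],   # R2, R1, R3, R8
    [0b0010, 0b0100, 0b0110, 0b0111],   # R2, R4, R6, R7
]


def hop_ids(route):
    """Yield (node, incoming ID, outgoing ID) for each hop of a route."""
    incoming = reduce(xor, route, 0)    # global path ID inserted at the ingress
    for node in route:
        outgoing = incoming ^ node      # XOR with the current node's label
        yield node, incoming, outgoing
        incoming = outgoing


# One EFIB entry is needed per distinct (node, incoming ID) pair.
efib_keys = {(node, incoming)
             for route in ROUTES
             for node, incoming, _ in hop_ids(route)}
total_hops = sum(len(route) for route in ROUTES)
```

The incoming ID at a node equals the XOR of the labels of that node and all hops after it, so routes sharing a suffix share their IDs from the merge point onward. In this example, all four routes through R3 arrive there with incoming ID 1011 and need a single entry.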
As an example,
Before packets can be forwarded according to aspects of the present invention, the ERT and EFIB tables must be set up.
At step 1030, for each explicit route in the ERT of an ingress router, a path object is transmitted downstream to the next hop along the explicit route, using either the reliable forwarding mode or the unreliable mode. In the former case, the path objects are forwarded to the next hop using reliable transmission (e.g., TCP), which is similar to link state update techniques used in the BGP protocol, as is known to those of ordinary skill in the art. In one embodiment, path objects are maintained as soft state, and may be transmitted periodically from an ingress node such as an edge router in case table entries have been flushed out and lost for any reason.
As mentioned earlier, according to aspects of the present invention, core routers in the network maintain an Explicit FIB (“EFIB”), which is indexed by an incoming global path ID. This EFIB may be implemented as either a separate routing table (analogous to using a TIB in tag switching approaches known to those of ordinary skill in the art), or as an extension of the existing FIB in typical routers.
At step 1130, if the EFIB entry does not already exist, then the corresponding outgoing global path ID is calculated by performing a bit-wise XOR function of the incoming global path ID with the ID of the receiving router (e.g., the IP address of the corresponding port on the receiving router), and the next hop is determined based on the explicit route information contained in the received path object. At step 1140, the resulting EFIB entry information is written into the EFIB, and the path object is forwarded to the next hop along the explicit route, if necessary, at step 1150.
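Steps 1130 through 1150 can be sketched in Python as follows. The layout of the path object (an ID plus the remaining explicit route, receiving router first), the class names, and the choice to stop forwarding when an entry already exists are assumptions made for this sketch; the disclosure does not fix these details.

```python
from dataclasses import dataclass


@dataclass
class PathObject:
    path_id: int    # global path ID as seen by the receiving router
    route: list     # labels of the hops still to visit, receiving router first


class CoreRouter:
    def __init__(self, label):
        self.label = label
        self.efib = {}   # incoming path ID -> (outgoing path ID, next-hop label or None)

    def receive(self, obj):
        """Process a path object; return the object to forward, or None."""
        if obj.path_id in self.efib:
            return None                       # assumed: nothing to do if the entry exists
        if not obj.route or obj.route[0] != self.label:
            raise ValueError("path object delivered to the wrong router")
        outgoing = obj.path_id ^ self.label   # step 1130: XOR with this router's ID
        rest = obj.route[1:]
        next_hop = rest[0] if rest else None
        self.efib[obj.path_id] = (outgoing, next_hop)         # step 1140
        return PathObject(outgoing, rest) if rest else None   # step 1150


def install(routers, path_id, route):
    """Pass a path object hop by hop until no router forwards it further."""
    obj = PathObject(path_id, list(route))
    while obj is not None:
        obj = routers[obj.route[0]].receive(obj)
```

Installing the route R1, R3, R8 (ID 1010) and then R2, R3, R8 (ID 1001) in this sketch leaves a single entry at R3 and a single entry at R8, since the second path object finds the entry for incoming ID 1011 already present at R3.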
At step 1240, if a match is found in the ERT, then the appropriate global path ID corresponding to the <source, destination> pair is retrieved from the ERT, along with information identifying the next hop along the appropriate explicit route. At step 1250, the global path ID is inserted into the packet as the outgoing global path ID, and at step 1260, the packet is forwarded to the next hop along the explicit route based on the next hop information retrieved from the ERT.
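The ERT lookup of steps 1240 through 1260 can be sketched in Python as follows. The dictionary layout of the ERT, the packet representation, and the fallback to default routing when no match is found are assumptions made for illustration; the entries reuse two routes and IDs from the simplified example.

```python
# Explicit Route Table keyed by (source, destination); layout is assumed.
ERT = {
    ("S1", "D1"): {"route": ["R1", "R3", "R8"], "path_id": 0b1010},
    ("S2", "D2"): {"route": ["R2", "R4", "R6", "R7"], "path_id": 0b0111},
}


def forward_at_ingress(packet, ert):
    """Tag the packet with its global path ID and return the next hop.

    Returns None when the ERT has no entry for the packet, in which case the
    packet is assumed to follow the default route.
    """
    entry = ert.get((packet["src"], packet["dst"]))   # step 1240: ERT lookup
    if entry is None:
        return None
    packet["path_id"] = entry["path_id"]              # step 1250: insert the ID
    return entry["route"][0]                          # step 1260: next hop


packet = {"src": "S1", "dst": "D1", "payload": b"data"}
next_hop = forward_at_ingress(packet, ERT)
```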
As those of ordinary skill in the art will recognize, a benefit of the present invention is that, unlike other routing techniques, explicit route tear down is not required. To support rerouting, a new path object is simply sent downstream to modify the entries in the EFIB, and a new global path ID is calculated. Packets using the new route are then inserted into the network with the new global path ID.
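The following Python sketch illustrates why rerouting need not disturb existing entries. It considers a hypothetical reroute of <S2, D1> from R2, R3, R8 onto R2, R4, R3, R8 (the route listed for <S3, D1> in the simplified example) and computes which EFIB entries the new route adds. The helper `efib_keys` and the (node, incoming ID) representation are assumptions introduced for this example.

```python
from functools import reduce
from operator import xor


def efib_keys(route):
    """Set of (node, incoming path ID) pairs a route needs along its hops."""
    incoming = reduce(xor, route, 0)   # global path ID of the whole route
    keys = set()
    for node in route:
        keys.add((node, incoming))
        incoming ^= node               # outgoing ID becomes the next incoming ID
    return keys


old_route = [0b0010, 0b0011, 0b1000]            # R2, R3, R8         (ID 1001)
new_route = [0b0010, 0b0100, 0b0011, 0b1000]    # R2, R4, R3, R8     (ID 1101)

# Entries that the new path object must create; all others already exist.
added = efib_keys(new_route) - efib_keys(old_route)
```

In this sketch the new route adds entries only at R2 and R4. The entries at R3 and R8 are shared with the old route, and the old entry at R2 is keyed by a different incoming ID, so it does not conflict with the new one.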
As described herein, the present invention discloses a novel approach to support dynamic selection of explicit routes and dynamic rerouting in the Internet. The proposed technique is simple and scalable. It can support both flow-based and class-based dynamic traffic engineering. The technique is independent of existing routing protocols, but uses the route selected by existing routing protocols as default routes. Hence, it can be used in both intra-domain (e.g., OSPF, RIP, and EIGRP) and inter-domain (e.g., BGP) routing. Dynamic rerouting typically implies high implementation complexity and operational overhead. The proposed technique makes rerouting simple and relatively inexpensive.
As those of ordinary skill in the art will recognize, the techniques proposed in the present invention significantly reduce the complexity of traffic engineering as currently proposed in the IETF. Compared with other approaches known to those of ordinary skill in the art, the new techniques according to the present invention are independent of the underlying routing protocol, scalable (i.e., they require a small EFIB size and support flow aggregation), and dynamic (i.e., they can support both class-based and flow-based explicit routes).
The techniques described herein according to aspects of the present invention may be implemented in routers or in any device having a plurality of output interfaces that forwards incoming data to one or more of these output interfaces. As is known to those of ordinary skill in the art, the program code which may be required to implement aspects of the present invention may all be stored on a computer-readable medium. Depending on each particular implementation, computer-readable media suitable for this purpose may include, without limitation, floppy diskettes, hard drives, network drives, RAM, ROM, EEPROM, nonvolatile RAM, or flash memory.
While embodiments and applications of this invention have been shown and described, it would be apparent to those of ordinary skill in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.
|U.S. Classification||370/395.32, 370/392, 370/401|
|International Classification||H04L12/66, H04L12/56|
|Cooperative Classification||H04L47/31, H04L47/2408, H04L47/17, H04L45/54, H04L45/04, H04L45/34, H04L47/125, H04L47/10|
|European Classification||H04L45/54, H04L47/17, H04L45/04, H04L47/12B, H04L47/31, H04L47/10, H04L47/24A, H04L45/34|
|Oct 4, 2004||AS||Assignment|
Owner name: CISCO TECHNOLOGY, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MA, QINGMING;REEL/FRAME:015880/0650
Effective date: 20000809
|May 18, 2012||FPAY||Fee payment|
Year of fee payment: 4
|May 18, 2016||FPAY||Fee payment|
Year of fee payment: 8