|Publication number||US20050243829 A1|
|Application number||US 10/534,346|
|Publication date||Nov 3, 2005|
|Filing date||Nov 11, 2003|
|Priority date||Nov 11, 2002|
|Also published as||CN1735878A, CN1736066A, CN1736066B, CN1736068A, CN1736068B, CN1736069A, CN1736069B, CN100557594C, US7522605, US7843951, US7882312, US8472457, US20050246452, US20050257025, US20050265368, US20110069716, WO2004044733A2, WO2004044733A3, WO2004045160A2, WO2004045160A3, WO2004045160A8, WO2004045161A1, WO2004045162A2, WO2004045162A3|
|Publication number||10534346, 534346, PCT/2003/4893, PCT/GB/2003/004893, PCT/GB/2003/04893, PCT/GB/3/004893, PCT/GB/3/04893, PCT/GB2003/004893, PCT/GB2003/04893, PCT/GB2003004893, PCT/GB200304893, PCT/GB3/004893, PCT/GB3/04893, PCT/GB3004893, PCT/GB304893, US 2005/0243829 A1, US 2005/243829 A1, US 20050243829 A1, US 20050243829A1, US 2005243829 A1, US 2005243829A1, US-A1-20050243829, US-A1-2005243829, US2005/0243829A1, US2005/243829A1, US20050243829 A1, US20050243829A1, US2005243829 A1, US2005243829A1|
|Original Assignee||Clearspeed Technology Plc|
|Export Citation||BiBTeX, EndNote, RefMan|
|Patent Citations (50), Referenced by (12), Classifications (26), Legal Events (3)|
|External Links: USPTO, USPTO Assignment, Espacenet|
The present invention concerns the management of traffic, such as data and communications traffic, and provides an architecture for a traffic manager that surpasses known traffic management schemes in terms of speed, efficiency and reliability.
The problem that modern traffic management schemes have to contend with is the sheer volume of traffic. Data arrives at a traffic handler from multiple sources at unknown rates and volumes and has to be received, sorted and passed on “on the fly” to the next handling stage downstream. Received data may be associated with a number of attributes by which priority allocation, for example, is applied to individual data packets or streams, depending on the class of service offered to an individual client. Some traffic may therefore have to be queued whilst later-arriving but higher-priority traffic is processed. A router's switch fabric can deliver packets from multiple ingress ports to one of a number of egress ports. The linecard connected to this egress port must then transmit these packets over some communication medium to the next router in the network. The rate of transmission is normally limited to a standard rate. For instance, an OC-768 link would transmit packets over an optical fibre at a rate of 40 Gbits/s.
With many independent ingress paths delivering packets for transmission at egress, the time-averaged rate of delivery cannot exceed 40 Gbits/s for this example. Although over time the input and output rates are equivalent, the short term delivery of traffic by the fabric is “bursty” in nature, with rates often peaking above the 40 Gbits/s threshold. Since the rate of receipt can be greater than the rate of transmission, short term packet queueing is required at egress to prevent packet loss. A simple FIFO queue is adequate for this purpose for routers which provide a flat grade of service to all packets. However, more complex schemes are required in routers which provide Traffic Management. In a converged internetwork, different end user applications require different grades of service in order to run effectively. Email can be carried on a best effort service where no guarantees are made regarding rate of or delay in delivery. Real-time voice data has a much more demanding requirement for reserved transmission bandwidth and guaranteed minimum delay in delivery. This cannot be achieved if all traffic is buffered in the same FIFO queue. A queue per so-called “Class of Service” is required so that traffic routed through higher priority queues can bypass that in lower priority queues. Certain queues may also be assured a guaranteed portion of the available output line bandwidth. At first sight the traffic handling task appears to be straightforward. Packets are placed in queues according to their required class of service. For every forwarding treatment that a system provides, a queue must be implemented. These queues are then managed by the following mechanisms:
Different service levels can be provided by weighting the amount of bandwidth and buffer space allocated to different queues, and by prioritised packet dropping in times of congestion. Weighted Fair Queuing (WFQ), Deficit Round Robin (DRR) scheduling and Weighted Random Early Detect (WRED) are just a few of the many algorithms which might be employed to perform these scheduling and congestion avoidance tasks. In reality, system realisation is confounded by some difficult implementation issues:
Priority queue ordering for some fair queuing (FQ) scheduling algorithms is a non-trivial problem at high speeds.
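One of the scheduling algorithms named above, Deficit Round Robin, can be illustrated with a minimal Python sketch. The queue names and quantum values here are illustrative assumptions, not taken from the patent:

```python
from collections import deque

class DRRScheduler:
    """Minimal Deficit Round Robin: each queue receives a quantum of
    bytes per round, and a packet is transmitted only when the queue's
    accumulated deficit counter covers the packet's length."""
    def __init__(self, quanta):
        self.queues = {name: deque() for name in quanta}
        self.quanta = quanta                        # bytes credited per round
        self.deficit = {name: 0 for name in quanta}

    def enqueue(self, queue_name, packet_len):
        self.queues[queue_name].append(packet_len)

    def round(self):
        """Run one service round; returns list of (queue, packet_len) sent."""
        sent = []
        for name, q in self.queues.items():
            if not q:
                self.deficit[name] = 0              # idle queues accrue no credit
                continue
            self.deficit[name] += self.quanta[name]
            while q and q[0] <= self.deficit[name]:
                pkt = q.popleft()
                self.deficit[name] -= pkt
                sent.append((name, pkt))
        return sent
```

Per-queue quanta give each class a weighted share of the line: a class with twice the quantum drains roughly twice the bytes per round.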
In a conventional approach to traffic scheduling, one might typically place packets directly into an appropriate queue on arrival, and then subsequently dequeue packets from those queues into an output stream.
The traffic scheduler 3 determines the order of de-queuing. Since the scheduling decision becomes processing-intensive as the number of input queues increases, queues are often arranged into small groups which are locally scheduled into an intermediate output queue.
This output queue is then the input queue to a following scheduling stage. The scheduling problem is thus simplified using a “divide-and-conquer” approach, whereby high performance can be achieved through parallelism between groups of queues in a tree type structure, or so-called hierarchical link sharing scheme.
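The hierarchical link-sharing tree described above can be sketched as scheduling nodes whose children are either FIFO queues or further nodes. Strict-priority selection at each node is an illustrative assumption; any local scheduling discipline could sit at a node:

```python
from collections import deque

class PriorityNode:
    """A scheduling stage: serves its children in strict priority order.
    A child is either a FIFO queue (deque) or another PriorityNode,
    so stages compose into a tree (hierarchical link sharing)."""
    def __init__(self, children):
        self.children = children        # ordered highest-priority first

    def dequeue(self):
        for child in self.children:
            if isinstance(child, PriorityNode):
                pkt = child.dequeue()   # recurse into the sub-tree
                if pkt is not None:
                    return pkt
            elif child:                 # a non-empty leaf queue
                return child.popleft()
        return None

# leaf queues are grouped locally, then fed to a root scheduling stage
voice, video = deque(), deque()
email, bulk = deque(), deque()
root = PriorityNode([PriorityNode([voice, video]),
                     PriorityNode([email, bulk])])
```

The divide-and-conquer benefit is that each node only arbitrates among a handful of children, so the per-stage decision stays cheap even with many leaf queues.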
This approach works in hardware up to a point. For the exceptionally large numbers of input queues (of the order 64 k) required for per-flow traffic handling, the first stage becomes unmanageably wide to a point that it becomes impractical to implement the required number of schedulers.
Alternatively, in systems which aggregate all traffic into a small number of queues parallelism between hardware schedulers cannot be exploited. It then becomes extremely difficult to implement a single scheduler—even in optimised hardware—that can meet the required performance point.
With other congestion avoidance and queue management tasks to perform in addition to scheduling, it is apparent that a new approach to traffic handling is required. The “queue first, think later” strategy often fails and data simply has to be jettisoned. There is therefore a need for an approach to traffic management that does not suffer from the same defects as the prior art and does not introduce its own fallibilities.
In one aspect, the invention provides a system comprising means for sorting incoming data packets in real time before said packets are stored in memory.
In another aspect, the invention provides a data packet handling system, comprising means whereby incoming data packets are assigned an exit order before being stored in memory.
In yet another aspect, the invention provides a method for sorting incoming data packets in real time, comprising sorting the packets into an exit order before storing them in memory.
The sorting means may be responsive to information contained within a packet and/or within a table and/or information associated with a data packet stream in which said packet is located, whereby to determine an exit order number for that packet. The packets may be inserted into one or more queues by a queue manager adapted to insert packets into the queue means in exit order. There may be means to drop certain packets before being output from said queue means or before being queued in the queue means.
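The “assign an exit order before storage” idea can be sketched as follows. The ordering key (class priority first, arrival order within a class) is an illustrative assumption; the patent does not fix a particular ordering function:

```python
import heapq
import itertools

class ExitOrderSorter:
    """Assigns each arriving packet an exit-order key at enqueue time,
    before its data is stored, so the dequeue order is already decided
    when the packet enters the system."""
    def __init__(self, class_priority):
        self.class_priority = class_priority   # lower value exits earlier
        self.arrival = itertools.count()       # tie-break within a class
        self.heap = []

    def enqueue(self, packet_id, cos):
        # the exit-order number is derived from packet/table information
        key = (self.class_priority[cos], next(self.arrival))
        heapq.heappush(self.heap, (key, packet_id))

    def dequeue(self):
        return heapq.heappop(self.heap)[1] if self.heap else None
```

A drop decision could be taken at the same point, before the packet is ever queued, by rejecting the enqueue when the heap exceeds a threshold.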
The system may be such that the sorting means and the queue means process only packet records containing information about the packets, whereas data portions of the packets are stored in the memory for output in accordance with an exit order determined for the corresponding packet record.
The sorting means preferably comprises a parallel processor, such as an array processor, more preferably a SIMD processor.
There may be further means to provide access for the parallel processors to shared state. A state engine may control access to the shared state.
Tables of information for sorting said packets or said packet records may be provided, wherein said tables are stored locally to each processor or to each processor element of a parallel processor. The tables may be the same on each processor or on each processor element of a parallel processor. The tables may be different on different processors or on different processor elements of a parallel processor.
The processors or processor elements may share information from their respective tables, such that: (a) the information held in the table for one processor is directly accessible by a different processor or the information held in the table in one processor element may be accessible by other processing element(s) of the processor; and (b) processors may have access to tables in other processors or processor elements have access to other processor elements in the processor, whereby processors or processor elements can perform table lookups on behalf of other processor(s) or processor elements of the processor.
The invention also encompasses a computer system, comprising a data handling system as previously specified; a network processing system, comprising a data handling system as previously specified; and a data carrier containing program means adapted to perform a corresponding method.
The invention will be described with reference to the following drawings, in which:
The present invention turns current thinking on its head.
Packet data (traffic) received at the input 20 has the header portions stripped off and record portions of fixed length generated therefrom, containing information about the data, so that the record portions and the data portions can be handled separately. Thus, the data portions take the lower path and are stored in Memory Hub 21. At this stage, no attempt is made to organise the data portions in any particular order. However, the record portions are passed to a processor 22, such as a SIMD parallel processor, comprising one or more arrays of processor elements (PEs). Typically, each PE contains its own processor unit, local memory and register(s).
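The record/data split described above can be sketched in Python. The record fields and the use of a dictionary as the memory hub are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PacketRecord:
    """Fixed-size record carrying only what the schedulers need;
    the payload itself goes to bulk memory untouched and unordered."""
    packet_id: int
    length: int
    cos: int
    data_handle: int   # pointer into the memory hub

memory_hub = {}        # stands in for the bulk packet memory

def ingest(packet_id, header, payload):
    # store the data portion with no ordering; keep only a handle to it
    handle = id(payload)
    memory_hub[handle] = payload
    # build the fixed-length record handed to the PE array
    return PacketRecord(packet_id, len(payload), header["cos"], handle)
```

Because the PE array sees only fixed-length records, its workload is independent of payload size, which is what makes wire-speed processing of the control path feasible.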
In contrast to the prior architecture outlined in
The record portions are handled in the processor 22. Here, information about the incoming packets is distributed amongst the PEs in the array. This array basically performs the same function as the processor 3 in the prior art (
Previous systems in which header and data portions were treated as one entity became unwieldy, slow and cumbersome because of the innate difficulty of preserving the integrity of the whole packet yet still providing enough bandwidth to handle the combination. In the present invention, it is only necessary for the Memory Hub 21 to provide sufficient bandwidth to handle just the data portions. The memory hub can handle packets streaming in at real time. The memory hub can nevertheless divide larger data portions into fragments, if necessary, and store them in physically different locations, provided, of course, there are pointers to the different fragments to ensure read out of the entire content of such data packets.
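Fragmented storage with pointers, as described, might look like the following sketch. The fragment size and the flat address space are illustrative assumptions about the memory hub:

```python
FRAGMENT_SIZE = 64   # illustrative; a real hub would use a hardware block size

store = {}           # address -> fragment
next_addr = 0

def write_fragmented(data):
    """Split a data portion into fragments stored at possibly
    non-contiguous addresses; return the pointer list needed to
    read the whole content back."""
    global next_addr
    pointers = []
    for i in range(0, len(data), FRAGMENT_SIZE):
        store[next_addr] = data[i:i + FRAGMENT_SIZE]
        pointers.append(next_addr)
        next_addr += 1
    return pointers

def read_fragmented(pointers):
    """Follow the pointers to reassemble the entire data portion."""
    return b"".join(store[p] for p in pointers)
```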
In order to overcome the problem of sharing state over all the PEs in the array, multiple PEs are permitted to access (and modify) the state variables. Such access is under the control of a State Engine (not shown), which automatically handles the “serialisation” problem of parallel access to shared state.
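The serialisation the State Engine performs in hardware can be approximated in software with a lock that admits one accessor at a time. Threads standing in for PEs, and the lock itself, are illustrative stand-ins for the hardware mechanism:

```python
import threading

class StateEngine:
    """Serialises read-modify-write access to shared state so that
    many PEs (threads here) never interleave their updates."""
    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}

    def update(self, key, fn, default=0):
        # the whole read-modify-write is performed as one atomic step
        with self._lock:
            self._state[key] = fn(self._state.get(key, default))
            return self._state[key]

engine = StateEngine()
workers = [threading.Thread(
               target=lambda: [engine.update("bytes", lambda v: v + 1)
                               for _ in range(1000)])
           for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Without the serialisation step, two PEs reading the same counter concurrently would both write back the same incremented value and one update would be lost.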
The output 25, in dependence on the exit order queue held in the Orderlist Manager 24, instructs the Memory Hub 21 to read out the corresponding packets in that required order, thereby releasing memory locations for newly received data packets in the process.
The chain-dotted line 26 enclosing the PE array 22, shared state/State Engine 23 and Orderlist Manager 24 signifies that this combination of elements can be placed on a single chip and that this chip can be replicated, so that there may be one or two (or more) chips interfacing with single input 20, output 25 and Memory Hub 21. As is customary, the chip will also include necessary additional components, such as a distributor and a collector per PE array to distribute data to the individual PEs and to collect processed data from the PEs, plus semaphore block(s) and interface elements.
The following features are significant to the new architecture:
This technique is made possible by the deployment of a high performance data flow processor which can perform the required functions at wire speed. Applicant's array processor is ideal for this purpose, providing a large number of processing cycles per packet for packets arriving at rates as high as one every couple of system clock cycles.
Class of Service (CoS) Tables:
CoS parameters are used in scheduling and congestion avoidance calculations. They are conventionally read by processors as a fixed group of values from a class of service table in a shared memory. This places further demands on system bus and memory access bandwidth. The table size also limits the number of different classes of service which may be stored.
An intrinsic capability of Applicant's array processor is rapid, parallel local memory access. This can be used to advantage as follows:
Table sharing between PEs—PEs can perform proxy lookups on behalf of each other. A single CoS table can therefore be split across two PEs, thus halving the memory requirement.
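Splitting one CoS table across two PEs with proxy lookups can be sketched as below. The even/odd partitioning of table entries is an illustrative assumption:

```python
class PE:
    """Each PE holds half of the CoS table locally; a lookup for an
    entry in the other half is proxied to the partner PE."""
    def __init__(self, local_table):
        self.local = local_table
        self.partner = None

    def lookup(self, cos_id):
        if cos_id in self.local:
            return self.local[cos_id]
        # proxy lookup: the partner PE answers on our behalf
        return self.partner.local[cos_id]

full_table = {i: {"weight": 10 * i} for i in range(8)}
pe_a = PE({k: v for k, v in full_table.items() if k % 2 == 0})
pe_b = PE({k: v for k, v in full_table.items() if k % 2 == 1})
pe_a.partner, pe_b.partner = pe_b, pe_a
```

Each PE stores only half the entries, halving per-PE memory while every class of service remains reachable from either PE.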
It can thus be appreciated that the present invention is capable of providing the following key features, marking considerable improvements over the prior art:
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5187780 *||Apr 7, 1989||Feb 16, 1993||Digital Equipment Corporation||Dual-path computer interconnect system with zone manager for packet memory|
|US5513134 *||Feb 21, 1995||Apr 30, 1996||Gte Laboratories Incorporated||ATM shared memory switch with content addressing|
|US5633865 *||Jul 29, 1996||May 27, 1997||Netvantage||Apparatus for selectively transferring data packets between local area networks|
|US5751987 *||May 4, 1995||May 12, 1998||Texas Instruments Incorporated||Distributed processing memory chip with embedded logic having both data memory and broadcast memory|
|US5768275 *||Apr 15, 1996||Jun 16, 1998||Brooktree Corporation||Controller for ATM segmentation and reassembly|
|US5822608 *||Sep 6, 1994||Oct 13, 1998||International Business Machines Corporation||Associative parallel processing system|
|US5956340 *||Aug 5, 1997||Sep 21, 1999||Ramot University Authority For Applied Research And Industrial Development Ltd.||Space efficient fair queuing by stochastic Memory multiplexing|
|US6052375 *||Nov 26, 1997||Apr 18, 2000||International Business Machines Corporation||High speed internetworking traffic scaler and shaper|
|US6088771 *||Oct 24, 1997||Jul 11, 2000||Digital Equipment Corporation||Mechanism for reducing latency of memory barrier operations on a multiprocessor system|
|US6094715 *||Jun 7, 1995||Jul 25, 2000||International Business Machine Corporation||SIMD/MIMD processing synchronization|
|US6097403 *||Mar 2, 1998||Aug 1, 2000||Advanced Micro Devices, Inc.||Memory including logic for operating upon graphics primitives|
|US6160814 *||May 27, 1998||Dec 12, 2000||Texas Instruments Incorporated||Distributed shared-memory packet switch|
|US6314489 *||Jul 10, 1998||Nov 6, 2001||Nortel Networks Limited||Methods and systems for storing cell data using a bank of cell buffers|
|US6356546 *||Aug 11, 1998||Mar 12, 2002||Nortel Networks Limited||Universal transfer method and network with distributed switch|
|US6396843 *||Oct 30, 1998||May 28, 2002||Agere Systems Guardian Corp.||Method and apparatus for guaranteeing data transfer rates and delays in data packet networks using logarithmic calendar queues|
|US6643298 *||Nov 23, 1999||Nov 4, 2003||International Business Machines Corporation||Method and apparatus for MPEG-2 program ID re-mapping for multiplexing several programs into a single transport stream|
|US6662263 *||Mar 3, 2000||Dec 9, 2003||Multi Level Memory Technology||Sectorless flash memory architecture|
|US6829218 *||Sep 15, 1998||Dec 7, 2004||Lucent Technologies Inc.||High speed weighted fair queuing system for ATM switches|
|US6907041 *||Mar 7, 2000||Jun 14, 2005||Cisco Technology, Inc.||Communications interconnection network with distributed resequencing|
|US6993027 *||Mar 17, 2000||Jan 31, 2006||Broadcom Corporation||Method for sending a switch indicator to avoid out-of-ordering of frames in a network switch|
|US6996117 *||Sep 19, 2002||Feb 7, 2006||Bay Microsystems, Inc.||Vertical instruction and data processing in a network processor architecture|
|US7035212 *||Jan 25, 2001||Apr 25, 2006||Optim Networks||Method and apparatus for end to end forwarding architecture|
|US7126959 *||Jul 15, 2002||Oct 24, 2006||Tropic Networks Inc.||High-speed packet memory|
|US7342887 *||Jul 20, 2006||Mar 11, 2008||Juniper Networks, Inc.||Switching device|
|US7382787 *||Jun 20, 2002||Jun 3, 2008||Cisco Technology, Inc.||Packet routing and switching device|
|US7499456 *||Jan 9, 2007||Mar 3, 2009||Cisco Technology, Inc.||Multi-tiered virtual local area network (VLAN) domain mapping mechanism|
|US7522605 *||Nov 11, 2003||Apr 21, 2009||Clearspeed Technology Plc||Data packet handling in computer or communication systems|
|US20010021174 *||Mar 5, 2001||Sep 13, 2001||International Business Machines Corporation||Switching device and method for controlling the routing of data packets|
|US20010021967 *||May 17, 2001||Sep 13, 2001||Tetrick Raymond S.||Method and apparatus for arbitrating deferred read requests|
|US20010024446 *||Mar 21, 2001||Sep 27, 2001||Craig Robert George Alexander||System and method for adaptive, slot-mapping input/output queuing for TDM/TDMA systems|
|US20020031086 *||Feb 16, 2001||Mar 14, 2002||Welin Andrew M.||Systems, processes and integrated circuits for improved packet scheduling of media over packet|
|US20020036984 *||Jun 4, 2001||Mar 28, 2002||Fabio Chiussi||Method and apparatus for guaranteeing data transfer rates and enforcing conformance with traffic profiles in a packet network|
|US20020062415 *||Sep 19, 2001||May 23, 2002||Zarlink Semiconductor N.V. Inc.||Slotted memory access method|
|US20020064156 *||Apr 20, 2001||May 30, 2002||Cyriel Minkenberg||Switching arrangement and method|
|US20020075882 *||Jun 15, 2001||Jun 20, 2002||Marc Donis||Multiple priority buffering in a computer network|
|US20020118689 *||Sep 27, 2001||Aug 29, 2002||Luijten Ronald P.||Switching arrangement and method with separated output buffers|
|US20030081623 *||Oct 27, 2001||May 1, 2003||Amplify.Net, Inc.||Virtual queues in a single queue in the bandwidth management traffic-shaping cell|
|US20030174699 *||Jul 15, 2002||Sep 18, 2003||Van Asten Kizito Gysbertus Antonius||High-speed packet memory|
|US20030179644 *||Jun 21, 2002||Sep 25, 2003||Ali Anvar||Synchronous global controller for enhanced pipelining|
|US20030188056 *||Mar 27, 2002||Oct 2, 2003||Suresh Chemudupati||Method and apparatus for packet reformatting|
|US20030227925 *||Jan 21, 2003||Dec 11, 2003||Fujitsu Limited||Packet processing device|
|US20040022094 *||Feb 5, 2003||Feb 5, 2004||Sivakumar Radhakrishnan||Cache usage for concurrent multiple streams|
|US20040044815 *||Aug 28, 2002||Mar 4, 2004||Tan Loo Shing||Storage replacement|
|US20040117715 *||Nov 24, 2003||Jun 17, 2004||Sang-Hyuck Ha||Method and apparatus for controlling turbo decoder input|
|US20040213291 *||Dec 14, 2000||Oct 28, 2004||Beshai Maged E.||Compact segmentation of variable-size packet streams|
|US20050163049 *||Mar 16, 2005||Jul 28, 2005||Takeki Yazaki||Packet shaper|
|US20050167648 *||Mar 28, 2005||Aug 4, 2005||Chang-Hasnain Connie J.||Variable semiconductor all-optical buffer using slow light based on electromagnetically induced transparency|
|US20050243829 *||Nov 11, 2003||Nov 3, 2005||Clearspeed Technology Plc||Traffic management architecture|
|US20050265368 *||Nov 11, 2003||Dec 1, 2005||Anthony Spencer||Packet storage system for traffic handling|
|US20070171900 *||Apr 4, 2007||Jul 26, 2007||Beshai Maged E||Data Burst Scheduling|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7689879||May 9, 2006||Mar 30, 2010||Micron Technology, Inc.||System and method for on-board timing margin testing of memory modules|
|US7823024 *||Jul 24, 2007||Oct 26, 2010||Micron Technology, Inc.||Memory hub tester interface and method for use thereof|
|US7856543 *||Feb 14, 2002||Dec 21, 2010||Rambus Inc.||Data processing architectures for packet handling wherein batches of data packets of unpredictable size are distributed across processing elements arranged in a SIMD array operable to process different respective packet protocols at once while executing a single common instruction stream|
|US7913122||Dec 30, 2008||Mar 22, 2011||Round Rock Research, Llc||System and method for on-board diagnostics of memory modules|
|US7917727 *||May 23, 2007||Mar 29, 2011||Rambus, Inc.||Data processing architectures for packet handling using a SIMD array|
|US7958412||Feb 24, 2010||Jun 7, 2011||Round Rock Research, Llc||System and method for on-board timing margin testing of memory modules|
|US8127112 *||Dec 10, 2010||Feb 28, 2012||Rambus Inc.||SIMD array operable to process different respective packet protocols simultaneously while executing a single common instruction stream|
|US8208432 *||Sep 4, 2008||Jun 26, 2012||Hitachi Kokusai Electric Inc.||Communication equipment|
|US8472455 *||Jan 8, 2010||Jun 25, 2013||Nvidia Corporation||System and method for traversing a treelet-composed hierarchical structure|
|US8472457||Nov 29, 2010||Jun 25, 2013||Rambus Inc.||Method and apparatus for queuing variable size data packets in a communication system|
|US20050243829 *||Nov 11, 2003||Nov 3, 2005||Clearspeed Technology Plc||Traffic management architecture|
|US20110170557 *||Jul 14, 2011||Nvidia Corporation||System and Method for Traversing a Treelet-Composed Hierarchical Structure|
|International Classification||H04L12/54, H04L12/823, H04L12/875, H04L12/861, H04L12/869, H04L12/851, H04L12/863|
|Cooperative Classification||H04L47/562, H04L47/624, H04L49/90, H04L47/6215, H04L47/2441, H04L49/9042, H04L47/60, H04L47/32, H04L12/5693|
|European Classification||H04L12/56K, H04L47/62C, H04L47/56A, H04L47/60, H04L47/24D, H04L47/62E, H04L47/32, H04L49/90K, H04L49/90|
|Jul 15, 2005||AS||Assignment|
Owner name: CLEARSPEED TECHNOLOGY PLC, UNITED KINGDOM
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SPENCER, ANTHONY;REEL/FRAME:016267/0314
Effective date: 20050629
|Jun 24, 2010||AS||Assignment|
Owner name: CLEARSPEED TECHNOLOGY LIMITED,UNITED KINGDOM
Free format text: CHANGE OF NAME;ASSIGNOR:CLEARSPEED TECHNOLOGY PLC;REEL/FRAME:024576/0975
Effective date: 20090729
Owner name: CLEARSPEED TECHNOLOGY LIMITED, UNITED KINGDOM
Free format text: CHANGE OF NAME;ASSIGNOR:CLEARSPEED TECHNOLOGY PLC;REEL/FRAME:024576/0975
Effective date: 20090729
|Sep 10, 2010||AS||Assignment|
Owner name: RAMBUS INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CLEARSPEED TECHNOLOGY LTD;REEL/FRAME:024964/0861
Effective date: 20100818