WO2002033429A1 - Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines - Google Patents

Publication number
WO2002033429A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
message
data
interconnect structure
Prior art date
Application number
PCT/US2001/032334
Other languages
French (fr)
Inventor
Coke S. Reed
John E. Hesse
Original Assignee
Interactic Holdings, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interactic Holdings, Llc
Priority to EP01987883A (patent EP1261881A4)
Priority to JP2002536565A (patent JP3950048B2)
Priority to IL150282A (patent IL150282A)
Priority to AU2002224391A (patent AU2002224391A1)
Publication of WO2002033429A1
Priority to HK03106329.6A (patent HK1054267B)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/08 Locating faults in cables, transmission lines, or networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/04 Interdomain routing, e.g. hierarchical routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/06 Deflection routing, e.g. hot-potato routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/30 Routing of multiclass traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/15 Interconnection of switching modules

Definitions

  • the present invention relates to interconnection structures for computing and communication systems. More particularly the instant invention relates to a multiple level interconnection structure having a plurality of nodes wherein each node sends messages to other nodes and each node can accommodate a plurality of simultaneous inputs and can decide where to send messages using examination of nodes located at levels more than one level below the node sending a particular message.
  • the invention also provides a system in which latency is lower than in the prior art (described below) at the expense of a modest increase in the control logic.
  • the Internet, advanced computing systems such as massively parallel computers, and advanced telecommunications systems all require an interconnection structure that reduces control and logic circuits while providing low latency and high throughput.
  • the Reed Patent describes a network and interconnect structure which utilizes a data flow technique that is based on timing and positioning of messages communicating throughout the interconnect structure. Switching control is distributed throughout multiple nodes in the structure so that a supervisory controller providing a global control function and complex logic structures are avoided.
  • the interconnect structure operates as a "deflection" or "hot potato" system in which processing and storage overhead at each node is minimized. Elimination of a global controller and also of buffering at the nodes greatly reduces the amount of control and logic structures in the interconnect structure, simplifying overall control components and network interconnect components while improving throughput and low latency for message communication.
  • the Reed Patent describes a design in which processing and storage overhead at each node is greatly reduced by routing a message packet through an additional output port to a node at the same level in the interconnect structure rather than holding the packet
  • the interconnect structure includes a plurality of nodes and a plurality of interconnect lines selectively connecting the nodes in a multiple level structure in which the levels include a richly interconnected collection of rings, with the multiple level structure including a plurality of J+1 levels in a hierarchy of levels and a plurality of C·2^k nodes at each level (C is an integer representing the number of angles).
  • Control information is sent to resolve data transmission conflicts in the interconnect structure, where each node is a successor to a node on an adjacent outer level and an immediate successor to a node on the same level. Message data from an immediate predecessor has priority.
  • Control information is sent from nodes on a level to nodes on the adjacent outer level to warn of impending conflicts.
  • While the Reed Patent is a substantial advance over the prior art, it is essentially a "look one step ahead" system in which messages proceed through the interconnect structure based on the availability of an input port at a node, either at the same level as the message or at a lower level closer to the message's terminal destination. Nodes in the Reed Patent could be capable of receiving a plurality of simultaneous messages at the input ports of each node.
  • the Reed Patent did not teach that each node could take into account information from a level more than one level below the current level of the message, a capability that increases throughput and reduces latency in the network.
  • the interconnect structure using the scalable low-latency switch described in the Hesse Patent employs a method of achieving wormhole routing by a novel procedure for inserting messages into the network.
  • the scalable low-latency switch is made up of a large number of extremely simple control cells (nodes) which are arranged into arrays.
  • the number of nodes in an array is a design parameter typically in the range of 64 to 1024 and is usually a power of 2, with the arrays being arranged into levels and columns.
  • Each node has two data input ports and two data output ports, wherein the nodes can be formed into more complex designs, such as "paired-node" designs which are combined to form larger units.
  • In the Hesse Patent, messages are not simultaneously inserted into all the unblocked nodes on the outer cylinder of an array but are inserted simultaneously into two columns A and B of the array, only if an entire message fits between A and B.
  • This strategy advantageously prevents the first bit of one message from colliding with an interior bit of another message already in the switch. Therefore, contention between entire messages is addressed by resolving the contention between the first bit only of two contending messages with the desirable outcome that messages wormhole through many nodes in the interconnect structure.
  • While the Hesse Patent is certainly an improvement over the prior art, it is still essentially a "look one step ahead" system combined with wormhole routing. Additional improvements are possible to provide a low-latency, high-throughput interconnect structure, and this invention is directed to such improvements.
  • an interconnect structure comprises a plurality of nodes with a plurality of interconnect lines selectively coupling the nodes in a hierarchical multiple level structure.
  • the level of a node within the structure is determined by the position of the node in the structure in which data moves from a source level to a destination level or alternatively laterally along a level of the multiple level structure.
  • Data messages are transmitted through the multiple level structure from a source node to one of a plurality of designated destination nodes.
  • each node included within said plurality of nodes has a plurality of input ports and a plurality of output ports, each node capable of receiving simultaneous data messages at two or more of its input ports. It is a further feature of the invention that each node is capable of receiving simultaneous data messages if the node is able to transmit each of said received data messages through separate ones of its output ports to separate nodes in said interconnect structure.
  • a node in the interconnect structure can receive information regarding nodes more than one level below the node receiving the data messages.
  • FIGS. 1 and 2 illustrate part of the interconnection structure utilized in accordance with the present invention.
  • FIGS. 3A-3C illustrate alternate node connections in accordance with the present invention.
  • FIG. 4 illustrates three levels of an interconnect structure which is applicable for use with the present invention.
  • FIG. 5 illustrates an interconnect block diagram to show interconnection of various nodes within the interconnect structure of the present invention.
  • FIGS. 6A and 7 illustrate interconnection of control and message lines between various nodes.
  • FIGS. 6B and 6C illustrate interconnections between nodes in a portion of an interconnect structure and show data paths through one of the nodes; and FIG. 8 illustrates an alternative arrangement of cell nodes in accordance with one embodiment of the present invention.
  • the present invention incorporates by reference the interconnect structure set forth in U.S. Patent No. 5,996,020 (“the Reed Patent”), and U.S. Patent Application Serial No. 09/009,703, filed on January 20, 1998, (“the Hesse Patent”).
  • In the Reed Patent, nodes are arranged in a cylindrical formation, and in the Hesse Patent nodes are arranged in rows and columns. Both patents also describe various types of node configurations that can be used with the interconnect structure of the present invention. It is to be understood that all aspects of the Reed and Hesse patents, both in the interconnect structure and node configuration, are applicable to the present invention.
  • In FIG. 1 there is shown an interconnect structure such as was described in the Reed Patent.
  • Three nodes are illustrated in FIG. 1.
  • the two nodes A, 102 and B, 104 are positioned to send messages directly to a third node C, 106.
  • Nodes B and C are on a level N of the network and node A is on a level N+1 of the network.
  • node B has priority over node A to send data to node C.
  • node B sends a message MB to node C on path 114
  • node B sends a control signal 120 informing A of the sending of MB to C so that A does not send a message MA to C in a time period that would conflict with the message MB.
  • A will route MA to C on path 1 12. If either of these conditions does not hold, then A will send MA to a node (not shown) distinct from C, with that node being on level N+1 of the network.
  • nodes A and B are said to be at the same angle on difiei ent cylinders.
  • nodes A and B are said to be in the same column on diffeient levels.
  • Four nodes are illustrated in FIG. 2.
  • Nodes B, C, and D are on level N of the network and node A is on level N+1 of the network. All of the output ports of the network that can be reached from node B can also be reached from nodes C and D. There are output ports that can be reached from A that cannot be reached from C. For this reason, when a message travels from A to C the set of output ports that the message can reach is narrowed.
  • node C has the highest priority to send messages to node D as node C is on the same level as node D. For this reason, when only one message M arrives at node C in a given time period, that message M can always travel to node D, and there is a path from D to a targeted output port of M. Therefore, it is not necessary to have a buffer at node C, and by the same argument buffers are not used at any other nodes. In the Reed and Hesse patents, a message MA is not allowed to travel from A to C unless the logic associated with node A is informed that B will not send a conflicting message to C.
  • In FIG. 3A there is shown a portion of the interconnect structure taught in the Reed Patent.
  • In the Reed Patent, only one message could enter C during a particular time interval.
  • two simultaneous messages may be allowed to enter node C so that messages from A to C and from B to C are allowed to enter node C at the same time.
  • FIG. 3B illustrates a portion of the interconnect structure used in the Hesse Patent.
  • Data path 306 accepts a message from either A or B and can transmit only a single message to C.
  • the nodes of FIG. 3B can be modified as illustrated in FIG. 3C with an additional path 316 from node B to C so that both nodes A and B can send to C.
  • In FIG. 3B node A uses data paths 304 and 306 to send to C; in FIG. 3C node A uses paths 314 and 316 to send to C.
  • the Hesse Patent, as well as the Reed Patent, did not allow a particular node to accept two simultaneous messages, as is possible with the present invention.
  • the improvements of the present invention can, however, be readily applied to the Reed and Hesse configurations by changing the embodiment of FIG. 3B to the embodiment of FIG. 3C.
  • FIG. 4 illustrates a first embodiment of the present invention.
  • Node C is capable of sending data to node H.
  • node B sends a message MB to C and that message travels from C to H.
  • node A can send a message MA to C which will arrive at C simultaneously with the message MB.
  • Message MA can then travel from C to D in the same time period that MB travels from C to H.
  • the ability of a node to accept two messages at the same time is one advantage of the present invention, and is a novel improvement over the earlier Reed and Hesse patents.
  • the routing of messages by C can depend upon quality of service (QOS).
  • a part of the header contains quality of service information so that when MA and MB travel to C, then C will route MB to H and MA to D unless the QOS level of MA is higher than the QOS level of MB in which case, C will route MA to H and MB to D, as illustrated in FIG. 6C.
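The quality-of-service rule described above can be sketched in a few lines. This is a minimal illustration rather than the patent's circuit: the `Message` type, the integer `qos` field (a larger value is assumed to mean a higher QOS level), and the function name `route_at_c` are all hypothetical, and the sketch assumes that either D or H lies on a path to the target of each message.

```python
from typing import Dict, NamedTuple


class Message(NamedTuple):
    name: str
    qos: int  # hypothetical encoding: a larger value means a higher QOS level


def route_at_c(ma: Message, mb: Message) -> Dict[str, Message]:
    """Assign two messages arriving together at node C to its two outputs.

    ma arrives from node A (one level up), mb from node B (same level).
    By default mb drops to H and ma continues to D; the assignment is
    swapped only when ma has a strictly higher QOS level than mb.
    """
    if ma.qos > mb.qos:
        return {"H": ma, "D": mb}
    return {"H": mb, "D": ma}
```

With equal QOS levels the default applies, so MB takes the downward path to H and MA stays on the level and moves to D.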
  • a control signal 120 (FIG. 1) was sent to node A from B informing A whether or not A is blocked from sending a message to C. This blocking was guaranteed not to take place if B was not sending a message to C.
  • A was not allowed to send a message to C if, in the same time period, B sent a message to C.
  • A is allowed to send a message to C in the same time period that B sends a message to C if the message from B to C is guaranteed not to use the line from C to D, but instead uses the line from C to H (see FIG. 4).
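The difference between the earlier blocking rule and the relaxed rule can be written as two predicates. The function and parameter names are hypothetical, and the sketch only restates the two conditions given in the text; it leaves out the separate requirement that C lie on a path to the target of the message at A.

```python
def a_may_send_reed_hesse(b_sends_to_c: bool) -> bool:
    """Earlier rule: A is blocked whenever B sends to C in the same period."""
    return not b_sends_to_c


def a_may_send_present(b_sends_to_c: bool, mb_uses_c_to_h: bool) -> bool:
    """Relaxed rule: A may also send when B's message is guaranteed to
    leave C on the line to H, which keeps the line from C to D free."""
    return (not b_sends_to_c) or mb_uses_c_to_h
```

The two predicates differ in exactly one case: B sends to C and its message is guaranteed to take the line from C to H.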
  • Logic associated with node A is capable of routing a message MA to node C. There is at least one additional node N, not pictured, so that the logic associated with node A is capable of routing MA to N. In case A routes MA to C, then logic associated with node C is capable of routing MA to nodes D and H. In this manner, the message MA can travel from A to D and the message MB can travel from B to H.
  • the logic associated with A is incapable of routing MA to either D or H.
  • logic associated with B is able to route a message MB from B to C and logic associated with C can route MB to either node D or node H, so that while the message MB is able to travel from B to D or from B to H, the logic associated with node B is not capable of routing message MB to either node D or node H.
  • FIG. 5 is a block diagram of a portion of a network described in the Hesse Patent.
  • Nodes are arranged in arrays.
  • the node arrays are arranged into rows and columns.
  • Node arrays in the rightmost column are connected back to node arrays in the leftmost column at the same level so that, for example, the output B of column K-1 of level J-1 forms the input B of column 0 of level J-1.
  • the node A is a node in the array in level N+1 of column M.
  • B is in a node array of level N of column M.
  • C is a node in the node array on level N in column M+1.
  • D is in the node array in level N in column M+2.
  • H is a node in the node array on level N-1 in column M+2.
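The placement of nodes A, B, C, D and H in the array of FIG. 5 can be tabulated with a small helper. This is only a sketch: the function name `fig5_positions` is hypothetical, columns are assumed to be numbered 0 through K-1, and the wraparound from the rightmost column to the leftmost column is assumed to apply to the offsets M+1 and M+2 as modular arithmetic.

```python
from typing import Dict, Tuple


def fig5_positions(n: int, m: int, num_columns: int) -> Dict[str, Tuple[int, int]]:
    """Return (level, column) for nodes A, B, C, D and H.

    n is the level of node B, m is the column of nodes A and B, and
    num_columns is the number of columns K. Column indices wrap around,
    so the column after K-1 is column 0.
    """
    def col(offset: int) -> int:
        return (m + offset) % num_columns

    return {
        "A": (n + 1, col(0)),
        "B": (n, col(0)),
        "C": (n, col(1)),
        "D": (n, col(2)),
        "H": (n - 1, col(2)),
    }
```

For example, with four columns and B in the rightmost column (m = 3), node C falls in column 0 and nodes D and H fall in column 1.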
  • FIGS. 1, 2, 3, 4, 6, 7 and 8 show connections between individual nodes that are members of node arrays as illustrated in FIG. 5.
  • FIG. 6A is a further description of an embodiment of the invention.
  • there is an additional node E on level N and two additional nodes F and G on level N-1. E can send a message to G, F can send a message to G, and G can send messages to H.
  • nodes read only one address bit in the header.
  • B sends MB to C.
  • C will read the same header address bit of MB that B reads.
  • the topology of the network is such that the logic of B could determine if H is on a path to a target of MB.
  • a single address bit of MB determines whether H is on a path to a target of MB, and that address bit is the same bit that is read by the logic for node B.
  • It is also the same bit that will be read by the logic for node C when MB arrives at C. If H is on a path to a target of MB and there is no message distinct from MB arriving at H at the same time that MB would arrive there, then MB would travel first from B to C and then from C to H, as illustrated in FIG. 6B. Messages arriving at H at the same time as MB would arrive must come from either E or
  • the control signal from B to A indicates whether or not B is sending a message to C, and additionally if there is a path from H to a target output port of MB.
  • the control signal from F to E indicates whether or not F is sending a message to G.
  • the control signal from E to A indicates whether or not either of E or F is sending a message to G. Node A advantageously is provided with all the information it needs to determine where to send MA. Specifically:
  • control signal from B to A indicates that there is a message MB at B and there is a path from H to the target output of MB.
  • control signal from E to A indicates that there is no competing message being sent from E to G or from F to G, whereby node A determines that MB will travel from C to H, thereby not using the path from C to D for MB.
  • A sends MA to a node (not shown) distinct from C that is on the same level as A.
  • C node distinct from C (not shown).
  • a feature of the above logic is that whenever two messages arrive simultaneously at a node, at least one of those messages will be allowed to drop to a lower level.
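The decision that node A makes from the two control signals of FIG. 6A can be sketched as follows. The class and function names are hypothetical, and the sketch assumes that A sends to C only when C lies on a path to a target of MA, in addition to the control-signal conditions stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalFromB:
    mb_present: bool       # B is sending a message MB to C
    h_on_path_of_mb: bool  # there is a path from H to a target output of MB


def a_sends_to_c(c_on_path_of_ma: bool,
                 from_b: SignalFromB,
                 e_or_f_sending_to_g: bool) -> bool:
    """Decide whether A drops MA to C (True) or keeps it on A's level (False)."""
    if not c_on_path_of_ma:
        return False  # assumption: C must lead toward MA's target
    if not from_b.mb_present:
        return True   # no message from B, so C's output to D is free
    # MB is present: A may still send if MB is certain to go from C to H,
    # which requires H on MB's path and no competing traffic via G.
    return from_b.h_on_path_of_mb and not e_or_f_sending_to_g
```

When the function returns False, MA is sent to a node distinct from C on the same level as A, as the text describes.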
  • FIG. 7 has the same nodes as FIG. 6A but instead of the control line from E to A, has a control line CFB from F to B and an additional control line CEB from E to B.
  • the control line CFB sends information from F to B in the form of a single bit x.
  • the bit x is set to zero provided that the logic at F determines that there is no message being sent from F to G that could arrive at H in the same time period as a message traveling from B to H. F can set x to zero provided that either:
  • Control line CEB from E to B sends information in the form of a single bit y.
  • Bit y is set to zero if E is not sending a message from E to G that could arrive at H at the same time as a message traveling from B to H.
  • Node B does not use the information contained in the bits x and y in order to determine where to send its messages; it uses information from still another control line from a node on level N-1 (not shown) in order to determine where to send its own message.
  • Node B uses the information in lines CEB and CFB in order to be able to send a control signal to A using the control line CBA.
  • Node B sends a single bit z on the control line CBA. Assume that exactly one message MA arrives at node A. Then MA is sent from node A to C, provided that the bit z is zero and C lies on a path to a target of MA. The bit z is set to zero provided that either:
  • B sends no message MB from B to C in a time period that could cause a collision with a message MA from A, or
  • B sends a message MB to C, and based on the information contained in x and y, and in the header of MB, the logic at B determines that it is guaranteed that MB will travel from C to H.
  • Node A is able to route an incoming message MA based on the header of MA and on the value of the single bit z. In case two messages MA and MA' arrive simultaneously at A, then one of those two messages is sent to C according to the above logic, and the other message is sent to a node distinct from C (not shown).
  • a feature of the above logic is that one of the two messages MA and MA' will be allowed to drop to C. In particular, the messages MA and MA' are not routed to the same output port of A.
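The single-bit control scheme of FIG. 7 can be sketched as two small functions. The names are hypothetical. The text says only that B decides from x, y and the header of MB whether MB is guaranteed to travel from C to H; the sketch assumes this guarantee holds exactly when the header bit shows H on a path to a target of MB and both x and y are zero.

```python
def compute_z(b_sends_to_c: bool, h_on_path_of_mb: bool, x: int, y: int) -> int:
    """Bit sent from B to A on control line CBA (0 means A is not blocked)."""
    if not b_sends_to_c:
        return 0  # no MB that could collide with MA
    # Assumed guarantee test: H leads to MB's target, and neither F (x)
    # nor E (y) sends a message that could reach H at the same time.
    if h_on_path_of_mb and x == 0 and y == 0:
        return 0
    return 1


def route_ma(z: int, c_on_path_of_ma: bool) -> str:
    """Route a single message MA at node A from its header bit and z."""
    if z == 0 and c_on_path_of_ma:
        return "C"
    return "same_level"  # a node distinct from C, not shown in the figure
```

Node A needs only the bit z and one header bit of MA, which matches the statement that the timing is unchanged from the earlier designs.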
  • nodes in accordance with the present embodiment are able to route messages based on one header address bit and on control bits from lower levels. In this way the timing is the same as the timing in the Reed and Hesse patents. Importantly, with the embodiment of FIG. 7, node A is able to send a message to C in a case where node A using the logic of FIG. 6A was not able to send a message to C but instead sent its message to a node on level N+1.
  • nodes read the bit of the header that indicates that a message is present and they read one additional header address bit. They may also read additional bits such as quality of service bits. In accordance with a further embodiment of the invention nodes may also read multiple address bits. Referring to FIG. 6A, in an alternate embodiment the nodes read two address bits in the message header.
  • FIG. 6A (and earlier FIGS.) allows message MA to "cross over" message MB at node C, such that the path of MA goes through nodes A, C, and H, and the path of MB goes through the nodes B, C, and D, as illustrated in FIG. 6C.
  • An objective of this embodiment is to provide the nodes with information needed to determine when a message MA is permitted to cross over a competing message MB which passes through a common node C at the same time.
  • a message MA arrives at node A which reads one header bit that indicates whether or not there is a path through C to a target of MA.
  • Node A also reads an additional header bit that indicates if there is a path through H to a target of MA.
  • the control signal from E to A guarantees that no message from E or F will arrive at H at the same time as MA
  • the control signal from B to A indicates if there is a message MB at B that will arrive at C at the same time as the message MA and, if so, whether MB is guaranteed not to pass through H
  • node A sends a message MA to C provided that at least one of the following conditions is satisfied:
  • the first condition (1) above is discussed above, and the second condition pertains to the "cross over" case. If neither of the above conditions is satisfied, then A will send MA to a node (not shown) other than C, which node will be on level N+1. The case in which two messages MA and MA' appear simultaneously at node A is handled as described above. Reading two header bits allows us to detect condition (2) above.
  • node A can send data to node H via node C, while node F can send data to node H via node G.
  • the control signals x and z enforce a priority of the transfer of data from F to H over the transfer of data from A to H.
  • the nodes A and H of FIG. 8 are on level N-1 in column K+2.
  • the nodes B and C at level N of column K+1 are positioned to send data directly to A and H.
  • the nodes U and V of level N+1 in column K are able to send data directly to B.
  • the nodes W and X of level N+1 in column are able to send data directly to C.
  • the node B receives data directly from the node D at level N and sends data directly to node L at level N.
  • the node C receives data directly from node E at level N, and sends data directly to node M at level N. Not pictured in FIG.
  • Node D uses information from a node in R (not shown) and node E uses the identical information from node D.
  • the control information that node D receives from a node in R enables node D to determine if the paths from node B to node A and node H are unblocked.
  • FIG. 8 illustrates a portion of a data interconnect structure where each node C on a given level N is positioned to receive data from two nodes on level N+1 and one node on level N, and is also positioned to send data to two nodes on level N-1 and one node on level N.
  • Networks with this data interconnect structure are referred to in the Reed Patent as the Multiple Interconnection to the Next Level Embodiment and in the Hesse Patent as the Flat Latency Embodiment.
  • the control interconnect is described in the Reed and Hesse Patents, the teachings of which are incorporated herein by reference.
  • the data interconnect structure is as described in the Reed and Hesse Patents, but the nodes are more sophisticated in that they receive and process more control information in order to increase throughput and achieve lower latency. Since the nodes are unbuffered, messages entering a node must be capable of leaving the node immediately and proceed to another node that is in route to a target output. Whenever two messages leave a node, one must continue along the same level and one must drop a level. The correct operation depends upon priority rules enforced by control signals.
  • Node B has priority over node C to send data to nodes A and H.
  • Node D has priority over nodes U and V to send data to node B, and node U has priority over node V to send data to node B.
  • node E has priority over nodes W and X to send data to node C, and node W has priority over node X to send data to node C.
  • control signals enter nodes D and E from nodes on column I
  • messages may enter nodes D and E. Based on the possible messages entering node D, and the control signals node D receives, node D may or may not send a message to node B.
  • node D sends a control signal to nodes U and E indicating that either: 1) no message has been sent from node D to node B; 2) a message MD has been sent to node B, and when MD arrives at node B, node B will direct MD to node A; 3) a message MD has been sent to node B, and when MD arrives at node B, node B will send the message MD to node H; or 4) a message MD has been sent to node B, and it is possible that the message MD will travel from node B to node L. In cases 1, 2 and 3, if there is a message MU at node U such that MU can reach its target through node B, then the message MU will be sent to node B, and no message from node V will be allowed to travel to node B.
  • node V will be "invited" to send a message to node B. That is to say, if node U does not send a message to node B, then node U will so inform node V by means of a control signal, and if there is a message MV at node V that can reach its target through node B, then node V will send MV to node B.
  • node D is able to predict that node B will route message MD to A based on the information that no other message will arrive at A at a time to conflict with the arrival of MD at A and there is a path from A to a target output port of MD.
  • In the present invention, if case 2 or 3 holds and either U or V sends a message to B, then B will receive two messages. This is in contrast to the Reed and Hesse patents where only one message can be sent to B in a given time period.
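The four cases of the control signal from node D, and the way nodes U and V respond to it, can be sketched as follows. The enum and function names are hypothetical. The text states the behavior for cases 1, 2 and 3; the sketch assumes that in case 4 neither U nor V is allowed to send to B, by analogy with the blocking signal described for nodes E, W and X.

```python
from enum import Enum
from typing import Optional


class DSignal(Enum):
    NO_MESSAGE = 1     # case 1: D sends nothing to B
    MD_TO_A = 2        # case 2: B will direct MD to A
    MD_TO_H = 3        # case 3: B will direct MD to H
    MD_MAYBE_TO_L = 4  # case 4: MD may travel from B to L


def upper_sender_to_b(signal: DSignal,
                      mu_can_use_b: bool,
                      mv_can_use_b: bool) -> Optional[str]:
    """Return which level N+1 node (U or V) sends to B, or None."""
    if signal is DSignal.MD_MAYBE_TO_L:
        return None  # assumption: case 4 blocks both U and V
    if mu_can_use_b:
        return "U"   # U has priority over V
    if mv_can_use_b:
        return "V"   # V is "invited" only when U does not send
    return None


def messages_arriving_at_b(signal: DSignal,
                           mu_can_use_b: bool,
                           mv_can_use_b: bool) -> int:
    """Count the messages that reach B in one time period."""
    from_d = 0 if signal is DSignal.NO_MESSAGE else 1
    sender = upper_sender_to_b(signal, mu_can_use_b, mv_can_use_b)
    return from_d + (0 if sender is None else 1)
```

In cases 2 and 3 with a sender above, the count is two, which is the situation the earlier designs did not permit.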
  • E may or may not send a message to node C
  • the control signal from D to E does not influence the routing of messages by node E, but may influence the control signals that E sends to node W.
  • the logic associated with node E ascertains that one of the following conditions holds: 1) E sends no message to node C; 2) E sends a message ME to C, and when ME arrives at C, C will send ME to A; 3) E sends a message ME to C and when ME arrives at C, C will send ME to H; 4) E sends a message ME to C and the possibility exists that C will route ME to node M.
  • the control signal from D to E is used by the logic associated with C to predict the routing of ME by C. This is because it is not allowed for both B and C to route to node A, nor is it allowed for both B and C to route to node H.
  • node E sends a non-blocking control signal to node W giving W permission to route to node C.
  • node E sends a blocking control signal to node W and W sends a blocking control signal to X and neither W nor X sends a message to C.
  • the Reed and Hesse Patents essentially looked one step into the future.
  • the two embodiments presented in this invention look two steps into the future. One skilled in the art can use the techniques presented here to look still further into the future.
  • the Hesse Patent taught the design of an electronic switch that carries headers driving an optical switch that carries payloads. In this invention, it makes sense to spend more on the logic of the electronics and, therefore, this invention can be used as an alternative to implementing the switch disclosed in the Hesse Patent.
  • MV will be sent to node B and MU will be sent to a level N+1 node in column K+1.
  • Quality of service header bits can also be used to determine the priority of messages arriving at nodes D and E.
  • the invention includes two embodiments that make use of more control information and more sophisticated nodes to improve the performance of the two preferred embodiments. It will be clear to one skilled in the art that these techniques can be applied to other interconnect structures

Abstract

A network or interconnect structure which includes a plurality of nodes (102, 104, 106) which are interconnected within a hierarchical multiple level structure. The level of each node is determined by the position of the node within the structure and data messages move from node to node from a source level to a destination level. Each node within the interconnect structure is capable of receiving simultaneous data messages (108, 110) at its input ports from any other node and the receiving node is able to transmit each of the received data messages through its output ports to separate nodes in the interconnect structure to one or more levels below the level of the receiving node.

Description

SCALABLE APPARATUS AND METHOD FOR INCREASING THROUGHPUT IN MULTIPLE LEVEL MINIMUM LOGIC NETWORKS USING A PLURALITY OF CONTROL LINES
RELATED PATENTS AND APPLICATIONS
This application is related to U.S. patent application, Serial No. 09/009,703, filed on January 20, 1998, which is pending and is incorporated by reference in its entirety. This application is also related to and incorporates U.S. Patent No. 5,996,020, herein by reference in its entirety.
The disclosed system and operating method are related to subject matter disclosed in the following co-pending patent applications that are incorporated herein in their entirety:
1. U.S. patent application, serial no. , entitled "Scaleable Multipath Wormhole Interconnect," Attorney Docket No. 8175US, naming John Hesse as inventor, and filed on even date herewith.
2. U.S. patent application, serial number , entitled "Scaleable Interconnect Structure for Parallel Computing and Parallel Memory Access," Attorney Docket No. M-9051US, naming Coke Reed and John Hesse as inventors and filed on even date herewith.
3. U.S. patent application, serial number , entitled "Scaleable Interconnect Structure Utilizing Quality of Service Handling," Attorney Docket No. M9051US, naming Coke Reed and John Hesse as inventors and filed on even date herewith.
4. U.S. patent application, serial number , entitled "Scaleable Wormhole Routing Concentrator," Attorney Docket No. M-9458US, naming John Hesse and Coke Reed as inventors and filed on even date herewith.
FIELD OF THE INVENTION
The present invention relates to interconnection structures for computing and communication systems. More particularly the instant invention relates to a multiple level interconnection structure having a plurality of nodes wherein each node sends messages to other nodes and each node can accommodate a plurality of simultaneous inputs and can decide where to send messages using examination of nodes located at levels more than one level below the node sending a particular message. The invention also provides a system in which latency is lower than in the prior art (described below) at the expense of a modest increase in the control logic.
BACKGROUND OF THE INVENTION
The Internet, advanced computing systems such as massively parallel computers, and advanced telecommunications systems all require an interconnection structure that reduces control and logic circuits while providing low latency and high throughput.
One such system is described in U.S. Patent No. 5,996,020, granted to Coke S. Reed on November 30, 1999, ("the Reed Patent"), the teachings of which are incorporated herein by reference. The Reed Patent describes a network and interconnect structure which utilizes a data flow technique that is based on timing and positioning of messages communicating throughout the interconnect structure. Switching control is distributed throughout multiple nodes in the structure so that a supervisory controller providing a global control function and complex logic structures are avoided. The interconnect structure operates as a "deflection" or "hot potato" system in which processing and storage overhead at each node is minimized. Elimination of a global controller and also of buffering at the nodes greatly reduces the amount of control and logic structures in the interconnect structure, simplifying overall control components and network interconnect components while improving throughput and low latency for message communication.
More specifically, the Reed Patent describes a design in which processing and storage overhead at each node is greatly reduced by routing a message packet through an additional output port to a node at the same level in the interconnect structure rather than holding the packet until a desired output port is available. With this design the usage of buffers at each node is eliminated.
In accordance with one aspect of the Reed Patent, the interconnect structure includes a plurality of nodes and a plurality of interconnect lines selectively connecting the nodes in a multiple level structure in which the levels include a richly interconnected collection of rings, with the multiple level structure including a plurality of J+1 levels in a hierarchy of levels and a plurality of C·2^k nodes at each level (C is an integer representing the number of angles). Control information is sent to resolve data transmission conflicts in the interconnect structure, where each node is a successor to a node on an adjacent outer level and an immediate successor to a node on the same level. Message data from an immediate predecessor has priority. Control information is sent from nodes on a level to nodes on the adjacent outer level to warn of impending conflicts.
Although the Reed Patent is a substantial advance over the prior art, it is essentially a "look one step ahead" system in which messages proceed through the interconnect structure based on the availability of an input port at a node, either at the same level as the message or at a lower level closer to the message's terminal destination. Nodes in the Reed Patent could be capable of receiving a plurality of simultaneous messages at the input ports of each node. However, in the Reed Patent, there was available only one unblocked node to which an incoming message could be sent, so that in practice the nodes in the Reed Patent could not accept simultaneous input messages. The Reed Patent, however, did teach that each node could take into account information from a level more than one level below the current level of the message, thus increasing throughput and reducing latency in the network.
A second approach to achieving an optimum network structure has been shown and described in U.S. Patent Application Serial No. 09/009,703 to John E. Hesse, filed on January 20, 1998 ("the Hesse Patent"). This patent application is assigned to the same entity as is the instant application, and its teachings are also incorporated herein by reference in their entirety. The Hesse Patent describes a scalable low-latency switch which extends the functionality of a multiple level minimum logic interconnect structure, such as is taught in the Reed Patent, for use in computers of all types, networks and communication systems. The interconnect structure using the scalable low-latency switch described in the Hesse Patent employs a method of achieving wormhole routing by a novel procedure for inserting messages into the network. The scalable low-latency switch is made up of a large number of extremely simple control cells (nodes) which are arranged into arrays. The number of nodes in an array is a design parameter typically in the range of 64 to 1024 and is usually a power of 2, with the arrays being arranged into levels and columns. Each node has two data input ports and two data output ports, wherein the nodes can be formed into more complex designs, such as "paired-node" designs which are combined to form larger units.
In the Hesse Patent messages are not simultaneously inserted into all the unblocked nodes on the outer cylinder of an array but are inserted simultaneously into two columns A and B of the array, only if an entire message fits between A and B. This strategy advantageously prevents the first bit of one message from colliding with an interior bit of another message already in the switch. Therefore, contention between entire messages is addressed by resolving the contention between the first bit only of two contending messages with the desirable outcome that messages wormhole through many nodes in the interconnect structure.
Although the Hesse Patent is certainly an improvement over the prior art, it is still essentially a "look one step ahead" system combined with wormhole routing. Additional improvements are possible to provide a low-latency, high throughput interconnect structure, and this invention is directed to such improvements.
It is therefore an object of the present invention to provide a high throughput, low-latency interconnect structure which utilizes the advantages of the Reed Patent and the Hesse Patent while achieving improvements over their teachings. It is a further object of the present invention to adopt the interconnect structure shown in the Reed and Hesse Patents but add to the basic structure by improving upon the "look one step ahead" system described in each of these patents.
It is another object of the present invention to allow each node, as described in the interconnect structure of the Reed and Hesse Patents, to function more efficiently thereby reducing latency and increasing message throughput.
It is a still further object of the present invention to improve the interconnect structure of the Reed and Hesse Patents by allowing each node to accommodate simultaneous messages at node input ports without blocking either message.
It is still another object of the present invention to provide a "look several steps ahead" system in which a node receives control information regarding other nodes on a level more than one level below the level at which the message enters a particular node.
SUMMARY OF THE INVENTION
In accordance with one embodiment of the present invention, an interconnect structure comprises a plurality of nodes with a plurality of interconnect lines selectively coupling the nodes in a hierarchical multiple level structure. The level of a node within the structure is determined by the position of the node in the structure in which data moves from a source level to a destination level or alternatively laterally along a level of the multiple level structure. Data messages are transmitted through the multiple level structure from a source node to one of a plurality of designated destination nodes.
It is a feature of the invention that each node included within said plurality of nodes has a plurality of input ports and a plurality of output ports, each node capable of receiving simultaneous data messages at two or more of its input ports. It is a further feature of the invention that each node is capable of receiving simultaneous data messages if the node is able to transmit each of said received data messages through separate ones of its output ports to separate nodes in said interconnect structure.
It is a still further feature of the invention that a node in the interconnect structure can receive information regarding nodes more than one level below the node receiving the data messages.
These and other objects and features of the present invention will be more fully appreciated from the following detailed description when taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the Drawings:
FIGS. 1 and 2 illustrate part of the interconnection structure utilized in accordance with the present invention;
FIGS. 3A-3C illustrate alternate node connections in accordance with the present invention;
FIG. 4 illustrates three levels of an interconnect structure which is applicable for use with the present invention;
FIG. 5 illustrates an interconnect block diagram to show interconnection of various nodes within the interconnect structure of the present invention;
FIGS. 6A and 7 illustrate interconnection of control and message lines between various nodes;
FIGS. 6B and 6C illustrate interconnections between nodes in a portion of an interconnect structure and show data paths through one of the nodes; and
FIG. 8 illustrates an alternative arrangement of cell nodes in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention incorporates by reference the interconnect structure set forth in U.S. Patent No. 5,996,020 ("the Reed Patent"), and U.S. Patent Application Serial No. 09/009,703, filed on January 20, 1998 ("the Hesse Patent"). In the Reed Patent nodes are arranged in a cylindrical formation and in the Hesse Patent nodes are arranged in rows and columns. Both patents also describe various types of node configurations that can be used with the interconnect structure of the present invention. It is to be understood that all aspects of the Reed and Hesse patents, both in the interconnect structure and node configuration, are applicable to the present invention.
Referring now to FIG. 1, there is shown an interconnect structure such as was described in the Reed Patent. Three nodes are illustrated in FIG. 1. The two nodes A, 102 and B, 104 are positioned to send messages directly to a third node C, 106. Nodes B and C are on a level N of the network and node A is on a level N+1 of the network. As described in the Reed and Hesse patents, node B has priority over node A to send data to node C. When node B sends a message MB to node C on path 114, node B sends a control signal 120 informing A of the sending of MB to C so that A does not send a message MA to C in a time period that would conflict with the message MB. If there is a path from C to a target output of MA as indicated by the header of MA, and there is no blocking signal from B to A, then A will route MA to C on path 112. If either of these conditions does not hold, then A will send MA to a node (not shown) distinct from C, with that node being on level N+1 of the network.
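The FIG. 1 rule for node A can be summarized in a few lines of Python. This is only an illustrative sketch: the function name `route_from_a` and the two boolean inputs are hypothetical encodings of the header test and of control signal 120, not part of the disclosed hardware.

```python
def route_from_a(path_from_c_to_target: bool, blocked_by_b: bool) -> str:
    """Sketch of the Reed/Hesse rule for node A in FIG. 1.

    path_from_c_to_target: the header of MA indicates a path from C to a target output.
    blocked_by_b: control signal 120 reports that B is sending MB to C in this period.
    Returns "C" when MA drops to level N, otherwise "same_level" (a level N+1 node).
    """
    if path_from_c_to_target and not blocked_by_b:
        return "C"          # MA travels on path 112
    return "same_level"     # MA is deflected and stays on level N+1


if __name__ == "__main__":
    print(route_from_a(True, False))
    print(route_from_a(True, True))
```

Because B has priority, the blocking input alone is enough to deflect MA even when C lies on a path to the target of MA.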
In the Reed Patent, nodes A and B are said to be at the same angle on different cylinders. In the Hesse Patent, nodes A and B are said to be in the same column on different levels. Four nodes are illustrated in FIG. 2. Nodes B, C, and D are on level N of the network and node A is on level N+1 of the network. All of the output ports of the network that can be reached from node B can also be reached from nodes C and D. There are output ports that can be reached from A that cannot be reached from C. For this reason, when a message travels from A to C the set of output ports that the message can reach is narrowed. Among all of the nodes in the network, node C has the highest priority to send messages to node D, as node C is on the same level as node D. For this reason, when only one message M arrives at node C in a given time period, that message M can always travel to node D, and there is a path from D to a targeted output port of M. Therefore, it is not necessary to have a buffer at node C, and by the same argument buffers are not used at any other nodes. In the Reed and Hesse patents, a message MA is not allowed to travel from A to C unless the logic associated with node A is informed that B will not send a conflicting message to C. This priority of node B over node A of sending data to node C is enforced by a control signal from B to A. In this way, A will route MA to C provided that A "wants" to send MA to C and A is not prohibited from sending MA to C by a control signal from B to A. In case FIG. 2 is a portion of a network as described in the Reed and Hesse patents, or the "Scaleable Multipath Wormhole Interconnect" patent application, node A "wants" to send MA to C provided that there is a path from C to a target output port of MA as specified in the header of MA. In case FIG. 2 is a portion of the interconnect structure taught in the "Scaleable Wormhole Routing Concentrator" patent application, then node A always "wants" to send MA to C because, in the case of the concentrator, all of the outputs are acceptable output ports for MA. Alternatively, the Hesse Patent took advantage of the fact that only one message could arrive at node C at a given time by allowing messages from A to C to travel to C by going through node B.
Referring now to FIG. 3A, there is shown a portion of the interconnect structure taught in the Reed Patent. In the Reed Patent only one message could enter C during a particular time interval. However, with the present invention, as described below, two simultaneous messages may be allowed to enter node C so that messages from A to C and from B to C are allowed to enter node C at the same time.
FIG. 3B illustrates a portion of the interconnect structure used in the Hesse Patent. Data path 306 accepts a message from either A or B and can transmit only a single message to C. The nodes of FIG. 3B can be modified as illustrated in FIG. 3C with an additional path 316 from node B to C so that both nodes A and B can send to C. In FIG. 3B node A uses data paths 304 and 306 to send to C; in FIG. 3C node A uses paths 314 and 316 to send to C. However, the Hesse Patent, as well as the Reed Patent, did not allow a particular node to accept two simultaneous messages, as is possible with the present invention. The improvements of the present invention can, however, be readily applied to the Reed and Hesse configurations by changing the embodiment of FIG. 3B to the embodiment of FIG. 3C.
FIG. 4 illustrates a first embodiment of the present invention.
Five nodes are illustrated in FIG. 4. In addition to the four nodes shown in FIG. 2, there is a node H on level N-1. Node C is capable of sending data to node H. When node B sends a message MB to C and that message travels from C to H, then node A can send a message MA to C which will arrive at C simultaneously with the message MB. Message MA can then travel from C to D in the same time period that MB travels from C to H. The ability of a node to accept two messages at the same time is one advantage of the present invention, and is a novel improvement over the earlier Reed and Hesse patents.
Since there are no buffers at the node C, when two messages MA and MB arrive at C concurrently, one of the two messages must travel to H and one of the two messages must travel to D. In the present embodiment, MB is free to travel to H, allowing MA to travel to D. In case the two messages MA and MB both travel to C, then the logic at C routes one of MA and MB to H and the other of MA and MB to D. In one strategy node C sends MB from C to H and MA from C to D, as illustrated in FIG. 6B. This strategy is simple because it is always possible and, because B is on a lower level than A in the structure, MB has probably been in the structure longer than MA. In another embodiment, the routing of messages by C can depend upon quality of service (QOS). In this embodiment a part of the header contains quality of service information so that when MA and MB travel to C, then C will route MB to H and MA to D unless the QOS level of MA is higher than the QOS level of MB, in which case C will route MA to H and MB to D, as illustrated in FIG. 6C. In this way, messages with higher levels of QOS are able to obtain priority over messages with lower levels of QOS.
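The two strategies for node C described above can be sketched as follows. The `Message` type, its integer `qos` field, and the function `route_at_c` are hypothetical names chosen for illustration; the text does not specify how QOS levels are encoded in the header.

```python
from typing import Dict, NamedTuple


class Message(NamedTuple):
    name: str
    qos: int = 0  # assumed encoding: a larger value means a higher QOS level


def route_at_c(mb: Message, ma: Message, use_qos: bool = False) -> Dict[str, Message]:
    """Assign two concurrent arrivals at unbuffered node C to outputs H and D.

    mb arrives from B (same level as C); ma arrives from A (one level above).
    Default strategy (FIG. 6B): MB drops to H and MA continues to D.
    QOS strategy (FIG. 6C): the outputs are swapped only when MA has strictly
    higher QOS than MB.
    """
    if use_qos and ma.qos > mb.qos:
        return {"H": ma, "D": mb}
    return {"H": mb, "D": ma}


if __name__ == "__main__":
    mb, ma = Message("MB", qos=1), Message("MA", qos=3)
    print(route_at_c(mb, ma))                # default strategy
    print(route_at_c(mb, ma, use_qos=True))  # QOS strategy
```

In both strategies exactly one message leaves on each output, which is what allows node C to operate without a buffer.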
In the Reed and Hesse patents, a control signal 120 (FIG. 1) was sent to node A from B informing A whether or not A is blocked from sending a message to C. This blocking was guaranteed not to take place if B was not sending a message to C. In the Reed and Hesse patents, A was not allowed to send a message to C if, in the same time period, B sent a message to C. With the present invention, A is allowed to send a message to C in the same time period that B sends a message to C if the message from B to C is guaranteed not to use the line from C to D, but instead uses the line from C to H (see FIG. 4).
Logic associated with node A is capable of routing a message MA to node C. There is at least one additional node N, not pictured, so that the logic associated with node A is capable of routing MA to N. In case A routes MA to C, then logic associated with node C is capable of routing MA to nodes D and H. In this manner, the message MA can travel from A to D and the message MB can travel from B to H. The logic associated with A is incapable of routing MA to either D or H. Similarly, logic associated with B is able to route a message MB from B to C, and logic associated with C can route MB to either node D or node H. Thus, while the message MB is able to travel from B to D or from B to H, the logic associated with node B is not capable of routing message MB to either node D or node H.
FIG. 5 is a block diagram of a portion of a network described in the Hesse Patent. Nodes are arranged in arrays. The node arrays are arranged into rows and columns. Node arrays in the rightmost column are connected back to node arrays in the leftmost column at the same level so that, for example, the output B of column K-1 of level J-1 forms the input B of column 0 of level J-1. In FIG. 4, the node A is a node in the array in level N+1 of column M, B is in a node array of level N of column M, C is a node in the node array on level N in column M+1, D is in the node array in level N in column M+2, and H is a node in the node array on level N-1 in column M+2. Each of FIGS. 1, 2, 3, 4, 6, 7 and 8 shows connections between individual nodes that are members of node arrays as illustrated in FIG. 5.
Eight nodes are illustrated in FIG. 6A, which is a further description of an embodiment of the invention. In addition to the five nodes in FIG. 4, there is an additional node E on level N, and two additional nodes F and G on level N-1. E can send a message to G, F can send a message to G, and G can send messages to H.
In a preferred embodiment of the Reed Patent, nodes read only one address bit in the header. Consider a message MB at node B and suppose that B sends MB to C. Then, because B and C are on the same level, C will read the same header address bit of MB that B reads. The topology of the network is such that the logic of B could determine if H is on a path to a target of MB. This is because a single address bit of MB determines whether H is on a path to a target of MB, and that address bit is the same bit that is read by the logic for node B. It is also the same bit that will be read by the logic for node C when MB arrives at C. If H is on a path to a target of MB and there is no message distinct from MB arriving at H at the same time that MB would arrive there, then MB would travel first from B to C and then from C to H, as illustrated in FIG. 6B. Messages arriving at H at the same time as MB would arrive must come from either E or F. If there is no such message M arriving at E or F, then it is certain that MB would travel from B to C and then from C to H.
There is already a control signal line 604 from F to E that indicates if there is a message traveling from F to G. With the present invention, but not in the Reed and Hesse patents, there is an additional control line 602 from E to A. The logic at A operates as follows. A message MA arrives at node A. Node A reads one header bit of MA. If that header bit indicates that there is a path from C to a target of MA, then A will send MA to C provided that either:
1) there is no competing message sent from B to C; or
2) there is a message MB that will arrive at C in the same time period as the arrival of MA at C, and message MB is guaranteed to travel from C to H, advantageously not using the link from C to D.
The control signal from B to A indicates whether or not B is sending a message to C, and additionally if there is a path from H to a target output port of MB.
The control signal from F to E indicates whether or not F is sending a message to G. The control signal from E to A indicates whether or not either of E or F is sending a message to G. Node A advantageously is provided with all the information it needs to determine where to send MA. Specifically:
1) if the control signal from B to A indicates that there is no competing message being sent from B to C, and if there is a path from C to a target of MA, then A will send MA to C; or
2) if the following conditions are met, then A will send MA to C:
• the control signal from B to A indicates that there is a message MB at B and there is a path from H to the target output of MB; and
• the control signal from E to A indicates that there is no competing message being sent from E to G or from F to G, whereby node A determines that MB will travel from C to H, thereby not using the path from C to D for MB; and
• there is a path from C to a target output port of MA.
3) otherwise, A sends MA to a node (not shown) distinct from C that is on the same level as A.
In case two messages MA and MA' arrive simultaneously at node A, then one of the two messages is sent to C according to the above logic, and the remaining message is sent to a node distinct from C (not shown). In this way, there are messages that advantageously drop down a level with the present invention that would not drop down a level in the Reed and Hesse patents. A feature of the above logic is that whenever two messages arrive simultaneously at a node, at least one of those messages will be allowed to drop to a lower level.
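The three-way decision made by node A in the FIG. 6A embodiment can be written as a small predicate. The sketch below is illustrative only: `ControlFromB`, `a_sends_to_c`, and the boolean encodings of the control lines are assumed names and representations, not the disclosed signal format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFromB:
    """Assumed contents of the control signal from B to A."""
    sending_to_c: bool            # B is sending a message MB to C this period
    path_from_h_to_target: bool   # meaningful only when sending_to_c is True


def a_sends_to_c(ma_path_via_c: bool,
                 from_b: ControlFromB,
                 e_or_f_sending_to_g: bool) -> bool:
    """Return True when A drops MA to C, False when MA stays on A's level.

    ma_path_via_c: the header bit of MA shows a path from C to a target of MA.
    e_or_f_sending_to_g: value carried on control line 602 from E to A.
    """
    if not ma_path_via_c:
        return False                       # rule 3: C does not lead to the target
    if not from_b.sending_to_c:
        return True                        # rule 1: no competing message from B
    # rule 2: MB is guaranteed to leave C toward H, freeing the link from C to D
    return from_b.path_from_h_to_target and not e_or_f_sending_to_g


if __name__ == "__main__":
    busy_b = ControlFromB(sending_to_c=True, path_from_h_to_target=True)
    print(a_sends_to_c(True, busy_b, e_or_f_sending_to_g=False))
    print(a_sends_to_c(True, busy_b, e_or_f_sending_to_g=True))
```

Rule 2 is the case that the one-step look-ahead of the Reed and Hesse patents would have blocked: B is sending to C, yet MA is still allowed to drop because MB is known to leave C toward H.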
Notice that the multi-bit messages pass through node A without buffering. Therefore, there is a fixed maximum time T so that any message arriving at node A will leave node A within time T of its arrival at node A. Notice also that the control information carried by line 602 (FIG. 6A) concerns the routing of messages through the nodes E and F and is, therefore, not determined by the messages arriving at node A.
FIG. 7 has the same nodes as FIG. 6A but, instead of the control line from E to A, has a control line CFB from F to B and an additional control line CEB from E to B. The control line CFB sends information from F to B in the form of a single bit x. The bit x is set to zero provided that the logic at F determines that there is no message being sent from F to G that could arrive at H in the same time period as a message traveling from B to H. F can set x to zero provided that either:
1) no message is being sent from F to G; or
2) it is guaranteed that a message sent from F to G will be sent from G to a node J (not shown) distinct from H.
Control line CEB from E to B sends information in the form of a single bit y. Bit y is set to zero if E is not sending a message from E to G that could arrive at H at the same time as a message traveling from B to H.
Node B does not use the information contained in the bits x and y in order to determine where to send its messages; it uses information from still another control line from a node on level N-1 (not shown) in order to determine where to send its own message. Node B uses the information in lines CEB and CFB in order to be able to send a control signal to A using the control line CBA. Node B sends a single bit z on the control line CBA. Assume that exactly one message MA arrives at node A. Then MA is sent from node A to C, provided that the bit z is zero and C lies on a path to a target of MA. The bit z is set to zero provided that either:
1) B sends no message MB from B to C in a time period that could cause a collision with a message MA from A; or
2) B sends a message MB to C, and based on the information contained in x and y, and in the header of MB, the logic at B determines that it is guaranteed that MB will travel from C to H.
Node A is able to route an incoming message MA based on the header of MA and on the value of the single bit z. In case two messages MA and MA' arrive simultaneously at A, then one of those two messages is sent to C according to the above logic, and the other message is sent to a node distinct from C (not shown). A feature of the above logic is that one of the two messages MA and MA' will be allowed to drop to C. In particular, the messages MA and MA' are not routed to the same output port of A.
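The single-bit control signals x, y and z of FIG. 7 can be modeled as below. The function names and boolean arguments are hypothetical. In particular, the sketch assumes that B treats MB as guaranteed to travel from C to H exactly when the header of MB shows H on a path to its target and both x and y are zero; the text states only that B decides this from x, y and the header of MB.

```python
def bit_x(f_sends_to_g: bool, guaranteed_g_to_j: bool) -> int:
    """Bit on line CFB: 0 when no message from F can reach H in the conflicting period."""
    return 0 if (not f_sends_to_g or guaranteed_g_to_j) else 1


def bit_y(e_sends_conflicting_to_g: bool) -> int:
    """Bit on line CEB: 0 when E sends no message to G that could collide at H."""
    return 0 if not e_sends_conflicting_to_g else 1


def bit_z(b_sends_to_c: bool, h_on_path_of_mb: bool, x: int, y: int) -> int:
    """Bit on line CBA: 0 when A is free to send MA to C."""
    if not b_sends_to_c:
        return 0                                   # condition 1
    if h_on_path_of_mb and x == 0 and y == 0:      # condition 2 (assumed test)
        return 0
    return 1


def a_drops_to_c(z: int, c_on_path_of_ma: bool) -> bool:
    """Node A needs only z and one header address bit of MA."""
    return z == 0 and c_on_path_of_ma


if __name__ == "__main__":
    x = bit_x(f_sends_to_g=True, guaranteed_g_to_j=True)
    y = bit_y(e_sends_conflicting_to_g=False)
    z = bit_z(b_sends_to_c=True, h_on_path_of_mb=True, x=x, y=y)
    print(x, y, z, a_drops_to_c(z, c_on_path_of_ma=True))
```

The case with F sending to G but guaranteed to continue to J is the one that distinguishes FIG. 7 from FIG. 6A: under the FIG. 6A logic the control line from E to A would report a message headed to G and A would stay on its level, whereas here x is zero and MA may drop.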
It is important to note that nodes in accordance with the present embodiment are able to route messages based on one header address bit and on control bits from lower levels. In this way the timing is the same as the timing in the Reed and Hesse patents. Importantly, with the embodiment of FIG. 7, node A is able to send a message to C in a case where node A using the logic of FIG. 6A was not able to send a message to C but instead sent its message to a node on level N+1.
In the Reed and Hesse patents and in the material so far described herein, nodes read the bit of the header that indicates that a message is present and they read one additional header address bit. They may also read additional bits such as quality of service bits. In accordance with a further embodiment of the invention nodes may also read multiple address bits. Referring to FIG. 6A, in an alternate embodiment the nodes read two address bits in the message header. Consider the case when a single message MA arrives at A with a target path that includes H, and a message MB arrives at B with a target path that does not include H, and where B must send MB to C, and hence to D. The structure shown in FIG. 6A (and earlier FIGS.) allows message MA to "cross over" message MB at node C, such that the path of MA goes through nodes A, C, and H, and the path of MB goes through the nodes B, C, and D, as illustrated in FIG. 6C. An objective of this embodiment is to provide the nodes with information needed to determine when a message MA is permitted to cross over a competing message MB which passes through a common node C at the same time. A message MA arrives at node A, which reads one header bit that indicates whether or not there is a path through C to a target of MA. Node A also reads an additional header bit that indicates if there is a path through H to a target of MA. The control signal from E to A guarantees that no message from E or F will arrive at H at the same time as MA. The control signal from B to A indicates if there is a message MB at B that will arrive at C at the same time as the message MA and, if so, whether MB is guaranteed not to pass through H. Based on these control signals, node A sends a message MA to C provided that at least one of the following conditions is satisfied:
1) if the path from C to D is known to be free and there is a path through C to a target of MA; or
2) if the path from C to H is known to be free, there is a path from H to a target of MA, and there is no message from E or F that can arrive at H concurrently with the arrival of MA at H.
The first condition (1) above is discussed above, and the second condition pertains to the "cross over" case. If neither of the above conditions is satisfied, then A will send MA to a node (not shown) other than C, which node will be on level N+1. The case in which two messages MA and MA' appear simultaneously at node A is handled as described above. Reading two header bits allows us to detect condition (2) above. This sometimes allows the sending from A to C of a message MA that would have stayed on the same level as A under the earlier embodiment of FIG. 6A. The reading of two header address bits requires only minor modifications to the control logic and control signals of the networks described herein and in the Reed and Hesse patents. Such modifications would be apparent to one skilled in the art of this invention and thus further description of such modifications will not be presented herein.
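The two-address-bit decision, including the "cross over" case, reduces to a disjunction of the two numbered conditions. The following sketch uses hypothetical boolean inputs; how node A derives "known to be free" from the control signals is left abstract, since the text defers those details to one skilled in the art.

```python
def a_sends_to_c_two_bits(path_via_c: bool,
                          path_via_h: bool,
                          c_to_d_free: bool,
                          c_to_h_free: bool,
                          no_e_f_arrival_at_h: bool) -> bool:
    """Return True when node A may send MA to C under the two-bit embodiment.

    path_via_c: first header bit, a path exists through C to a target of MA.
    path_via_h: second header bit, a path exists through H to a target of MA.
    c_to_d_free / c_to_h_free: the corresponding output of C is known to be free.
    no_e_f_arrival_at_h: no message from E or F reaches H concurrently with MA.
    """
    condition_1 = c_to_d_free and path_via_c
    condition_2 = c_to_h_free and path_via_h and no_e_f_arrival_at_h  # cross over
    return condition_1 or condition_2


if __name__ == "__main__":
    # Cross-over case: MB occupies the link from C to D, MA drops through C to H.
    print(a_sends_to_c_two_bits(True, True, False, True, True))
```

When neither condition holds the function returns False, corresponding to MA being sent to a node on level N+1 other than C.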
Note that in FIG. 7, node A can send data to node H via node C, while node F can send data to node H via node G. The control signals x and z enforce a priority of the transfer of data from F to H over the transfer of data from A to H.
Refer now to FIG. 8. The nodes A and H of FIG. 8 are on level N-1 in column K+2. The nodes B and C at level N of column K+1 are positioned to send data directly to A and H. The nodes U and V of level N+1 in column K are able to send data directly to B, and the nodes W and X of level N+1 in column K are able to send data directly to C. The node B receives data directly from the node D at level N and sends data directly to node L at level N. The node C receives data directly from node E at level N, and sends data directly to node M at level N. Not pictured in FIG. 8 is a collection R of nodes in column K such that the members of R are capable of sending control signals to nodes D and E. Node D uses information from a node in R (not shown) and node E uses the identical information from node D. The control information that node D receives from a node in R enables node D to determine if the paths from node B to node A and node H are unblocked.
FIG. 8 illustrates a portion of a data interconnect structure where each node C on a given level N is positioned to receive data from two nodes on level N+1 and one node on level N, and is also positioned to send data to two nodes on level N-1 and one node on level N. Networks with this data interconnect structure are referred to in the Reed Patent as the Multiple Interconnection to the Next Level Embodiment and in the Hesse Patent as the Flat Latency Embodiment. The control interconnect is described in the Reed and Hesse Patents, the teachings of which are incorporated herein by reference. In the present invention, the data interconnect structure is as described in the Reed and Hesse Patents, but the nodes are more sophisticated in that they receive and process more control information in order to increase throughput and achieve lower latency. Since the nodes are unbuffered, messages entering a node must be capable of leaving the node immediately and proceeding to another node that is en route to a target output. Whenever two messages leave a node, one must continue along the same level and one must drop a level. The correct operation depends upon priority rules enforced by control signals. We will consider the simple case where each node reads only one target header destination bit. This implies that no node on level N can simultaneously receive two messages from nodes on level N+1. We will see that it will also be the case that when a level N node receives two messages, then the message arriving from the same level N can and will always be sent down to a node on level N-1.
Node B has priority over node C to send data to nodes A and H. Node D has priority over nodes U and V to send data to node B, and node U has priority over node V to send data to node B. Similarly, node E has priority over nodes W and X to send data to node C, and node W has priority over node X to send data to node C. In a manner similar to the other examples in accordance with this invention, at a given time period, control signals enter nodes D and E from nodes on column K. At the same time, messages may enter nodes D and E. Based on the possible messages entering node D, and the control signals node D receives, node D may or may not send a message to node B. At the proper time, node D sends a control signal to nodes U and E indicating that either: 1) no message has been sent from node D to node B; 2) a message MD has been sent to node B, and when MD arrives at node B, node B will direct MD to node A; 3) a message MD has been sent to node B, and when MD arrives at node B, node B will send the message MD to node H; or 4) a message MD has been sent to node B, and it is possible that the message MD will travel from node B to node L. In cases 1, 2 and 3, if there is a message MU at node U, such that MU can reach its target through node B, then the message MU will be sent to node B, and no message from node V will be allowed to travel to node B. If one of the cases 1, 2 or 3 holds, and node U does not send a message to node B, then node V will be "invited" to send a message to node B. That is to say, if node U does not send a message to node B, then node U will so inform node V by means of a control signal, and if there is a message MV at node V that can reach its target through node B, then node V will send MV to node B. In case 2, as in the single down cases already covered, node D is able to predict that node B will route message MD to A based on the information that no other message will arrive at A at a time to conflict with the arrival of MD at A and there is a path from A to a target output port of MD. A similar situation exists for case 3. In the present invention, if cases 2 or 3 hold, and either U or V sends a message to B, then B will receive two messages. This is in contrast to the Reed and Hesse patents where only one message can be sent to B in a given time period.
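The four-case control signal from node D and the resulting priority of U over V can be sketched in Python as follows. This is an illustrative model only: the enum `DSignal` and the function names are hypothetical, and the treatment of case 4 as blocking both U and V is an assumption made by analogy with the blocking signal that node E sends to node W.

```python
from enum import Enum
from typing import Optional


class DSignal(Enum):
    NO_MESSAGE = 1  # case 1: D sent nothing to B
    TO_A = 2        # case 2: B will direct MD down to A
    TO_H = 3        # case 3: B will direct MD down to H
    MAY_STAY = 4    # case 4: MD may travel from B to L on level N


def upper_sender(signal: DSignal, u_eligible: bool, v_eligible: bool) -> Optional[str]:
    """Name the level N+1 node ("U" or "V") that sends to B, if any.

    u_eligible / v_eligible mean the node holds a message that can
    reach its target through node B.
    """
    if signal is DSignal.MAY_STAY:
        return None        # assumed: case 4 blocks both U and V
    if u_eligible:
        return "U"         # U has priority over V
    if v_eligible:
        return "V"         # V is "invited" only when U does not send
    return None


def messages_arriving_at_b(signal: DSignal, u_eligible: bool, v_eligible: bool) -> int:
    """Count messages reaching B in one time period."""
    from_d = 0 if signal is DSignal.NO_MESSAGE else 1
    from_upper = 0 if upper_sender(signal, u_eligible, v_eligible) is None else 1
    return from_d + from_upper
```

Under these assumptions, cases 2 and 3 with an eligible message at U or V give a count of two at node B, which is the gain over admitting a single message per time period.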
Based on the possible messages entering node E, and the control signals that E receives, E may or may not send a message to node C. The control signal from D to E does not influence the routing of messages by node E, but may influence the control signals that E sends to node W. At the proper time, the logic associated with node E ascertains that one of the following conditions holds: 1) E sends no message to node C; 2) E sends a message ME to C, and when ME arrives at C, C will send ME to A; 3) E sends a message ME to C and when ME arrives at C, C will send ME to H; 4) E sends a message ME to C and the possibility exists that C will route ME to node M. The control signal from D to E is used by the logic associated with C to predict the routing of ME by C. This is because it is not allowed for both B and C to route to node A, nor is it allowed for both B and C to route to node H. When a condition 1, 2 or 3 holds, node E sends a non-blocking control signal to node W giving W permission to route to node C. In case 4, node E sends a blocking control signal to node W and W sends a blocking control signal to X and neither W nor X sends a message to C. In case node W receives a non-blocking control signal from E and W receives a message MW at the correct time and there is a path through C to a target of MW, then W will send MW to C and send a blocking control signal to X prohibiting X from sending a message to C. In case node W receives a non-blocking control signal from node E, and W does not send a message to C, then W sends a non-blocking control signal to X. In the presence of the non-blocking control from W, if X receives a message MX at the proper time, and there is a path from C to a target output of MX, then X will send MX to C.
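The chain of blocking and non-blocking control signals from E to W to X described above can be summarized by a short Python sketch. The function name `control_chain` and its return format are hypothetical; the condition number is taken as an input because the logic by which node E ascertains its condition is not modeled here.

```python
from typing import Optional, Tuple

BLOCKING = "blocking"
NON_BLOCKING = "non-blocking"


def control_chain(e_condition: int,
                  w_eligible: bool,
                  x_eligible: bool) -> Tuple[str, str, Optional[str]]:
    """Return (signal E->W, signal W->X, level N+1 node that sends to C).

    e_condition is 1..4 as enumerated for node E. w_eligible / x_eligible
    mean the node received a message at the proper time and there is a
    path through C to a target of that message.
    """
    if e_condition not in (1, 2, 3, 4):
        raise ValueError("e_condition must be 1, 2, 3 or 4")
    if e_condition == 4:
        # ME may stay on level N, so neither W nor X may send to C.
        return (BLOCKING, BLOCKING, None)
    if w_eligible:
        # W uses its permission and prohibits X from sending.
        return (NON_BLOCKING, BLOCKING, "W")
    # W does not send, so it passes the permission on to X.
    return (NON_BLOCKING, NON_BLOCKING, "X" if x_eligible else None)
```

For example, `control_chain(2, False, True)` yields two non-blocking signals and names X as the sender, matching the last case in the paragraph.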
The Reed and Hesse Patents essentially looked one step into the future. The two embodiments presented in this invention look two steps into the future. One skilled in the art can use the techniques presented here to look still further into the future.
There are some trade-offs here. As the nodes become more complex, the throughput per step is increased, and the total average steps through the structure is reduced, but the number of nodes that can be placed on a chip is reduced and the time per step may be increased. The Hesse Patent taught the design of an electronic switch that carries headers driving an optical switch that carries payloads. In this invention, it makes sense to spend more on the logic of the electronics and, therefore, this invention can be used as an alternative to implementing the switch disclosed in the Hesse Patent.
U.S. patent application, Serial No. , entitled "Scaleable Multipath Wormhole Interconnect," Attorney Docket No. M8175US, naming John Hesse as inventor, and filed on even date herewith, taught how to effectively use quality of service information in message headers.
The teachings of U.S. Patent application, Serial No. , are hereby incorporated herein by reference. The techniques taught in that patent application can be effectively applied to this invention, so that if, for example, the control signal from node D informs nodes U and V that one of node U and node V can send a message to node B, then the rules above will apply unless there is a low quality of service message MU at node U, such that there is a path from node B to a target output port of MU, and a high quality of service message MV at node V, so that at node B there is a path from node B to a target output port of MV. In this case, MV will be sent to node B and MU will be sent to a level N+1 node in column K+1. Quality of service header bits can also be used to determine the priority of messages arriving at nodes D and E. The invention includes two embodiments that make use of more control information and more sophisticated nodes to improve the performance of the two preferred embodiments. It will be clear to one skilled in the art that these techniques can be applied to other interconnect structures.
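The quality of service exception to the priority of U over V can be sketched in Python as follows. The `Msg` class, its integer `qos` field and the function `pick_sender` are hypothetical; representing "low" and "high" quality of service as comparable integers is an assumption, since the text does not specify an encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    qos: int          # assumed encoding: a larger value means higher quality of service
    path_via_b: bool  # True if there is a path from node B to a target output port


def pick_sender(mu: Optional[Msg], mv: Optional[Msg]) -> Optional[str]:
    """Choose which of U and V sends to B when D permits one of them to send."""
    u_ok = mu is not None and mu.path_via_b
    v_ok = mv is not None and mv.path_via_b
    if u_ok and v_ok and mv.qos > mu.qos:
        # Exception: the high quality of service message MV goes to B,
        # and MU moves on to a level N+1 node in column K+1.
        return "V"
    if u_ok:
        return "U"    # default rule: U has priority over V
    if v_ok:
        return "V"
    return None
```

With equal quality of service values the sketch falls back to the default priority, so the exception applies only when MV strictly outranks MU.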
While the interconnect structures illustrated and described herein are the preferred embodiments of the invention, it will be understood that changes in both node construction and the interconnect construction may be made without departing from the spirit of the invention or eliminating any of the advantages of the invention as determined by the scope of the appended claims.

Claims

WE CLAIM:
1. An interconnect structure, comprising: a plurality of interconnected nodes, including distinct nodes A and E; the node A having a plurality of data input ports, a plurality of data output ports, and a control signal input port; and the node E having a plurality of data input ports, a plurality of data output ports, and a control signal output port; and a routing logic associated with the nodes, the routing logic for routing data selectively among the interconnected nodes; the nodes A and E being positioned in the interconnect structure so that node A cannot route data to the node E, the node E cannot route data to the node A, and no node exists in the interconnect structure that can have data routed to it from both the node A and the node E; and a logic included as part of said routing logic and associated with the node A that uses information concerning routing of data through the node E to route data through the node A.
2. An interconnect structure in accordance with Claim 1 wherein: the plurality of interconnected nodes includes a node F distinct from the nodes A and E, the node F having a plurality of data input ports, a plurality of data output ports, and a control signal output port; and the nodes A and F are positioned in the interconnect structure so that the node A cannot route data to the node F, the node F cannot route data through the node A, and no node exists in the interconnect structure that can receive data routed both from the node A and the node F; and the logic associated with the node A uses information concerning routing of data through the node F to route data through the node A.
3. An interconnect structure in accordance with Claim 2 wherein: the plurality of interconnected nodes includes a node B distinct from the nodes A, E and F, the node B having a plurality of data input ports, a plurality of data output ports, and a control signal output port; and a logic associated with node B included as part of the routing logic being capable of sending a control signal z to the node A, the control signal z containing information concerning routing possibilities through the nodes B, F and E, and the logic associated with the node A for routing of data through the node A depending at least in part on information concerning routing of data through the nodes B, F and E.
4. An interconnect structure in accordance with Claim 3 wherein: the plurality of interconnected nodes includes a node C distinct from the nodes A, B, E, and F, the node C having a plurality of data input ports, and a plurality of data output ports; the node B sends a message to the node C; the node E sends a control signal y to the node B; the node F sends a control signal x to the node B; the logic associated with the node B sends a non-blocking control signal z to the node A based on the control signals x and y; the node A sends a message to the node C; and the node C simultaneously receives messages into all of its input ports.
5. An interconnect structure comprising: a plurality of nodes including distinct nodes A, B and C, the nodes A and B being both positioned to send data to the node C, a plurality of interconnect lines selectively coupling the nodes of the interconnect structure, a control signal carrying line CBA connected from the node B to the node A for carrying control signals from the node B to the node A, and a routing logic associated with the node B capable of sending data to the node C and sending a control signal z to the node A that can inform the node A that the node A is allowed to send a message to the node C.
6. An interconnect structure in accordance with Claim 5 where the node C has a plurality of N input ports, and data from the nodes A and B arrive at the node C concurrently so that all N of the input ports of the node C receive messages simultaneously.
7. An interconnect structure in accordance with Claim 6 wherein the plurality of nodes includes distinct nodes A, B, C, D, E, F and H, and the node C is capable of simultaneously sending data from the node A to the node D and capable of sending data from the node B to the node H.
8. An interconnect structure in accordance with Claim 7 wherein the interconnect structure is hierarchical, the node A is on a level of the hierarchy, the nodes B, C, and D are on the level of the hierarchy directly below the level of the node A, and the nodes E, F and H are on a level of the hierarchy directly below the level of the node B.
9. An interconnect structure comprising: a plurality of nodes including the distinct nodes A, B and C, and a collection of interconnect lines selectively coupling the nodes; the node C having a plurality of message input ports, the nodes A and C positioned in the structure so that A can route a data packet to C; the nodes B and C positioned in the structure so that B can route a data packet to C; the nodes A and B positioned in the network so that B can send a control signal to A; the logic at the node A using the control signal from B to route messages; the node B routing a message MB to C; the node A routing a message MA to C to arrive at C concurrently with MB; all input ports of C concurrently receiving a message.
10. An interconnect structure comprising: a plurality of interconnected nodes including a node C having input ports IA and IB and output ports OH and OD; a plurality of interconnect structure output ports that are accessible from input port IB but not from output port OH; and a routing logic included within the interconnect structure to assure that when a message MA arrives at input port IA and simultaneously a message MB arrives at input port IB, there is a path through output port OD to a target destination for message MA and a path through output port OH to a target destination for message MB.
11. An interconnect structure in accordance with claim 10, wherein said routing logic assumes that message MB is not blocked from using output port OH and message MA is not blocked from using output port OD.
12. An interconnect structure in accordance with claim 11, wherein said routing logic for the routing of messages MA and MB depends in part on QOS criteria.
13. An interconnect structure comprising: a plurality of interconnected nodes including nodes A, B, C, D, and H, each of the nodes A, B, C, D and H having a plurality of input ports and a plurality of output ports, and node C being positioned to receive messages from A and B and to route messages to D and H; a plurality of interconnect structure output ports including the output port P so that P is accessible from node C but not node H; a routing logic included within the interconnect structure to assure that when node A sends a message MA to node C and concurrently node B sends a message MB to node C, then node C can route MA through node D to a target interconnect structure output port for MA and node C can route MB through node H to a target interconnect structure output port for MB.
14. An interconnect structure in accordance with claim 13, wherein said routing logic assures that message MB is not blocked from node H, and message MA is not blocked from node D.
15. An interconnect structure in accordance with claim 14, wherein said routing logic is responsive to QOS criteria.
PCT/US2001/032334 2000-10-19 2001-10-17 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines WO2002033429A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP01987883A EP1261881A4 (en) 2000-10-19 2001-10-17 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines
JP2002536565A JP3950048B2 (en) 2000-10-19 2001-10-17 Extensible apparatus and method for increasing throughput in multiple minimal logical networks using multiple control lines
IL150282A IL150282A (en) 2000-10-19 2001-10-17 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines
AU2002224391A AU2002224391A1 (en) 2000-10-19 2001-10-17 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines
HK03106329.6A HK1054267B (en) 2000-10-19 2003-09-05 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/692,073 2000-10-19
US09/692,073 US7221677B1 (en) 2000-10-19 2000-10-19 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines

Publications (1)

Publication Number Publication Date
WO2002033429A1 true WO2002033429A1 (en) 2002-04-25

Family

ID=24779136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/032334 WO2002033429A1 (en) 2000-10-19 2001-10-17 Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines

Country Status (9)

Country Link
US (1) US7221677B1 (en)
EP (1) EP1261881A4 (en)
JP (1) JP3950048B2 (en)
KR (1) KR20030009334A (en)
CN (1) CN1179214C (en)
AU (1) AU2002224391A1 (en)
HK (1) HK1054267B (en)
IL (1) IL150282A (en)
WO (1) WO2002033429A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060171386A1 (en) * 2004-09-01 2006-08-03 Interactic Holdings, Llc Means and apparatus for a scaleable congestion free switching system with intelligent control III
US8065433B2 (en) 2009-01-09 2011-11-22 Microsoft Corporation Hybrid butterfly cube architecture for modular data centers
US8509078B2 (en) * 2009-02-12 2013-08-13 Microsoft Corporation Bufferless routing in on-chip interconnection networks
US20110202682A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation Network structure for data center unit interconnection
JP6036690B2 (en) * 2011-07-07 2016-11-30 日本電気株式会社 Distributed execution system and distributed program execution method
EP3014821A4 (en) * 2013-06-28 2017-02-22 Intel Corporation Mechanism to control resource utilization with adaptive routing
US9678800B2 (en) * 2014-01-30 2017-06-13 International Business Machines Corporation Optimum design method for configuration of servers in a data center environment
CN112434483A (en) * 2020-12-18 2021-03-02 国微集团(深圳)有限公司 Data transmission system and generation method thereof
CN113219298B (en) * 2021-03-24 2022-10-11 昆明理工大学 Fault current traveling wave numerical simulation method for complex alternating current power grid

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5175733A (en) * 1990-12-27 1992-12-29 Intel Corporation Adaptive message routing for multi-dimensional networks
US5835482A (en) * 1995-09-22 1998-11-10 Mci Communications Corporation Communication system and method providing optimal restoration of failed paths

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4814980A (en) * 1986-04-01 1989-03-21 California Institute Of Technology Concurrent hypercube system with improved message passing
US5416769A (en) * 1993-07-13 1995-05-16 At&T Corp. Controlled-feedback packet switching system
US5617413A (en) * 1993-08-18 1997-04-01 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Scalable wrap-around shuffle exchange network with deflection routing
US5996020A (en) * 1995-07-21 1999-11-30 National Security Agency Multiple level minimum logic network
DE69738175T2 (en) * 1996-08-27 2008-01-31 Nippon Telegraph And Telephone Corp. Link transmission network
US6289021B1 (en) 1997-01-24 2001-09-11 Interactic Holdings, Llc Scaleable low-latency switch for usage in an interconnect structure
US5940389A (en) * 1997-05-12 1999-08-17 Computer And Communication Research Laboratories Enhanced partially self-routing algorithm for controller Benes networks
US6285679B1 (en) * 1997-08-22 2001-09-04 Avici Systems, Inc. Methods and apparatus for event-driven routing
US6396814B1 (en) * 1997-09-12 2002-05-28 Kabushiki Kaisha Toshiba Network construction method and communication system for communicating between different groups via representative device of each group
US6754207B1 (en) * 1998-01-20 2004-06-22 Interactic Holdings, Llc Multiple-path wormhole interconnect
US6947433B2 (en) * 2000-09-21 2005-09-20 Avici Systems, Inc. System and method for implementing source based and egress based virtual networks in an interconnection network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1261881A4 *

Also Published As

Publication number Publication date
CN1179214C (en) 2004-12-08
EP1261881A4 (en) 2005-07-27
US7221677B1 (en) 2007-05-22
JP2004511992A (en) 2004-04-15
EP1261881A1 (en) 2002-12-04
IL150282A0 (en) 2002-12-01
HK1054267A1 (en) 2003-11-21
AU2002224391A1 (en) 2002-04-29
IL150282A (en) 2007-02-11
CN1401081A (en) 2003-03-05
HK1054267B (en) 2005-08-26
JP3950048B2 (en) 2007-07-25
KR20030009334A (en) 2003-01-29

Similar Documents

Publication Publication Date Title
US5689646A (en) Configuring of networked system to permit replacement of failed modes and selection of alternate paths
US6754207B1 (en) Multiple-path wormhole interconnect
AU744578B2 (en) A scalable low-latency switch for usage in an interconnect structure
EP0821816B1 (en) Adaptive routing mechanism for torus interconnection network
US5175733A (en) Adaptive message routing for multi-dimensional networks
EP0410568B1 (en) Adaptive routing in networks
US5721820A (en) System for adaptively routing data in switching network wherein source node generates routing message identifying one or more routes form switch selects
US6061345A (en) Crossbar routing switch for a hierarchical crossbar interconnection network
NZ531266A (en) Scalable switching system with intelligent control
US20050157717A1 (en) Method and system for transmitting messages in an interconnection network
US5699520A (en) Flow control apparatus and method for a computer interconnect using adaptive credits and flow control tags
EP1730987B1 (en) Highly parallel switching systems utilizing error correction ii
US7016363B1 (en) Scaleable interconnect structure utilizing quality-of-service handling
US7221677B1 (en) Scalable apparatus and method for increasing throughput in multiple level minimum logic networks using a plurality of control lines
CA2426377C (en) Scaleable multiple-path wormhole interconnect
WO2006089559A1 (en) A network, a system and a node for use in the network or system
KR0170493B1 (en) Non-blocking fault tolerance gamma network for multi-processor system
KR0164966B1 (en) The multistage interconnection network with the folded structure and loop-back function
WO1993003581A1 (en) Message structure for scalable self-routing non-blocking message switching and routing system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 150282

Country of ref document: IL

ENP Entry into the national phase

Ref document number: 2002 536565

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020027007842

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2001987883

Country of ref document: EP

Ref document number: 018039308

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 2001987883

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020027007842

Country of ref document: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2001987883

Country of ref document: EP