WO2006106475A1 - Network-on-chip environment and method for reduction of latency - Google Patents

Network-on-chip environment and method for reduction of latency

Info

Publication number
WO2006106475A1
Authority
WO
WIPO (PCT)
Prior art keywords
channels
network
data
slot
network interface
Prior art date
Application number
PCT/IB2006/051012
Other languages
French (fr)
Inventor
Edwin Rijpkema
Original Assignee
Koninklijke Philips Electronics N. V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N. V. filed Critical Koninklijke Philips Electronics N. V.
Priority to US11/910,749 priority Critical patent/US20080186998A1/en
Priority to EP06727812A priority patent/EP1869844A1/en
Priority to JP2008504892A priority patent/JP2008535435A/en
Publication of WO2006106475A1 publication Critical patent/WO2006106475A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/40Wormhole routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/245Traffic characterised by specific attributes, e.g. priority or QoS using preemption
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/28Flow control; Congestion control in relation to timing considerations
    • H04L47/283Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/39Credit based
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04JMULTIPLEX COMMUNICATION
    • H04J2203/00Aspects of optical multiplex systems other than those covered by H04J14/05 and H04J14/07
    • H04J2203/0001Provisions for broadband connections in integrated services digital network using frames of the Optical Transport Network [OTN] or using synchronous transfer mode [STM], e.g. SONET, SDH
    • H04J2203/0089Multiplexing, e.g. coding, scrambling, SONET
    • H04J2203/0091Time slot assignment

Definitions

  • the invention relates to an integrated circuit having a plurality of processing modules and a network arranged for coupling processing modules and a method for time slot allocation in such an integrated circuit, and a data processing system.
  • Systems on silicon show a continuous increase in complexity due to the ever increasing need for implementing new features and improvements of existing functions. This is enabled by the increasing density with which components can be integrated on an integrated circuit.
  • the clock speed at which circuits are operated tends to increase too.
  • the higher clock speed in combination with the increased density of components has reduced the area which can operate synchronously within the same clock domain.
  • a processing system comprises a plurality of relatively independent, complex modules.
  • the modules usually communicate to each other via a bus.
  • a large number of modules represent a high bus load. Further the bus represents a communication bottleneck as it enables only one module to send data to the bus.
  • Networks on chip (NoC) have received considerable attention recently as a solution to the interconnection problem in highly-complex chips.
  • NoCs help resolve the electrical problems in new deep-submicron technologies, as they structure and manage global wires. At the same time the NoC concept shares wires, allowing a reduction of their number and an increase in their utilization.
  • NoCs can also be energy efficient and reliable and are scalable compared to buses.
  • NoCs also decouple computation from communication, which is essential in managing the design of billion-transistor chips. NoCs achieve this decoupling because they are traditionally designed using protocol stacks, which provide well-defined interfaces separating communication service usage from service implementation.
  • On-chip networks have different properties (e.g., tighter link synchronization) and resource constraints (e.g., higher memory cost) leading to different design choices, which ultimately affect the network services.
  • Storage (i.e., memory) and computation resources are relatively more expensive, whereas the number of point-to-point links is larger on chip than off chip.
  • Storage is expensive, because general-purpose on-chip memory, such as RAMs, occupies a large area. Having the memory distributed in the network components in relatively small sizes is even worse, as the overhead area in the memory then becomes dominant.
  • a network on chip typically consists of a plurality of routers and network interfaces. Routers serve as network nodes and are used to transport data from a source network interface to a destination network interface by routing data on a correct path to the destination on a static basis (i.e., route is predetermined and does not change), or on a dynamic basis (i.e., route can change depending e.g., on the NoC load to avoid hot spots). Routers can also implement time guarantees (e.g., rate-based, deadline-based, or using pipelined circuits in a TDMA fashion).
  • a known example of such a NoC is AEthereal.
  • the network interfaces are connected to processing modules, also called IP blocks, which may represent any kind of data processing unit, a memory, a bridge, a compressor etc.
  • the network interfaces constitute a communication interface between the processing modules and the network.
  • the interface is usually compatible with the existing bus interfaces.
  • the network interfaces are designed to handle data sequentialization (fitting the offered command, flags, address, and data on a fixed- width (e.g., 32 bits) signal group) and packetization (adding the packet headers and trailers needed internally by the network).
  • the network interfaces may also implement packet scheduling, which may include timing guarantees and admission control.
  • An NoC provides various services to processing modules to transfer data between them.
  • the NoC could be operated according to best effort (BE) or guaranteed throughput (GT) services.
  • On-chip systems often require timing guarantees for their interconnect communications.
  • a cost-effective way of providing time-related guarantees (i.e., throughput, latency and jitter) in systems on chip (SoC) is Time Division Multiple Access (TDMA).
  • a class of communication is provided, in which throughput, latency and jitter are guaranteed, based on a notion of global time (i.e., a notion of synchronicity between network components, i.e. routers and network interfaces), wherein the basic time unit is called a slot or time slot.
  • All network components usually comprise a slot table of equal size for each output port of the network component, in which time slots are reserved for different connections.
  • a connection is considered as a set of channels, each having a set of connection properties, between a first processing module and at least one second processing module.
  • the connection may comprise two channels, namely one from the first to the second processing module, i.e. the request or forward channel, and a second channel from the second to the first processing module, i.e. the response or reverse channel.
  • the forward or request channel is reserved for data and messages from the master to the slave, while the reverse or response channel is reserved for data and messages from the slave to the master.
  • the connection may only comprise one channel. It is not illustrated but possible, that the connection involves one master and N slaves. In that case 2*N channels are provided. Therefore, a connection or the path of the connection through the network comprises at least one channel. In other words, a channel corresponds to the connection path of the connection if only one channel is used. If two channels are used as mentioned above, one channel will provide the connection path e.g. from the master to the slave, while the second channel will provide the connection path from the slave to the master. Accordingly, for a typical connection, the connection path will comprise two channels.
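The channel structure described above can be sketched as a small data model. This is an illustrative Python sketch, not the patent's implementation; the class and module names (`Channel`, `Connection`, `IP_M`, `IP_S1`) are invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Channel:
    """One directed channel of a connection (illustrative model)."""
    source: str
    destination: str
    slots: list = field(default_factory=list)  # reserved slot indices

@dataclass
class Connection:
    master: str
    slaves: list

    def channels(self):
        """A connection with N slaves comprises 2*N channels: a
        request (forward) and a response (reverse) channel per slave."""
        chans = []
        for slave in self.slaves:
            chans.append(Channel(self.master, slave))   # request channel
            chans.append(Channel(slave, self.master))   # response channel
        return chans

conn = Connection("IP_M", ["IP_S1", "IP_S2"])
print(len(conn.channels()))  # 4 channels for N = 2 slaves
```

With one slave the connection reduces to the typical forward/reverse pair; with a single channel only, the channel coincides with the connection path, as the text notes.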
  • connection properties may include ordering (data transport in order), flow control (a remote buffer is reserved for a connection, and a data producer will be allowed to send data only when it is guaranteed that buffer space is available for the produced data), throughput (a lower bound on throughput is guaranteed), latency (upper bound for latency is guaranteed), the lossiness (dropping of data), transmission termination, transaction completion, data correctness, priority, or data delivery.
  • the slot tables are used.
  • the slot tables as mentioned above are stored in the network components, including network interfaces and routers.
  • the slot tables allow a sharing of the same link or wires in a time-division multiple access, TDMA, manner.
  • the quantum of data that is injected into the network is called a flit, wherein a flit is a fixed size sub-packet.
  • the injection of flits is regulated by the slot table stored in the network interface.
  • the slot tables advance in synchronization (i.e., all are in the same slot at the same time).
  • a channel may have one or more slots allocated within a slot table.
  • the slot tables in all network components are filled such that flits communicated over the network do not contend.
  • the channels are used to identify different traffic classes and associate properties to them.
  • a data item is moved from one network component to the next one, i.e. between routers or between a router and a network interface. Therefore, when a slot is reserved at an output port, the next slot must be reserved on the following output port along the path between a master and a slave module, and so on.
  • the slot allocation must be performed such that there are no clashes (i.e., there is no slot allocated to more than one connection).
  • the slots must be reserved in such a way that data never has to contend with any other data; this is also called contention-free routing.
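The reservation rule above (a slot reserved at one output port implies the next slot at the following port along the path) can be sketched as a small allocation check. The port names, table size, and slot choices below are invented to mirror the Fig. 2 example; this is not the patent's allocation algorithm:

```python
def allocate_channel(tables, path, channel, slots, size):
    """Reserve `slots` (at the first output port) for `channel` along
    `path`. A flit leaving in slot s uses slot (s + h) % size at the
    h-th output port, so reservations advance one slot per link."""
    # First pass: detect clashes so no slot is allocated twice.
    for s in slots:
        for h, port in enumerate(path):
            if tables[port][(s + h) % size] is not None:
                raise ValueError(f"clash at port {port}, slot {(s + h) % size}")
    # Second pass: commit the contention-free reservation.
    for s in slots:
        for h, port in enumerate(path):
            tables[port][(s + h) % size] = channel

# Mirror Fig. 2: channel a owns slots 0 and 2, channel b owns slot 1.
size = 4  # hypothetical slot-table size
tables = {p: [None] * size for p in ("NI_A", "NI_B", "R1", "R2")}
allocate_channel(tables, ["NI_A", "R1", "R2"], "a", [0, 2], size)
allocate_channel(tables, ["NI_B", "R1"], "b", [1], size)
print(tables["R1"])  # [None, 'a', 'b', 'a'] -- no slot double-booked
```

On the shared router port the two channels land in disjoint slots, which is exactly the contention-free property the text describes.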
  • An important feature for transmission of data between processing modules is the latency.
  • a general definition of latency in networking could be summarized as the amount of time it takes a data packet to travel from source to destination. Together, latency and bandwidth define the speed and capacity of a network.
  • the latency to access data depends on the size of such a slot table, assignment of slots for a given channel in the table and the burst size.
  • the burst size is the amount of data that can be asked/sent in one request.
  • if the number of slots allocated to a channel is less than the number of slots required to transfer a burst of data, the latency to access data increases dramatically. In such a case more than one revolution of the slot table is needed to completely send a burst of data.
  • the waiting time for the slots that are not allocated to this connection is also added to the latency.
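The effect described above can be made concrete with a small calculation. The table size and burst size below are hypothetical values chosen for illustration:

```python
import math

def revolutions_needed(burst_flits, allocated_slots):
    """Table revolutions needed to inject a burst when the channel owns
    `allocated_slots` slots per revolution of the slot table."""
    return math.ceil(burst_flits / allocated_slots)

table_size = 40  # slot durations per revolution (hypothetical)
# A 4-flit burst on a channel with a single allocated slot needs four
# revolutions, i.e. on the order of 4 * 40 slot durations, because the
# waiting time for the unallocated slots is paid on every revolution.
print(revolutions_needed(4, 1) * table_size)  # 160
print(revolutions_needed(4, 4) * table_size)  # 40 with four slots allocated
```

The comparison shows why the waiting time for unallocated slots, rather than the transfer itself, dominates the latency when the allocation is too small for the burst.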
  • the network interfaces conventionally contain a queue per channel.
  • the waiting time in that queue turns out to be the major contribution to the total communication latency.
  • all slots allocated to channels originating from the same network interface are shared. This will simplify the control of data transmission of the channels having shared slots.
  • a channel scheduler is included in the network interface; the scheduler schedules the data of the set of channels to the shared slots.
  • the data of a channel are scheduled by the scheduler depending on the position in a queue.
  • the control of the data transmission could be achieved by queuing the data belonging to set of channels in only one queue.
  • a first come first serve policy is implemented. This will further reduce the chip area required for the input queue in the network interface.
  • the scheduler needs to schedule the data depending on its position in the queue.
  • a scheduling of the data of the set of channels is performed depending on the filling status of the queues of the set of channels.
  • the scheduler will monitor the filling status of the queues of the channels. The first queue not being empty will be scheduled for transfer. The scheduler then continues monitoring from that scheduled queue onwards, so that only non-empty queues are scheduled.
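The monitoring behaviour described above can be sketched as a round-robin pick of the first non-empty queue, resuming from the last scheduled queue. The queue count and flit names are invented for the example; this is a simplified model, not the patent's scheduler 55:

```python
from collections import deque

class RoundRobinScheduler:
    """Pick, for each shared slot, the next non-empty queue of the
    channel set, starting after the previously scheduled queue."""
    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        self.last = 0  # index of the previously scheduled queue

    def push(self, channel, flit):
        self.queues[channel].append(flit)

    def schedule(self):
        """Return (channel, flit) for the next shared slot, or None."""
        n = len(self.queues)
        for i in range(1, n + 1):
            ch = (self.last + i) % n
            if self.queues[ch]:
                self.last = ch
                return ch, self.queues[ch].popleft()
        return None  # all queues empty: the slot goes unused

sched = RoundRobinScheduler(4)
sched.push(0, "f0"); sched.push(2, "f2"); sched.push(2, "f3")
print(sched.schedule())  # (2, 'f2'): first non-empty queue after queue 0
```

Because empty queues are skipped, a shared slot is never wasted on a channel with nothing to send.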
  • the invention also relates to a method for allocating time slots for data transmission in an integrated circuit having a plurality of processing modules, a network arranged for coupling the processing modules, and a plurality of network interfaces each being coupled between one of the processing modules and the network, comprising the steps of: communicating between processing modules based on time division multiple access using time slots and contention-free transmission by using channels; storing a slot table in each network interface including an allocation of a time slot to a certain channel; and sharing time slots allocated to channels originating from the same network interface.
  • the invention further relates to a data processing system comprising: a plurality of processing modules and a network arranged for coupling the processing modules, comprising: a network interface associated to each processing module which is provided for transmitting data to the network supplied by the associated processing module and for receiving data from the network destined for the associated processing module; wherein the data transmission between processing modules operates based on time division multiple access using time slots and contention-free transmission by using channels; each network interface includes a slot table for storing an allocation of a time slot to a certain channel, and a sharing of time slots allocated to channels originating from the same network interface is provided. Accordingly, the time slot allocation may also be performed in a multi-chip network or a system or network with several separate integrated circuits.
  • Fig. 1A shows the basic structure of a network on chip according to the invention
  • Fig. 1B shows a basic slot allocation for a channel in a NoC
  • Fig. 2 illustrates a schematic structure for illustrating the contention free routing
  • Fig. 3 shows a schematic illustration of a network provided with a conventional slot allocation for channels
  • Fig. 4 shows the slot allocation according to the present invention
  • Fig. 5 illustrates a network interface according to the present invention
  • the embodiments relate to systems on chip (SoC), i.e. a plurality of processing modules IP on the same chip communicating with each other via some kind of interconnect.
  • the interconnect is embodied as a network on chip NoC.
  • the network on chip NoC may include wires, buses, time-division multiplexing, switches, and/or routers within a network.
  • Fig. 1A shows an example of an integrated circuit having a network on chip NoC according to the present invention.
  • the system comprises several processing modules IP, also called IP blocks.
  • the processing modules IP could be realized as computation elements, memories or a subsystem which may internally contain interconnect modules.
  • the processing modules IP are each connected to a network NoC via a network interface NI, respectively.
  • the network NoC comprises a plurality of routers R, which are connected to adjacent routers R via respective links L1, L2, L3.
  • the network interfaces NI are used as interfaces between the processing modules IP and the network NoC.
  • the network interfaces NI are provided to manage the communication of the respective processing modules IP and the network NoC, so that the processing modules IP can perform their dedicated operation without having to deal with the communication with the network NoC or other processing modules IP.
  • the processing modules IP may act as masters IPM, i.e. initiating a request, or may act as slaves IPS, i.e. receiving a request from a master IPM and processing the request accordingly.
  • Fig. 1B shows a block diagram of a single connection having one channel and a respective basic slot allocation in a network on chip NoC. In particular, the channel between a master IPM and a slave IPS is shown.
  • This connection path is realized by a network interface NI associated to the master IPM, two routers, and a network interface NI associated to a slave IPS.
  • the network interface NI associated to the master IPM comprises a time slot allocation unit SA.
  • the network interface NI associated to the slave IPS may also comprise a time slot allocation unit SA.
  • a first link L1 is present between the network interface NI associated to the master IPM and a first router R
  • a second link L2 is present between the two routers R
  • a third link L3 is present between a router R and the network interface NI associated to the slave IPS.
  • Three slot tables ST1-ST3 for the output ports of the respective network components NI, R, R are also shown.
  • slot tables ST are preferably implemented on the output side, i.e. the data producing side, of the network elements NI, R, R.
  • the inputs for the slot allocation determination performed by the time slot allocation unit SA are the network topology (i.e., the network components and their interconnection), the slot table size, and the connection set. For every connection, its paths and its bandwidth, latency, jitter, and/or slot requirements are given.
  • Each of these channels is set on an individual path, and may comprise different links having different bandwidth, latency, jitter, and/or slot requirements.
  • slots must be reserved for the links as shown in Fig. 1B. Different slots can be reserved for different connections or channels by means of TDMA. Data for a connection is then transferred over consecutive links along the connection in consecutive slots.
  • Fig. 2 illustrates a more detailed example for a contention free routing.
  • Each of the processing modules IPA and IPB transmits data using different channels.
  • the processing modules IPA and IPB are connected via their respective network interfaces NIA and NIB to the NoC, represented by the two routers R.
  • Each of the network interfaces NIA and NIB includes a slot table, STA and STB respectively.
  • Channel a for processing module IPA has two slots, 0 and 2, allocated in the slot table STA.
  • Channel b for IPB has one slot, 1, allocated.
  • the paths for channel a and b are indicated by the solid and open headed arrows, respectively.
  • the slots s are reserved in such a way that flits do not contend in the network. This is indicated by the numbers denoted next to the arrows.
  • Fig. 3 shows an exemplary network. For the sake of clarity only one IP and the associated network interface NI are shown. The remaining boxes represent routers R11-R44 of the NoC, wherein only the routers carrying traffic are labeled.
  • the processing module IP needs four channels a, b, c, and d.
  • the 4x4 mesh represents the network NoC including the routers R11-R44.
  • the links between the routers R11-R44 are not drawn for clarity.
  • the slot table ST of the network interface NI of the processing module IP includes 40 slots.
  • Channels a, b, c and d require 1/40, 2/40, 3/40 and 4/40 of the bandwidth capacity of the links, respectively. Because bandwidth allocation is done at a granularity of 1/40 of the link bandwidth, the slot table ST requires at least 40 slots. As channel a has only 1 of the 40 slots, the worst case waiting time for a flit at the head of the queue for channel a in the network interface NI is the duration of 39 slots. Once flits are injected into the network, the latency is the number of hops in the router network multiplied by the duration of a slot. For a large NoC the maximum number of hops is 20. This means that even in this small example the worst case waiting time is already dominant.
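The arithmetic of the example above, restated as a tiny script (slot durations as the time unit; the numbers come from the Fig. 3 example in the text):

```python
table_size = 40                        # slots in the table ST
allocated = {"a": 1, "b": 2, "c": 3, "d": 4}   # slots per channel

# A flit of channel a that just misses its single slot waits for the
# remaining slots of the revolution before it can be injected.
worst_wait_a = table_size - allocated["a"]     # 39 slot durations

max_hops = 20                          # large NoC, per the description
network_latency = max_hops             # one slot duration per hop

assert worst_wait_a > network_latency  # head-of-queue waiting dominates
print(worst_wait_a, network_latency)   # 39 20
```

Even against the worst-case hop count of a large NoC, the head-of-queue waiting time is roughly twice the in-network latency, which is the motivation for sharing slots.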
  • the numbers next to the arrows of the respective channels a-d indicate slot positions which need to be reserved in the respective slot table of the outputting network component (NI or router).
  • the allocation of slots to the respective channels a-d between NI and R11 can be derived from the slot table ST.
  • the slots 4-6 and 7-10 are reserved between R11 and R12.
  • slots 5-7 are reserved for channel c and slots 8-11 are reserved for channel d.
  • the slot 1 is reserved for channel a and the slots 2, 3 are reserved for channel b etc.
  • the solution that is proposed here is to allocate bandwidth for a set of channels a-d originating from the same NI. Instead of reserving a slot for each of those channels a-d individually, slots are reserved for the whole set of channels a-d. So, each of the channels a, b, c, or d may access the network in slots 0-9. A local arbitration mechanism is required when more than one of these channels a-d wants to access the same slot. This is explained below.
  • the ten slots 0-9 allocated to the set are now designated by S.
  • the ten slots S can be redistributed in the slot table ST.
  • the reduced slot table ST has four slots only, and one of these slots 0-3 is assigned to the channel set. A complete traversal of the small slot table thus takes four slots, and the slot for the channel set is thus available every four slots, which is the same as in the example in which the ten slots were evenly distributed over the forty slots.
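The equivalence claimed above (one slot of a four-slot table serves the set as often as ten evenly spread slots of a forty-slot table) checks out numerically; the variable names below are invented for the illustration:

```python
full_table, shared_slots = 40, 10      # ten slots of the 40-slot table
reduced_table, set_slots = 4, 1        # one slot of the reduced table

# With the shared slots evenly distributed, the set gets a slot every
# full_table / shared_slots slot durations in the large table ...
gap_full = full_table // shared_slots        # 4
# ... and every reduced_table / set_slots in the reduced table.
gap_reduced = reduced_table // set_slots     # 4

print(gap_full == gap_reduced)  # True: same service interval, 10x smaller table
```

The channel set sees the same service interval either way, but every network component along the path can store a slot table one tenth the size.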
  • Fig. 5 illustrates the components of a network interface NI. However, only the transmitting direction of the NI is illustrated. The part for receiving and depacketizing data packets is not illustrated.
  • the network interface NI comprises flow control means including an input queue 44, a remote space register 46, a request generator 45, a routing information register 47, a credit counter 49, a slot table 54, a slot scheduler 55, a header unit 48, a header insertion unit 52 as well as a packet length unit 51 and an output multiplexer 50.
  • the NI receives the data at its input port 42 from the transmitting processing module IP.
  • the NI outputs the packaged data at its output 43 to the router in form of a data sequence.
  • the data to be transmitted are supplied to the queue 44.
  • the first data in the queue 44 is monitored by the request generator 45.
  • the request generator 45 detects the data and generates a request req_i based on the queue filling and the available remote space as stored in the remote space register 46.
  • the request req_i for the queue is provided to the slot scheduler 55 for selecting the queue. The selection may be performed by the slot scheduler 55 based on information from the slot table 54 and based on information of the used arbitration mechanism for controlling the set of channels.
  • the scheduler 55 detects whether the data in the queue belongs to a channel a-d having shared slots or to data which is not part of a shared channel set. As soon as the queue is selected in the scheduler 55, it is provided to a unit 51 which increments the packet length and to the header insertion unit 52, which controls whether a header H needs to be inserted or not. Routing information like the addresses is stored in a configurable routing information register 47. The credit counter 49 is incremented when data is consumed in the output queue and is decremented when new headers H are sent with the credit value incorporated in the headers H. The routing information from the routing information register 47 as well as the value of the credit counter 49 is forwarded to the header unit 48 and forms part of the header H.
  • the header unit 48 receives the credit value and routing info and outputs the header data to the output multiplexer 50.
  • the output multiplexer 50 multiplexes the data provided by the selected queue and the header info hdr provided from the header unit 48. When a data package is sent out the packet length is reset.
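The interplay of the remote space register 46, request generator 45, and credit counter 49 described above amounts to credit-based flow control. The class below is a simplified model with invented names, not the patent's hardware implementation:

```python
class CreditFlowControl:
    """Producer-side view: data may only be sent when remote buffer
    space is known to be free; credits for freed space travel back
    piggybacked in packet headers."""
    def __init__(self, remote_buffer_size):
        self.remote_space = remote_buffer_size  # remote space register

    def can_send(self, flits):
        # Request generator: request a slot only if space is guaranteed.
        return flits <= self.remote_space

    def on_send(self, flits):
        assert self.can_send(flits)
        self.remote_space -= flits      # space now provisionally used

    def on_credit(self, credits):
        # Credits returned in headers as the consumer drains its queue.
        self.remote_space += credits

fc = CreditFlowControl(8)
fc.on_send(5)
print(fc.can_send(4))   # False: only 3 flits of remote space left
fc.on_credit(2)
print(fc.can_send(4))   # True after credits return
```

This is the property the text calls flow control: the producer never injects data for which no remote buffer space is reserved.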
  • the request generator detects whether data has been filled into one of the queues.
  • the data from the IP are not demultiplexed into multiple queues; instead, all the data of the channel set are kept in the same queue 44. This automatically implements a first-come first-serve (FCFS) policy and reduces the queuing cost significantly.
  • the information that was used to control the de-multiplexer in the conventional architecture must now be queued in parallel to the data queue, or in the same queue, increasing the word width of the queue.
  • This control information reflects the channel ID in the channel set and is used to, e.g., select the path of the channel.
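Queuing the channel ID alongside the data, as described above, can be modelled with a single FIFO of (ID, flit) pairs. The channel IDs and flit names below are invented; this is a sketch of the idea, not the NI's actual word format:

```python
from collections import deque

# Single shared queue: each entry carries the channel ID next to the
# data (the "widened word"), so path and header selection still work
# per flit while FIFO order gives first-come first-serve for free.
shared_queue = deque()

def write(channel_id, flit):
    shared_queue.append((channel_id, flit))

def schedule_next():
    """Called in each shared slot; FCFS falls out of FIFO order."""
    return shared_queue.popleft() if shared_queue else None

write("a", "flit0"); write("c", "flit1"); write("a", "flit2")
print(schedule_next())  # ('a', 'flit0') -- arrival order, not channel order
```

One queue replaces the per-channel queues, at the cost of a few extra bits per queue word for the channel ID.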
  • a further, not illustrated mechanism is that the scheduler 55 may use a first-come first-serve (FCFS) policy.
  • the order in which the IP writes its data to the NI is queued.
  • the first element in the queue 44 then indicates from which data queue the data may come.
  • FCFS policy is a bit harder to use when the channel set is made from data coming from multiple IP blocks.
  • An alternative could be a simple round-robin (RR) scheduler that selects the first non-empty queue in the channel set, counting from the previously selected queue.
  • One advantage of the method is that latency can be reduced significantly.
  • the worst case waiting time for a slot is reduced by a factor of ten.
  • Another advantage is that this scheme does not require that all the channels in the set have both the same source and same destination. All that is required is that the channels have the same source.
  • Yet another advantage is that this scheme allows a reduction of the number of queues in the network interface. In this example, one queue is used instead of four.
  • the only disadvantage is that the more the paths of the channel set diverge, the more overallocation of slots for the channels is required.
  • the scheme has been described in the context of TDMA; however, it is also applicable to single TDMA systems. In general it is applicable to interconnect structures based on connections and providing guarantees.

Abstract

The invention relates to an integrated circuit comprising a plurality of processing modules (IP) and a network (NoC) arranged for coupling processing modules (IP), comprising: the processing module (IP) includes an associated network interface (NI) which is provided for transmitting data to the network (NoC) supplied by the associated processing module and for receiving data from the network (NoC) destined for the associated processing module, wherein the data transmission between processing modules (IP) operates based on time division multiple access (TDMA) using time slots (S) and contention-free transmission by using channels (a-d); each network interface (NI) includes a slot table (ST) for storing an allocation of a time slot to a certain channel (a-d), wherein at least a part of the time slots (0-9) allocated to channels (a-d) originating from the same network interface (NI) are shared for transmission of data of the set of channels (a-d). The invention is based on the idea of utilizing in common all, or at least a part, of the slots of channels (a-d) originating from the same network interface (NI). This firstly reduces the latency of such channels (a-d). Additionally, the sizes of the slot tables (ST) in all network components (NI, R11-R44) are reduced drastically.

Description

NETWORK-ON-CHIP ENVIRONMENT AND METHOD FOR REDUCTION OF LATENCY
The invention relates to an integrated circuit having a plurality of processing modules and a network arranged for coupling processing modules and a method for time slot allocation in such an integrated circuit, and a data processing system. Systems on silicon show a continuous increase in complexity due to the ever increasing need for implementing new features and improvements of existing functions. This is enabled by the increasing density with which components can be integrated on an integrated circuit. At the same time the clock speed at which circuits are operated tends to increase too. The higher clock speed in combination with the increased density of components has reduced the area which can operate synchronously within the same clock domain. This has created the need for a modular approach. According to such an approach a processing system comprises a plurality of relatively independent, complex modules. In conventional processing systems the modules usually communicate to each other via a bus. As the number of modules increases however, this way of communication is no longer practical for the following reasons. A large number of modules represent a high bus load. Further the bus represents a communication bottleneck as it enables only one module to send data to the bus.
A communication network forms an effective way to overcome these disadvantages. Networks on chip (NoC) have received considerable attention recently as a solution to the interconnection problem in highly-complex chips. The reason is twofold. First, NoCs help resolve the electrical problems in new deep-submicron technologies, as they structure and manage global wires. At the same time the NoC concept shares wires, allowing a reduction of their number and an increase in their utilization. NoCs can also be energy efficient and reliable and are scalable compared to buses. Second, NoCs also decouple computation from communication, which is essential in managing the design of billion-transistor chips. NoCs achieve this decoupling because they are traditionally designed using protocol stacks, which provide well-defined interfaces separating communication service usage from service implementation.
Introducing networks as on-chip interconnects radically changes the communication when compared to direct interconnects, such as buses or switches. This is because of the multi-hop nature of a network, where communication modules are not directly connected, but are remotely separated by one or more network nodes. This is in contrast with the prevalent existing interconnects (i.e., buses) where modules are directly connected. The implications of this change reside in the arbitration (which must change from centralized to distributed), and in the communication properties (e.g., ordering, or flow control), which must be handled either by an intellectual property block (IP) or by the network. Most of these topics have already been the subject of research in the field of local and wide area networks (computer networks) and as an interconnect for parallel processor networks. Both are very much related to on-chip networks, and many of the results in those fields are also applicable on chip. However, the premises of NoCs are different from those of off-chip networks, and, therefore, most of the network design choices must be reevaluated. On-chip networks have different properties (e.g., tighter link synchronization) and resource constraints (e.g., higher memory cost) leading to different design choices, which ultimately affect the network services. Storage (i.e., memory) and computation resources are relatively more expensive, whereas the number of point-to-point links is larger on chip than off chip. Storage is expensive, because general-purpose on-chip memory, such as RAMs, occupies a large area. Having the memory distributed in the network components in relatively small sizes is even worse, as the overhead area in the memory then becomes dominant.
A network on chip (NoC) typically consists of a plurality of routers and network interfaces. Routers serve as network nodes and are used to transport data from a source network interface to a destination network interface by routing data on a correct path to the destination on a static basis (i.e., the route is predetermined and does not change), or on a dynamic basis (i.e., the route can change depending e.g., on the NoC load to avoid hot spots). Routers can also implement time guarantees (e.g., rate-based, deadline-based, or using pipelined circuits in a TDMA fashion). A known example of a NoC is AEthereal.
The network interfaces are connected to processing modules, also called IP blocks, which may represent any kind of data processing unit, a memory, a bridge, a compressor etc. In particular, the network interfaces constitute a communication interface between the processing modules and the network. The interface is usually compatible with the existing bus interfaces. Accordingly, the network interfaces are designed to handle data sequentialization (fitting the offered command, flags, address, and data on a fixed- width (e.g., 32 bits) signal group) and packetization (adding the packet headers and trailers needed internally by the network). The network interfaces may also implement packet scheduling, which may include timing guarantees and admission control.
An NoC provides various services to processing modules to transfer data between them.
The NoC can be operated according to best effort (BE) or guaranteed throughput (GT) services. In best effort (BE) service there are no guarantees about latency or throughput. Data is forwarded through routers without any reservation of slots, so this kind of data faces contention in the routers and giving guarantees is not possible. In contrast, GT service allows deriving exact values for latency and throughput for transmitting data between processing modules.
On-chip systems often require timing guarantees for their interconnect communications. A cost-effective way of providing time-related guarantees (i.e., throughput, latency and jitter) is to use pipelined circuits in a TDMA (Time Division Multiple Access) fashion, which is advantageous as it requires less buffer space compared to rate-based and deadline-based schemes on systems on chip (SoC), which have tight synchronization. Therefore, a class of communication is provided in which throughput, latency and jitter are guaranteed, based on a notion of global time (i.e., a notion of synchronicity between network components, i.e. routers and network interfaces), wherein the basic time unit is called a slot or time slot. All network components usually comprise a slot table of equal size for each output port of the network component, in which time slots are reserved for different connections. At the transport layer of the network, the communication between the processing modules is performed over connections. A connection is considered as a set of channels, each having a set of connection properties, between a first processing module and at least one second processing module. For a connection between a first processing module and a single second processing module, the connection may comprise two channels, namely one from the first to the second processing module, i.e. the request or forward channel, and a second channel from the second to the first processing module, i.e. the response or reverse channel. The forward or request channel is reserved for data and messages from the master to the slave, while the reverse or response channel is reserved for data and messages from the slave to the master. If no response is required, the connection may comprise only one channel. It is not illustrated, but also possible, that the connection involves one master and N slaves. In that case 2*N channels are provided.
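The channel structure of such a connection can be illustrated with a small sketch; representing a channel as a (source, destination, kind) triple and the function name are assumptions made here for illustration only:

```python
# Illustrative model: a connection between one master and N slaves
# comprises 2*N unidirectional channels (one request and one response
# channel per slave), matching the 2*N rule stated above.
def build_channels(master, slaves):
    channels = []
    for slave in slaves:
        channels.append((master, slave, "request"))    # forward channel
        channels.append((slave, master, "response"))   # reverse channel
    return channels
```

For one master and two slaves this yields four channels; a connection without a response would keep only the request channel.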
Therefore, a connection or the path of the connection through the network comprises at least one channel. In other words, a channel corresponds to the connection path of the connection if only one channel is used. If two channels are used as mentioned above, one channel will provide the connection path e.g. from the master to the slave, while the second channel will provide the connection path from the slave to the master. Accordingly, for a typical connection, the connection path will comprise two channels. The connection properties may include ordering (data transport in order), flow control (a remote buffer is reserved for a connection, and a data producer will be allowed to send data only when it is guaranteed that buffer space is available for the produced data), throughput (a lower bound on throughput is guaranteed), latency (an upper bound on latency is guaranteed), lossiness (dropping of data), transmission termination, transaction completion, data correctness, priority, or data delivery. In a NoC, connections are built on top of channels. A channel is a uni-directional path through the network from a source (master, initiator) to a destination (slave, target) or back.
For implementing GT services slot tables are used. The slot tables as mentioned above are stored in the network components, including network interfaces and routers. The slot tables allow a sharing of the same link or wires in a time-division multiple access, TDMA, manner. The quantum of data that is injected into the network is called a flit, wherein a flit is a fixed-size sub-packet. The injection of flits is regulated by the slot table stored in the network interface. The slot tables advance in synchronization (i.e., all are in the same slot at the same time). A channel may have one or more slots allocated within a slot table. The slot tables in all network components are filled such that flits communicated over the network do not contend. The channels are used to identify different traffic classes and associate properties to them. At each slot, a data item is moved from one network component to the next one, i.e. between routers or between a router and a network interface. Therefore, when a slot is reserved at an output port, the next slot must be reserved on the following output port along the path between a master and a slave module, and so on. When multiple connections are set up between several processing modules with timing guarantees, the slot allocation must be performed such that there are no clashes (i.e., there is no slot allocated to more than one connection). The slots must be reserved in such a way that data never has to contend with any other data. This is also called contention-free routing.
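The reservation rule described above — a slot s on one link requires slot s+1 on the next link, with no slot claimed twice — can be sketched as follows; the per-link dictionary representation and the function name are illustrative assumptions, not the AEthereal implementation:

```python
# Contention-free reservation: claim slot s on link 0, s+1 on link 1, ...
# (modulo the table size). A clash with an existing reservation aborts
# the whole attempt, leaving the tables untouched.
def reserve_path(slot_tables, path, start_slot, table_size, channel):
    reservations = []
    for hop, link in enumerate(path):
        slot = (start_slot + hop) % table_size
        if slot_tables[link].get(slot) is not None:   # slot taken: clash
            return None
        reservations.append((link, slot))
    for link, slot in reservations:                   # commit only if all free
        slot_tables[link][slot] = channel
    return reservations
```

Reserving slot 1 on a three-link path claims slots 1, 2 and 3 on the consecutive links; a second channel attempting the same start slot on a shared link is rejected, which is precisely the clash the slot allocation must avoid.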
The task of finding an optimum slot allocation for a given network topology, i.e. a given number of routers and network interfaces, and a set of connections between processing modules is a highly computation-intensive problem, as finding an optimal solution requires exhaustive computation time.
An important feature for transmission of data between processing modules is the latency. A general definition of latency in networking is the amount of time it takes a data packet to travel from source to destination. Together, latency and bandwidth define the speed and capacity of a network. The latency to access data depends on the size of the slot table, the assignment of slots to a given channel in the table, and the burst size. The burst size is the amount of data that can be requested or sent in one request. When the number of slots allocated to a channel is less than the number of slots required to transfer a burst of data, the latency to access data increases dramatically. In such a case more than one revolution of the slot table is needed to completely send a burst of data. The waiting time for the slots that are not allocated to this connection is also added to the latency. Conventionally, the network interfaces contain one queue per channel.
The waiting time in that queue turns out to be the major contribution to the total communication latency. The larger the slot table in number of slots and the fewer slots are reserved for a channel, the higher the waiting latency.
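The relation between table size, allocated slot positions and worst-case waiting time can be made concrete with a small sketch; the function name is an assumption and wrap-around gaps are computed modulo the table size:

```python
# Worst-case head-of-queue waiting time (in slots) for a channel, given
# the slot-table size and the positions of the slots allocated to it.
def worst_case_wait(table_size, allocated):
    slots = sorted(allocated)
    gaps = []
    for i, s in enumerate(slots):
        nxt = slots[(i + 1) % len(slots)]
        gap = (nxt - s) % table_size
        gaps.append(gap if gap else table_size)  # single slot: full revolution
    # Data arriving just after its slot waits for the largest gap minus one.
    return max(gaps) - 1
```

With a 40-slot table and a single allocated slot this evaluates to 39 waiting slots, the figure used in the example below; ten equally spaced slots reduce it to 3.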
The other problem is that when a single processing module requires many channels, say n, then the slot table requires at least n slots, one for each channel. This is in general not practical, however, because the bandwidth requirements of the various channels may differ significantly, which requires even larger slot tables to allocate bandwidth at a finer granularity. The cost of the slot tables, and thus of the network interfaces and of the network as a whole, highly depends on the number of slots in the slot tables.
Therefore it is an object of the present invention to provide an arrangement and a method having an improved slot allocation in a Network on Chip environment.
This object is solved by an integrated circuit according to claim 1 and to a method for time slot allocation according to claim 7.
It is proposed to share slots between channels having their origin at the same network interface. At least a part of the slots allocated to channels originating from the same network interface are shared, so that a pool of slots is formed which can be used by all channels together. This drastically reduces the latency; in particular the latency of channels having only a small number of slots allocated is reduced. Since the number of slots in the slot table can be reduced by the sharing, the memory space requirements in all network components are reduced as well.
Other aspects and advantages of the invention are defined in the dependent claims.
In a preferred embodiment of the invention all slots allocated to channels originating from the same network interface are shared. This will simplify the control of data transmission of the channels having shared slots.
In a further preferred embodiment of the invention a channel scheduler is included in the network interface, the scheduler being provided for scheduling the data of the set of channels to the shared slots.
In a further preferred embodiment of the invention the data of a channel are scheduled by the scheduler depending on their position in a queue. The control of the data transmission can be achieved by queuing the data belonging to the set of channels in only one queue; thus a first come first serve policy is implemented. This further reduces the chip area required for the input queue in the network interface. Conventionally there is one queue per channel. According to the present invention it is advantageous to input all data of the shared channels into only one queue. The scheduler then schedules the data depending on their position in the queue.
In a preferred embodiment of the invention the scheduling of data of the set of channels is performed depending on the filling status of the queues of the set of channels. In an embodiment having a queue for each channel the scheduler monitors the filling status of the queues of the channels. The first queue that is not empty is scheduled to be transferred. The scheduler then monitors the queues starting from that scheduled queue, wherein only queues that are not empty are scheduled.
The invention also relates to a method for allocating time slots for data transmission in an integrated circuit having a plurality of processing modules and a network arranged for coupling the processing modules, and a plurality of network interfaces each being coupled between one of the processing modules and the network comprising the steps of: communicating between processing modules based on time division multiple access using time slots and contention free transmission by using channels; storing a slot table in each network interface including an allocation of a time slot to a certain channel, sharing of time slots allocated to channels originating from the same network interface.
The invention further relates to a data processing system comprising: a plurality of processing modules and a network arranged for coupling the processing modules, comprising: a network interface associated to each processing module which is provided for transmitting data to the network supplied by the associated processing module and for receiving data from the network destined for the associated processing module; wherein the data transmission between processing modules operates based on time division multiple access using time slots and contention free transmission by using channels; each network interface includes a slot table for storing an allocation of a time slot to a certain channel, wherein a sharing is provided of time slots allocated to channels originating from the same network interface. Accordingly, the time slot allocation may also be performed in a multi-chip network or a system or network with several separate integrated circuits.
Preferred embodiments of the invention are described in detail below, by way of example only, with reference to the following schematic drawings.
Fig. 1A shows the basic structure of a network on chip according to the invention;
Fig. 1B shows a basic slot allocation for a channel in a NoC; Fig. 2 illustrates a schematic structure for illustrating the contention free routing; Fig. 3 shows a schematic illustration of a network provided with a conventional slot allocation for channels;
Fig. 4 shows the slot allocation according to the present invention; Fig. 5 illustrates a network interface according to the present invention;
The drawings are provided for illustrative purpose only and do not necessarily represent practical examples of the present invention to scale.
In the following the various exemplary embodiments of the invention are described.
Although the present invention is applicable in a broad variety of applications it will be described with the focus put on NoCs, especially on the AEthereal design. A further field for applying the invention might be any NoC providing guaranteed services by using time slots and slot tables. In the following the general architecture of a NoC will be described referring to figures 1A, 1B and 2.
The embodiments relate to systems on chip (SoC), i.e. a plurality of processing modules IP on the same chip communicate with each other via some kind of interconnect. The interconnect is embodied as a network on chip (NoC). The network on chip NoC may include wires, a bus, time-division multiplexing, switches, and/or routers within a network.
Fig. 1A shows an example for an integrated circuit having a network on chip NoC according to the present invention. The system comprises several processing modules IP, also called IP blocks. The processing modules IP could be realized as computation elements, memories or a subsystem which may internally contain interconnect modules. The processing modules IP are each connected to a network NoC via a network interface NI, respectively. The network NoC comprises a plurality of routers R, which are connected to adjacent routers R via respective links L1, L2, L3. The network interfaces NI are used as interfaces between the processing modules IP and the network NoC. The network interfaces NI are provided to manage the communication of the respective processing modules IP and the network NoC, so that the processing modules IP can perform their dedicated operation without having to deal with the communication with the network NoC or other processing modules IP. The processing modules IP may act as masters IPM, i.e. initiating a request, or may act as slaves IPS, i.e. receiving a request from a master IPM and processing the request accordingly. Fig. 1B shows a block diagram of a single connection having one channel and a respective basic slot allocation in a network on chip NoC. In particular, the channel between a master IPM and a slave IPS is shown. This connection path is realized by a network interface NI associated to the master IPM, two routers, and a network interface NI associated to a slave IPS. The network interface NI associated to the master IPM comprises a time slot allocation unit SA. Alternatively, the network interface NI associated to the slave IPS may also comprise a time slot allocation unit SA.
A first link L1 is present between the network interface NI associated to the master IPM and a first router R, a second link L2 is present between the two routers R, and a third link L3 is present between a router R and the network interface NI associated to the slave IPS. Three slot tables ST1-ST3 for the output ports of the respective network components NI, R, R are also shown. These slot tables ST are preferably implemented on the output side, i.e. the data producing side, of the network elements NI, R, R. For each requested slot s, one slot s is reserved in each slot table ST of the links along the connection path. All these slots s must be free, i.e., not reserved by other channels. Since the data advance from one network component to another each slot, starting from slot s=1, the next slot along the connection must be reserved at slot s=2 and then at slot s=3. The inputs for the slot allocation determination performed by the time slot allocation unit SA are the network topology, i.e. the network components with their interconnections, the slot table size, and the connection set. For every connection, its paths and its bandwidth, latency, jitter, and/or slot requirements are given. Each of these channels is set on an individual path, and may comprise different links having different bandwidth, latency, jitter, and/or slot requirements. To provide time-related guarantees, slots must be reserved for the links as shown in fig. 1B. Different slots can be reserved for different connections or channels by means of TDMA. Data for a connection is then transferred over consecutive links along the connection in consecutive slots. Fig. 2 illustrates a more detailed example for a contention free routing.
There are only two processing modules IPA and IPB. Each of the processing modules IPA and IPB transmits data using different channels. The processing modules IPA and IPB are connected via their respective network interfaces NIA and NIB to the NoC represented by the two routers R. Each of the network interfaces NIA and NIB includes a slot table STA and STB. Channel a for processing module IPA has two slots 0, 2 allocated in the slot table STA. Channel b for IPB has one slot 1 allocated. The paths for channels a and b are indicated by the solid and open headed arrows, respectively. The slots s are reserved in such a way that flits do not contend in the network. This is indicated by the numbers denoted next to the arrows, which represent the slots s at which the links are reserved. This will be explained in detail for the path of the flits transmitted by processing module IPA. At slots 0 and 2 the link between the network interface NIA and the first router R is reserved for the flits for channel a. For the next step the link between the two routers R is reserved during slots 1 and 3 for data from processing module IPA. During slot 2 that link is reserved for channel b. Since a slot table ST has only four positions for allocating slots s to the channels a, b, the slots 2 and 0 are reserved for channel a for the outgoing flits from the right side router R. In the slot table (not shown) of that right side router R, slot 3 is reserved for channel b. This shows that no positions in the slot tables ST are allocated such that flits would contend. By this procedure a guaranteed throughput can be provided. However, the small example also illustrates the difficulty and effort of allocating the slots s to the channels a, b throughout the NoC.
The underlying problem of high latency will be illustrated referring to Fig. 3, which shows an exemplary network. For the sake of clarity only one processing module IP and the associated network interface NI are shown. The remaining boxes represent the routers R11-R44 of the NoC, wherein only the routers carrying traffic are designated. The processing module IP needs four channels a, b, c, and d. The 4x4 mesh represents the network NoC including the routers R11-R44. The links between the routers R11-R44 are not drawn for clarity. The channels a, b, c and d have bandwidth requirements of 1/40, 2/40, 3/40 and 4/40 of the bandwidth capacity of the links, respectively. Because bandwidth allocation is done at a granularity of 1/40 of the link bandwidth, the slot table ST of the network interface NI requires at least 40 slots. As channel a has only 1 of the 40 slots, the worst case waiting time for a flit at the head of the queue for channel a in the network interface NI is the duration of 39 slots. When flits are injected into the network, the latency is the number of hops in the router network multiplied by the duration of a slot. For a large NoC the maximum number of hops is 20. This means that already in this small example the worst case waiting time is dominant. The numbers near the arrows of the respective channels a-d indicate slot positions which need to be reserved in the respective slot table of the outputting network component (NI or router). The allocation of slots to the respective channels a-d between the NI and R11 can be derived from the slot table ST. For channels c and d the slots 4-6 and 7-10 are reserved between R11 and R12. Between R12 and R13 slots 5-7 are reserved for channel c and slots 8-11 are reserved for channel d.
Between R11 and R21 the slot 1 is reserved for channel a and the slots 2, 3 are reserved for channel b, etc.
In the following the present invention will be explained referring to fig. 4. The solution proposed here is to allocate bandwidth for a set of channels a-d originating from the same network interface NI. Instead of reserving slots for each of those channels a-d individually, slots are reserved for the whole set of channels a-d. So each of the channels a, b, c, or d may access the network in slots 0...9. A local arbitration mechanism is required when more than one of these channels a-d wants to access the same slot; this is explained below. The ten slots 0-9 allocated to the set are now designated by S. The ten slots S can be redistributed in the slot table ST. A good redistribution places these slots S at equal distances in the slot table ST, with possibly a minor overallocation of slots. This means that the ten slots S are located at slots 0, 4, 8, ..., 36. This distribution not only minimizes the worst case waiting time for a slot, but also allows the size of the slot table to be reduced by a factor of ten. This causes a strong reduction of the memory space required for the slot tables in each of the participating network components NI, R11-R44, etc. The reduced slot table ST has four slots only, and one of these slots 0-3 is assigned to the channel set. A complete traversal of the small slot table thus takes four slots, and the slot for the channel set is thus available every four slots, which is the same as in the example in which the ten slots were evenly distributed over the forty slots. Since all channels outgoing from the network interface NI are combined in that channel set, the rest of the slots in the slot table are used for channels not outgoing from the respective network interface NI. When multiple channels a-d are combined into a channel set, some mechanism is required to schedule the data sequentially onto the network. There are basically two approaches for that.
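The table-size reduction described above follows from dividing out the common factor of the table size and the pooled slot count; a minimal sketch (the function name is assumed for illustration):

```python
from math import gcd

# Pool the slots of all channels leaving one NI, then shrink the table:
# p pooled slots evenly spaced in a table of size t are equivalent to
# p/g slots in a table of size t/g, where g = gcd(t, p).
def shrink_table(table_size, pooled_slots):
    g = gcd(table_size, pooled_slots)
    return table_size // g, pooled_slots // g
```

For the example above, shrink_table(40, 1 + 2 + 3 + 4) yields a 4-slot table with one slot for the whole channel set, i.e. the factor-of-ten reduction described in the text.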
However, before explaining the mechanisms for scheduling the data of the multiple channels, the structure of a network interface NI will be explained referring to fig. 5. Fig. 5 illustrates the components of a network interface NI; only the transmitting direction of the NI is illustrated, the part for receiving and depacketizing data packets is not shown. The network interface NI comprises flow control means including an input queue 44, a remote space register 46, a request generator 45, a routing information register 47, a credit counter 49, a slot table 54, a slot scheduler 55, a header unit 48, a header insertion unit 52 as well as a packet length unit 51 and an output multiplexer 50.
The NI receives the data at its input port 42 from the transmitting processing module IP. The NI outputs the packetized data at its output 43 to the router in the form of a data sequence. The data to be transmitted are supplied to the queue 44. The first data item in the queue 44 is monitored by the request generator 45. The request generator 45 detects the data and generates a request req_i based on the queue filling and the available remote space as stored in the remote space register 46. The request req_i for the queue is provided to the slot scheduler 55 for selecting the queue. The selection may be performed by the slot scheduler 55 based on information from the slot table 54 and on information of the arbitration mechanism used for controlling the set of channels. The scheduler 55 detects whether the data in the queue belongs to a channel a-d having shared slots or to data which is not part of a shared channel set. As soon as the queue is selected in the scheduler 55, it is provided to a unit 51 which increments the packet length and to the header insertion unit 52, which controls whether a header H needs to be inserted or not. Routing information such as the addresses is stored in a configurable routing information register 47. The credit counter 49 is incremented when data is consumed in the output queue and is decremented when new headers H are sent with the credit value incorporated in the headers H. The routing information from the routing information register 47 as well as the value of the credit counter 49 is forwarded to the header unit 48 and forms part of the header H. The header unit 48 receives the credit value and routing info and outputs the header data to the output multiplexer 50. The output multiplexer 50 multiplexes the data provided by the selected queue and the header info hdr provided from the header unit 48. When a data packet is sent out, the packet length is reset.
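The behaviour of the credit counter 49 can be sketched as follows; the exact amount by which the counter is decremented per header (here: the full reported value) is an assumption made for this illustration:

```python
# Minimal sketch of credit-based flow control: credits grow as the remote
# queue consumes data and are drained when reported in an outgoing header.
class CreditCounter:
    def __init__(self):
        self.credits = 0

    def on_data_consumed(self, n=1):
        self.credits += n          # incremented when data is consumed

    def report_in_header(self):
        value = self.credits       # credit value incorporated in the header H
        self.credits -= value      # decremented when the header is sent
        return value
```

After three data items are consumed remotely, the next header carries a credit value of 3 and the counter returns to zero.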
As shown in fig. 5, the request generator detects whether data has been filled into the queue. The data from the IP are not demultiplexed into multiple queues; instead, all the data of the channel set is kept in the same queue 44. This automatically implements a FCFS (first come first serve) policy and reduces the queuing cost significantly. The information that was used to control the de-multiplexer in the conventional architecture must now be queued in parallel to the data queue, or in the same queue, increasing the word width of the queue. This control information reflects the channel ID within the channel set and is used, e.g., to select the path of the channel.
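The single-queue arrangement with the channel ID queued alongside the data can be sketched as follows (class and method names are illustrative assumptions):

```python
from collections import deque

# One queue for the whole channel set; each entry carries the channel ID
# next to the data word, which widens the queue word as noted above.
class SharedChannelQueue:
    def __init__(self):
        self.q = deque()

    def enqueue(self, channel_id, flit):
        self.q.append((channel_id, flit))

    def schedule(self):
        # FCFS: the head entry determines which channel (and thus which
        # path/routing information) the next flit uses.
        return self.q.popleft() if self.q else None
```

Flits from different channels leave in arrival order, regardless of which channel produced them.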
A further mechanism, not illustrated, is that the scheduler 55 may use a first-come first-serve (FCFS) policy. When this policy is used, the order in which the IP writes its data to the NI is queued. The first element in the queue 44 then indicates from which data queue the data may come. Note that the FCFS policy is somewhat harder to use when the channel set is made from data coming from multiple IP blocks.
An alternative is a simple round-robin (RR) scheduler that selects the first queue (counting from the previously selected queue) in the channel set that is non-empty.
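Such a round-robin selection can be sketched as follows; modelling the queues as lists and passing in the previously selected index are assumptions of this sketch:

```python
# Round-robin: scan from the queue after the previously selected one and
# pick the first non-empty queue in the channel set.
def round_robin_pick(queues, last_selected):
    n = len(queues)
    for step in range(1, n + 1):
        idx = (last_selected + step) % n
        if queues[idx]:                 # only non-empty queues are scheduled
            return idx
    return None                         # all queues empty
```

Starting the scan after the last selected queue prevents one busy channel from monopolizing the shared slots.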
One advantage of the method is that latency can be reduced significantly. In the example given, the worst case waiting time for a slot is reduced by a factor of ten. And the higher the ratio of the total bandwidth to the lowest bandwidth of a group of channels originating from the same NI, the higher the latency reduction gets.
Another advantage is that this scheme does not require that all the channels in the set have both the same source and same destination. All that is required is that the channels have the same source.
Yet another advantage is that this scheme allows the size of the slot table to be reduced. The example in this document shows a reduction by a factor of ten.
Yet another advantage is that this scheme allows the number of queues in the network interface to be reduced. Referring to this example, one queue can be used instead of four.
The previous two advantages reduce the cost of the NI significantly, as the costs of the slot table and the queues are dominant in the NI. Moreover, in practical networks it was found that the cost of the NI dominates.
The only disadvantage is that the more the channel set diverges, the more overallocation of slots for the channels is required.
In systems in which the communication of data streams is done via shared memory, the application of the invention is very important. In these schemes there are many processing modules writing to and reading from a shared memory, or multiple memories in general. It is typical for processing modules (CPUs) to have non-blocking writes and blocking reads, and hence the performance of the system depends highly on the latency of the reads. As the reads represent many data streams, all originating from the memory or memory controller, the presented invention is very beneficial. Since there are many channels originating from the memory, the latency is reduced significantly, the slot-table size can be reduced significantly and the queue cost can be reduced significantly.
As all data streams go back and forth to the memory, the overallocation is higher the closer one gets to the processing modules. But as all the streaming goes via the memory, this overallocation is not a problem at all. The invention is explained in the context of multiple synchronized TDMA; however, it is also applicable to single TDMA systems. In general it is applicable to interconnect structures based on connections and providing guarantees.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Furthermore, any reference signs in the claims shall not be construed as limiting the scope of the claims.

CLAIMS:
1. Integrated circuit comprising a plurality of processing modules (IP) and a network (NoC) arranged for coupling the processing modules (IP), wherein:
• each processing module (IP) includes an associated network interface (NI), which is provided for transmitting data to the network (NoC) supplied by the associated processing module and for receiving data from the network (NoC) destined for the associated processing module;
• wherein the data transmission between processing modules (IP) operates based on time division multiple access (TDMA) using time slots (S) and contention-free transmission by using channels (a-d);
• each network interface (NI) includes a slot table (ST) for storing an allocation of a time slot to a certain channel (a-d), wherein at least a part of the time slots (0-9) allocated to channels (a-d) originating from the same network interface (NI) are shared for transmission of data of the set of channels (a-d).
2. Integrated circuit as claimed in claim 1, wherein all slots (0-9) allocated to the channels (a-d) are shared and are used in common for data transmission of the set of channels (a-d) from the same network interface (NI).
3. Integrated circuit as claimed in claim 1 or 2, wherein the network interface (NI) includes a scheduler (55) provided for scheduling the data of the set of channels to the shared slots (S).
4. Integrated circuit as claimed in one of the preceding claims, wherein data of a channel (a-d) are scheduled by the scheduler (55) depending on the position in a queue (44).
5. Integrated circuit as claimed in one of the preceding claims, wherein scheduling of data of the set of channels is performed depending on the filling status of the queue (44) of the set of channels.
6. Integrated circuit as claimed in one of the preceding claims, wherein the data of channels allocated to the set of channels is queued in a single queue (44).
7. Method for allocating time slots for data transmission in an integrated circuit having a plurality of processing modules (IP) and a network (NoC) arranged for coupling the processing modules (IP), and a plurality of network interfaces (NI) each being coupled between one of the processing modules (IP) and the network (NoC), comprising the steps of:
• communicating between processing modules (IP) based on time division multiple access (TDMA) using time slots and contention-free transmission by using channels (a-d);
• storing a slot table (ST) in each network interface (NI) including an allocation of a time slot to a certain channel (a-d);
• sharing time slots (S) allocated to channels originating from the same network interface (NI).
8. Data processing system comprising:
• a plurality of processing modules (IP) and a network (NoC) arranged for coupling the processing modules (IP), comprising:
• a network interface (NI) associated with each processing module (IP), which is provided for transmitting data to the network (NoC) supplied by the associated processing module and for receiving data from the network (NoC) destined for the associated processing module;
• wherein the data transmission between processing modules (IP) operates based on time division multiple access (TDMA) using time slots and contention-free transmission by using channels (a-d);
• each network interface (NI) includes a slot table (ST) for storing an allocation of a time slot to a certain channel (a-d);
• a sharing is provided of time slots (S) allocated to channels (a-d) originating from the same network interface (NI).
PCT/IB2006/051012 2005-04-06 2006-04-04 Network-on-chip environment and method for reduction of latency WO2006106475A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/910,749 US20080186998A1 (en) 2005-04-06 2006-04-04 Network-On-Chip Environment and Method for Reduction of Latency
EP06727812A EP1869844A1 (en) 2005-04-06 2006-04-04 Network-on-chip environment and method for reduction of latency
JP2008504892A JP2008535435A (en) 2005-04-06 2006-04-04 Network-on-chip environment and delay reduction method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05102702.7 2005-04-06
EP05102702 2005-04-06

Publications (1)

Publication Number Publication Date
WO2006106475A1 true WO2006106475A1 (en) 2006-10-12

Family

ID=36613481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/051012 WO2006106475A1 (en) 2005-04-06 2006-04-04 Network-on-chip environment and method for reduction of latency

Country Status (4)

Country Link
US (1) US20080186998A1 (en)
EP (1) EP1869844A1 (en)
JP (1) JP2008535435A (en)
WO (1) WO2006106475A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004034176A2 (en) * 2002-10-08 2004-04-22 Koninklijke Philips Electronics N.V. Integrated circuit and method for establishing transactions

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5935232A (en) * 1995-11-20 1999-08-10 Advanced Micro Devices, Inc. Variable latency and bandwidth communication pathways
US6910092B2 (en) * 2001-12-10 2005-06-21 International Business Machines Corporation Chip to chip interface for interconnecting chips
US8020163B2 (en) * 2003-06-02 2011-09-13 Interuniversitair Microelektronica Centrum (Imec) Heterogeneous multiprocessor network on chip devices, methods and operating systems for control thereof
US7380035B1 (en) * 2005-03-24 2008-05-27 Xilinx, Inc. Soft injection rate control for buses or network-on-chip with TDMA capability

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BOLOTIN E ET AL: "QNoC: QoS architecture and design process for network on chip", JOURNAL OF SYSTEMS ARCHITECTURE, ELSEVIER SCIENCE PUBLISHERS BV., AMSTERDAM, NL, vol. 50, no. 2-3, February 2004 (2004-02-01), pages 105 - 128, XP004492175, ISSN: 1383-7621 *
DIELISSEN J ET AL: "Concepts and Implementation of the Philips Network-on-Chip", -, 13 November 2003 (2003-11-13), XP002330547 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006126127A2 (en) 2005-05-26 2006-11-30 Nxp B.V. Electronic device and method of communication resource allocation
WO2006126127A3 (en) * 2005-05-26 2007-03-29 Koninkl Philips Electronics Nv Electronic device and method of communication resource allocation
US7809024B2 (en) 2005-05-26 2010-10-05 St-Ericsson Sa Electronic device and method of communication resource allocation
JP4756158B2 (en) * 2005-05-26 2011-08-24 エスティー‐エリクソン、ソシエテ、アノニム Communication resource allocation electronic device and method
WO2007010461A2 (en) * 2005-07-19 2007-01-25 Koninklijke Philips Electronics N.V. Electronic device and method of communication resource allocation
WO2007010461A3 (en) * 2005-07-19 2007-05-10 Koninkl Philips Electronics Nv Electronic device and method of communication resource allocation
EP2063581A1 (en) * 2007-11-20 2009-05-27 STMicroelectronics (Grenoble) SAS Transferring a stream of data between first and second electronic devices via a network on-chip
US8521895B2 (en) 2008-12-23 2013-08-27 International Business Machines Corporation Management of application to application communication requests between data processing systems
US8499029B1 (en) 2008-12-23 2013-07-30 International Business Machines Corporation Management of process-to-process communication requests
US8560594B2 (en) 2008-12-23 2013-10-15 International Business Machines Corporation Management of process-to-process communication requests
US9009214B2 (en) 2008-12-23 2015-04-14 International Business Machines Corporation Management of process-to-process inter-cluster communication requests
US9098354B2 (en) 2008-12-23 2015-08-04 International Business Machines Corporation Management of application to I/O device communication requests between data processing systems
WO2016099782A1 (en) * 2014-12-17 2016-06-23 Intel Corporation Pointer chasing across distributed memory
US9940236B2 (en) 2014-12-17 2018-04-10 Intel Corporation Pointer chasing across distributed memory
CN107800700A (en) * 2017-10-27 2018-03-13 中国科学院计算技术研究所 A kind of router and network-on-chip Transmission system and method
CN107800700B (en) * 2017-10-27 2020-10-27 中国科学院计算技术研究所 Router and network-on-chip transmission system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase; Ref document number: 2006727812; Country of ref document: EP
WWE Wipo information: entry into national phase; Ref document number: 11910749; Country of ref document: US; Ref document number: 2008504892; Country of ref document: JP
NENP Non-entry into the national phase; Ref country code: DE
WWW Wipo information: withdrawn in national office; Ref document number: DE
NENP Non-entry into the national phase; Ref country code: RU
WWW Wipo information: withdrawn in national office; Ref document number: RU
WWP Wipo information: published in national office; Ref document number: 2006727812; Country of ref document: EP