WO2016144953A1 - Monitoring errors during idle time in ethernet pcs - Google Patents

Monitoring errors during idle time in ethernet pcs

Info

Publication number
WO2016144953A1
Authority
WO
WIPO (PCT)
Prior art keywords
idle
counter
link
pcs
circuitry
Prior art date
Application number
PCT/US2016/021366
Other languages
French (fr)
Inventor
Adee O. RAN
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Publication of WO2016144953A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0847 Transmission error
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/20 Arrangements for detecting or preventing errors in the information received using signal quality detector
    • H04L1/203 Details of error rate determination, e.g. BER, FER or WER

Definitions

  • Gb/s Ethernet 25 Gigabit per second (Gb/s) Ethernet is a new standard for Ethernet being developed in the Institute of Electrical and Electronics Engineers (IEEE) P802.3by task force.
  • IEEE Institute of Electrical and Electronics Engineers
  • PHY Physical Layer
  • BER bit error ratio
  • the high data rate makes this BER requirement more challenging than in past generations, and for typical links it is almost certain that errors will occur occasionally (such that the BER is not practically equal to 0).
  • Ethernet frames include a 32-bit CRC (Cyclic Redundancy Check) field, so an Ethernet frame that was corrupted by errors is practically guaranteed to be detected by the MAC (Media Access Control) and discarded.
  • the MAC typically has counters for bad CRC frames, and these counters can be monitored to check the frame error rate. For example, a fully utilized 10 gigabit Ethernet link with the worst case BER is expected to encounter a CRC error approximately once in every 100 seconds. Many systems expect much lower error rates (such as less than once a week).
  • If the MAC CRC counters advance faster than expected, it can indicate a hardware problem.
  • Network management systems can monitor the counters across multiple network nodes and trigger some maintenance action (such as board or cable replacement) on any indication of a high CRC error count.
  • CRC error counting is a well-known and widely used feature of the Ethernet MAC.
  • a problem with MAC CRC error counting is that many Ethernet links are not fully utilized and may have long idle times between packets. Errors during idle times do not corrupt data so are considered safe, but they also do not cause CRC errors, so their rate cannot be monitored by the MAC counters.
  • Some Ethernet PHYs include forward error correction (FEC) codes, and the FEC decoding function can monitor the number of corrected errors. In a bad link, errors occur very often, and while the FEC decoder can recover the MAC data, it also counts the number of corrections.
  • the Ethernet standard includes error counters for each kind of FEC. These counters provide error rate information with very fine resolution and operate in both data and idle times.
  • FEC decoders increase the data delay (latency) in the receiver, and may increase power consumption, so some applications choose to disable them and some products do not even implement them. Therefore error monitoring may not always be available. Specifically, in the 25 Gb/s Ethernet project, although two kinds of FEC encoding are available, there is a strong desire to also enable operation without FEC encoding.
  • This PCS periodically inserts alignment markers into the data stream to enable assembling the multi-lane data in the receiver correctly.
  • the alignment markers include a special field (bit interleaved parity, or BIP) that serves as an error monitoring mechanism at the PCS level. Any single bit error between two alignment markers would cause an incorrect BIP field and increment a BIP error counter.
  • BIP provides good error monitoring capability in both data and idle times. However, it is not available in the single-lane PCS chosen for 25 Gb/s Ethernet, which is based on IEEE 802.3-2012 clause 49.
  • Figure 1 is a schematic diagram depicting the relationship between the 10GBASE-R PCS and its associated sublayers
  • Figure 2 is a schematic diagram illustrating the structure of an Ethernet Packet format
  • Figure 3 is a schematic diagram illustrating the relationship of the reconciliation sublayer 102 and the 10 Gigabit Media Independent Interface (XGMII) to the ISO/IEC (IEEE) OSI reference model;
  • XGMII 10 Gigabit Media Independent Interface
  • Figure 4 is a diagram illustrating the parts of an XGMII data stream
  • Figure 5 is a schematic diagram illustrating mapping of data octets to lanes for the XGMII transmission and reception of bits in an XGMII data stream;
  • Figure 6 is a table depicting control codes used by the PCS sublayer.
  • Figure 7 is a PCS receive state machine diagram in which a new control block counter and a new error counter are implemented to determine PCS BER during idle times, according to one embodiment.
  • PCS Physical Coding Sublayer
  • inventions are disclosed for monitoring errors in high-speed Ethernet links during idle periods.
  • the techniques are implemented at the Ethernet PCS sublayer.
  • the Ethernet PCS sublayer (such as the 10GBASE-R PCS described in clause 49 of IEEE802.3-2012) is implemented in the Ethernet Physical layer (Ethernet PHY).
  • Figure 1 depicts the relationship between the 10GBASE-R PCS and its associated sublayers.
  • the techniques are implemented for the new 25 Gb/s standard for Ethernet being developed in the IEEE P802.3by task force.
  • the proposed 25 Gb/s Ethernet employs a PCS similar to that defined by IEEE802.3-2012 clause 49, which defines the
  • a 10GBASE-R PCS sublayer 100 interfaces with a reconciliation sublayer 102 that sits below the MAC layer 104 in the Data Link layer.
  • 10GBASE-R PCS sublayer 100 also interfaces with a serial PMA (Physical Media Attachment) sublayer 106 in the 10GBASE-R PHY.
  • PMA Physical Media Attachment
  • Figure 2 illustrates the structure of an Ethernet Packet format.
  • the fields of the packet include a Preamble 200, a Start Frame Delimiter (SFD) 202, the addresses of the MAC frame's destination address 204 and source address 206, a length or type field 208 to indicate the length or protocol type of the following field that contains the MAC client data 210, a pad field 212 that contains padding if required, and a Frame Check Sequence (FCS) field 214 containing a cyclic redundancy check value to detect errors in a received MAC frame.
  • FCS Frame Check Sequence
  • An optional extension field 216 is added, if required. Of these fields, all are of fixed size except for the MAC Client Data 210, pad field 212 and extension field 216, which may contain an integer number of octets between the minimum and maximum values that are determined by the specific implementation of the MAC.
  • Figure 3 illustrates the relationship of the reconciliation sublayer 102 and the 10 Gigabit Media Independent Interface (XGMII) 300 to the ISO/IEC (IEEE) OSI reference model.
  • the purpose of the XGMII is to provide a simple, inexpensive, and easy-to-implement
  • a 10 Gigabit Attachment Unit Interface may optionally be used to extend the operational distance of the XGMII with reduced pin count (as defined in clause 47).
  • Reconciliation sublayer 102 adapts the bit serial protocols of the MAC to the parallel encodings of 10 Gb/s PHYs (when applied to 10GBASE-R).
  • the 10 Gb/s PCS is specified to the XGMII, so if not implemented, a conforming implementation will behave functionally as if the reconciliation sublayer and XGMII were implemented.
  • the XGMII is proposed to be used for the 25 Gb/s Ethernet as defined in IEEE 802.3by.
  • a new 25G-MII (to be defined in clause 106) that is similar to XGMII is implemented (not shown)
  • Packets transmitted through the XGMII are transferred within the XGMII data stream.
  • the data stream is a sequence of bytes, where each byte conveys either a data octet or control character.
  • the parts of the data stream are shown in Figure 4, and include an inter-frame period, followed by a preamble, an SFD, data, and the EFD.
  • transmission and reception of each bit and mapping of data octets to lanes is as shown in Figure 5.
  • the inter-frame <inter-frame> period on an XGMII transmit or receive path is an interval during which no frame data activity occurs.
  • the <inter-frame> corresponding to the MAC interpacket gap begins with the Terminate control character, continues with Idle control characters and ends with the Idle control character prior to a Start control character.
  • the length of the interpacket gap may be changed between the transmitting MAC and receiving MAC by one or more functions (e.g., reconciliation sub-layer (RS) lane alignment, PHY clock rate compensation, or 10GBASE-W data rate adaptation functions).
  • the minimum interpacket gap at the XGMII of the receiving RS is five octets.
  • the signaling of link status information logically occurs in the <inter-frame> period, as described in IEEE802.3-2012 subclause 46.3.4. IEEE802.3-2012 subclause 46.3.3 describes frame processing when signaling of link status information is initiated or terminated.
  • Idle control characters (/I/) are transmitted when idle control characters are received from the XGMII.
  • Idle characters may be added or deleted by the PCS to adapt between clock rates. /I/ insertion and deletion occurs in groups of 4. /I/s may be added following idle or ordered sets, but are not added while data is being received. When deleting /I/s, the first four characters after a /T/ are not deleted.
  • LPI control character /LI/ is sent continuously in place of /I/.
  • LPI control characters are transmitted when LPI control characters are received from the XGMII.
  • LPI characters may be added or deleted by the PCS to adapt between clock rates in a similar manner to idle control characters.
  • /LI/ insertion and deletion occurs in groups of four.
  • /LI/s may only be added following other LPI characters.
  • the ability to send or receive LPI characters is an optional function.
  • Figure 6 is a table 600 depicting control codes used by the PCS sublayer.
  • Table 600 includes a Start control character /S/, a Terminate control character /T/, an Error control character /E/, and an ordered_set control character /Q/.
  • the Start control character (/S/) indicates the start of a packet, while the Terminate control character (/T/) indicates the end of a packet.
  • the ordered_set control characters (/O/) indicate the start of an ordered_set.
  • the Error /E/ is sent whenever an /E/ is received. It is also sent when invalid blocks are received.
  • the /E/ allows physical sublayers such as the XGXS and PCS to propagate received errors.
  • the PCS maps XGMII signals into 66-bit blocks, and vice versa, using a 64B/66B coding scheme.
  • the synchronization headers of the blocks allow establishment of block boundaries by the PCS Synchronization process. Blocks are unobservable and have no meaning outside the PCS.
  • the PCS functions ENCODE and DECODE generate, manipulate, and interpret blocks as provided by the rules in IEEE802.3-2012 subclause 49.2.4.
  • the PCS uses a transmission code to improve the transmission characteristics of information to be transferred across the link and to support transmission of control and data characters.
  • the encodings defined by the transmission code ensure that sufficient transitions are present in the PHY bit stream to make clock recovery possible at the receiver.
  • the encoding also preserves the likelihood of detecting any single or multiple bit errors that may occur during transmission and reception of information.
  • the synchronization headers of the code enable the receiver to achieve block alignment on the incoming PHY bit stream.
  • the 64B/66B transmission code specified for use in this standard has a high transition density and is a run-length-limited code.
  • the Ethernet PCS sublayer (such as the 10GBASE-R PCS) includes a basic capability for monitoring errors, called the BER monitor. This is a function of the PCS that detects errors only in predefined locations in the data stream (~3% of the total bits) in a short time window (125 microseconds). It is aimed at detecting high BER, which typically occurs when the medium is physically disconnected or when the remote link partner breaks the link. When triggered, it is visible to management. However, its BER threshold is 10^-4, much worse than the operational requirement, so it provides only an indication of extreme conditions, and does not provide link health assessment.
  • a PCS decoder is used to detect errors during idle times and define a new counter to count these events. This, in addition to CRC error counting at the PCS, provides monitoring of the error rate without the limit of not detecting errors during idle periods.
  • the PCS decoder has special rules for correctness of data blocks. As a result of these rules, a single error in a sequence of idle characters is practically guaranteed to be detected. For example, the PCS is configured to detect invalid sequences of characters and replace them with "error" characters.
  • the PCS decoding converts the serial bit stream into octets (bytes), where each octet can be either data or a control character.
  • octets bytes
  • the transmitter repeatedly sends Idle (/I/) control characters. Exiting idle transmission is allowed only by sending "SPD" (start packet delimiter, another control character) followed by data characters. Therefore any error that corrupts an idle character into something other than SPD (an idle error) can be detected as an invalid octet.
  • the PCS decoding is specified to convert any block of 64 bits that contains an invalid octet into 8 "error" characters (/E/) so that corrupted data will not reach the MAC.
  • a single error or a burst of errors that occurs within an idle character can change it to any other control character. Almost all other control characters are invalid and cause the current 64-bit block to be converted to 8 error characters /E/. The exception is the error character itself. Any such event can be recorded and counted.
  • IEEE802.3-2012 subclause 49.2.4 contains a full description of the PCS block structure, control character types and rules.
  • Subclause 49.2.13.2.3 includes a definition of R_BLOCK_TYPE, which lists the possible received block types (idle characters are mostly included in blocks of type "C").
  • R_BLOCK_TYPE is,
  • R_BLOCK_TYPE = {C, S, T, D, E, LI}
  • This function classifies each 66-bit rx_coded vector as belonging to one of the following types depending on its contents.
  • LI For EEE capability, the LI type is supported where the vector contains a sync header of 10, a block type field of 0x1e, and eight control characters of 0x06 (/LI/).
  • the vector contains a sync header of 10 and one of the following:
  • the vector contains a sync header of 10, a block type field of 0x87, 0x99, 0xaa,
  • a control block counter and an error counter are added to the receive state machine defined in clause 49, as shown in a receive state machine diagram 700 of Figure 7.
  • Receive state machine diagram 700 is similar to the receive state machine diagram of Figure 49-17 of IEEE802.3-2012 clause 49, with the addition of a control block counter (CTRLBLOCK_COUNTER) 702 and an error counter (ERROR_COUNTER) 704.
  • Receive state machine diagram 700 shows the state machine that detects legal character sequences. After a reset, the receive state enters RX_INIT (receiver initialization) state 706.
  • the state may advance to state RX_T from either of states RX_D and RX_E.
  • idle characters (R_BLOCK_TYPE C) keep the state machine in state RX_C. While in state RX_C, each control block (which includes an idle character or an ordered set of codes) is counted by control block counter 702. The only other valid block type (except for an additional idle) is S, which denotes the start of a new frame and causes a transition to state RX_D. Any other code causes a transition to state RX_E and replacement of the current 8 characters with an "error" value EBLOCK_R. While in state RX_E, each error is counted by error counter 704.
  • the error counter may be implemented as part of the check performed on received 64-bit blocks, as described in the definition of R_BLOCK_TYPE above.
  • the PCS input is scrambled data, and the PCS includes a descrambling function.
  • One known feature of the descrambler is that it translates any single error on its input to 3 errors on its output (also known as error multiplication). Since the PCS decodes the idle characters after descrambling, any error even on the scrambled data would cause 3 errors in the decoder. Accordingly, the error counter is expected to advance three times per single error event. An error counter with a value that is not divisible by 3 may indicate a burst of errors.
  • a BER may be calculated. Periodically, each of control block counter 702 and error counter 704 may be reset to prevent overflow.
  • error counters are 32 bits wide and exposed to network management through the standard MDIO interface.
  • the MDIO defines features useful for these counters, such as atomic reads and clear-on-read.
  • a rate of idle errors exceeding some threshold can be specified to break the link, since it signals that the link is unsafe.
  • estimation of the BER from error counters is performed, by monitoring the error counters over some period of time, and using the number of bits received during this time (roughly known from the data rate) as the denominator in the BER expression:
  • the denominator counter could count idle characters (each being 8 bits long), or idle blocks (blocks are 64 bits long, either blocks that contain at least one idle character, or blocks that contain only idle characters, or similar variations). Assuming idle periods are longer than non-idle periods (utilization of < 50%), the number of idle characters will be approximately equal to 8 times the number of idle blocks, and most bit errors in the receiver will corrupt 3 idle characters residing in 2 idle blocks, due to the error multiplication effect of the descrambler. This can be used to estimate the BER as follows
  • An idle character or block counter is expected to advance very quickly, especially when the link is under-utilized (for example, with a 25 Gb/s link that is 100% idle, a 32-bit counter of idle blocks will overflow after 11 seconds, while errors are expected to occur only once per 40 seconds if the BER is 10^-12).
  • a counter should be able to accurately count to a high enough value, although the exact value is not important for calculation.
  • a possible solution is to use a long counter, e.g. 48 bits, but report only the most significant part, e.g., the 32 topmost bits. This can be implemented efficiently by various well-known means.
  • the number of idle characters/blocks received can then be approximated by multiplying the counter value by the appropriate power of 2, e.g. 2^16.
  • FIG. 8 shows an architecture 800 for a network node employing a network chip 802 configured to perform BER determination during idle periods in accordance with aspects of the embodiments disclosed herein.
  • Network chip 802 comprises PHY (Physical Layer) circuitry 804 including a Physical Coding Sublayer (PCS) module 806, a Physical Medium Attachment (PMA) module 807, a PMD module 808, a BER measurement module 809 including state machine logic 810 for implementing the state machine in Figure 7, a transmitter port 812 including transmitter circuitry 813 and a receiver port 814 including receiver circuitry 815.
  • Network chip 802 further includes a DMA (Direct Memory Access) interface 816, an I/O interface comprising a Peripheral Component Interconnect Express (PCIe) interface 818, a MAC (Media Access Control) module 820 and a Reconciliation Sublayer (RS) module 822.
  • DMA Direct Memory Access
  • PCIe Peripheral Component Interconnect Express
  • MAC Media Access Control
  • RS Reconciliation Sublayer
  • Network node 800 also comprises a System on a Chip (SoC) 824 including a Central Processing Unit (CPU) 826 having one or more processor cores, coupled to a memory interface 828 and a PCIe interface 830 via an interconnect 832.
  • SoC System on a Chip
  • CPU Central Processing Unit
  • Memory interface 828 is further depicted as being coupled to memory 834.
  • network chip 802, SoC 824 and memory 834 will be mounted on or otherwise operatively coupled to a circuit board 836 that includes wiring traces for coupling these components in communication, as depicted by single lines connecting DMA 816 to memory 834 and PCIe interface 818 to PCIe interface 830 at a PCIe port 838.
  • MAC module 820 is configured to implement aspects of MAC layer operations that are well-known in the art. Similarly, RS module 822 is configured to implement reconciliation sub-layer operations.
  • BER measurement module 809 is implemented for determining BER.
  • data is exchanged between PHY transmitter and receiver ports 812 and 814 of node 800 and its link partner, as depicted by a link partner 844 including a receiver port 846 and a transmitter port 848.
  • the configuration of node 800 and link partner 844 are similar, and are linked in communication via an Ethernet link 850.
  • network chip 802 comprises a 25 Gb/s Ethernet Network Interface Controller (NIC) chip employing a 25GBASE-KR PHY or a 25GBASE-CR PHY.
  • NIC Network Interface Controller
  • the circuitry and components of network chip 802 may also be implemented in other types of chips and components, including SoCs, multi-chip modules, and NIC chips including support for multiple network interfaces (e.g., wired and wireless).
  • aspects of the idle link error-detection embodiments disclosed herein may be implemented in hardware (via, e.g., embedded logic), or via a combination of hardware and software.
  • aspects of the operations performed by the embodiments may be implemented via embedded logic in a NIC, large-scale network interface, or the like.
  • aspects of the embodiments disclosed herein may be implemented in other high-speed links. These include, but are not limited to current and future Ethernet links, Peripheral Component Interconnect Express (PCIe) links, Universal Serial Bus (USB) links, Serial ATA (SATA) links, InfiniBand links, RapidIO links, and Intel® OmniPath links.
  • PCIe Peripheral Component Interconnect Express
  • USB Universal Serial Bus
  • SATA Serial ATA
  • Ethernet link includes an Ethernet Physical Layer (PHY) including a PCS, and wherein the Ethernet PHY does not include a Forward Error Correction (FEC) sublayer.
  • PHY Ethernet Physical Layer
  • FEC Forward Error Correction
  • Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
  • An apparatus configured to communicate with a link partner over a high-speed link, comprising:
  • PHY Physical Layer
  • a transmitter port including transmitter circuitry
  • a receiver port including receiver circuitry
  • PCS Physical Coding Sublayer
  • An apparatus configured to communicate with a link partner over a high-speed link, comprising: Physical Layer (PHY) circuitry and logic, including,
  • MAC Media Access Control
  • RS Reconciliation Sublayer
  • I/O Input/Output
  • BER Bit Error Rate
  • the high-speed link comprises an Ethernet link.
  • the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
  • the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar.
  • an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein.
  • the various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
  • Coupled may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • An embodiment is an implementation or example of the inventions.
  • Reference in the specification to "an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions.
  • the various appearances “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments.
  • An algorithm is here, and generally, considered to be a self-consistent sequence of acts or operations leading to a desired result. These include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. In addition, embodiments of the present description may be implemented not only within a semiconductor chip such as a NIC, but also within non-transient machine-readable media.
  • the designs described above may be stored upon and/or embedded within non-transient machine readable media associated with a design tool used for designing semiconductor devices.
  • Examples include a netlist formatted in the VHSIC Hardware Description Language (VHDL) language, Verilog language or SPICE language, or other Hardware Description Language.
  • Some netlist examples include: a behavioral level netlist, a register transfer level (RTL) netlist, a gate level netlist and a transistor level netlist.
  • Machine-readable media also include media having layout information such as a GDS-II file.
  • netlist files or other machine-readable media for semiconductor chip design may be used in a simulation environment to perform the methods of the teachings described above.
  • the operations and functions performed by various components described herein may be implemented by software running on a processing element, via embedded hardware or the like, or any combination of hardware and software.
  • Such components may be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, ASICs, DSPs, etc.), embedded controllers, hardwired circuitry, hardware logic, etc.
  • Software content e.g., data, instructions, configuration information, etc.
  • an article of manufacture including computer-readable or machine-readable non-transitory storage medium, which provides content that represents instructions that can be executed. The content may result in a computer performing various functions/operations described herein.
  • a list of items joined by the term "at least one of" can mean any combination of the listed terms.
  • the phrase "at least one of A, B or C" can mean A; B; C; A and B; A and C; B and C; or A, B and C.
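The counter-sizing figures quoted in the definitions above (a 32-bit idle block counter wrapping after about 11 seconds on a fully idle 25 Gb/s link, one error per 40 seconds at a BER of 10^-12, and a 48-bit counter reported through its 32 topmost bits) can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the 25 Gb/s rate refers to decoded data and that each block carries 64 data bits.

```python
# Assumptions (not stated as code in the source): 25 Gb/s of decoded data,
# 64 data bits per block, a link that is 100% idle, and a BER of 1e-12.
DATA_RATE_BPS = 25e9
BITS_PER_BLOCK = 64
TARGET_BER = 1e-12


def overflow_seconds(counter_bits: int) -> float:
    """Time for a counter of idle blocks to wrap on a fully idle link."""
    blocks_per_second = DATA_RATE_BPS / BITS_PER_BLOCK
    return (2 ** counter_bits) / blocks_per_second


def reported_value(wide_count: int, wide_bits: int = 48, reported_bits: int = 32) -> int:
    """Keep only the most significant `reported_bits` of a `wide_bits` counter."""
    masked = wide_count & ((1 << wide_bits) - 1)
    return masked >> (wide_bits - reported_bits)


def approximate_count(reported: int, wide_bits: int = 48, reported_bits: int = 32) -> int:
    """Approximate the full count by scaling with the dropped power of 2."""
    return reported << (wide_bits - reported_bits)


# Mean time between bit errors at the target BER.
seconds_between_errors = 1.0 / (DATA_RATE_BPS * TARGET_BER)

if __name__ == "__main__":
    print(f"32-bit counter wraps after {overflow_seconds(32):.1f} s")
    print(f"48-bit counter wraps after {overflow_seconds(48) / 86400:.1f} days")
    print(f"mean time between errors: {seconds_between_errors:.0f} s")
```

With the default widths, the scaling factor in `approximate_count` is 2^16, matching the example in the text. The truncation loses at most 2^16 - 1 counts, which is negligible next to the totals involved.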

Abstract

Methods and apparatus for monitoring errors during idle times in high-speed links including a Physical Coding Sublayer (PCS). A denominator counter is implemented in a receive state machine of a PCS to count control blocks primarily consisting of idle control blocks or to count idle characters. An error counter is also implemented in the receive state machine to count errors while in an idle state. While operating the link in idle states, the counts of the denominator counter and error counter are used as inputs to various Bit Error Rate (BER) functions to estimate a BER of the PCS during idle times. The method and apparatus may be implemented in various high-speed links including Ethernet links.
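As a rough illustration of how the two counters could feed a BER function, the Python sketch below divides an estimate of the bit errors by the number of idle bits observed. It is not the formula from the specification: the function name, the parameters, and in particular the default correction factor of 3 are assumptions, the latter based on the description's statement that the descrambler turns one input error into three output errors.

```python
def estimate_idle_ber(error_count: int,
                      denominator_count: int,
                      bits_per_unit: int = 64,
                      error_multiplication: int = 3) -> float:
    """Estimate the BER seen during idle time from two counter readings.

    error_count          -- reading of the error counter (hypothetical units:
                            one increment per corrupted character)
    denominator_count    -- reading of the denominator counter
    bits_per_unit        -- 64 if the denominator counts blocks, 8 if it
                            counts idle characters
    error_multiplication -- ASSUMPTION: counter increments per line bit
                            error; 3 reflects the descrambler's error
                            multiplication, use 1 to disable the correction
    """
    if denominator_count <= 0:
        raise ValueError("denominator counter must be positive")
    bit_errors = error_count / error_multiplication
    bits_observed = denominator_count * bits_per_unit
    return bit_errors / bits_observed


if __name__ == "__main__":
    # Example readings: 3 counted errors over one million idle blocks.
    print(estimate_idle_ber(error_count=3, denominator_count=1_000_000))
```

Both counters should be sampled over the same interval and cleared together, so that the numerator and the denominator describe the same stretch of idle time.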

Description

MONITORING ERRORS DURING IDLE TIME IN ETHERNET PCS
BACKGROUND INFORMATION
25 Gigabit per second (Gb/s) Ethernet is a new standard for Ethernet being developed in the Institute of Electrical and Electronics Engineers (IEEE) P802.3by task force. Like other modern Ethernet standards, the Physical Layer (PHY) is specified to operate at a bit error ratio (BER) of less than 10^-12. The high data rate makes this BER requirement more challenging than in past generations, and for typical links it is almost certain that errors will occur occasionally (such that the BER is not practically equal to 0).
Errors can cause data loss if they occur within Ethernet frame boundaries. Ethernet frames include a 32-bit CRC (Cyclic Redundancy Check) field, so an Ethernet frame that was corrupted by errors is practically guaranteed to be detected by the MAC (Media Access Control) and discarded. The MAC typically has counters for bad CRC frames, and these counters can be monitored to check the frame error rate. For example, a fully utilized 10 gigabit Ethernet link with the worst case BER is expected to encounter a CRC error approximately once in every 100 seconds. Many systems expect much lower error rates (such as less than once a week).
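The 100-second figure follows directly from the data rate and the worst-case BER. The Python sketch below reproduces that arithmetic and, in the other direction, shows what BER a "less than once a week" expectation implies. The function names are hypothetical, and the calculation assumes a fully utilized link on which every bit error corrupts a different frame.

```python
def seconds_between_bit_errors(data_rate_bps: float, ber: float) -> float:
    """Mean time between bit errors on a fully utilized link."""
    return 1.0 / (data_rate_bps * ber)


def ber_for_interval(data_rate_bps: float, seconds_per_error: float) -> float:
    """BER that yields, on average, one bit error per `seconds_per_error`."""
    return 1.0 / (data_rate_bps * seconds_per_error)


if __name__ == "__main__":
    week = 7 * 24 * 3600  # seconds in one week
    # 10 Gb/s at the worst-case BER of 1e-12: about one error per 100 s.
    print(seconds_between_bit_errors(10e9, 1e-12))
    # One error per week at 10 Gb/s corresponds to a BER near 1.65e-16.
    print(ber_for_interval(10e9, week))
```

The gap between the two results, roughly four orders of magnitude, is why a link can meet the standard's BER requirement and still be far worse than what a well-behaved system expects.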
If the MAC CRC counters advance faster than expected, it can indicate a hardware problem. Network management systems can monitor the counters across multiple network nodes and trigger some maintenance action (such as board or cable replacement) on any indication of a high CRC error count. Thus, CRC error counting is a well-known and widely used feature of the Ethernet MAC.
A problem with MAC CRC error counting is that many Ethernet links are not fully utilized and may have long idle times between packets. Errors during idle times do not corrupt data so are considered safe, but they also do not cause CRC errors, so their rate cannot be monitored by the MAC counters.
Some Ethernet PHYs include forward error correction (FEC) codes, and the FEC decoding function can monitor the number of corrected errors. In a bad link, errors occur very often, and while the FEC decoder can recover the MAC data, it also counts the number of corrections. The Ethernet standard includes error counters for each kind of FEC. These counters provide error rate information with very fine resolution and operate in both data and idle times. However, FEC decoders increase the data delay (latency) in the receiver, and may increase power consumption, so some applications choose to disable them and some products do not even implement them. Therefore error monitoring may not always be available. Specifically, in the 25 Gb/s Ethernet project, although two kinds of FEC encoding are available, there is a strong desire to also enable operation without FEC encoding.
The 40 Gb/s and 100 Gb/s Ethernet varieties, which operate over multiple physical lanes, use the PCS (Physical Coding Sublayer) defined in clause 82 of IEEE 802.3-2012. This PCS periodically inserts alignment markers into the data stream to enable assembling the multi-lane data in the receiver correctly. The alignment markers include a special field (bit interleaved parity, or BIP) that serves as an error monitoring mechanism at the PCS level. Any single bit error between two alignment markers would cause an incorrect BIP field and increment a BIP error counter. The BIP provides good error monitoring capability in both data and idle times. However, it is not available in the single-lane PCS chosen for the 25 Gb/s, which is based on IEEE 802.3-2012 clause 49.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified:
Figure 1 is a schematic diagram depicting the relationship between the 10GBASE-R PCS and its associated sublayers;
Figure 2 is a schematic diagram illustrating the structure of an Ethernet Packet format;
Figure 3 is a schematic diagram illustrating the relationship of the reconciliation sublayer 102 and the 10 Gigabit Media Independent Interface (XGMII) to the ISO/IEC (IEEE) OSI reference model;
Figure 4 is a diagram illustrating the parts of an XGMII data stream;
Figure 5 is a schematic diagram illustrating mapping of data octets to lanes for the XGMII transmission and reception of bits in an XGMII data stream;
Figure 6 is a table depicting control codes used by the PCS sublayer; and
Figure 7 is a PCS receive state machine diagram in which a new control block counter and a new error counter are implemented to determine PCS BER during idle times, according to one embodiment.
DETAILED DESCRIPTION
Embodiments of methods and apparatus for monitoring errors during idle time in Ethernet Physical Coding Sublayer (PCS) are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
For clarity, individual components in the Figures herein may also be referred to by their labels in the Figures, rather than by a particular reference number. Additionally, reference numbers referring to a particular type of component (as opposed to a particular component) may be shown with a reference number followed by "(typ)" meaning "typical." It will be understood that the configuration of these components will be typical of similar components that may exist but are not shown in the drawing Figures for simplicity and clarity or otherwise similar components that are not labeled with separate reference numbers. Conversely, "(typ)" is not to be construed as meaning the component, element, etc. is typically used for its disclosed function, implement, purpose, etc.
In accordance with aspects of the embodiments described herein, techniques are disclosed for monitoring errors in high-speed Ethernet links during idle periods. The techniques are implemented at the Ethernet PCS sublayer. The Ethernet PCS sublayer (such as the 10GBASE-R PCS described in clause 49 of IEEE802.3-2012) is implemented in the Ethernet Physical layer (Ethernet PHY).
In one embodiment, the techniques are implemented for the new 25 Gb/s standard for Ethernet being developed in the IEEE P802.3by task force. The proposed 25 Gb/s Ethernet employs a PCS similar to that defined by IEEE802.3-2012 clause 49, which defines the 10GBASE-R PCS, except the data rate is 25 Gb/s rather than 10 Gb/s. Figure 1 depicts the relationship between the 10GBASE-R PCS and its associated sublayers.
As illustrated, a 10GBASE-R PCS sublayer 100 interfaces with a reconciliation sublayer 102 that sits below the MAC layer 104 in the Data Link layer. 10GBASE-R PCS sublayer 100 also interfaces with a serial PMA (Physical Media Attachment) sublayer 106 in the 10GBASE-R PHY.
Figure 2 illustrates the structure of an Ethernet Packet format. The fields of the packet include a Preamble 200, a Start Frame Delimiter (SFD) 202, the addresses of the MAC frame's destination address 204 and source address 206, a length or type field 208 to indicate the length or protocol type of the following field that contains the MAC client data 210, a pad field 212 that contains padding if required, and a Frame Check Sequence (FCS) field 214 containing a cyclic redundancy check value to detect errors in a received MAC frame. An optional extension field 216 is added, if required. Of these fields, all are of fixed size except for the MAC Client Data 210, pad field 212 and extension field 216, which may contain an integer number of octets between the minimum and maximum values that are determined by the specific implementation of the MAC.
Figure 3 illustrates the relationship of the reconciliation sublayer 102 and the 10 Gigabit Media Independent Interface (XGMII) 300 to the ISO/IEC (IEEE) OSI reference model. The purpose of the XGMII is to provide a simple, inexpensive, and easy-to-implement interconnection between the MAC sublayer 104 and PHY 302. A 10 Gigabit Attachment Unit Interface (XAUI) may optionally be used to extend the operational distance of the XGMII with reduced pin count (as defined in clause 47).
Reconciliation sublayer 102 adapts the bit serial protocols of the MAC to the parallel encodings of 10 Gb/s PHYs (when applied to 10GBASE-R). The 10 Gb/s PCS is specified to the XGMII, so if the XGMII is not implemented, a conforming implementation will behave functionally as if the reconciliation sublayer and XGMII were implemented. In one embodiment, the XGMII is proposed to be used for the 25 Gb/s Ethernet as defined in IEEE 802.3by. Optionally, a new 25G-MII (to be defined in clause 106) that is similar to XGMII is implemented (not shown).
Packets transmitted through the XGMII are transferred within the XGMII data stream. The data stream is a sequence of bytes, where each byte conveys either a data octet or control character. The parts of the data stream are shown in Figure 4, and include an inter-frame, followed by a preamble, an SFD, data, and the EFD. For the XGMII, transmission and reception of each bit and mapping of data octets to lanes is as shown in Figure 5.
The inter-frame <inter-frame> period on an XGMII transmit or receive path is an interval during which no frame data activity occurs. The <inter-frame> corresponding to the MAC interpacket gap begins with the Terminate control character, continues with Idle control characters and ends with the Idle control character prior to a Start control character. The length of the interpacket gap may be changed between the transmitting MAC and receiving MAC by one or more functions (e.g., reconciliation sub-layer (RS) lane alignment, PHY clock rate compensation, or 10GBASE-W data rate adaptation functions). The minimum interpacket gap at the XGMII of the receiving RS is five octets.
The signaling of link status information logically occurs in the <inter-frame> period, as described in IEEE802.3-2012 subclause 46.3.4. IEEE802.3-2012 subclause 46.3.3 describes frame processing when signaling of link status information is initiated or terminated.
Idle control characters (/I/) are transmitted when idle control characters are received from the XGMII. Idle characters may be added or deleted by the PCS to adapt between clock rates. /I/ insertion and deletion occurs in groups of 4. /I/s may be added following idle or ordered sets, but are not added while data is being received. When deleting /I/s, the first four characters after a /T/ are not deleted.
To communicate Low-power idle (LPI) (a special idle-like sequence that may be sent to signal long idle periods), LPI control character /LI/ is sent continuously in place of /I/. LPI control characters are transmitted when LPI control characters are received from the XGMII. LPI characters may be added or deleted by the PCS to adapt between clock rates in a similar manner to idle control characters. /LI/ insertion and deletion occurs in groups of four. /LI/s may only be added following other LPI characters. The ability to send or receive LPI characters is an optional function.
Figure 6 is a table 600 depicting control codes used by the PCS sublayer. In addition to Idle /I/ and LPI /LI/, Table 600 includes a Start control character /S/, a Terminate control character /T/, an Error control character /E/, and an ordered_set control character /Q/. The Start control character (/S/) indicates the start of a packet, while the Terminate control character (/T/) indicates the end of a packet. The ordered_set control characters (/Q/) indicate the start of an ordered_set. The Error /E/ is sent whenever an /E/ is received. It is also sent when invalid blocks are received. The /E/ allows physical sublayers such as the XGXS and PCS to propagate received errors.
The PCS maps XGMII signals into 66-bit blocks, and vice versa, using a 64B/66B coding scheme. The synchronization headers of the blocks allow establishment of block boundaries by the PCS Synchronization process. Blocks are unobservable and have no meaning outside the PCS. The PCS functions ENCODE and DECODE generate, manipulate, and interpret blocks as provided by the rules in IEEE802.3-2012 subclause 49.2.4.
The PCS uses a transmission code to improve the transmission characteristics of information to be transferred across the link and to support transmission of control and data characters. The encodings defined by the transmission code ensure that sufficient transitions are present in the PHY bit stream to make clock recovery possible at the receiver. The encoding also preserves the likelihood of detecting any single or multiple bit errors that may occur during transmission and reception of information. In addition, the synchronization headers of the code enable the receiver to achieve block alignment on the incoming PHY bit stream. The 64B/66B transmission code specified for use in this standard has a high transition density and is a run- length-limited code.
The Ethernet PCS sublayer (such as the 10GBASE-R PCS) includes a basic capability for monitoring errors, called the BER monitor. This is a function of the PCS that detects errors only in predefined locations in the data stream (~3% of the total bits) in a short time window (125 microseconds). It is aimed at detecting high BER, which typically occurs when the medium is physically disconnected or when the remote link partner breaks the link. When triggered, it is visible to management. However, its BER threshold is 10^-4, much worse than the operational requirement, so it provides only an indication of extreme conditions, and does not provide link health assessment.
In accordance with aspects of the embodiments now disclosed, a PCS decoder is used to detect errors during idle times, and a new counter is defined to count these events. This, in addition to CRC error counting at the MAC, provides monitoring of the error rate without the limitation of not detecting errors during idle periods.
The PCS decoder has special rules for correctness of data blocks. As a result of these rules, a single error in a sequence of idle characters is practically guaranteed to be detected. For example, the PCS is configured to detect invalid sequences of characters and replace them with "error" characters.
The PCS decoding converts the serial bit stream into octets (bytes), where each octet can be either data or a control character. During idle times the transmitter repeatedly sends Idle (/I/) control characters. Exiting idle transmission is allowed only by sending "SPD" (start packet delimiter, another control character) followed by data characters. Therefore any error that corrupts an idle character into something other than SPD (idle error) can be detected as an invalid octet. The PCS decoding is specified to convert any block of 64 bits that contains an invalid octet into 8 "error" characters (/E/) so that corrupted data will not reach the MAC.
A single error or a burst of errors that occurs within an idle character (octet value 00) can change it to any other control character. Almost all other control characters are invalid and cause the current 64-bit block to be converted to 8 error characters /E/. The exception is the error character itself. Any such event can be recorded and counted.
IEEE802.3-2012 subclause 49.2.4 contains a full description of the PCS block structure, control character types and rules. Subclause 49.2.13.2.3 includes a definition of R_BLOCK_TYPE which lists the possible received block types (idle characters are mostly included in blocks of type "C"). In further detail, the definition for R_BLOCK_TYPE is,
R_BLOCK_TYPE = {C, S, T, D, E, LI}
This function classifies each 66-bit rx_coded vector as belonging to one of the following types depending on its contents.
Values:
C; The vector contains a sync header of 10 and one of the following:
a) A block type field of 0x1e and eight valid control characters other than /E/; and, if the EEE capability is supported, zero or four of the characters are /LI/;
b) A block type field of 0x2d or 0x4b, a valid O code, and four valid control characters;
c) A block type field of 0x55 and two valid O codes.
LI; For EEE capability, the LI type is supported where the vector contains a sync header of 10, a block type field of 0x1e, and eight control characters of 0x06 (/LI/).
S; The vector contains a sync header of 10 and one of the following:
a) A block type field of 0x33 and four valid control characters;
b) A block type field of 0x66 and a valid O code;
c) A block type field of 0x78.
T; The vector contains a sync header of 10, a block type field of 0x87, 0x99, 0xaa, 0xb4, 0xcc, 0xd2, 0xe1 or 0xff and all control characters are valid.
D; The vector contains a sync header of 01.
E; The vector does not meet the criteria for any other value.
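The classification above can be sketched in a few lines. This is a simplified illustration, not the clause 49 function itself: the name `r_block_type` and its parameters are hypothetical, the block is assumed to be already split into a sync header string, a block type field, and unpacked control characters, and the validity checks on control characters and O codes are replaced by a caller-supplied `contents_valid` flag.

```python
# Block type fields listed in the definition above, grouped by result.
C_TYPES = {0x1E, 0x2D, 0x4B, 0x55}
S_TYPES = {0x33, 0x66, 0x78}
T_TYPES = {0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF}
LPI_CODE = 0x06  # control character value of /LI/


def r_block_type(sync_header, block_type=None, control_chars=(),
                 contents_valid=True):
    """Simplified classifier for a received 66-bit block.

    sync_header    -- "01" (data) or "10" (control), as written in the text
    block_type     -- value of the block type field (control blocks only)
    control_chars  -- unpacked control character values, if any
    contents_valid -- stand-in for the character and O code validity
                      checks, which this sketch does not implement
    """
    if sync_header == "01":
        return "D"
    if sync_header != "10" or not contents_valid:
        return "E"
    # Eight /LI/ characters in a 0x1e block form the LI type (EEE only).
    if (block_type == 0x1E and len(control_chars) == 8
            and all(c == LPI_CODE for c in control_chars)):
        return "LI"
    if block_type in C_TYPES:
        return "C"
    if block_type in S_TYPES:
        return "S"
    if block_type in T_TYPES:
        return "T"
    return "E"


if __name__ == "__main__":
    print(r_block_type("10", 0x1E, [0x00] * 8))  # all-idle control block
    print(r_block_type("10", 0x12))              # unknown block type field
```

A corrupted sync header or block type field lands in type E, which is what allows errors during idle to be observed by the decoder.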
In one embodiment, a control block counter and an error counter are added to the receive state machine defined in clause 49, as shown in receive state machine diagram 700 of Figure 7. Receive state machine diagram 700 is similar to the receive state machine diagram of Figure 49-17 of IEEE802.3-2012 clause 49, with the addition of a control block counter (CTRLBLOCK COUNTER) 702 and an error counter (ERROR COUNTER) 704.
Receive state machine diagram 700 shows the state machine that detects legal character sequences. After a reset, the receive state enters RX_INIT (receiver initialization) state 706.
Depending on the R_TYPE of the received LBLOCK_R, the state will advance to a state RX_C (for an R_BLOCK_TYPE=C), a state RX_D (for an R_BLOCK_TYPE=D), or a state RX_E (for an R_BLOCK_TYPE=E). The state may advance to a state RX_T from either of states RX_D and RX_E.
Under normal operation, idle characters (R_BLOCK_TYPE C) keep the state machine in state RX_C. While in state RX_C, each control block (which includes an idle character or an ordered set of codes) is counted by control block counter 702. The only other valid block type (except for an additional idle) is S, which denotes start of a new frame, and causes transition to state RX_D. Any other code causes a transition to state RX_E and replacement of the current 8 characters with an "error" value EBLOCK_R. While in state RX_E, each error is counted by error counter 704.
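The counting behavior described above can be modeled with a small software sketch. It is an illustration under stated assumptions rather than a transcription of Figure 7: the class name `RxMonitor` and the transition table are hypothetical, the model starts in RX_C (link already up and idle), the R_TYPE_NEXT lookahead of the clause 49 state machine is omitted, and EEE (type LI) is not modeled.

```python
# Simplified transition table: state -> {received block type -> next state}.
# Any block type not listed for a state leads to RX_E.
TRANSITIONS = {
    "RX_C": {"C": "RX_C", "S": "RX_D"},
    "RX_D": {"D": "RX_D", "T": "RX_T"},
    "RX_T": {"C": "RX_C", "S": "RX_D"},
    "RX_E": {"C": "RX_C", "D": "RX_D", "T": "RX_T"},
}


class RxMonitor:
    """Receive state model with the two added counters."""

    def __init__(self):
        self.state = "RX_C"        # assumption: link is already idle
        self.ctrl_block_count = 0  # stands in for control block counter 702
        self.error_count = 0       # stands in for error counter 704

    def receive(self, r_type):
        """Advance on one received block type and update the counters."""
        self.state = TRANSITIONS[self.state].get(r_type, "RX_E")
        if self.state == "RX_C":
            self.ctrl_block_count += 1
        elif self.state == "RX_E":
            self.error_count += 1
        return self.state


if __name__ == "__main__":
    mon = RxMonitor()
    # Two idle blocks, one corrupted block, one idle block, then a frame.
    for t in ["C", "C", "E", "C", "S", "D", "T", "C"]:
        mon.receive(t)
    print(mon.ctrl_block_count, mon.error_count)
```

Counting on entry to (or while remaining in) RX_C and RX_E mirrors the description that each control block is counted in RX_C and each error is counted in RX_E.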
As another option (not shown), the error counter may be implemented as part of the check performed on received 64-bit blocks, as described in the definition of R_BLOCK_TYPE above.
It should be noted that the PCS input is scrambled data, and the PCS includes a descrambling function. One known feature of the descrambler is that it translates any single error on its input to 3 errors on its output (also known as error multiplication). Since the PCS decodes the idle characters after descrambling, any error even on the scrambled data would cause 3 errors in the decoder. Accordingly, the error counter is expected to advance three times per single error event. An error counter with a value that is not divisible by 3 may indicate a burst of errors.
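The error multiplication effect can be demonstrated with a bit-level model of a self-synchronous scrambler and descrambler. The sketch assumes the 1 + x^39 + x^58 polynomial used by the clause 49 PCS; the function names, the all-zero initial state, and the test pattern are illustrative choices, not part of the described embodiments.

```python
import random

TAPS = (39, 58)  # delays of the assumed 1 + x^39 + x^58 polynomial


def scramble(bits):
    """Self-synchronous scrambler: s[n] = d[n] ^ s[n-39] ^ s[n-58]."""
    history = [0] * 58  # history[k-1] holds s[n-k]
    out = []
    for d in bits:
        s = d ^ history[TAPS[0] - 1] ^ history[TAPS[1] - 1]
        out.append(s)
        history = [s] + history[:-1]
    return out


def descramble(bits):
    """Matching descrambler: d[n] = s[n] ^ s[n-39] ^ s[n-58]."""
    history = [0] * 58
    out = []
    for s in bits:
        out.append(s ^ history[TAPS[0] - 1] ^ history[TAPS[1] - 1])
        history = [s] + history[:-1]
    return out


def error_positions(n_bits=200, flip_at=50, seed=0):
    """Flip one bit on the line and list where the decoded data differs."""
    rng = random.Random(seed)
    data = [rng.randint(0, 1) for _ in range(n_bits)]
    line = scramble(data)
    line[flip_at] ^= 1  # single bit error on the scrambled stream
    received = descramble(line)
    return [i for i, (a, b) in enumerate(zip(data, received)) if a != b]


if __name__ == "__main__":
    print(error_positions())  # the flipped bit plus two delayed copies
```

The single line error reappears 39 and 58 bits later, so the three output errors span up to 59 bits and can fall in different characters and in adjacent 64-bit blocks.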
Based on the ratio of the error count to the received control block count, a BER may be calculated. Periodically, each of control block counter 702 and error counter 704 may be reset to prevent overflow.
In one embodiment, error counters are 32 bits wide and exposed to network management through the standard MDIO interface. The MDIO defines features useful for these counters, such as atomic reads and clear-on-read. Optionally, a rate of idle errors exceeding some threshold can be specified to break the link, since it signals that the link is unsafe.
Under some conventional approaches, the BER is estimated from error counters by monitoring the error counters over some period of time, and using the number of bits received during this time (roughly known from the data rate) as the denominator in the BER expression:
$$\mathrm{BER} \approx \frac{\text{number of errors}}{\text{number of bits received}}$$
For the proposed solution, counting the number of errors over a period does not provide sufficient information to estimate the BER, since this mechanism detects only errors in the idle characters between frames, and the number of idle characters depends on link utilization (or "sparseness" of frames) which is unknown. Therefore the denominator for the expression above is not available. In order to enable calculating the BER, another counter can be added (as discussed above), to estimate the denominator.
The denominator counter could count idle characters (each being 8 bits long), or idle blocks (blocks are 64 bits long, either blocks that contain at least one idle character, or blocks that contain only idle characters, or similar variations). Assuming idle periods are longer than non-idle periods (utilization of <50%), the number of idle characters will be approximately equal to 8 times the number of idle blocks, and most bit errors in the receiver will corrupt 3 idle characters residing in 2 idle blocks, due to error multiplication effect of the descrambler. This can be used to estimate the BER as follows
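If counting at the character level:
$$\mathrm{BER} \approx \frac{\text{idle character errors}}{3 \times 8 \times \text{idle characters}}$$
If counting at the block level:
$$\mathrm{BER} \approx \frac{\text{idle block errors}}{2 \times 64 \times \text{idle blocks}}$$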
The factor of 2 or 3 is often considered insignificant and estimation with either character error counting or block error counting in the numerator would practically yield the same result.
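A minimal sketch of the estimate at both counting levels follows. The divisors 3 and 2 and the bit widths 8 and 64 are taken from the description above (three corrupted characters in two blocks per bit error); the function names are hypothetical, and the exact form of the original expressions is an assumption.

```python
BITS_PER_CHAR = 8    # an idle character is 8 bits long
BITS_PER_BLOCK = 64  # a block carries 64 bits
CHARS_PER_BIT_ERROR = 3   # descrambler error multiplication
BLOCKS_PER_BIT_ERROR = 2  # the 3 corrupted characters reside in 2 blocks


def ber_from_characters(char_errors, idle_chars):
    """Estimate BER from an idle character error count and character count."""
    if idle_chars <= 0:
        raise ValueError("idle_chars must be positive")
    return char_errors / (CHARS_PER_BIT_ERROR * BITS_PER_CHAR * idle_chars)


def ber_from_blocks(block_errors, idle_blocks):
    """Estimate BER from an idle block error count and block count."""
    if idle_blocks <= 0:
        raise ValueError("idle_blocks must be positive")
    return block_errors / (BLOCKS_PER_BIT_ERROR * BITS_PER_BLOCK * idle_blocks)


if __name__ == "__main__":
    # One bit error event seen over 10^12 idle bits, counted both ways.
    print(ber_from_characters(3, 10**12 // 8))
    print(ber_from_blocks(2, 10**12 // 64))
```

Both calls describe the same single bit error over the same number of idle bits, which is why the two counting levels give practically the same estimate.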
An idle character or block counter is expected to advance very quickly, especially when the link is under-utilized (for example, with a 25 Gb/s link that is 100% idle, a 32-bit counter of idle blocks will overflow after 11 seconds, while errors are expected to occur only once per 40 seconds if the BER is 10^-12). A counter should be able to accurately count to a high enough value, although the exact value is not important for calculation.
A possible solution is to use a long counter, e.g. 48 bits, but report only the most significant part, e.g., the 32 topmost bits. This can be implemented efficiently by various well-known means. The number of idle characters/blocks received can then be approximated by multiplying the counter value by the appropriate power of 2, e.g. 2^16. A 48-bit idle blocks counter in an idle 25 Gb/s link can count about 200 hours without overflowing, and the number of blocks can be estimated from the 32 topmost bits with a resolution equivalent to less than 1 millisecond (negligible compared to a period of 40 seconds expected between errors when BER=10^-12). On an idle link, this enables estimating the BER with a resolution of 10^-12 within a few minutes. With longer measurement times, BER can be estimated with a resolution better than 10^-16.
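The overflow times and the truncated-counter approximation quoted above can be checked with a short calculation. This is a sketch with hypothetical names; it assumes a fully idle 25 Gb/s link delivering one block per 64 payload bits.

```python
BLOCK_RATE = 25e9 / 64  # blocks per second on a fully idle 25 Gb/s link


def overflow_seconds(counter_bits, rate=BLOCK_RATE):
    """Time until a counter of the given width wraps at the given rate."""
    return 2 ** counter_bits / rate


def reported_value(full_count, n=48, m=32):
    """The m topmost bits of an n-bit counter."""
    return (full_count & ((1 << n) - 1)) >> (n - m)


def approx_count(reported, n=48, m=32):
    """Approximate the full count by multiplying by 2^(n-m)."""
    return reported << (n - m)


if __name__ == "__main__":
    print(f"32-bit counter wraps after {overflow_seconds(32):.1f} s")
    print(f"48-bit counter wraps after {overflow_seconds(48) / 3600:.0f} h")
    print(f"resolution of top 32 bits: {2 ** 16 / BLOCK_RATE * 1e3:.3f} ms")
```

The approximation discards at most 2^16 - 1 blocks, which on this link corresponds to well under a millisecond of idle time.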
Figure 8 shows an architecture 800 for a network node employing a network chip 802 configured to perform BER determination during idle periods in accordance with aspects of the embodiments disclosed herein. Network chip 802 comprises PHY (Physical Layer) circuitry 804 including a Physical Coding Sublayer (PCS) module 806, a Physical Medium Attachment (PMA) module 807, a PMD module 808, a BER measurement module 809 including state machine logic 810 for implementing the state machine in Figure 7, a transmitter port 812 including transmitter circuitry 813 and a receiver port 814 including receiver circuitry 815. Network chip 802 further includes a DMA (Direct Memory Access) interface 816, an Input/Output (I/O) interface comprising a Peripheral Component Interconnect Express (PCIe) interface 818, a MAC (Media Access Control) module 820 and a Reconciliation Sublayer (RS) module 822. Network node 800 also comprises a System on a Chip (SoC) 824 including a Central Processing Unit (CPU) 826 having one or more processor cores, coupled to a memory interface 828 and a PCIe interface 830 via an interconnect 832. Memory interface 828 is further depicted as being coupled to memory 834. Under a typical configuration, network chip 802, SoC 824 and memory 834 will be mounted on or otherwise operatively coupled to a circuit board 836 that includes wiring traces for coupling these components in communication, as depicted by single lines connecting DMA interface 816 to memory 834 and PCIe interface 818 to PCIe interface 830 at a PCIe port 838. In one embodiment, MAC module 820 is configured to implement aspects of the MAC layer operations that are well-known in the art. Similarly, RS module 822 is configured to implement reconciliation sub-layer operations.
During idle periods, BER measurement module 809 is implemented for determining BER. During both data and idle periods, data is exchanged between PHY transmitter and receiver ports 812 and 814 of node 800 and its link partner, as depicted by a link partner 844 including a receiver port 846 and a transmitter port 848. In one embodiment the configurations of node 800 and link partner 844 are similar, and the two are linked in communication via an Ethernet link 850.
In one embodiment, network chip 802 comprises a 25 Gb/s Ethernet Network Interface Controller (NIC) chip employing a 25GBASE-KR PHY or a 25GBASE-CR PHY. However, the circuitry and components of network chip 802 may also be implemented in other types of chips and components, including SoCs, multi-chip modules, and NIC chips including support for multiple network interfaces (e.g., wired and wireless).
In general, aspects of the idle link error-detection embodiments disclosed herein may be implemented in hardware (via, e.g., embedded logic), or via a combination of hardware and software. For example, aspects of the operations performed by the embodiments may be implemented via embedded logic in a NIC, large-scale network interface, or the like.
In addition to implementation in 25 Gb/s Ethernet links, aspects of the embodiments disclosed herein may be implemented in other high-speed links. These include, but are not limited to, current and future Ethernet links, Peripheral Component Interconnect Express (PCIe) links, Universal Serial Bus (USB) links, Serial ATA (SATA) links, InfiniBand links, RapidIO links, and Intel® OmniPath links. The techniques disclosed herein are particularly advantageous for high-speed links that do not employ forward error correction.
Further aspects of the subject matter described herein are set out in the following numbered clauses:
1. A method for monitoring errors during idle states in a high-speed link including a Physical Coding Sublayer (PCS), comprising:
implementing a denominator counter in a receive state machine of the PCS;
implementing an error counter in the receive state machine to count errors while in the idle state;
while the high-speed link is operating in an idle state,
counting one of idle characters or idle blocks received over the high-speed link with the denominator counter;
counting one of character errors or block errors that are detected in the idle characters or idle blocks received over the high-speed link with the error counter; and
estimating a Bit Error Rate (BER) when operating in the idle state as a function of outputs from the denominator counter and error counter.
2. The method of clause 1, wherein the high-speed link comprises an Ethernet link including an Ethernet PCS.
3. The method of clause 2, wherein the Ethernet link includes an Ethernet Physical Layer (PHY) including a PCS, and wherein the Ethernet PHY does not include a Forward Error Correction (FEC) sublayer.
4. The method of clause 2, wherein the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
5. The method of clause 2, wherein the PCS has a block structure in accordance with IEEE802.3-2012 subclause 49.
6. The method of any of the preceding clauses, wherein the denominator counter is a control block counter that counts idle blocks, and the BER is estimated by the equation,
$$\mathrm{BER} \approx \frac{\text{idle block errors}}{2 \times 64 \times \text{idle blocks}}$$
7. The method of any of the preceding clauses, wherein the denominator counter is an idle character counter that counts idle characters, and the BER is estimated by the equation,
$$\mathrm{BER} \approx \frac{\text{idle character errors}}{3 \times 8 \times \text{idle characters}}$$
8. The method of any of the preceding clauses, further comprising:
employing an n-bit counter in the denominator counter;
reporting out m topmost bits of the n-bit counter; and
multiplying the m topmost bits by 2^(n-m) to determine one of a number of idle characters or idle blocks received.
9. The method of any of the preceding clauses, further comprising detecting that a BER during operation in an idle state exceeds a threshold, and in response thereto, one of resetting the link or disabling the link.
10. An apparatus configured to communicate with a link partner over a high-speed link, comprising:
Physical Layer (PHY) circuitry and logic, including,
a transmitter port including transmitter circuitry; and
a receiver port including receiver circuitry; and
Physical Coding Sublayer (PCS) circuitry and logic including a receive state machine having a denominator counter and an error counter, wherein when the high-speed link is operating in an idle state the PCS circuitry and logic is configured to,
count one of idle characters or idle blocks received at the receiver port with the denominator counter;
count one of character errors or block errors that are detected in the idle characters or idle blocks received at the receiver port; and
estimate a Bit Error Rate (BER) when operating in the idle state as a function of outputs from the denominator counter and error counter.
11. The apparatus of clause 10, wherein the high-speed link comprises an Ethernet link.
12. The apparatus of clause 11, wherein the PHY Layer circuitry does not include circuitry for a Forward Error Correction (FEC) sublayer.
13. The apparatus of clause 11, wherein the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
14. The apparatus of clause 11, wherein the PCS has a block structure in accordance with IEEE802.3-2012 subclause 49.
15. The apparatus of any of clauses 10-14, wherein the denominator counter is a control block counter that counts idle blocks, and the BER is estimated by the equation,
$$\mathrm{BER} \approx \frac{\text{idle block errors}}{2 \times 64 \times \text{idle blocks}}$$
16. The apparatus of any of clauses 10-15, wherein the denominator counter is an idle character counter that counts idle characters, and the BER is estimated by the equation,
$$\mathrm{BER} \approx \frac{\text{idle character errors}}{3 \times 8 \times \text{idle characters}}$$
17. The apparatus of any of clauses 10-16, wherein the PCS circuitry and logic includes an n-bit counter and is configured to:
employ the n-bit counter in the denominator counter;
output m topmost bits of the n-bit counter; and
multiply the m topmost bits by 2^(n-m) to determine one of a number of idle characters or idle blocks received.
18. The apparatus of any of clauses 10-17, wherein the PHY Layer circuitry and logic is further configured to detect that a BER while operating the high-speed link in an idle state exceeds a threshold, and in response thereto, one of reset the link or disable the link.
19. An apparatus configured to communicate with a link partner over a high-speed link, comprising:
Physical Layer (PHY) circuitry and logic, including,
a transmitter port including transmitter circuitry; and
a receiver port including receiver circuitry; and
Physical Coding Sublayer (PCS) circuitry and logic including a receive state machine having a denominator counter and an error counter;
a Media Access Control (MAC) module;
a Reconciliation Sublayer (RS) module; and
an Input/Output (I/O) interface;
wherein when the high-speed link is operating in an idle state the PCS circuitry and logic is configured to,
count one of idle characters or idle blocks received at the receiver port with the denominator counter;
count one of character errors or block errors that are detected in the idle characters or idle blocks received at the receiver port; and
estimate a Bit Error Rate (BER) when operating in the idle state as a function of outputs from the denominator counter and error counter.
20. The apparatus of clause 19, wherein the PHY Layer circuitry does not include circuitry for a Forward Error Correction (FEC) sublayer.
21. The apparatus of clause 19, wherein the high-speed link comprises an Ethernet link.
22. The apparatus of clause 21, wherein the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
23. The apparatus of clause 21, wherein the PCS has a block structure in accordance with IEEE802.3-2012 subclause 49.
24. The apparatus of any of clauses 19-23, wherein the denominator counter is a control block counter that counts idle blocks, and the BER is estimated by the equation,
$$\mathrm{BER} \approx \frac{\text{idle block errors}}{2 \times 64 \times \text{idle blocks}}$$
25. The apparatus of any of clauses 19-24, wherein the denominator counter is an idle character counter that counts idle characters, and the BER is estimated by the equation,
$$\mathrm{BER} \approx \frac{\text{idle character errors}}{3 \times 8 \times \text{idle characters}}$$
Although some embodiments have been described in reference to particular implementations, other implementations are possible according to some embodiments.
Additionally, the arrangement and/or order of elements or other features illustrated in the drawings and/or described herein need not be arranged in the particular way illustrated and described. Many other arrangements are possible according to some embodiments.
In each system shown in a figure, the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar. However, an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein. The various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
In the description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Rather, in particular embodiments, "connected" may be used to indicate that two or more elements are in direct physical or electrical contact with each other. "Coupled" may mean that two or more elements are in direct physical or electrical contact. However, "coupled" may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
An embodiment is an implementation or example of the inventions. Reference in the specification to "an embodiment," "one embodiment," "some embodiments," or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions. The various appearances of "an embodiment," "one embodiment," or "some embodiments" are not necessarily all referring to the same embodiments.
Not all components, features, structures, characteristics, etc. described and illustrated herein need be included in a particular embodiment or embodiments. If the specification states a component, feature, structure, or characteristic "may", "might", "can" or "could" be included, for example, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to "a" or "an" element, that does not mean there is only one of the element. If the specification or claims refer to "an additional" element, that does not preclude there being more than one of the additional element.
An algorithm is here, and generally, considered to be a self-consistent sequence of acts or operations leading to a desired result. These include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
In addition, embodiments of the present description may be implemented not only within a semiconductor chip such as a NIC, but also within non-transient machine-readable media. For example, the designs described above may be stored upon and/or embedded within non-transient machine readable media associated with a design tool used for designing semiconductor devices. Examples include a netlist formatted in the VHSIC Hardware Description Language (VHDL) language, Verilog language or SPICE language, or other Hardware Description Language. Some netlist examples include: a behavioral level netlist, a register transfer level (RTL) netlist, a gate level netlist and a transistor level netlist. Machine-readable media also include media having layout information such as a GDS-II file. Furthermore, netlist files or other machine-readable media for semiconductor chip design may be used in a simulation environment to perform the methods of the teachings described above.
The operations and functions performed by various components described herein may be implemented by software running on a processing element, via embedded hardware or the like, or any combination of hardware and software. Such components may be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, ASICs, DSPs, etc.), embedded controllers, hardwired circuitry, hardware logic, etc. Software content (e.g., data, instructions, configuration information, etc.) may be provided via an article of manufacture including computer-readable or machine-readable non-transitory storage medium, which provides content that represents instructions that can be executed. The content may result in a computer performing various functions/operations described herein.
As used herein, a list of items joined by the term "at least one of" can mean any combination of the listed terms. For example, the phrase "at least one of A, B or C" can mean A; B; C; A and B; A and C; B and C; or A, B and C.
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the drawings. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.

Claims

What is claimed is:
1. A method for monitoring errors during idle states in a high-speed link including a Physical Coding Sublayer (PCS), comprising:
implementing a denominator counter in a receive state machine of the PCS;
implementing an error counter in the receive state machine to count errors while in the idle state;
while the high-speed Ethernet link is operating in an idle state,
counting one of idle characters or idle blocks received over the high-speed link with the denominator counter;
counting one of character errors or block errors that are detected in the idle characters or idle blocks received over the high-speed link with the error counter; and
estimating a Bit Error Rate (BER) when operating in the idle state as a function of outputs from the denominator counter and error counter.
2. The method of claim 1, wherein the high-speed link comprises an Ethernet link including Ethernet PCS.
3. The method of claim 2, wherein the Ethernet link includes an Ethernet Physical Layer (PHY) including a PCS, and wherein the Ethernet PHY does not include a Forward Error Correction (FEC) sublayer.
4. The method of claim 2, wherein the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
5. The method of claim 2, wherein the PCS has a block structure in accordance with IEEE802.3-2012 subclause 49.
6. The method of any of the preceding claims, wherein the denominator counter is a control block counter that counts idle blocks, and the BER is estimated by the equation,
Figure imgf000017_0001
7. The method of any of the preceding claims, wherein the denominator counter is an idle character counter that counts idle characters, and the BER is estimated by the equation,
Figure imgf000018_0001
8. The method of any of the preceding claims, further comprising:
employing an n-bit counter in the denominator counter;
reporting out m topmost bits of the n-bit counter; and
multiplying the m topmost bits by 2^(n-m) to determine one of a number of idle characters or idle blocks received.
9. The method of any of the preceding claims, further comprising detecting that a BER during operation in an idle state exceeds a threshold, and in response thereto, one of resetting the link or disabling the link.
10. An apparatus configured to communicate with a link partner over a high-speed link, comprising:
Physical Layer (PHY) circuitry and logic, including,
a transmitter port including transmitter circuitry; and
a receiver port including receiver circuitry; and
Physical Coding Sublayer (PCS) circuitry and logic including a receive state machine having a denominator counter and an error counter,
wherein when the high-speed link is operating in an idle state the PCS circuitry and logic is configured to,
count one of idle characters or idle blocks received at the receiver port with the denominator counter;
count one of character errors or block errors that are detected in the idle characters or idle blocks received at the receiver port; and
estimate a Bit Error Rate (BER) when operating in the idle state as a function of outputs from the denominator counter and error counter.
11. The apparatus of claim 10, wherein the high-speed link comprises an Ethernet link.
12. The apparatus of claim 11, wherein the PHY Layer circuitry does not include circuitry for a Forward Error Correction (FEC) sublayer.
13. The apparatus of claim 11, wherein the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
14. The apparatus of claim 11, wherein the PCS has a block structure in accordance with IEEE802.3-2012 subclause 49.
15. The apparatus of any of claims 10-14, wherein the denominator counter is a control block counter that counts idle blocks, and the BER is estimated by the equation,
Figure imgf000019_0001
16. The apparatus of any of claims 10-15, wherein the denominator counter is an idle character counter that counts idle characters, and the BER is estimated by the equation,
Figure imgf000019_0002
17. The apparatus of any of claims 10-16, wherein the PCS circuitry and logic includes an n-bit counter and is configured to:
employ the n-bit counter in the denominator counter;
output m topmost bits of the n-bit counter; and
multiply the m topmost bits by 2^(n-m) to determine one of a number of idle characters or idle blocks received.
18. The apparatus of any of claims 10-17, wherein the PHY Layer circuitry and logic is further configured to detect that a BER while operating the high-speed link in an idle state exceeds a threshold, and in response thereto, one of reset the link or disable the link.
19. An apparatus configured to communicate with a link partner over a high-speed link, comprising:
Physical Layer (PHY) circuitry and logic, including,
a transmitter port including transmitter circuitry; and
a receiver port including receiver circuitry; and
Physical Coding Sublayer (PCS) circuitry and logic including a receive state machine having a denominator counter and an error counter;
a Media Access Control (MAC) module;
a Reconciliation Sublayer (RS) module; and
an Input/Output (I/O) interface;
wherein when the high-speed link is operating in an idle state the PCS circuitry and logic is configured to,
count one of idle characters or idle blocks received at the receiver port with the denominator counter;
count one of character errors or block errors that are detected in the idle characters or idle blocks received at the receiver port; and
estimate a Bit Error Rate (BER) when operating in the idle state as a function of outputs from the denominator counter and error counter.
20. The apparatus of claim 19, wherein the PHY Layer circuitry does not include circuitry for a Forward Error Correction (FEC) sublayer.
21. The apparatus of claim 19, wherein the high-speed link comprises an Ethernet link.
22. The apparatus of claim 21, wherein the Ethernet link comprises one of a 10 Gigabits per second (Gb/s), 25 Gb/s, or 40 Gb/s Ethernet link.
23. The apparatus of claim 21, wherein the PCS has a block structure in accordance with IEEE802.3-2012 subclause 49.
24. The apparatus of any of claims 19-23, wherein the denominator counter is a control block counter that counts idle blocks, and the BER is estimated by the equation,
Figure imgf000020_0001
25. The apparatus of any of claims 19-24, wherein the denominator counter is an idle character counter that counts idle characters, and the BER is estimated by the equation,
Figure imgf000020_0002
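The counter readout of claims 8 and 17 and the threshold response of claims 9 and 18 can be illustrated with a short sketch. It assumes that the multiplier applied to the m topmost bits is 2^(n-m), which restores the reported value to the scale of the full n-bit count with the low-order bits treated as zero. The function names, the returned action strings, and the `prefer_disable` flag are hypothetical and are not taken from the specification.

```python
def top_bits(counter_value: int, n: int, m: int) -> int:
    """Return the m topmost bits of an n-bit counter value."""
    if not 0 < m <= n:
        raise ValueError("require 0 < m <= n")
    masked = counter_value & ((1 << n) - 1)  # keep only n bits
    return masked >> (n - m)                 # drop the n-m low-order bits


def approx_count(reported_bits: int, n: int, m: int) -> int:
    """Scale the reported m bits back up by 2**(n-m) (assumed multiplier)."""
    if not 0 < m <= n:
        raise ValueError("require 0 < m <= n")
    if not 0 <= reported_bits < (1 << m):
        raise ValueError("reported value does not fit in m bits")
    return reported_bits << (n - m)


def link_action(ber: float, threshold: float,
                prefer_disable: bool = False) -> str:
    """Choose a response when the idle-state BER exceeds a threshold."""
    if ber > threshold:
        return "disable" if prefer_disable else "reset"
    return "none"


# Example: an 8-bit counter holding 0b10110110 (182), reporting 4 top bits.
reported = top_bits(0b10110110, n=8, m=4)   # 0b1011
print(reported, approx_count(reported, n=8, m=4))
print(link_action(ber=1e-9, threshold=1e-12))
```

Reporting only the topmost bits trades precision for a narrower register: the reconstructed count is accurate to within 2^(n-m) - 1 units, which has little effect on the BER estimate once the counter value is large.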
PCT/US2016/021366 2015-03-10 2016-03-08 Monitoring errors during idle time in ethernet pcs WO2016144953A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562130980P 2015-03-10 2015-03-10
US62/130,980 2015-03-10

Publications (1)

Publication Number Publication Date
WO2016144953A1 true WO2016144953A1 (en) 2016-09-15

Family

ID=56879325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/021366 WO2016144953A1 (en) 2015-03-10 2016-03-08 Monitoring errors during idle time in ethernet pcs

Country Status (1)

Country Link
WO (1) WO2016144953A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6690650B1 (en) * 1998-02-27 2004-02-10 Advanced Micro Devices, Inc. Arrangement in a network repeater for monitoring link integrity by monitoring symbol errors across multiple detection intervals
US20050005189A1 (en) * 2002-04-25 2005-01-06 Lior Khermosh Forward error correction coding in ethernet networks
KR20050062339A (en) * 2003-12-18 2005-06-23 한국전자통신연구원 Method for effective configuration methodology of fec in epon
US20080228941A1 (en) * 2003-11-06 2008-09-18 Petre Popescu Ethernet Link Monitoring Channel
US20140258813A1 (en) * 2012-07-10 2014-09-11 Kent C. Lusted Network System Configured for Resolving Forward Error Correction During A Data Mode

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018074933A1 (en) * 2016-10-18 2018-04-26 Numascale As Programmable cache coherent node controller
US11157405B2 (en) 2016-10-18 2021-10-26 Numascale As Programmable cache coherent node controller
WO2018203754A1 (en) * 2017-05-02 2018-11-08 Numascale As Cache coherent node controller for scale-up shared memory systems
US10956329B2 (en) 2017-05-02 2021-03-23 Numascale As Cache coherent node controller for scale-up shared memory systems having interconnect switch between a group of CPUS and FPGA node controller
WO2020029892A1 (en) * 2018-08-07 2020-02-13 华为技术有限公司 Method for receiving code block stream, method for sending code block stream and communication apparatus
US11251905B2 (en) 2018-08-07 2022-02-15 Huawei Technologies Co., Ltd. Method for receiving code block stream, method for transmitting code block stream, and communications apparatus
CN113938247A (en) * 2020-07-14 2022-01-14 中国移动通信有限公司研究院 Code block processing method, node and medium
WO2023137666A1 (en) * 2022-01-20 2023-07-27 华为技术有限公司 Data transmission method and data transmission apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16762341

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16762341

Country of ref document: EP

Kind code of ref document: A1