PROTOCOL PROCESSING STACK FOR USE WITH INTELLIGENT NETWORK INTERFACE DEVICE
CROSS REFERENCE TO RELATED APPLICATIONS
 The present application claims the benefit under 35 U.S.C. §120 of U.S. patent application Ser. No. 09/514,425, filed Feb. 28, 2000, which in turn claims the benefit under 35 U.S.C. §120 of: a) U.S. patent application Ser. No. 09/141,713, filed Aug. 28, 1998, now U.S. Pat. No. 6,389,479, which in turn claims the benefit under 35 U.S.C. §119 of provisional application 60/098,296, filed Aug. 27, 1998; b) U.S. patent application Ser. No. 09/067,544, filed Apr. 27, 1998, now U.S. Pat. No. 6,226,680, which in turn claims the benefit under 35 U.S.C. §119 of provisional application 60/061,809, filed Oct. 14, 1997; and c) U.S. patent application Ser. No. 09/384,792, filed Aug. 27, 1999, which in turn claims the benefit under 35 U.S.C. §119 of provisional application 60/098,296, filed Aug. 27, 1998.
 The present application also claims the benefit under 35 U.S.C. §120 of U.S. patent application Ser. No. 09/464,283, filed Dec. 15, 1999, which in turn claims the benefit under 35 U.S.C. §120 of U.S. patent application Ser. No. 09/439,603, filed Nov. 12, 1999, now U.S. Pat. No. 6,247,060, which in turn claims the benefit under 35 U.S.C. §120 of U.S. patent application Ser. No. 09/067,544, filed Apr. 27, 1998, now U.S. Pat. No. 6,226,680, which in turn claims the benefit under 35 U.S.C. §119 of provisional application 60/061,809, filed Oct. 14, 1997.
 The subject matter of all of the applications listed above and patents listed above is incorporated herein by reference.
REFERENCE TO COMPACT DISC APPENDIX
 The Compact Disc Appendix (CD Appendix), which is a part of the present disclosure, includes three folders, designated CD Appendix A, CD Appendix B, and CD Appendix C on the compact disc. CD Appendix A contains a hardware description language (Verilog code) description of an embodiment of a receive sequencer. CD Appendix B contains microcode executed by a processor that operates in conjunction with the receive sequencer of CD Appendix A. CD Appendix C contains a device driver executable on the host as well as ATCP code executable on the host. A portion of the disclosure of this patent document contains material (other than any portion of the "free BSD" stack included in CD Appendix C) which is subject to copyright protection. The copyright owner of that material has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights.
 The present invention relates to the management of information communicated via a network, including protocol processing.
 Various individuals, companies and governments have worked for many years to provide communication over computer networks. As different computer and network architectures have been created, many types of protocols have evolved to facilitate that communication. Conventionally, network messages contain information regarding a number of protocol layers that allow information within the messages to be directed to the correct destination and decoded according to appropriate instructions, despite substantial differences that may exist between the computers or other devices transmitting and receiving the messages. Processing of these messages is usually performed by a central processing unit (CPU) running software instructions designed to recognize and manipulate protocol information contained in the messages.
 With the increasing prevalence of network communication, a large portion of the CPU's time may be devoted to such protocol processing, interfering with other tasks the CPU may need to perform. Multiple interrupts to the CPU can also be problematic when transferring many small messages or for large data transfers, which are conventionally divided into a number of packets for transmission over a network.
 In accordance with the present invention, means for offloading some of the most time consuming protocol processing from a host CPU to a specialized device designed for network communication processing are provided. The host has a protocol processing stack that provides instructions not only to process network messages but also to allocate processing of certain network messages to the specialized network communication device. By allocating some of the most common and time consuming network processes to the network communication device, while retaining the ability to handle less time intensive and more varied processing on the host stack, the network communication device can be relatively simple and cost effective. The host CPU, operating according to the instructions from the stack, and the specialized network communication device together determine whether and to what extent a given message is processed by the host CPU or by the network communication device.
BRIEF DESCRIPTION OF THE DRAWINGS
 FIG. 1 is a schematic plan view of a host computer having an intelligent network interface card or communication processing device (INIC/CPD) connected to a remote host via a network.
 FIG. 2 is a schematic plan view of a protocol processing stack of the present invention passing a connection context between host storage and the INIC/CPD.
 FIG. 3 is a diagram of a general method employed to process messages received by the host computer via the INIC/CPD.
 FIG. 4 illustrates a handout of the connection context from the host protocol processing stack to the INIC/CPD via a miniport driver installed in the host.
 FIG. 5 shows a return of the connection context to the host protocol processing stack from the INIC/CPD via a miniport driver installed in the host.
 FIG. 6 diagrams a control mechanism for transmitting a message via the fast-path.
 FIG. 7 diagrams a control mechanism for receiving a message via the fast-path.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
 Referring now to FIG. 1, the present invention can operate in an environment including a host computer shown generally at 20 connected to a remote host 22 via a network 25. The host 20 includes a central processing unit (CPU) 28 and storage 35, while an intelligent network interface card or communication processing device (INIC/CPD) 30 provides an interface between the host and the network 25. A computer is defined in the present invention to be a device including a CPU, a memory and instructions for running the CPU. The network 25 is a medium for transmission of information from one computer to another, such as conductive wires, optical fibers or wireless space, including any supporting hardware or software such as switches and routers. Network implementations include local area networks, wide area networks, telecommunication networks and the Internet. The INIC/CPD 30 is depicted on a border of host 20 because the INIC/CPD provides a network interface that may be added with an adapter card, for example, or integrated as a part of the host computer. A bus 33 such as a peripheral component interconnect (PCI) bus provides a connection within the host 20 between the CPU 28, the INIC/CPD 30, and a storage device 35 such as a semiconductor memory or disk drive, along with any related controls.
 Referring additionally to FIG. 2, the host CPU 28 runs a protocol processing stack 44 of instructions stored in storage 35, the stack including a data link layer 36, network layer 38, transport layer 40, upper layer 46 and an upper layer interface 42. A general description of these protocol layers can be found in the book by W. Richard Stevens entitled TCP/IP Illustrated, Volume 1 (13th printing, 1999), which is incorporated herein by reference. The upper layer 46 may represent a session, presentation and/or application layer, depending upon the particular protocol being employed and message communicated. The upper layer interface 42, along with the CPU 28 and any related controls can send or retrieve data to or from the upper layer 46 or storage 35, as shown by arrow 48. The upper layer interface 42 may be called a Transport driver interface (TDI), for example, in accord with Microsoft terminology. A connection context 50 has been created, as will be explained below, the context summarizing various features of a message connection, such as the protocol types, source and destination addresses and status of the message. The context 50 may be passed between an interface for the session layer 42 and the INIC/CPD 30, as shown by arrows 52 and 54, and stored as a communication control block (CCB) of information in either an INIC/CPD 30 memory or storage 35.
 When the INIC/CPD 30 holds a CCB defining a particular connection, data received by the INIC/CPD from the network and pertaining to the connection is referenced to that CCB and can then be sent directly to storage 35 according to a fast-path 58, bypassing sequential protocol processing by the data link 36, network 38 and transport 40 layers. Transmitting a message, such as sending a file from storage 35 to remote host 22, can also occur via the fast-path 58, in which case the context for the file data is added by the INIC/CPD 30 referencing the CCB, rather than by sequentially adding headers during processing by the transport 40, network 38 and data link 36 layers. The DMA controllers of the INIC/CPD 30 can perform these message transfers between INIC/CPD and storage 35.
 The INIC/CPD 30 can collapse multiple protocol stacks each having possible separate states into a single state machine for fast-path processing. The INIC/CPD 30 does not handle certain exception conditions in the single state machine, primarily because such conditions occur relatively infrequently and to deal with them on the INIC/CPD would provide little performance benefit to the host. A response to such exceptions can be INIC/CPD 30 or CPU 28 initiated. The INIC/CPD 30 deals with exception conditions that occur on a fast-path CCB by passing back or flushing to the host protocol stack 44 the CCB and any associated message frames involved, via a control negotiation. The exception condition is then processed in a conventional manner by the host protocol stack 44. At some later time, usually directly after the handling of the exception condition has completed and fast-path processing can resume, the host stack 44 hands the CCB back to the INIC/CPD. This fallback capability enables most performance-impacting functions of the host protocols to be quickly processed by the specialized INIC/CPD hardware, while the exceptions are dealt with by the host stacks, the exceptions being so rare as to negligibly affect overall performance.
 FIG. 3 diagrams a general flow chart for messages sent to the host via the network according to the current invention. A large TCP/IP message such as a file transfer may be received by the host from the network in a number of separate, approximately 64 KB transfers, each of which may be split into many, approximately 1.5 KB frames or packets for transmission over a network. Novell NetWare® protocol suites running Sequenced Packet Exchange Protocol (SPX) or NetWare® Core Protocol (NCP) over Internetwork Packet Exchange (IPX) work in a similar fashion. Another form of data communication which can be handled by the fast-path is Transaction TCP (hereinafter T/TCP or TTCP), a version of TCP which initiates a connection with an initial transaction request after which a reply containing data may be sent according to the connection, rather than initiating a connection via a several-message initialization dialogue and then transferring data with later messages. In general, any protocol for which a connection can be set up to define parameters for a message or plurality of messages between network hosts may benefit from the present invention. In any of the transfers typified by these protocols, each packet conventionally includes a portion of the data being transferred, as well as headers for each of the protocol layers and markers for positioning the packet relative to the rest of the packets of this message.
 When a message packet or frame is received 47 from a network by the INIC/CPD, it is first validated by a hardware assist. This includes determining the protocol types of the various layers of the packet, verifying relevant checksums, and summarizing 57 these findings into a status word or words. Included in these words is an indication whether or not the frame is a candidate for fast-path data flow. Selection 59 of fast-path candidates is based on whether the host may benefit from this message connection being handled by the INIC/CPD, which includes determining whether the packet has header bytes denoting particular protocols, such as TCP/IP or SPX/IPX for example. The typically small percentage of frames that are not fast-path candidates are sent 61 to the host protocol stacks for slow-path protocol processing. Subsequent network microprocessor work with each fast-path candidate determines whether a fast-path connection such as a TCP or SPX CCB is already extant for that candidate, or whether that candidate may be used to set up a new fast-path connection, such as for a TTCP/IP transaction. The validation provided by the INIC/CPD provides advantages whether a frame is processed by the fast-path or a slow-path, as only error free, validated frames are processed by the host CPU even for the slow-path processing.
 All received message frames which have been determined by the INIC/CPD hardware assist to be fast-path candidates are examined 53 by the network microprocessor or INIC comparator circuits to determine whether they match a CCB held by the INIC/CPD. Upon confirming such a match, and assuming no exception conditions exist, the INIC/CPD removes lower layer headers and sends 69 the remaining application data from the frame directly into its final destination in the host using direct memory access (DMA) units of the INIC/CPD. This operation may occur immediately upon receipt of a message packet, for example when a TCP connection already exists and destination buffers have been negotiated, or it may first be necessary to process an initial header to acquire a new set of final destination addresses for this transfer. In this latter case, the INIC/CPD will queue subsequent message packets while waiting for the destination address, and then DMA the queued application data to that destination. The final destination addresses may be provided as a scatter-gather list of host buffer address and length pairs. For a Microsoft type operating system and stack 44, the scatter gather list is a memory descriptor data list (MDL).
 A fast-path candidate that does not match a CCB may be used to set up a new fast-path connection, by sending 65 the frame to the host for sequential protocol processing. In this case, the host uses this frame to create 51 a CCB, which is then passed to the INIC/CPD to control subsequent frames on that connection. The CCB, which is cached 67 in the INIC/CPD, includes control and state information pertinent to all protocols that would have been processed had conventional software layer processing been employed. The CCB also contains storage space for per-transfer information used to facilitate moving application-level data contained within subsequent related message packets directly to a host application in a form available for immediate usage. The INIC/CPD takes command of connection processing upon receiving a CCB for that connection from the host.
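 The receive decisions of FIG. 3 described above can be summarized in a short C sketch. It is illustrative only and is not taken from the CD Appendix: the type names, field names and the `classify_frame` function are hypothetical, the CCB is reduced to its address and port fields, and the rejection of frames that fail validation is an assumption drawn from the statement that only validated frames reach the host CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status summary built by the hardware assist for one frame. */
typedef struct {
    bool     valid;               /* checksums and headers verified */
    bool     fast_path_candidate; /* e.g. header bytes denote TCP/IP */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
} frame_status;

/* Hypothetical, much reduced view of a CCB cached on the INIC/CPD. */
typedef struct {
    bool     in_use;
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
} ccb_key;

typedef enum {
    RX_REJECT,          /* assumption: unvalidated frames never reach the host */
    RX_SLOW_PATH,       /* not a candidate: host stack processes the frame */
    RX_HOST_MAY_CREATE, /* candidate without a CCB: host may create one */
    RX_FLUSH_TO_HOST,   /* CCB matched, but an exception condition exists */
    RX_FAST_PATH        /* CCB matched: DMA the data to its destination */
} rx_action;

/* Decide what to do with one received frame, given the cached CCBs. */
rx_action classify_frame(const frame_status *fs, const ccb_key *ccbs,
                         size_t n, bool exception)
{
    assert(fs != NULL && (ccbs != NULL || n == 0));
    if (!fs->valid)
        return RX_REJECT;
    if (!fs->fast_path_candidate)
        return RX_SLOW_PATH;
    for (size_t i = 0; i < n; i++) {
        const ccb_key *c = &ccbs[i];
        if (c->in_use &&
            c->src_ip == fs->src_ip && c->dst_ip == fs->dst_ip &&
            c->src_port == fs->src_port && c->dst_port == fs->dst_port)
            return exception ? RX_FLUSH_TO_HOST : RX_FAST_PATH;
    }
    return RX_HOST_MAY_CREATE;
}
```

 A linear search is used here only for brevity; a real device would more plausibly locate the CCB through a hash table or comparator circuits.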
 As mentioned above, the present invention improves system performance by offloading TCP/IP data processing from the host protocol stack to the INIC/CPD. Since only the data movement portion of the protocol stack is offloaded, TCP control processing generally remains on the host protocol stack. In addition, the host protocol stack also handles TCP exception processing, such as retransmissions. Leaving TCP control and exception processing on the host protocol stack has the advantage of giving the operating system complete control over the TCP connection. This is convenient because the operating system may choose not to hand out a connection to the network communication device for various reasons. For example, if someone wishes to monitor network frames on the host, the host protocol stack can be programmed to handle all TCP connections, so that no packets are processed on the INIC/CPD. A second advantage to leaving TCP control and exception processing on the host protocol stack is that this greatly simplifies the complexity of operations required by the INIC/CPD, which can be made from an inexpensive application specific integrated circuit (ASIC) as opposed to an expensive CPU.
 In order for a connection to be handled by both the host protocol stack 44 for control and exception conditions, and by the INIC/CPD 30 for data movement, the connection context is made to migrate between the host and the INIC/CPD. A CCB, which contains the set of variables used to represent the state of a given TCP connection, provides the mechanism for this migration. Transfer of a CCB from the host to the INIC/CPD is termed a connection handout, and transfer of a CCB from the INIC/CPD back to the host is termed a connection flush. This transfer may occur several times during the course of a TCP connection as the result of dropped packets or other exceptions, which are discussed below. Once a connection handout occurs, the INIC/CPD handles all TCP processing, according to the fast-path mode. Any message transmissions occurring while in the fast-path mode are referred to as fast-path sends. Likewise, any message receptions that occur while in the fast-path mode are referred to as fast-path receives.
 A portion of the CCB corresponds to a conventional TCP control block, containing items such as sequence numbers and ports, as well as lower protocol values such as IP addresses and the first-hop MAC addresses. A list of variables for such a conventional TCP control block can be found in the book by Gary R. Wright and W. Richard Stevens entitled TCP/IP Illustrated, Volume 2 (7th Edition, 1999), which is incorporated by reference herein, on pages 803-805.
 In addition to those TCP variables, a number of variables are provided in the CCB for maintaining state information involving the present invention. A first of these variables, a character termed conn_nbr, denotes the connection number for this CCB. The INIC/CPD 30 may maintain, for example, 256 connections, so that the conn_nbr delineates which of those connections is defined by this CCB. Another CCB-specific variable is termed hosttcbaddr, which lists the address in the host for this particular CCB. This address is used when the CCB is returned from the INIC/CPD to the host. For accelerated processing of the most active connections, the INIC/CPD 30 stores the connections in a hash table in SRAM. A CCB variable termed Hash Value gives a hash table offset for the CCB, which is a hash of the source and destination IP addresses, and source and destination TCP ports for the connection.
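 As an illustration of how a hash table offset might be derived from the source and destination IP addresses and TCP ports, consider the C sketch below. The mixing function is an assumption, since the hash actually used is not specified here; `ccb_hash` and its `slots` parameter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical hash of the connection 4-tuple into a table of `slots`
 * entries. The XOR folding below is an assumption chosen for simplicity.
 */
uint32_t ccb_hash(uint32_t src_ip, uint32_t dst_ip,
                  uint16_t src_port, uint16_t dst_port, uint32_t slots)
{
    /* Masking only works when the table size is a power of two. */
    assert(slots != 0 && (slots & (slots - 1u)) == 0);

    uint32_t h = src_ip ^ dst_ip ^ (((uint32_t)src_port << 16) | dst_port);
    h ^= h >> 16; /* fold the high half into the low half */
    h ^= h >> 8;  /* fold again so every byte affects the low byte */
    return h & (slots - 1u);
}
```

 With a 256-entry table, matching the example of 256 connections, `ccb_hash(0xC0A80001, 0xC0A80002, 1025, 80, 256)` yields offset 0x56. Distinct connections can still collide, so a lookup would compare the full addresses and ports stored in the CCB before accepting a match.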
 Another character, termed buff_state, tells whether a CCB that has been cached in SRAM matches the corresponding CCB stored in DRAM. After processing of a frame or burst of frames against an SRAM cached connection, the state of the CCB is changed, which is indicated by the buff_state character. When the cached connection is flushed back by DMA to DRAM, replacing the CCB held in DRAM with the SRAM CCB having updated status, the character buff_state is set clean.
 Additional variables contained in a CCB include a character termed rcv_state, which denotes the status of a receive finite state machine for the CCB, and a character termed xmt_state, which denotes the status of a transmit finite state machine for the CCB. Both of these state machines pertain to fast path processing by the INIC/CPD 30. In other words, the state of a fast path receive state machine for a given CCB can be defined by a number of different values indicated by the setting of the rcv_state character, and the state of a fast path transmit state machine for that CCB can likewise be defined by the setting of the xmt_state character. Events processed against the receive and transmit state machines are denoted in the CCB by characters labeled rcv_evts and xmt_evts, respectively. These event characters offer a history of events that have transpired as well as the current events affecting those state machines. For example, the rcv_evts character may contain eight bits defining previous events and another eight bits defining current events, with the xmt_evts character similarly apportioned.
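 The split of an event character into eight bits of previous events and eight bits of current events can be sketched in C as follows. The bit layout (history in the high byte, current events in the low byte) and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: high byte = previous events, low byte = current events. */
#define EVT_CURRENT_MASK 0x00FFu
#define EVT_HISTORY_MASK 0xFF00u

/* Record a new current event, identified by a bit number 0..7. */
uint16_t evt_post(uint16_t evts, unsigned bit)
{
    assert(bit < 8);
    return (uint16_t)(evts | (1u << bit));
}

/* After the state machine has handled the current events, merge them
 * into the history byte and clear the current byte. */
uint16_t evt_retire(uint16_t evts)
{
    uint16_t cur = (uint16_t)(evts & EVT_CURRENT_MASK);
    return (uint16_t)((evts & EVT_HISTORY_MASK) | ((uint32_t)cur << 8));
}

uint8_t evt_current(uint16_t evts)  { return (uint8_t)(evts & EVT_CURRENT_MASK); }
uint8_t evt_previous(uint16_t evts) { return (uint8_t)(evts >> 8); }
```

 Keeping both bytes in one word lets a state machine test, in a single read, both what has just happened and what has happened earlier on the connection.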
 Also contained in a CCB are variables associated with frames that have been received by the INIC/CPD 30 corresponding to the connection. For example, fast path received frames may accumulate in the host while the INIC/CPD 30 is waiting for an MDL delineating a host destination for the received message. A CCB field termed RcvQ[RCV_MAX] offers a number of thirty-two-bit words for storing pointers to such frames in DRAM, essentially forming a receive queue. A CCB variable termed OHIO (for overflow input/output pointers), offers information corresponding to the RcvQ, such as pointers to the last frame in and first frame out, while a variable termed QdCnt indicates the number of frames in the RcvQ.
 A number of CCB variables pertain to the MDL that has been provided for storing a received message. A character termed RHHandle is used to report to the host a command that has been completed by the INIC/CPD 30 regarding that MDL. RNxtDAdd is a CCB field that is used to denote the next scatter/gather address list to be acquired from DRAM in the INIC/CPD 30 for storage according to the MDL. The variable RCurBuff describes the current buffer of the MDL for storing data, and RCurLen tells the length of that buffer. Similarly, the variable RNxtBuff tells the next receive buffer from the MDL for storing data, and RNxtLen tells the length of that buffer. RTotLen is used to designate the total length of the MDL, which is reduced as data is stored in the buffers designated by the MDL.
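 The way these fields might cooperate while received data is stored can be sketched in C. This is a simplified model under stated assumptions: only the current and next buffers are tracked, refilling the next buffer from the scatter/gather list at RNxtDAdd is omitted, host buffers are represented by ordinary pointers, and `rcv_mdl_state` and `rcv_store` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the CCB fields that track the receive MDL. */
typedef struct {
    uint8_t *RCurBuff; uint32_t RCurLen; /* current buffer and space left */
    uint8_t *RNxtBuff; uint32_t RNxtLen; /* next buffer and its length */
    uint32_t RTotLen;                    /* MDL bytes still to be filled */
} rcv_mdl_state;

/* Store received data into the MDL buffers; returns the bytes stored. */
uint32_t rcv_store(rcv_mdl_state *s, const uint8_t *data, uint32_t len)
{
    assert(s != NULL && (data != NULL || len == 0));
    uint32_t stored = 0;
    while (len > 0 && s->RTotLen > 0) {
        if (s->RCurLen == 0) {
            /* Current buffer is full: promote the next buffer. */
            if (s->RNxtLen == 0)
                break;
            s->RCurBuff = s->RNxtBuff;
            s->RCurLen  = s->RNxtLen;
            s->RNxtBuff = NULL;
            s->RNxtLen  = 0;
        }
        uint32_t n = len;
        if (n > s->RCurLen) n = s->RCurLen;
        if (n > s->RTotLen) n = s->RTotLen;
        memcpy(s->RCurBuff, data, n);
        s->RCurBuff += n;
        s->RCurLen  -= n;
        s->RTotLen  -= n; /* total shrinks as data is stored */
        data += n;
        len  -= n;
        stored += n;
    }
    return stored;
}
```

 For example, with a 4-byte current buffer, an 8-byte next buffer and RTotLen of 10, storing 12 bytes places 4 bytes in the first buffer and 6 in the second, leaving 2 bytes unstored because the MDL has been satisfied.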
 The CCB similarly keeps track of buffer queues during transmission of a message. The variable XNxtDAdd pertains to the next address in INIC/CPD 30 DRAM from which to acquire a scatter/gather list of data to be sent over a network, while XTotLen provides the total length of the data to be sent, which is reduced as data is sent. The variable XCurBuff describes the current host buffer from which to send data, and XCurLen tells the length of that buffer. Similarly, the variable XNxtBuff tells the next host buffer from which data is acquired, and XNxtLen tells the length of that buffer.
 Some CCB variables pertain to commands sent from the host stack 44 to the INIC/CPD 30 during transmission of a message. Several commands sent by the host regarding a particular CCB may be processed at one time by the INIC/CPD 30, and the CCB maintains variables keeping track of those commands. A variable termed XRspSN holds a TCP sequence number for each message that has been sent over a network. This TCP sequence number is used for matching with an acknowledgement (ACK) from the remote host of receipt of that transmission. A variable termed XHHandle provides a handle or DRAM address of the host regarding a particular command, so that for example upon receiving such an ACK the INIC/CPD can notify the host. CCB variables that keep track of commands being processed by the INIC/CPD include XCmdIn, which tells the next command storage slot, XCmdOut, which describes the command to be executed, and XCmd2Ack, which points to commands that have been sent but not yet ACKed. XCmdCnts lists the number of commands currently being processed and commands that have been sent but not yet ACKed. XmtQ provides a queued list of all the commands being processed by the INIC/CPD.
 The CCB also contains a couple of fields for IP and TCP checksums, termed ip_ckbase and tcp_ckbase, respectively. Fast-path transmission of a message occurs with the INIC/CPD prepending protocol headers derived from the CCB to message data provided by the host for the CCB. The ip_ckbase and tcp_ckbase offer the possibility of adjusting the base checksums provided by the host for prepending to the data along with the headers.
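 The idea of a base checksum can be illustrated with the standard Internet checksum, a 16-bit one's complement sum. Because that sum is associative, a partial sum over fixed header fields can be computed once and later combined with the sum over the data. The C sketch below shows only this general property; how ip_ckbase and tcp_ckbase are actually computed and adjusted is not specified here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add bytes, taken as big-endian 16-bit words, to a running sum.
 * When a sum is continued across calls, every call except the last
 * must be given an even number of bytes. */
uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8; /* pad a trailing odd byte with zero */
    return sum;
}

/* Fold the carries back in and complement to obtain the checksum. */
uint16_t csum_finish(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

 For the bytes 00 01 f2 03 f4 f5 f6 f7, the checksum is 0x220d whether it is computed in one pass or by first summing the leading four bytes as a base and then adding the remaining four.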
 As mentioned above, fast-path operations can be divided into four categories: handout, flush, send and receive. These fast-path operations may be implemented in the form of a generic Microsoft Task Offload (TCP_TASK_OFFLOAD), which may be independent from the specific hardware of the INIC/CPD 30. For the currently preferred implementation, hardware-specific code is placed in the NDIS miniport driver. Implementations for other protocol processing stacks, such as for Unix, Linux, Novell or Macintosh operating systems, may also be hardware-independent. The present invention illustrates a Microsoft stack implementation since it involves one of the most popular operating systems, and substantial improvements are provided. The description below illustrates the modifications required to integrate the four basic fast-path operations into the Microsoft TCP/IP protocol processing stack. Also defined is the format of the TCP_TASK_OFFLOAD as well as miscellaneous issues associated with these changes.
 Support for the fast-path offload mechanisms requires the definition of a new type of TCP_TASK_OFFLOAD. As with other task offloads, TCP will determine the capabilities of the NDIS miniport by submitting an OID_TCP_TASK_OFFLOAD OID to the driver.
 Fast-path information is passed between the protocol processing stack 44 and the miniport driver 70 as media specific information in an out-of-band data block of a packet descriptor. There are two general fast-path TCP_TASK_OFFLOAD structures—commands and frames. The TCP_OFFLOAD_COMMAND structure contains fast-path information that is being sent from the TCPIP driver to the miniport. The TCP_OFFLOAD_FRAME structure contains fast-path information being sent from the miniport to the TCPIP driver. The header file that defines the fast-path TCP_TASK_OFFLOAD mechanism is described on a later page.
 Six types of offload commands are defined below:
 1] TCP_OFFLOAD_HANDOUT1 (this is the first phase of a two-phase handshake used in the connection handout);