[FIG. 4 (pseudo-code). Initialization code: kill unwanted threads (108); compute thread-local-storage pointers (110); user's initialization code; invisible barrier (112). Main code, Void_main(): kill unwanted threads (116); invisible call to _spaceless_preamble() (118); user's main code (120).]
[FIG. 6 (compiler flow chart). Insert a call to initialization code in main; insert compiler-generated (single execution) code before user's initialization; inline any functions called in the initialization code (166); insert code barrier as last statement in initialization code; perform analysis, optimizations, register allocation, scheduling, etc. as usual; generate code for initialization code; generate code for main code.]
METHOD OF REPLACING INITIALIZATION CODE IN A CONTROL STORE WITH MAIN CODE AFTER EXECUTION OF THE INITIALIZATION CODE HAS COMPLETED
A network processor can include multiple embedded processors or engines. Each engine may be dedicated to a particular task and executes instructions to complete the task. Instructions used by the engine to execute a particular process or task are often stored in a control store.
DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram of a system.
FIG. 2 is a block diagram of a network processor including multiple engines.
FIG. 3 is a flow chart of a process to initialize an engine and execute main code.
FIG. 4 is a block diagram of pseudo-code for an engine in the network processor.
FIG. 5 is a flow chart of a process to execute initialization code and overwrite the initialization code with main code.
FIG. 6 is a flow chart of a compiler process.
Referring to FIG. 1, a system 10 for transmitting data from a computer system 12 to another computer system 14 is shown. System 10 includes a networking device 20 (e.g., a router or switch) that collects a stream of "n" data packets 18 and classifies each of the data packets for transmission to the appropriate destination computer system 14. The networking device 20 includes a network processor 28 or other multi-core processor that processes the data packets 18 with an array 32 of, for example, four (as illustrated in FIG. 2), six, twelve, or another number of programmable multithreaded engines 58. An engine 58 can also be referred to as a processing element, an embedded processor, a processing engine, microengine, picoengine, and the like. Each engine executes instructions that are associated with an instruction set (e.g., a reduced instruction set computer (RISC) architecture) and can be independently programmable. In general, the engines 58 and control plane processor 30 are implemented on a common semiconductor die, although other configurations are possible. The control plane processor 30 coordinates multiple data-plane processors or engines 58 and handles exceptions generated by the engines. The functionality of the control plane processor could be implemented in another type of processor such as a general purpose processor.
Referring to FIG. 2, the network processor 28 includes multiple engines 58 and each engine 58 includes a control store 60. The control store 60 stores application specific code and instructions accessed by the engine 58 to perform specific tasks. For example, control store 60 may include a set of instructions related to tasks required by an application such as packet classification, packet processing, and quality of service (QOS) actions. Such a set of instructions related to performing specific tasks determined by a programmer can be viewed as main code. The size of the control store 60 in the embedded microprocessor or engine 58 is limited. Thus, programs and instructions stored in the control store are generated to effectively utilize the space provided. Engine 58 can be single-threaded or multi-threaded (i.e., executes a number of threads). When an engine is multithreaded, each thread acts independently as if there are multiple virtual engines.
The network processor interfaces and communicates with a PC, workstation, or other device that includes a loader 50, a compiler and linker 52, a simulator 54, and a debugger 56 that are used to load, execute, and debug programs stored in the control store 60. The loader 50, compiler and linker 52, simulator 54, and debugger 56 transmit code or data to the network processor 28. Programs executed by the engine 58 often include two components: an initialization routine (e.g., a compiled set of instructions from the initialization code) and a main routine (e.g., a compiled set of instructions from the main code). The initialization routine executes at start-up to initialize the engine 58, e.g., so that main code can be executed by the engine 58. Initialization code is generated by a high-level language compiler to initialize global and static variables, store constants in registers, initialize the software pipelining, and the like. The length of the initialization code may be substantial and, in some examples, the initialization code can be similar in length to the main code.
Referring to FIG. 3, a process 70 for executing initialization code and the main code using the limited storage in the control store 60 is shown. The initialization code (e.g., the compiled set of instructions resulting from the compiler compiling the initialization code) is not stored in the control store of the processor subsequent to execution (e.g., during execution of main code). This provides the advantage of allowing the main code (e.g., the compiled set of instructions resulting from the compiler compiling the main code) to use the entire space provided in the control store. The additional space allows more instructions or more complex code to be stored in the control store 60 for a given amount of control store storage space. In order to execute both the compiled initialization code and the compiled main code without storing the initialization code during execution of the main code, process 70 executes 72 static initializations for the system and executes and debugs 74 the initialization code. Process 70 replaces 76 the initialization code with the main code in the control store. The initializations set by the initialization code are maintained when the initialization code is overwritten. Process 70 subsequently executes 78 the main code.
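The space argument behind process 70 can be illustrated with a small model. What follows is a minimal Python sketch under stated assumptions, not the implementation described here: the `ControlStore` class, its capacity of 8 instruction slots, and the instruction and register names are all hypothetical. It only shows that two images which cannot fit in the store together can each occupy it in turn, while the values set by the first image survive the overwrite.

```python
class ControlStore:
    """Toy model of a fixed-size control store (the capacity is an assumed value)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.code = []

    def load(self, instructions):
        # Loading overwrites whatever was previously stored.
        if len(instructions) > self.capacity:
            raise MemoryError("image does not fit in the control store")
        self.code = list(instructions)


def run(store, registers):
    # Each hypothetical instruction is a (register, value) assignment.
    for reg, value in store.code:
        registers[reg] = value


init_image = [("r%d" % i, i * 10) for i in range(6)]  # 6 slots
main_image = [("out%d" % i, i) for i in range(7)]     # 7 slots

store = ControlStore(capacity=8)
registers = {}

# 6 + 7 slots exceed the assumed capacity of 8.
fits_together = len(init_image) + len(main_image) <= store.capacity

store.load(init_image)  # initialization code occupies the store
run(store, registers)
store.load(main_image)  # main code overwrites the initialization code
run(store, registers)
```

After the second `load`, only the main image is in the store, yet the registers written by the initialization image still hold their values.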
Referring to FIG. 4, a program image including an initialization portion or code 98 and a main code 102 is shown. The initialization code 98 and the main code 102 include code-debugging information 100 and 104, respectively. The code-debugging information is included in both portions because the debugger maintains a separate set of execution states (e.g., break points) for the initialization code 98 and the main code 102 (as described below).
The main code 102 includes an initial statement 116 to kill any unwanted threads. For example, in an 8-threaded engine, if a programmer elects to use only 4 threads instead of all 8, the compiler generates code to kill the unwanted threads. If unwanted threads are not killed, unexpected behaviors may occur.
From the compiler's perspective, the main code 114 includes a call 118 to the initialization code 98. The compiler does not generate an explicit "call" instruction. Based on the invisible call 118, the initialization code 98 is executed prior to the main code 102 as if, from the programmer's perspective, there were a "call" from the main code to the initialization code. In addition, the main code 102 also includes the code 120 or set of programmed instructions to cause an engine to perform a particular process or task.
The initialization code 98 (e.g., spaceless_preamble 106) is 'called' from the main code 102 for the purpose of compiler data-flow analysis. Spaceless_preamble is an arbitrary name given to the initialization code in this example. Other names could be used. The compiler does not generate a standard "call" and "return" sequence for the initialization code. This 'call' allows for use of a single execution of the initialization code 98. The initialization code 98 includes an initial statement 108 to kill any unwanted threads. Killing the unwanted threads ensures that the process begins execution with the correct state and ensures that any previously running processes are completed or terminated before the initialization process begins. The initialization code 98 includes an instruction 110 to compute thread pointers and a set of initialization instructions. Another statement 112 in the initialization code 98 is a barrier. The barrier is a statement replacing the typical "return" statement in the routine. The barrier indicates the end of the initialization process and is used to set a flag or provide an indication that the initialization is complete.
When the process reaches barrier 112, it kills any running threads. In order to replace the initialization code with the main code, the loader 50 needs to detect when all threads in the initialization code finish execution. The loader 50 in control plane processor 30 detects when all threads are killed either by periodically querying the engine, or based on the engine sending an interrupt to the control plane processor 30. When the control plane processor 30 detects that all threads have been killed, the control plane processor removes the initialization code from the control store and replaces the initialization code with the main code. Most of the architectural state remains after the main code is written into the control store. Exceptions include resetting the program counter and reviving the threads. Other examples can include more or fewer exceptions to maintain the architectural state after execution of the initialization code.
The invisible barrier code 112 provides a synchronization point but differs from a return statement in a typical routine because a return statement indicates a location in the program to return to. Since the main code is not loaded in the control store during execution of the initialization code, a return statement would point to a nonexistent portion of the code (or an incorrect address). Based on the invisible barrier code 112, the loader makes sure that all threads executing the initialization routine reach the invisible barrier code 112 before overwriting the initialization code with the main code in the control store. The initialization code can execute on one or more threads.
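The polling variant of this hand-off can be sketched as follows. This is an illustrative Python model only: the `Engine` class, its `tick` method, the per-thread step counts, and the `max_polls` limit are assumptions made for the example, and the interrupt-based variant is not modeled. The model advances the engine by one step between polls, and a thread with no steps left is treated as having reached the barrier and been killed.

```python
RUNNING, KILLED = "running", "killed"


class Engine:
    """Toy engine: each thread runs a fixed number of init steps, then hits the barrier."""

    def __init__(self, steps_per_thread):
        self.remaining = list(steps_per_thread)
        self.state = [RUNNING] * len(steps_per_thread)
        self.control_store = "init"
        self.pc = 0

    def tick(self):
        # Advance every running thread by one step; a thread with no
        # steps left has reached the barrier and is killed.
        self.pc += 1
        for t, left in enumerate(self.remaining):
            if self.state[t] != RUNNING:
                continue
            if left == 0:
                self.state[t] = KILLED
            else:
                self.remaining[t] = left - 1

    def all_threads_killed(self):
        return all(s == KILLED for s in self.state)


def loader_swap(engine, max_polls=100):
    """Poll until every thread is killed, then overwrite init code with main code."""
    polls = 0
    while not engine.all_threads_killed():
        if polls == max_polls:
            raise TimeoutError("threads never reached the barrier")
        engine.tick()
        polls += 1
    engine.control_store = "main"                 # overwrite the initialization code
    engine.pc = 0                                 # reset the program counter
    engine.state = [RUNNING] * len(engine.state)  # revive the threads
    return polls


engine = Engine(steps_per_thread=[2, 0, 3])
polls = loader_swap(engine)
```

The swap happens only after the slowest thread has reached the barrier, which is why no thread can still be fetching initialization instructions when the store is overwritten.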
The instructions included in the initialization code 98 are written by a programmer according to a set of rules. The compiler compiles the initialization code 98 and the main code 102 concurrently. Thus, the initialization code uses the same compilation options as the main code. The initialization code can communicate with the main code using global variables, resulting in a single program image with two parts (e.g., the initialization code 98 and the main code 102). The initialization code executes after execution of any static initialization code but before execution of any instructions in the main code. Thus, the initialization code does not rely on statements from the main code. Since the initialization code 98 is not stored in the control store during the execution of the main code 102, the initialization code cannot be called explicitly from functions in the main code (one exception is the invisible call 118 to the initialization code at compile time). For similar reasons, the initialization code cannot include calls to functions in the main code. Variables and state machines initialized by the initialization code are not reset before execution of the main code. Thus, any counters and state machines that are initialized for use in the initialization code, but are not correct for the main code, are reset in either the initialization routine or the main code.
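The two calling rules above lend themselves to a mechanical check. The following Python sketch is one hypothetical way such a check could look; the call-graph representation, the function names, and the `cross_partition_calls` helper are assumptions for illustration and are not described in this document. The invisible compile-time call 118 is deliberately left out of the graph, since it is the stated exception.

```python
def cross_partition_calls(call_graph, init_funcs, main_funcs):
    """Return sorted (caller, callee) pairs that cross the init/main boundary."""
    violations = []
    for caller, callees in call_graph.items():
        for callee in callees:
            init_to_main = caller in init_funcs and callee in main_funcs
            main_to_init = caller in main_funcs and callee in init_funcs
            if init_to_main or main_to_init:
                violations.append((caller, callee))
    return sorted(violations)


# Hypothetical program: function names are illustrative only.
call_graph = {
    "init": {"setup_tables", "classify"},  # classify lives in main code
    "setup_tables": set(),
    "main": {"classify", "setup_tables"},  # setup_tables lives in init code
    "classify": set(),
}
init_funcs = {"init", "setup_tables"}
main_funcs = {"main", "classify"}

violations = cross_partition_calls(call_graph, init_funcs, main_funcs)
```

In this example both cross-partition calls are reported, because neither callee would be present in the control store at the time of the call.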
Referring to FIG. 5, a process 130 executed using the loader 50, compiler/linker 52, simulator 54, and debugger 56 is shown. Process 130 allows the initialization code 98 to execute a single time before the main code 102. Process 130 includes executing 132 the static initializations. The static initializations include initializing global and static variables and storing constants in registers. Subsequent to the execution of the static initializations, process 130 begins execution 134 of the main program. As described above, the main program 102 includes an implicit call 118, inserted by the compiler, to the initialization code or routine. Process 130 executes the call 136 to the initialization code and subsequently loads the initialization code into the control store and executes and debugs 138 the initialization code. Upon completion of the initialization process, threads reach a barrier and are killed 140. The initialization code can be executed on a single thread, on a subset of threads, or on all threads in the engine. The process checks 142 to see if all running threads have been killed. For example, a flag or bit can be set in a register when all threads have been killed and process 130 can check the register for a particular status. If process 130 determines 142 that the threads have not all been killed, process 130 returns to killing 140 the threads. If process 130 determines 142 that the threads have been killed, process 130 revives 144 the threads. Process 130 resets 148 the program counter and executes 150 the main code. Thus, the main code 102 is stored in the control store, overwriting the initialization code 98, such that the initialization code 98 is not stored in the control store during the execution of the main code.
Referring to FIG. 6, a process 160 executed by the compiler 52 during various portions of process 130 or in addition to process 130 is shown. This process allows the compiler to compile the initialization code 98 and allows the initialization code 98 to be overwritten with the main code 102 such that only the main code is stored in the control store of the engine. The compiler inserts 162 a call, e.g., call 118, to the initialization code in the main code. This call is inserted and executed as if it is the first statement in the main code (e.g., call 118 in FIG. 4) but is invisible (not written by the programmer). The compiler inserts 164 compiler-generated code into the initialization code before the user's initialization code. Examples of the compiler-generated code include thread initialization code or a first iteration of a software pipelined loop.
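The resulting layout of the two images, as drawn in FIG. 4, can be sketched in a few lines. This Python fragment is a simplified illustration, not the compiler's actual output: the `build_images` function, the pseudo-instruction strings, and the thread counts are hypothetical. It reflects two points from the description: compiler-generated code precedes the user's initialization code, and the call to the initialization code exists only for analysis, so no call instruction appears in the main image.

```python
def build_images(user_init, user_main, used_threads, total_threads):
    """Return (init_image, main_image) as lists of pseudo-instructions."""
    kill = []
    if used_threads < total_threads:
        kill = ["kill_threads %d..%d" % (used_threads, total_threads - 1)]

    init_image = (
        kill
        + ["compute_tls_pointers"]  # compiler-generated, single-execution code
        + list(user_init)
        + ["barrier"]               # takes the place of a return
    )
    # The call to the initialization code is used only for analysis,
    # so no call instruction is emitted into the main image.
    main_image = kill + list(user_main)
    return init_image, main_image


init_image, main_image = build_images(
    user_init=["init_tables", "load_constants"],
    user_main=["classify_packet", "forward_packet"],
    used_threads=4,
    total_threads=8,
)
```

With 4 of 8 threads in use, both images begin by killing threads 4 through 7, and the initialization image ends with the barrier rather than a return.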
The compiler inserts 166 the actual code for (also referred to as inlining) any functions called in the initialization code. Alternatively, the compiler may replace the call to the function with the code for the function only for functions also called from the main code. The code for functions replaces the call to the function for a variety of reasons. For example, if a function is called from both the main code and the initialization code, the addresses, compilation, or debugging of the function might be different when executed in the main code than when executed in the initialization code. If the calls to the functions are not replaced by the actual code, but are called from both the main code and the initialization code, the main code analysis and optimizations may be inefficient or incorrect because of the non-existence of the call site from the initialization code.
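Inlining of this kind can be illustrated on a toy representation. In the Python sketch below, a function body is a list of statement strings and a call is a `("call", name)` tuple; this representation, the `inline_calls` helper, and the function names are assumptions for the example, and the sketch assumes the called functions are not recursive.

```python
def inline_calls(body, functions):
    """Return body with every ("call", name) replaced by the callee's statements."""
    result = []
    for stmt in body:
        if isinstance(stmt, tuple) and stmt[0] == "call":
            # Recursively expand so nested calls are inlined as well.
            result.extend(inline_calls(functions[stmt[1]], functions))
        else:
            result.append(stmt)
    return result


# Hypothetical helper functions and initialization body.
functions = {
    "clear_counters": ["rx_count = 0", "tx_count = 0"],
    "setup": [("call", "clear_counters"), "mode = 1"],
}
init_body = ["kill_unwanted_threads", ("call", "setup"), "barrier"]

inlined = inline_calls(init_body, functions)
```

After expansion the initialization body contains no call sites, so it no longer depends on the address of any function that also appears in the main code.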
The compiler also inserts 168 a code barrier as the last statement in the initialization code. The code barrier replaces