CN101866280A

CN101866280A - Microprocessor and execution method thereof

Info

Publication number: CN101866280A
Application number: CN201010185596A
Authority: CN
Inventors: 杰拉德·M·卡尔; 罗德尼·E·虎克; 布莱恩·W·伯格
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2009-05-29
Filing date: 2010-05-19
Publication date: 2010-10-20
Anticipated expiration: 2030-05-19
Also published as: CN101866280B

Abstract

A microprocessor and an execution method thereof are provided for a pipelined microprocessor that executes instructions out of order and retires them in order. The microprocessor comprises a branch predictor, a fetch unit, and an execution unit. The branch predictor predicts the target address of a branch instruction. The fetch unit fetches instructions at the predicted target address. The execution unit resolves the target address of the branch instruction and detects whether the predicted target address differs from the resolved target address. When the two differ, the execution unit determines whether there is an unretired instruction that is older in program order than the branch instruction and that must be corrected. If there is none, the execution unit flushes the instructions that were fetched at the predicted target address as a result of the misprediction and causes the fetch unit to fetch from the resolved target address, thereby executing the branch instruction. If there is one, the execution unit defers execution of the branch instruction.

Description

Microprocessor and execution method thereof

Technical field

The present invention relates to the field of out-of-order execution microprocessors, and in particular to the execution of branch instructions therein.

Background technology

Superscalar microprocessors have multiple execution units for executing the microprocessor's instruction set. A superscalar microprocessor improves performance through its multiple execution units, since it can execute multiple instructions concurrently in each clock cycle. The key to realizing this potential performance gain is keeping the execution units continuously supplied with instructions; otherwise the superscalar microprocessor performs no better than a scalar microprocessor while incurring a higher hardware cost. The execution units, for example, load and store instruction operands, calculate addresses, perform logical and arithmetic operations, and resolve branch instructions. The greater the number and variety of execution units, the farther into the program instruction stream the microprocessor must be able to look in each clock cycle to find instructions to issue to the execution units. This is commonly called the lookahead capability of the microprocessor.

One method for improving the lookahead capability of a microprocessor is to allow instructions to execute out of program order; such a microprocessor is commonly called an out-of-order execution microprocessor. Although instructions may execute out of order, most microprocessor architectures still require that instructions be retired in program order. That is, the architectural state of the microprocessor affected by an instruction's result may be updated only in program order.

Out-of-order execution, in-order retirement microprocessors typically have a considerable number of pipeline stages, a design approach sometimes called super-pipelining. One reason a microprocessor may have so many pipeline stages is this: if its instruction set architecture allows variable-length instructions, the front end of the pipeline typically needs a considerable number of stages to parse the stream of undifferentiated instruction bytes and to translate the parsed instructions into microinstructions.

Although it is well known in microprocessor design that a branch predictor benefits performance, it is equally well known that branch instructions degrade performance in a super-pipelined microprocessor. In particular, the more pipeline stages there are between the stage that fetches instructions according to the predicted branch target address supplied by the branch predictor and the stage that causes the fetcher to begin fetching at a resolved target address different from the predicted branch target address, the greater the penalty associated with a branch misprediction.

Therefore, what is needed is an efficient method for executing branch instructions in an out-of-order execution, in-order retirement microprocessor.

Summary of the invention

One embodiment of the present invention provides a pipelined out-of-order execution, in-order retirement microprocessor comprising a branch predictor, a fetch unit, and at least one execution unit. The branch predictor predicts a target address of a branch instruction. The fetch unit is coupled to the branch predictor and fetches instructions at the predicted target address. The execution unit is coupled to the fetch unit and is configured to: resolve the target address of the branch instruction and detect whether the predicted target address differs from the resolved target address; when the two differ, determine whether there is an unretired instruction that is older in program order than the branch instruction and that must be corrected; if there is no such instruction, flush the instructions that were fetched at the predicted target address as a result of the misprediction and cause the fetch unit to fetch from the resolved target address, thereby executing the branch instruction; and if there is such an instruction, defer execution of the branch instruction.

Another embodiment of the present invention provides a method for executing a branch instruction in a pipelined out-of-order execution, in-order retirement microprocessor, comprising: predicting that the branch instruction will be resolved to a first fetch path, and fetching instructions from the first fetch path according to the prediction; after the predicting and fetching, resolving the branch instruction to a second fetch path that differs from the first fetch path; determining whether there is an unretired instruction that is older in program order than the branch instruction and that must be corrected; if there is no such instruction, flushing the instructions that were fetched from the first fetch path as a result of the misprediction and fetching from the second fetch path instead, thereby executing the branch instruction; and if there is such an instruction, deferring execution of the branch instruction.

Description of drawings

Fig. 1 is a block diagram of a microprocessor according to the present invention;

Fig. 2 is a flowchart of the operation of the microprocessor of Fig. 1, illustrating its selective out-of-order execution of branch instructions.

[Description of main reference numerals]

100～microprocessor; 102～instruction cache;

124～instruction formatter; 126～formatted instruction queue;

104～instruction translator; 128～translated instruction queue;

106～register alias table; 108～reservation station;

166～load unit; 164～execution logic;

162～branch checking logic;

158～control logic; 152, 154, 156～registers;

112～execution unit; 114～retire unit;

118～branch predictor; 116～reorder buffer;

122～fetch unit; 172～adder;

170, 174～fetch addresses; 176～predicted target address; 138～replay instructions;

136～correct branch address; 134～branch correct signal;

178～instruction; 146～load miss tag;

144～integer replay tag;

142～branch mispredict tag;

132～flush signal.

Embodiment

Fig. 1 is a block diagram of a microprocessor 100 according to the present invention. The microprocessor 100 includes a pipeline of stages, or functional units, comprising a fetch unit 122, an instruction cache 102, an instruction formatter 124, a formatted instruction queue 126, an instruction translator 104, a translated instruction queue 128, a register alias table 106, a reservation station 108, an execution unit 112, and a retire unit 114. The microprocessor 100 also includes a branch predictor 118 coupled to the fetch unit 122, and a reorder buffer 116 coupled to the register alias table 106, the reservation station 108, the execution unit 112, and the retire unit 114.

The execution unit 112 includes a load unit 166, execution logic 164, and branch checking logic 162, each coupled to control logic 158. The execution unit 112 also includes registers 156, 154, and 152, each coupled to the control logic 158. Register 156 stores the reorder buffer (ROB) tag of the oldest missing load instruction; register 154 stores the ROB tag of the oldest replaying integer instruction; and register 152 stores the ROB tag of the oldest mispredicted branch instruction. The control logic 158 generates a correct branch address 136 to the fetch unit 122. The control logic 158 also generates a branch correct signal 134 to the fetch unit 122, the instruction formatter 124, the formatted instruction queue 126, the instruction translator 104, the translated instruction queue 128, and the register alias table 106. In one embodiment, all of the units of the execution unit 112 other than the load unit 166 are included in a single execution unit (namely, one of a plurality of integer units), and the load unit 166 is an execution unit 112 separate from the integer units.

In one embodiment, the microprocessor 100 is an x86 architecture microprocessor. A microprocessor is an x86 architecture microprocessor if it can correctly execute most of the application programs designed to run on an x86 microprocessor; an application program is correctly executed if its expected results are obtained. The microprocessor 100 executes instructions out of order and retires them in order. Thus, even though the execution units 112 may generate instruction results out of program order, the retire unit 114 waits to update the architectural state of the microprocessor 100 with an instruction's result until that instruction is the oldest completed instruction in the microprocessor 100.

The fetch unit 122 provides a fetch address 170 to the instruction cache 102 to specify the next address from which to fetch instructions. An adder 172 increments the fetch address 170 to generate the next sequential fetch address 174, which is provided to the fetch unit 122. The fetch unit 122 also receives a predicted target address 176 from the branch predictor 118 and a correct branch address 136 from the execution unit 112. As described below, the fetch unit 122 selects one of these addresses as the fetch address 170 provided to the instruction cache 102.

If the control logic 158 asserts the branch correct signal 134, the fetch unit 122 selects the correct branch address 136; otherwise, if the branch predictor 118 predicts that the branch is taken, the fetch unit 122 selects the predicted target address 176; otherwise, the fetch unit 122 selects the next sequential fetch address 174. The predicted branch address of a branch instruction is passed down the pipeline along with the branch instruction. If the branch predictor 118 predicts the branch not taken, the predicted branch address is the next sequential fetch address 174; if it predicts the branch taken, the predicted branch address is the predicted target address 176. The branch predictor 118 may mispredict a branch instruction, in which case the microprocessor 100 must correct the misprediction so that the correct instructions are fetched and executed. If the execution unit 112 subsequently corrects the branch (as discussed below), the predicted branch address becomes the correct branch address 136. The predicted branch address is provided to the execution unit 112 along with the branch instruction and other information associated with the instruction 178.
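The selection priority for the fetch address 170 described above can be sketched as a simple multiplexer. This is only an illustrative model: the function and parameter names are hypothetical and do not appear in the specification, and the hardware makes this selection with combinational logic rather than software.

```python
def select_fetch_address(branch_correct: bool,
                         correct_branch_address: int,
                         predicted_taken: bool,
                         predicted_target_address: int,
                         next_sequential_address: int) -> int:
    """Model of the fetch address 170 selection (names are illustrative).

    Priority, highest first:
      1. correct branch address 136, when branch correct signal 134 is asserted
      2. predicted target address 176, when the predictor predicts taken
      3. next sequential fetch address 174
    """
    if branch_correct:
        return correct_branch_address
    if predicted_taken:
        return predicted_target_address
    return next_sequential_address
```

A correction from the execution unit overrides any prediction made in the same cycle, which is why it is tested first.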

In the microprocessor 100, the case in which early correction of a mispredicted branch instruction is beneficial proceeds through the following steps.

Step 1: The execution unit 112 resolves the branch instruction. That is, the execution unit 112 receives the input operands needed to resolve the branch instruction and determines the branch direction and branch address from them. Specifically, the execution unit 112 examines the condition code specified by the branch instruction to determine whether it satisfies the branch condition specified by the branch instruction, so that the branch will be taken or not taken, and it also calculates the target address of the branch instruction from the source operands that the branch instruction specifies. After resolving the direction and target address, the execution unit 112 determines that the branch predictor 118 mispredicted the branch, whether because it predicted the wrong direction (taken or not taken), the wrong branch target address, or both. For simplicity, the following discussion assumes that the branch predictor 118 predicted path A and the fetch unit 122 fetched from path A, but path B is the correct path.
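The check made in step 1 can be sketched as a comparison between the predicted branch address carried with the instruction and the address obtained by resolving it. This is a minimal model under the assumption that a not-taken branch resolves to the next sequential address; the function names are hypothetical.

```python
def resolved_branch_address(taken: bool,
                            target_address: int,
                            next_sequential_address: int) -> int:
    """Address the branch really goes to once direction and target are known."""
    return target_address if taken else next_sequential_address


def is_mispredicted(predicted_branch_address: int,
                    taken: bool,
                    target_address: int,
                    next_sequential_address: int) -> bool:
    """True if the prediction carried down the pipeline differs from the
    resolved address. A wrong direction and a wrong target both show up
    as an address mismatch."""
    resolved = resolved_branch_address(taken, target_address,
                                       next_sequential_address)
    return predicted_branch_address != resolved
```

Note that the result is only as good as the input operands: if the condition code or the address operands are stale, this check reports a misprediction that did not actually occur.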

Step 2: Next, the execution unit 112 executes the branch instruction. That is, the execution unit 112: (1) tells the register alias table 106 to stop dispatching instructions; (2) flushes the front end of the pipeline; and (3) tells the fetch unit 122 to begin fetching at the correct path B, which is indicated by the correct branch address 136 that the execution unit 112 provides. The register alias table 106 is the last pipeline stage that receives instructions in program order. The front end of the pipeline is the portion before the register alias table 106. In many cases, the microprocessor 100 holds many instructions older than the mispredicted branch instruction, all of which must retire in program order before the mispredicted branch instruction becomes the oldest instruction and can itself retire. Thus, while the branch instruction is waiting to become the oldest instruction, the microprocessor 100 fetches and processes good instructions (instructions fetched from the correct path B) and fills the front end of the pipeline with them.

Step 3: Eventually, the retire unit 114 retires the branch instruction and flushes the back end of the pipeline, because the back end contains instructions that should not be executed (they were fetched from the wrong path A). The back end of the pipeline is the portion after the register alias table 106.

Step 4: The retire unit 114 tells the register alias table 106 to begin dispatching instructions again, namely the instructions that have been fetched from the correct path B and processed since the execution unit 112 executed the branch instruction in step 2.

Executing the branch instruction out of program order as described in step 2 (that is, correcting the misprediction early) is beneficial because it may allow the microprocessor 100 to begin fetching and processing instructions at the correct branch address 136 in the front end of the pipeline before the retire unit 114 is ready to retire the mispredicted branch instruction. In other words, instructions fetched from the correct path B can execute N clock cycles sooner than they would if the execution unit 112 did not execute the mispredicted branch instruction out of order and instead waited until the branch instruction was ready to retire. The maximum value of N is the number of clock cycles from the time the microprocessor 100 begins executing the branch instruction (that is, correcting the misprediction) until the first instruction from the correct path B reaches the register alias table 106. This is particularly beneficial in one embodiment, in which the branch penalty is as high as 17 clock cycles. Specifically, in one embodiment it takes 10 clock cycles from the time the fetch address 170 is redirected to the new branch path until the first instruction from the new path reaches the register alias table 106. In other words, early correction hides clock cycles of the branch penalty: the microprocessor 100 can hide the clock cycles that elapse between the start of the correction and the time the retire unit 114 is ready to retire the mispredicted branch instruction.
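The bound on N can be illustrated with a small calculation. The sketch below is one reading of the paragraph above, not a formula stated in the specification: it assumes the hidden cycles equal the time the branch waits before it is ready to retire, capped at the front-end refill time (10 clock cycles in the embodiment described). The function name and parameters are hypothetical.

```python
def hidden_penalty_cycles(cycles_until_ready_to_retire: int,
                          front_end_refill_cycles: int = 10) -> int:
    """Clock cycles of branch penalty hidden by early correction.

    cycles_until_ready_to_retire: cycles between the start of the early
        correction and the point at which the retire unit is ready to
        retire the mispredicted branch.
    front_end_refill_cycles: cycles from redirecting the fetch address
        until the first correct-path instruction reaches the register
        alias table (10 in the embodiment described; an assumption for
        any other design).
    """
    return min(cycles_until_ready_to_retire, front_end_refill_cycles)
```

For example, if the branch must wait 4 cycles behind older instructions, 4 cycles of refill are overlapped with that wait; if it must wait 25 cycles, the whole 10-cycle refill is hidden and waiting longer gains nothing more.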

However, executing/correcting a branch instruction out of order is not always beneficial. The following describes a case in which the microprocessor 100 executes/corrects a branch instruction out of order without benefit. Specifically, in this case the branch predictor 118 in fact predicted the branch instruction correctly, but the execution unit 112 resolved it incorrectly because it received incorrect input operands (for example, the condition code and/or the operands of the target address calculation). The execution unit 112 then mistakenly concludes that the branch predictor 118 mispredicted the branch (that is, it determines that the predicted branch direction and/or branch address does not match the resolved direction and/or address) and executes/corrects the branch. The execution unit 112 may receive incorrect input operands because the branch instruction depends, directly or indirectly through a chain of dependencies, on an older instruction for its condition code and/or source operands, and that older instruction did not provide correct input operands to the execution unit 112. For example, an older load instruction somewhere in the dependency chain missed in the data cache and supplied stale data. For simplicity, the following description of the non-beneficial case assumes that the branch predictor 118 correctly predicted path A and the fetch unit 122 fetched from path A. The steps are as follows:

Step 1: As in step 1 of the beneficial case above, except that the execution unit 112 resolves the branch instruction to path B.

Step 2: As in step 2 of the beneficial case above; that is, the execution unit 112 executes/corrects the branch instruction to path B.

Step 3: After the execute/correct of step 2, an instruction older than the branch instruction executed/corrected in steps 1 and 2 becomes the oldest instruction, and the retire unit 114 begins replaying that oldest instruction in the reorder buffer 116 and all newer instructions, including the executed/corrected branch instruction. Replaying means that the retire unit 114 flushes the back end of the pipeline and sends replay instructions 138 from the reorder buffer 116 to the reservation station 108; that is, it re-dispatches all valid instructions in the reorder buffer 116, in order, to the reservation station 108. (If the replayed older instruction is itself a mispredicted branch instruction, the front end of the pipeline is also flushed when the execution unit 112 executes/corrects that older branch instruction during the replay.)

Step 4: When the same branch instruction that was executed/corrected in step 2 is replayed, the execution unit 112 resolves it and determines that path A, not path B, is the correct path. This means that the instructions flushed from the front end of the pipeline in step 2 were in fact correctly fetched. Unfortunately, it also means that the microprocessor 100 must now correct the "correction" made in step 2. (Note that during the replay the "prediction" seen by the execution unit 112 is path B, the path to which the execution unit 112 corrected in step 2. This "prediction" of path B was not made by the branch predictor 118; rather, it was made by the execution unit 112 when it executed/corrected the branch instruction in step 2.)

Step 5: As in step 2 above, the execution unit 112 executes/corrects the branch according to the resolution made during the replay in step 4. This time, however, the execution unit 112 corrects to path A. The execute/correct of step 2 was therefore detrimental: it caused the microprocessor 100 to flush instructions that, before step 1, had already been fetched and were being processed along path A, which the branch predictor had correctly predicted, and those same instructions must now be re-fetched and re-processed in the front end of the pipeline.

To summarize the non-beneficial case: the branch predictor 118 initially predicts path A, and the fetch unit 122 fetches from path A. The execution unit 112 then resolves the branch to path B, which is in fact incorrect because the execution unit 112 received incorrect input operands, and executes/corrects in step 2, causing the fetch unit 122 to fetch from the (wrong) path B. However, the branch instruction is subsequently replayed (because an older instruction causes it to be), and during the replay the execution unit 112 resolves the branch to path A, the correct path. It does so because it receives the correct input operands for the branch instruction during the replay: the instructions in the dependency chain that did not provide correct results for the first resolution now produce correct results, which are supplied to the execution unit 112 to resolve the branch instruction. In other words, the condition code flags and/or target address calculation operands used in the replay differ from those used when the execution unit 112 first resolved the branch. The execution unit 112 therefore executes/corrects, causing the fetch unit 122 to fetch from path A.

To address this problem (keeping the advantage of the beneficial case while reducing the likelihood of the non-beneficial case), the microprocessor 100 normally executes mispredicted branches out of order; however, it also attempts to identify the most common situations in which a branch instruction is wrongly resolved as mispredicted (that is, the branch predictor 118 predicted correctly), and in those situations it does not execute/correct the resolved "mispredicted" branch out of order. More specifically, the microprocessor 100 attempts to identify the most common situations in which the branch instruction will have to be replayed and will be resolved as correctly predicted during the replay. In one embodiment, the most common situations arise when an instruction older than the branch instruction is going to be replayed, namely:

(1) an older branch instruction was resolved as mispredicted;

(2) an older load instruction missed; or

(3) an older integer instruction faulted.

By deferring the resolved "mispredicted" branch, the microprocessor 100 avoids re-fetching and re-processing the same instructions in the front end of the pipeline, since they were already fetched and processed when the branch was first predicted.
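The three conditions above reduce to a single test made when a branch is resolved as mispredicted: correct early only if no older instruction is pending correction. The sketch below is a simplified model, not the circuit of the embodiment. It assumes each instruction is identified by a monotonically increasing program-order sequence number (smaller means older) instead of a circular reorder buffer tag, and it uses `None` to mean that no such instruction is outstanding. All names are hypothetical.

```python
from typing import Optional


def has_older_pending(pending_seq: Optional[int], branch_seq: int) -> bool:
    """True if a pending instruction exists and is older than the branch."""
    return pending_seq is not None and pending_seq < branch_seq


def should_correct_early(branch_seq: int,
                         oldest_mispredicted_branch: Optional[int],
                         oldest_missing_load: Optional[int],
                         oldest_replaying_integer: Optional[int]) -> bool:
    """Decide whether to execute/correct a branch resolved as mispredicted
    right away (True) or to defer it (False).

    Defer when any older, unretired instruction needs correction, because
    that instruction will cause the branch to be flushed or replayed anyway.
    """
    pending = (oldest_mispredicted_branch,
               oldest_missing_load,
               oldest_replaying_integer)
    return not any(has_older_pending(p, branch_seq) for p in pending)
```

A pending instruction that is newer than the branch does not block early correction, since it cannot have supplied the branch with bad operands and will itself be flushed by the correction.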

Which situations to take into account is a design decision. It trades off the added complexity, reduced speed, increased power consumption, and cost of handling a given situation against the performance lost by not handling it, which is essentially a function of how frequently the situation occurs and of its average penalty in clock cycles.

Referring again to Fig. 1, the reorder buffer 116 is organized as a circular queue with entries that are allocated to each instruction dispatched by the register alias table 106 to the reservation station 108. Each entry in the reorder buffer 116 has an associated index ranging from 0 to n-1, where n is the number of entries in the reorder buffer 116. The register alias table 106 allocates reorder buffer 116 entries to instructions in program order. Consequently, the indices, or tags, of two instructions in the reorder buffer 116 can be compared to determine which is older in program order.
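Because the reorder buffer 116 is circular, a plain numeric comparison of two indices gives the wrong answer once allocation wraps past n-1. The specification does not say how the comparison handles wrap-around; the sketch below shows one common approach, under the assumption that the index of the oldest valid entry (called `head` here, a hypothetical name) is available.

```python
def rob_age(tag: int, head: int, n: int) -> int:
    """Distance of an entry from the oldest entry, in allocation order.

    tag:  reorder buffer index of the instruction, 0..n-1
    head: index of the oldest valid entry (assumed to be known)
    n:    number of reorder buffer entries
    """
    return (tag - head) % n


def is_older(tag_a: int, tag_b: int, head: int, n: int) -> bool:
    """True if the instruction with tag_a precedes tag_b in program order."""
    return rob_age(tag_a, head, n) < rob_age(tag_b, head, n)
```

With n = 8 and the oldest entry at index 6, tag 7 is older than tag 1 even though 7 > 1, because tag 1 was allocated after the index wrapped around.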

The microprocessor 100 performs speculative execution of load instructions. That is, the microprocessor 100 assumes that load instructions always hit in the data cache. The reservation station 108 therefore issues to the execution units 112 instructions that use load data as source operands without knowing whether the correct load data is available. Consequently, an instruction such as a branch instruction may receive incorrect data because an older instruction on which it directly or indirectly depends used incorrect load data. When the load unit 166 detects that a load instruction missed in the data cache and must be replayed, the load unit 166 outputs to the control logic 158 the load miss tag 146, which is the reorder buffer tag of the load instruction that missed. The control logic 158 compares the tag in register 156 (the tag of the oldest missing load instruction) with the load miss tag 146. If the load miss tag 146 is older, the control logic 158 updates register 156 with the load miss tag 146. In this way, the control logic 158 maintains the tag of the oldest missing load instruction in the microprocessor 100.

Similarly, when the execution logic 164 detects that an integer instruction must be replayed, it outputs to the control logic 158 the integer replay tag 144, which is the reorder buffer tag of that integer instruction. The control logic 158 compares the tag in register 154 (the tag of the oldest replaying integer instruction) with the integer replay tag 144. If the integer replay tag 144 is older, the control logic 158 updates register 154 with the integer replay tag 144. In this way, the control logic 158 maintains the tag of the oldest replaying integer instruction in the microprocessor 100.

Likewise, when a branch instruction is resolved and the branch checking logic 162 detects that it was mispredicted, the branch checking logic 162 outputs to the control logic 158 the branch mispredict tag 142, which is the reorder buffer tag of the mispredicted branch instruction. The control logic 158 compares the tag in register 152 (the tag of the oldest mispredicted branch instruction) with the branch mispredict tag 142. If the branch mispredict tag 142 is older, the control logic 158 updates register 152 with the branch mispredict tag 142. In this way, the control logic 158 maintains the tag of the oldest mispredicted branch instruction in the microprocessor 100.
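Registers 156, 154, and 152 all follow the same compare-and-update rule, which can be sketched with one small class instantiated three times. This is an illustrative model only. It assumes the index of the oldest valid reorder buffer entry (`head`, a hypothetical name) is available for the age comparison, and it omits how a register is invalidated after a flush or retirement, which the text above does not describe.

```python
from typing import Optional


class OldestTagRegister:
    """Holds the reorder buffer tag of the oldest instruction reported so far."""

    def __init__(self, rob_size: int) -> None:
        self.rob_size = rob_size
        self.tag: Optional[int] = None  # None models "no tag recorded yet"

    def _age(self, tag: int, head: int) -> int:
        # Distance from the oldest reorder buffer entry, modulo the size.
        return (tag - head) % self.rob_size

    def report(self, tag: int, head: int) -> None:
        """Keep the incoming tag only if it is older than the stored one."""
        if self.tag is None or self._age(tag, head) < self._age(self.tag, head):
            self.tag = tag


# One register per tracked event, mirroring registers 156, 154, and 152.
oldest_missing_load = OldestTagRegister(rob_size=8)
oldest_replaying_integer = OldestTagRegister(rob_size=8)
oldest_mispredicted_branch = OldestTagRegister(rob_size=8)
```

For instance, with the oldest entry at index 6, reporting tag 1 and then tag 7 leaves 7 in the register, because 7 was allocated before the index wrapped around to 1.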

Fig. 2 is a flowchart of the operation of the microprocessor 100 of Fig. 1 according to the present invention, illustrating how the microprocessor 100 selectively executes branch instructions out of order. Flow begins at step 202.

In step 202, execution unit 112 resolves a branch instruction and determines that it was mispredicted. Flow advances to decision step 204.

At decision step 204, execution unit 112 compares the branch misprediction tag 142 with the tag in register 152 (the tag of the oldest mispredicted branch instruction) to determine whether an older, unretired branch instruction was mispredicted and therefore needs to be corrected. If so, flow advances to step 206; otherwise, flow advances to decision step 208.

In step 206, execution unit 112 defers the out-of-order correction/execution of the newer branch instruction that was resolved as mispredicted in step 202. That newer branch instruction will never retire, because before it has a chance to become the oldest mispredicted branch instruction, the older mispredicted branch instruction will eventually cause it to be flushed. An advantage of the present invention is that, by deferring the out-of-order correction/execution of the branch instruction resolved as mispredicted in step 202, microprocessor 100 avoids the drawback of the inferior approach described above. In other words, if branch predictor 118 proves to have correctly predicted the path of a branch instruction, the instructions fetched along that correctly predicted path do not need to be fetched again and processed again by the front end of the pipeline. Flow ends at step 206.

At decision step 208, execution unit 112 compares the tag in register 156 (the tag of the oldest missed load instruction) with the load instruction miss tag 146 of the mispredicted branch, to determine whether a load instruction older than the mispredicted branch has missed. If so, flow advances to step 212; otherwise, flow advances to decision step 214.

In step 212, execution unit 112 defers the out-of-order correction/execution of the branch instruction that was resolved as mispredicted in step 202. That branch instruction will not retire, because before it has a chance to become the oldest instruction, the missed load instruction will, upon becoming the oldest instruction in the machine, cause the mispredicted branch instruction to be re-executed. An advantage of the present invention is that, by deferring the out-of-order correction/execution of the branch instruction resolved as mispredicted in step 202, microprocessor 100 avoids the drawback of the inferior approach described above. Flow ends at step 212.

At decision step 214, execution unit 112 compares the tag in register 154 (the tag of the oldest re-executed integer instruction) with the integer instruction re-execution tag 144 of the mispredicted branch, to determine whether an integer instruction that is older than the mispredicted branch instruction has been marked for re-execution. If so, flow advances to step 216; otherwise, flow advances to step 218.

In step 216, execution unit 112 defers the out-of-order correction/execution of the branch instruction that was resolved as mispredicted in step 202. That branch instruction will not retire, because before it has a chance to become the oldest instruction, the re-executed integer instruction will, upon becoming the oldest instruction in the machine, cause the mispredicted branch instruction to be re-executed. An advantage of the present invention is that, by deferring the out-of-order correction/execution of the branch instruction resolved as mispredicted in step 202, microprocessor 100 avoids the drawback of the inferior approach described above. Flow ends at step 216.
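Decision steps 204, 208, and 214, together with the deferral steps 206, 212, and 216, can be summarized by the following sketch. It is a simplified software model offered for illustration, not the disclosed circuit. The function names, the returned strings, and the circular age comparison relative to `head` (the entry of the oldest unretired instruction) are assumptions. A register value of `None` is taken to mean that no instruction of that kind is pending.

```python
from typing import Optional

ROB_SIZE = 48  # hypothetical number of reorder buffer entries


def age(tag: int, head: int) -> int:
    """Distance from the oldest unretired entry; smaller means older (assumed)."""
    return (tag - head) % ROB_SIZE


def older_pending(branch_tag: int, reg_tag: Optional[int], head: int) -> bool:
    """True if the register holds a tag strictly older than the branch."""
    return reg_tag is not None and age(reg_tag, head) < age(branch_tag, head)


def decide(branch_tag: int, reg152: Optional[int], reg156: Optional[int],
           reg154: Optional[int], head: int) -> str:
    """Follow Fig. 2 for a branch just resolved as mispredicted (step 202)."""
    if older_pending(branch_tag, reg152, head):  # step 204: older mispredicted branch
        return "defer (step 206)"
    if older_pending(branch_tag, reg156, head):  # step 208: older missed load
        return "defer (step 212)"
    if older_pending(branch_tag, reg154, head):  # step 214: older re-executed integer instruction
        return "defer (step 216)"
    return "correct out of order (step 218)"
```

The strict comparison matters in this model: when the branch itself is the oldest mispredicted branch, register 152 holds the branch's own tag, and the flow falls through to the remaining checks rather than deferring.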

In step 218, control logic unit 158 of execution unit 112 provides the correct branch address 136 to fetch unit 122. Control logic unit 158 also asserts branch correction signal 134, which causes fetch unit 122 to select the correct branch address 136 as the next fetch address 170 and causes the front end of the pipeline to correct for the branch instruction resolved as mispredicted in step 202. In other words, by asserting branch correction signal 134, control logic unit 158 causes the mispredicted branch instruction to be executed out of order, thereby realizing the advantages described above. Flow advances to step 222.

In step 222, register alias table 106 stops dispatching instructions in response to the asserted branch correction signal 134. Flow advances to step 224.

In step 224, the portion of the pipeline ahead of register alias table 106 flushes (or invalidates) all of its instructions in response to the asserted branch correction signal 134, and then fetches and processes instructions at the correct branch address 136. Flow advances to step 226.

In step 226, retirement unit 114 determines that the mispredicted branch instruction is now ready to retire (that is, the mispredicted branch instruction is the oldest instruction in the machine) and therefore asserts flush signal 132 to flush all instructions after register alias table 106; that is, retirement unit 114 flushes all instructions newer than the mispredicted branch instruction. The asserted flush signal 132 is also provided to register alias table 106 to notify it of this condition. Flow advances to step 228.

In step 228, register alias table 106 resumes dispatching instructions in response to the asserted flush signal 132. Flow ends at step 228.
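The two-phase correction of steps 218 through 228 can be sketched as follows. This is a simplified software model for illustration only. The class `Pipeline`, its attributes, and its method names are hypothetical, and instructions are represented by plain strings kept in program order. The front end (ahead of register alias table 106) is flushed as soon as the branch correction signal is asserted, while the back end (after register alias table 106) is flushed only once the mispredicted branch has become the oldest instruction.

```python
class Pipeline:
    """Toy model of the front end and back end around register alias table 106."""

    def __init__(self, front_end, back_end):
        self.front_end = list(front_end)  # instructions ahead of the RAT
        self.back_end = list(back_end)    # instructions after the RAT, oldest first
        self.fetch_address = None         # stands in for next fetch address 170
        self.dispatching = True           # whether the RAT is dispatching

    def branch_correct(self, correct_address):
        """Steps 218 to 224: redirect fetch, stop dispatch, flush the front end."""
        self.fetch_address = correct_address  # correct branch address 136
        self.dispatching = False              # step 222
        self.front_end.clear()                # step 224

    def retire_branch(self, branch):
        """Steps 226 to 228: flush the back end, then resume dispatch."""
        if not self.back_end or self.back_end[0] != branch:
            raise ValueError("branch is not yet the oldest instruction")
        self.back_end.clear()    # branch retires; newer instructions are flushed
        self.dispatching = True  # step 228


p = Pipeline(front_end=["wrong1", "wrong2"], back_end=["branch", "wrong0"])
p.branch_correct(0x400)
p.front_end.append("correct1")  # fetched at the correct address, waits at the RAT
p.retire_branch("branch")
```

In this model, the correct-path instruction fetched after `branch_correct` is already waiting in the front end when dispatch resumes, which is the benefit the out-of-order correction is intended to provide.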

Although the present invention has been disclosed above by way of several embodiments, these are presented only as examples and are not intended to limit the invention. Those skilled in the art will understand that various changes may be made to the invention without departing from its spirit. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods of the invention. This can be accomplished with various programming languages, for example general programming languages (such as C and C++), hardware description languages (HDL, such as Verilog HDL and VHDL), or other available programming languages. Such software can be stored on any known computer usable medium, for example semiconductor memory, magnetic disk, or optical disc (such as CD-ROM or DVD-ROM). The apparatus and methods of the invention may be included in any semiconductor intellectual property (IP) core, for example a microcontroller core embodied in HDL, and transformed to hardware when integrated circuits are produced. In addition, the invention may be implemented as a combination of hardware and software. Therefore, the invention should not be limited by any embodiment described herein, but should be defined according to the appended claims and their equivalents. Specifically, the invention may be implemented in a microprocessor device of a general-purpose computer. Finally, those skilled in the art will appreciate that they can use the concepts and specific embodiments disclosed herein as a basis for designing or modifying other structures that carry out the same purposes as the invention, without departing from the scope of the invention as defined by the claims.

Claims

1. A microprocessor for executing instructions out of order and retiring instructions in order, comprising:

a branch predictor, configured to predict a predicted target address of a branch instruction;

a fetch unit, coupled to the branch predictor, configured to fetch the branch instruction from the predicted target address; and

at least one execution unit, coupled to the fetch unit, configured to:

resolve a target address of the branch instruction, and detect whether the predicted target address differs from the resolved target address;

when the predicted target address differs from the resolved target address, determine whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected;

if there is no unretired instruction that is older in program order than the branch instruction and that needs to be corrected, flush a mispredicted branch instruction fetched from the predicted target address, and cause the fetch unit to fetch from the resolved target address, so as to execute the branch instruction; and

if there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected, temporarily defer the branch instruction.

2. The microprocessor of claim 1, wherein the execution unit determines whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected according to one of the following conditions:

the branch predictor has predicted a target address of an unretired branch instruction that is older in program order than the branch instruction, the fetch unit has fetched instructions from the predicted target address of the unretired branch instruction, and the execution unit has resolved the target address of the unretired branch instruction and detected that the resolved target address of the unretired branch instruction differs from the predicted target address of the unretired branch instruction;

whether an unretired load instruction that is older in program order than the branch instruction has missed in a cache memory of the microprocessor; and

whether an unretired integer instruction that is older in program order than the branch instruction needs to be re-executed.

3. The microprocessor of claim 1, further comprising a storage element configured to store a reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected, wherein the execution unit compares a reorder buffer tag of the branch instruction with the reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected, so as to determine whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected.

4. The microprocessor of claim 3, wherein the storage element stores, for each of a plurality of instruction types, the reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected, and wherein the execution unit compares, for each of the instruction types, the reorder buffer tag of the branch instruction with the reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected, so as to determine whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected.

5. The microprocessor of claim 4, wherein the plurality of instruction types comprises at least one of the following instruction types:

a first instruction type, indicating that a load instruction has missed in a cache memory of the microprocessor;

a second instruction type, indicating that the target address predicted for the branch instruction by the branch predictor differs from the target address resolved for the branch instruction by the execution unit; and

a third instruction type, indicating that an integer instruction needs to be re-executed.

6. The microprocessor of claim 1, further comprising:

a register alias table, configured to receive a plurality of program instructions in program order and to dispatch the program instructions to a plurality of the execution units of the microprocessor for out-of-order execution; and

a first plurality of pipeline stages, located before the register alias table, the first plurality of pipeline stages comprising the branch predictor and the fetch unit,

wherein the execution unit executes the branch instruction by providing the resolved target address to the fetch unit and asserting a signal, the register alias table stops dispatching instructions in response to the signal, the first plurality of pipeline stages flushes all instructions therein in response to the signal, and the fetch unit begins fetching instructions from the resolved target address in response to the signal.

7. The microprocessor of claim 6, further comprising:

a retirement unit, configured to retire the program instructions in program order; and

a second plurality of pipeline stages, located after the register alias table, comprising the plurality of execution units and the retirement unit,

wherein when the retirement unit determines that the branch instruction is the oldest unretired instruction in the microprocessor, the retirement unit flushes all program instructions in the second plurality of pipeline stages and, after all program instructions in the second plurality of pipeline stages have been flushed, causes the register alias table to begin dispatching program instructions to the plurality of execution units.

8. An execution method for executing a branch instruction in a pipelined microprocessor that executes out of order and retires in order, comprising:

predicting that the branch instruction will be resolved to a first fetch path, and fetching the branch instruction from the first fetch path according to the prediction;

after the predicting and fetching steps, resolving the branch instruction to a second fetch path, the second fetch path being different from the first fetch path;

determining whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected;

if there is no unretired instruction that is older in program order than the branch instruction and that needs to be corrected, flushing a mispredicted branch instruction fetched from the first fetch path, and instead fetching the branch instruction from the second fetch path, so as to execute the branch instruction; and

9. The execution method of claim 8, wherein the step of determining whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected comprises one of the following steps:

determining whether an unretired branch instruction that is older in program order than the branch instruction has been resolved to a fourth fetch path different from a third fetch path, the microprocessor having previously predicted the third fetch path and fetched the unretired branch instruction from the third fetch path;

determining whether an unretired load instruction that is older in program order than the branch instruction has missed in a cache memory of the microprocessor; and

determining whether an unretired integer instruction that is older in program order than the branch instruction needs to be re-executed.

10. The execution method of claim 8, further comprising:

storing a reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected, wherein the step of determining whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected comprises comparing a reorder buffer tag of the branch instruction with the reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected.

11. The execution method of claim 10, wherein the reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected is stored for each of a plurality of instruction types, and wherein the step of determining whether there is an unretired instruction that is older in program order than the branch instruction and that needs to be corrected comprises comparing, for each of the instruction types, the reorder buffer tag of the branch instruction with the reorder buffer tag of the oldest unretired instruction in program order that needs to be corrected.

12. The execution method of claim 11, wherein the plurality of instruction types comprises at least one of the following instruction types:

a second instruction type, indicating that the first fetch path predicted for the branch instruction differs from the resolved second fetch path; and

13. The execution method of claim 8, wherein the step of executing the branch instruction further comprises:

stopping the dispatch of instructions; and

flushing all instructions in the pipeline stages of the microprocessor that are located before the point at which instructions are dispatched.

14. The execution method of claim 13, further comprising:

after flushing all instructions in the pipeline stages of the microprocessor that are located before the point at which instructions are dispatched, determining whether the branch instruction is the oldest unretired instruction in the microprocessor;

when the branch instruction is the oldest unretired instruction in the microprocessor, flushing all instructions in the pipeline stages of the microprocessor that are located after the point at which instructions are dispatched; and

after all instructions in the pipeline stages of the microprocessor that are located after the point at which instructions are dispatched have been flushed, resuming the dispatch of instructions.