BACKGROUND OF THE INVENTION
This is a continuation of International Application No. PCT/US03/30741, filed Sep. 30, 2003, which relies for priority upon U.S. Provisional Application No. 60/414,656, filed Oct. 1, 2002, the contents of both of which are incorporated herein by reference in their entireties.
1. Field of the Invention
The invention generally relates to the field of plasma processing. More particularly, the invention relates to monitoring and analyzing process parameters in a plasma processing facility.
2. Background Information
Throughout the various stages of plasma processing, such as semiconductor or display manufacturing, critical process parameters may vary significantly. Processing conditions change over time, and even the slightest changes in critical process parameters can create undesirable results. Small changes can easily occur in the composition or pressure of an etch gas, or in the temperature of the process chamber or wafer. As such, plasma processing facilities require constant monitoring.
The measuring and monitoring of these process parameters at any given time permits valuable data to be accumulated and analyzed. Process control feedback may be used to adjust the process parameters or determine the viability of certain process materials. However, in many cases, changes of process data reflecting deterioration of processing characteristics cannot be detected by simply referring to the process data displayed. It is difficult to detect early stage abnormalities and characteristic deterioration of a process, and it may often be necessary to obtain prediction and pattern recognition from an Advanced Process Control (APC) system.
Computers are generally used to control, monitor, and analyze manufacturing processes, due to the various complexities that may occur in a semiconductor manufacturing plant from the reentrant wafer flows, critical processing steps, and maintenance of the processes. However, the software implemented with the data collection system is not applicable to a variety of plasma processes and does not optimize accumulation and analysis of the data and the resulting processing relationships.
SUMMARY OF THE INVENTION
In an embodiment of the present invention, a system and method that is broadly applicable to a variety of plasma processes is utilized to determine a relationship between process parameters and performance measurements. The interaction between harmonic and other measurements and process variables is determined and matched to the current plasma process to achieve desired process results.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features of the present invention are further described in the detailed description which follows, with reference to the drawings, and by way of a non-limiting exemplary embodiment of the present invention, wherein like reference numerals represent similar parts of the present invention throughout the several views and wherein:
FIG. 1 illustrates a plasma processing system in accordance with an embodiment of the present invention;
FIG. 2 illustrates a plasma processing system in accordance with another embodiment of the present invention;
FIG. 3 illustrates a plasma processing system in accordance with another embodiment of the present invention;
FIG. 4 illustrates a plasma processing system in accordance with another embodiment of the present invention;
FIG. 5 illustrates a plasma processing system in accordance with another embodiment of the present invention;
FIG. 6 illustrates a simplified diagram of an RF signal and harmonic measurements;
FIG. 7A illustrates a flow diagram for a response surface method in accordance with an embodiment of the present invention;
FIG. 7B illustrates an exemplary response surface model in accordance with an embodiment of the present invention;
FIG. 8 illustrates a flow diagram for a neural network model in accordance with an embodiment of the present invention;
FIG. 9 illustrates a neural network model in accordance with an embodiment of the present invention;
FIG. 10 illustrates an overall system in accordance with an embodiment of the present invention; and
FIG. 11 illustrates a neural network model in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The following detailed description of the present invention refers to the accompanying drawings that illustrate exemplary embodiments consistent with this invention. Other embodiments are possible and modifications may be made to the embodiments without departing from the spirit and scope of the invention. Therefore, the following detailed description is not meant to limit the invention. Rather the scope of the invention is defined by the appended claims.
Referring now more particularly to the drawings, FIG. 1 illustrates a plasma processing system 100 in accordance with the present invention. System 100 includes a processing chamber 102, a device 104 configured to measure and adjust at least one processing parameter, a device 106 configured to measure at least one process performance measurement, and a controller 108. The controller 108 is coupled to the processing chamber 102, the device 104 and the device 106.
The plasma processing system 100 of FIG. 1 utilizes, for example, a plasma for material processing and includes an etch chamber. In the alternative, the system 100 may include a photoresist coating chamber such as a photoresist spin coating system. The system may also include a photoresist patterning chamber such as an ultraviolet (UV) lithography system, or a dielectric coating chamber such as a spin-on-glass (SOG) or spin-on-dielectric (SOD) system. Additionally, the system 100 may include: a deposition chamber such as a chemical vapor deposition (CVD) or a physical vapor deposition (PVD) system, a rapid thermal processing (RTP) chamber, such as a RTP system for thermal annealing, or a batch diffusion furnace. Device 104 can adjust process parameters such as electrode spacing, backside helium (He) pressure, total flow rate, flow rate ratio, bottom electrode RF power, top electrode RF power, chuck temperature, or the like.
FIG. 2 illustrates plasma processing system 200 in greater detail in accordance with the present invention. The system 200 may include, as depicted in FIG. 1, a process chamber 102, a substrate holder 212 upon which a substrate 204 to be processed may be affixed, a gas injection system 206, and a vacuum pumping system 208. The substrate 204 may be, for example, a semiconductor substrate, a wafer, or a liquid crystal display (LCD). Process chamber 102 may be, for example, configured to facilitate the generation of plasma in a processing region 202 adjacent a surface of the substrate 204. The plasma may be formed by collisions between heated electrons and an ionizable gas within the processing chamber 102. The ionizable gas or a mixture of gases 210 may be introduced into the region 202 via the gas injection system 206. A control mechanism, which is not illustrated, may be used to throttle the vacuum pumping system 208. The plasma within the processing region 202 may be utilized to create materials specific to a pre-determined materials process, and to aid the deposition of material to substrate 204 or the removal of material from the exposed surfaces of substrate 204.
Substrate 204 may be, for example, transferred into and out of processing chamber 102 through a slot valve (not illustrated) and a chamber feed-through (not illustrated) via a robotic substrate transfer system. The substrate 204 may be received by substrate lift pins (not illustrated) that may be housed within the substrate holder 212 and mechanically translated by devices that are housed therein.
Substrate 204 may be, for example, affixed to the substrate holder 212 via an electrostatic clamping system 214. The substrate holder 212 may include a cooling system having a re-circulating coolant flow that receives heat from substrate holder 212 and transfers the heat to a heat exchanger (not illustrated). The cooling system may also include a device 216 configured to monitor the substrate 204 and/or the substrate holder 212 temperature. The device 216 may be, for example, a thermocouple such as a K-type thermocouple. Moreover, gas may be delivered to the back-side of the substrate 204 via a backside gas system 218 to improve the gas-gap thermal conductance between the substrate 204 and the substrate holder 212. The backside gas system may be utilized when the temperature control of the substrate 204 is required to be at an elevated or reduced temperature. For example, temperature control of the substrate 204 may be useful at temperatures in excess of the steady-state temperature achieved due to a balance of the heat flux delivered to the substrate 204 from the plasma and the heat flux removed from the substrate 204 by conduction to the substrate holder 212. Heating elements, such as resistive heating elements or thermo-electric heater/coolers, may also be included in the system.
Substrate holder 212 may, for example, further serve as an electrode through which RF power may be coupled to plasma in the processing region 202. As an example, substrate holder 212 may be electrically biased at a RF voltage via the transmission of RF power from the RF generator 220 through impedance match network 222 to substrate holder 212. The RF bias can serve to heat electrons and, thereby, form and maintain plasma. In this configuration, the system can operate as a reactive ion etch (RIE) reactor, wherein the chamber and upper gas injection electrode serve as ground surfaces. A typical frequency for the RF bias can range from 1 MHz to 100 MHz and can be 13.56 MHz. RF systems for plasma processing are well known to those in the art.
Alternatively, RF power may be applied to the substrate holder electrode at multiple frequencies. Furthermore, impedance match network 222 may serve to maximize the transfer of RF power to plasma in processing region 202 by minimizing the reflected power. Match network topologies (e.g. L-type, π-type, T-type, etc.) and automatic control methods are well known to those skilled in the art.
Process gas 210, of FIG. 2, may be, for example, introduced to processing region 202 through gas injection system 206. Process gas 210 may, for example, include a mixture of gases such as argon, CF4 and O2, or argon, C4F8 and O2 for oxide etch applications. Gas injection system 206 may include a showerhead, wherein process gas 210 may be supplied from a gas delivery system (not illustrated) to the processing region 202 through a gas injection plenum (not illustrated), a series of baffle plates (not illustrated) and a multi-orifice showerhead gas injection plate (not shown). Gas injection systems are well known to those skilled in the art.
Vacuum pump system 208 may, for example, include a turbo-molecular vacuum pump (TMP) capable of a pumping speed of 5000 liters per second (and greater) and a gate valve for throttling the chamber pressure. In conventional plasma processing devices utilized for dry plasma etch, a 1000 to 3000 liter per second TMP may be employed. TMPs are useful for low pressure processing, typically less than 50 mTorr. At higher pressures, the TMP pumping speed falls off dramatically. For high pressure processing, i.e., greater than 100 mTorr, a mechanical booster pump and dry roughing pump may be used. Furthermore, a device for monitoring chamber pressure 224 may be coupled to the processing chamber 102. The pressure measuring device 224 may be, for example, a Type 628B Baratron absolute capacitance manometer that is commercially available from MKS Instruments, Inc. (Andover, Mass.).
Plasma processing system 200 may further include a metrology tool 230 configured to measure performance measurements such as, in etch systems, an etch rate, an etch selectivity (i.e. ratio of etch rate of one material to etch rate of a second material), an etch uniformity, a feature profile angle, and a critical dimension. The metrology tool 230 may be either an in-situ or ex-situ device. In the case of an in-situ device, the metrology tool 230 may be, for example, a scatterometer, incorporating beam profile ellipsometry and beam profile reflectometry. The scatterometer may be positioned within a transfer chamber (not illustrated) to analyze substrates 204 that are transferred into and out of process chamber 102. In the case of an ex-situ device, the metrology tool 230 may be, for example, a scanning electron microscope (SEM), wherein substrates have been cleaved and features are illuminated to determine the performance parameters. The metrology tool 230 may be further coupled to controller 108 to provide controller 108 with spatially resolved measurements of the performance measurements.
Controller 108 may comprise a microprocessor, memory, and a digital I/O port that is capable of generating control voltages sufficient to communicate and activate inputs to plasma processing system 100, as well as monitor outputs from plasma processing system 100. Moreover, controller 108 is coupled to and exchanges information with RF generator 220, impedance match network 222, gas injection system 206, vacuum pump system 208, pressure measuring device 224, backside gas delivery system 218, substrate/substrate holder temperature measurement system 216, electrostatic clamping system 214, and metrology tool 230. A program stored in the memory may be utilized to activate the inputs to the aforementioned components of a plasma processing system 200 according to a stored process.
FIG. 3 illustrates a plasma processing system 300 in accordance with another embodiment of the present invention. As illustrated in FIG. 3, the plasma processing system 200 illustrated in FIG. 2 may, for example, further include a mechanically or electrically rotating dc magnetic field system 304 to potentially increase plasma density and/or improve plasma processing uniformity, in addition to those components described with reference to FIGS. 1 and 2. Moreover, controller 108 may be coupled to rotating magnetic field system 304 to regulate the speed of rotation and field strength. The design and implementation of a rotating magnetic field is well known to those skilled in the art.
FIG. 4 illustrates a plasma processing system 400 in accordance with another embodiment of the present invention. In FIG. 4, the plasma processing system 400 is similar to the system depicted in FIGS. 1 and 2; however the system 400 of FIG. 4 may further comprise an upper electrode 402 to which RF power may be coupled from a RF generator 404 via impedance match network 222. A typical frequency for the application of power to the upper electrode 402 can range from 0.1 MHz to 30 MHz and can be 2 MHz. Moreover, controller 108 may be coupled to RF generator 404 to control the application of RF power to upper electrode 402. The design and implementation of an upper electrode is well known to those skilled in the art.
FIG. 5 illustrates a plasma processing system 500 in accordance with another embodiment of the present invention. As illustrated in FIG. 5, the system of FIG. 1 may further include an inductive coil 510 to which RF power may be coupled from RF generator 520 via an impedance match network 530. RF power is inductively coupled from inductive coil 510 through a dielectric window (not illustrated) to plasma processing region 202. A typical frequency for the application of RF power to the inductive coil 510 may range from 10 MHz to 100 MHz and may be 13.56 MHz. Similarly, a typical frequency for the application of power to a chuck electrode may range from 0.1 MHz to 30 MHz and may be 13.56 MHz. Additionally, a slotted Faraday shield (not illustrated) may be employed to reduce capacitive coupling between the inductive coil 510 and plasma. Moreover, controller 108 may be coupled to RF generator 520 and impedance match network 530 to control the application of power to inductive coil 510. The design and implementation of an inductively coupled plasma (ICP) source is well known to those skilled in the art.
Alternatively, the plasma may be formed: using electron cyclotron resonance (ECR); from the launching of a Helicon wave; or from a propagating surface wave. Each of these plasma sources is well known to those skilled in the art.
In each of FIGS. 1-5, substrate 204 may be processed in process chamber 102 and performance measurements may be measured utilizing, for example, the metrology tool 230. The performance measurements may include, for example, an etch rate, a deposition rate, an etch selectivity (ratio of the rate at which a first material is etched to the rate at which a second material is etched), an etch critical dimension (e.g. length or width of feature), an etch feature anisotropy (e.g. etch feature sidewall profile), a film property (e.g., film stress, porosity, etc.), a plasma density (obtained, for example, from a Langmuir probe), an ion energy (obtained, for example, from an ion energy spectrum analyzer), a concentration of a chemical species (obtained, for example, from optical emission spectroscopy), a mask (e.g. photoresist) film thickness, a mask (e.g. photoresist) pattern critical dimension, a self-induced DC substrate bias VDC that is measurable using a voltage probe, a peak-to-peak RF substrate bias Vpp, and harmonic amplitudes of RF voltage or current. The non-linear characteristics of the plasma create harmonics that are also measured along with other performance parameters.
Processing parameters may include, for example, process pressure, Helium backside gas pressure, process gas (e.g. CF4, C4F8, O2, Ar, etc.) partial pressure, process gas flow rate, upper electrode RF power, lower electrode RF power, substrate (or chuck) temperature, electrode spacing, and size of focus ring. The process pressure may be adjusted and monitored during the process by changing, for example, the gate valve setting or the total process gas mass flow rate, in concert with a pressure measuring device 224. The forward and reflected RF power can be adjusted and monitored using commands to the RF generator 220, the match network 222, a dual directional coupler (not illustrated) and power meters (not illustrated). Process gas partial pressures may be adjusted and monitored using a mass flow controller to regulate the flow of process gas components. The (Helium) backside gas pressure may be adjusted and monitored using the backside gas delivery system 218, which includes a pressure regulator. In addition, the substrate temperature may be monitored using temperature monitoring system 216.
Alternatively, processing parameters may include a film material viscosity, a film material surface tension, an exposure time, and a depth of focus.
Referring more particularly to measurements of the processing and performance parameters of the processing system, FIG. 6 illustrates a simplified diagram 600 of a RF signal and harmonic components, w1-w5, and magnitude measurements, x1-x5 measured in the processing chamber. In FIG. 6, the fundamental frequency 602 is labeled along the x-axis as w. Each harmonic frequency is labeled along the x-axis as w1, w2, w3, w4, w5, etc.
In order to obtain optimal performance in the process system, a response surface analysis may be used. Response surface analysis is a mathematical or graphical representation of the connection between important independent variables, controlled factors, and a dependent variable. An independent variable is a factor that is, or conceivably could be, controlled. Examples include flow rate and temperature. The value of the dependent variable is the result of the settings of one or more independent variables. In an embodiment of the present invention, a response surface method is utilized to improve or find the optimal process settings of a plasma process, such that the performance measurements obtained by the device 106 of FIG. 1 are compared against the model measurements of the response surface analysis. The response surface method is designed to estimate interaction and quadratic effects to give an idea of the shape of the response surface. Response surface models may involve just main effects and interactions or they may have quadratic or cubic terms to account for curvature. In some circumstances, a response surface may be described with only main effects and interactions, while a complete description of the process behavior may require a quadratic or cubic representation. With the response surface method, selected important factors influencing the process are varied, measurements are made on the operating capabilities of the process, and this data is analyzed to indicate the ways in which the factors may be adjusted to improve performance of the process.
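For reference, a response surface model that includes main effects, two-factor interactions and quadratic terms is commonly written in the second-order form below. The symbols y, x_i, β and ε are notation assumed here for illustration; they do not appear in the description above.

```latex
y = \beta_0
  + \sum_{i=1}^{k} \beta_i x_i
  + \sum_{i<j} \beta_{ij} x_i x_j
  + \sum_{i=1}^{k} \beta_{ii} x_i^{2}
  + \varepsilon
```

In this sketch, y stands for the dependent variable (a performance measurement), x_1 through x_k are the coded levels of the k independent variables, the β coefficients are fitted from the experimental runs, and ε is the residual error. Dropping the quadratic sum leaves a model with only main effects and interactions.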
The response surface method of the present invention is based upon a design of experiment (DOE) using, for example, common processing parameters from a silicon etch process. TABLE 1 provides exemplary processing parameters that may be measured and adjusted by device 104. In TABLE 1, a valid range is formulated for each processing parameter. For example, the processing parameters may be adjusted to a high (+), medium (0), or low (−) level, based on a Box-Behnken design requiring three levels for each factor.
TABLE 1

| Processing Parameters (Factors) | Levels |
|---|---|
| Gap, electrode spacing | +, 0, − |
| He, backside He pressure | +, 0, − |
| P, process pressure | +, 0, − |
| Q, total flow rate | +, 0, − |
| % Q, flow rate ratio | +, 0, − |
| RFb, bottom electrode RF power | +, 0, − |
| RFt, top electrode RF power | +, 0, − |
| T, chuck temperature | +, 0, − |
This technique is based on systematically varying the levels of independent variables. Careful analysis of the data can provide invaluable information about how the input variables affect the response, which can result in significant improvements to products and processes.
A screening design may be used to determine a reasonable number of runs to develop a response surface based on the number of factors involved. The following is provided as an example of the determination of a reasonable number of runs: for a three-level, four-factor experiment, 81 runs (3^4 = 81) are needed to cover every possible combination of factor levels. Therefore, response surfaces based on more than four processing parameters use a partial combination of runs to determine the main interactions and response surface.
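The run count for a full factorial design can be checked with a short sketch. The function name `full_factorial` and the level symbols are illustrative choices rather than part of the described system; the second call simply applies the same count to the eight factors listed in TABLE 1.

```python
from itertools import product

# Three levels per factor: high (+), medium (0), low (-).
LEVELS = ("+", "0", "-")


def full_factorial(num_factors, levels=LEVELS):
    """Return every combination of levels for the given number of factors."""
    return list(product(levels, repeat=num_factors))


runs_four = full_factorial(4)    # three levels, four factors
runs_eight = full_factorial(8)   # three levels, eight factors (as in TABLE 1)

print(len(runs_four))   # 81
print(len(runs_eight))  # 6561
```

The jump from 81 to 6561 runs shows why a partial combination of runs is used once more than four processing parameters are involved.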
In one embodiment, a response surface for etch uniformity may be constructed such that data on uniformity, Vdc and power radiated at the first three RF harmonics (W1, W2, W3) are collected from an etch tool at 100 msec intervals. The 100 msec interval is chosen to correspond to typical servo response times for the system. Systems having a longer response time may have a longer sample interval; systems having a shorter servo response time may require a shorter sample interval. Each element of the data set is normalized by subtracting the mean and then dividing by the standard deviation of the measured parameter. The normalized value is represented by the following equation: (Vt−Vm)/Vsd, where Vt is the measurement at time t, Vm is the mean value for a 30 second run, and Vsd is the standard deviation of the measurement. For example, if the mean Vdc measurement over the total 30 second etch interval is 234 volts, and the standard deviation of the measurements is 6.5 volts, then the normalized value is (Vt−234)/6.5. A library of response surfaces may be built from this data. The surface for each run that produces an acceptable result, acceptable uniformity, for example, may be stored in the library for comparison to data taken during later processing runs.
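The normalization described above can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is intended. The sample reading of 247 volts is an illustrative value; the mean of 234 volts and standard deviation of 6.5 volts are taken from the example in the text.

```python
from statistics import mean, pstdev


def normalize_value(v_t, v_m, v_sd):
    """Return (Vt - Vm) / Vsd for one measurement."""
    return (v_t - v_m) / v_sd


def normalize_run(samples):
    """Normalize every sample of one run by that run's mean and standard deviation."""
    v_m = mean(samples)
    v_sd = pstdev(samples)  # population standard deviation (assumed estimator)
    if v_sd == 0:
        raise ValueError("standard deviation is zero; the samples are constant")
    return [normalize_value(v_t, v_m, v_sd) for v_t in samples]


# Mean 234 V and standard deviation 6.5 V, as in the text; 247 V is illustrative.
print(normalize_value(247.0, 234.0, 6.5))  # 2.0
```

A 30 second run sampled every 100 msec yields 300 samples per measured parameter, each of which would pass through `normalize_run`.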
FIG. 7A illustrates a flow diagram for the response surface method in accordance with an embodiment of the present invention. In FIG. 7A, the process begins at P702 and continues to P704 and P706. At P704 and P706, tool performance trends are measured. The performance measurements may include, for example, critical dimension measurement (CD), peak to peak RF voltage (Vpp), and self-developed DC offset (VDC), as well as n+1 RF harmonic measurements x to xn.
The process continues to P708. At P708, the performance and harmonic measurements of P704 and P706 are compared with a response surface model. This is accomplished first by determining the optimum response surface 722 having the best fit, for example, to the measured response 720 of FIG. 7B. Next, the difference between the measured response 720 and an optimum response surface 722 is determined. This difference is used to adjust the operating parameters to move the measured response 720 towards the optimum response surface 722.
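The comparison at P708 can be sketched as a nearest-match search over a library of stored surfaces, followed by a parameter difference. The least-squares distance, the dictionary layout, the surface names and all numeric values below are assumptions made for illustration; the text does not specify how the best fit is computed.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equally long response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_fit(measured, library):
    """Return the name of the stored surface closest to the measured response."""
    return min(library, key=lambda name: squared_distance(measured, library[name]["response"]))


def parameter_adjustment(current, target):
    """Return the change needed in each processing parameter to reach the target."""
    return {key: target[key] - current[key] for key in current}


# Hypothetical library of surfaces from runs that produced acceptable results.
library = {
    "surface_a": {"response": [0.0, 0.5, -0.2], "params": {"P": 40.0, "RFb": 1000.0}},
    "surface_b": {"response": [1.0, -0.5, 0.3], "params": {"P": 60.0, "RFb": 1200.0}},
}
measured = [0.1, 0.4, -0.1]            # normalized measured response (illustrative)
current = {"P": 45.0, "RFb": 950.0}    # parameters used for the measured run

name = best_fit(measured, library)
delta = parameter_adjustment(current, library[name]["params"])
print(name, delta)  # surface_a {'P': -5.0, 'RFb': 50.0}
```

Here `delta` plays the role of the difference that is used to move the measured response towards the optimum response surface.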
For example, a measured response 720 may be compared to a response surface model 722 by finding the surface model 722 exhibiting the best fit to the measured response surface 720. Then, the processing parameters are adjusted based upon a difference between the processing parameters corresponding to the best fit surface model 722 and the processing parameters used to create the measured response 720. The performance and harmonic measurement trends, for example, CD, Vpp, x, x1, . . . xn, may be used to determine the complex relationships between the performance measurements and the processing parameters in the following manner:
- x=f(gap, He, P, Q, % Q, RFb, RFt, T);
- x1=g(gap, He, P, Q, % Q, RFb, RFt, T);
- x2=h(gap, He, P, Q, % Q, RFb, RFt, T);
- xN=m(gap, He, P, Q, % Q, RFb, RFt, T);
- CD=p(gap, He, P, Q, % Q, RFb, RFt, T);
- Vpp=q(gap, He, P, Q, % Q, RFb, RFt, T); and
- VDC=r(gap, He, P, Q, % Q, RFb, RFt, T).
Thus, at P710, based upon the nonlinearity of the performance measurements to the processing parameters of the above-identified relationships, it is possible to predict the in-situ processing parameters to cause the measured response 720 to approach the surface model 722. More particularly, it is possible to determine what needs to be done to drive or converge the performance measurements to the optimum surface.
At P712, it is determined whether the current processing parameters of P710 meet the expectations of the etching tool as determined by the predicted processing parameters. The expectations of the etching tool are based upon the requirements of the current process, such as uniformity ±3% or 99.7% yield, or <1% shading damage, etc. If the processing parameters do not meet expectations, then the process continues to P714 where adjustments to the processing parameters may be made via an advanced process control (APC) system. The process returns to P704 upon the necessary adjustments.
If the processing parameters meet expectations, then the process returns to P704 to perform the next set of measurements.
Like the response surface method of FIG. 7A, a neural network method may be used with the system of the present invention to optimize the overall process performance. A neural network usually involves a large number of processors operating in parallel, each with its own small sphere of knowledge and access to data in its memory. A neural network is “trained” or fed large amounts of data and rules about data relationships. A model of a neural network 900 is illustrated in FIG. 9.
In FIG. 9, an input layer 904 receives values from inputs 902. The input of each neuron of the input layer 904 has a weight associated with it. Each neuron also has a threshold value. If the sum of all the weighted active inputs is greater than the threshold, then the neuron is active.
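The threshold rule described above can be written in a few lines. The function name and the example weights, inputs and threshold are illustrative values, not taken from FIG. 9.

```python
def neuron_is_active(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return weighted_sum > threshold


# Weighted sum 0.4 + 0.3 = 0.7 exceeds the threshold 0.5, so the neuron is active.
print(neuron_is_active([1, 0, 1], [0.4, 0.9, 0.3], 0.5))  # True
# Weighted sum 0.3 does not exceed the threshold 0.5, so the neuron is inactive.
print(neuron_is_active([0, 0, 1], [0.4, 0.9, 0.3], 0.5))  # False
```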
The nodes are interconnected in a network that can identify patterns in data as the nodes are exposed to the data. The network is able to “learn” from each exposure and this distinguishes neural networks from traditional computing programs that simply follow instructions in a fixed sequential order. As illustrated in FIG. 9, inputs 902 to the input layer 904 can include the RF input power RFIN, biasing voltage VDC, performance parameters P3-P5 including harmonic measurements x, x1, x2, . . . , xn, and an indication of the time slice in which the other inputs 902 are measured (ts20-ts28). Middle or hidden layers 906 are provided with a variable number of nodes. The hidden layers 906 perform most of the work of the network. An output layer 908 may have multiple nodes and, as illustrated in FIG. 9, may produce outputs such as Process OK and RFapplied. Each node in the hidden layers 906 is fully connected to the outputs of the input layer 904. The information learned in the hidden nodes is based upon all the inputs taken together, and permits the network to learn the interdependencies in the model. A weighted sum is performed on the inputs for each hidden node and each output node. Each summation is transformed using a nonlinear function before the value is passed to the next hidden layer or to the output layer 908.
In establishing a neural network to model the process, one may want to determine which process parameters affect performance measurements. Multiple performance measurements may be collected. Data on uniformity, Vdc and power radiated at the first three RF harmonics (W1, W2, W3), for example, are collected from an etch tool at 100 msec intervals. The 100 msec interval is chosen to correspond to typical servo response times for the system. Systems having a longer response time may have a longer sample interval, while systems having a shorter servo response time may require a shorter sample interval. Each element of the data set is normalized by subtracting the mean and then dividing by the standard deviation of the measured parameter. The normalized value is determined by the following equation: (Vt−Vm)/Vsd, where Vt is the measurement at time t, Vm is the mean value for a 30 second run, and Vsd is the standard deviation of the measurement. For example, if the mean Vdc measurement over the total 30 second etch interval is 234 volts, and the standard deviation of the measurements is 6.5 volts, then the normalized value is (Vt−234)/6.5. After a statistically significant number of process runs, for example 100, the data are analyzed by computing the eigensolution of the covariance matrix for corresponding time slices of each process run. An example of the computation of the eigensolution of the covariance matrix, using actual data, is provided by the following:
- Compute and sort eigenvalues:
In this example it is evident that the 3rd, 4th and 5th harmonics are not related to the measurement of interest and will therefore not be considered for this time slice. Each time slice is examined and all relevant harmonics are selected for generation of the response surface. Further time slice analysis of the example data indicates that the 2nd and 3rd harmonics are of interest and the 4th and 5th harmonics may be discarded.
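As a rough sketch of the eigensolution step, the code below builds a covariance matrix from one time slice across several runs and computes its sorted eigenvalues and eigenvectors with cyclic Jacobi rotations. The Jacobi method is a choice made here for a dependency-free example, not a method named in the text, and the data values and column meanings are hypothetical rather than the data referred to above.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns; each row is one process run."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]


def jacobi_eigen(matrix, max_sweeps=50, tol=1e-18):
    """Return (eigenvalue, eigenvector) pairs of a symmetric matrix, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(a[i][i], [v[k][i] for k in range(n)]) for i in range(n)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Hypothetical normalized values for one time slice; columns: Vdc, W1, W2.
runs = [
    [1.2, 1.0, 0.1],
    [-0.8, -0.9, -0.2],
    [0.3, 0.4, 0.3],
    [-1.1, -1.0, 0.0],
    [0.4, 0.5, -0.2],
]
cov = covariance_matrix(runs)
eigen_pairs = jacobi_eigen(cov)
for value, vector in eigen_pairs:
    print(round(value, 4), [round(component, 3) for component in vector])
```

Variables that carry little weight in the eigenvectors associated with the largest eigenvalues are candidates for exclusion, which is one way to read the selection of harmonics described above.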
During the “training” of the network, the network is repeatedly shown observations from available data related to problems that need to be solved. For example, a Back Propagation (BP) network learns by example, that is, a learning set is provided that consists of some input examples and the known-correct output for each case.
The BP learning process works in small iterative steps: such that one of the input example cases is applied to the network, and the network produces output based on the current state of it's synaptic weights (initially, the output will be random). This output is compared to the known-good output, and a mean-squared error signal is calculated. The error value is then propagated backwards through the network, and small changes are made to the weights in each layer. The weight changes are calculated to reduce the error signal for the example case in question. The whole process is repeated for each of the example cases, then back to the first case again, and so on. The cycle is repeated until the overall error value drops below some pre-determined threshold. At this point the network has learned the problem, noting that the network will never exactly learn the ideal function, but rather will asymptotically approach the ideal function.
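The iterative cycle described above can be sketched for a small (2,2,2) network. This is a minimal illustration under stated assumptions: sigmoid activations, no bias terms, a sum-of-squares error, and hypothetical starting weights, training examples, learning rate and stopping threshold, none of which come from the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_ih, w_ho):
    """Return (hidden outputs, output-layer outputs) for one input case."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_ih]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_ho]
    return hidden, outputs


def train_step(inputs, desired, w_ih, w_ho, rate):
    """Apply one example, propagate the error backwards, and update the weights in place."""
    hidden, outputs = forward(inputs, w_ih, w_ho)
    error = sum((d - o) ** 2 for d, o in zip(desired, outputs))
    # Gradient of the error with respect to each node's weighted sum.
    delta_o = [-2.0 * (d - o) * o * (1.0 - o) for d, o in zip(desired, outputs)]
    delta_h = [h * (1.0 - h) * sum(delta_o[k] * w_ho[k][j] for k in range(len(w_ho)))
               for j, h in enumerate(hidden)]
    for k, row in enumerate(w_ho):
        for j in range(len(row)):
            row[j] -= rate * delta_o[k] * hidden[j]
    for j, row in enumerate(w_ih):
        for i in range(len(row)):
            row[i] -= rate * delta_h[j] * inputs[i]
    return error  # error measured before this update


def train(examples, w_ih, w_ho, rate=0.5, threshold=0.001, max_epochs=20000):
    """Cycle through the examples until the summed error drops below the threshold."""
    total = float("inf")
    for epoch in range(1, max_epochs + 1):
        total = sum(train_step(x, d, w_ih, w_ho, rate) for x, d in examples)
        if total < threshold:
            break
    return epoch, total


# Hypothetical learning set and starting weights (w[j][i]: from node i to node j).
examples = [([0.0, 1.0], [0.9, 0.1]), ([1.0, 0.0], [0.1, 0.9])]
w_ih = [[0.15, -0.20], [0.25, 0.30]]
w_ho = [[0.40, -0.45], [-0.50, 0.55]]
initial_error = sum(
    sum((d - o) ** 2 for d, o in zip(desired, forward(x, w_ih, w_ho)[1]))
    for x, desired in examples
)
epochs, final_error = train(examples, w_ih, w_ho)
print(epochs, initial_error, final_error)
```

The weights are updated after every example case rather than once per pass, which matches the case-by-case cycle in the description; a batch update would be an equally valid design choice.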
FIG. 11 illustrates an example of the modification of the neuron connection weights in accordance with the present invention. In FIG. 11, (I1, I2), (H1, H2), and (O1, O2) designate the inputs, the hidden-layer outputs 1104, and the output-layer outputs, respectively, of a (2,2,2) Back Propagation network. The outputs of Hidden Node 1 and 2 are given by
The output-layer outputs are given by
or, using (1) and (2),
Based upon the above equations, it is possible to calculate an output given a particular set of inputs. This allows the Mean Squared Error (MSE) to be calculated between the actual output and the desired output for the given input in this training example. The MSE is the average of the squares of the differences between the desired outputs and the current results. Since we are interested in the shape of the error curve rather than the precise MSE function, there is no need to divide by the number of outputs, and the minimization algorithm will still find the correct minimum. Thus, the error function can be formally written as
or, using (6) and (7),
where Dk is the kth desired output.
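The forward calculation and error function described above can be sketched for a (2,2,2) network as follows. This is an illustrative sketch under stated assumptions: the weight layout (`w_hidden[j][i]` connects input i to hidden node j, `w_output[k][j]` connects hidden node j to output k), the absence of bias terms, and all function names are introduced here and are not taken from the equations of the specification.

```python
import math


def sgm(x):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_output):
    """Return (hidden-layer outputs, output-layer outputs)."""
    # Each hidden node applies the sigmoid to its weighted sum of inputs.
    hidden = [sgm(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    # Each output node applies the sigmoid to its weighted sum of hidden outputs.
    outputs = [sgm(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, outputs


def squared_error(desired, outputs):
    """Sum of squared differences (the MSE without the division by N)."""
    return sum((d - o) ** 2 for d, o in zip(desired, outputs))


# Hypothetical weights and inputs, chosen only for illustration.
w_hidden = [[0.1, 0.8], [0.4, 0.6]]
w_output = [[0.3, 0.9], [0.5, -0.2]]
hidden, outputs = forward([0.35, 0.9], w_hidden, w_output)
error = squared_error([0.9, 0.1], outputs)
```

For fixed inputs and desired outputs, `error` depends only on the weights, which is what allows it to be treated as a surface over the weight space.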
As an example, suppose the actual outputs are 0.75 and 0.05 and the desired outputs are 0.9 and 0.1. The (true) MSE is then ((0.9−0.75)^2+(0.1−0.05)^2)/2, which is equal to 0.0125 (Note: in the BPN algorithm we wouldn't need to divide by N). Clearly, for any given training example, this value is a function only of the weights of the network, and so defines an error surface over the weights. So, to reduce the error, we can try to move to the lowest point on this surface. To find this point, it is necessary to calculate the gradient of the error function with respect to each network weight. One may then move each weight slightly in the opposite direction to the gradient; if the surface is sloping upwards in a particular direction, the weights may be adjusted so that the point on the error surface moves downwards.
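The arithmetic of this example can be checked with a few lines of Python. The variable names are illustrative only.

```python
desired = [0.9, 0.1]
actual = [0.75, 0.05]

# Squared difference for each output node.
squared = [(d - a) ** 2 for d, a in zip(desired, actual)]

# Sum of squares: the quantity minimized when the division by N is omitted.
sum_squared = sum(squared)

# True mean squared error over the N = 2 outputs.
mse = sum_squared / len(squared)

print(round(mse, 4))  # 0.0125
```

Dividing by N only rescales the error surface by a constant factor, so the location of its minimum is unchanged.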
The gradient is fairly straightforward to calculate, due to the convenient fact that the derivative of the sigmoid function can be expressed in terms of the function itself: sgm′(x) = sgm(x)(1−sgm(x)).
The gradient is defined as the vector of partial derivatives of the multivariate function with respect to each variable. Because the error is a function of the network outputs, it is first necessary to calculate a set of partial derivatives for each output node with respect to each associated connection weight. This turns out to be trivial, since all variables other than the one of interest are held constant when the partial derivative is calculated. Thus, only one linear term remains in the calculation of the partial derivative of the output, leaving only its coefficient, and the equation is:
Now, the gradient of the error function can be calculated
The expression −2(Dn−On)((1−sgm(So))sgm(So)) is denoted δ_n^o, that is, δ with subscript n and superscript o.
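For reference, one way the output-layer gradient can be written out is sketched below. It assumes the unnormalized error $E=\sum_k (D_k-O_k)^2$ and sigmoid output nodes; the symbols $w_{jn}$ (weight from hidden node $j$ to output node $n$) and $S_n^{o}$ (weighted input sum of output node $n$) are notation introduced here for illustration and may differ from the numbered equations of the specification.

```latex
\begin{align*}
O_n &= \operatorname{sgm}\!\left(S_n^{o}\right),
\qquad S_n^{o} = \sum_j w_{jn} H_j \\
\frac{\partial E}{\partial w_{jn}}
  &= -2\,(D_n - O_n)\,
     \bigl(1-\operatorname{sgm}(S_n^{o})\bigr)\,\operatorname{sgm}(S_n^{o})\,H_j \\
  &= \delta_n^{o}\,H_j
\end{align*}
```

Only output node $n$ depends on $w_{jn}$, which is why a single term of the sum over $k$ survives the differentiation.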
The new values for the network weights are calculated by multiplying the negative gradient by a step size parameter (called the learning rate) and adding the resultant vector to the vector of network weights attached to the current layer. This change does not take place, however, until the hidden-layer weight changes have been calculated as well, since changing the output-layer weights earlier would corrupt the weight-update procedure for the hidden layer.
Clearly, the error at the output will be affected by the weights at the hidden layer, too. However, the relationship is more complicated. A new gradient is derived, but this time the output weights are treated as constants rather than the hidden-layer weights. Now, the actual output is a function of the weights attached to the hidden layer only (and in a generic network there are L×M of those, for L input nodes and M middle-layer nodes). This relationship may be expressed as:
The hidden-layer weights of FIG. 11 may be updated using the same procedure as for the output layer, together with the output-layer weights. This completes the training cycle for one piece of training data in the neural network.
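One complete training cycle for a (2,2,2) network, as described above, can be sketched as follows. This is a hedged illustration rather than the specification's exact formulation: it assumes sigmoid nodes without bias terms, the unnormalized error Σ(Dk−Ok)^2, the weight layout `w_hidden[j][i]` and `w_output[k][j]`, and a learning rate of 0.5, all of which are choices made here for the example.

```python
import math


def sgm(x):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def train_step(inputs, desired, w_hidden, w_output, rate=0.5):
    """Return (new hidden weights, new output weights, error before update)."""
    # Forward pass.
    hidden = [sgm(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    outputs = [sgm(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    error = sum((d - o) ** 2 for d, o in zip(desired, outputs))

    # Output-layer term: -2 (Dn - On) (1 - On) On, where On = sgm(So).
    delta_out = [-2.0 * (d - o) * (1.0 - o) * o for d, o in zip(desired, outputs)]

    # Hidden-layer term, computed with the output weights held constant
    # (the old output weights are used, not the updated ones).
    delta_hid = [
        (1.0 - h) * h * sum(delta_out[k] * w_output[k][j]
                            for k in range(len(w_output)))
        for j, h in enumerate(hidden)
    ]

    # Both sets of changes are now known, so all weights can be updated:
    # new weight = old weight + learning rate * negative gradient.
    new_w_output = [[w - rate * delta_out[k] * hidden[j]
                     for j, w in enumerate(row)]
                    for k, row in enumerate(w_output)]
    new_w_hidden = [[w - rate * delta_hid[j] * inputs[i]
                     for i, w in enumerate(row)]
                    for j, row in enumerate(w_hidden)]
    return new_w_hidden, new_w_output, error


# Hypothetical starting weights and a single training example.
w_hidden = [[0.1, 0.8], [0.4, 0.6]]
w_output = [[0.3, 0.9], [0.5, -0.2]]
errors = []
for _ in range(2000):
    w_hidden, w_output, err = train_step([0.35, 0.9], [0.9, 0.1],
                                         w_hidden, w_output)
    errors.append(err)
```

In practice the loop would cycle through every case in the learning set and stop once the overall error falls below a chosen threshold, rather than running for a fixed number of iterations on one example.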
It should be noted that the input layer of FIG. 11 is really only a buffer to hold the input vector. Therefore, it has no weights which need to be modified. However, in a more generic network, one may have more than one hidden layer. Again, the update procedure is quite similar. Once the modifications have been calculated, all weights (hidden and output) may be updated.
Also, the above description assumes a (2, 2, 2) Back Propagation network. The only difference in the mathematics resulting from a larger network is longer summations. All of the principles would remain the same.
FIG. 8 illustrates a flow diagram demonstrating the data collection and APC modifications of a plasma process based on the neural network model in the run mode. The process begins at P802 and continues to P804 and P806.
At P804, multiple performance measurements are collected. These performance measurements may include CD, Vpp, and VDC. At P806, a determined number of RF harmonic measurements are taken on a processing tool, which may be any semiconductor manufacturing tool, including, without limitation, an etch tool of a plasma process, a photo resistive tool or a patterning tool. These measurements may be used as inputs in the input layer 904.
At P808, the performance measurements of P804 and the n harmonic measurements of P806 are placed in the input layer 904 of the neural network model illustrated in FIG. 9.
At P810, the neural network 900 is used to predict whether the performance measurements will result in a successful process or not as determined by outputs 910. The process continues to P812.
At P812, it is determined whether the predicted results of the processing tool are acceptable or not. If the predicted result is not acceptable, then a change is needed. The process continues to P814.
At P814, adjustments are made to the process parameters through an advanced process control (APC). The process continues to P816.
If the predicted results of the processing tool are acceptable, no change is needed. The process continues to P816.
At P816, it is determined whether the time window has expired. The time window may extend for an entire run or for a predetermined time period. If the time window has expired, then the process continues to P818.
At P818, the neural network model weights are updated based on the newly collected data and the process continues back to P804. As discussed above, the new values for the network weights are calculated by multiplying the negative gradient by a step size parameter (i.e., the learning rate) and adding the resultant vector to the vector of network weights attached to the current layer.
If the time window has not expired, then the process returns to P804.
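The run-mode flow of FIG. 8 can be summarized as a control loop. The sketch below is illustrative only: the function and parameter names are hypothetical, each step is passed in as a callable so that no particular tool interface or network implementation is assumed, and the function returns after one time window instead of looping back to P804 indefinitely.

```python
def run_mode_window(collect, predict, acceptable, adjust,
                    update_weights, window_expired):
    """Run steps P804 to P818 for one time window; return a log of steps."""
    log = []
    samples = []
    while True:
        # P804/P806: collect performance and RF harmonic measurements.
        measurements = collect()
        samples.append(measurements)

        # P808/P810: feed the measurements to the network and predict.
        prediction = predict(measurements)
        log.append("predict")

        # P812/P814: adjust process parameters only if the prediction
        # is not acceptable.
        if not acceptable(prediction):
            adjust(prediction)
            log.append("adjust")

        # P816/P818: once the window expires, update the model weights
        # using the data collected during the window.
        if window_expired():
            update_weights(samples)
            log.append("update")
            return log


# Usage with stand-in callables (hypothetical values and threshold).
readings = iter([0.2, 0.9, 0.4])
ticks = iter([False, False, True])
adjusted, updated = [], []
log = run_mode_window(
    collect=lambda: next(readings),
    predict=lambda m: m,
    acceptable=lambda p: p < 0.5,
    adjust=adjusted.append,
    update_weights=updated.append,
    window_expired=lambda: next(ticks),
)
```

Passing the steps in as callables keeps the loop independent of how measurements are acquired or how the APC applies adjustments.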
FIG. 10 illustrates a system in accordance with the present invention. System 1000 includes a processing tool 1002 that is connected to an advanced process control (APC) server 1004. Data from performance measurements 1010 of the processing tool may be sent directly to a data collection hub 1008 located within the APC server. The performance measurements data from the data collection hub 1008 may be sent to a module 1006 that utilizes a response surface method (RSM) or neural network (NN) method, as disclosed in FIGS. 7A and 8. A control computer 1012 collects tool events, and controls recipe setpoints and maintenance counters, based upon suggestions from the APC server 1004.
The foregoing description of the embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible consistent with the above teachings or may be acquired from practice of the invention. For example, the various features of the invention, which are described in the contexts of separate embodiments for the purposes of clarity, may also be combined in a single embodiment. Conversely, the various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination. Accordingly, persons skilled in the art will appreciate that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the attached claims and their equivalents.