US 20050199341 A1

Abstract

A method and system for analyzing multivariate data of plasma processes in which response surface and neural networks are utilized to improve or find optimal process settings of the plasma process such that performance measurements are compared against model measurements to modify a current plasma process to achieve optimized processing performance.
Claims(13)

1. A system for monitoring and analyzing multivariate plasma processing data, comprising:
a processing chamber configured to facilitate generation of plasma in a processing region;
a first mechanism configured to measure at least one processing parameter;
a second mechanism configured to measure at least one performance parameter;
a device configured to analyze data from the first mechanism and the second mechanism based upon a mathematical model to improve performance of the system; and
a device configured to adjust the at least one processing parameter based upon performance parameters applied to the mathematical model.

2. The system of

3. The system of
a substrate positioned for processing;
a gas injection mechanism configured to introduce process gas into the processing region; and
a vacuum pumping mechanism, wherein the at least one performance parameters includes a characteristic of the process gas and pressure in the processing chamber.

4. The system of

5. The system of

6. The system of
an RF generator configured to couple energy to the processing chamber to generate plasma in the processing region;
a gas injection mechanism coupled to the processing chamber; and
a pressure measuring device coupled to the processing chamber and configured to measure pressure in the processing chamber, wherein the at least one processing parameters includes characteristics of process gas introduced to the processing region by gas injection mechanism.

7. The system of

8. The system of

9. A method for monitoring and analyzing multivariate data from a plasma process, comprising:
measuring performance parameters of the plasma process;
measuring a number of harmonic parameters of the plasma process;
determining a mathematical model of the plasma process;
applying the measured performance parameters and harmonic parameters to the mathematical model of the plasma process;
predicting process parameters based upon the performance parameters, harmonic parameters, and the applied mathematical model; and
adjusting the processing parameters in response to the applied mathematical model.

10. The method of

11. The method of

12. The method of

13. The method of

Description

This is a continuation of International Application No. PCT/US03/30741, filed Sep. 30, 2003, which relies for priority upon U.S. Provisional Application No. 60/414,656, filed Oct. 1, 2002, the contents of both of which are incorporated herein by reference in their entireties.

1. Field of the Invention

The invention generally relates to the field of plasma processing. More particularly, the invention relates to monitoring and analyzing process parameters in a plasma processing facility.

2. Background Information

Throughout the various stages of plasma processing, such as semiconductor or display manufacturing, etc., critical process parameters may vary significantly. Processing conditions change over time with the slightest changes in critical process parameters creating undesirable results. Small changes can easily occur in the composition or pressure of an etch gas, process chamber, or wafer temperature. As such, plasma processing facilities require constant monitoring.

The measuring and monitoring of these process parameters at any given time permits valuable data to be accumulated and analyzed. Process control feedback may be used to adjust the process parameters or determine the viability of certain process materials.
However, in many cases, changes in process data that reflect deterioration of processing characteristics cannot be detected by simply referring to the process data displayed. It is difficult to detect early-stage abnormalities and characteristic deterioration of a process, and it may often be necessary to obtain prediction and pattern recognition by an Advanced Process Control (APC).

Computers are generally used to control, monitor, and analyze manufacturing processes, due to the various complexities that may occur in a semiconductor manufacturing plant from the reentrant wafer flows, critical processing steps, and maintenance of the processes. However, the software implemented with the data collection system is not applicable to a variety of plasma processes and does not optimize accumulation and analysis of the data and the resulting processing relationships.

In an embodiment of the present invention, a system and method that is broadly applicable to a variety of plasma processes is utilized to determine a relationship between process parameters and performance measurements. The interaction between harmonic and other measurements and process variables is determined and matched to the current plasma process to achieve desired process results.

The above and other features of the present invention are further described in the detailed description which follows, with reference to the drawings, and by way of a non-limiting exemplary embodiment of the present invention, wherein like reference numerals represent similar parts of the present invention throughout the several views and wherein:

The following detailed description of the present invention refers to the accompanying drawings that illustrate exemplary embodiments consistent with this invention. Other embodiments are possible and modifications may be made to the embodiments without departing from the spirit and scope of the invention. Therefore, the following detailed description is not meant to limit the invention.
Rather the scope of the invention is defined by the appended claims.

Referring now more particularly to the drawings, The plasma processing system Substrate Substrate Substrate holder Alternatively, RF power may be applied to the substrate holder electrode at multiple frequencies. Furthermore, impedance match network Process gas Vacuum pump system Plasma processing system Controller Alternatively, the plasma may be formed: using electron cyclotron resonance (ECR); from the launching of a Helicon wave; or from a propagating surface wave. Each of these plasma sources is well known to those skilled in the art.

In each of the Processing parameters may include, for example, process pressure, Helium backside gas pressure, process gas (e.g. CF Alternatively, processing parameters may include a film material viscosity, a film material surface tension, an exposure time, and a depth of focus.

Referring more particularly to measurements of the processing and performance parameters of the processing system, In order to obtain optimal performance in the process system, a response surface analysis may be used. Response surface analysis is a mathematical or graphical representation of the connection between important independent variables, controlled factors, and a dependent variable. An independent variable is a factor that is, or conceivably could be, controlled. Examples include flow rate and temperature. The value of the dependent variable is the result of the settings of one or more independent variables.

In an embodiment of the present invention, a response surface method is utilized to improve or find the optimal process settings of a plasma process, such that the performance measurements obtained by the device The response surface method of the present invention is based upon a design of experiment (DOE) using, for example, common processing parameters from a silicon etch process. TABLE 1 provides exemplary processing parameters that may be measured and adjusted by device
This technique is based on systematically varying the levels of independent variables. Careful analysis of the data can provide invaluable information about how the input variables affect the response, which can result in significant improvements to products and processes.

A screening design may be used to determine a reasonable number of runs to develop a response surface based on the number of factors involved. The following is provided as an example of the determination of the reasonable runs: For a three-level, four-factor experiment, 81 (3 In one embodiment, a response surface for etch uniformity may be constructed such that data on uniformity, Vdc and power radiated at the first three RF harmonics (W The process continues to P For example, a measured response
- $x=f(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$;
- $x_{1}=g(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$;
- $x_{2}=h(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$;
- $x_{N}=m(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$;
- $CD=p(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$;
- $V_{pp}=q(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$; and
- $V_{DC}=r(\mathrm{gap}, \mathrm{He}, P, Q, \%Q, RF_{b}, RF_{t}, T)$.
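As a rough illustration of the design-of-experiment idea described above, the sketch below enumerates a three-level, four-factor full factorial design (3^4 = 81 runs) and fits a second-order response surface by least squares. The coded levels (-1, 0, 1), the helper `quadratic_features`, and the synthetic response coefficients are assumptions made for illustration only; they are not taken from the patent, which does not state the model form it uses.

```python
import itertools
import numpy as np

# Full factorial design: every combination of 3 coded levels for 4 factors.
levels = (-1.0, 0.0, 1.0)          # assumed coded levels (low, center, high)
n_factors = 4
runs = np.array(list(itertools.product(levels, repeat=n_factors)))
n_runs = len(runs)                 # 3**4 = 81 runs


def quadratic_features(X):
    """Build columns: intercept, linear, squared, and two-factor interaction terms."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)


# Synthetic, noise-free response built from hypothetical coefficients.
rng = np.random.default_rng(0)
A = quadratic_features(runs)
true_coef = rng.normal(size=A.shape[1])
y = A @ true_coef

# Least-squares fit of the response surface coefficients.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

In practice a screening design would use far fewer than the 81 full-factorial runs; the full enumeration is shown here only because it makes the run count explicit.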
Thus, at P At P If the processing parameters meet expectations, then the process returns to P Like the response surface method of In The nodes are interconnected in a network that can identify patterns in data as the nodes are exposed to the data. The network is able to “learn” from each exposure and this distinguishes neural networks from traditional computing programs that simply follow instructions in a fixed sequential order.

As illustrated in In establishing a neural network to model the process, one may want to determine which process parameters affect performance measurements. Multiple performance measurements may be collected. Data on uniformity, Vdc and power radiated at the first three RF harmonics (W
- S:=Covar(B)
$S=\left(\begin{array}{ccccc}0.222& 0.064& -0.1& -0.013& -1.128\times {10}^{-3}\\ 0.064& 0.137& -0.107& -0.014& -1.929\times {10}^{-3}\\ -0.1& -0.107& 0.097& 0.013& 1.565\times {10}^{-3}\\ -0.013& -0.014& 0.013& 1.653\times {10}^{-3}& 2.055\times {10}^{-4}\\ -1.128\times {10}^{-3}& -1.929\times {10}^{-3}& 1.565\times {10}^{-3}& 2.055\times {10}^{-4}& 2.737\times {10}^{-5}\end{array}\right)$
- Compute and sort eigenvalues:
- V:=reverse(sort(eigenvals(S)))
$V=\left(\begin{array}{c}0.34\\ 0.117\\ 0\\ 0\\ 0\end{array}\right)\quad \mathrm{Pct}:=\frac{V\cdot 100}{\sum V}\quad \mathrm{Pct}=\left(\begin{array}{c}74.461\\ 25.539\\ 1.821\times {10}^{-14}\\ 2.759\times {10}^{-15}\\ -9.291\times {10}^{-15}\end{array}\right)$
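The eigenvalue step above can be sketched with NumPy as follows. The matrix entries are the rounded values printed in the covariance matrix S, so the computed eigenvalues will only approximate the listed V and Pct values; the use of `numpy.linalg.eigvalsh` is a choice made here and is not stated in the source.

```python
import numpy as np

# Covariance matrix S, copied from the rounded values shown above.
S = np.array([
    [0.222,     0.064,     -0.1,      -0.013,    -1.128e-3],
    [0.064,     0.137,     -0.107,    -0.014,    -1.929e-3],
    [-0.1,      -0.107,    0.097,     0.013,     1.565e-3],
    [-0.013,    -0.014,    0.013,     1.653e-3,  2.055e-4],
    [-1.128e-3, -1.929e-3, 1.565e-3,  2.055e-4,  2.737e-5],
])

# eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals = np.linalg.eigvalsh(S)

# Sort in descending order, matching V := reverse(sort(eigenvals(S))).
V = eigvals[::-1]

# Percentage of total variance carried by each principal component.
pct = 100.0 * V / V.sum()

for value, share in zip(V, pct):
    print(f"eigenvalue {value: .4e}  ->  {share: .3f} % of variance")
```

Because the printed entries are rounded, the three smallest eigenvalues come out small but not exactly zero, and some may be slightly negative.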
In this example it is evident that the 3 During the “training” of the network, the network is repeatedly shown observations from available data related to problems that need to be solved. For example, a Back Propagation (BP) network learns by example; that is, a learning set is provided that consists of some input examples and the known-correct output for each case.

The BP learning process works in small iterative steps: one of the input example cases is applied to the network, and the network produces output based on the current state of its synaptic weights (initially, the output will be random). This output is compared to the known-good output, and a mean-squared error signal is calculated. The error value is then propagated backwards through the network, and small changes are made to the weights in each layer. The weight changes are calculated to reduce the error signal for the example case in question. The whole process is repeated for each of the example cases, then back to the first case again, and so on. The cycle is repeated until the overall error value drops below some pre-determined threshold. At this point the network has learned the problem, noting that the network will never exactly learn the ideal function, but rather will asymptotically approach the ideal function.

Based upon the above equations, it is possible to calculate an output given a particular set of inputs. This allows the Mean Squared Error (MSE) to be calculated between the actual output and the desired output for the given input in this training example. The MSE is the average of the squares of the differences between the desired output and the current result. Since we are interested in the shape of the error curve rather than the precise MSE function, there is no need to divide by the number of outputs, and the minimization algorithm will still find the correct minimum. Thus, the error function can be formally written as
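The equation that followed in the original is not reproduced in this text. One form consistent with the description above (a sum of squared differences, not divided by the number of outputs) is sketched below; the symbols $E$, $d_k$ (desired output), $o_k$ (actual output), and $K$ (number of output nodes) are assumed names, not the patent's notation.

```latex
E = \sum_{k=1}^{K} \left( d_k - o_k \right)^2,
\qquad
\mathrm{MSE} = \frac{E}{K}
```

Since $E$ and the true MSE differ only by the constant factor $1/K$, both are minimized by the same weights.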
As an example, suppose the network outputs are 0.75 and 0.05 and the desired outputs are 0.9 and 0.1. The (true) MSE is now ((0.9−0.75)

The gradient is fairly straightforward to calculate, due to the convenient fact that the derivative of the sigmoid function can be expressed in terms of the function itself:
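A short sketch of both points follows: the error for the example outputs above, and the standard sigmoid identity, in which the derivative equals the function value times one minus the function value. The variable names and the finite-difference check point `x = 0.3` are illustrative assumptions.

```python
import math

# Example values from the text: actual outputs and desired outputs.
outputs = [0.75, 0.05]
targets = [0.9, 0.1]

# Squared difference for each output node.
squared_errors = [(t - o) ** 2 for t, o in zip(targets, outputs)]

# Sum of squared errors (the form used for minimization, no division).
sse = sum(squared_errors)

# "True" mean squared error: the sum divided by the number of outputs.
mse = sse / len(outputs)


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative written in terms of the sigmoid itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Compare the identity against a central finite difference at an arbitrary point.
x, h = 0.3, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = sigmoid_derivative(x)
```

For these values the sum of squared errors is 0.0225 + 0.0025 = 0.025, so the true MSE is 0.0125.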
The gradient is defined as the vector of partial derivatives of the multivariate function with respect to each variable. Because the error is a function of the network outputs, it is first necessary to calculate a set of partial derivatives for each output node with respect to each associated connection weight. This turns out to be trivial, since all variables other than the one of interest are held constant when the partial derivative is calculated. Thus, only one linear term is left in the calculation of the partial derivative of the output, and, keeping only its coefficient, the equation is:
The new values for the network weights are calculated by multiplying the negative gradient by a step size parameter (called the learning rate) and adding the resultant vector to the vector of network weights attached to the current layer. This change does not take place, however, until after the hidden-layer weights are updated as well, since applying it earlier would corrupt the weight-update procedure for the hidden layer.

Clearly, the error at the output will be affected by the weights at the hidden layer, too. However, the relationship is more complicated. A new gradient is derived, but this time the output weights are treated as constants rather than the hidden-layer weights. Now, the actual output is a function of the weights attached to the hidden layer only (and in a generic network there are LM of those, for L input nodes and M middle-layer nodes). This relationship may be expressed as:
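The update procedure described above can be sketched for a small network with two input, two hidden, and two output nodes. This is a minimal illustration under stated assumptions, not the patent's implementation: bias terms are omitted, the error is the plain sum of squared differences, and the input vector, initial weights, learning rate, and stopping threshold are hypothetical values. Both gradients are computed from the old weights before either layer is changed, which is the ordering the text calls for.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def train_step(x, d, W_h, W_o, lr):
    """One back-propagation step; returns updated weights and the pre-update error."""
    h = sigmoid(W_h @ x)               # hidden-layer activations
    o = sigmoid(W_o @ h)               # output-layer activations
    err = np.sum((d - o) ** 2)         # sum of squared errors

    # Output layer: dE/dnet = -2 (d - o) * o (1 - o)
    delta_o = -2.0 * (d - o) * o * (1.0 - o)
    grad_W_o = np.outer(delta_o, h)

    # Hidden layer: propagate deltas back through the OLD output weights.
    delta_h = (W_o.T @ delta_o) * h * (1.0 - h)
    grad_W_h = np.outer(delta_h, x)

    # Apply both updates only now, after both gradients are known.
    return W_h - lr * grad_W_h, W_o - lr * grad_W_o, err


rng = np.random.default_rng(0)
W_h = rng.uniform(-0.5, 0.5, size=(2, 2))   # hidden weights (hypothetical start)
W_o = rng.uniform(-0.5, 0.5, size=(2, 2))   # output weights (hypothetical start)
x = np.array([0.35, 0.9])                   # hypothetical input example
d = np.array([0.9, 0.1])                    # desired outputs

errors = []
for _ in range(20000):
    W_h, W_o, err = train_step(x, d, W_h, W_o, lr=0.5)
    errors.append(err)
    if err < 1e-4:                          # pre-determined error threshold
        break
```

A larger network changes only the array shapes; the same two gradient expressions apply, with longer sums inside the matrix products.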
The hidden layer weights of It should be noted that the input layer of Also, the above description assumes a (2, 2, 2) Back Propagation network. The only difference in the mathematics resulting from a larger network is longer summations. All of the principles would remain the same.

At P At P At P At P At P If the predicted results of the processing tool are acceptable, no change is needed. The process continues to P At P At P If the time window has not expired, then the process returns to P

The foregoing description of the embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible consistent with the above teachings or may be acquired from practice of the invention. For example, the various features of the invention, which are described in the contexts of separate embodiments for the purposes of clarity, may also be combined in a single embodiment. Conversely, the various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination. Accordingly, persons skilled in the art will appreciate that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the attached claims and their equivalents.