US 20050199341 A1
A method and system for analyzing multivariate data of plasma processes in which response surface and neural networks are utilized to improve or find optimal process settings of the plasma process such that performance measurements are compared against model measurements to modify a current plasma process to achieve optimized processing performance.
1. A system for monitoring and analyzing multivariate plasma processing data, comprising:
a processing chamber configured to facilitate generation of plasma in a processing region;
a first mechanism configured to measure at least one processing parameter;
a second mechanism configured to measure at least one performance parameter;
a device configured to analyze data from the first mechanism and the second mechanism based upon a mathematical model to improve performance of the system; and
a device configured to adjust the at least one processing parameter based upon performance parameters applied to the mathematical model.
2. The system of
3. The system of
a substrate positioned for processing;
a gas injection mechanism configured to introduce process gas into the processing region; and
a vacuum pumping mechanism, wherein the at least one performance parameter includes a characteristic of the process gas and pressure in the processing chamber.
4. The system of
5. The system of
6. The system of
an RF generator configured to couple energy to the processing chamber to generate plasma in the processing region;
a gas injection mechanism coupled to the processing chamber; and
a pressure measuring device coupled to the processing chamber and configured to measure pressure in the processing chamber,
wherein the at least one processing parameter includes characteristics of process gas introduced to the processing region by the gas injection mechanism.
7. The system of
8. The system of
9. A method for monitoring and analyzing multivariate data from a plasma process, comprising:
measuring performance parameters of the plasma process;
measuring a number of harmonic parameters of the plasma process;
determining a mathematical model of the plasma process;
applying the measured performance parameters and harmonic parameters to the mathematical model of the plasma process;
predicting process parameters based upon the performance parameters, harmonic parameters, and the applied mathematical model; and
adjusting the processing parameters in response to the applied mathematical model.
10. The method of
11. The method of
12. The method of
13. The method of
This is a continuation of International Application No. PCT/US03/30741, filed Sep. 30, 2003, which relies for priority upon U.S. Provisional Application No. 60/414,656, filed Oct. 1, 2002, the contents of both of which are incorporated herein by reference in their entireties.
1. Field of the Invention
The invention generally relates to the field of plasma processing. More particularly, the invention relates to monitoring and analyzing process parameters in a plasma processing facility.
2. Background Information
Throughout the various stages of plasma processing, such as semiconductor or display manufacturing, critical process parameters may vary significantly. Processing conditions change over time, and even the slightest change in a critical process parameter can create undesirable results. Small changes can easily occur in the composition or pressure of an etch gas, or in the temperature of the process chamber or wafer. As such, plasma processing facilities require constant monitoring.
The measuring and monitoring of these process parameters at any given time permits valuable data to be accumulated and analyzed. Process control feedback may be used to adjust the process parameters or determine the viability of certain process materials. However, in many cases, changes of process data reflecting deterioration of processing characteristics cannot be detected by simply referring to the process data displayed. It is difficult to detect early stage abnormalities and characteristic deterioration of a process, and it is often necessary to obtain prediction and pattern recognition from an Advanced Process Control (APC) system.
Computers are generally used to control, monitor, and analyze manufacturing processes, due to the various complexities that may occur in a semiconductor manufacturing plant from the reentrant wafer flows, critical processing steps, and maintenance of the processes. However, the software implemented with the data collection system is not applicable to a variety of plasma processes and does not optimize accumulation and analysis of the data and the resulting processing relationships.
In an embodiment of the present invention, a system and method that is broadly applicable to a variety of plasma processes is utilized to determine a relationship between process parameters and performance measurements. The interaction between harmonic and other measurements and process variables is determined and matched to the current plasma process to achieve desired process results.
The above and other features of the present invention are further described in the detailed description which follows, with reference to the drawings, and by way of a non-limiting exemplary embodiment of the present invention, wherein like reference numerals represent similar parts of the present invention throughout the several views and wherein:
The following detailed description of the present invention refers to the accompanying drawings that illustrate exemplary embodiments consistent with this invention. Other embodiments are possible and modifications may be made to the embodiments without departing from the spirit and scope of the invention. Therefore, the following detailed description is not meant to limit the invention. Rather the scope of the invention is defined by the appended claims.
Referring now more particularly to the drawings,
The plasma processing system 100 of
Substrate 204 may be, for example, transferred into and out of processing chamber 102 through a slot valve (not illustrated) and a chamber feed-through (not illustrated) via a robotic substrate transfer system. The substrate 204 may be received by substrate lift pins (not illustrated) that may be housed within the substrate holder 212 and mechanically translated by devices that are housed therein.
Substrate 204 may be, for example, affixed to the substrate holder 212 via an electrostatic clamping system 214. The substrate holder 212 may include a cooling system having a re-circulating coolant flow that receives heat from substrate holder 212 and transfers the heat to a heat exchanger (not illustrated). The cooling system may also include a device 216 configured to monitor the substrate 204 and/or the substrate holder 212 temperature. The device 216 may be, for example, a thermocouple such as a K-type thermocouple. Moreover, gas may be delivered to the back-side of the substrate 204 via a backside gas system 218 to improve the gas-gap thermal conductance between the substrate 204 and the substrate holder 212. The backside gas system may be utilized when the temperature control of the substrate 204 is required to be at an elevated or reduced temperature. For example, temperature control of the substrate 204 may be useful at temperatures in excess of the steady-state temperature achieved due to a balance of the heat flux delivered to the substrate 204 from the plasma and the heat flux removed from the substrate 204 by conduction to the substrate holder 212. Heating elements, such as resistive heating elements or thermo-electric heater/coolers, may also be included in the system.
Substrate holder 212 may, for example, further serve as an electrode through which RF power may be coupled to plasma in the processing region 202. As an example, substrate holder 212 may be electrically biased at an RF voltage via the transmission of RF power from the RF generator 220 through impedance match network 222 to substrate holder 212. The RF bias can serve to heat electrons and, thereby, form and maintain plasma. In this configuration, the system can operate as a reactive ion etch (RIE) reactor, wherein the chamber and upper gas injection electrode serve as ground surfaces. A typical frequency for the RF bias can range from 1 MHz to 100 MHz and can be 13.56 MHz. RF systems for plasma processing are well known to those skilled in the art.
Alternatively, RF power may be applied to the substrate holder electrode at multiple frequencies. Furthermore, impedance match network 222 may serve to maximize the transfer of RF power to plasma in processing chamber 202 by minimizing the reflected power. Match network topologies (e.g. L-type, π-type, T-type, etc) and automatic control methods are well known to those skilled in the art.
Process gas 210, of
Vacuum pump system 208 may, for example, include a turbo-molecular vacuum pump (TMP) capable of a pumping speed of 5000 liters per second (and greater) and a gate valve for throttling the chamber pressure. In conventional plasma processing devices utilized for dry plasma etch, a 1000 to 3000 liter per second TMP may be employed. TMPs are useful for low pressure processing, typically less than 50 mTorr. At higher pressures, the TMP pumping speed falls off dramatically. For high pressure processing, i.e., greater than 100 mTorr, a mechanical booster pump and dry roughing pump may be used. Furthermore, a device for monitoring chamber pressure 224 may be coupled to the processing chamber 102. The pressure measuring device 224 may be, for example, a Type 628B Baratron absolute capacitance manometer that is commercially available from MKS Instruments, Inc. (Andover, Mass.).
Plasma processing system 200 may further include a metrology tool 230 configured to measure performance measurements such as, in etch systems, an etch rate, an etch selectivity (i.e. ratio of etch rate of one material to etch rate of a second material), an etch uniformity, a feature profile angle, and a critical dimension. The metrology tool 230 may be either an in-situ or ex-situ device. In the case of an in-situ device, the metrology tool 230 may be, for example, a scatterometer, incorporating beam profile ellipsometry and beam profile reflectometry. The scatterometer may be positioned within a transfer chamber (not illustrated) to analyze substrates 204 that are transferred into and out of process chamber 102. In the case of an ex-situ device, the metrology tool 230 may be, for example, a scanning electron microscope (SEM), wherein substrates have been cleaved and features are illuminated to determine the performance parameters. The metrology tool 230 may be further coupled to controller 108 to provide controller 108 with spatially resolved measurements of the performance measurements.
Controller 108 may comprise a microprocessor, memory, and a digital I/O port that is capable of generating control voltages sufficient to communicate and activate inputs to plasma processing system 100, as well as monitor outputs from plasma processing system 100. Moreover, controller 108 is coupled to and exchanges information with RF generator 220, impedance match network 222, gas injection system 206, vacuum pump system 208, pressure measuring device 224, backside gas delivery system 218, substrate/substrate holder temperature measurement system 216, electrostatic clamping system 214, and metrology tool 230. A program stored in the memory may be utilized to activate the inputs to the aforementioned components of a plasma processing system 200 according to a stored process.
Alternatively, the plasma may be formed: using electron cyclotron resonance (ECR); from the launching of a Helicon wave; or from a propagating surface wave. Each of these plasma sources is well known to those skilled in the art.
In each of the
Processing parameters may include, for example, process pressure, Helium backside gas pressure, process gas (e.g., CF4, C4F8, O2, Ar, etc.) partial pressure, process gas flow rate, upper electrode RF power, lower electrode RF power, substrate (or chuck) temperature, electrode spacing, and size of focus ring. The process pressure may be adjusted and monitored during the process by changing, for example, either the gate valve setting or the total process gas mass flow rate, in concert with a pressure measuring device 224. The forward and reflected RF power can be adjusted and monitored using commands to the RF generator 220, the match network 222, a dual directional coupler (not illustrated) and power meters (not illustrated). Process gas partial pressures may be adjusted and monitored using a mass flow controller to regulate the flow of process gas components. The (Helium) backside gas pressure may be adjusted and monitored using the backside gas delivery system 218, which includes a pressure regulator. In addition, the substrate temperature may be monitored using temperature monitoring system 216.
Alternatively, processing parameters may include a film material viscosity, a film material surface tension, an exposure time, and a depth of focus.
Referring more particularly to measurements of the processing and performance parameters of the processing system,
In order to obtain optimal performance in the process system, a response surface analysis may be used. Response surface analysis is a mathematical or graphical representation of the connection between important independent variables, controlled factors, and a dependent variable. An independent variable is a factor that is, or conceivably could be, controlled. Examples include flow rate and temperature. The value of the dependent variable is the result of the settings of one or more independent variables. In an embodiment of the present invention, a response surface method is utilized to improve or find the optimal process settings of a plasma process, such that the performance measurements obtained by the device 106 of
The response surface method of the present invention is based upon a design of experiment (DOE) using, for example, common processing parameters from a silicon etch process. TABLE 1 provides exemplary processing parameters that may be measured and adjusted by device 104. In TABLE 1, a valid range is formulated for each processing parameter. For example, the processing parameters may be adjusted to a high (+), medium (0), or low (−) level, based on a Box-Behnken design requiring three levels for each factor.
A screening design may be used to determine a reasonable number of runs to develop a response surface based on the number of factors involved. As an example of determining a reasonable number of runs: for a three-level, four-factor experiment, 81 (3^4 = 81) runs are needed to cover every possible combination of factor levels. Therefore, response surfaces based on more than four processing parameters use a partial combination of runs to determine the main interactions and response surface.
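The run counts discussed above may be illustrated with a short sketch. It enumerates the full three-level factorial for four factors and, for comparison, a pairwise Box-Behnken style subset with a single centre point. The factor names are hypothetical (TABLE 1 is not reproduced here), and the pairwise construction is an assumption that is only intended for small factor counts, roughly three to five.

```python
from itertools import combinations, product

# Hypothetical factor names; the source's TABLE 1 is not reproduced here.
FACTORS = ["pressure", "rf_power", "gas_flow", "temperature"]
LEVELS = (-1, 0, 1)  # low (-), medium (0), high (+)


def full_factorial(n_factors):
    """Every combination of the three levels for n_factors factors."""
    return list(product(LEVELS, repeat=n_factors))


def box_behnken(n_factors):
    """Pairwise Box-Behnken style runs plus a single centre point.

    Each pair of factors is run at its four (+/-1, +/-1) corners while the
    remaining factors are held at 0. Assumed valid for small factor counts.
    """
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.append((0,) * n_factors)  # centre point
    return runs


full_runs = full_factorial(len(FACTORS))
bbd_runs = box_behnken(len(FACTORS))
print(len(full_runs), "full-factorial runs")
print(len(bbd_runs), "Box-Behnken runs including one centre point")
```

In practice several replicate centre points are often run rather than one; the single centre point here is only for counting distinct settings.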
In one embodiment, a response surface for etch uniformity may be constructed such that data on uniformity, Vdc and power radiated at the first three RF harmonics (W1, W2, W3) are collected from an etch tool at 100 msec intervals. The 100 msec interval is chosen to correspond to typical servo response times for the system. Systems having a longer response time may have a longer sample interval; systems having a shorter servo response time may require a shorter sample interval. Each element of the data set is normalized by subtracting the mean and then dividing by the standard deviation of the measured parameter. The normalized value is represented by the following equation: (Vt−Vm)/Vsd, where Vt is the measurement at time t, Vm is the mean value for a 30 second run, and Vsd is the standard deviation of the measurement. For example, if the mean Vdc measurement over the total 30 second etch interval is 234 volts, and the standard deviation of the measurements is 6.5 volts, then the normalized value is (Vt−234)/6.5. A library of response surfaces may be built from this data. The surface for each run that produces an acceptable result (acceptable uniformity, for example) may be stored in the library for comparison to data taken during later processing runs.
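The normalization described above may be sketched as follows. The function names and the sample Vdc values are illustrative only, and the use of the population standard deviation is an assumption, since the description does not state which estimate is intended.

```python
from statistics import mean, pstdev


def normalize_value(vt, vm, vsd):
    """Return (Vt - Vm) / Vsd for a single sample."""
    return (vt - vm) / vsd


def normalize_run(samples):
    """Normalize every sample of one run by that run's mean and std dev.

    Assumption: the population standard deviation (pstdev) is used.
    """
    vm = mean(samples)
    vsd = pstdev(samples)
    return [normalize_value(vt, vm, vsd) for vt in samples]


# Worked example from the text: mean 234 V, standard deviation 6.5 V.
example = normalize_value(247.0, 234.0, 6.5)

# Made-up Vdc samples, for illustration only.
vdc = [228.0, 231.0, 234.0, 237.0, 240.0]
normalized = normalize_run(vdc)
print(example, normalized)
```

A 30 second run sampled at 100 msec intervals would yield 300 samples per measured parameter; the five samples above merely keep the example short.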
The process continues to P708. At P708, the performance and harmonic measurements of P704 and P706 are compared with a response surface model. This is accomplished first by determining the optimum response surface 722 having the best fit, for example, to the measured response 720 of
For example, a measured response 720 may be compared to a response surface model 722 by finding the surface model 722 exhibiting the best fit to the measured response 720. Then, the processing parameters are adjusted based upon a difference between the processing parameters corresponding to the best fit surface model 722, and the processing parameters used to create the measured response 720. The performance and harmonic measurement trends, for example, CD, Vpp, VDC, x, x1, x2, . . . xn, may be used to determine the complex relationships between the performance measurements and the processing parameters in the following manner:
Thus, at P710, based upon the nonlinearity of the performance measurements to the processing parameters of the above-identified relationships, it is possible to predict the in-situ processing parameters to cause the measured response 720 to approach the surface model 722. More particularly, it is possible to determine what needs to be done to drive or converge the performance measurements to the optimum surface.
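The comparison against a library of stored surfaces may be sketched as below. This is a minimal illustration under stated assumptions: the fit criterion (sum of squared residuals), the library layout, the parameter names, and all numeric values are hypothetical and are not taken from the description.

```python
def sum_squared_residual(measured, model):
    """Sum of squared differences between two equally sampled responses."""
    return sum((m - s) ** 2 for m, s in zip(measured, model))


def best_fit_surface(measured, library):
    """Library entry whose stored surface is closest to the measured response."""
    return min(
        library,
        key=lambda entry: sum_squared_residual(measured, entry["surface"]),
    )


def parameter_adjustment(current_params, target_params):
    """Per-parameter change needed to move from current to target settings."""
    return {
        name: target_params[name] - current_params[name]
        for name in current_params
    }


# Hypothetical library of surfaces from runs that gave acceptable results.
library = [
    {"params": {"pressure_mtorr": 40.0, "rf_power_w": 1000.0},
     "surface": [0.0, 0.1, -0.1, 0.0]},
    {"params": {"pressure_mtorr": 45.0, "rf_power_w": 1100.0},
     "surface": [0.5, 0.6, 0.4, 0.5]},
]
measured = [0.4, 0.7, 0.3, 0.5]  # made-up normalized measurements
current = {"pressure_mtorr": 42.0, "rf_power_w": 1050.0}

best = best_fit_surface(measured, library)
delta = parameter_adjustment(current, best["params"])
print(best["params"], delta)
```

The resulting differences would be the candidate adjustments handed to the process control system.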
At P712, it is determined whether the current processing parameters of P710 meet the expectations of the etching tool as determined by the predicted processing parameters. The expectations of the etching tool are based upon the requirements of the current process, such as uniformity ±3% or 99.7% yield, or <1% shading damage, etc. If the processing parameters do not meet expectations, then the process continues to P714 where adjustments to the processing parameters may be made via an advanced process control (APC) system. The process returns to P704 upon the necessary adjustments.
If the processing parameters meet expectations, then the process returns to P704 to perform the next set of measurements.
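The measure, compare, check, and adjust cycle of P704 through P714 may be sketched as a simple loop. Everything below the loop function is a toy stand-in: the single RF power setting, the linear non-uniformity model, the 3% limit taken from the uniformity example, and the half-step adjustment are all assumptions made for illustration.

```python
def run_control_loop(measure, predict, meets_expectations, adjust, cycles):
    """Repeat the measure / predict / check / adjust cycle a fixed number of times."""
    log = []
    for _ in range(cycles):
        measurement = measure()               # P704 / P706
        predicted = predict(measurement)      # P708 / P710
        ok = meets_expectations(measurement)  # P712
        if not ok:
            adjust(predicted)                 # P714
        log.append(ok)
    return log


# Toy stand-ins, for illustration only.
state = {"rf_power_w": 900.0}
TARGET_POWER_W = 1000.0


def measure():
    # Pretend non-uniformity (percent) grows with distance from the target.
    return abs(state["rf_power_w"] - TARGET_POWER_W) / TARGET_POWER_W * 100.0


def predict(_measurement):
    # A real model would map measurements to settings; this returns a constant.
    return TARGET_POWER_W


def meets_expectations(non_uniformity_pct):
    return non_uniformity_pct <= 3.0


def adjust(predicted_power_w):
    # Move halfway toward the predicted setting on each adjustment.
    state["rf_power_w"] += 0.5 * (predicted_power_w - state["rf_power_w"])


log = run_control_loop(measure, predict, meets_expectations, adjust, cycles=4)
print(log, state)
```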
Like the response surface method of
The nodes are interconnected in a network that can identify patterns in data as the nodes are exposed to the data. The network is able to “learn” from each exposure and this distinguishes neural networks from traditional computing programs that simply follow instructions in a fixed sequential order. As illustrated in
In establishing a neural network to model the process, one may want to determine which process parameters affect performance measurements. Multiple performance measurements may be collected. Data on uniformity, Vdc and power radiated at the first three RF harmonics (W1, W2, W3), for example, are collected from an etch tool at 100 msec intervals. The 100 msec interval is chosen to correspond to typical servo response times for the system. Systems having a longer response time may have a longer sample interval; while systems having a shorter servo response time may require a shorter sample interval. Each element of the data set is normalized by subtracting the mean and then dividing by the standard deviation of the measured parameter. The normalized value is determined by the following equation: (Vt−Vm)/Vsd, where Vt is the measurement at time t, Vm is the mean value for a 30 second run, and Vsd is the standard deviation of the measurement. For example, if the mean Vdc measurement over the total 30 second etch interval is 234 volts, and the standard deviation of the measurements is 6.5 volts, then the normalized value is (Vt−234)/6.5. After a statistically significant number of process runs, for example 100, the data are analyzed by computing the eigensolution of the covariance matrix for corresponding time slices of each process run. An example of the computation of the eigensolution of the covariance matrix, using actual data, is provided by the following:
In this example it is evident that the 3rd, 4th and 5th harmonics are not related to the measurement of interest and will therefore not be considered for this time slice. Each time slice is examined and all relevant harmonics are selected for generation of the response surface. Further time slice analysis of the example data indicates that the 2nd and 3rd harmonics are of interest and the 4th and 5th harmonics may be discarded.
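The kind of covariance analysis described above may be illustrated with a small sketch. It uses made-up, four-run data rather than the data of the example, computes a population covariance matrix, and finds only the dominant eigenvalue and eigenvector by power iteration; a full eigensolution would normally come from a numerical library. Reading a near-zero eigenvector component as "not related to the measurement of interest" is an assumption of this sketch.

```python
import math


def covariance_matrix(columns):
    """Population covariance matrix of equal-length data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centred = [[v - m for v in col] for col, m in zip(columns, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in centred]
        for ci in centred
    ]


def dominant_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        image = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in image))
        vec = [p / norm for p in image]
    image = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    eigenvalue = sum(p * v for p, v in zip(image, vec))  # Rayleigh quotient
    return eigenvalue, vec


# Made-up values at one time slice across four runs (illustration only).
uniformity = [1.0, 1.0, -1.0, -1.0]
harmonic_a = [0.8, 1.2, -0.8, -1.2]   # tracks uniformity
harmonic_b = [1.0, -1.0, 1.0, -1.0]   # unrelated to uniformity

cov = covariance_matrix([uniformity, harmonic_a, harmonic_b])
eigenvalue, eigenvector = dominant_eigenpair(cov)
print(eigenvalue, eigenvector)
```

In this toy case the third component of the dominant eigenvector is essentially zero, which is the pattern that would lead to a harmonic being dropped for the time slice.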
During the “training” of the network, the network is repeatedly shown observations from available data related to problems that need to be solved. For example, a Back Propagation (BP) network learns by example, that is, a learning set is provided that consists of some input examples and the known-correct output for each case.
The BP learning process works in small iterative steps: one of the input example cases is applied to the network, and the network produces output based on the current state of its synaptic weights (initially, the output will be random). This output is compared to the known-good output, and a mean-squared error signal is calculated. The error value is then propagated backwards through the network, and small changes are made to the weights in each layer. The weight changes are calculated to reduce the error signal for the example case in question. The whole process is repeated for each of the example cases, then back to the first case again, and so on. The cycle is repeated until the overall error value drops below some pre-determined threshold. At this point the network has learned the problem, noting that the network will never exactly learn the ideal function, but rather will asymptotically approach the ideal function.
Based upon the above equations, it's possible to calculate an output given a particular set of inputs. This allows the Mean Squared Error (MSE) to be calculated between the actual output and the desired output for the given input in this training example. The MSE is the average of the squares of the difference between the desired output and the current result. Since we are interested in the shape of the error curve rather than the precise MSE function, there is no need to divide by the number of outputs, and the minimization algorithm will still find the correct minimum. Thus, the error function can be formally written as
As an example, suppose the actual outputs are 0.75 and 0.05 and the desired outputs are 0.9 and 0.1. The (true) MSE is then ((0.9 − 0.75)^2 + (0.1 − 0.05)^2)/2, which is equal to 0.0125 (note: in the BPN algorithm we would not need to divide by N). Clearly, for any given training example, this value is a function only of the weights of the network. So, to reduce the error, we can try to move to the lowest point on this error surface. To find this point, it is necessary to calculate the gradient of the error function with respect to each network weight. One may then move each weight slightly in the opposite direction to the gradient: if the surface is sloping upwards in a particular direction, the weights are adjusted so that the point on the error surface moves downwards.
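The arithmetic of this example may be checked with a few lines of code. The function names are illustrative; the two outputs and two desired values are those given in the example.

```python
def sum_squared_error(desired, actual):
    """Sum of squared differences (no division by the number of outputs)."""
    return sum((d - a) ** 2 for d, a in zip(desired, actual))


def mean_squared_error(desired, actual):
    """Average of the squared differences."""
    return sum_squared_error(desired, actual) / len(desired)


desired = [0.9, 0.1]
actual = [0.75, 0.05]

mse = mean_squared_error(desired, actual)  # approximately 0.0125
sse = sum_squared_error(desired, actual)   # approximately 0.025
print(mse, sse)
```

The two quantities differ only by the constant factor N, so both are minimized by the same weights.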
The gradient is fairly straightforward to calculate, due to the convenient fact that the derivative of the sigmoid function can be expressed in terms of the function itself:
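For the logistic sigmoid s(x) = 1/(1 + e^(−x)), the standard form of this identity is s'(x) = s(x)(1 − s(x)). The sketch below assumes that logistic form, since the equation itself is not reproduced here, and compares the identity against a central-difference estimate at a few arbitrary points.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative written in terms of the sigmoid itself: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


checks = [
    (x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
    for x in (-2.0, 0.0, 0.5, 3.0)
]
for x, analytic, numeric in checks:
    print(x, analytic, numeric)
```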
The gradient is defined as the vector of partial derivatives of the multivariate function with respect to each variable. Because the error is a function of the network outputs, it is first necessary to calculate a set of partial derivatives for each output node with respect to each associated connection weight. This turns out to be trivial, since all variables other than the one of interest are held constant when the partial derivative is calculated. Thus, only one linear term is left in the calculation of the partial derivative of the output and, leaving the coefficient, the equation is:
The new values for the network weights are calculated by multiplying the negative gradient with a step size parameter (called the learning rate) and adding the resultant vector to the vector of network weights attached to the current layer. This change is not applied, however, until after the hidden-layer weights have been updated as well, since applying it earlier would corrupt the weight-update procedure for the hidden layer.
Clearly, the error at the output will be affected by the weights at the hidden layer, too. However, the relationship is more complicated. A new gradient is derived, but this time the output weights are treated as constants rather than the hidden-layer weights. Now, the actual output is a function of the weights attached to the hidden layer only (and in a generic network there are LM of those, for L input nodes and M middle-layer nodes). This relationship may be expressed as:
The hidden layer weights of
It should be noted that the input layer of
Also, the above description assumes a (2, 2, 2) Back Propagation network. The only difference in the mathematics resulting from a larger network is longer summations. All of the principles would remain the same.
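A (2, 2, 2) Back Propagation network of the kind described may be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes logistic sigmoid nodes without bias terms, a sum-of-squares error, and a single training example; the input values, initial weights, learning rate, and stopping threshold are all made up. The hidden-layer gradient is computed from the old output weights before any weights are changed, consistent with the update order described above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, output activations) for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, out


def sse(target, out):
    """Sum of squared errors (not divided by the number of outputs)."""
    return sum((t - o) ** 2 for t, o in zip(target, out))


def train_step(x, target, w_hidden, w_out, rate):
    """One gradient-descent step; returns new (w_hidden, w_out)."""
    hidden, out = forward(x, w_hidden, w_out)
    # dE/d(net) at each output node, using s' = s * (1 - s).
    delta_out = [-2.0 * (t - o) * o * (1.0 - o) for t, o in zip(target, out)]
    # Hidden-layer deltas are computed with the OLD output weights.
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    new_w_out = [
        [w - rate * delta_out[k] * hidden[j] for j, w in enumerate(row)]
        for k, row in enumerate(w_out)
    ]
    new_w_hidden = [
        [w - rate * delta_hidden[j] * x[i] for i, w in enumerate(row)]
        for j, row in enumerate(w_hidden)
    ]
    return new_w_hidden, new_w_out


# Made-up training example and initial weights, for illustration only.
x = [1.0, 0.5]
target = [0.9, 0.1]
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [[0.2, -0.1], [-0.3, 0.1]]

initial_error = sse(target, forward(x, w_hidden, w_out)[1])
final_error = initial_error
steps = 0
# Repeat until the error drops below a threshold or a step limit is reached.
while final_error > 1e-4 and steps < 10000:
    w_hidden, w_out = train_step(x, target, w_hidden, w_out, rate=0.5)
    final_error = sse(target, forward(x, w_hidden, w_out)[1])
    steps += 1
print(initial_error, final_error, steps)
```

A practical network would cycle through many example cases and would typically include bias terms; both are omitted here to keep the sketch short.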
At P804, multiple performance measurements are collected. The multiple performance measurements may include performance measurements CD, Vpp, and VDC. At P806, a determined number of RF harmonic measurements are taken on a processing tool which may be any semiconductor manufacturing tool, including, without limitation, an etch tool of a plasma process, a photo resistive tool or a patterning tool. These measurements may be used as inputs in the input layer 904.
At P808, the performance measurements of P804 and the n harmonic measurements of P806 are placed in the input layer 904 of the neural network model illustrated in
At P810, the neural network 900 is used to predict whether the performance measurements will result in a successful process or not as determined by outputs 910. The process continues to P812.
At P812, it is determined whether the predicted results of the processing tool are acceptable or not. If the predicted result is not acceptable, then a change is needed. The process continues to P814.
At P814, adjustments are made to the process parameters through an advanced process control (APC). The process continues to P816.
If the predicted results of the processing tool are acceptable, no change is needed. The process continues to P816.
At P816, it is determined whether the time window has expired. The time window may extend for an entire run or for a predetermined time period. If the time window has expired, then the process continues to P818.
At P818, the neural network model weights are updated based on the newly collected data and the process continues back to P804. As discussed above, the new values for the network weights are calculated by multiplying the negative gradient with a step size parameter (i.e., the learning rate) and adding the resultant vector to the vector of network weights attached to the current layer.
If the time window has not expired, then the process returns to P804.
The foregoing description of the embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible consistent with the above teachings or may be acquired from practice of the invention. For example, the various features of the invention, which are described in the contexts of separate embodiments for the purposes of clarity, may also be combined in a single embodiment. Conversely, the various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination. Accordingly, persons skilled in the art will appreciate that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the attached claims and their equivalents.