US 6857938 B1
One embodiment disclosed relates to a chemical-mechanical polishing process. The process includes performing chemical-mechanical polishing on an entire wafer lot without look ahead polishing of a first article wafer. A normalized polish rate is determined, and a process time for a next wafer lot is predicted using the normalized polish rate. Another embodiment of the invention relates to a polishing apparatus for chemical-mechanical planarization of semiconductor wafers.
1. A chemical-mechanical polishing process, the process comprising:
performing chemical-mechanical polishing on an entire first wafer lot;
determining a normalized polish rate from the chemical-mechanical polishing of the first wafer lot; and
predicting a process time for a second wafer lot using the normalized polish rate derived from the first wafer lot.
2. The process of
3. The process of
4. The process of
5. The process of
6. The process of
7. The process of
8. The process of
9. The process of
10. The process of
performing chemical-mechanical polishing on an entirety of the second wafer lot;
determining a normalized polish rate from the chemical-mechanical polishing of the second wafer lot; and
predicting a process time for a third wafer lot using the normalized polish rates derived from the first and second wafer lots.
11. A polishing apparatus for chemical-mechanical planarization (CMP) of semiconductor wafers, the apparatus comprising:
a CMP machine configured to polish an entire wafer lot without look ahead polishing of a first article wafer;
a control mechanism operatively coupled to the CMP machine for controlling a process time for polishing wafer lots; and
a computing mechanism operatively coupled to the control mechanism for calculating a normalized polish rate for a preceding wafer lot and for predicting a process time for a next wafer lot using the normalized polish rate derived from the preceding wafer lot.
12. The apparatus of
13. The apparatus of
14. The apparatus of
15. The apparatus of
16. The apparatus of
17. The apparatus of
18. A chemical-mechanical polishing apparatus, the apparatus comprising:
means for performing chemical-mechanical polishing on an entire preceding wafer lot;
means for determining a normalized polish rate from the chemical-mechanical polishing of the preceding wafer lot; and
means for predicting a process time for a next wafer lot using the normalized polish rate derived from the preceding wafer lot.
19. The apparatus of
20. The apparatus of
1. Field of the Invention
The invention relates generally to semiconductor manufacturing. More particularly, the invention relates to processes for chemical mechanical polishing (CMP).
2. Description of the Background Art
Chemical Mechanical Polishing or Chemical Mechanical Planarization (CMP) is an industry recognized process for making silicon wafers flat. The CMP process is used to achieve global planarization (planarization of the entire wafer). Both chemical and mechanical forces produce the desired polishing of the wafer. The CMP process generally includes an automated rotating polishing platen and a wafer holder. The wafer holder is generally used to hold the wafer in place while the platen exerts a force on the wafer. At the same time, the wafer and platen may be independently rotated. A polishing slurry feeding system may be implemented to wet the polishing pad and the wafer. The polishing pad bridges over relatively low spots on the wafer, thus removing material from the relatively high spots on the wafer. Planarization occurs because generally high spots on the wafer polish faster than low spots on the wafer. Thus, the relatively high portions of the wafer are smoothed to a uniform level faster than the other, relatively low portions of the wafer.
In the first step 102, chemical-mechanical polishing is performed for a “first article” or “look ahead” wafer selected from the wafer lot to be polished. Because the first article polishing is monitored to determine an appropriate process time, the first article polishing is disadvantageously operator intensive. Furthermore, the first article polishing disadvantageously occupies the CMP tool and so reduces the available time to polish the wafer lots. In other words, the first article polishing reduces the throughput (units per hour or UPH) of the CMP process. In addition, the first article wafer may have differences from the remainder of the wafer lot, and such differences may result in less accurate polishing of the remaining wafers and the need for rework if required specifications for the polishing are not met.
In the second step 104, a process time is calculated based on measurements from the CMP of the first article wafer. In the third step 106, the process time for CMP of the remaining wafers is set to be the calculated process time. CMP is performed for the remaining wafers of the wafer lot in the fourth step 108. In the fifth step 110, the process goes to the next lot of wafers. The process then begins again with the first step 102 where CMP is performed on the first article wafer.
While progress has been made in CMP processes, further improvement is desired. For instance, improvement in the throughput of CMP processes is desirable.
One embodiment of the invention relates to a chemical-mechanical polishing process. The process includes performing chemical-mechanical polishing on an entire wafer lot without look ahead polishing of a first article wafer. A normalized polish rate is determined, and a process time for a next wafer lot is advantageously predicted using the normalized polish rate.
Another embodiment of the invention relates to a polishing apparatus for chemical-mechanical planarization of semiconductor wafers. The apparatus includes a CMP machine, a control mechanism operatively coupled to the CMP machine, and a computing mechanism operatively coupled to the control mechanism. The CMP machine is configured to polish an entire wafer lot without look ahead polishing of a first article wafer, and the control mechanism controls a process time for polishing wafer lots. Advantageously, the computing mechanism calculates a normalized polish rate for a preceding wafer lot and predicts a process time for a next wafer lot using the normalized polish rate derived from the preceding wafer lot.
The use of the same reference label in different drawings indicates the same or like components. Drawings are not necessarily to scale unless otherwise noted.
In the present disclosure, numerous specific details are provided, such as examples of apparatus, components, and methods to provide a thorough understanding of embodiments of the invention. Persons of ordinary skill in the art will recognize, however, that the invention can be practiced without one or more of the specific details. In other instances, well-known details are not shown or described to avoid obscuring aspects of the invention.
In the first step 302, chemical-mechanical polishing is performed for an entire wafer lot. This advantageously avoids the operator intensive first article polishing step 102 of the conventional method 100.
In the second step 304, a process rate is calculated from the polish time and polish distance of the “last wafer,” where the “last wafer” refers to a wafer (or more than one wafer) from the just processed wafer lot.
In the third step 306, the process rate is normalized. As described in further detail below, the normalization may be done using a device and layer coefficient (DLC) in accordance with an embodiment of the invention. Normalization using the DLC advantageously compensates for variations in circuits and materials between wafer lots.
A prediction of the process time for the next wafer lot is performed in the fourth step 308. The prediction may utilize a model to advantageously analyze the data from one or more previous lots. In one particular embodiment of the invention, the model used is an autoregressive integrated moving average (ARIMA) model. Application of the ARIMA model provides an advantageous smoothing effect that allows for a more accurate prediction of a next process time based upon past data.
In the fifth step 310, the process goes to the next lot of wafers. The process then begins again with the first step 302 where CMP is advantageously performed on the entire next lot.
The following descriptions provide further details relating to an embodiment of the invention.
In developing embodiments of the present invention, data from eight polish tools were gathered over a month and a half to generate a table with about 2,500 rows of data. A spreadsheet (database) was generated that contained the following data as retrieved from the manufacturing execution system: lot #; step; device; process (technology); machine number; logging date/time into step; process time; pre-thickness (“last wafer”) from deposition; final thickness from CMP; and target from the statistical process control (SPC) chart. Tool-based data was also extracted from the SPC work environment. The following data was generated: tool; date/time; pre-thickness (thickness after deposition but before polishing); post thickness (thickness after polishing); filter hours; and pad hours. The database was then sorted by tool and time to allow for pad change characterization. This allowed combining the lot-based and tool-based information into one spreadsheet.
Advanced Term Calculations
In accordance with embodiments of the present invention, a polish rate may be calculated from the previous lot based upon the “last” wafer's process time and polish distance. This rate is then used to calculate the process time for the next lot's “first wafer distance to target.”
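The rate and time relationship described above can be sketched as follows. This is a minimal illustration only: the function names, the units (angstroms and seconds), and the numbers are assumptions made for the example, and the sketch omits the DLC normalization that the disclosure applies to the rate.

```python
def polish_rate(polish_distance, process_time):
    """Rate observed on the previous lot's "last wafer" (distance per unit time)."""
    if process_time <= 0:
        raise ValueError("process_time must be positive")
    return polish_distance / process_time


def next_process_time(distance_to_target, rate):
    """Time needed to remove distance_to_target at the given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance_to_target / rate


# Hypothetical numbers: 6000 angstroms removed in 120 s on the last wafer,
# and 5500 angstroms to remove on the next lot's first wafer.
rate = polish_rate(6000.0, 120.0)
predicted = next_process_time(5500.0, rate)
```

With these assumed inputs the rate is 50 angstroms per second and the predicted process time is 110 seconds.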
Since the different wafer lots have different devices having different circuit densities with different layers of different materials (doped oxide, undoped oxide, doped nitride, undoped nitride, and so on), a way to normalize the polish rate is desirable. In accordance with an embodiment of the present invention, a “device and layer coefficient” (DLC) is calculated for each device/layer combination in the database. The DLC is used to effectively change the distance to be polished by the calculated ratio of the DLC, thus normalizing the polish rate with a controlled procedure.
In order to determine the “normalization” and the correlation of this value to the actual polish rate, the following methodology was employed. The polish rate for this particular pad was calculated from polishing a flat qualification wafer. Qualification tests are done when a new pad is installed on the machine. This rate will also vary from pad change to pad change. This rate was then held constant for each run of that pad cycle (the cycle of runs until the next pad was installed). The raw (individual lot) DLC value is calculated for each lot in the database using the following formula:
The average DLC value for each device/layer combination may then be calculated, for example, using the Microsoft Excel functionality called a “PivotTable” report. A PivotTable report is an interactive table that you can use to quickly summarize large amounts of data. You can rotate its rows and columns to see different summaries of the source data, filter the data by displaying different pages, or display the details for areas of interest. The PivotTable allows you to average, sum, count, etc. and put into a tabled format, the output of one variable or group of variables. The average DLC for each device/layer combination in the database was calculated using the PivotTable “average” function.
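The same per-combination averaging can be done outside a spreadsheet. The following is a sketch using only the Python standard library, not the tooling used in the disclosure; the row layout and the device and layer names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_dlc(rows):
    """Average the raw DLC for each (device, layer) combination.

    rows: iterable of (device, layer, raw_dlc) tuples.
    """
    grouped = defaultdict(list)
    for device, layer, raw_dlc in rows:
        grouped[(device, layer)].append(raw_dlc)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical rows; the device and layer names are made up for illustration.
rows = [
    ("DEV_A", "ILD1", 1.0),
    ("DEV_A", "ILD1", 1.5),
    ("DEV_B", "ILD2", 0.75),
]
dlc_table = average_dlc(rows)
```

Keying the result on the (device, layer) pair mirrors the row grouping that the PivotTable “average” function performs.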
For each wafer lot, the effect of the specific device/layer on the polish rate is taken into account. The resultant value is termed the raw normalized polish rate or NPR. The raw NPR is calculated using the following formula:
Our investigation has identified an additional factor that should be accounted for. The factor may be called the compensated rate factor or CRF. As shown by the following equation, the CRF is the ratio of the actual rate of the qualification test (QUAL_Rate) to the target rate of the qualification test (Target_QUAL_Rate).
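Written out from the verbal definition above, the ratio may be typeset as shown here. The symbol names are those used in the text; the typesetting itself is only a sketch.

```latex
\[
\mathrm{CRF} = \frac{\mathrm{QUAL\_Rate}}{\mathrm{Target\_QUAL\_Rate}}
\]
```

For example, a qualification rate measured 10% above its target would give a CRF of 1.1, and a rate exactly on target would give a CRF of 1.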
The actual or compensated NPR may be calculated by the following formula:
Note that this NPR value could have been determined in an alternate manner by directly using the target rate of the qualification test. The NPR values are used in the lot-to-lot analysis and predictions that are described further below.
Polish Time Predictions
In developing embodiments of the invention, the NPR data calculated as described above was entered into a time-series analysis for Westech CMP tools. Analysis determined a preferred modeling methodology and the constant term values to use. In this instance, the time-series analysis was performed using “JMP” software.
The plot near the top of the
Statistical analysis of the data generates autocorrelation and partial correlation functions. These statistical functions are shown as a function of lag in the bar graphs in the middle of FIG. 7. The lag of one relates to the statistical correlation between a run and the run just preceding it. The lag of two relates to the statistical correlation between a run and the run that was two runs before it. And so on. As shown by the partial correlation graph, the partial correlation is greater than 0.5 for a lag of one (indicating a relatively substantial correlation) and is less than 0.5 for a lag of two (indicating a less substantial correlation).
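To make the notion of correlation at a given lag concrete, the following sketch computes a sample autocorrelation for a short series. It uses the common textbook estimator (lagged products of deviations from the series average, divided by the total sum of squared deviations); it is not the JMP implementation, it does not compute the partial correlation, and the data values are hypothetical.

```python
def sample_autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    avg = sum(series) / n
    denominator = sum((y - avg) ** 2 for y in series)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Pair each run with the run `lag` positions later in the sequence.
    numerator = sum(
        (series[t] - avg) * (series[t + lag] - avg) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical run-to-run NPR values, for illustration only.
npr = [1.0, 2.0, 3.0, 4.0, 5.0]
r1 = sample_autocorrelation(npr, 1)  # run versus the run just before it
r2 = sample_autocorrelation(npr, 2)  # run versus the run two runs before it
```

At a lag of zero the function returns 1, since every run is perfectly correlated with itself.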
Several models were applied to the data in an attempt to find a model that would predict the run-to-run variation in the NPR values. Most of the models resulted in mediocre predictions of the values. However, one model did a relatively good job. That model was the autoregressive integrated moving average (ARIMA) model. The relatively low RSquare value (0.458) at the bottom of
The plot near the bottom of
The graph near the top of
Now the parameter estimates for the ARIMA modeling are discussed in further detail. The parameter estimates depicted in
Parameter estimates for the fleet of tools (each tool labeled by number) are shown in FIG. 10. From
Further Optimization of DLC Model
As discussed above, the present invention advantageously uses DLC values to improve the automated CMP process. The technique for determining the DLC values to use may be further honed or optimized.
Consider, for example, the situation after the system is initially turned on. There may be a period of time needed to fully populate the database. During this period, a model may be utilized to help determine the DLC values to use in real time. For example, the model may be an exponentially weighted moving average (EWMA) or similar model.
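One possible form of such an EWMA update is sketched below. The weight value, the choice to seed the estimate with the first observation, and the function names are assumptions made for illustration; the disclosure does not specify them.

```python
def ewma_update(previous_estimate, new_value, weight=0.3):
    """One EWMA step: blend the newest observation into the running estimate."""
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must be in (0, 1]")
    return weight * new_value + (1.0 - weight) * previous_estimate


def ewma(values, weight=0.3):
    """Running EWMA over `values`, seeded with the first observation.

    Returns None if `values` is empty.
    """
    estimate = None
    for value in values:
        if estimate is None:
            estimate = value
        else:
            estimate = ewma_update(estimate, value, weight)
    return estimate


# Hypothetical raw DLC values observed for one device/layer combination.
raw_dlc = [1.0, 2.0, 2.0]
dlc_estimate = ewma(raw_dlc, weight=0.5)
```

A larger weight makes the estimate follow recent lots more closely, while a smaller weight smooths more heavily over the history gathered so far.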
As a side note, investigation was also made into whether the following variables contributed to the distribution of DLC values: deposition compensation (for difference between actual and target deposition thickness); cumulative pad; cumulative filter; pad duty cycle (relates to idle time between lots); and pad device cycle (relates to processing same layer for several lots versus switching from processing one layer to processing a different layer). The result of the investigation was that those variables did not appear to have significant effect on the DLC distribution.
The ARIMA model is now discussed in further detail. ARIMA stands for autoregressive integrated moving average. In accordance with an embodiment of the invention, application of the ARIMA model to lot-by-lot CMP runs advantageously allows for a more accurate prediction of a next process time based upon past data. The ARIMA model has three parameters: p, d, and q. The order of the autoregressive component is given by p. The order of differencing used is given by d. The order of moving average used is given by q. ARIMA(p,d,q) is the notation indicating the components used for a particular ARIMA model. For example, one particular ARIMA model is the ARIMA(2,1,1) model. ARIMA(2,1,1) refers to a model with a second order autoregressive component, a first order differencing component, and a first order moving average component.
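For orientation, the ARIMA(p,d,q) notation is commonly written in backward-shift operator form as sketched below. This follows a standard textbook sign convention with the constant term omitted, and may differ from the notation of the original equations in this disclosure. The symbols are assumptions for the sketch: B is the backward-shift operator with B y_t = y_{t-1}, a_t is the random shock at time t, and the phi and theta values are the autoregressive and moving average coefficients.

```latex
\[
\left(1 - \phi_1 B - \cdots - \phi_p B^{p}\right)(1 - B)^{d}\, y_t
  = \left(1 - \theta_1 B - \cdots - \theta_q B^{q}\right) a_t
\]
% Special case p = 2, d = 1, q = 1:
\[
\left(1 - \phi_1 B - \phi_2 B^{2}\right)(1 - B)\, y_t
  = \left(1 - \theta_1 B\right) a_t
\]
```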
The vertical axis of the bar charts indicates the amount of over (positive) or under (negative) polishing. Over polishing is when the polishing goes beyond the target distance. Under polishing is when more polishing is needed to reach the target distance.
The runs that resulted in overpolishing beyond the specification tolerance are circled in FIG. 11. Such runs require scrapping of the wafers. The portion of runs ending up in overpolishing beyond tolerance was similar (about 8%) for control and ARIMA predicted cases. From this, it is seen that the lot-by-lot feed forward polishing method in accordance with an embodiment of the invention achieves similar tolerances to the conventional method. This means that the invention may be advantageously used to eliminate the need for a “first article” run in the CMP process without adversely affecting polishing results. Thus, higher throughput CMP processes may be advantageously achieved with the present invention.
Further Details of Time Series Analysis and Forecasting Utilized
A detailed explanation of the theory for time series analysis and forecasting as utilized in accordance with an embodiment of the invention is given as follows. Additional explanation of the theory is given in “Demand Signal Modeling: A Model-Based Approach to the Forecasting of Future Product Demand,” by Russell J. Elias, Masters Thesis, Arizona State University, December 2000. The aforementioned thesis is hereby incorporated by reference in its entirety.
A time series is a discrete set of realizations that have an underlying, fundamental sequential time order. A time series may be defined as a sequence of observations taken sequentially in time. A characteristic feature of these sequences of observations, or series, is that typically realizations adjacent to each other in time share some type of interdependence. It is interesting to note that this same interdependence, which in other statistical analysis protocols (e.g., hypothesis testing and design of experiments) is viewed as a corrupting effect, here forms the enabling basis of a powerful methodology that may be called Time Series Analysis.
For a stationary time series (as previously defined), the degree of interdependence between directly adjacent and nearly adjacent realizations can be quantified as an autocorrelation at lag k, or ρk
A plot of the autocorrelation coefficient ρk versus the lag k is known as the autocorrelation function, or ACF, of the time series, which will later be shown to be a key tool for identifying the correct time series model form. Given that the autocorrelation function is an even function, i.e., that explicitly ρk=ρ−k, the function is typically plotted only for positive values of the lag k.
In order to test whether the autocorrelation coefficients are statistically significant (i.e., non-zero in value) at various lags, the predictand series average <Y> is substituted for the unknown mean μ in Equation 6, which now produces the sample autocorrelation coefficient rk. This sample statistic is compared against its standard error, which is estimated based upon an approximation first put forward by Bartlett (1946), which states that for a stationary normal process, the variance of the sample autocorrelation coefficient may be estimated as:
The quantity of Equation 8 may be qualitatively interpreted as the simple autocorrelation between two observations at lag k (say yt and yt+k) with the effect of the intervening observations (yt+1, yt+2, . . . , yt+k−1) assumed known. In practice, both the ACF and the PACF are automatically calculated for sample predictand series utilizing any of several commercially available statistical software packages, making them readily available to assist in model identification.
The simplest time series model form is the autoregressive model. In this process model, realizations are deemed to emanate from a linear combination of past realizations and a single current random shock. A first order autoregressive model, denoted as AR(1), is represented as:
The mean of the first order autoregressive process is equal to
The ACF for an AR(2) process monotonically decreases. The following critical value relates to the ACF:
Another class of time series models is the moving average models, in which realizations are deemed to emanate from a linear combination of historical random shocks. A first order moving average model, or MA(1), is represented as follows:
The mean of the MA(1) process is simply μ, and the variance is given by
The autoregressive-moving average model (ARMA) involves combining the two previous model classes into a unified form. A model which is first order in both components, known as ARMA(1,1), is represented as follows:
All of the time series models discussed thus far in this section (AR, MA, and ARMA) presuppose that they are modeling stationary processes. However, these procedures can be easily extended to non-stationary processes through a transformation algorithm known as differencing. Consider a backward difference operator whose operation is defined as:
The application of the difference operator results in a stationary time series in this instance. At times more than one differencing operation is required to achieve stationarity in the process in question; it is helpful in these instances to introduce the backward-shift operator B, defined by ∇=1−B. The backward shift operator forces a backwards indexing of variables, such that Byt=yt−1, which provides a computationally efficient method of expanding models from notational to operational forms (as will be demonstrated). Second order differencing can be expressed as ∇²=(1−B)², a notation that will be utilized shortly.
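The effect of repeated backward differencing can be illustrated with a short sketch. The function name and the example series are hypothetical. Applying (1−B) once replaces each value with its change from the previous value, and applying it twice yields yt − 2yt−1 + yt−2.

```python
def difference(series, order=1):
    """Apply the backward difference (1 - B) to `series` `order` times."""
    if order < 0:
        raise ValueError("order must be non-negative")
    result = list(series)
    for _ in range(order):
        # Each pass shortens the series by one element.
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


# Hypothetical non-stationary series with a quadratic trend.
trend = [1, 4, 9, 16, 25]
first = difference(trend, 1)   # still trending upward
second = difference(trend, 2)  # constant after second order differencing
```

In this example the first difference still rises steadily, while the second difference is constant, which is the kind of situation in which more than one differencing operation is needed.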
Implementation of differencing prior to time series modeling leads to an extremely versatile and powerful class of models known as autoregressive integrated moving average models, or ARIMA. The order of each of the three components is specified in the model notation as p,d,q: for example, the ARIMA(2,1,1) notation refers to a model with a second order autoregressive component, first order differencing component, and a first order moving average component. The ARIMA(2,1,1) process may be succinctly expressed as
While specific embodiments of the present invention have been provided, it is to be understood that these embodiments are for illustration purposes and not limiting. Many additional embodiments will be apparent to persons of ordinary skill in the art reading this disclosure. Thus, the present invention is limited only by the following claims.