US 6898102 B2
A novel bi-level DRAM architecture is described which achieves significant reductions in die size while maintaining the noise performance of traditional folded architectures. Die size reduction results primarily from building the memory arrays with 6F² or smaller memory cells in a type of cross-point memory cell layout. The memory arrays utilize stacked digitlines and vertical digitline twisting to achieve folded architecture operation and noise performance.
1. A method for improved differential electrical noise reduction for an integrated circuit device having a memory having multiple vertical levels of circuitry, the memory having at least four arrays of memory cells, each array of memory cells being substantially equally spaced from an adjacent array of memory cells, each array of memory cells including a plurality of memory cells and at least four pairs of digitlines, each pair of digitlines including a first digitline and a second digitline, the first digitline and the second digitline being substantially vertically aligned in an upper conductive level and a lower conductive level of the integrated circuit device, the first digitline and second digitline of each pair of digitlines each connected to an equal number of the memory cells in each array of memory cells, the method comprising:
electrically balancing the first digitline and the second digitline of each digitline pair of the at least four pairs of digitlines to balance the electrical noise therebetween by vertically twisting the first digitline and the second digitline of each pair of digitlines of the at least four pairs of digitlines between arrays of the at least four arrays of memory cells in a twist region located between each array of the at least four arrays of memory cells, a first pair of digitlines and a third pair of digitlines of the at least four pairs of digitlines vertically twisted in the twist region located between a first array of memory cells and a second array of memory cells and vertically twisted in the twist region located between a third array of memory cells and a fourth array of memory cells while a second pair of digitlines and a fourth pair of digitlines of the at least four pairs of digitlines are vertically twisted in the twist region located between the second array of memory cells and the third array of memory cells, the electrically balancing including connecting an equal number of memory cells to a portion of one of the first digitline and the second digitline of a pair of digitlines of the at least four pairs of digitlines when located in a lower conductive level of an array of the at least four arrays of memory cells.
2. The method of
isolating adjacent memory cell of an array of the at least four arrays of memory cells using an isolation region comprising a plurality of isolation transistors, each isolation transistor having a gate biased to a predetermined voltage.
This application is a continuation of application Ser. No. 10/150,236, filed May 17, 2002, now U.S. Pat. No. 6,594,173, issued Jul. 15, 2003, which is a continuation of application Ser. No. 09/826,764, filed Apr. 5, 2001, now U.S. Pat. No. 6,392,303, issued May 21, 2002, which is a continuation of application Ser. No. 09/507,170, filed Feb. 18, 2000, now U.S. Pat. No. 6,243,311, issued Jun. 5, 2001, which is a divisional of application Ser. No. 08/701,749, filed Aug. 22, 1996, now U.S. Pat. No. 6,043,562, issued Mar. 28, 2000. This application claims priority to provisional application Ser. Nos. 60/010,293 filed Feb. 1, 1996, and 60/010,622 filed Jan. 26, 1996.
Field of the Invention: The present invention relates generally to memory devices and, in particular, the present invention relates to a digitline architecture in a DRAM.
State of the Art: A modern DRAM memory cell or memory bit, as shown in
The digitline 14, as depicted in
The memory bit transistor's gate terminal connects to a wordline (rowline) 16. The wordline, which connects to a multitude of memory bits, consists of an extended segment of the same polysilicon used to form the transistor's gate. The wordline is physically orthogonal to the digitline. A memory array, shown in
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for a new array architecture which combines the advantages of both folded and open digitline architectures while avoiding their respective disadvantages. To meet this objective, the architecture needs to include the following features and characteristics: an open digitline memory bit configuration, a small 6F² memory bit, and a small, efficient array layout. The memory must also include a folded digitline sense amplifier configuration, adjacent true and complement digitlines, and twisted digitline pairs to achieve a high signal to noise ratio. Further, a relaxed wordline pitch should be used to facilitate better layout.
The above-mentioned problems with digitline architectures and other problems are addressed by the present invention and will be understood by reading and studying the following specification. A memory device is described which reduces overall die size beyond that obtainable from either the folded or open digitline architectures. A signal to noise performance is achieved which approaches that of the folded digitline architecture.
In particular, the present invention describes a dynamic memory device comprising a multi-level digitline pair fabricated on a semiconductor die. The multi-level digitline pair has vertically offset first and second digitlines. The digitline pair is vertically twisted such that the first digitline is located below the second digitline on one horizontal side of the vertical twist and located above the second digitline as an upper digitline on an opposite horizontal side of the twist.
In another embodiment, an integrated circuit dynamic memory device comprises an integrated circuit die having multiple, vertically offset conductive levels, and a multi-level digitline pair fabricated on the integrated circuit die having first and second electrically isolated digitlines, each of the first and second digitlines comprising first and second sections located in different ones of the multiple conductive levels and electrically connected via a vertically traversing electrical path. The first and second digitlines are located such that the first section of the first digitline is vertically located above the first section of the second digitline and the second section of the first digitline is vertically located below the second section of the second digitline.
In yet another embodiment, a method is described for reducing noise in an integrated circuit memory device. The method comprises the step of electrically balancing first and second vertically stacked digitlines. To balance the digitlines, the first and second digitlines can be fabricated in first and second conductive levels such that the first and second digitlines are substantially vertically aligned. A vertical conductive twist can be provided to locate a portion of each of the first and second digitlines in both the first and second conductive levels. Finally, an equal number of memory cells can be coupled to the portion of the first and second digitlines located in a lower conductive level.
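The balancing described above can be checked with a short sketch. It models the twist pattern recited in claim 1 (four arrays, four digitline pairs) and counts, for each digitline, the arrays in which it occupies the lower conductive level. The 0-based indexing, the names, and the assumption that the first digitline starts in the lower level are illustrative choices and do not come from the specification.

```python
# Sketch of the vertical twist pattern recited in claim 1.
# All names and the 0-based indexing are illustrative assumptions.

NUM_ARRAYS = 4

# Twist region k sits between array k and array k + 1 (0-based).
TWIST_REGIONS = {
    0: {0, 2},  # pair 1: twisted between arrays 1-2 and between arrays 3-4
    1: {1},     # pair 2: twisted between arrays 2-3
    2: {0, 2},  # pair 3: same pattern as pair 1
    3: {1},     # pair 4: same pattern as pair 2
}


def lower_level_digitline(pair, array):
    """Return which digitline of the pair ('first' or 'second') is in the lower level."""
    twists_crossed = sum(1 for region in TWIST_REGIONS[pair] if region < array)
    # Assumption: the first digitline starts in the lower level in array 0.
    return "first" if twists_crossed % 2 == 0 else "second"


def lower_level_counts(pair):
    """Count, per digitline, the arrays in which it occupies the lower level."""
    counts = {"first": 0, "second": 0}
    for array in range(NUM_ARRAYS):
        counts[lower_level_digitline(pair, array)] += 1
    return counts


for pair in range(4):
    layout = [lower_level_digitline(pair, a) for a in range(NUM_ARRAYS)]
    print(f"pair {pair + 1}: lower-level digitline per array = {layout}")
```

If memory cells attach to whichever digitline is in the lower level, and every array holds the same number of cells, then equal lower-level counts imply that both digitlines of a pair carry the same number of cells.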
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific preferred embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical and electrical changes may be made without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
An understanding of basic DRAM operation, such as reading and writing, is necessary to fully appreciate the present invention. A detailed description of DRAM circuitry is presented below.
After the cell access is complete, the sensing operation can commence. The reason for forming a digitline pair will now become apparent.
Shortly after the N-sense-amp fires, ACT will drive toward Vcc. This activates the P-sense-amp that operates in a complementary fashion to the N-sense-amp. With the low voltage digitline approaching ground, a strong signal exists to drive the appropriate PMOS transistor into conduction. This conduction, again moving from subthreshold to saturation operation, will charge the high voltage digitline toward ACT, ultimately reaching Vcc. Since the memory bit transistor remains on during sensing, the memory bit capacitor will charge to the NLAT* or ACT voltage level. The voltage, and hence charge, which the memory bit capacitor held prior to accessing will restore a full level—Vcc for a logic one and GND for a logic zero. It should be apparent now why the minimum wordline voltage is Vth above Vcc. If Vccp were anything less, the memory bit transistor would turn off before the memory bit capacitor attains a full Vcc level.
A DRAM write operation is very similar to sensing and restore operations except that separate write driver circuits determine whether logic ones or zeros are placed into the cells. The write driver circuit is generally a tri-state inverter 19 connected to the digitlines through a second pair of pass transistors 17 as shown in FIG. 7. These pass transistors are referred to as I/O transistors. The gate terminals of the I/O transistors connect to a common CSEL (Column SELect) signal. The column address determines which CSEL signal activates and ultimately which pair (or multiple pairs) of digitlines route to the output pad or write driver. In most current DRAM designs, the write driver simply overdrives the sense amplifier pair, which remains active during the writing operation. The write operation needs to be long enough in duration to flip the sense amplifiers. After new data writes into the sense amplifiers, the amplifiers actually finish the write cycle by restoring the digitlines to full rail to rail voltages.
Memory Cells and Arrays. The primary advantage of DRAM, over other types of memory technology, is low cost. This advantage arises from the simplicity and scaling characteristics of its 1T1C memory cell. Although the DRAM memory bit encompasses simple concepts, its actual design and implementation are highly complex. Successful, cost-effective DRAM designs require a tremendous amount of process technology.
A modern buried capacitor DRAM memory bit pair appears in FIG. 9. DRAM memory bits are constructed in pairs to allow sharing of the digitline contact 22. Sharing a contact significantly reduces overall cell size. The memory bits consist of an active area rectangle 24 (in this case N+ active area), a pair of polysilicon wordlines 16, a single digitline contact 22, a metal or polysilicon digitline 14, and a pair of cell capacitors 12 formed with oxide-nitride-oxide dielectric between two layers of polysilicon. For some processes, the wordline polysilicon is silicided to reduce the sheet resistance, permitting longer wordline segments without reducing speed. The memory bit layout, shown in
A small array of memory bits appears in FIG. 10. This figure is useful to illustrate several features of the memory bit. First, note that the digitline pitch (width plus space) dictates the active area pitch and capacitor pitch. Process engineers adjust the active area width and the field oxide width to maximize transistor drive and minimize transistor to transistor leakage. The field oxide technology greatly impacts this balance. A thicker field oxide or a shallower junction depth will enable wider transistor active area. Second, the wordline pitch (width plus space) dictates the space available for the digitline contact, transistor length, active area space, field poly width, and capacitor length. Optimization of each of these features by process engineers is necessary to maximize capacitance, minimize leakage and maximize yield. Contact technology, subthreshold transistor characteristics, photolithography, etch and film technology will dictate the overall design.
The memory bit shown in
A folded array is schematically depicted in FIG. 12. Sense amplifier circuits placed at the edge of each array connect to both true and complement digitlines (D and D*) coming from a single array. Optional digitline pair twisting at one or more places can reduce and balance the coupling to adjacent digitline pairs and improve overall signal-to-noise characteristics.
An alternative to the folded array architecture, popular prior to the 64 kbit generation, was the open digitline architecture. Seen schematically in
Digitline capacitive components, contributed by each memory bit, include junction capacitance, digitline to cellplate (poly3), digitline to wordline, digitline to digitline, digitline to substrate, and in some cases digitline to storage cell (poly2) capacitance. Each memory bit connected to the digitline, therefore, adds a specific amount of capacitance to the digitline. Most modern DRAM designs have no more than 256 memory bits connected to a digitline segment. Two factors dictate this quantity. First, for a given cell size, as determined by row and column pitches, there is a maximum achievable storage capacitance without resorting to exotic processes or excessive cell height. For processes in which the digitline is above the storage capacitor (buried capacitor), contact technology will determine the maximum allowable cell height. This fixes the volume available (cell area multiplied by cell height) in which to build the storage capacitor. Second, as the digitline capacitance increases, the power associated with charging and discharging this capacitance during reading and writing operations increases. Any given wordline essentially accesses (crosses) all of the columns within a DRAM. For a 256 Meg DRAM, each wordline crosses 16,384 columns. With a multiplier such as that, it is easy to appreciate why limits to digitline capacitance are necessary to keep power dissipation low.
There are two other basic memory bit configurations used in the DRAM industry. The first, shown in
Sense Amplifier Elements. The term “sense amplifier” refers to a collection of circuit elements that pitch up to the digitlines of a DRAM array. This collection most generally includes isolation transistors, devices for digitline equilibration and bias, one or more N-sense amplifiers, one or more P-sense amplifiers, and devices to connect selected digitlines to I/O signal lines. All of these circuits along with the wordline driver circuits, to be discussed later, are called pitch cells. This designation comes from the requirement that the physical layout for these circuits is constrained by the digitline and wordline pitches of an array of memory bits. For example, the sense amplifier layout for a specific digitline pair (column) generally consumes the space of four digitlines. This is commonly referred to as quarter-pitch or four-pitch, such that one sense amplifier exists for every four digitlines.
The first elements for review are the equilibration and bias circuits. From the earlier discussions on DRAM operation, the digitlines start at Vcc/2 volts prior to cell access and sensing. In this paired digitline configuration, it is important to the sensing operation that both digitlines, which form a column pair, are at the same voltage before firing a wordline. Any offset voltage that appears between the pair will directly reduce the effective signal voltage produced by the access operation. Digitline equilibration is accomplished with one or more NMOS transistors connected between the digitlines. The higher drive strength of an NMOS device produces faster equilibration than a PMOS transistor of comparable size. An equilibration transistor, together with bias transistors, appears schematically in FIG. 20. The gate terminal 36 is connected to a signal called EQ (EQuilibrate). EQ is held at Vcc whenever the external row address strobe (RAS*) is high, indicating an inactive or precharge state for the DRAM. When RAS* falls, EQ will transition low, turning off the equilibration transistor just prior to any wordline firing. Toward the end of each RAS* cycle, EQ will again transition high and force the digitlines to re-equilibrate.
As shown in
Isolation devices are important elements in sense amplifier circuits. Generally implemented as NMOS transistors, isolation transistors are placed between the array digitlines and specific sense amplifier components. As will be understood shortly, there are a multitude of possible configurations for the sense amplifier block. Isolation devices provide two functions. First, if the sense amps are positioned between and connected to two arrays, they allow one of the two arrays to be electrically isolated. This isolation is necessary whenever a wordline fires high in one of the arrays. Isolation of the second array will reduce the total digitline capacitance connected to the sense amplifiers. This speeds read and write time, reduces power consumption, and extends refresh for the isolated array. Second, the isolation devices provide some resistance between the sense amplifier and the array digitlines. This resistance stabilizes the sense amplifiers and speeds up the sensing operation by somewhat separating the high capacitance digitlines from the low capacitance sense nodes. Capacitance of the sense nodes, between isolation transistors, is generally less than 15 fF, permitting the sense amplifier to latch somewhat faster than if solidly connected to the digitlines. The restore operation slows, though, because of the isolation resistance, but this is less important than sensing and stability. Isolation transistors are physically located on both ends of the sense amplifier layout. For quarter pitch sense amplifiers, there is one isolation transistor for every two digitlines. Although this is twice the active area width and space of an array, it nevertheless establishes the minimum isolation used in the pitch cells.
Input/output (I/O) transistors allow data to be read from or written to specific digitline pairs. A single I/O transistor connects to each sense node as shown in FIG. 22. The outputs of each I/O transistor are connected to I/O signal pairs. Commonly, there are two pairs of I/O signal lines permitting four I/O transistors to share a single column select control signal. DRAM designs employing two or more metal layers run the column select lines across the arrays using either metal2 or metal3. Each column select activates four I/O transistors on both sides of an array, permitting the connection of four digitline pairs (columns) to peripheral data path circuits. The I/O transistors are carefully sized to ensure that the I/O bias voltage or remnant voltage on the I/O lines does not introduce instability into the sense amplifiers. Although designs vary significantly as to the numerical ratio, I/O transistors are two to eight times smaller than the N-sense amplifier transistors. This relationship is referred to as beta ratio. A beta ratio between five and eight is common, although proper selection can only be verified with silicon, since simulations fail to adequately predict sense amplifier instability.
The fundamental elements of any sense amplifier block are the N-sense amplifier and the P-sense amplifier. These amplifiers, as previously discussed, work together to detect the access signal voltage and drive the digitlines accordingly to Vcc and ground. The N-sense amplifier, depicted in
While the majority of DRAM designs latch the digitlines to Vcc and ground, a growing number of designs are beginning to reduce these levels. Various technical papers report improved refresh times and lower power dissipation through reductions in latch voltages. At first, this appears contradictory, since writing a smaller charge into the memory cell should produce lower refresh time. The benefits are derived from maintaining lower drain to source voltages (Vds) and negative gate to source voltages (Vgs) across non-accessed memory bit transistors. Lower Vds and negative Vgs translate to substantially lower subthreshold leakage and longer refresh, despite the smaller stored charge. Most designs that implement reduced latch voltages generally raise the N-sense amplifier latch voltage without lowering the P-sense amplifier latch voltage. Designated as boosted sense ground designs, they write data into each memory bit using full Vcc for a logic one and boosted ground for a logic zero. The sense ground level is generally a few hundred millivolts above true ground. In standard DRAMs which drive digitlines fully to ground, the Vgs of non-accessed memory bits becomes zero when the digitlines are latched. This results in high subthreshold leakage for a stored one level, since full Vcc exists across the memory bit transistor while the Vgs is held to zero. Stored zero levels do not suffer from prolonged subthreshold leakage since any amount of cell leakage produces a negative Vgs for the transistor. The net effect is that a stored one level leaks away much faster than a stored zero level. One's level retention, therefore, establishes the maximum refresh period for most DRAM designs. Boosted sense ground extends refresh by reducing subthreshold leakage for stored ones. This is accomplished by guaranteeing negative gate to source bias on non-accessed memory bit transistors. 
The benefit of extended refresh from these designs is somewhat diminished, though, by the added complexity of generating boosted ground levels and the problem of digitlines that no longer equilibrate at Vcc/2 volts.
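The effect of boosted sense ground on a stored one can be illustrated with a rough numerical sketch. It assumes the common first-order model in which subthreshold current falls by a factor of ten for each fixed step of gate-to-source voltage reduction. The swing value and the 0.3 V boosted ground level are hypothetical placeholders, not figures from this specification.

```python
# First-order sketch of subthreshold leakage versus gate-to-source voltage.
# SWING_V_PER_DECADE and the voltages used below are hypothetical values.

SWING_V_PER_DECADE = 0.1  # assumed: 10x less current per 100 mV drop in Vgs


def vgs_non_accessed(wordline_v, digitline_low_v):
    """Vgs of a non-accessed memory bit transistor whose source sits on a low digitline."""
    return wordline_v - digitline_low_v


def relative_leakage(vgs, swing=SWING_V_PER_DECADE):
    """Leakage relative to the Vgs = 0 case under the assumed exponential model."""
    return 10.0 ** (vgs / swing)


standard = vgs_non_accessed(wordline_v=0.0, digitline_low_v=0.0)  # digitline at true ground
boosted = vgs_non_accessed(wordline_v=0.0, digitline_low_v=0.3)   # digitline at boosted ground

print(f"standard ground: Vgs = {standard:+.1f} V, relative leakage = {relative_leakage(standard):.3g}")
print(f"boosted ground:  Vgs = {boosted:+.1f} V, relative leakage = {relative_leakage(boosted):.3g}")
```

Under these assumptions a few hundred millivolts of negative Vgs cuts leakage by orders of magnitude, which is consistent with the longer refresh described above despite the smaller stored charge.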
The rate at which the sense amplifiers are activated has been the subject of some debate. A variety of designs utilizes multistage circuits to control the rate at which NLAT* fires. Especially prevalent with boosted sense ground designs are two stage circuits that initially drive NLAT* quickly toward true ground, to speed sensing, and then bring NLAT* to the boosted ground level to reduce cell leakage. An alternative to this approach, using two stage drivers, first drives NLAT* slowly toward ground to limit current and digitline disturbances. Following this phase is a second phase in which NLAT* drives strongly toward ground to complete the sensing operation. The second phase usually occurs in conjunction with ACT activation. Although these two designs have contrary operation, they each meet specific performance objectives—trading off noise and speed.
The sense amplifier block shown in
A sense amplifier design for use on a single metal DRAM appears in FIG. 26. Prevalent on 1 Mb and 4 Mb designs, single metal processes conceded to multi-metal processes at the 16 Mb generation. Unlike the double metal sense amplifiers shown in
A set of operating signal waveforms appears in
Row Decoder Elements. Row decode circuits are similar to sense amplifier circuits in that they also pitch up to memory bit arrays and have a variety of implementations. A row decode block consists of two basic elements, a wordline driver and an address decoder tree. There are three basic configurations for wordline driver circuits that include the NOR driver, the inverter (CMOS) driver, and the bootstrap driver. Additionally, the drivers and associated decode trees can either be configured as local row decodes for each array section or as global row decodes which drive a multitude of array sections. Global row decodes connect to multiple arrays through metal wordline straps. The straps are stitched to the polysilicon wordlines at specific intervals dictated by the polysilicon resistance and the desired RC wordline time constant. Most processes that strap wordlines with metal do not silicide the polysilicon, although doing so would reduce the number of stitch regions required. Strapping wordlines and using global row decoders obviously reduce die size—in some cases very dramatically. The penalty to strapping is that it requires an additional metal layer and that this layer is at minimum array pitch. This puts a tremendous burden on process technologists in which three conductors are at minimum pitch—wordlines, digitlines, and wordline straps. Distributed row decoders, on the other hand, do not require metal straps, but do require additional die size. It is highly advantageous to reduce the polysilicon resistance in order to stretch the wordline length and reduce the number of needed row decodes especially on large DRAMs such as the 1 gigabit.
The bootstrap wordline driver shown in
The bootstrap driver is turned off by first driving the PHASE0 signal to ground. M1 remains on, since node 131 cannot drop below Vcc−Vth, substantially discharging the wordline toward ground. This is followed by the address decoder turning off, bringing DEC to ground and DEC* to Vcc. With DEC* at Vcc, transistor M2 turns on and fully clamps the wordline to ground. A voltage level translator is required for the PHASE0 signal since it operates between ground and the boosted voltage Vccp. For a global row decode configuration, this is not much of a burden. For a local row decode configuration, the level translators can be very difficult to implement. Generally, these translators are placed in array gaps which exist at the intersection of sense amplifier and row decode blocks, or they are distributed throughout the row decode block itself. The translators require both PMOS and NMOS transistors and must be capable of driving large capacitive loads. Layout of the translators is very difficult, especially since the overall layout must be as small as possible.
The second type of wordline driver, shown in
The final wordline driver configuration seen in
Address decode trees are the final element of the row decode block to be discussed. Decode trees are constructed from all types of logic: static, dynamic, pass gate, or a combination thereof. Regardless of the type of logic used to implement an address decoder, the layout must reside completely beneath the row address signal lines to constitute an efficient, minimal design. In other words, the metal address tracks dictate the die area available for the decoder. For DRAM designs that utilize global row decode schemes, the penalty for inefficient design may be insignificant, but for distributed local row decode schemes, the die area penalty can be significant. As with memory bits and sense amplifiers, any time invested in row decode optimization is well spent.
The simplest type of address decode tree utilizes static CMOS logic. Shown in
The second type of address tree utilizes dynamic logic, the most prevalent being precharge and evaluate (P&E) logic. Used by the majority of DRAM manufacturers, P&E address trees come in a variety of forms, although the differences between one and another can be subtle.
The row address lines shown as RA1-RA3 can be either true and complement address lines or predecoded address lines. Predecoded address lines are formed by logically combining (AND) addresses as shown in Table 1. Advantages to predecoded addresses include lower power, since fewer signals make transitions during address changes, and higher efficiency, since only three transistors are necessary to decode six addresses for the circuit of FIG. 33. Predecoding is especially beneficial for redundancy circuits. Predecoded addresses are used throughout most DRAM designs today.
The final type of address tree, shown in
Architectural Characteristics. A detailed description of the two most prevalent array architectures under consideration for future large scale DRAMs is provided—the aforementioned open digitline and folded digitline architectures. To provide a viable point for comparison, each architecture will be employed in the theoretical construction of 32 Mbit memory blocks for use in a 256 Mbit DRAM. Design parameters and layout rules from a typical 0.25 μm DRAM process provide the needed dimensions and constraints for the analysis. Some of these parameters are shown in Table 2. Examination of DRAM architectures in the light of a real world design problem permits a more objective and unbiased comparison. An added benefit to this approach is that the strengths and weaknesses of either architecture should become readily apparent.
Open Digitline Array Architecture. The open digitline array architecture was the prevalent architecture prior to the 64 kbit DRAM. A modern embodiment of this architecture as shown in
Array core size, as measured in the number of memory bits, is restricted by two factors: a desire to keep the quantity of memory bits binary and practical limits on wordline and digitline length. The need for a binary quantity of memory bits in each array core derives from the binary nature of DRAM addressing. Given N row addresses and M column addresses for a given part, there are a total of $2^{N+M}$ addressable memory bits. Address decoding is greatly simplified within a DRAM if array address boundaries are derived directly from address bits. Since the addressing is binary, the boundaries naturally become binary. Therefore, each array core must necessarily have $2^X$ addressable rows and $2^Y$ addressable digitlines. The resulting array core size is $2^{X+Y}$ memory bits, which is, of course, a binary number. The second set of factors limiting array core size is practical limits on digitline and wordline length. From earlier discussions, the digitline capacitance is limited by two factors. First, the ratio of cell capacitance to digitline capacitance must fall within a specified range to ensure reliable sensing. Second, operating current and power for the DRAM is, in large part, determined by the current required to charge and discharge the digitlines during each active cycle. For the 256 Mbit generation, the digitlines are restricted from having connection to more than 256 rows (128 memory bit pairs) because of these power considerations. Each memory bit connected to a digitline adds capacitance to the digitline. The power dissipated during a read or refresh operation increases with the digitline capacitance ($C_d$), the supply voltage ($V_{cc}$), and the number of active columns ($N$), and decreases with the refresh period ($P$). Accordingly, the power dissipated is given as $P_d = V_{cc} \cdot \frac{N \cdot V_{cc} \cdot (C_d + C_c)}{2 \cdot P}$ watts. On a 256 Mbit DRAM in 8K refresh, there are 32,768 ($2^{15}$) active columns during each read, write, or refresh operation.
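The power expression above can be evaluated with a short sketch. The active column count and the 90 ns period come from the text. The supply voltage, the per-bit digitline capacitance, and the value used for Cc (read here as the cell capacitance, which is an interpretation) are hypothetical placeholders and do not come from Table 2 or Table 3. Treating the active array current as the power divided by Vcc is also an assumption.

```python
# Sketch of the digitline power expression Pd = Vcc * N * Vcc * (Cd + Cc) / (2 * P).
# Capacitance and supply values below are hypothetical placeholders.


def array_core_bits(x_row_bits, y_digitline_bits):
    """Array core size of 2**X rows by 2**Y digitlines, in memory bits."""
    return (1 << x_row_bits) * (1 << y_digitline_bits)


def active_array_current(n_columns, vcc, c_digit, c_cell, period):
    """Assumed reading: current = N * Vcc * (Cd + Cc) / (2 * P), in amperes."""
    return n_columns * vcc * (c_digit + c_cell) / (2.0 * period)


def power_dissipated(n_columns, vcc, c_digit, c_cell, period):
    """Pd = Vcc * N * Vcc * (Cd + Cc) / (2 * P), in watts."""
    return vcc * active_array_current(n_columns, vcc, c_digit, c_cell, period)


N_COLUMNS = 2 ** 15      # 32,768 active columns (from the text)
PERIOD_S = 90e-9         # 90 ns period (from the text)
VCC_V = 3.3              # hypothetical supply voltage
C_PER_BIT_F = 0.8e-15    # hypothetical digitline capacitance per memory bit
C_CELL_F = 30e-15        # hypothetical value for Cc

for bits_per_digitline in (128, 256, 512):
    c_digit = bits_per_digitline * C_PER_BIT_F
    amps = active_array_current(N_COLUMNS, VCC_V, c_digit, C_CELL_F, PERIOD_S)
    watts = power_dissipated(N_COLUMNS, VCC_V, c_digit, C_CELL_F, PERIOD_S)
    print(f"{bits_per_digitline} bits per digitline: {amps * 1e3:.1f} mA, {watts * 1e3:.1f} mW")
```

The loop shows the trend only: longer digitlines add capacitance, so current and power rise with digitline length.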
Active array current and power dissipation for a 256 Mbit DRAM are given in Table 3 for a 90 ns refresh period (−5 timing) at various digitline lengths. The budget for active array current is limited to 200 mA for this 256 Mbit design. To meet this budget, the digitline cannot exceed a length of 256 memory bits.
Wordline length is limited by the maximum allowable RC time constant of the wordline. To ensure acceptable access time for the 256 Mbit DRAM, the wordline time constant should be kept below four nanoseconds. For a wordline connected to N memory bits, the total resistance and capacitance using 0.3 μm polysilicon are $R_{wl} = R_s \cdot N \cdot P_{wl} \div 0.3\,\mu\mathrm{m}$ ohms and $C_{wl} = C_{w6} \cdot N$ farads, respectively. Table 4 contains the effective wordline time constants for various wordline lengths. As shown in the table, the wordline length cannot exceed 512 memory bits (512 digitlines) if the wordline time constant is to remain under four nanoseconds.
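Because both the resistance and the capacitance scale with N, the time constant grows with the square of the wordline length. The sketch below illustrates this, taking the time constant as the simple product of total resistance and total capacitance, which is a simplifying assumption. The sheet resistance, pitch, and per-bit capacitance are hypothetical placeholders rather than values from Table 2, so the printed numbers do not reproduce Table 4.

```python
# Sketch of the wordline RC estimate: Rwl = Rs * N * Pwl / 0.3 um, Cwl = Cw6 * N.
# Parameter values are hypothetical placeholders, not Table 2 data.

LIMIT_S = 4e-9  # four nanosecond limit stated in the text


def wordline_time_constant(n_bits, rs_ohm_per_sq, pitch_um, cw_farad, width_um=0.3):
    """Return Rwl * Cwl in seconds for a wordline connected to n_bits memory bits."""
    r_wl = rs_ohm_per_sq * n_bits * pitch_um / width_um  # total resistance, ohms
    c_wl = cw_farad * n_bits                             # total capacitance, farads
    return r_wl * c_wl


RS = 6.0         # hypothetical sheet resistance, ohms per square
PITCH_UM = 0.6   # hypothetical pitch per memory bit, micrometres
CW6_F = 0.6e-15  # hypothetical wordline capacitance per memory bit, farads

for n in (128, 256, 512, 1024):
    tau = wordline_time_constant(n, RS, PITCH_UM, CW6_F)
    status = "within" if tau < LIMIT_S else "exceeds"
    print(f"N = {n:4d}: tau = {tau * 1e9:.2f} ns ({status} the 4 ns limit)")
```

Doubling the wordline length quadruples the time constant, which is why the limit is reached quickly as N grows.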
The open digitline architecture does not support digitline twisting since the true and complement digitlines which constitute a column are in separate array cores. Therefore, no silicon area is consumed for twist regions. The 32 Mbit array block requires a total of 256 128 Kbit array cores in its construction. Each 32 Mbit block represents an address space comprising a total of 4,096 rows and 8,192 columns. A practical configuration for the 32 Mbit block is depicted in FIG. 36. In this figure, the 256 array cores appear in a 16 by 16 arrangement. The 16 by 16 arrangement produces 2 Mbit sections consisting of 256 wordlines and 8,192 digitlines (4,096 columns). A total of 16 2 Mbit sections are required to form the complete 32 Mbit block. Sense amplifier strips are positioned vertically between each 2 Mbit section. Row decode strips or wordline stitching strips are positioned horizontally between each array core.
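The block arithmetic above can be cross-checked in a few lines. All quantities are taken from the paragraph. The constant and function names are illustrative.

```python
# Consistency check of the 32 Mbit open digitline block arithmetic.
# Numbers come from the text; names are illustrative.

KBIT = 1024
MBIT = 1024 * 1024

CORE_BITS = 128 * KBIT          # one array core
CORES_PER_BLOCK = 256           # arranged 16 by 16
ROWS, COLUMNS = 4096, 8192      # address space of one block
SECTION_WORDLINES = 256
SECTION_DIGITLINES = 8192
SECTIONS_PER_BLOCK = 16


def block_checks():
    """Return the bit counts implied by each description of the block."""
    section_bits = SECTION_WORDLINES * SECTION_DIGITLINES
    return {
        "cores_total_bits": CORE_BITS * CORES_PER_BLOCK,
        "address_space_bits": ROWS * COLUMNS,
        "section_bits": section_bits,
        "sections_total_bits": section_bits * SECTIONS_PER_BLOCK,
    }


for name, bits in block_checks().items():
    print(f"{name}: {bits} bits = {bits / MBIT:g} Mbit")
```

Each way of counting (by cores, by address space, or by sections) gives the same block size, and each section works out to 2 Mbit.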
Layout was generated for the various 32 Mbit elements depicted in
Unfortunately, the architecture presented in
A second approach to solving the array edge problem in open digitline architectures is to maintain the configuration shown in
The presence of metal3 not only enables the sense amplifier layout, but also permits the use of either a full or hierarchical global row decoding scheme. A full global row decoding scheme using wordline stitching places great demands upon metal and contact/via technologies, but represents the most efficient use of the additional metal. Hierarchical row decoding using bootstrap wordline drivers is slightly less efficient, but relaxes process requirements significantly. For a balanced perspective, both approaches, global and hierarchical, were analyzed. The results of this analysis for the open digitline architecture are summarized in Tables 6 and 7, respectively. Based upon data from these tables, array efficiency for the 32 Mbit memory blocks calculates to 60.5 percent and 55.9 percent for global and hierarchical row decoding, respectively.
Folded Array Architecture. The folded array architecture depicted in
Sense amplifier blocks are placed on both sides of each array core. The sense amplifiers within each block are laid out at quarter pitch—one sense amplifier for every four digitlines. Each sense amplifier connects through isolation devices to columns (digitline pairs) from both adjacent array cores. Odd columns connect on one side of the core and even columns connect on the opposite side. Each sense amplifier block is, therefore, connected to only odd or even columns, never connecting to both odd and even columns within the same block. Connecting to both odd and even columns requires a half pitch sense amplifier layout—one sense amplifier for every two digitlines. While half pitch layout is possible with certain DRAM processes, the bulk of production DRAM designs remains quarter pitch due to ease of layout. The analysis presented in this section is accordingly based upon quarter pitch design practices.
Location of row decode blocks for the array core depends upon the number of available metal layers. For one and two metal processes, local row decode blocks are located at the top and bottom edges of the core. Three and four metal processes support the use of global row decodes. Global row decodes require only stitch regions or local wordline drivers at the top and bottom edges of the core. Stitch regions consume much less silicon area than local row decodes, substantially increasing array efficiency for the DRAM. The array core also includes digitline twist regions that run parallel to the wordlines. These regions provide the die area required for digitline twisting. Depending upon the particular twisting scheme selected for a design, the array core will need between one and three twist regions. For the sake of analysis, a triple twist is assumed, since it offers the best overall noise performance and is the choice of DRAM manufacturers on advanced large-scale applications. Each twist region constitutes a break in the array structure, necessitating the inclusion of dummy wordlines. For this reason, there are 16 dummy wordlines (2 for each array edge) in the folded array core rather than 4 dummy wordlines as in the open digitline architecture.
The array core for folded digitline architectures contains more memory bits than is possible for open digitline architectures. Larger core size is an inherent feature of folded architectures arising from the very nature of the architecture. Folded architectures get their name from the fact that a folded array core results from folding two open digitline array cores one on top of the other. The digitlines and wordlines from each folded core are spread apart (double pitch) to allow room for the other folded core. After folding, each constituent core remains intact and independent, except for memory bit changes (8F2 conversion) that are necessary in the folded architecture. The array core size doubles since the total number of digitlines and wordlines doubles in the folding process. It does not quadruple as one might suspect, because the two constituent folded cores remain independent—the wordlines from one folded core do not connect to memory bits in the other folded core. Digitline pairing (column formation) is a natural outgrowth of the folding process since each wordline only connects to memory bits on alternating digitlines. The existence of digitline pairs (columns) is the one characteristic of folded digitline architectures that produces superior signal-to-noise performance. Furthermore, the digitlines that form a column are physically adjacent to one another. This feature permits various digitline twisting schemes to be used which further improves signal-to-noise.
Similar to the open digitline architecture, digitline length for the folded digitline architecture is again limited by power dissipation and minimum cell to digitline capacitance ratio. For the 256 Mbit generation, digitlines are restricted from having connection to more than 256 cells (128 memory bit pairs). The analysis to arrive at this quantity is similar to that for the open digitline architecture. Refer back to Table 3 to view the calculated results of power dissipation versus digitline length for a 256 Mbit DRAM in 8K refresh. Wordline length is again limited by the maximum allowable RC time constant of the wordline. Contrary to an open digitline architecture in which each wordline connects to memory bits on each digitline, the wordlines in a folded digitline architecture only connect to memory bits on alternating digitlines. Therefore, a wordline can cross 1,024 digitlines while only connecting to 512 memory bit transistors. The wordlines will have twice the overall resistance, but only slightly more capacitance since the wordlines run over field oxide on alternating digitlines. Table 8 contains the effective wordline time constants for various wordline lengths for a folded array core. For a wordline connected to N memory bits, the total resistance and capacitance using 0.3 μm polysilicon are Rwl=2·Rs·N·Pwl÷0.3 μm ohms and Cwl=Cw8·N Farads, respectively. As shown in Table 8, the wordline length cannot exceed 512 memory bits (1,024 digitlines) for the wordline time constant to remain under four nanoseconds. Although the wordline connects to only 512 memory bits, it is two times longer (1,024 digitlines) than wordlines in open digitline array cores. The folded digitline architecture, therefore, requires half as many row decode blocks or wordline stitching regions as the open digitline architecture.
A diagram of a 32 Mbit array block using folded digitline architecture is shown in FIG. 40. This block requires a total of 128 256 Kbit array cores. In this figure, the 128 array cores are arranged in an 8 row by 16 column configuration. The 16 column by 8 row arrangement produces 2 Mbit sections consisting of 256 wordlines and 8,192 digitlines (4,096 columns). A total of 16 2 Mbit sections form the complete 32 Mbit array block. Sense amplifier strips are positioned vertically between each 2 Mbit section, as was done in the open digitline architecture. Again, row decode blocks or wordline stitching regions are positioned horizontally between the array cores.
The 32 Mbit array block shown in
Array efficiency for the 32 Mbit memory block from
The addition of metal3 to the DRAM process enables the use of either a global or hierarchical row decoding scheme, similar to the open digitline analysis. While global row decoding and stitched wordlines achieve the smallest die size, they also place greater demands upon the fabrication process. For a balanced perspective, both approaches were analyzed for the folded digitline architecture. The results of this analysis are presented in Tables 10 and 11. Array efficiency for the 32 Mbit memory blocks using global and hierarchical row decoding is calculated to 74.0 percent and 70.9 percent, respectively.
Advanced Bilevel DRAM Architecture. The present invention provides a novel advanced architecture for use on future large scale DRAMs. A 32 Mbit memory block is described with this new architecture for use in a 256 Mbit DRAM. The results achieved with the new architecture are compared to those obtained for the open digitline and folded digitline architectures described above.
The bilevel digitline architecture is an innovation which has created a new DRAM array configuration—one that allows the use of 6F2 memory bits in an otherwise folded digitline array configuration.
6F2 memory cells are a byproduct of cross-point style (open digitline) array blocks. Cross-point style array blocks require that every wordline connect to memory bit transistors on every digitline, precluding the formation of digitline pairs. Yet, digitline pairs (columns) remain an essential element in folded digitline type operation. Digitline pairs and digitline twisting are important features that provide for good signal to noise performance. The bilevel digitline architecture solves the cross-point and digitline pair dilemma through vertical integration. Essentially, two open digitline cross-point array sections 100 are placed side by side as seen in FIG. 41. Digitlines in one array section are designated as true digitlines 106(b) and 104(b) while digitlines from the second array section are designated as complement digitlines 104(a) and 106(a). An additional conductor is added to the DRAM process to complete formation of the digitline pairs. The added conductor allows digitlines from each array section to route across the other array section—both true and complement digitlines being vertically aligned. At the juncture 108 between each section, the true and complement signals are vertically twisted. This twisting allows the true digitline to connect to memory bits in one array section and the complement digitline to connect to memory bits in the other array section. The twisting concept is illustrated in FIG. 42.
To improve signal to noise characteristics of this design, the single twist region is replaced by three twist regions as illustrated in FIG. 43. An added benefit to multiple twist regions is that only half of the digitline pairs actually twist within each region, thus making room in the twist region for each twist to occur. The twist regions are equally spaced at the 25%, 50%, and 75% marks in the overall array. Assuming that even digitline pairs twist at the 50% mark, then odd digitline pairs twist at the 25% and 75% marks. Each component of a digitline pair, true and complement, spends half of its overall length on the bottom conductor connecting to memory bits and half of its length on the top conductor. This characteristic balances the capacitance and the number of memory bits associated with each digitline. Furthermore, the triple twisting scheme guarantees that the noise terms are balanced for each digitline, producing excellent signal to noise performance.
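The triple twist arrangement described above can be modeled as a small simulation that tracks which conductor level the true digitline of a pair occupies in each quarter of the array. This is a sketch under one labeled assumption: the true digitline is taken to start on the bottom conductor, which the text does not specify. The function and constant names are hypothetical.

```python
BOTTOM, TOP = "bottom", "top"

# Twist regions sit at the 25%, 50% and 75% marks, i.e. at the boundaries
# before quarters 1, 2 and 3 (quarters are numbered 0..3).
EVEN_PAIR_TWISTS = (2,)    # even pairs twist at the 50% mark
ODD_PAIR_TWISTS = (1, 3)   # odd pairs twist at the 25% and 75% marks


def true_digitline_levels(pair_index: int, start_level: str = BOTTOM) -> list:
    """Conductor level of the true digitline in each of the four quarters.

    start_level is an assumption; the complement digitline always occupies
    the opposite level.
    """
    twists = EVEN_PAIR_TWISTS if pair_index % 2 == 0 else ODD_PAIR_TWISTS
    level = start_level
    levels = []
    for quarter in range(4):
        if quarter in twists:
            level = TOP if level == BOTTOM else BOTTOM
        levels.append(level)
    return levels


def pairs_twisting_at(boundary: int, n_pairs: int) -> int:
    """Number of digitline pairs that twist at a given quarter boundary."""
    count = 0
    for pair in range(n_pairs):
        twists = EVEN_PAIR_TWISTS if pair % 2 == 0 else ODD_PAIR_TWISTS
        if boundary in twists:
            count += 1
    return count


if __name__ == "__main__":
    for pair in range(4):
        print(f"pair {pair}: true digitline levels = {true_digitline_levels(pair)}")
    for boundary, mark in ((1, "25%"), (2, "50%"), (3, "75%")):
        print(f"{mark} mark: {pairs_twisting_at(boundary, 8)} of 8 pairs twist")
```

In this model every digitline, whether in an even or odd pair, spends exactly two of the four quarters on the bottom conductor, and each twist region handles only half of the pairs, which matches the balance property described in the text.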
A variety of vertical twisting schemes is possible with the bilevel digitline architecture. As shown in
The architectures of
To further advance the bilevel digitline architecture concept, its 6F2 memory bit was modified to improve yield. Shown in arrayed form in
In the bilevel and folded digitline architectures, both true and complement digitlines exist in the same array core. Accordingly, the sense amplifier block needs only one sense amplifier for every two digitline pairs. For the folded digitline architecture, this yields one sense amplifier for every four metal1 digitlines—quarter pitch. The bilevel digitline architecture that uses vertical digitline stacking needs one sense amplifier for every two metal1 digitlines—half pitch. Sense amplifier layout is, therefore, more difficult for bilevel than folded designs. The triple metal DRAM process needed for bilevel architectures concurrently enables and simplifies sense amplifier layout. Metal1 is used for lower level digitlines and local routing within the sense amplifiers and row decoders. Metal2 is available for upper level digitlines and column select signal routing through the sense amplifiers. Metal3 can, therefore, be used for column select routing across the arrays and control and power routing through the sense amplifiers. The function of metal2 and metal3 can easily be swapped in the sense amplifier block depending upon layout preferences and design objectives.
Wordline pitch is effectively relaxed for the plaid 6F2 memory bit used in the bilevel digitline architecture. The memory bit is still built using the minimum process feature size of 0.3 μm. The relaxed wordline pitch stems from structural differences between a folded digitline memory bit and an open digitline or plaid memory bit. There are essentially four wordlines running across each folded digitline memory bit pair compared to two wordlines that run across each open digitline or plaid memory bit pair. Although the plaid memory bit is 25% shorter than a folded memory bit (3 features versus 4 features), it also has half as many wordlines, effectively reducing the wordline pitch. This relaxed wordline pitch makes layout much easier for the wordline drivers and address decode tree. In fact, both odd and even wordlines can be driven from the same row decoder block, thus eliminating half of the row decoder strips in a given array block. This is an important consideration since the tight wordline pitch for folded digitline designs necessitates separate odd and even row decode strips.
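The pitch argument above can be made concrete by working in units of the minimum feature size F. The sketch below assumes that a memory bit pair is twice the length of a single bit along the digitline (4 features per folded bit and 3 features per plaid bit, as stated in the text) and that the wordlines crossing a pair share that length evenly. Both are simplifying assumptions, and the helper names are hypothetical.

```python
FEATURE_UM = 0.3  # minimum process feature size quoted in the text


def cell_area_um2(area_factor: int, feature_um: float = FEATURE_UM) -> float:
    """Area of a k*F^2 memory bit, e.g. k = 6 for 6F2 and k = 8 for 8F2."""
    return area_factor * feature_um ** 2


def wordline_pitch_features(bit_length_features: int, wordlines_per_pair: int) -> float:
    """Average wordline pitch, in features, across one memory bit pair.

    Assumes the pair is two bits long and the wordlines are evenly spread.
    """
    return 2 * bit_length_features / wordlines_per_pair


if __name__ == "__main__":
    folded_pitch = wordline_pitch_features(bit_length_features=4, wordlines_per_pair=4)
    plaid_pitch = wordline_pitch_features(bit_length_features=3, wordlines_per_pair=2)
    print(f"folded wordline pitch: {folded_pitch} F")
    print(f"plaid wordline pitch:  {plaid_pitch} F")
    print(f"8F2 cell area: {cell_area_um2(8):.2f} um^2")
    print(f"6F2 cell area: {cell_area_um2(6):.2f} um^2")
```

Under these assumptions the plaid bit is 25% shorter yet its wordlines sit on a 3F pitch rather than 2F, which is the relaxation the paragraph refers to.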
The bilevel digitline array architecture depicted in
Sense amplifier blocks are placed on both sides of each array core. The sense amplifiers within each block are laid out at half pitch—one sense amplifier for every two metal1 digitlines. Each sense amplifier connects through isolation devices to columns (digitline pairs) from two adjacent array cores. Similar to the folded digitline architecture, odd columns connect on one side of the array core and even columns connect on the other side. Each sense amplifier block is then exclusively connected to either odd or even columns, never both.
Unlike a folded digitline architecture that needs a local row decode block connected to both sides of an array core, the bilevel digitline architecture only needs a local row decode block connected to one side of each core. As stated earlier, the relaxed wordline pitch allows both odd and even rows to be driven from the same local row decoder block. This feature helps make the bilevel digitline architecture more efficient than alternative architectures. A four metal DRAM process allows local row decodes to be replaced by either stitch regions or local wordline drivers. Either approach could substantially reduce die size. The array core also includes the three twist regions that are necessary for the bilevel digitline architecture. The twist region is somewhat larger than that used in the folded digitline architecture, due to the complexity of twisting digitlines vertically. The twist regions again constitute a break in the array structure, necessitating the inclusion of dummy wordlines.
As with the open digitline and folded digitline architecture, the bilevel digitline length is limited by power dissipation and minimum cell to digitline capacitance ratio. In the 256 Mbit generation, the digitlines are again restricted from having connection to more than 256 memory bits (128 memory bit pairs). The analysis to arrive at this quantity is the same as that for the open digitline architecture, except that the overall digitline capacitance is higher since the digitline runs equal lengths in metal2 and metal1. The capacitance added by the metal2 component is small compared to the metal1 component since metal2 does not connect to memory bit transistors. Overall, the digitline capacitance increases by about 25 percent compared to an open digitline. The power dissipated during a read or refresh operation is proportional to the digitline capacitance (Cd), the supply voltage (Vcc), the number of active columns (N), and the refresh period (P) and is given as Pd = Vccx·(N·Vcc·(Cd+Cc)) ÷ (2·P) watts. On a 256 Mbit DRAM in 8K refresh there are 32,768 (2^15) active columns during each read, write, or refresh operation. Active array current and power dissipation for a 256 Mbit DRAM are given in Table 12 for a 90 ns refresh period (−5 timing) at various digitline lengths. The budget for active array current is limited to 200 mA for this 256 Mbit design. To meet this budget, the digitline cannot exceed a length of 256 memory bits.
Wordline length is again limited by the maximum allowable RC time constant of the wordline. The calculation for bilevel digitline is identical to that performed for open digitline due to the similarity of array core design. These results are given in Table 4 above. Accordingly, the wordline length cannot exceed 512 memory bits (512 bilevel digitline pairs) if the wordline time constant is to remain under the required four nanosecond limit.
Layout of various bilevel elements was generated to obtain reasonable estimates of pitch cell size. These size estimates allow overall dimensions for a 32 Mbit array block to be calculated. The diagram for a 32 Mbit array block using the bilevel digitline architecture is shown in FIG. 47. This block requires a total of 128 256 Kbit array cores. The 128 array cores are arranged in 16 rows and 8 columns. Each 4 Mbit vertical section consists of 512 wordlines and 8,192 bilevel digitline pairs (8,192 columns). A total of eight 4 Mbit strips are required to form the complete 32 Mbit block. Sense amplifier blocks are positioned vertically between each 4 Mbit section. Row decode strips are positioned horizontally between every array core. There are only a total of eight row decode strips needed for the sixteen array cores since each row decode contains wordline drivers for both odd and even rows.
The 32 Mbit array block shown in
With metal4 added to the bilevel DRAM process, the local row decoder scheme can be replaced by a global or hierarchical row decoder scheme. The addition of a fourth metal to the DRAM process places even greater demands upon process technologists. Regardless, an analysis of 32 Mbit array block size was performed assuming the availability of metal4. The results of the analysis are shown in Tables 14 and 15 for the global and hierarchical row decode schemes. Array efficiency for the 32 Mbit memory block using global and hierarchical row decoding calculates to 74.5 percent and 72.5 percent, respectively.
Architectural Comparison. Although a straight comparison of DRAM architectures might appear simple, in actual fact it is a very complicated problem. Profit remains the critical test of architectural efficiency and is the true basis for comparison. This in turn requires accurate yield and cost estimates for each alternative. Without these estimates and a thorough understanding of process capabilities, conclusions are elusive and the exercise remains academic. The data necessary to perform the analysis and render a decision also varies from manufacturer to manufacturer. Accordingly, a conclusive comparison of the various array architectures is not possible. Rather, the architectures will be compared in light of the available data. To better facilitate a comparison, the 32 Mbit array block size data is summarized in Table 16 for the open digitline, folded digitline, and bilevel digitline architectures.
From Table 16 it can be concluded that overall die size (32 Mbit Area) is a better metric for comparison than array efficiency. For instance, the triple metal folded digitline design using hierarchical row decodes has an area of 34,089,440 μm2 and an efficiency of 70.9%. The triple metal bilevel digitline design with local row decodes has an efficiency of only 63.1%, but an overall area of 28,732,296 μm2. Array efficiency for the folded digitline is higher, but this is misleading, since the folded digitline yields a die that is 18.6% larger for the same number of conductors. Table 16 also illustrates that the bilevel digitline architecture always yields the smallest die area, regardless of the configuration. The smallest folded digitline design at 32,654,160 μm2 and the smallest open digitline design at 29,944,350 μm2 are still larger than the largest bilevel digitline design at 28,732,296 μm2. Also apparent is that the bilevel and open digitline architectures both need at least three conductors in their construction. The folded digitline architecture still has a viable design option using only two conductors. The penalty to two conductors is, of course, a much larger die size—a full 41% larger than the triple metal bilevel digitline design.
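The area comparison above can be reproduced directly from the figures quoted in the paragraph. The sketch below uses only those four areas and two efficiencies; the dictionary keys and the helper name are hypothetical labels, and no other Table 16 entries are assumed.

```python
# 32 Mbit block areas in square micrometers, as quoted in the text.
AREA_UM2 = {
    "folded, triple metal, hierarchical row decode": 34_089_440,
    "bilevel, triple metal, local row decode": 28_732_296,
    "smallest folded design": 32_654_160,
    "smallest open design": 29_944_350,
}

# Array efficiencies in percent, as quoted in the text.
EFFICIENCY_PCT = {
    "folded, triple metal, hierarchical row decode": 70.9,
    "bilevel, triple metal, local row decode": 63.1,
}


def percent_larger(area: int, reference: int) -> float:
    """How much larger 'area' is than 'reference', in percent."""
    return (area / reference - 1.0) * 100.0


if __name__ == "__main__":
    bilevel = AREA_UM2["bilevel, triple metal, local row decode"]
    for name, area in AREA_UM2.items():
        if area != bilevel:
            print(f"{name}: {percent_larger(area, bilevel):.1f}% larger than bilevel")
```

The folded hierarchical design comes out 18.6% larger than the bilevel local-decode design despite its higher array efficiency (70.9% versus 63.1%), which is the reason the text prefers die area over efficiency as the comparison metric.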
Conclusion. A novel bilevel digitline architecture for use on advanced DRAM designs has been described. The bilevel digitline architecture achieves significant reductions in die size while maintaining the high signal to noise performance of traditional folded digitline architectures. The bilevel digitline uses vertically stacked digitline pairs connected to arrays of 6F2 or smaller memory cells. Vertical digitline twisting ensures balanced noise cancellation and equalizes the quantity of memory cells contacting each digitline. DRAM die size reduction results primarily from the use of smaller memory cells in cross-point style arrays and secondarily from efficient pitch cell utilization. Overall, the bilevel digitline approach presented combines the best characteristics of both folded and open digitline architectures into an efficient new DRAM architecture.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiment shown. This application is intended to cover any adaptations or variations of the present invention. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.