CA2044689A1

CA2044689A1 - Multilevel inclusion in multilevel cache hierarchies

Info

Publication number: CA2044689A1
Application number: CA002044689A
Authority: CA
Inventors: Roger E. Tipley
Original assignee: Roger E. Tipley; Compaq Computer Corporation
Current assignee: Compaq Computer Corp
Priority date: 1990-06-15
Filing date: 1991-06-14
Publication date: 1991-12-16
Also published as: JPH04233048A; DE69130086T2; EP0461926A3; EP0461926A2; DE69130086D1; EP0461926B1; ATE170642T1; US5369753A

Abstract

MULTILEVEL INCLUSION IN
MULTILEVEL CACHE HIERARCHIES
Abstract A method for achieving multilevel inclusion in a computer system with first and second level caches.
The caches align themselves on a "way" basis by their respective cache controllers communicating with each other as to which blocks of data they are replacing and which of their cache ways are being filled with data.
On first and second level cache read misses the first level cache controller provides way information to the second level cache controller to allow received data to be placed in the same way. On first level cache read misses and second level cache read hits, the second level cache controller provides way information the first level cache controller, which ignores its replacement indication and places data in the indicated way. On processor writes the first level cache controller caches the writes and provides the way information to the second level cache controller which also caches the writes and uses the way information to select the proper way for data storage. An inclusion bit is set on data in the second level cache that is duplicated in the first level cache. Multilevel inclusion allows the second level cache controller to perform the principal snooping responsibilities for both caches, thereby enabling the first level cache controller to avoid snooping duties until a first level cache snoop hit occurs. On a second level cache snoop hit, the second level cache controller checks the respective inclusion bit to determine if a copy of this data also resides in the first level cache. The first level cache controller is directed to snoop the bus only if the respective inclusion bit is set.

Description

8 ~

NULTILEVEL INC~USION IN MULTILEVEL CACHE
HIERARCHIES

The present invention relates to micropxocessor cache s~bsystems in computer systems, and more specifically to a method for achieving multilevel inclusion among first level and second level caches in a computer system so that the second level cache controller can perform the principal snooping responsibilities for both caches.

The personal computer industry is a vibrant and growing field that continues to evolve as new innovations occur. The driving force behind this innovation has been the increasing demand for faster and more powerful computers. A major bottleneck in personal computer speed has historically been the speed with which data can be accessed from memory, referred to as the memory access time. The microprocessor, with its relatively fast processor cycle times, has generally been delayed by the use of wait states during memory accesses to account for the relatively slow memory access times. Therefore, improvement in memory access times has been one of the major areas of research in enhancing computer performance.

,. ' -, - .' ., ' ~ ' ' ~ . , -, :' ' ~ ' ., ~4fi89 In order to bridge the gap between fast processor cycle times and ~low memory access times, cache memory was developed. A cache is a small ~mount oP Yery fast, and expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main me~ory. The microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during me~ory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read hit takes place, and the data from the m~mory access can be returned to the processor from the cache without incurring wait states. If the data is not in the cache, then a cache read miss takes place, and the memory request is forwarded to the system and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor.
An efficient cache yields a high "hit rate", which is the percentage o~ cache hits that occur during all memory accesses. When a cache has a high hit rate, the majority of memory accesses are serviced with zero wait states. The net effect of a high cache hit rate is that the wait states incurred on a xelatively infrequ~nt ~iss are avera~ed over a large number of zero wait state cache hit accesses, resulting in an average of nearly zero wait states per access. Also, since a cache is usually located on the local bus of the ~icroproc:essor, cache hits are serviced locally without rec~iring use of the system bus. Therefore, a processor operating out of its local cache has a much lower "bus ut:ilization." This reduces syste~ bus 2 ~ 8 9 bandwidth used by the processor, making more bandwidth available for other bus mas1:ers.
Another important feature o~ caches is that the processor can operate out of its loca' cache when it does not have control of the! ~ystem bus, ~here~y increasing the efficiency of the computer ~ystem. In systems without microprocessor caches, the processor generally must remain idle while it does not have control of the system bus. This reduces the overall efficiency of the computer system because the processor cannot do any useful work at this time. Rowever, if the processor includes a cache placed on its local bus, it can retrieve the necessary code and data ~rom its cache to perform useful work while other devices have control of the system bus, thereby increasing system efficiency.
Cache performance is dependent on many factors, including the hit rate and the cache memory access time. The hit rate is a measure of how efficient a cache is in maintaining a copy of the most frequently used code and data, and, to a large extent, it is a function of the size of the cache. A larger cache will g~nerally have a higher hit rate than a ~maller cac~e.
Increasing the size of the cache, however, can possibly degrade the cache memory access time. ~owever, cache designs for a larger cache can be achieved using cache memory with the fastest possible access ti~es such that the limiting factor in the design ~s the minimum CPU
access time. In this way, a larger cache would not be penalized by a possibly slower cache ~emory access time with respecl: to the memory access time of a smaller cache because the limiting factor in the design would be the minimum CPU access time.
Other i.mportant considerations in cache performance are the organization of ~he cache and the .. . . . .
.

' ' , ..... ..

~0~89 cache management policies that are employed in the cache. A cache can generally be organized into either a direct-mapped or set-associative configuration. In a direct-~apped organization, the physical address space of the computer i& conceptually divided up into a number of equal pages, with the page ~ize e~ualing the size of the cache. The cas~he i~ divided up into a number of ~ets, with each E;et having a certain number o~ lines. Each of the pages in main memory Aas a number of lines eguivalent to the ~umber of lines in the cache, and each line from a respective page in main memory c~rresponds to a similarly located line in the cache. An important characteristic of a direct-mapped cache is that ~ach ~emory line fro~ a page in main memory, referred to as a page offset, can only reside in the eguivalently located line or page o~fset in the cache. Due to this restriction, the cache only need refer to a certain number of the upper address bits of a memory address, referred to as a ~ag, to determine if a copy of the data from the respective memory addr~ss resides in th~ cache because the lower order address bits are pre-determined by the page offset of the memory address.
Whereas a direct-mapped cache is vrganized as one bank o~ memory that is equivalent in size to a conceptual page in main memory, a set-associative cache includes a number of banks, or way~, of memory that are each equivalent in size to a conceptual page in main memory. Accordingly, a page o~fset in main memory can be mapped to a number of locations in the cache equal to the number of ways in the cache. For example, in a 4-way set associative cache, a line or page offset from main memory can reside in the equivalent page offset location in any of the four ways of the cache.

, . , ., ' ' , .. ~ , ~ . ' - ~ .. : . . . : .:

-- .

2 ~ Ll 4 6 8 9 A set-associative cache generally includes a replacement algorithm that determine~ which bank, or way, with which to fill data when a read ~i85 occurs.
Many set-associative cach~s use some form of a least recently used (LRU) algorithm that places new data in the way.that was least recent:ly acce~sed. This is because, statistically, the way most recently used or accessed to provide data to the processor is the one most likely to ba need~d again in the future.
Therefore, the LRU algorithm ensures that the block which is replaced is the least likely to have data requested by the cache.
Cache management is generally performed by a device referred to as a cache controller. The cache controller includes a directory that holds an associated entry for each ~et in the cache. This entry generally has three components: a tag, a tag ~alid bit, and a number of line valid bits equaling the number of lines in each cache set. The tag acts as a main memory page number, and it holds the upper address bits of th~
particular page in main memory from which the copy of data residing in the respective set of the cache originated. The status of the tag valid bit determines whether the data in the respective set of the cache is considered valid or invalid. If the tag valid bit is clear, then the entire set is considered inval~d. If the tag valid bit is true, then an individual line within the set is considered valid or invalid depending on the status of its respective line valid bit.
A principal cache management policy is the preservation of cache coherency. Cache coherency refers to the reguirement that any copy of data in a cache must be identical to (or actually be) the owner of that location's data. The owner of a location'~
data is generally defined as the respective location . ~ . - - ~ . . . -,: ~

.
. . , ~ .

2~4'~6~9 ~;
having the most recent version ~f the data residing in the respective memory location. Th~ owner of data can be either an unmodified location in main memory, or a modified location in a write-~back cache. In co~puter systems where independent bus masters can access memory,.there is ~ possibility that a bus master, such as a direct ~emory access conl:roller, network or disk interface card, or video graphics card, ~ight alter the contents of a main memory location that is duplicated in the cache. When this occurs, the cache is 6aid to hold nstale" or invalid data. In order to ~aintain cache coherency, it is necessary for the cache controller to monitor ths syste~ bus when the processor does not own the system bus to see if another bus master accesses main me~oryO This method o~ monitoring the bus is referred to as snooping.
The cache controller must monitor the system bus during memory reads by a bus master in a write-back cache design because of the possibility that a previous processor write may have altered a copy of data in the cache that has not been updated in main me~ory. This is referred to as read snooping. On a read snoop hit where the cache contains data not yet updated in main ~emory, the cache controller generally provides the respective data to main memory, and the requesting bus master generally reads this data en route from the cache controller to main memory, this operation being referred to as snarfing. The cache controller must also monitor the system bus during memory writes because the bus master ~ay write to or alter a memory location that resides in the cache. This is referred to as write snooping. On a write snoop hit, the cache entry is either marked invalid in the cache directory by the cache controller, signifying that this entry is no longer correct, or ~he cache is updated along with , ~
- , " . ,: . ' ' ' : ' . ' ' ' ' :

2 ~ 9 main memory. Therefore/ when a bus master reads or writes to ~ain memory in a write-back cache design, or writes to main memory in a write-throu~h cache design, the cache controller ~ust latch ~h~ 8y6tem address and perform a cache look-up in thle tag directory corresp~nding to the page offlset locat~on where the memory access occurred to 6ee if the main memory location being accessed also resides in the cache. If a copy cf the data from this :Location does reside in the cache, then the ~ache controller takes the appropriate ~ction depending on whether a read ~r write snoop hit has occurred. This prevent~ lncompatible data from being storad in ~ain memory and the cache, thereby preserving cache coherency.
Another consideration in the preservation of cache coherency is the handling o~ processor writes to memory. When the processor writes to main memory, the memory location must be checked to determine i~ a copy of the data from this location al~o resides in the cache. If a processor write hit occurs in a write-back cache design, then the cache location is updated with the new data and main memory may be updated with the new data at a later time or should the need arise. In a write-through cache, the main memory location is generally updated in conjunction with the cache location on a processor write hit. If a processor write miss occurs, the cache controller may ignore the write miss in a write-through cache design because the cache is unaffected in this design. Alternatively, the cache controller may perform a ~write-allocate" whereby the cache controller allocates a new line in the cache in addition to passing the data the data to the ~ain memory. In 2t write-back cache design, the cache controller generally allocates a new line in the cache when a processor write miss occurs. This generally .. ' -, ' :'' -: :
- .
.. . . , ., .
.

inv~lves reading the remaining ~ntrie6 ko fill the line from main memory before or jointly with providing the write data to the cache. ~ain memory is updated at a later time should the need ~rise.
Caches have generally been designed independently of the ~icroprocessor. The ulche is placed ~n the local bus of the microprocessor ~nd interfaced between the processor and the ~ystem bus during the design of the computer syste~. ~owever, with the development of higher transistor ~ensity computer chips, many processors are currently being designed with an on-chip cache in order to ~eet performance goals with regard to memory access times. The on-chip cache used in these processors is generally ~mall, an exemplary size being 8 kbytes in size. The smaller, on-chip cache is generally faster than a large off-chip cache and reduces the gap between fast processor cycle times and the relatively slow access times o~ large caches.
In c~mputer systems that utilize processors with on-chip caches, an external, second level cache is often added to the syste~ to further improve ~emory access time. The second level cache is generally much larger than the on-chip cache, and, when used in conjunction with the on-chip cache, provides a greater overall hit rate than the on-chip cache wGuld provide by itself.
In systems that incorporate multiple levels of caches, when the processor requests data from memory, the on-chip or first level cache is first checked to see if a CQpy of the data resides there. If 80, then a first level cache hit occurs, and the first level cache provides the appropriate data to the processor. If first level cache miss occurs, then the ~econd level cache is then checked. If a second level cache hit occurs, then the data is pr~vided from the second level : ' ' . , ': . . .
,, . : , ' ' ' . . .
. .
.:
: : :, . : . . :. : :
- . . . . . . . .
. - ~ .

cache to the processor. If al second level cache miss occurs, then the data is retrieved from main ~emory.
Write operations are similar, with mix and matching of the operati~ns discussed above being possible.
In multilevel cache systems, it ha~ generally been necessary for each cache to 6noop the system bus during memory writes by ~ther bus masters in order to maintain cache coherency. When the microprocessor does not have control of the ~ystem bus, the cache controllers of both the first level and second level caches are required to latch the address of every memory write and check this address ~gainst the tags in its cache directory. This considerably i~pairs the efficiency of the processor working out of its on-chip cache during this time because it i6 continually being interrupted by the snooping efforts of the cache controller of the on-chip cache. Therefore, the requirement that the cache co~troller of the on-chip cache snoop the system bus for ev~ry memory write degrades system per~ormance ~ecause it prevents the processor from efficiently operating out sf its on-chip cache while it does not have control of the system bus.
In many instances where multilevel cache hierarchies ~xist with multiple processors, a property referred to as multilevel inclusion is desired in the hierarchy. Multilevel inclusion provides that the second level cache is guaranteed to have a copy of what is inside the first level, or on-chip cache. When this occurs, the ~econd level cache is said to hold a supersst of the first level cache. Multilevel inclusion has mostly been used in mult~-processor systems to prev2nt cache coherency problems. Wben multilevel inclusion i~ implemented in multi-processor systems, the higher level caches can shield the lower level caches from cache coherency problems and thereby 2û~6~

prevent unnecessary blind checks and invalidations that would otherwise occur in the lower level cache~ if multilevel inclusion were not implemented.

The present invention includes a ~ethod for achieving multilevel inclu~ion among first ~nd second levsl caches in a computer syE;tem. ~ultilevel inclusion obviates the necessity of the cache lo controller of the first level cache to ~noop the ~ystem bus for every memory write that occurs while the processor is not in control of the system bus because the cache controller of the ~econd level cache can assume this duty for both caches. This ~rees up the first level cache controller and thereby allows the microprocessor to operate more ef~iciently out of the first lev~l cache ~hen it does not have control of the system bus.
The second level cache preferably has a number of ways egual to or greater than the number of ways in the first level cache. The first level and second level caches are 4-way 6et associative caches in the preferred embodiment of the present invention. In this embodiment there i~ a one-to-one correspondence between the cache ways in the first level cache and the cache ways in the second level cache. During a first leYel cacha line fill from main memory, the first level cache controller communicates to the second level cache controller the particular first level cache way in which the data is to be placed so that the second level cache controller can place the data in the corresponding second level cache way. When the second level cache controller is transmitting a copy of data to the first level cache controller, the second level cache controller informs the first level cache - . . .
, '' . , -. ': " ', ' , :- , 2 ~ 8 9 controller which ~econd level cache way the data i8 coming from. The first level cache controller disregards its nor~al replac~ment ~lgorithm and fills the corresponding first level ~ache way. In this ~anner, the first and second level cac~es align themselYes on a way basis. n This ~way~ alignment prevents the second level cache controller from placing data in a different way than the fir t level cache and in the process pos~ibly discarding data that resides in lo the first level cache.
The cache organization of the first level cache according to the present invention is a write-through architecture. On a processor write, the info~mation is preferably written to the first level cache, regardless of whether a write hit or write miss occurs, and external write bus cycles ~re initiated which write the information to the second level cache. The first level cache broadcasts the particular first level cache way where the data was placed to the second level cache controller so that the ~econd level cache controller can place the data in the corresponding second level cache way, thereby retaining the ~way" alignment. The second level cache is preferably a write-back cache according to the preferred embodiment, but could be a write-through cache if desired.
The second level cache controller utilizes an inclusion bit with respect to each line of data in the second level cache in order to remember whether a copy of this data also resides in the first level cache.
~nen a location in the first level cache is replaced, whether concurrently with a second level cache replacement from memory or diractly from the second level cache, the second level cache controller sets an inclusion bit ~or that location in the second level 3S cache to signify that a copy o~ this data is duplicated ~4~9 in the first level cache. ~len this occurs, all other locations in the ~econd level cache that correspond to the 6ame location in the firE;t level cache have their inclusion bits cleared by the second leYel cache controller to ~ignify that the data held in th2se locatio~s does not reside in ~he fir~t level cache.
The second level cache controller performs the principal snooping duties for both caches when the processor does not have control of the system bus.
When a writs snoop hit occurs in the second level cache, the inclusion bit is read by the second level ca~he controller to see whether the first level cache controller must also snoop the memory access. If the inclusion bit is not set, then the first level cache controller is left alone. If the inclusion bit is set, then the secon~ level cache controller directs the first level cache controller to snoop that particular memory access. In this manner, the first level cache controller can neglect its snooping duties until the cecond level cache controller determines that a write snoop hit on the first level cache has actually occurred. This allows the processor to operate more efficiently out of its first level cache when it does not have control of the ~ystem bus.

A better understanding of the invention can be obtained when the ollowing detailed description of the preferred embodiment is considered in conjunction with the following drawings, in which:
Figure 1 is a block diagram of a computer system including ~i:rst and second level caches and implementing multilevel inclusion according to the present invention;

' .

Figure 2 depicts the organization of the 2-way ~et associative C1 cache of Figure l;
Figure 3 depicts the organization of the 2-way 6et associative C2 cache of Figure l;
S Figures 4~ and 4B depict a flowchart illu~trating the ope~ation of cache read hits and mis6es according ~o the present invention; and Figure 5 is a flowch~rt .illustr~ting the operation of read and write snooping according to the present invention.

Referring now to Figure 1, a computer system S is generally shown. Many o~ the details of a computer system that are not relevant to the present invention have been omitted for the purpose of clarity. The computer system S includes a microprocessor 20 that is connected to a first level cache Cl that is preferably locat~d on the same chip 22 as the pro~essor 20. The chip 22 includes a C1 cache controller 30 that is connected to the Cl cache and controls the operation of the C1 cache. The processor 20, the first level cache Cl, and the first level cache controller 30 are connected to a ~ystem bus 24 throu~h a local processor bus 25. A second level cache C2 is connected to the loc~l processor bus 25. A second l~vel cache controller, referred to as the Q cache controller 32, is connected to the C2 cache and the local processor bus 25. Random access memory 26, which is 4 Gigabytes in size according to the present embodiment, and an intelligent bus master 28 are connected to the system bus 24. The rando~ access memory (RAM) 26, includes a system memor~ controller (not shown) that controls the operation of the RAM 26. The RAM 26 and the system memory controller (not shown) are hereinaftar re~erred 2 ~ 9 to as ~ain mem~ry 26. The ~ystem bu~ 24 includes a data bus and ~ 32-bit addre~s bus, the address bus including address bit~ A2 to A31, which allows access to any of 23~ 32-bit doublewords in main memory 26.
The bus ~aster 28 ~ay be any of the type that controls the ~ystem bus 24 when the processor 8yst2m i~ on hold, ~uch as the system direct ~e:mory dcce~ (DMA) controller, a hard elisk inte:rface, a local area n etworX
(LAN) interface or a video graphics proce sor ~ystem.
The Cl and C2 caches are aligned on a Iway~ basis such that a copy of data placed in a particular way in one of the caches can only be placed in a predetermined corresponding way in the other cache. This "way"
alignment requires that the C2 cache have at least as many cache ways as does the Cl ~ache. If the Cl and C2 caches have the sa~e number of ways, then there is a one-to-one correspondence between the cache ways in the Cl cache and the cache ways in the C2 cache. If the C2 cache has more cache ways than the C1 cache, then each 2 0 cache way in the Cl cache corresponds to one or more cache ways in the C2 cache. However, no two Cl cache ways can correspond to the same C2 cache way. This requirement stems from the fact that each memory address has only one possible location in each of the Cl and C~ t:aches. Accordingly, if two Cl cache ways corresponded to a single C2 cache way, then there would be memory address locations residing in the Cl cache that would be incapable of residing in the C2 cache.
The respective C2 cache way location would be incapable of holding the two memory addresses which would reside in each of the respective C1 cache ways that corresponded to the respective C2 cache way location.
The actual si2e of each of the caches is not important for the purposes of the inventionO However, the C2 cache must be at least as large as the Cl cache - ~ .

2 ~ 9 - ~5 -to ~chieve multilevel ~nclusion, and the C2 ca~he is preferably at least four times as large as the Cl cache to provide for an improved cache hit rate. In the preferred embodiment of the present invention, the Cl cache is ~ kbytes in size and the C2 cache i5 prefQrably 512 ~bytes in size. In thi~ embodiment, the Cl cache and th~ C2 cache are each ~-way ~et associative caches. In an alternate e~bodiment of the present invention, the Cl and C2 caches are each 2-way set-associative caches.
Referring now to Figures 2 and 3, conceptual diagrams of the Cl and C2 caches with their respective cache controllers 30 and 32 configured in a 2-way set-associative organization are generally ~hown. The following discussion i~ intended to provide an introduction to the ~itructure and operation of a set-associative cache as well as the relationship between the cache memory, cache directories, and main memory 26. The Cl and C2 caches are discussed with reference to a 2-way set-associative cache organization as a simpler example of the more complex 4-way set-associative cache organization of the preferred embodimentO The special cache controller design considerations that arise in a 4-way set-associative cache organization that do not occur in a 2-way set-associative organization are noted in the ~ollowing discussion.
The Cl cache includes two banks or ways of memory, refarred to as Al and Bl, which are each 4 kbytes in size. Each of the cache ways Al and B1 are organized into 128 set~i, with each set including eight lines 58 of memory stc)rage. Each line includes one 32-bit doubleword, or four bytes of memory. Main memory 2S
is conceptually organi2ed as 22~ pages with a page size of 4 kbytes, which is equivalent to the ~ize of each C1 2 ~ 9 cache ~ay Al and Bl. Each conceptual pa~e in ~ain ~emory 26 includes 1024 line,s, which i8 the 6am number of lines as hav~ ~ach of the cache ways Al and Bl. The unit of transfer between the main m~mcry 26 and the Cl cache is one line.
A particular line locat:ion, or page offset, ~rom each of the pages in main ~eDDOXy 26, maps to the similarly located line in each of the cache ways A1 and Bl. For example, as ~hown in Figure 2 the page offset from each of the pages in main memory 26 that is shaded maps to the equivalently located, and shaded, line offse~ in each of the cache ways Al and B1. In this way, a particular page offset memory location ~rom main memory 26 can only map to one of two locations in the Cl cache, these locations being in each of ~he cache ways Al and B1.
Each of the cache ways Al and Bl include a cache directory, referred to as directory DA1 and directory DBl, respectively, that are located in the Cl cache controller 30 of the Cl cach~. The directories DAl and DBl each include one entry 60 and 62, respectively, for each of the 128 sets in the respective cache way Al and Bl. The cache directory entry ~or each ~et has three components: a tag, a tag valid bit, and eight line valid bits, as shown. The number of line valid ~its equals the nu~ber of lines in each set. The 20 bits in the tag field hold the upper address bits, address bits A12 to A31, of the ~ain memory address location of the copy of data that resides in the respective set of the cache. The upper address bits address the appropriate 4 kbyte conceptual page in main memory 26 where the data in the respective set of the cache is located.
The remaining address bits from this main memory address location, address bits A2 to All, can be partitioned .into a set address field comprising seven , . . .
.
, : - , fi ~ 9 bits, A5 to All, which are us~ed to ~elect one of the 128 ~ets in the Cl cache, ~nd ~ line ~ddress fi~ld comprising 3 bits, A2 to A4, which 2re used to 6elect an individual line from the e:ight line~ in the ~elected set. Therefore, the lower address bit~ A2 through A11 ~erve a~ the "cache address" which directly ~elects one of the line locations in each of the ways A1 and Bl of the Cl cache.
When the microprocessor .initiates a memory read cycle, the address bi~s A5 to A11 are used to select one of the 128 sets, and the address bits A2 to A4 are used to select one of the respective line valid bits within each entry in the respective directories DA1 and DBl from the selected ~et. The lower address bits A2 to A11 are also used to select the appropriate line in the ~1 cache. The cache controller compares the upper address bit tag field of the requested memory address with each of the tags stored in the selected directory entries of the selected set ~or each of the cache ways 20 Al and Bl. At the same ti~e, both the tag valid and line valid bits are ~hecked. I~ the upper address bits match one of the tags, and if bDth the tag valid bit and the ~ppropriate line valid bits are set for the respective cache way directory where the tag match was ~5 made, the result is a cache hit, and the corresponding cache way is direct~d to drive the selected line of data onto the data bus.
A miss can occur in either of two ways. The girst is ~nown as a line miss and occurs when the upper address bits of the reguested memory address match one of the tags in either of the directories DAl or DBl of the selected set and the respective tag valid bit is set, but the respective line valid bit~s) where the requested data resides are clear. ~he second is called a tag miss and occurs when either the upper address ~ .

2 ~ 8 ~

bits of the requested memory ia~dress do not ~atch either of the respective tags in directories DA1 or DB1 of the ~elec~ed 6et where the requested data iB
located, or the respective tag valid ~it for each of the directories DA1 or DBl are not clear.
Th~ C1 cache controller 30 includes a replacemenk algorithm that determines which cache way, Al or B1, in which to place new data. The replacement algorithm used is a least recently used (LRU) algorithm that places new data in the cache way that was least recently accessed by the processor for data. This is because, statistically, the way most recently used is ~h8 way most likely to be needed again in the near future. The Cl cacho controller 30 includes a direct~ry 70 that holds a LRU bit for each set in the cache, and the LRU bit is pointed away from the cache way that was most recently accessed by the processor.
Therefore, if data reguested by the processor resides in way Al, then the LRU bit is pointed toward B1. If the data requested by the processor resides in way Bl, then the LRU bit is pointed toward Al.
In the 4-way set-associative Cl cache organization of the preferred embodiment, a more elaborate LRU or pseudo~LRU replacement algorith~ can be used in the C1 cache controller 30~ The choice of a replacement algorithm is generally irrelevant to the present invention, and it is suggested that an LRU or pseudo-LRU algorithm be chosen to optimize the particular cache design used in the chosen embodiment. One replacement algorithm that can be used in the Cl caehe controller 30 in the 4-way set-associative Cl cache organization of the preferred embodiment is a pseud~-LRU algorithm which operates as follows. The 4-way set-associative Cl cache includes four ways of memory referred to as W0, W1, W2, and W3. Tbree bits, , - ' ' ' -. -.' ., - . : , - . .. . .

.. - : . . : . , - ., . . -. :,, . . ' ~ :, -2 ~ ,g ~

referred to as Xo, X1, and X2, are located in the C1cache controller 30 and are defined for a respective set in each of the ways in the 4-way cl cache. ~hese bits are called LRU bits and are updated for every hit or replace in the Cl cache. If the most recent access in the ~espective ~et was to way Wo or way ~1, then Xo i~ 6et to 1 or a logic high value~ Bit X0 is 6et to 0 or a logic low value if the most recent access was to way W2 or way ~3. If X0 is ~et to 1 and the ~ost lo recent access between way W0 and way Wl was to way Wo, then Xl is set to 1, otherwise Xl is ~et to 0. If X0 is set to 0 and the most recent access between way W2 and way W~ was to way W2, then X2 is set to 1, otherwise X2 is set to 0.
The pseudo LRU replacement mechanis~ works in the following manner. When a line must be replaced in the 4-way C1 cache, the Cl cache controller 30 uses the X0 bit to first select the respactive ways W0 and Wl or W2 and W3 where the particular line relocation candidate that was least recently used is located. The Cl cache controller then utilizes the Xl and X2 bits to determine which o~ the two selected cache ways W0 and W1 or W2 and W3 holds the respective line location that was least recently used, and this line location is marked for replacement.
The Cl cache controller 30 broadcasts its LRU
information to the C2 cache controller 32 on Cl and C2 cache read misses and on processor writes according to the present invention. In this manner, the C2 cache controller 32 is able to place the copy of data that it receives from either the main memory 26 on read misses or from the processor 20 on processor writes into the C2 cache way corresponding to the C1 cache way where the Cl cache controller placed the copy of data, thereby achieving multilevel inclusion, In addition, ':
.

S 8 ~

the Cl cache controller 30 ignores its LRU repl~cement algorithm on a Cl cache read miss and a C2 cache read hit 8~ that the Cl cache controller 30 can place the copy of data that it receives fro~ thc C2 cache controller 32 in the C1 cache way corresponding to the C2 cache way where the read hit occurred.
The 2-way ~;et-associative C2 cache is organized in a manner similar to that of t.he 2-way ~et associative Cl cache. In the preferred e~mbodiment~ the C2 cache preferably comprises 512 kbytes of cache data ~AM.
Referring now to Figure 3, each cache way A2 and B2 in the C2 cache is 256 kbytes in size and includes 8192 sets of eight lines each. The line size in the C2 cache is one 32-bit doubleword, which is the same as that of the C1 cache. The 4 Gigabyte main memory 26 is organized into 214 conceptual pages with each conceptual page being 256 kbytes in size. The number of conceptual pages of main memory 26 f or the C2 cache is less than that o~ the Cl cache because the conceptual page size for the C2 cache is greater than that of the C1 cache. As in the Cl cache, each line location or paqe offset in main memory 26 maps to a similarly located line in each of the cache ways A2 and B2.
The C2 cache controller 32 includes cache way directories DA2 and DB2. The cache way directories DA2 and DB2 have set entries which include 14-bit tag fields, as opposed to the 20-bit tag fields in tha entries of the Cl cache dir~ctories DA1 and DBl. The 14-bit tag fields hold the upper address bits, address bits A18 to A31, that address the appropriate 256 kbyte conceptual page in main memory 26 where the data in the respective set of the cache is located. The remaining address bits" A2 to A17, can be partitioned into a set address field comprising thirteen bits, A5 to Al7, .- - : -': . .',, ' ' '' ' ~
:-% ~ 8 9 - 2~ -which ~re used to select one of the 8192 ~ets in the C2 cache, and a line address field comprising 3 bits, A2 to A4, which are used to ~elect an individual line from the eight lines in the selected set. Therefore, in the C2 cache the lower address bits A2 to A17 serve as the "cache ~ddress" which directly ~elects one of the line locations in each of the ways A2 and B2 of the C2 cache.
The C2 cache controller 32 according to the present invention does not generally reguire a rPplacement algorithm because the C2 cache receives new data only on C1 and C2 cache read misse~ and on processor writes, and in those instances the C2 cache controller receives the way location from the C1 cache controller and ~ust fill the corresponding C2 cache way. Therefore, the C2 cache controlle~ 32 doPs not need a replacement algorithm because the respective C2 cache way where data is placed is deter~ined by the data's way location in the Cl cache. However, if the C2 cache has more ways than has the Cl cache, then the C2 cache controller 32 will require use of a replacement algorithm. In this instance, a C1 cache way will correspond to two or more C2 cache ways.
Accordingly, when the C1 cache controller 30 broadcasts the C1 cache way location to the C2 cache controller 32, the C2 cache controller 32 will need a replacement algorithm in order to decide betwPen the multiple C2 cache ways that correspond to the Cl cache way location in which to place the received data.
The 2-way set-associative Cl and C2 caches are aligned on a ~way~ basis ~uch that the ways Al and B1 in the Cl cach~ have a one-to-one correspondence with the ways A2 ancl B2, respectlvely, o~ the C2 cache. In this manner, a page offset ~rom main memory 26 that is placed in the respective line location ~n a ~ cache 8 ~

way A1 or Bl has only one por~6ible l~>c~tion in the corresponding C2 c2lche way A2 or B2, respectively.
Conversely, ~ respective line location in a C2 cache way ~2 or B2 has only one possible locatic~n in the corresponding Cl cache way Al or B1, respectively.
Howevert because the C2 cache is 6~ times a~ large as the Cl cache, each of the C2 cache ways A2 or B2 hold 64 lines of data that each correspond to, or ct~uld be located in, a ~ingle line or page off~et location in the corresponding C1 cache way Al or Bl. Therefore, the C2 cache controller 32 according to the present invention includes inclusion bits 80 for each of its respective lines. This enables the C2 cache controller 32 to remember whether a copy of data from the respectivç C2 cache line also resides in the corresponding C1 cache line location.
The use o~ inclusion bits B0 allows the C2 cach~
controller 32 to remember which of the 64 lines of data in the respective C2 cache way A2 or B2 that corresponds to a ~ingle rl cache way location holds a copy of data that is dupl icated in that Cl cache location. For example, if a line in the C2 cache receives a copy of data ~rom main memory 26 that was also placed in the Cl cache, or if a line in the C2 cache provides a copy of data that is placed in the Cl cache, then an inclusion bit for the respective C2 cache line is true or ~t to a logic high value, signifying that the respective C2 cache line holds a copy of data that i~ duplicated in the respective Cl cache locat:ion. The other 63 line locations in the C2 cache which correspond to the respective Cl cache locatiion involved in the above operation have their inclusion biits cleared as a reminder that the copy of data that they hold i5 not duplicated in a ~1 ca¢he location. This is important because one of these other ,, . ~ . .

. .

2 ~

63 line locations ~ay hold data that was previously duplicated in the respectiYe Cl cache location before one of the operations ~entionedl above placed new data in the respective Cl cache location, ~nd therefore one of th~se 63 locations may have itfi inclusion bit ~et.
The only inst~nce where one of these other 63 C2 cache locations would not have its inclusion bit set is when the respective C2 cache line location that was involved in the above operation and had its inclusion bit set also held the copy o data that was duplicated in the respective Cl cache location before the operation took place and therefore already had its inclusion bit ~et.
Referring now to Figures 4A ~nd 4B, a flowchart describing the operation of the Cl and C2 caches according to the present invention is shown. It is understood that-numerous of these operations may occur concurrently, but a flowchart format has been chosen to simplify the explanation of the oper~tion. For clarity, the flowchart is shown in two portions, with the interconnections between ~igures 4A and 4B
designated by reference to the circled numbers one and two. Step 100 represents that the computer ystem S is operating or turned onO In ~ome computer systems, the processor is required to have control of the system bus 24 before it may issue ~emory reads or writes.
However, in the system S according to the preferred embodiment the processor 20 is not required to have control of the system bus 24 when it issues memory reads or writes but rather the processor 20 can operate out of the Cl cache and the ~2 cache without re~uiring use of the fiystem bus 24 until z C1 and C2 cache read miss or a processor write beyond any posting depth occurs.
When the processor 20 attempts a main memory read in ctep 102, ~le C1 cache controller 30 first checks 2~6g9 the Cl cache in ~tep 104 to det:ermine if ~ copy of the request~d main memory data reside~ in the C1 cache. If a copy of the requested data does not reside in the Cl cache, then a C1 cache read miss occurs in ~tep 106, S and the read operation i6 passed on to the C2 cache, where the C2 cache controller 32 then checks the C2 cache in step 108. If a copy of the requested data does not reside in the C2 cache, then a C2 cache read miss occurs in step 110, and the operation is passed onto the system ~emory controller to obtain the necessary data from main memory 26.
Main memory 26 provides the requested data to the Cl cache, the C~ cache and the processor 20 in step 112, and the Cl cache controller 30 places the data into one of its cache ways Al or Bl according to its particular replacement algorithm in step 114. The data is placed in the C1 cache because of the statistical likelihood that this data will be requested again soon by the processor 20. The Cl cache controller 30 during this period has been broadcasting to the C2 cache controller 32 the particular C1 cache way A1 or Bl in which it is placing the data, represented in ~tep 118, so that the C2 cache controller 3~ can place th2 data in the corresponding C2 cache way A2 or B2 in step 120.
The C2 cache controller 32 sets the inclusion bit on the respective C2 cache memory location where the data is stored in step 122, signifying that a copy of the data in this location also resides in the C1 cache.
The C2 cache controller 32 also clears the inclusion bits on the o-ther 63 C2 cache locations that correspond to the same page offset location in the Cl cache in step 124 to s:ignify that a copy of the data in these locations does not reside in the Cl cache. Upon completion of the ~emory read, the computer system 3s returns to step 100.

' :' ' " - . '' ' . -- . , 20~8~

~ 25 ~
The above ~equence of events oc~urs on a C1 and C2 cache read miss and also when the computer ~y~tem S is first turned on becau6e the Cl and C2 cachPs ~re both empty at power on of the computer ~y~tem S ~nd C1 and C2 cache misses ar~ ~herefore guaranteed. ~he majority of processor memory reads ~hat occur immediately after power on of the computer ~ystem S will be C1 and C2 cache mi~ses because the Cl and C2 caches are relatively empty at this time. In thi~ manner, the C1 and C2 caches are filled with data and align themselves on a ~way" basis wherein data in a particular way A1 or Bl in the C1 cache is guaranteed to be located in the corresponding cache way A2 or 82 in the C2 cache. In addition, when the computer system S has been operating for a while and a Cl and C2 cache read ~iss occurs, the resulting line fills of data in the C1 and C2 caches are performed as described above and therefore the "way" alignment is maintained.
When the processor 20 initiates a main memory read in ~tep 102 and the C2 cache controller 32 checks the C2 cache in 6tep 108 after a Cl cache miss occurs in step 106, and a copy of the requested data resides in the C2 cache, then a C2 cache hit occurs in ~tep 130.
The C2 cache controller 32 provid~s the requested data to the processor 20 in ~tep 132, and also provides the data to the Cl cache in ~tep 134 due to the statistical likelihood that this data will ~e requested again soon by the processor 20. The C2 cache controller 32 informs the Cl cache controller 30 as to the particular C2 cache way A2 or B2 in which the data is located in the C2 cache in ~tep 136 æo that the Cl cache controller 30 can place the data in the corresponding Cl cache way Al or B1 in ctep 138. Thi6 requires that the Cl cache controller 30 disregard its normal LRU
replacement algo~m ~ecx~6e tha replacement algorithm .: . . . .
.

may choose a di~ferent C1 cache way A1 or Bl in which to place the data. In this manner, the Cl and C2 caches maintain their away" nlignment without a requirement for the C2 cache controll~r 32 to transfer data ~etween the ways in the C2 cache. The C2 cache control~er 32 ~e ~ the inclusion hit on the C2 cache location where the requested data i~ located in ~tep 140, ~ignifying that a copy o:E this data also resides in the Cl cache. The C2 cache controller 32 also lo clears the other 63 inclusion bits on the C2 cache memory locations that cvrrespond to the same page offset location to . ignify that a copy of the data in these locations does not reside in the C1 cache. The computer system S is then ~inished with the memory read and returns to ~tep 100.
When the processor 20 initiates a ~emory read in step 102 and checks the contents of the C1 cache in step 104 to deter~ine if a copy of the requested data resides there, and a copy of the requested data does reside in the Cl cache, then a Cl ~ache hit takes place in step 150. The Cl cache controller 30 provides the requested data to the processor 20 in step 152, ~nd operation of the computer system S is resumed in step 100. Since multilevel inclusion exists in the ca~he subsystem, the CZ cache is guaranteed to have a copy of the data that the Cl cache controller 30 provided to the processor 20, and no transfer of data from the Cl cache contxoller 30 to the C2 cache controller 32 is necessary when a C1 cache read hit takes place.
The cache architecture of the C1 cache in the preferred embodiment i6 preferAbly a writ6!-through cache architecture and the cache architecture of the C2 cache is preferably a write-back cache architecture.
However, the use of other cache architectures for the Cl cache and the C2 cache is also cont~mpla~ed. When .. . , . , . .. . ,.:, , , : - . . . . .. . . . .

: . : - - ~. , , :
. .. . .. .- . ~
.

2~6~

the proce~sor 20 performs a memory write operation, the data is written into the C1 cache, regardless of whether the processor write is a Cl cache ~rite hit or write miss. In addition, proce6sor writPs initiate external write bus cycles to ~rite the respective data into the C2 cache. Nhen thi~ occurs, the C1 cache controller 30 broadcasts the particular Cl cache way where the data was placed 60 that the C2 cache controller 32 can place the data in the correspo~ding C2 cache way. Therefore, the Cl and C2 caches allocate write ~isses accordinq to the present inve~tion. ~t is preferred that the Cl and C2 either both allocate write misses or both do not allocate write ~isse~. If the C1 cache were to not allocate writes and the C2 cache were to allocate writes, the designs would be more complicated. The C2 cache controller 32 would require an LRU algorith~ and would need to insure that if the C2 cache controller LRU algorithm selected a particular C2 cache way that contains a copy of data that is duplicated in the Cl cache, the LRU algorithm would be overridden or the caching absrted so that ~ultilevel inclusi~n remained guaranteed.
~eferring now to ~igure 5, when the intelligent bus master 28 gains control of the system bus 24 in step 200, the C2 cache controller 3~ watches or "snoops" the system bus 24 in step 202 to see if the bus master 28 performs any writes, and reads in the case of a write-back cache, to main ~emory 26, and, if 60, which memory location is being accessed. The C2 cache controller 32 can perfor~ the snooping responsibilities for both the Cl and C2 caches because the C2 cache is guaranteed to have a copy of all the data that resides in the Cl cache due to the ~ultilevel inclusion, ,:,` : , .
' .. . .' ' .
':

2~4~6~9 - 2~ -I~ the bus ~aster 28 writes to main ~amo~y 26 in step 204 and a write ~noop hit occurs in the C2 oache in step 206, then the C2 cache controller ~2 checXs the inclusion bit for the respective C2 cache location to see whether the Cl cache cont:roller 30 ~ust ~lso snoop the mem~ry access in step 208. If the inclusion bit is not ~et in ~tep 208, then a copy of the data Prom the memory location ~eing written to doe~ not reside in the Cl cache, and the C1 cache controller 30 is left alone.
In this case, the C2 cache receives the new copy of data in ~tep 210 and the C2 cache controller 32 resumes its snooping duties in ~tep 202. If the inclusion bit on the C2 cache memory location is set in step 208 after a snoop hit in step 206, then the C2 cache controller directs the C1 cache controller 30 to snoop that particular memory access in step 212. In step 214, the C1 and C2 caches each receive a copy of the new data, and the C2 cache controller 32 resumes its snooping duties in ~tep 202. I~ a snoop miss occurs in step 206 after the bus master 28 writes to a memory location in ctep 204, then the C2 cache controller 32 resumes its snooping duties in ~tep 202. The C2 cache controller 32 continues to snoop the system bus 24 in step 202 until the bus ~aster 28 is no longer in control of the system bus 24.
I~ the bus master 28 reads a main memory locatisn in step 204 and a read snoop hit occurs in the C2 ca~he in step 220, then the C2 cache controller 32 checks the C2 cache location in step ~22 to determine i~ it is the owner of the respective memory locationL I~ not, then main memory 26 or other source services the d~ta request, and the C2 cache controller 32 resumes snooping in ~tep 202. I~ the C2 cache controller 32 is the owner of the memory location, then the C2 cache controller 32 provides the requested data ta ~ain .~
.
.
. .
,: .
,. . .
, , : -: . ' ,. .
- :.. : ' ' " ' ............ . .
.

fi ~ ~

memory 26 in ~tep 224. The bus master 28 reads this data in step 226 when the data has been placsd on the data bus, this being referred to ~s snarfing. The C2 cache controller 32 then resumes its ~nooping duties in step 202. I~ ~ snoop miss occurs in ~tep 220 after the bus ma~er 28 reads a mPmory location in step 204~ then the C2 ~ache controller 32 resumes its ~nooping duties in step 2 02 .
In this manner, the Cl cache controller 30 can neglect its ~nooping duties until the C2 cache controller 32 determines that a ~noop hit on data held in the Cl cache has actually occurred~ This allows the processor 20 to operate more e~ficiently out o~ the Cl cache while it does not have control OI the syste~ bus 24 because the C1 cache controller 30 only has to snoop the system bus 2 4 when a C1 cache snoop hit occurs, not on every memory write as it normally would.
The foregoing disclosure and description of the invention are illustrative and explanatory thereof, and various changes in the ~ize, components, construction and method of operation may be made without departing ~rom the spirit o~ the invention.

, ,' : , . ' "
. .
' ' . ' ' .
.

.

Claims

1. A method for achieving multilevel inclusion in a computer system having a microprocessor, a system bus, a first level set associative cache memory including a first number of ways, a first level cache controller, a second level set associative cache including a number of ways equal to or greater than the first number of ways of the first level cache, wherein each of the ways in the first level cache corresponds to at least one way in the second level cache, a second level cache controller, means coupled to the second level cache controller for setting and clearing an inclusion bit on data inside the second level cache, means coupled to the first and second level cache controllers for communicating and transmitting data between the first level and second level caches, a bus master device, and random access memory, the method comprising:
the first level cache controller communicating to the second level cache controller the particular first level cache way in which a copy of data received from the random access memory is placed on a first level and second level cache read miss;
the second level cache controller placing the copy of data received from the random access memory in the second level cache way corresponding to the first level cache way communicated by the first level cache controller on the first level and second level cache read miss;
the second level cache controller communicating to the first level cache controller the particular second level cache way where a copy of data is located on a first level cache read miss and second level cache read hit;

the first level cache controller placing the copy of data transmitted from the second level cache controller to the processor in the corresponding first level cache way; and the second level cache controller setting an inclusion bit on the second level cache location of the copy of data and clearing inclusion bits on any other second level cache locations that correspond to the first level cache location where the first level cache controller placed the copy of data.

2. The method of claim 1, wherein the first level cache controller includes a replacement algorithm that determines which first level cache way in which to place a received copy of data, the step of the first level cache controller copying the data into the first level cache way corresponding to the second level cache way including:
the first level cache controller disregarding its replacement algorithm on first level cache read miss and second level cache read hit cases.

3. The method of claim 1, further comprising:
the first level cache controller communicating to the second level cache controller the particular first level cache way in which a copy of received data is placed on a processor write; and the second level cache controller placing the copy of received data in the second level cache way corresponding to the first level cache way communicated by the first level cache controller.

4. The method of claim 1, wherein greater than one way in the first level cache cannot correspond to than one cache way in the second level cache can correspond to one way in the first level cache.

5. The method of claim 1, further comprising:
the second level cache controller snooping the system bus when the processor does not have control of the system bus to determine if the bus master device is writing to a cached memory location;
the second level cache controller checking the inclusion bit on a second level cache location where a second level cache write snoop hit occurs to determine if a copy of data from the random access memory location being written to resides in the first level cache; and the second level cache controller directing the first level cache controller to snoop the system bus if said inclusion bit is set.

6. The method of claim 5, wherein the second level cache is a write-back cache, the method further comprising:
the second level cache controller snooping the system bus when the processor does not have control of the system bus to determine if the bus master device is reading a cached memory location;
the second level cache controller determining if the second level cache has an updated version of the data residing in the requested memory location on a second level cache read snoop hit;
the second level cache controller providing the requested data to main memory if the second level cache has an updated version of the data; and the bus controller reading the requested data provided by the second level cache controller.

7. An apparatus for achieving multilevel inclusion in a computer system, comprising:
a system bus;
a microprocessor coupled to said system bus;
a first level cache memory coupled to said microprocessor and including a first number of ways;
a first level cache controller coupled to said first level cache, said microprocessor and aid system bus and including an output for transmitting way information and an input for receiving way information;
a second level cache of a size greater than or equal to the size of the first level cache which includes a number of ways equal to or greater than the first number of ways of the first level cache, wherein each of the ways in the first level cache corresponds to at least one way in the second level cache and which includes inclusion information indicating presence of data in the second level cache that is duplicated in the first level cache;
a second level cache controller coupled to said system bus, said second level cache, said microprocessor, and said first level cache controller and including an input coupled to said first level cache controller way information output for receiving way information and an output coupled to said first level cache controller way information input for transmitting way information; and random access memory coupled to said system bus;
wherein on a first and second level cache read miss said first level cache controller transmits way information to said second level cache controller and said second level cache controller places received data in a way of the second level cache corresponding to the received way information, wherein on a first level cache read miss and a second level cache read hit said second level cache controller transmits way information to said first level cache controller and said first level cache controller places received data in a way of the first level cache corresponding to the received way information, and wherein said second level cache controller sets the inclusion bit in the second level cache location which contains the data placed in the first level cache and clears the inclusion bits of any other second level cache locations which correspond to the first level cache location where the data was placed.

8. The apparatus of claim 7, wherein said first level cache controller includes a replacement means that determines which first level cache way in which to place a received copy of data, wherein said first level cache controller disregards said replacement means on first level cache read miss and second level cache read hits cases.

9. The apparatus of claim 7, wherein greater than one way in the first level cache cannot correspond to one cache way in the second level cache and greater than one way in the second level cache can correspond to one way in the first level cache.

10. The apparatus of claim 7, wherein on a processor write said first level cache controller transmits way information to said second level cache controller and said second level cache controller places received data in a way of the second level cache corresponding to the received way information.

11. The apparatus of claim 7, further comprising:
a bus master device coupled to said system bus;
and wherein said first level cache controller includes means for snooping the system bus when said microprocessor does not have control of said system bus to determine if the bus master device is writing to a random access memory location that is cached in the first level cache, and wherein said second level cache controller further includes:
means for snooping the system bus when said microprocessor does not have control of said system bus to determine if the bus master device is writing to a random access memory location that is cached in the second level cache;
means for checking the inclusion bit on a second level cache location where a second level cache write snoop hit occurs to determine if a copy of data from said random access memory location being written to also resides in said first level cache; and means coupled to said first level cache controller which directs said first level cache controller to snoop the system bus if said inclusion bit is set.

12. The apparatus of claim 11, further comprising:
said second level cache being a write-back cache, wherein said second level cache controller further includes:
means for snooping the system bus when said microprocessor does not have control of said system bus to determine if the bus master device is reading a random access memory location that is cached in the second level cache;

means for determining whether the second level cache includes an updated version of the data residing in the requested memory location when a second level cache read snoop hit occurs; and means for providing the requested data to main memory if the second level cache has an updated version of the data, wherein the bus controller reads the requested data provided by the second level cache controller.