|Publication number||US5694567 A|
|Application number||US 08/386,025|
|Publication date||Dec 2, 1997|
|Filing date||Feb 9, 1995|
|Priority date||Feb 9, 1995|
|Inventors||Philip A. Bourekas, Andrew P. Ng|
|Original Assignee||Integrated Device Technology, Inc.|
This invention is related to cache memories and, particularly, to direct mapped cache memories with cache locking.
Cache memories are used to decrease the average memory access time in many microprocessor systems. Cache memories typically are smaller and faster (i.e., lower access time) than main memory, and are used to store frequently used information (hereinafter called a "data word"). The CPU first searches the cache for the address of the requested data word and, if present, retrieves the data word stored at that address. However, when the CPU requests a data word that is not in the cache, the CPU accesses main memory and stores the data word in the cache. Oftentimes, storing information in the cache requires that an equal amount of information already stored in the cache be deleted.
Two of the main types of cache memories are set associative caches and direct mapped caches. In both types of caches, the address requested by the CPU is divided into a tag and index, with the tag typically comprising the higher order bits and the index comprising the lower order bits.
In a direct mapped cache, the cache is accessed by index (i.e., the index serves as an "address" where the data word and its tag are stored in the cache memory). Thus, the CPU sends the address of the requested data word to the cache. The index is used to access the cache, and the data word and tag stored at the index are read. The stored tag is compared to the tag of the requested data word and, if they match, the data word is sent to and used by the CPU. Thus, in a direct mapped cache, an address in main memory can be mapped to only one particular cache location.
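The lookup just described can be sketched in software. This is only an illustrative model, not circuitry from the specification: the class name `DirectMappedCache`, its methods, and the refill-on-miss behavior are assumptions made for the example.

```python
from typing import List, Optional, Tuple


class DirectMappedCache:
    """Toy direct mapped cache: one (tag, word) entry per index (hypothetical model)."""

    def __init__(self, index_bits: int) -> None:
        self.index_bits = index_bits
        # Each slot holds None (empty) or a (stored_tag, data_word) pair.
        self.slots: List[Optional[Tuple[int, str]]] = [None] * (1 << index_bits)

    def split(self, address: int) -> Tuple[int, int]:
        """Return (tag, index): the higher order bits and the lower order bits."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        return tag, index

    def read(self, address: int, main_memory: List[str]) -> Tuple[str, bool]:
        """Return (data_word, hit). On a miss, refill the slot from main memory."""
        tag, index = self.split(address)
        slot = self.slots[index]
        if slot is not None and slot[0] == tag:
            return slot[1], True
        word = main_memory[address]
        self.slots[index] = (tag, word)  # replaces whatever was stored at this index
        return word, False
```

With a 3 bit index and a 16 word memory, addresses 0010 and 1010 share index 010, so reading one after the other replaces the first entry.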
In contrast to direct mapped caches, set associative caches have at least two "sets" of possible stored information (i.e., data word and tag) for each index. The sets are distinguished from each other by tag. When the CPU requests an address be accessed, the requested address's index is used to access all of the sets associated with that index, and the requested address's tag is compared to each set's tag until a match is found (as in an associative memory, hence the name "set associative"). If a match occurs, the data word associated with the matched tag is sent to and used by the CPU. Thus, in a set associative cache, an address in main memory can be mapped into as many locations in the cache as there are sets.
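For comparison, a set associative lookup might be modeled as follows, using "set" in the sense of this description (one of several parallel tag/data arrays). The class name and the round-robin refill policy are assumptions for the sketch; the text above does not specify a replacement policy.

```python
from typing import List, Optional, Tuple

Entry = Optional[Tuple[int, str]]  # (stored_tag, data_word) or None when empty


class SetAssociativeCache:
    """Toy model: num_sets parallel arrays, each indexed by the same index bits."""

    def __init__(self, index_bits: int, num_sets: int) -> None:
        self.index_bits = index_bits
        lines = 1 << index_bits
        self.sets: List[List[Entry]] = [[None] * lines for _ in range(num_sets)]
        # Assumed policy: refill sets in round-robin order, tracked per index.
        self.next_victim = [0] * lines

    def read(self, address: int, main_memory: List[str]) -> Tuple[str, bool]:
        """Return (data_word, hit), comparing the tag against every set."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        for entries in self.sets:
            entry = entries[index]
            if entry is not None and entry[0] == tag:
                return entry[1], True
        word = main_memory[address]
        victim = self.next_victim[index]
        self.sets[victim][index] = (tag, word)
        self.next_victim[index] = (victim + 1) % len(self.sets)
        return word, False
```

With two sets, two addresses that share an index can both remain cached, which a direct mapped cache cannot do.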
In some applications, the tags of critical data words are stored in the cache and "locked" to prevent them from being deleted so that the data word can always be accessed as quickly as possible. Thus, other data words having the same index (but different tags) as locked data words cannot force deletion of "locked" data from the cache during normal cache miss refill processing.
Cache locking is difficult to implement in a conventional direct mapped cache. For example, FIG. 1 shows a memory mapping of a 16 word main memory and an 8 word direct mapped cache memory. A 16 word memory requires a 4 bit address, which in this example, is divided into a 1 bit tag and a 3 bit index. In this mapping, physical address ranges 0000-0011 and 1000-1011 map into cache index range 000-011, whereas physical address ranges 0100-0111 and 1100-1111 map into cache index range 100-111.
In this example, the indexes 000-011 are locked to keep a critical 4 word program stored at physical address range 0000-0011 (word 0-word 3) in the cache at all times. Thus, word 0-word 3 are stored in the cache at cache index range 000-011. Consequently, the programs stored at physical address range 1000-1011 (i.e., word 8-word 11) cannot be cached, while the information stored at physical address ranges 0100-0111 and 1100-1111 can be cached. As a result, an eight word program would have to be broken down into two discontiguous 4 word sections and stored at 0100-0111 and 1100-1111 in order to be completely "cacheable".
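The FIG. 1 example can be checked with a few lines of code. This is a minimal sketch under the stated 1 bit tag and 3 bit index split; the names `cache_index`, `classify`, and `memory_map` are introduced only for illustration.

```python
INDEX_BITS = 3  # 8 word cache; the remaining high-order address bit is the tag


def cache_index(address: int) -> int:
    """Conventional direct mapping: the index is the low-order 3 bits."""
    return address & ((1 << INDEX_BITS) - 1)


LOCKED_WORDS = range(0b0000, 0b0100)  # word 0 to word 3, locked in the cache
LOCKED_INDEXES = {cache_index(a) for a in LOCKED_WORDS}


def classify(address: int) -> str:
    """Label a main memory word as locked, uncacheable, or cacheable."""
    if address in LOCKED_WORDS:
        return "locked"
    if cache_index(address) in LOCKED_INDEXES:
        return "uncacheable"  # would need an index that is locked
    return "cacheable"


memory_map = {address: classify(address) for address in range(16)}
```

The cacheable words come out as two separate runs (words 4 to 7 and words 12 to 15), which is the discontiguity the paragraph above describes.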
Extending the concept of the 8 word cache system described above to a 4 kB direct mapped cache system, if half of the cache were locked, then in effect, this system has a 2 kB cache that services the lower 2 kB of every 4 kB of main memory. The code in the upper 2 kB could not be cached. Consequently, a user would have to store a large program in discontiguous 2 kB pieces by using the lower half of each 4 kB block to ensure that the program can be serviced by the cache. Because typical software is very much larger than 2 kB, it is difficult to use cache locking in a direct mapped cache.
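To make the placement burden concrete, the following sketch computes where the pieces of a program would have to be stored so that every byte falls in the lower 2 kB of a 4 kB block. The function names and the assumption that placement starts at a 4 kB aligned base address are illustrative only.

```python
CACHE_BYTES = 4 * 1024     # size of the direct mapped cache
UNLOCKED_BYTES = 2 * 1024  # lower half of the cache, still available for refills


def is_cacheable(address: int) -> bool:
    """True if the address maps into the unlocked lower half of the cache."""
    return address % CACHE_BYTES < UNLOCKED_BYTES


def placement(program_bytes: int, base: int = 0):
    """Return inclusive (start, end) ranges for the discontiguous program pieces.

    base is assumed to be a multiple of CACHE_BYTES.
    """
    pieces = []
    start = base
    remaining = program_bytes
    while remaining > 0:
        size = min(remaining, UNLOCKED_BYTES)
        pieces.append((start, start + size - 1))
        remaining -= size
        start += CACHE_BYTES  # skip the upper 2 kB, which cannot be cached
    return pieces
```

Under these assumptions a 5 kB program already needs three separate pieces, each in a different 4 kB block.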
One solution to this problem is to modify the software compiler to store the unlocked software in the small portions that can be serviced by the cache. This solution typically requires the addition of complex page management software to the compiler's linker facility.
Another solution to this problem is to use a translation lookaside buffer (TLB) in applications where the cache size is equal to or larger than the virtual page size mapped by a single TLB entry. TLBs can mitigate the problems of cache locking in a direct mapped cache, but at the cost of additional hardware complexity and performance degradation. TLBs also require additional operating system software to maintain page tables to perform the translations.
Consequently, cache locking is commonly implemented in set associative caches. An extra bit is added to the cache memory for storing a lock bit for each tag in each set. Thus, when a set is locked, other sets associated with the locked set's index are available to cache other addresses with the same index, thereby avoiding the shortcomings of cache locking in a direct mapped cache.
However, set associative caches are more complex and costly than direct mapped caches because each set requires sense amplifiers and tag comparators. In addition, for locking caches, the lock bit for each tag increases the size of the cache. Further, the lock bit requires extra sense amplifiers and comparators for each set. Further still, each index may require a "tag set index bit" for each tag for each set.
A direct mapped cache with cache locking according to one embodiment of the present invention includes a physical address latch and a multiplexing circuit. The multiplexing circuit receives the physical address from the physical address latch and exchanges a physical address tag bit with a physical address index bit to generate a cache tag address and a cache index address. The physical address tag bit and the physical address index bit are exchanged so that the direct mapped cache memory is divided into two equal halves, each half servicing a contiguous address range of main memory. The cache index and cache tag with "swapped" bits are used to access the memory cells in the cache memory as in a typical direct mapped cache.
A programmer can store critical software in one of the contiguous portions of main memory and lock it into one half of the cache. The programmer can then store other software in the other contiguous portion of main memory, which is serviced by the other half of the cache. Thus, a direct mapped cache with cache locking is realized without a TLB or additional page management or operating system software.
In another embodiment, the multiplexing circuit can exchange two physical address tag bits with two physical address index bits to generate a cache tag address and a cache index address. The physical address tag bits and the physical address index bits are exchanged so that the direct mapped cache memory is divided into four equal portions, each portion of the cache memory servicing a contiguous portion of main memory.
FIG. 1 shows a prior art memory map of a 16 word main memory and an 8 word direct mapped cache memory.
FIG. 2 shows a block diagram of a direct mapped cache memory according to one embodiment of the present invention.
FIG. 3 shows a memory map of 16 word main memory and an 8 word direct mapped cache according to the present invention.
FIG. 4 shows a block diagram of a direct mapped instruction cache memory according to another embodiment of the present invention.
FIG. 5 shows a memory map of the cache memory of FIG. 4 when bits 28 and 11 are exchanged.
FIG. 6 shows a memory map of the cache memory of FIG. 4 when bits 28 and 11, and 27 and 10 are exchanged, respectively.
FIG. 7 shows a block diagram of a direct mapped data cache memory according to another embodiment of the present invention.
FIG. 2 shows a block diagram of a direct mapped cache memory according to one embodiment of the present invention. Cache memory 100 uses a physical address latch 110 to receive a physical address from the CPU. The physical address is divided into a tag portion and an index portion. Physical address latch 110 stores the address received from the CPU and sends the address to a multiplexing circuit 120, which exchanges a physical address tag bit with a physical address index bit to generate the cache address to access a cache RAM 130.
The cache index from multiplexing circuit 120 accesses data cells 132 in cache RAM 130 to output the data word, and also accesses the stored tag for that particular data word in tag cells 134. The data word is sent to the CPU, while a tag comparator 140 compares the cache tag from multiplexing circuit 120 with the stored tag from cache RAM 130. If the tags match, the stored data word corresponds to the address requested by the CPU and the CPU accepts the data word.
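The data path of FIG. 2 can be approximated in software as below. This is a sketch only: it assumes a 4 bit address with a 3 bit index, and it assumes that the exchanged bits are the tag bit (bit 3) and the high-order index bit (bit 2), which is the choice consistent with the mapping of FIG. 3. The function names and the dictionary standing in for cache RAM 130 are hypothetical.

```python
INDEX_BITS = 3  # assumed 8 word cache RAM addressed by a 4 bit physical address


def swap_bits(value: int, i: int, j: int) -> int:
    """Exchange bit i and bit j of value."""
    if ((value >> i) ^ (value >> j)) & 1:
        value ^= (1 << i) | (1 << j)
    return value


def multiplex(address: int):
    """Model of the multiplexing circuit: return (cache_tag, cache_index)."""
    swapped = swap_bits(address, 3, 2)
    return swapped >> INDEX_BITS, swapped & ((1 << INDEX_BITS) - 1)


def fill(cache_ram: dict, address: int, word: str) -> None:
    """Store the data word and its cache tag at the cache index."""
    cache_tag, cache_index = multiplex(address)
    cache_ram[cache_index] = (cache_tag, word)


def lookup(cache_ram: dict, address: int):
    """Return the data word on a tag match, or None on a miss."""
    cache_tag, cache_index = multiplex(address)
    entry = cache_ram.get(cache_index)
    if entry is None:
        return None
    stored_tag, word = entry
    return word if stored_tag == cache_tag else None  # tag comparator
```

Because the exchange only permutes address bits, every physical address still yields a unique (cache tag, cache index) pair, so the tag comparison remains sufficient to detect a hit.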
FIG. 3 shows a memory map of an 8 word direct mapped cache according to the present invention. Physical address range 0000-0111 maps into cache index range 000-011, whereas physical address range 1000-1111 maps into cache index range 100-111. As a result, a programmer can lock a 4 word program stored at physical address range 0000-0011 in the lower-order half of the cache RAM and keep the upper-order half of the cache RAM available to cache an 8 word program stored at physical address range 1000-1111. Using this embodiment, the programmer does not have to separate the 8 word program into two discontiguous 4 word portions as required by the mapping of FIG. 1.
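The mapping of FIG. 3 can be reproduced by moving address bit 3 into the high-order index position. The text does not name the exchanged bits for this figure; bit 3 and bit 2 are inferred here from the address ranges given, and the function name is illustrative.

```python
def swapped_cache_address(address: int):
    """Return (cache_tag, cache_index) for a 4 bit physical address.

    Assumes tag bit 3 is exchanged with index bit 2.
    """
    bit3 = (address >> 3) & 1
    bit2 = (address >> 2) & 1
    cache_index = (bit3 << 2) | (address & 0b011)
    cache_tag = bit2
    return cache_tag, cache_index


# Cache index for every word of the 16 word main memory.
index_map = {address: swapped_cache_address(address)[1] for address in range(16)}
```

Every word in 0000-0111 lands in index range 000-011 and every word in 1000-1111 lands in index range 100-111, so locking the lower half of the cache leaves the whole upper half of main memory serviced by the other cache half.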
The programmer can treat this embodiment as a cache RAM divided into 2 equal portions, each portion servicing a contiguous half of the physical address range. Thus, the programmer can store in the lower-order half of the physical address range programs to be locked in the cache RAM, while storing other programs contiguously in the upper-order half of the physical address range, which is serviced by the upper-order portion of the cache RAM. Thus, this embodiment realizes a "lockable" direct mapped cache without TLBs, additional sets of tag comparators, or additional page management or operating system software.
FIG. 4 shows a block diagram of a direct mapped instruction cache memory according to another embodiment of the present invention. Cache memory 400 is a 4 kB instruction cache that services a 512 MB main memory. The CPU uses 32 bit addresses for accessing the main memory, with a 20 bit tag and 12 bit index. Bits 0-11 form the physical address index and bits 12-31 form the physical address tag. A physical address latch 410 receives the physical address from the CPU and sends the received address to a multiplexing circuit 420.
Multiplexing circuit 420 includes multiplexers 421-424. Control signal C1 selects the output of multiplexers 421 and 423. When control signal C1 is a logic zero, multiplexers 421 and 423 do not exchange bit 10 for bit 27 in outputting the cache index and cache tag, respectively. However, when control signal C1 is a logic one, multiplexers 421 and 423 exchange bit 10 for bit 27 in outputting the cache index and cache tag, respectively.
Similarly, control signal C2 controls multiplexers 422 and 424 to exchange or not exchange bits 11 and 28 in generating the cache index and cache tag, respectively. Control signals C1 and C2 can be provided by register, PROM, or other memory cells (not shown) to configure the cache as desired.
When control signals C1 and C2 are both logic zero, no bits are exchanged, and, consequently, cache memory 400 operates as a traditional direct mapped cache.
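The behavior of multiplexing circuit 420 under control signals C1 and C2 can be summarized as a small function. This is a behavioral sketch rather than a gate-level description; the function names are hypothetical, and the bit positions are those given above (C1 exchanges bits 10 and 27, C2 exchanges bits 11 and 28).

```python
INDEX_BITS = 12  # 4 kB cache: bits 0-11 form the index, bits 12-31 the tag


def swap_bits(value: int, i: int, j: int) -> int:
    """Exchange bit i and bit j of value."""
    if ((value >> i) ^ (value >> j)) & 1:
        value ^= (1 << i) | (1 << j)
    return value


def multiplexing_circuit_420(address: int, c1: int, c2: int):
    """Return (cache_tag, cache_index) for a 32 bit physical address."""
    if c1:
        address = swap_bits(address, 10, 27)  # multiplexers 421 and 423
    if c2:
        address = swap_bits(address, 11, 28)  # multiplexers 422 and 424
    cache_index = address & ((1 << INDEX_BITS) - 1)
    cache_tag = address >> INDEX_BITS
    return cache_tag, cache_index
```

With both control signals at logic zero the function reduces to the ordinary tag and index split of a traditional direct mapped cache.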
When control signals C1 and C2 are a logic zero and logic one, respectively, only bits 28 and 11 are exchanged. FIG. 5 shows a memory map of the cache memory of FIG. 4 when bits 28 and 11 are exchanged. Physical address range 0000_0000-0FFF_FFFF (Hex) is mapped into cache address 0000_0000-0000_07FF and 1000_0000-1FFF_F7FF. Likewise, physical address range 1000_0000-1FFF_FFFF is mapped into cache address 0000_0800-0000_0FFF and 1000_0800-1FFF_FFFF. As a result, 256 MB physical address range 0000_0000-0FFF_FFFF is serviced by cache index range 000-7FF, whereas 256 MB physical address range 1000_0000-1FFF_FFFF is serviced by cache index range 800-FFF. Thus, in a manner similar to cache memory 100 (FIG. 2), cache RAM 430 is now divided into two 2 kB portions, each servicing a contiguous 256 MB portion of the 512 MB physical address range.
The user can advantageously use this configuration to store up to 2 kB of critical programs in the upper 256 MB of the physical address range and lock them in the upper 2 kB portion of cache RAM 430 and, moreover, store other non-critical programs in the lower 256 MB contiguous physical address range serviced by the lower 2 kB portion of cache RAM 430. Because 256 MB is large enough to store several typical programs, the user can store each non-critical program contiguously and still have each non-critical program completely serviced by the cache without TLBs or additional page management and operating system software. Because each software program is contiguous, software compilers and linkers easily support this cache locking configuration.
When control signals C1 and C2 are both logic one, bit 28 is exchanged with bit 11, and bit 27 is exchanged with bit 10. FIG. 6 shows a memory map of the cache memory of FIG. 4 when bits 28 and 11, and 27 and 10 are exchanged, respectively. Similar to the mapping described above in conjunction with FIG. 5, physical address range 0000_0000-07FF_FFFF (Hex) is mapped into cache index range 000-3FF. Physical address range 0800_0000-0FFF_FFFF is mapped into cache index range 400-7FF. Physical address range 1000_0000-17FF_FFFF is mapped into cache index range 800-BFF. Physical address range 1800_0000-1FFF_FFFF is mapped into cache index range C00-FFF. Thus, cache RAM 430 is now divided into four 1 kB portions, each servicing a contiguous 128 MB portion of the 512 MB physical address range.
The user can use this configuration to lock critical programs into one or more of the four 1 kB cache portions, while using the remaining 1 kB cache portions to service non-critical programs stored in the physical address ranges corresponding to the remaining 1 kB cache portions. For example, the user can store a 256 byte program within the 1 kB physical address range 0000_0000-0000_03FF and lock it into the cache at index range 000-3FF. As a result, the user can store other programs contiguously at 384 MB address range 0800_0000-1FFF_FFFF, which is serviced by cache index range 400-FFF.
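The locking example above can be expressed as a simple check on which 1 kB cache portion an address falls into. The sketch assumes both control signals are logic one, so address bits 28 and 27 supply index bits 11 and 10; the names `portion`, `LOCKED_PORTIONS`, and `can_be_refilled` are illustrative, and the actual lock mechanism is not modeled.

```python
def cache_index(address: int) -> int:
    """Cache index with bits 28/11 and 27/10 exchanged (C1 = C2 = 1)."""
    high = (address >> 27) & 0b11  # address bits 28 and 27
    return (high << 10) | (address & 0x3FF)


def portion(address: int) -> int:
    """Which of the four 1 kB cache portions (0-3) services this address."""
    return cache_index(address) >> 10


LOCKED_PORTIONS = {0}  # index range 000-3FF holds the locked program


def can_be_refilled(address: int) -> bool:
    """True if a miss at this address may replace a cache entry."""
    return portion(address) not in LOCKED_PORTIONS
```

Every address in 0800_0000-1FFF_FFFF has bit 27 or bit 28 set, so it falls in portion 1, 2, or 3 and never competes with the locked portion.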
Table 1 shows the mapping of cache memory 400 in response to control signals C1 and C2 in tabular form. In a preferred embodiment, control signals C1 and C2 are software reconfigurable by writing to a designated control register (not shown).
TABLE 1
|C1||C2||Physical memory range (hex)||Cache index range (hex)|
|0||0||0000_0000-1FFF_FFFF||000-FFF|
|0||1||0000_0000-0FFF_FFFF||000-7FF|
|0||1||1000_0000-1FFF_FFFF||800-FFF|
|1||1||0000_0000-07FF_FFFF||000-3FF|
|1||1||0800_0000-0FFF_FFFF||400-7FF|
|1||1||1000_0000-17FF_FFFF||800-BFF|
|1||1||1800_0000-1FFF_FFFF||C00-FFF|
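The rows of Table 1 can be regenerated from the bit exchanges, which is a useful cross-check of the address ranges. This sketch covers only the three control settings listed in the table; the helper names are hypothetical. Within each contiguous portion the exchanged-in index bits are constant, so the index of the first and last address gives the index range.

```python
MEMORY_BYTES = 512 * 2**20  # 512 MB main memory


def swap_bits(value: int, i: int, j: int) -> int:
    """Exchange bit i and bit j of value."""
    if ((value >> i) ^ (value >> j)) & 1:
        value ^= (1 << i) | (1 << j)
    return value


def cache_index(address: int, c1: int, c2: int) -> int:
    """12 bit cache index of cache memory 400 for the given control signals."""
    if c1:
        address = swap_bits(address, 10, 27)
    if c2:
        address = swap_bits(address, 11, 28)
    return address & 0xFFF


def table_rows(c1: int, c2: int):
    """Return (start, end, first_index, last_index) per contiguous portion.

    Valid for (C1, C2) = (0, 0), (0, 1), and (1, 1), the settings in Table 1.
    """
    portions = 1 << (c1 + c2)
    size = MEMORY_BYTES // portions
    rows = []
    for p in range(portions):
        start, end = p * size, (p + 1) * size - 1
        rows.append((start, end, cache_index(start, c1, c2), cache_index(end, c1, c2)))
    return rows


def hex8(value: int) -> str:
    """Format an address in the table's style, e.g. 1FFF_FFFF."""
    digits = f"{value:08X}"
    return digits[:4] + "_" + digits[4:]
```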
FIG. 7 shows a block diagram of a direct mapped data cache memory according to another embodiment of the present invention. Cache memory 700 is a 1 kB data cache and services a 512 MB main memory. The CPU uses 32 bit addresses for accessing the main memory, with a 22 bit tag and 10 bit index. Bits 0-9 form the physical address index, thereby fully addressing the 1 kB data cache RAM, and bits 10-31 form the physical address tag. A physical address latch 710 receives the physical address from the CPU and sends the received address to a multiplexing circuit 720.
In a manner similar to multiplexing circuit 420 (FIG. 4), multiplexers 721-724 in multiplexing circuit 720 operate in response to control signals C1 and C2 to exchange or not exchange bit 28 for bit 9, and bit 27 for bit 8. Table 2 shows the mapping of cache memory 700 in response to control signals C1 and C2.
TABLE 2
|C1||C2||Physical memory range (hex)||Cache index range (hex)|
|0||0||0000_0000-1FFF_FFFF||000-3FF|
|0||1||0000_0000-0FFF_FFFF||000-1FF|
|0||1||1000_0000-1FFF_FFFF||200-3FF|
|1||1||0000_0000-07FF_FFFF||000-0FF|
|1||1||0800_0000-0FFF_FFFF||100-1FF|
|1||1||1000_0000-17FF_FFFF||200-2FF|
|1||1||1800_0000-1FFF_FFFF||300-3FF|
The foregoing has described the principles and preferred embodiments of the present invention.
However, the invention should not be construed as being limited to the particular embodiments described herein.
For example, different implementations may be used for multiplexing circuits 420 and 720. Further, different bits from the physical address latch may be exchanged as appropriate for the size of the cache RAM, or different bits may be exchanged to provide variable sizes of contiguous address spaces. Still further, 3 (or more) bits from the physical address index may be exchanged with 3 (or more) bits from the physical address tag to divide the cache RAM into 8 (or 16, etc.) portions, as supported by the size of the cache RAM.
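These variations amount to parameterizing the exchange by the index width and by a list of bit pairs. The following sketch, with the hypothetical function `remap`, illustrates the idea; the 3 bit pairing shown for the 4 kB cache (bits 11/28, 10/27, 9/26) is an assumed example and is not taken from the specification.

```python
from typing import Iterable, Tuple


def remap(address: int, index_bits: int, pairs: Iterable[Tuple[int, int]]):
    """Exchange each (index_bit, tag_bit) pair; return (cache_tag, cache_index)."""
    for index_bit, tag_bit in pairs:
        if ((address >> index_bit) ^ (address >> tag_bit)) & 1:
            address ^= (1 << index_bit) | (1 << tag_bit)
    return address >> index_bits, address & ((1 << index_bits) - 1)


# 1 kB data cache of FIG. 7 with both exchanges enabled (bits 9/28 and 8/27).
DATA_CACHE_PAIRS = [(9, 28), (8, 27)]

# Assumed 3 bit variant for a 4 kB cache: eight 512 byte portions,
# each servicing a contiguous 64 MB of a 512 MB address range.
EIGHT_WAY_PAIRS = [(11, 28), (10, 27), (9, 26)]
```

For the data cache configuration, `remap(0x0800_0000, 10, DATA_CACHE_PAIRS)` yields cache index 100 (hex), in agreement with Table 2.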
In other embodiments, a virtual address latch may be substituted for the physical address latch for use in "virtually tagged cache" applications as opposed to the "physically tagged cache" applications described for the embodiments above. Thus, the above-described embodiments should be regarded as illustrative rather than restrictive. Variations can be made to those embodiments by workers skilled in the art without departing from the scope of the present invention as defined by the following claims.
|U.S. Classification||711/3, 711/202, 711/E12.018|
|Cooperative Classification||G06F12/0842, G06F12/0864|
|Feb 9, 1995||AS||Assignment||Owner name: INTEGRATED DEVICE TECHNOLOGY, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BOUREKAS, PHILIP A.;NG, ANDREW P.;REEL/FRAME:007355/0019; Effective date: 19950209|
|Jun 1, 2001||FPAY||Fee payment||Year of fee payment: 4|
|Jun 2, 2005||FPAY||Fee payment||Year of fee payment: 8|
|Jun 2, 2009||FPAY||Fee payment||Year of fee payment: 12|