WO2005033946A1 - A mechanism to compress data in a cache - Google Patents

A mechanism to compress data in a cache

Info

Publication number
WO2005033946A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
compression
computer system
companion
cache line
Application number
PCT/US2004/032110
Other languages
French (fr)
Inventor
Ali-Reza Adl-Tabatabai
Anwar Ghuloum
Ram Huggahalli
Chris Newburn
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to JP2006534088A (JP4009310B2)
Publication of WO2005033946A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877: Cache access modes
    • G06F 12/0886: Variable-length word access
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/40: Specific encoding of data in memory or cache
    • G06F 2212/401: Compressed data


Abstract

According to one embodiment, a computer system is disclosed. The computer system includes a central processing unit (CPU) and a cache memory coupled to the CPU. The cache memory includes a plurality of compressible cache lines to store additional data.

Description

A MECHANISM TO COMPRESS DATA IN A CACHE
FIELD OF THE INVENTION
[0001] The present invention relates to computer systems; more
particularly, the present invention relates to central processing unit (CPU) caches.
BACKGROUND
[0002] Currently, various methods are employed to compress the content
of computer system main memories such as Random Access Memory (RAM).
These methods decrease the amount of physical memory space needed to provide
the same performance. For instance, if a memory is compressed using a 2:1 ratio,
the memory may store twice the amount of data at the same cost, or the same
amount of data at half the cost.
[0003] One such method is Memory Expansion Technology (MXT),
developed by International Business Machines (IBM) of Armonk, New York.
MXT addresses system memory costs with a memory system architecture that
doubles the effective capacity of the installed main memory. Logic-intensive
compressor and decompressor hardware engines provide the means to
simultaneously compress and decompress data as it is moved between the shared
cache and the main memory. The compressor encodes data blocks into as
compact a result as the algorithm permits.
[0004] However, there is currently no method for compressing data that is
stored in a cache. Having the capability to compress cache data would provide advantages similar to those of main memory compression (e.g., decreasing the amount of
cache space needed to provide the same performance).
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The present invention will be understood more fully from the
detailed description given below and from the accompanying drawings of various
embodiments of the invention. The drawings, however, should not be taken to
limit the invention to the specific embodiments, but are for explanation and
understanding only.
[0006] Figure 1 illustrates one embodiment of a computer system;
[0007] Figure 2 illustrates one embodiment of a physical cache
organization;
[0008] Figure 3 illustrates one embodiment of a logical cache organization;
[0009] Figure 4A illustrates an exemplary memory address implemented in
an uncompressed cache;
[0010] Figure 4B illustrates one embodiment of a memory address
implemented in a compressed cache;
[0011] Figure 5 illustrates one embodiment of a tag array entry for a
compressed cache;
[0012] Figure 6 is a block diagram illustrating one embodiment of a cache
controller;
[0013] Figure 7 illustrates one embodiment of a set and way selection
mechanism in a compressed cache;
[0014] Figure 8 illustrates one embodiment of tag comparison logic;
[0015] Figure 9 illustrates another embodiment of a tag array entry for a
compressed cache;
[0016] Figure 10 illustrates another embodiment of tag comparison logic;
and
[0017] Figure 11 illustrates one embodiment of byte selection logic.
DETAILED DESCRIPTION
[0018] A mechanism for compressing data in a cache is described. In the
following description, numerous details are set forth. It will be apparent,
however, to one skilled in the art, that the present invention may be practiced
without these specific details. In other instances, well-known structures and
devices are shown in block diagram form, rather than in detail, in order to avoid
obscuring the present invention.
[0019] Reference in the specification to "one embodiment" or "an
embodiment" means that a particular feature, structure, or characteristic
described in connection with the embodiment is included in at least one
embodiment of the invention. The appearances of the phrase "in one
embodiment" in various places in the specification are not necessarily all referring
to the same embodiment.
[0020] Figure 1 is a block diagram of one embodiment of a computer
system 100. Computer system 100 includes a central processing unit (CPU) 102
coupled to bus 105. In one embodiment, CPU 102 is a processor in the Pentium®
family of processors including the Pentium® II processor family, Pentium® III processors, and Pentium® IV processors available from Intel Corporation of Santa
Clara, California. Alternatively, other CPUs may be used.
[0021] A chipset 107 is also coupled to bus 105. Chipset 107 includes a
memory control hub (MCH) 110. MCH 110 may include a memory controller 112
that is coupled to a main system memory 115. Main system memory 115 stores
data and sequences of instructions and code represented by data signals that may
be executed by CPU 102 or any other device included in system 100.
[0022] In one embodiment, main system memory 115 includes dynamic
random access memory (DRAM); however, main system memory 115 may be
implemented using other memory types. Additional devices may also be coupled
to bus 105, such as multiple CPUs and/or multiple system memories.
[0023] In one embodiment, MCH 110 is coupled to an input/output control
hub (ICH) 140 via a hub interface. ICH 140 provides an interface to input/output
(I/O) devices within computer system 100. For instance, ICH 140 may be coupled
to a Peripheral Component Interconnect (PCI) bus adhering to Specification
Revision 2.1, developed by the PCI Special Interest Group of Portland, Oregon.
[0024] According to one embodiment, a cache memory 103 resides within
processor 102 and stores data signals that are also stored in memory 115. Cache
103 speeds up memory accesses by processor 102 by taking advantage of its
locality of access. In another embodiment, cache 103 resides external to processor
102.
[0025] According to a further embodiment, cache 103 includes compressed
cache lines to enable the storage of additional data within the same amount of area. Figure 2 illustrates one embodiment of a physical organization for cache
103. In one embodiment, cache 103 is a 512 set, 4-way set associative cache.
However, one of ordinary skill in the art will appreciate that caches of
other sizes may be implemented without departing from the true scope of the
invention.
[0026] A tag is associated with each line of a set. Moreover, a compression
bit is associated with each tag. The compression bits indicate whether a respective
cache line holds compressed data. When a compression bit is set, the physical
memory of the cache line holds two compressed companion lines. Companion
lines are two lines with addresses that differ only in the companion bit (e.g., two
consecutive memory lines aligned on a two-line boundary).
[0027] In one embodiment, the companion bit is selected so that companion
lines are adjacent lines. However, any bit can be selected to be the companion bit.
In other embodiments, it may be possible to encode the compression indication
with other bits that encode cache line state, such as the MESI state bits, thus
eliminating this space overhead altogether.
[0028] When the compression bit is not set, the physical memory of the
cache line holds one line uncompressed. Shaded compression bits in Figure 2
illustrate compressed cache lines. Figure 3 illustrates one embodiment of a logical
organization for cache 103. As shown in Figure 3, cache lines are compressed
according to a 2:1 compression scheme. For example, the second line of set 0 is
compressed, thus storing two cache lines rather than one.
[0029] In one embodiment, each cache line holds 64 bytes of data when not compressed. Thus, each cache line holds 128 bytes of data when compressed. The
effect of the described compression scheme is that each cache tag maps to a
variable-length logical cache line. As a result, cache 103 may store twice the
amount of data without having to increase in physical size.
[0030] Referring back to Figure 1, a cache controller 104 is coupled to cache
103 to manage the operation of cache 103. Particularly, cache controller 104
performs lookup operations of cache 103. According to one embodiment, the
hashing function that is used to map addresses to physical sets and ways is
modified from that used in typical cache controllers. In one embodiment, the
hashing function is organized so that companion lines map to the same set.
Consequently, companion lines may be compressed together into a single line
(e.g., way) that uses one address tag.
[0031] Figure 4A illustrates an exemplary memory address implemented in
an uncompressed cache. In a traditional cache, an address is divided according
to tag, set and offset components. The set component is used to select one of the
sets of lines. Similarly, the offset component is the low order bits of the address
that are used to select bytes within a line.
[0032] Figure 4B illustrates one embodiment of a memory address
implemented for lookup in a compressed cache. Figure 4B shows the
implementation of a companion bit used to map companion lines into the same
set. The companion bit is used in instances where a line is not compressed.
Accordingly, if a line is not compressed, the companion bit indicates which of the
two companion lines is to be used.
[0033] In one embodiment, the window of address bits that are used for set
selection is shifted to the left by one so that the companion bit lies between the set
selection and byte offset bits. In this way, companion lines map to the same cache
set since the companion bit and set selection bits do not overlap. The companion
bit, which now is no longer part of the set selection bits, becomes part of the tag,
though the actual tag size does not increase. In a traditional uncompressed cache,
the companion bit is a part of the address and is used in set selection to determine
whether an address hashes to an odd or even cache set.
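
By way of illustration only, the address split described above can be sketched in C. The geometry follows the 512-set, 4-way, 64-byte-line example of Figure 2; the resulting bit positions (byte offset in bits 0-5, companion bit at bit 6, set index in bits 7-15) are assumptions derived from that example, not text from the specification.

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 6u                    /* 64-byte physical cache line */
    #define SET_BITS    9u                    /* 512 sets                    */

    /* The companion bit sits between the byte offset and the set index, so
       two companion lines (addresses differing only in that bit) always map
       to the same set. */
    static uint64_t byte_off(uint64_t a) { return a & ((1u << OFFSET_BITS) - 1); }
    static uint64_t comp_bit(uint64_t a) { return (a >> OFFSET_BITS) & 1; }
    static uint64_t set_idx(uint64_t a)  { return (a >> (OFFSET_BITS + 1)) & ((1u << SET_BITS) - 1); }
    static uint64_t addr_tag(uint64_t a) { return a >> (OFFSET_BITS + 1 + SET_BITS); }

    int main(void)
    {
        uint64_t a = 0x000123C0u;             /* arbitrary example address */
        uint64_t b = a ^ (1u << OFFSET_BITS); /* its companion line        */
        printf("off=%llu set(a)=%llu set(b)=%llu comp(a)=%llu comp(b)=%llu tag=0x%llx\n",
               (unsigned long long)byte_off(a),
               (unsigned long long)set_idx(a), (unsigned long long)set_idx(b),
               (unsigned long long)comp_bit(a), (unsigned long long)comp_bit(b),
               (unsigned long long)addr_tag(a));
        return 0;                             /* same set; only the companion bit differs */
    }
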
[0034] Figure 5 illustrates one embodiment of a tag array entry for a
compressed cache. The tag array entries include the companion bit (e.g., as part
of the address tag bits) and a compression bit. The compression bit causes the
compressed cache 103 tag to be one bit larger than a traditional uncompressed
cache's tag. The compression bit indicates whether a line is compressed.
[0035] Particularly, the compression bit specifies how to deal with the
companion bit. If the compression bit indicates a line is compressed, the
companion bit is treated as a part of the offset because the line is a compressed
pair. If the compression bit indicates no compression, the companion bit is
considered as a part of the tag array and ignored as a part of the offset.
[0036] Figure 6 is a block diagram illustrating one embodiment of cache
controller 104. Cache controller 104 includes set and way selection logic 610, byte
selection logic 620 and compression logic 630. Set and way selection logic 610 is
used to select cache lines within cache 103. Figure 7 illustrates one embodiment
of set and way selection logic 610 in a compressed cache.
[0037] Referring to Figure 7, set and way selection logic 610 includes tag
comparison logic 710 that receives input from a tag array to select a cache line
based upon a received address. The tag comparison logic 710 takes into account
whether a cache line holds compressed data. Because cache lines hold a variable
data size, tag comparison logic 710 is also variable length, depending on whether
a particular line is compressed or not. Therefore, the tag match takes into account
the compression bit.
[0038] Figure 8 illustrates one embodiment of tag comparison logic 710,
which includes exclusive-nor (XNOR) gates 1-n, an OR gate and an AND gate. The
XNOR gates and the AND gate are included in traditional uncompressed caches,
and are used to compare the address with tag entries in the tag array until a
match is found. The OR gate is used to select the companion bit depending upon
the compression state of a line.
[0039] The companion bit of the address is selectively ignored depending
on whether the compression bit is set. As discussed above, if the compression bit
is set, the companion bit of the address is ignored during tag match because the
cache line contains both companions. If the compression bit is not set, the
companion bit of the address is compared with the companion bit of the tag.
[0040] The "Product of XNOR" organization of the equality operator,
therefore, uses the OR gate to selectively ignore the companion bit. In one
embodiment, because the tag's companion bit is ignored when the compression
bit is set (e.g., it is a "don't care"), the tag's companion bit can be used for other
purposes. For example, when a line is compressed, this bit may be used as a compression format bit to select between two different compression algorithms.
In another example, the companion bit can be used to encode the ordering of
companion lines in the compressed line.
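
As a rough C sketch of the match function that this gate network computes (the struct fields and function name are illustrative assumptions, not drawn from the specification):

    #include <stdbool.h>
    #include <stdint.h>

    struct tag_entry {
        uint64_t tag;        /* address tag bits, excluding the companion bit */
        unsigned companion;  /* 1-bit companion identifier                    */
        unsigned compressed; /* 1-bit compression flag                        */
    };

    /* Mirrors the XNOR/OR/AND network: all tag bits must match, and the
       companion bits must match OR the line must be compressed, in which
       case the physical line holds both companions and the bit is a
       "don't care". */
    static bool tag_match(const struct tag_entry *e,
                          uint64_t addr_tag, unsigned addr_companion)
    {
        bool tags_equal   = (e->tag == addr_tag);    /* XNOR gates + AND */
        bool companion_ok = e->compressed ||         /* OR gate          */
                            (e->companion == addr_companion);
        return tags_equal && companion_ok;
    }
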
[0041] In other embodiments, each cache line is partitioned into two sectors
that are stored in the same physical cache line only if the sectors can be
compressed together. In the tag entry, the companion and compression bits
become sector presence indications, as illustrated in Figure 9. In this
embodiment, the companion bit is a sector identifier (e.g., upper or lower) and
thus has been relabeled as sector ID.
[0042] Accordingly, a "01" indicates a lower sector (not compressed), "10"
indicates an upper sector (not compressed), and a "11" indicates both sectors (2:1
compression). Also, in this arrangement the physical cache line size is the same as
the logical sector size. When uncompressed, each sector of a line is stored in a
different physical line within the same set (e.g., different ways of the same set).
[0043] When compressible by at least 2:1, the two sectors of each line are
stored in a single physical cache line (e.g., in one way). It is important to note
that this differs from traditional sectored cache designs in that different logical
sectors of a given logical line may be stored simultaneously in different ways
when uncompressed.
[0044] In one embodiment, a free encoding ("00") is used to indicate an
invalid entry, potentially reducing the tag bit cost if combined with other bits that
encode the MESI state. Because this is simply an alternative encoding, the sector
presence bits require slightly different logic to detect a tag match. Figure 10 illustrates another embodiment of tag comparison logic 710 implementing sector
presence encoding.
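
The sector presence encoding can be modeled in C as follows; the enumerator names and the hit test are illustrative assumptions:

    #include <stdbool.h>

    /* 2-bit presence field that replaces the companion/compression bits */
    enum sector_presence {
        SECTOR_INVALID = 0x0, /* "00": free encoding, invalid entry          */
        SECTOR_LOWER   = 0x1, /* "01": lower sector resident, uncompressed   */
        SECTOR_UPPER   = 0x2, /* "10": upper sector resident, uncompressed   */
        SECTOR_BOTH    = 0x3  /* "11": both sectors resident, 2:1 compressed */
    };

    /* The requested sector (0 = lower, 1 = upper) hits in this way if its
       presence bit is set. */
    static bool sector_hit(enum sector_presence p, unsigned wanted_sector)
    {
        return (p >> wanted_sector) & 1;
    }
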
[0045] Referring back to Figure 6, byte selection logic 620 selects the
addressed datum within a line. According to one embodiment, byte selection
logic 620 depends on the compression bit. Figure 11 illustrates one embodiment
of byte selection logic 620. Byte selection logic 620 includes a decompressor 1110
to decompress a selected cache line if necessary. An input multiplexer selects
between a decompressed cache line and an uncompressed cache line depending
upon the compression bit.
[0046] In one embodiment, the range of the offset depends on whether the
line is compressed. If the line is compressed, the companion bit of the address is
used as the high order bit of the offset. If the line is not compressed,
decompressor 1110 is bypassed and the companion bit of the address is not used
for the offset. The selected line is held in a buffer whose size is twice the physical
line size to accommodate compressed data.
[0047] Alternative embodiments may choose to use the companion bit to
select which half of the decompressed word to store in a buffer whose length is
the same as the physical line size. However, buffering the entire line is
convenient for modifying and recompressing data after writes to the cache.
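
A short C sketch of this offset computation, reusing the 64-byte-line assumption from the earlier sketches:

    #include <stdint.h>

    #define OFFSET_BITS 6u  /* 64-byte physical line */

    /* When the line is compressed, the companion bit of the address becomes
       the high-order bit of the offset into the double-width (128-byte)
       buffer; when it is not, the decompressor is bypassed and the bit is
       unused. */
    static unsigned buffer_offset(uint64_t addr, unsigned compressed)
    {
        unsigned off = (unsigned)(addr & ((1u << OFFSET_BITS) - 1));
        if (compressed)
            off |= (unsigned)((addr >> OFFSET_BITS) & 1) << OFFSET_BITS;
        return off;
    }
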
[0048] Referring back to Figure 6, compression logic 630 is used to
compress cache lines. In one embodiment, cache lines are compressed according
to a Lempel-Ziv compression algorithm. However in other embodiments, other
compression algorithms (e.g., WK, X-Match, sign-bit compression, run-length compression, etc.) may be used to compress cache lines.
[0049] Compression logic 630 may also be used to determine when a line is
to be compressed. According to one embodiment, opportunistic compression is
used to determine when a line is to be compressed.
In opportunistic compression, when a cache miss occurs, the demanded cache line
is fetched from memory 115 and cache 103 attempts to compress both companions
into one line if its companion line is resident in the cache. If the companion line is
not resident in cache 103 or if the two companions are not compressible by 2:1,
then cache 103 uses its standard replacement algorithm to make space for the
fetched line.
[0050] Otherwise, cache 103 reuses the resident companion's cache line to
store the newly compressed pair of companions thus avoiding a replacement.
Note that it is easy to modify the tag match operator to check whether the
companion line is resident without doing a second cache access. For example, if
all of the address tag bits except for the companion bit match, then the companion
line is resident.
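
The opportunistic policy can be outlined in C as below. All types and helper functions here are hypothetical stand-ins for cache-internal operations; the specification defines the policy, not this interface.

    #include <stdbool.h>
    #include <stdint.h>

    struct cache; /* opaque cache-internal state      */
    struct line;  /* opaque physical line and payload */

    /* hypothetical cache-internal hooks (declarations only) */
    extern struct line *fetch_from_memory(uint64_t addr);
    extern struct line *find_companion(struct cache *c, uint64_t addr); /* tag match, companion bit ignored */
    extern bool compressible_2to1(const struct line *x, const struct line *y);
    extern void store_compressed_pair(struct cache *c, struct line *slot,
                                      const struct line *x, const struct line *y);
    extern void replace_and_insert(struct cache *c, uint64_t addr, struct line *l);

    void handle_miss(struct cache *c, uint64_t addr)
    {
        struct line *fetched   = fetch_from_memory(addr);
        struct line *companion = find_companion(c, addr);

        if (companion && compressible_2to1(fetched, companion)) {
            /* reuse the resident companion's line: no replacement needed */
            store_compressed_pair(c, companion, fetched, companion);
        } else {
            /* companion absent, or pair not 2:1 compressible: standard replacement */
            replace_and_insert(c, addr, fetched);
        }
    }
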
[0051] In another embodiment, a prefetch mechanism is used to determine
if lines are to be compressed. In the prefetch compression mechanism the
opportunistic approach is refined by adding prefetching. If the companion of the
demand-fetched line is not resident, the cache prefetches the companion and
attempts to compress both companions into one line.
[0052] If the two companion lines are not compressible by 2:1, cache 103
has the choice of either discarding the prefetched line (thus wasting bus bandwidth) or storing the uncompressed prefetched line in the cache (thus
potentially resulting in a total of two lines to be replaced in the set). In one
embodiment, the hardware can adaptively switch between these policies based on
how much spatial locality and latency tolerance the program exhibits.
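
The prefetch refinement can be sketched the same way; replace_and_insert_pair and the keep_uncompressible flag (standing in for the adaptive policy choice described above) are likewise hypothetical.

    #include <stdbool.h>
    #include <stdint.h>

    #define COMPANION_MASK (1u << 6) /* companion bit position assumed earlier */

    struct cache;
    struct line;
    extern struct line *fetch_from_memory(uint64_t addr);
    extern struct line *find_companion(struct cache *c, uint64_t addr);
    extern bool compressible_2to1(const struct line *x, const struct line *y);
    extern void replace_and_insert(struct cache *c, uint64_t addr, struct line *l);
    extern void replace_and_insert_pair(struct cache *c, uint64_t addr,
                                        const struct line *x, const struct line *y);

    void handle_miss_with_prefetch(struct cache *c, uint64_t addr,
                                   bool keep_uncompressible)
    {
        struct line *fetched    = fetch_from_memory(addr);
        struct line *companion  = find_companion(c, addr);
        bool         prefetched = false;

        if (!companion) {                 /* companion not resident: prefetch it */
            companion  = fetch_from_memory(addr ^ COMPANION_MASK);
            prefetched = true;
        }

        if (compressible_2to1(fetched, companion)) {
            replace_and_insert_pair(c, addr, fetched, companion); /* one physical line */
        } else {
            replace_and_insert(c, addr, fetched);                 /* keep the demand line */
            if (prefetched && keep_uncompressible)
                replace_and_insert(c, addr ^ COMPANION_MASK, companion);
            /* else: discard the prefetched line, costing only bus bandwidth */
        }
    }
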
[0053] In another embodiment, a victim compression mechanism is used to
determine if lines are to be compressed. For victim compression, there is an
attempt to compress a line that is about to be evicted (e.g., a victim). If a victim is
not already compressed and its companion is resident, cache 103 gives the victim
a chance to remain resident in the cache by attempting to compress it with its
companion. If the victim is already compressed, its companion is not resident, or
the victim and its companion are not compressible by 2:1, the victim is then
evicted. Otherwise, cache 103 reuses the resident companion's cache line to store
the compressed pair of companions, thus avoiding the eviction.
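
Victim compression can be outlined similarly, again with hypothetical accessors (line_is_compressed, line_addr) standing in for cache-internal state.

    #include <stdbool.h>
    #include <stdint.h>

    struct cache;
    struct line;
    extern struct line *find_companion(struct cache *c, uint64_t addr);
    extern bool compressible_2to1(const struct line *x, const struct line *y);
    extern void store_compressed_pair(struct cache *c, struct line *slot,
                                      const struct line *x, const struct line *y);
    extern bool     line_is_compressed(const struct line *l);
    extern uint64_t line_addr(const struct line *l);

    /* Returns true if the victim was saved by packing it with its resident
       companion; false means the caller should evict it as usual. */
    bool try_victim_compression(struct cache *c, struct line *victim)
    {
        if (line_is_compressed(victim))
            return false;                    /* already compressed: evict */

        struct line *companion = find_companion(c, line_addr(victim));
        if (!companion || !compressible_2to1(victim, companion))
            return false;                    /* no resident companion, or not 2:1: evict */

        store_compressed_pair(c, companion, victim, companion);
        return true;                         /* eviction avoided */
    }
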
[0054] As data is written, the compressibility of a line may change. A write
to a compressed pair of companions may cause the pair to be no longer
compressible. Three approaches may be taken if a compressed cache line becomes
uncompressible. The first approach is to simply evict another line to make room
for the extra line resulting from the expansion. This may cause two companion
lines to be evicted if all lines in the set are compressed.
[0055] The second approach is to evict the companion of the line that was
written. The third approach is to evict the line that was written. The choice of
which of these approaches to take depends partly on the interaction between the
compressed cache 103 and the next cache closest to the processor (e.g., if the L3 is a compressed cache then it depends on the interaction between L3 and L2).
[0056] Assuming that the compressed cache is an inclusive L3 cache and
that L2 is a write-back cache, the first two approaches include an invalidation of
the evicted line in the L2 cache to maintain multi-level inclusion, which has the
risk of evicting a recently accessed cache line in L2 or L1. The third approach does
not require L2 invalidation and does not have the risk of evicting a recently
accessed cache line from L2 because the line that is being written is being evicted
from L2.
[0057] The above-described mechanism allows any two cache lines that
map to the same set and that differ only in their companion bit to be compressed
together into one cache line. In one embodiment, the mechanism modifies the set
mapping function and selects the companion bit such that it allows adjacent
memory lines to be compressed together, which takes advantage of spatial
locality.
[0058] Whereas many alterations and modifications of the present
invention will no doubt become apparent to a person of ordinary skill in the art
after having read the foregoing description, it is to be understood that any
particular embodiment shown and described by way of illustration is in no way
intended to be considered limiting. Therefore, references to details of various
embodiments are not intended to limit the scope of the claims which in
themselves recite only those features regarded as the invention.

Claims

CLAIMS
What is claimed is:
1. A computer system comprising:
a central processing unit (CPU); and
a cache memory, coupled to the CPU, having a plurality of compressible
cache lines to store additional data.
2. The computer system of claim 1 wherein the computer system further
comprises a cache controller to perform lookup operations of the cache memory.
3. The computer system of claim 1 wherein the cache controller is included
within the CPU.
4. The computer system of claim 2 wherein the cache controller comprises an
array of tags corresponding to each of the plurality of cache lines, each tag having
one or more compression encoding bits indicating whether a corresponding cache
line is compressed.
5. The computer system of claim 4 wherein a single cache line stores two or
more cache lines if the corresponding compression bit indicates that the line is
compressed.
6. The computer system of claim 4 wherein each tag includes one or more
companion encoding bits indicating which companion lines are stored in a
common cache set.
7. The computer system of claim 5 wherein the companion lines are adjacent
memory lines.
8. The computer system of claim 4 wherein the companion encoding bits are used
as a compression format bit to select between different compression algorithms.
9. The computer system of claim 4 wherein the companion encoding bits are used
to encode the ordering of companion lines in the compressed line.
10. The computer system of claim 6 wherein the cache controller further
comprises set and way selection logic to select a cache line.
11. The computer system of claim 10 wherein the set and way selection logic
comprises tag comparison logic to compare a cache line address to tags in the
array of tags.
12. The computer system of claim 11 wherein the tag comparison logic ignores
the one or more companion encoding bits within the address if the one or more
compression encoding bits indicate that the cache line is compressed.
13. The computer system of claim 11 wherein the tag comparison logic
compares the one or more companion bits within the address with the one or
more companion encoding bits within the tag if the compression encoding bits
indicate that the cache line is not compressed.
14. The computer system of claim 10 wherein the cache controller further
comprises compression logic to compress a cache line.
15. The computer system of claim 14 wherein the compression logic
compresses cache lines via a dictionary based compression algorithm.
16. The computer system of claim 14 wherein the compression logic
compresses cache lines via a sign-bit compression algorithm.
17. The computer system of claim 14 wherein the compression logic
determines when a cache line is to be compressed.
18. The computer system of claim 17 wherein the compression logic
compresses a cache line based upon opportunistic compression.
19. The computer system of claim 17 wherein the compression logic
compresses a cache line based upon prefetch compression.
20. The computer system of claim 17 wherein the compression logic
compresses a cache line based upon victim compression.
21. The computer system of claim 14 wherein the cache controller further
comprises byte selection logic to select addressed datum within a cache line.
22. The computer system of claim 21 wherein the byte selection logic
comprises:
a decompressor to decompress a selected cache line;
an input multiplexer to select between a decompressed cache line and an un-decompressed cache line; and
an output multiplexer to select between companion lines in the uncompressed cache line.
23. A cache controller comprising:
compression logic to compress lines within a cache memory device; and
set and way selection logic to select cache lines.
24. The cache controller of claim 23 further comprising an array of tags
corresponding to each of the cache lines, each tag having one or more
compression encoding bits indicating whether a corresponding cache line is
compressed.
25. The cache controller of claim 24 wherein a single cache line stores two or
more cache lines if the corresponding compression bit indicates that the line is
compressed.
26. The cache controller of claim 24 wherein each tag includes one or more
companion encoding bits indicating which companion lines are stored in a
common cache set.
27. The cache controller of claim 26 wherein the set and way selection logic
comprises tag comparison logic to compare a cache line address to tags in the
array of tags.
28. The cache controller of claim 27 wherein the tag comparison logic ignores
the one or more companion encoding bits within the address if the one or more
compression encoding bits indicate that the cache line is compressed.
29. The cache controller of claim 28 wherein the tag comparison logic
compares the one or more companion bits within the address with the one or
more companion encoding bits within the tag if the compression encoding bits
indicate that the cache line is not compressed.
30. The cache controller of claim 23 wherein the compression logic compresses
cache lines via a dictionary based compression algorithm.
31. The cache controller of claim 23 wherein the compression logic compresses
cache lines via a sign-bit compression algorithm.
32. The cache controller of claim 23 wherein the compression logic determines
when a cache line is to be compressed.
33. The cache controller of claim 23 wherein the cache controller further
comprises byte selection logic to select addressed datum within a cache line.
34. The cache controller of claim 33 wherein the byte selection logic comprises:
a decompressor to decompress a selected cache line;
an input multiplexer to select between a decompressed cache line and an un-decompressed cache line; and
an output multiplexer to select between companion lines in the uncompressed cache line.
35. A method comprising:
determining if a first cache line within a cache memory device is to be compressed; and
compressing the first cache line.
36. The method of claim 35 wherein compressing the first cache line comprises
storing data from a second cache line within the first cache line.
37. The method of claim 35 further comprising analyzing a tag associated with
the first cache line in a tag array to determine if the first cache line is compressed.
38. The method of claim 37 further comprising analyzing one or more
companion encoding bits if the first cache line is not compressed.
39. The method of claim 38 further comprising disregarding the one or more
companion encoding bits if the first cache line is compressed.
40. The method of claim 37 further comprising using the one or more
companion encoding bits as a compression format bit to select between different
compression algorithms if the first cache line is compressed.
41. The method of claim 37 further comprising using the one or more
companion encoding bits to encode the ordering of companion lines in the first
cache line if the first cache line is compressed.
42. A computer system comprising:
a central processing unit (CPU);
a cache memory, coupled to the CPU, having a plurality of compressible cache lines to store additional data;
a chipset coupled to the CPU; and
a main memory.
43. The computer system of claim 1 wherein the computer system further
comprises a cache controller to perform lookup operations of the cache memory.
44. The computer system of claim 1 wherein the cache controller is included
within the CPU.
45. The computer system of claim 1 wherein the cache controller is included
within the chipset.
46. The computer system of claim 43 wherein the cache controller comprises
an array of tags corresponding to each of the plurality of cache lines, each tag
having one or more compression encoding bits indicating whether a
corresponding cache line is compressed.
47. The computer system of claim 46 wherein a single cache line stores two or
more cache lines if the corresponding compression bit indicates that the line is
compressed.
PCT/US2004/032110 2003-09-30 2004-09-29 A mechanism to compress data in a cache WO2005033946A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006534088A JP4009310B2 (en) 2003-09-30 2004-09-29 Computer system, cache control unit, method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/676,480 US7143238B2 (en) 2003-09-30 2003-09-30 Mechanism to compress data in a cache
US10/676,480 2003-09-30

Publications (1)

Publication Number Publication Date
WO2005033946A1 true WO2005033946A1 (en) 2005-04-14

Family

ID=34377403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/032110 WO2005033946A1 (en) 2003-09-30 2004-09-29 A mechanism to compress data in a cache

Country Status (4)

Country Link
US (1) US7143238B2 (en)
JP (1) JP4009310B2 (en)
CN (1) CN100432959C (en)
WO (1) WO2005033946A1 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7243191B2 (en) * 2004-08-31 2007-07-10 Intel Corporation Compressing data in a cache memory
US7809900B2 (en) * 2006-11-24 2010-10-05 Sandforce, Inc. System, method, and computer program product for delaying an operation that reduces a lifetime of memory
US7904619B2 (en) 2006-11-24 2011-03-08 Sandforce, Inc. System, method, and computer program product for reducing memory write operations using difference information
US7747813B2 (en) * 2006-11-24 2010-06-29 Sandforce, Inc. Multi-memory device system and method for managing a lifetime thereof
US7904672B2 (en) 2006-12-08 2011-03-08 Sandforce, Inc. System and method for providing data redundancy after reducing memory writes
JP5194703B2 (en) * 2007-10-16 2013-05-08 ソニー株式会社 Data processing apparatus and shared memory access method
US7849275B2 (en) 2007-11-19 2010-12-07 Sandforce, Inc. System, method and a computer program product for writing data to different storage devices based on write frequency
US7903486B2 (en) 2007-11-19 2011-03-08 Sandforce, Inc. System, method, and computer program product for increasing a lifetime of a plurality of blocks of memory
US9183133B2 (en) * 2007-11-28 2015-11-10 Seagate Technology Llc System, method, and computer program product for increasing spare space in memory to extend a lifetime of the memory
US20090210622A1 (en) * 2008-02-19 2009-08-20 Stefan Birrer Compressed cache in a controller partition
JP4653830B2 (en) * 2008-09-19 2011-03-16 株式会社東芝 Instruction cache system
US8516166B2 (en) * 2009-07-20 2013-08-20 Lsi Corporation System, method, and computer program product for reducing a rate of data transfer to at least a portion of memory
US20130019029A1 (en) * 2011-07-13 2013-01-17 International Business Machines Corporation Lossless compression of a predictive data stream having mixed data types
US8990217B2 (en) 2011-07-13 2015-03-24 International Business Machines Corporation Lossless compression of high nominal-range data
US9261946B2 (en) * 2012-10-11 2016-02-16 Wisconsin Alumni Research Foundation Energy optimized cache memory architecture exploiting spatial locality
KR102336528B1 (en) 2014-07-07 2021-12-07 삼성전자 주식회사 Electronic device having cache memory and method for operating thereof
US9361228B2 (en) 2014-08-05 2016-06-07 Qualcomm Incorporated Cache line compaction of compressed data segments
US20160283390A1 (en) * 2015-03-27 2016-09-29 Intel Corporation Storage cache performance by using compressibility of the data as a criteria for cache insertion
CN107408076B (en) * 2015-04-08 2020-12-11 国立大学法人奈良先端科学技术大学院大学 Data processing apparatus
US10025956B2 (en) * 2015-12-18 2018-07-17 Intel Corporation Techniques to compress cryptographic metadata for memory encryption
US10019375B2 (en) * 2016-03-02 2018-07-10 Toshiba Memory Corporation Cache device and semiconductor device including a tag memory storing absence, compression and write state information
US10042576B2 (en) * 2016-08-17 2018-08-07 Advanced Micro Devices, Inc. Method and apparatus for compressing addresses
CN115129618A (en) * 2017-04-17 2022-09-30 伊姆西Ip控股有限责任公司 Method and apparatus for optimizing data caching
US10983915B2 (en) * 2019-08-19 2021-04-20 Advanced Micro Devices, Inc. Flexible dictionary sharing for compressed caches
US11586554B2 (en) * 2020-07-23 2023-02-21 Arm Limited Cache arrangements for data processing systems
US20230315627A1 (en) * 2022-03-16 2023-10-05 International Business Machines Corporation Cache line compression prediction and adaptive compression
US20230297382A1 (en) * 2022-03-16 2023-09-21 International Business Machines Corporation Cache line compression prediction and adaptive compression

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030135694A1 (en) * 2002-01-16 2003-07-17 Samuel Naffziger Apparatus for cache compression engine for data compression of on-chip caches to increase effective cache size

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5237675A (en) * 1990-06-04 1993-08-17 Maxtor Corporation Apparatus and method for efficient organization of compressed data on a hard disk utilizing an estimated compression factor
US5247638A (en) * 1990-06-18 1993-09-21 Storage Technology Corporation Apparatus for compressing data in a dynamically mapped virtual data storage subsystem
US5206939A (en) * 1990-09-24 1993-04-27 Emc Corporation System and method for disk mapping and data retrieval
JP3426385B2 (en) * 1995-03-09 2003-07-14 富士通株式会社 Disk controller
US5875454A (en) * 1996-07-24 1999-02-23 International Business Machiness Corporation Compressed data cache storage system
US6115787A (en) * 1996-11-05 2000-09-05 Hitachi, Ltd. Disc storage system having cache memory which stores compressed data
US5798718A (en) * 1997-05-12 1998-08-25 Lexmark International, Inc. Sliding window data compression method and apparatus
US6199126B1 (en) * 1997-09-23 2001-03-06 International Business Machines Corporation Processor transparent on-the-fly instruction stream decompression
US6092071A (en) * 1997-11-04 2000-07-18 International Business Machines Corporation Dedicated input/output processor method and apparatus for access and storage of compressed data
US7024512B1 (en) * 1998-02-10 2006-04-04 International Business Machines Corporation Compression store free-space management
US6145069A (en) * 1999-01-29 2000-11-07 Interactive Silicon, Inc. Parallel decompression and compression system and method for improving storage density and access speed for non-volatile memory and embedded memory devices
US20010054131A1 (en) * 1999-01-29 2001-12-20 Alvarez Manuel J. System and method for perfoming scalable embedded parallel data compression
US6819271B2 (en) * 1999-01-29 2004-11-16 Quickshift, Inc. Parallel compression and decompression system and method having multiple parallel compression and decompression engines
US6289420B1 (en) * 1999-05-06 2001-09-11 Sun Microsystems, Inc. System and method for increasing the snoop bandwidth to cache tags in a multiport cache memory subsystem
US6449689B1 (en) * 1999-08-31 2002-09-10 International Business Machines Corporation System and method for efficiently storing compressed data on a hard disk drive
US6507895B1 (en) * 2000-03-30 2003-01-14 Intel Corporation Method and apparatus for access demarcation
US6523102B1 (en) * 2000-04-14 2003-02-18 Interactive Silicon, Inc. Parallel compression/decompression system and method for implementation of in-memory compressed cache improving storage density and access speed for industry standard memory subsystems and in-line memory modules
US6480938B2 (en) * 2000-12-15 2002-11-12 Hewlett-Packard Company Efficient I-cache structure to support instructions crossing line boundaries
US6735673B2 (en) * 2002-01-10 2004-05-11 Hewlett-Packard Development Company, L.P. Apparatus and methods for cache line compression
US6795897B2 (en) * 2002-05-15 2004-09-21 International Business Machines Corporation Selective memory controller access path for directory caching
US7162669B2 (en) * 2003-06-10 2007-01-09 Hewlett-Packard Development Company, L.P. Apparatus and method for compressing redundancy information for embedded memories, including cache memories, of integrated circuits

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030135694A1 (en) * 2002-01-16 2003-07-17 Samuel Naffziger Apparatus for cache compression engine for data compression of on-chip caches to increase effective cache size

Also Published As

Publication number Publication date
US7143238B2 (en) 2006-11-28
JP2007507806A (en) 2007-03-29
US20050071562A1 (en) 2005-03-31
JP4009310B2 (en) 2007-11-14
CN1853170A (en) 2006-10-25
CN100432959C (en) 2008-11-12

Similar Documents

Publication Publication Date Title
US7143238B2 (en) Mechanism to compress data in a cache
US7162584B2 (en) Mechanism to include hints within compressed data
US6640283B2 (en) Apparatus for cache compression engine for data compression of on-chip caches to increase effective cache size
US7243191B2 (en) Compressing data in a cache memory
US6795897B2 (en) Selective memory controller access path for directory caching
JP6505132B2 (en) Memory controller utilizing memory capacity compression and associated processor based system and method
US7162583B2 (en) Mechanism to store reordered data with compression
US6961821B2 (en) Reconfigurable cache controller for nonuniform memory access computer systems
US11586555B2 (en) Flexible dictionary sharing for compressed caches
US5905997A (en) Set-associative cache memory utilizing a single bank of physical memory
US6353871B1 (en) Directory cache for indirectly addressed main memory
US6202128B1 (en) Method and system for pre-fetch cache interrogation using snoop port
US20050071566A1 (en) Mechanism to increase data compression in a cache
Benveniste et al. Cache-memory interfaces in compressed memory systems
US6587923B1 (en) Dual line size cache directory
US10140211B2 (en) Cache device and method for storing tag data and cache data in cache device
US5835945A (en) Memory system with write buffer, prefetch and internal caches
US5278964A (en) Microprocessor system including a cache controller which remaps cache address bits to confine page data to a particular block of cache
KR20040073167A (en) Computer system embedded sequantial buffer for improving DSP data access performance and data access method thereof
US20240119001A1 (en) Method for caching and migrating de-compressed page
WO2023158357A1 (en) Identification of random-access pages and mitigation of their impact on compressible computer memories
US7966452B2 (en) Cache architecture for a processing unit providing reduced power consumption in cache operation
Sardashti et al. Memory Compression
EP1913479A2 (en) Cache architecture for a processing unit providing reduced power consumption in cache operation

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200480027175.6

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2006534088

Country of ref document: JP

122 Ep: pct application non-entry in european phase