US 7689778 B2
In various embodiments, hardware, software and firmware or combinations thereof may be used to prevent cache conflicts within microprocessors and/or computer systems. More particularly, embodiments of the invention relate to a technique to prevent cache conflicts within a processor and/or computer system in which a number of accesses may be made to a particular cache or group of caches.
1. A method comprising:
initiating a first snoop operation from a first bus agent to a shared cache;
initiating a second snoop operation from a second bus agent to the shared cache, wherein the first bus agent is a first processor core within a multi-core processor and the second bus agent is a bus agent external to the multi-core processor;
blocking the second snoop operation to allow the first snoop operation to complete by issuance of a snoop blocking signal from a logic coupled between the shared cache, the first processor core and the external bus agent, the blocking to prevent the external bus agent and other external bus agents from gaining ownership of the shared cache until the snoop blocking signal is de-asserted;
issuing at least one block checking transaction from the logic after the first snoop operation has been initiated to determine if the second snoop operation is pending; and
if the second snoop operation is pending, issuing a snoop unblock signal from the logic to unblock the second snoop operation after the first snoop operation has completed to allow the external bus agent to gain ownership of the shared cache.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. A system comprising:
a multi-core processor including a first processor core, and a shared cache to store data to be used by cores within the multi-core processor, and a logic coupled to the first processor core and the shared cache to block a snoop operation from an external agent by issuance of a snoop blocking signal to prevent the external agent and other external agents from gaining ownership of the shared cache until the snoop blocking signal is de-asserted, issue at least one block checking transaction after a cross-snoop operation has been initiated within the multi-core processor to determine if the snoop operation is pending, and if the snoop operation is pending, issue a snoop unblock signal to unblock the snoop operation after the cross-snoop operation has completed to allow the external agent to gain ownership of the shared cache;
the external agent coupled to a bus external to the multi-core processor, the external agent to initiate the snoop operation to the shared cache if the cross-snoop operation is not in progress, wherein the external agent is to check for cross-snoop operations being initiated within the multi-core processor by submission of a cross-snoop check to the shared cache before initiation of the snoop operation, and if the external agent detects a cross-snoop operation, the external agent does not snoop the shared cache until the cross-snoop operation is complete, otherwise the external agent snoops the shared cache.
10. The system of
11. The system of
12. The system of
13. The system of
14. The system of
15. The system of
16. A processor comprising:
a plurality of cores each associated with a private cache;
a shared cache coupled to the plurality of cores;
a logic coupled to the shared cache and the plurality of cores, the logic to receive a cross-snoop check from an external agent to the shared cache, and to issue a look-up request to the shared cache if no cross-snoop from one of the plurality of cores has been detected, wherein the logic is to allow the external agent to snoop the shared cache if no request from one of the plurality of cores is initiated before the external agent issues the look-up request, and otherwise to block the look-up request via a snoop blocking signal that is to prevent the external agent from gaining ownership of the shared cache until the snoop blocking signal is de-asserted if a cross-snoop to the shared cache from one of the plurality of cores has been detected responsive to the cross-snoop check, issue at least one block checking transaction after the cross-snoop has been initiated to determine if the look-up request is pending, and to unblock the look-up request via a snoop unblock signal after the cross-snoop is completed to allow the external agent to gain the shared cache ownership.
17. The processor of
18. The processor of
Embodiments of the invention relate to microprocessors and microprocessor systems. More particularly, embodiments of the invention relate to preventing cache access conflicts within a processor or computer system in which a number of accesses occur to the same cache or group of caches.
Prior art processors and computer systems may be limited in the number of accesses to a particular cache or group of caches that can be concurrently managed. One prior art technique used to combat this problem has been the use of an inclusive cache structure whose cache entries correspond to the cache entries of one or more processor core-specific caches, such as level 1 (L1) caches.
In other words, prior art multi-core processors and/or multi-processor computer systems have attempted to reduce cache access conflicts within core caches by simply directing some of the cache accesses to a shared inclusive cache structure, such as a last level cache (LLC), that contains all of the cache entries of the processor cores or agents to which the inclusive cache structure corresponds. In the case of a cache access from a core within a multi-core processor, however, the core will typically attempt to access data first from its own cache and then resort to the shared cache. The shared inclusive cache structure is sometimes referred to as a “cache filter”, as it shields core caches from excessive cache accesses, and therefore bus traffic, from other agents by providing the requested data to these agents from the inclusive cache instead of the core's cache.
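By way of illustration only, the filtering behavior described above may be sketched as follows. The sketch is a simplified model under stated assumptions, not an implementation taken from this disclosure: the names `LLCLine`, `InclusiveLLC`, `owner_core`, and `core_probes` are hypothetical, and data is returned directly from the modeled LLC even where a real design would obtain the current copy from the owning core.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LLCLine:
    """One line of the modeled inclusive cache (hypothetical layout)."""
    data: int
    # Core holding the line exclusively or modified, if any (assumption).
    owner_core: Optional[int] = None


class InclusiveLLC:
    """Toy inclusive last level cache acting as a cache filter."""

    def __init__(self) -> None:
        self.lines: Dict[int, LLCLine] = {}
        self.core_probes = 0  # requests that had to reach a core cache

    def external_read(self, address: int) -> Optional[int]:
        """Serve an external agent's read, probing a core only if needed."""
        line = self.lines.get(address)
        if line is None:
            # Inclusion: a line absent from the LLC is absent from every
            # core cache too, so no core cache needs to be disturbed.
            return None
        if line.owner_core is not None:
            # Only an exclusively owned or modified line forces a core probe.
            self.core_probes += 1
        return line.data
```

In this model, a read that misses the LLC or hits a line not owned by a core never reaches a core cache, which corresponds to the reduction in core cache accesses, and therefore bus traffic, attributed to the inclusive cache above.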
The prior art technique of using a cache structure, such as an LLC, for servicing cache requests from various agents is helpful in allowing requesting agents to obtain the data they need without resorting to a cache of a processor core, for example, if the data is not exclusively owned or modified by a particular processor core. To the extent that an agent, such as a processor or processor core, owns the cache line of its cache that the requesting agent is trying to access, a cache structure, such as an LLC, can allow the requesting agent to obtain the data it is requesting rather than waiting for the owning agent to share the data.
However, other conflicts can occur when using an LLC to service cache requests from external agents and processor cores.
The first potential conflict, “conflict window A” in
The prior art problem depicted in
Cache conflicts, such as those depicted in
Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Embodiments of the invention relate to caching architectures within microprocessors and/or computer systems. More particularly, embodiments of the invention relate to a technique to manage cache conflicts within a processor and/or computer system in which a number of accesses may be made to a particular cache or group of caches.
This disclosure describes various embodiments of the invention to address problems associated with prior art caching techniques in multi-processor and/or multi-core computer systems, including conflict resolution and avoidance when a number of requesting agents attempt to access the same line of cache. In at least one embodiment of the invention, an inclusive cache structure, such as a last level cache (LLC), is used in conjunction with a number of processors or processor cores, each having an associated cache, such as a level 1 (L1) cache. Inclusive cache structures, such as an LLC, include those that contain at least the same data as the other caches to which the inclusive cache structure corresponds. By maintaining coherence between the inclusive cache structure and the corresponding core and/or processor caches, accesses to the corresponding core/processor caches can be serviced by the inclusive cache, thereby reducing bus traffic to the corresponding cores/processors.
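The inclusion property described above, namely that the inclusive cache structure contains at least the same data as the caches to which it corresponds, may be expressed as a simple check. The following is a minimal sketch in which caches are modeled only as sets of line addresses; the function name `inclusion_violations` and the core labels are hypothetical and do not appear in this disclosure.

```python
from typing import Dict, Iterable, Set


def inclusion_violations(
    llc_lines: Set[int], core_caches: Dict[str, Iterable[int]]
) -> Dict[str, Set[int]]:
    """Return, per core, addresses cached by the core but missing from the LLC.

    An empty result means the modeled LLC is inclusive of every core cache.
    """
    violations: Dict[str, Set[int]] = {}
    for core, addresses in core_caches.items():
        missing = set(addresses) - llc_lines
        if missing:
            violations[core] = missing
    return violations
```

For example, `inclusion_violations({1, 2}, {"core0": [1], "core1": [2, 4]})` reports address 4 for `core1`, because that line is held by a core cache without a corresponding entry in the modeled LLC.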
Embodiments of the invention in which an inclusive cache structure is used can also reduce the number and/or types of conflicts, or even prevent the conflicts, that can occur when an agent external to a multi-core processor, such as another processor (“external agent”), and a processor core attempt to access the same line of cache within the inclusive cache structure.
For example, at least one embodiment of the invention prevents cache conflicts resulting from a cache snoop request from an external agent to a line within an inclusive cache structure, such as an LLC, that is being accessed as a result of a cross-snoop operation initiated by a core within the processor to which the LLC corresponds.
Throughout this disclosure, operations are referred to as “transactions” that may be performed via a command or set of commands. Furthermore, transactions mentioned throughout this disclosure may be performed via a sequence of bus cycles or signals from various functional units. The terms “transaction”, “operation”, and “signal” may therefore be used interchangeably throughout this disclosure.
In at least one embodiment of the invention, the CBSO logic may be used to manage and prevent conflicts to the same cache line (i.e. same address) resulting from a number of transactions, including an LLC snoop by an external agent occurring at substantially the same time as a cross-snoop between cores of a multi-core processor.
An LLC access by either an external agent or a core typically involves read and read-for-ownership transactions from the cores/external agents accessing the LLC to read or gain ownership of a desired line of cache. If an LLC look-up initiated by a processor core results in a hit in another processor core's cache, the request may be allocated to the other processor core's cache. In this case, several opportunities for conflicts, including those described above, may arise between the core cross-snoop transaction and a snoop of the LLC from an external agent, such as another processor or other system agent.
Cross-snoop transactions typically result when an ownership request from a core determines that the LLC line is owned by another core, or when a read transaction from a core determines that another core may have the most current version of the desired data in its cache. In these cases, the requesting core performs a snoop (“cross snoop”) to the cache of the core owning the line, which can result in the owning core's line state changing from “exclusive” to “invalid” or “shared”, depending on the particular coherency protocol being used. In one embodiment of the invention, the CBSO logic manages, or prevents, conflicts between a snoop to the LLC from an external bus agent and a cross-snoop transaction accessing the same LLC line.
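The conditions that give rise to a cross snoop, and the resulting state change in the owning core's cache, may be sketched as follows. This is a minimal illustration under an assumed MESI-like protocol in which a read-for-ownership invalidates the owner's copy and a read downgrades it to shared; the disclosure leaves the choice to the particular coherency protocol, and the names `State`, `Request`, `needs_cross_snoop`, and `owner_state_after_cross_snoop` are hypothetical.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


class Request(Enum):
    READ = "read"
    READ_FOR_OWNERSHIP = "read-for-ownership"


def needs_cross_snoop(requesting_core: int, owner_core: Optional[int]) -> bool:
    """A cross snoop is needed when a different core owns the LLC line."""
    return owner_core is not None and owner_core != requesting_core


def owner_state_after_cross_snoop(request: Request) -> State:
    """New state of the owner's line, assuming a MESI-like protocol."""
    if request is Request.READ_FOR_OWNERSHIP:
        # The requester takes ownership, so the previous owner's copy is dropped.
        return State.INVALID
    # A plain read leaves both cores holding a shared copy.
    return State.SHARED
```

A protocol with different transitions would change only `owner_state_after_cross_snoop`; the condition under which the cross snoop is issued is unaffected.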
External snoops arriving before cycle 313 may change the cache state of the LLC line and cause a system error. Therefore, in one embodiment, external requests are blocked until cycle 313. If at cycle 313 the core request is indeed a cross snoop, no unblock signal is sent at cycle 313; instead, the unblock signal is sent when the cross snoop has completed.
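The blocking and unblocking sequence described above may be modeled in software as follows. The sketch is a behavioral illustration only and is not the CBSO logic itself: the class `SnoopBlocker` and its method names are hypothetical, the decision point corresponds to cycle 313 in the text, and the queueing of blocked external snoops for later service is an assumption of the model.

```python
from collections import deque
from typing import Deque, List


class SnoopBlocker:
    """Behavioral model of blocking external snoops around a cross snoop."""

    def __init__(self) -> None:
        self.blocked = False
        self.pending: Deque[int] = deque()  # external snoops held while blocked
        self.serviced: List[int] = []       # external snoops allowed to proceed

    def core_request_starts(self) -> None:
        """A core look-up begins: assert the snoop blocking signal."""
        self.blocked = True

    def lookup_result(self, is_cross_snoop: bool) -> None:
        """Decision point (cycle 313 in the text)."""
        if not is_cross_snoop:
            self._unblock()
        # Otherwise stay blocked until the cross snoop completes.

    def cross_snoop_completes(self) -> None:
        """The cross snoop has finished: send the unblock signal."""
        self._unblock()

    def external_snoop(self, address: int) -> bool:
        """Return True if the snoop proceeds now, False if it is held."""
        if self.blocked:
            self.pending.append(address)
            return False
        self.serviced.append(address)
        return True

    def _unblock(self) -> None:
        self.blocked = False
        while self.pending:  # release held snoops in arrival order
            self.serviced.append(self.pending.popleft())
```

In this model, an external snoop issued while a cross snoop is in flight is held rather than allowed to change the LLC line state, and is released only after `cross_snoop_completes` is called.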
In the embodiment illustrated in
In the embodiment of
The embodiment illustrated within
Illustrated within the processor of
The main memory may be implemented in various memory sources, such as dynamic random-access memory (DRAM), a hard disk drive (HDD) 720, or a memory source located remotely from the computer system via network interface 730 containing various storage devices and technologies. The cache memory may be located either within the processor or in close proximity to the processor, such as on the processor's local bus 707. Furthermore, the cache memory may contain relatively fast memory cells, such as a six-transistor (6T) cell, or other memory cell of approximately equal or faster access speed.
The computer system of
The system of
At least one embodiment of the invention may be located within the processors 870 and 880. Other embodiments of the invention, however, may exist in other circuits, logic units, or devices within the system of
Embodiments of the invention described herein may be implemented with circuits using complementary metal-oxide-semiconductor devices, or “hardware”, or using a set of instructions stored in a medium that when executed by a machine, such as a processor, perform operations associated with embodiments of the invention, or “software”. Alternatively, embodiments of the invention may be implemented using a combination of hardware and software.
While the invention has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments, which are apparent to persons skilled in the art to which the invention pertains are deemed to lie within the spirit and scope of the invention.