|Publication number||US7873770 B2|
|Application number||US 11/559,049|
|Publication date||Jan 18, 2011|
|Filing date||Nov 13, 2006|
|Priority date||Nov 13, 2006|
|Also published as||US20080114916|
|Inventors||Mark D. Hummel, Andrew W. Lueck, Andrew G. Kegel|
|Original Assignee||Globalfoundries Inc.|
1. Field of the Invention
This invention is related to the field of computer systems, and more particularly to address translation mechanisms for input/output (I/O) device-initiated requests.
2. Description of the Related Art
Computer systems of various types are ubiquitous in modern society, including personal computers (PCs), workstations, servers, various personal digital assistant (PDA) devices, etc. Most, if not all, of these computer systems have implemented memory management functionality for processor accesses to memory. Generally, the memory management functionality has included translating addresses from a virtual address space used by each process to a physical address space that spans the actual system memory, along with various memory protections (e.g. read only, read/write, privilege level requirements, etc.). The memory management functionality has a variety of uses: protecting the memory used by each process from unauthorized access by other processes; permitting large virtual spaces to be used by processes even if the physical memory system is not that large; relocation of virtual addresses to available physical memory without the participation of the process; etc.
While the processor addresses are frequently translated, addresses used by input/output (I/O) devices in computer systems are generally not translated. That is, the I/O devices use physical addresses to access memory. In a single operating system (OS) computer system, such as most PCs, the OS controls access to the I/O devices by other processes (applications and OS services). Accordingly, the OS can control which process has access to a given device at any given point in time, and can at least somewhat control the addresses accessed by the device. However, such mechanisms become more complicated and cumbersome in virtual machine systems, which may have multiple guest OSs running on a virtual machine monitor. Additionally, devices' use of physical addresses reduces the overall security of the system, since a rogue device (or a device programmed by a malicious software agent) can access memory unimpeded.
Additional challenges exist on at least some peripheral interfaces to which the I/O devices are connected or over which the devices communicate, directly or indirectly (e.g. through one or more bridges that bridge between peripheral interfaces). An address space associated with the peripheral interface can include one or more address ranges that are assigned operations other than a memory access. That is, while a read or write operation is specified as the command, an address in the address range is interpreted as causing the operation, in addition to or instead of the memory access. For example, interrupts can be signalled through an address range, system management operations can be specified through an address range, etc.
If translation of I/O-generated addresses is to be performed, a mechanism is needed for handling these special address ranges. Additionally, interrupts generated by the I/O devices (e.g. through a special address range, as message signalled interrupts (MSIs), etc.) must be handled correctly.
In one embodiment, an input/output memory management unit (IOMMU) comprises a control register and control logic coupled to the control register. The control register is configured to store a base address of a device table, wherein a given input/output (I/O) device has an associated device identifier that selects a first entry in the device table. The first entry comprises a pointer to an interrupt remapping table. The control logic is configured to remap an interrupt specified by an interrupt request received by the IOMMU from the given I/O device if the interrupt remapping table includes an entry for the interrupt.
In an embodiment, a method comprises receiving an interrupt request in an input/output memory management unit (IOMMU) from an input/output (I/O) device, wherein the interrupt request specifies an interrupt; locating a device table entry corresponding to the I/O device in a device table identified by a base address programmed into the IOMMU, wherein the device table entry is located responsive to an associated device identifier corresponding to the I/O device, and wherein the device table entry comprises a pointer; locating an interrupt remapping table responsive to the pointer; and remapping an interrupt specified by the interrupt request if the interrupt remapping table includes an entry for the interrupt.
In some embodiments, a system comprises an input/output (I/O) device configured to transmit an interrupt request, and an input/output memory management unit (IOMMU) coupled to receive the interrupt request and to remap an interrupt specified by the interrupt request via an interrupt remapping table. The interrupt remapping table is located via a pointer in a device table entry corresponding to the I/O device, wherein the device table is located by a base address programmed into the IOMMU. The device table entry is located in the device table using a device identifier corresponding to the I/O device.
The following detailed description makes reference to the accompanying drawings, which are now briefly described.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
As illustrated in
Generally, the I/O devices 22 may be configured to generate memory requests, such as memory read and write requests, to access memory locations in the memory 20. The memory requests may be part of a direct memory access (DMA) read or write operation, for example. The DMA operations may be initiated by software executed by the processors 12, programming the I/O devices 22 directly or indirectly to perform the DMA operations. Among other things, the I/O devices 22 may be provided with virtual addresses to access the memory 20. The virtual addresses may be translated by the IOMMU 26 to corresponding physical addresses to access the memory, and the physical addresses may be provided to the memory controller 18 for access. That is, the IOMMU 26 may modify the memory requests sourced by the I/O devices 22 to change the virtual address in the request to a physical address, and the memory request may be forwarded to the memory controller 18 to access the memory 20.
The IOMMU uses a set of I/O translation tables 36 stored in the memory 20 to translate the addresses of memory requests from the I/O devices 22. Generally, translation tables may be tables of translation data that can be used to translate virtual addresses to physical addresses. The translation tables may store the translation data in any fashion. For example, in one embodiment, the I/O translation tables 36 may include page tables similar to those defined in the x86 and AMD64™ instruction set architectures. Various subsets of the virtual address bits may be used to index levels of the table, and each level may either be the end of translation (i.e. storing a real page number for the translation) or may point to another table (indexed by another set of virtual address bits). The page may be the unit of translation (i.e. each address in the virtual page translates to the same physical page). Pages may have varying sizes, from 4 kilobytes up to megabytes or even gigabytes. Additionally, the translation tables 36 may include a device table that maps devices to sets of page tables (e.g. by device identifier). The device identifier (ID) may be defined in a variety of ways, and may be dependent on the peripheral interconnect to which the device is attached. For example, Peripheral Component Interconnect (PCI) devices may form a device identifier from the bus number, device number and function number. HyperTransport™ devices may use a bus number and unit ID to form a device identifier. Thus, in general, a translation from a virtual address to a physical address may be stored in one or more entries in one or more translation tables, and some of the entries may be shared with other translations. Traversing the tables from entry to entry may be part of identifying the translation for the virtual address. In one embodiment, the translation tables 36 may include an interrupt remapping table to remap interrupts signalled by the I/O devices 22 (e.g. via MSIs, an address range associated with interrupt operations, etc.).
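The device identifier and the hierarchical table walk described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the 8/5/3-bit packing follows the conventional PCI bus/device/function layout, while `PAGE_SHIFT`, `LEVEL_BITS`, the nested-dictionary tables and the function names are assumptions chosen only for illustration.

```python
PAGE_SHIFT = 12   # assumes a 4-kilobyte base page
LEVEL_BITS = 9    # assumed number of virtual address bits indexing each level


def pci_device_id(bus: int, device: int, function: int) -> int:
    """Pack a PCI bus/device/function triple into a 16-bit device ID."""
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    return (bus << 8) | (device << 3) | function


def walk(root: dict, vaddr: int, levels: int = 2):
    """Walk nested-dict 'page tables' and return a physical address, or None.

    At any level an entry is either another table (a dict) or the end of
    translation (an int page number in units of that level's page size).
    """
    table = root
    for level in reversed(range(levels)):
        shift = PAGE_SHIFT + level * LEVEL_BITS
        entry = table.get((vaddr >> shift) & ((1 << LEVEL_BITS) - 1))
        if entry is None:
            return None                      # no translation present
        if isinstance(entry, int):           # end of translation at this level
            return (entry << shift) | (vaddr & ((1 << shift) - 1))
        table = entry                        # descend to the next table
    return None


# A device table maps device IDs to the root of that device's page tables.
device_table = {pci_device_id(0, 3, 1): {0: {5: 0x1234}, 1: 0x7}}
root = device_table[pci_device_id(0, 3, 1)]
small_page = walk(root, (5 << 12) | 0xABC)       # translated via two levels
large_page = walk(root, (1 << 21) | 0x12345)     # ends at the upper level
```

In this sketch an entry that ends translation at the upper level behaves like a larger page, since more of the low-order virtual address bits are kept as the page offset.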
Specifically, the IOMMU 26 illustrated in
To facilitate more rapid translations, the IOMMU may cache some translation data. For example, the IOTLB 30 may be a form of cache, which caches the result of previous translations, mapping virtual page numbers to real page numbers and corresponding translation data. If a translation is not found in the IOTLB 30 for the given memory request, the table walker 28 may be invoked. In various embodiments, the table walker 28 may be implemented in hardware, or in a microcontroller or other processor and corresponding executable code (e.g. in a read-only memory (ROM) in the IOMMU 26). Additionally, other caches may be included to cache page tables, or portions thereof, and/or device tables, or portions thereof, as part of IOTLB/cache 30.
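The hit/miss behavior of such a translation cache can be sketched as follows. This is only an illustration under stated assumptions: the `Iotlb` class, the 4-kilobyte page size and the callable standing in for the table walker 28 are hypothetical, and no replacement or invalidation policy is modeled.

```python
PAGE_SHIFT = 12                      # assumes 4-kilobyte pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Iotlb:
    """Caches virtual-page to real-page translations; walks tables on a miss."""

    def __init__(self, table_walker):
        self._walker = table_walker  # hypothetical callable: vpn -> pfn or None
        self._entries = {}
        self.misses = 0

    def translate(self, vaddr: int):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        if vpn not in self._entries:
            self.misses += 1
            pfn = self._walker(vpn)          # invoke the table walker on a miss
            if pfn is None:
                return None                  # translation failed; nothing cached
            self._entries[vpn] = pfn
        return (self._entries[vpn] << PAGE_SHIFT) | offset


tlb = Iotlb({3: 0x40}.get)           # a dict lookup stands in for the table walk
first = tlb.translate(0x3010)        # miss, then filled
second = tlb.translate(0x3FFC)       # hit on the same virtual page
```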
The control logic 34 may be configured to access the IOTLB 30 to detect a hit/miss of the translation for a given memory request, and may invoke the table walker. The control logic 34 may also be configured to modify the memory request from the I/O device with the translated address, and to forward the request upstream toward the memory controller. Additionally, the control logic 34 may control various functionality in the IOMMU 26 as programmed into the control registers 32. For example, the control registers 32 may define an area of memory to be a command queue 42 for memory management software to communicate control commands to the IOMMU 26, in this embodiment. The control logic 34 may be configured to read the control commands from the command queue 42 and execute the control commands. Similarly, the control registers 32 may define another area of memory to be an event log buffer 44. The control logic 34 may detect various events and write them to the event log buffer 44. The events may include various errors detected by the control logic 34 with respect to translations and/or other functions of the IOMMU 26.
The I/O devices 22 may comprise any devices that communicate between the computer system 10 and other devices, provide human interface to the computer system 10, provide storage (e.g. disk drives, compact disc (CD) or digital video disc (DVD) drives, solid state storage, etc.), and/or provide enhanced functionality to the computer system 10. For example, the I/O devices 22 may comprise one or more of: network interface cards, integrated network interface functionality, modems, video accelerators, audio cards or integrated audio hardware, hard or floppy disk drives or drive controllers, hardware interfacing to user input devices such as keyboard, mouse, tablet, etc., video controllers for video displays, printer interface hardware, bridges to one or more peripheral interfaces such as PCI, PCI express (PCIe), PCI-X, USB, firewire, SCSI (Small Computer Systems Interface), etc., sound cards, and a variety of data acquisition cards such as GPIB or field bus interface cards, etc. The term “peripheral device” may also be used to describe some I/O devices.
In some cases, one or more of the I/O devices 22 may also comprise an IOTLB, such as IOTLBs 24. These IOTLBs may be referred to as “remote IOTLBs”, since they are external to the IOMMU 26. In such cases, the memory requests that have already been translated may be marked in some fashion so that the IOMMU 26 does not attempt to translate the memory request again.
The memory controller 18 may comprise any circuitry designed to interface between the memory 20 and the rest of the system 10. The memory 20 may comprise any semiconductor memory, such as one or more RAMBUS DRAMs (RDRAMs), synchronous DRAMs (SDRAMs), DDR SDRAM, static RAM, etc. The memory 20 may be distributed in a system, and thus there may be multiple memory controllers 18.
The MMU 14 may comprise a memory management unit for memory requests sourced by a processor 12. The MMU may include TLBs 16, as well as table walk functionality. When a translation is performed by the MMU 14, the MMU 14 may generate translation memory requests (e.g. shown as dotted arrows 46 and 48 in
The processors 12 may comprise any processor hardware, implementing any desired instruction set architecture. In one embodiment, the processors 12 implement the x86 architecture, and more particularly the AMD64™ architecture. Various embodiments may be superpipelined and/or superscalar. Embodiments including more than one processor 12 may be implemented discretely, or as chip multiprocessors (CMP) and/or chip multithreaded (CMT).
The system 10 illustrates high level functionality of the system, and the actual physical implementation may take many forms. For example, the MMU 14 is commonly integrated into each processor 12.
In the illustrated embodiment, the system 10a comprises processing nodes 60A-60B, which respectively comprise processors 12A-12B further comprising MMUs 14A-14B. The processing nodes 60A-60B also comprise memory controllers 18A-18B. Each of processors 12A-12B may be an instance of a processor 12 as mentioned above. Similarly, each of MMUs 14A-14B and memory controllers 18A-18B may be instances of the MMU 14 and memory controller 18 shown in
The system 10a includes a distributed memory system, comprising memories 20A-20B. The physical address space may be distributed over the memories 20A-20B. Accordingly, a given memory request specifying a given address is routed to the memory controller 18A or 18B coupled to the memory 20A or 20B to which that given address is assigned.
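Routing by assigned address range might look like the following sketch. The address split in `MEMORY_MAP` is purely hypothetical; the description does not state how the physical address space is divided between the memories 20A-20B.

```python
from bisect import bisect_right

# Hypothetical assignment: (base physical address, memory controller label).
MEMORY_MAP = [(0x0000_0000, "18A"), (0x8000_0000, "18B")]
_BASES = [base for base, _ in MEMORY_MAP]


def route(paddr: int) -> str:
    """Return the memory controller whose range contains the physical address."""
    return MEMORY_MAP[bisect_right(_BASES, paddr) - 1][1]
```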
Memory requests from the I/O devices (e.g. I/O devices 22A-22D, coupled to I/O Hubs 62A-62B as illustrated in
The IOMMU may be placed anywhere along the path between I/O-sourced memory requests and the memory 20. In the illustrated embodiment, IOMMUs 26A-26B are included in the I/O hubs 62A-62B. Thus, any memory requests sourced by an I/O device coupled to the corresponding hub may be translated by the IOMMU in the I/O hub. Other embodiments may locate the IOMMU in different places, from IOTLBs in the I/O devices to IOMMUs within the processing nodes 60A-60B, or even IOMMUs at the memory controllers 18A-18B. Still further, IOMMUs may be located at different points in different parts of the system.
Address Range Reclaiming and Interrupt Remapping
Turning now to
The device table 36A includes a plurality of entries, indexed by a device ID assigned to the device. Thus, a given device corresponds to one of the entries in the device table 36A (unless the device has multiple device IDs). The entry may include a variety of data. An exemplary entry is shown in
Specifically, the entry may include a pointer to the I/O page tables 36C (represented by arrow 70). The pointer to the I/O page tables 36C may point to a page table that is the starting point for translation searching in the page tables 36C. The starting page table may include pointers to other page tables, in a hierarchical fashion, as mentioned above. The page tables may be indexed by various bits of the virtual address to be translated, according to the implemented translation process.
The entry may also include a pointer to the interrupt remapping table 36B (represented by arrow 72). The interrupt remapping data may be used when an interrupt request is transmitted by a device, and may be indexed by an interrupt ID. The interrupt ID may comprise data that identifies the requested interrupt, and may vary based on the mechanism used to transmit the interrupt request. For example, PCIe defines MSIs, and the interrupt is specified via the MSI data. The MSI data may comprise the interrupt ID. In HT, portions of the address specify the interrupt. The specification information may comprise, e.g., destination (e.g. processor) and vector on that processor. In some embodiments, some or all of the data forming the interrupt ID may be explicitly included in the interrupt request. In other embodiments, some or all of the data may be implicit in the interrupt request (e.g. based on the type of interrupt request, the specific interrupt requested, etc.). In still other embodiments, a combination of explicit and implicit data may be used.
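The lookup path from device ID to remapped interrupt can be sketched as below. The `Interrupt` type, the `int_table` key and the dictionary-based tables are illustrative assumptions; the actual entry formats are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interrupt:
    destination: int   # e.g. the target processor
    vector: int        # vector on that processor


def remap(device_table: dict, device_id: int, interrupt_id: int) -> Optional[Interrupt]:
    """Follow the device table entry's pointer and index by interrupt ID."""
    entry = device_table.get(device_id)
    if entry is None:
        return None                              # no device table entry
    return entry["int_table"].get(interrupt_id)  # None if no entry for the interrupt


# Hypothetical tables: device 0x19 has interrupt ID 4 remapped.
tables = {0x19: {"int_table": {4: Interrupt(destination=1, vector=0x41)}}}
```

Because each device table entry carries its own pointer, two devices may use the same interrupt ID and still be remapped differently.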
It is noted that, while one device table 36A is shown, multiple device tables may be maintained if desired. The device table base address in the control register 32A may be changed to indicate other device tables. Furthermore, device tables may be hierarchical, if desired, similar to the page tables described above. Similarly, while one interrupt remapping table 36B is shown, there may be multiple interrupt mapping tables, e.g. up to one per entry in the device table 36A. There may also be multiple sets of page tables, e.g. up to one per entry in the device table 36A. It is noted that other embodiments may implement interrupt remapping without I/O translation, and may implement I/O translation (including reclaiming certain address ranges, as described in more detail below) without interrupt remapping.
In one embodiment, at least one peripheral interconnect between the I/O devices 22 and the IOMMU 26 uses one or more address ranges in the address space on that interconnect to specify operations other than the memory operation that would be performed based on the read/write encoding of the command. The operations may be referred to as “special operations” and the corresponding address ranges may be referred to as “special operation address ranges”.
Some devices may be known not to generate certain operations mapped to some of the address ranges shown. For example, some devices may not use the legacy programmable interrupt controller (PIC) interrupt acknowledge (IACK) space, because they don't implement the legacy PIC. Some devices may not communicate in the system management space. While devices receive transactions in the configuration or extended configuration spaces, devices frequently don't initiate transactions in the configuration spaces. Devices may not use device messaging. Generally, the devices don't use the reserved spaces.
For such devices, it may be desirable to reclaim those address ranges to be usable as virtual addresses, translated through the page tables to physical addresses outside the corresponding range. For each reclaimed page, a translation may be provided in the translation tables 36 that translates the addresses in that virtual page to physical addresses mapped to the memory 20. Accordingly, the I/O device-initiated requests in those address ranges may be redirected to memory, and may perform normal memory read/write operations instead of the operation(s) assigned to that range. If a given range is used by a given device, translations for pages in that range may be established in the translation tables 36 with a unity mapping. A unity mapping may be a mapping of a virtual address to a physical address that is numerically the same as the virtual address. Pages having a unity mapping may cause the operation(s) assigned to the corresponding address range, instead of the memory operation. It is not necessary that all pages in a given range have the unity mapping or be reclaimed. The decision to reclaim or provide the unity mapping may be made on a page by page basis.
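The page-by-page choice between a unity mapping and a reclaimed page can be illustrated with a short sketch. `SPECIAL_PAGES` is a hypothetical page-number range, not one of the actual special operation address ranges, and the flat dictionary stands in for the I/O page tables.

```python
SPECIAL_PAGES = range(0xFD000, 0xFE000)   # hypothetical special operation range


def resolve(page_table: dict, vpn: int):
    """Classify the effect of a request to virtual page `vpn`."""
    pfn = page_table.get(vpn)
    if pfn is None:
        return ("fault", None)               # no translation for the page
    if vpn in SPECIAL_PAGES and pfn == vpn:
        return ("special_operation", pfn)    # unity mapping keeps the operation
    return ("memory", pfn)                   # reclaimed (or ordinary) page


page_table = {
    0xFD000: 0xFD000,   # unity mapping: the device uses this operation
    0xFD001: 0x00042,   # reclaimed: redirected to a memory page
}
```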
In some cases, it may be desirable to override the translation, through the I/O page tables 36C, for a special operation address range. Control fields in the device table entry for the device may be used for such ranges, as described in more detail below.
In addition to having translations assignable to the address ranges associated with special operations, some I/O devices may have multiple address spaces that are memory-mapped to a larger address space. These regions may have unique properties requiring special handling. Again, if such regions are not in use for a given device, the address ranges may be reclaimed using the translation tables 36.
Turning now to
It is noted that the I/O device that initiates a request may be directly coupled to the peripheral interface on which the address ranges are defined (e.g. the HT interface, in this example), or may be coupled to a different peripheral interface that is bridged to that interface. The bridge circuitry may convert the requests from the non-HT interface to read/write requests with addresses in the corresponding range.
In the table of
The interrupt/EOI address range comprises interrupt requests and end of interrupt (EOI) responses to interrupts. Portions of the addresses in the range may be used to specify the particular requested interrupt. For example, some interrupts are a specific address encoding for a specific interrupt. In x86-compatible processors, such interrupts may include system management interrupt (SMI), non-maskable interrupt (NMI), initialization interrupt (INIT), and external interrupt. Additionally, other interrupts are specified as an interrupt vector. The interrupt vector identifies the requested interrupt according to software and/or hardware convention, and may be used to locate the corresponding interrupt service routine (ISR) in memory. That is, the interrupt vector may be a portion of the address, or may be mapped to an offset from a base address of ISRs in memory.
The interrupt/EOI range is not reclaimed through the I/O page tables 36C in this embodiment, but may be remapped by the IOMMU 26 through the interrupt remapping table 36B. Additional details for one embodiment are provided below for interrupt remapping. Additionally, the IOMMU 26 response to certain interrupts (Lint0, Lint1, NMI, external interrupt, and INIT) is controlled by fields in the device table entry for the I/O device (Lint0P, Lint1P, NMIP, EIntP, and INITP).
The legacy programmable interrupt controller (PIC) interrupt acknowledge (IACK) address range may be used for communications related to an interrupt controller that was commonly used in personal computers (PCs) based on x86 processors prior to the advanced programmable interrupt controller (APIC) specification that is currently in use in PCs. Specifically, interrupt acknowledgements required by the legacy PIC may be transmitted in this range. If a legacy PIC, or legacy PIC functionality, is used in a PC, this legacy PIC address range is used for such communication. The legacy PIC IACK address range may be reclaimed through the I/O page tables 36C, since the PIC IACK traffic is only transmitted downstream (from the host to the device).
The system management address range may be used for various system management commands. The commands may include, e.g., commands to cause a processor to go into a power saving mode such as sleep mode, commands to cause the processor to put other devices/system components into various power saving modes, etc. The system management address range may be reclaimed using the I/O page tables 36C, and additional control is provided via the SysMgt field in the device table entry (described in more detail below with regard to
The two reserved address ranges are generally not used. Accordingly, these ranges may be reclaimed using the I/O page tables 36C. If the reserved address ranges are assigned to operations in the future, unity mappings in the I/O page tables 36C may be used to enable use of the newly-assigned operations.
The I/O space address range may be used for a device to initiate port I/O requests to I/O ports in the system. The I/O space address range may be reclaimed using the I/O page tables 36C, and additional control is provided via the IoCtl field in the device table entry (described in more detail below with regard to
The configuration and extended configuration ranges are generally used to configure I/O devices 22. However, the devices are typically receivers of configuration reads/writes and thus typically do not initiate requests in that range. Additionally, the extended configuration space overlaps with the device messaging range. Devices that communicate with each other directly, without software intervention, may use the device messaging range for such messages. Both ranges may be reclaimed using the I/O page tables 36C.
While the SysMgt and IoCtl fields are defined in this embodiment for providing additional control for the corresponding address ranges and their operations, other embodiments may provide additional fields for other address ranges, and/or may not provide the SysMgt and IoCtl fields, as desired.
Turning now to
The Lint1P and Lint0P bits may be used to control whether legacy PIC interrupt requests for Lint1 and Lint0 are blocked or passed unmodified by the IOMMU 26. These interrupts are specific addresses in the Interrupt/EOI address range that are associated with the legacy PIC. If these types of interrupt requests are not expected, they may be blocked using the Lint1P and Lint0P bits. Specifically, in this embodiment, the Lint1P and Lint0P bits may be set to permit the corresponding interrupts to pass the IOMMU 26 unmodified, and may be clear to block the corresponding interrupts. In a similar fashion, the NMIP, EIntP, and INITP bits may control the passing or blocking of the NMI, external interrupt, and INIT interrupt, respectively. It is noted that, in this embodiment, SMI is passed unmodified through the IOMMU 26.
The IntCtl field may control how fixed and arbitrated interrupt messages are handled by the IOMMU 26. Encodings of this field may be used to specify that such interrupts are blocked, remapped using the interrupt remapping table 36B, or forwarded unmodified, in one embodiment. If blocked, the IOMMU 26 may target abort the interrupt message.
The interrupt table pointer field (IntTablePtr) may store the base address of the interrupt remapping table 36B (e.g. illustrated as arrow 72 in
The SysMgt field may be encoded to provide further control of communications in the system management range. Specifically, in one embodiment, the SysMgt field may be encoded to: block requests in the range; forward requests in the range unmodified (posted writes only); forward requests that map to INTx messages unmodified (posted writes only); or translate requests using the I/O page tables 36C. The IoCtl field may be encoded to provide further control of communications in the I/O space range. Specifically, in one embodiment, the IoCtl field may be encoded to: block requests in the range; forward the requests unmodified; or translate the requests using the I/O page tables 36C.
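The three-way choice described for the IoCtl field (block, forward unmodified, or translate) can be sketched as below. The numeric values of `RangeCtl` and the helper names are assumptions for illustration, not the field's actual encodings, and the additional INTx-specific SysMgt encoding is not modeled.

```python
from enum import Enum


class RangeCtl(Enum):
    BLOCK = 0       # numeric values are illustrative, not the real encodings
    FORWARD = 1
    TRANSLATE = 2


def apply_range_control(ctl: RangeCtl, vaddr: int, translate):
    """Decide what happens to a request in a controlled special range."""
    if ctl is RangeCtl.BLOCK:
        return ("abort", None)               # request is not forwarded
    if ctl is RangeCtl.FORWARD:
        return ("forward", vaddr)            # passed through without translation
    paddr = translate(vaddr)                 # use the I/O page tables
    return ("forward", paddr) if paddr is not None else ("fault", None)


# A dict lookup stands in for translation through the I/O page tables.
mapping = {0x1000: 0x9000}.get
```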
The Domain ID is used to tag IOTLB entries and any other cache entries in the IOMMU 26 so that different devices differentiate their translation data. If devices share translation tables, they may have the same Domain ID to share cache/IOTLB entries.
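One way to picture Domain ID tagging is a cache keyed by the pair of Domain ID and virtual page number, as in the sketch below. The dictionary cache, the `domain_id` key and the device IDs are hypothetical.

```python
iotlb = {}   # (domain_id, virtual page number) -> real page number


def fill(domain_id: int, vpn: int, pfn: int) -> None:
    """Insert a translation tagged with the domain it belongs to."""
    iotlb[(domain_id, vpn)] = pfn


def lookup(device_table: dict, device_id: int, vpn: int):
    """Look up a cached translation using the device's Domain ID as the tag."""
    domain_id = device_table[device_id]["domain_id"]
    return iotlb.get((domain_id, vpn))


# Devices 1 and 2 share translation tables (same domain); device 3 does not.
devices = {1: {"domain_id": 7}, 2: {"domain_id": 7}, 3: {"domain_id": 8}}
fill(7, 0x10, 0x99)
```

Here a translation cached on behalf of device 1 is also visible to device 2, while device 3 misses for the same virtual page.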
The page table pointer (PageTablePtr) is the pointer to the I/O page tables 36C (e.g. represented by arrow 70 in
Turning now to
If the address included in the request is not in an address range associated with a special operation (i.e. an operation other than a memory read/write operation) (decision block 90, “no” leg), the IOMMU 26 may translate the address using the I/O page tables 36C (block 92). If the address is in such a range (e.g., any of the ranges shown in
If the address included in the request is in the special address range (decision block 90, “yes” leg) and is not in the interrupt/EOI range (decision block 94, “no” leg), but is in the System Management or I/O space ranges (decision block 100, “yes” leg), the SysMgt or IoCtl field is used to determine if translation is overridden. If the control field indicates that the request is blocked (decision block 102, “yes” leg), the IOMMU 26 may not forward the request. In some embodiments, the IOMMU 26 may abort the request so that the I/O device that initiated the request is informed. In the case of the SysMgt field, the request may be blocked if it is in the system management range or if it is an INTx message, for different encodings of the field. If the control field indicates that the request is forwarded unmodified (decision block 104, “yes” leg), the request is forwarded without translation, or unmodified (block 106). Otherwise, the request may be translated according to the I/O page tables 36C (block 108). Similarly, if the address included in the request is in the special address range (decision block 90, “yes” leg) and is not in the interrupt/EOI range (decision block 94, “no” leg), nor in the System Management or I/O space ranges (decision block 100, “no” leg), the IOMMU may translate the request according to the I/O page tables 36C (block 108). If the translation fails, the IOMMU 26 may take various actions. For example the IOMMU 26 may inform the initiating I/O device of the failure, log the failure in the event log buffer 44, etc. based on various configuration settings, not shown. The translation may fail due to failure to find a device table entry (which may be detected before any of the operation shown in
Turning now to
If the requested interrupt is one of the interrupts controlled by specific bits (Lint0, Lint1, NMI, external interrupt, or INIT—decision block 110, “yes” leg), and the corresponding control in the device table entry indicates that the interrupt is passed unmodified (decision block 112, “yes” leg), the IOMMU 26 may forward the interrupt request unmodified (block 114). If the interrupt is not enabled in the device table entry (decision block 112, “no” leg), the interrupt request may be blocked. For example, the interrupt request may be target aborted by the IOMMU 26.
If the requested interrupt is not one of the specifically-controlled interrupts (decision block 110, “no” leg), the IntCtl field may control the response of the IOMMU 26. If the IntCtl field indicates that the interrupts are blocked (decision block 118, “yes” leg), then the request is not forwarded. The request may be target aborted, as mentioned above. If the IntCtl field indicates that the interrupts are forwarded, the interrupt is forwarded without remapping (decision block 120, “yes” leg and block 114). Otherwise, the interrupt is remapped according to the interrupt remapping table 36B and the remapped interrupt request is forwarded (block 122). The flowchart assumes that the I/O remapping data in the device table entry is valid (e.g. that the IV bit indicates valid). If the data is not valid, the remapping may fail. The IOMMU 26 may take various actions if the remapping fails. For example, the interrupt may be ignored, an error may be logged in the event log buffer 44, etc. based on various configuration settings, not shown. Similarly, if an interrupt is blocked, various actions may be taken including one or more of the preceding actions.
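The interrupt decision flow described above can be summarized in a short sketch. The dictionary-based device table entry, the string values of `IntCtl`, the `"ExtInt"` label and the returned action tuples are assumptions made for illustration; only the field names Lint0P, Lint1P, NMIP, EIntP, INITP, IntCtl and IV come from the description.

```python
# Maps each specifically-controlled interrupt to its pass bit in the entry.
PASS_BITS = {"Lint0": "Lint0P", "Lint1": "Lint1P", "NMI": "NMIP",
             "ExtInt": "EIntP", "INIT": "INITP"}


def handle_interrupt(entry: dict, kind: str, interrupt_id):
    """Return (action, payload) for an interrupt request from a device."""
    if kind in PASS_BITS:                         # decision block 110
        if entry.get(PASS_BITS[kind]):            # decision block 112
            return ("forward", interrupt_id)      # passed unmodified
        return ("block", None)                    # e.g. target abort
    if entry["IntCtl"] == "block":                # decision block 118
        return ("block", None)
    if entry["IntCtl"] == "forward":              # decision block 120
        return ("forward", interrupt_id)
    if not entry.get("IV"):                       # remapping data not valid
        return ("fail", None)
    remapped = entry["int_table"].get(interrupt_id)
    return ("forward", remapped) if remapped is not None else ("fail", None)


entry = {"NMIP": True, "INITP": False, "IntCtl": "remap", "IV": True,
         "int_table": {5: (1, 0x40)}}   # interrupt ID 5 -> (destination, vector)
```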
Turning now to
As illustrated in
On the other hand, a second device-initiated request may be received by the IOMMU 26 (reference numeral 136). The IOMMU 26 responds to the request by translating the address according to the I/O page tables 36C, and locates a non-unity mapping translation (reference numeral 138). Thus, the request is forwarded to the target (e.g. the memory) with a physical address determined from the translation. Since the physical address is in the address range mapped to the memory, a normal read/write memory operation may be performed (reference numeral 140).
It is noted that, while the above discussion mentions forwarding the request to the target (which differs based on the translation), the IOMMU 26 may simply forward the request with the translated address. The request may be detected by both the memory controller and the processor (or other routing circuitry upstream from the IOMMU 26) and either the processor or the memory controller may respond to the operation as appropriate.
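The difference between a unity mapping and a non-unity mapping can be illustrated with a minimal sketch. It assumes a flat, dictionary-based page table and 4 KiB pages; neither is specified by the text, and the `translate` function and the example addresses are hypothetical. Actual I/O page tables 36C would typically be multi-level structures in memory.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages; the text does not fix a page size
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(io_page_table, device_addr):
    """Translate a device address using a flat, dict-based page table.

    Returns (physical_address, is_unity), or None when no translation exists.
    """
    vpn = device_addr >> PAGE_SHIFT
    if vpn not in io_page_table:
        return None
    ppn = io_page_table[vpn]
    phys = (ppn << PAGE_SHIFT) | (device_addr & PAGE_MASK)
    return phys, ppn == vpn


# One unity-mapped page and one page remapped to a different physical page.
# The page numbers are arbitrary example values.
table = {0xFEE00: 0xFEE00, 0x00010: 0x7A2C4}
unity_result = translate(table, 0xFEE00010)     # address passes through unchanged
remapped_result = translate(table, 0x00010ABC)  # new page, same page offset
```

With a unity mapping the forwarded address equals the requested address, so the request still falls in its original range. With a non-unity mapping the page number changes while the page offset is preserved.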
Turning now to
If the translation is being created for a virtual address in one of the address ranges associated with a special operation, the IOMMU code may determine if the special operation is expected to be initiated by the I/O device (decision block 150). Various information may be taken into account in the decision. For example, PCI-type capabilities blocks may include information on whether or not certain operations may be initiated by the device. The type of device may indicate whether certain operations may be initiated. Any information from any source may be used in the determination. If the operation may be initiated (decision block 150, “yes” leg), a unity mapping may be created in the I/O page tables 36C for each page in the corresponding address range (block 152). If the operation is not expected (decision block 150, “no” leg), the IOMMU code may determine if reclaiming the range for memory operations is desired (decision block 154). Various OS allocation policies, the expected range of addresses to be used by the device, etc., may factor into the decision. If reclaiming is desired (decision block 154, “yes” leg), the IOMMU code may create a translation in the I/O page tables 36C mapping each page in the range to a memory page (block 156). If no reclaim is desired (decision block 154, “no” leg), no translation may be created for the range. If the range supports blocking of requests, e.g. via a field in the device table entry, the IOMMU code may use the field to create the blocking (block 158).
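The policy just described might look roughly like the following in IOMMU software. This is a sketch under stated assumptions: the function name `configure_special_range`, its parameters, and the returned strings are hypothetical, and the I/O page table is modeled as a dictionary from page number to page number.

```python
def configure_special_range(page_table, first_page, num_pages, *,
                            operation_expected, reclaim_pages=None,
                            supports_blocking=False):
    """Fill page_table (dict: page number -> page number) for one range.

    Returns "unity", "reclaimed", "blocked", or "unmapped".
    """
    pages = range(first_page, first_page + num_pages)
    if operation_expected:                        # decision block 150
        for p in pages:                           # block 152: unity mapping
            page_table[p] = p
        return "unity"
    if reclaim_pages is not None:                 # decision block 154
        if len(reclaim_pages) != num_pages:
            raise ValueError("need one memory page per page in the range")
        for p, mem in zip(pages, reclaim_pages):  # block 156: map to memory
            page_table[p] = mem
        return "reclaimed"
    # No translation is created. Block 158 applies only if the range
    # supports blocking (e.g. via a device table entry field).
    return "blocked" if supports_blocking else "unmapped"
```

Passing `reclaim_pages=None` stands in for the "no reclaim desired" outcome of decision block 154. How software arrives at that decision (OS allocation policy, expected device address usage) is outside the sketch.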
As mentioned above, the interrupt remapping table 36B may be used to remap interrupts requested by a given device. Generally, the interrupt remapping table entry corresponding to a given requested interrupt may include the information to be supplied to the interrupt handling circuitry to identify the interrupt to which the requested interrupt is remapped. For example, the interrupt handling circuitry may comprise a local APIC in each of the processors 12, in one embodiment. Many of the interrupt remapping details are provided in the above discussion, for one embodiment. Additional details and various uses thereof are described below.
The remapped interrupt may be specified by an interrupt vector and destination (Vector and Destination in
The interrupt remapping mechanism, including the interrupt remapping table 36B having entries, e.g., similar to
In a similar fashion, interrupts may be retargeted from one processor to another in any environment (virtualized or not) to improve performance or to balance load among processors in the system. The interrupt remapping table may provide a generic, centralized mechanism to permit such retargeting.
The interrupt remapping mechanism may also be used to improve security in the computer system. A device may be repurposed to attempt a denial of service attack, or malicious software may control a device to attempt such an attack, by frequently and repeatedly issuing interrupt requests from the device. Using the interrupt remapping mechanism, such interrupt requests may be ignored or otherwise prevented from interrupting useful work.
Yet another possible use of the interrupt remapping mechanism may be to make the system 10 more scalable as the number of processors increases. Often, in multi-processing systems, the same interrupt vector number on each processor must have the same interrupt service routine associated with it. In this fashion, a given interrupt targeted at any processor is serviced properly. Using interrupt remapping, such interrupts may be remapped to the same interrupt vector number on a single processor. Each additional processor in the system may provide additional unique interrupt vectors that may be used for other purposes.
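The scaling argument can be made concrete with a small allocator sketch: when each requested interrupt is remapped to a specific processor and vector, the number of distinct interrupt targets grows as the product of the processor count and the vectors available per processor. The `VectorAllocator` class and its round-robin policy are assumptions made for illustration and are not part of the described mechanism.

```python
class VectorAllocator:
    """Hands out unique (processor, vector) pairs to interrupt sources."""

    def __init__(self, num_cpus, vectors_per_cpu):
        # Order the free list so consecutive assignments spread across CPUs.
        self.free = [(cpu, vec)
                     for vec in range(vectors_per_cpu)
                     for cpu in range(num_cpus)]
        self.remap = {}  # (device_id, requested_vector) -> (cpu, vector)

    def assign(self, device_id, requested_vector):
        """Return the remapped target, allocating one on first use."""
        key = (device_id, requested_vector)
        if key not in self.remap:
            if not self.free:
                raise RuntimeError("no free interrupt vectors")
            self.remap[key] = self.free.pop(0)
        return self.remap[key]


alloc = VectorAllocator(num_cpus=2, vectors_per_cpu=2)
capacity = len(alloc.free)          # 2 processors x 2 vectors each
first = alloc.assign("dev_a", 5)    # both devices request vector 5 ...
second = alloc.assign("dev_b", 5)   # ... but receive distinct targets
```

Two devices that request the same vector number end up on different (processor, vector) targets, so neither processor needs to install the same service routine at that vector.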
It is noted that, while the description above refers to using various data in the translation tables 36 to translate addresses and/or remap interrupts, such data may be cached in the IOMMU 26 (e.g. the caches and/or IOTLB 30) and/or in remote IOTLBs 24. Wherever translation table data is used, or translations/remapping are performed according to the data, a cached copy of the data or a cached copy of the result of the translation may be used. Direct access to the memory storing the data may not be needed if the data is cached.
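As a rough illustration of caching translation results, the sketch below places a small cache in front of a table lookup function. The `Iotlb` class, its least-recently-used replacement policy, and the `walk` callback are assumptions for this example; the text does not specify a cache organization or replacement policy.

```python
from collections import OrderedDict


class Iotlb:
    """Tiny LRU cache of translation results (illustrative, not the hardware design)."""

    def __init__(self, capacity, walk):
        self.capacity = capacity
        self.walk = walk              # function: (device_id, page) -> translated page
        self.entries = OrderedDict()
        self.misses = 0

    def lookup(self, device_id, page):
        key = (device_id, page)
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1                      # table data must be read from memory
        result = self.walk(device_id, page)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result
```

A hit returns the cached result without calling `walk`, which corresponds to the case where direct access to the memory storing the table data is not needed.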
Turning now to
The Hypervisor may configure the device table entries so that the software-remapped interrupts are blocked and cause an I/O page fault to be logged in the event log buffer 44 (block 170). For example, the Lint0, Lint1, NMI, ExtInt, and INIT interrupts may be blocked using the specific control bits assigned to those interrupts in the device table entry and by clearing the IG bit in the entry. The fixed/arbitrated interrupts may be blocked using the SI and R bits in the corresponding interrupt remapping table entry, or the R bit in combination with the IG bit in the device table entry. Until a software-remapped interrupt request is received, the mechanism may be idle (represented by decision block 172, “no” leg). When a software-remapped interrupt request is received (decision block 172, “yes” leg), the IOMMU 26 may detect that the interrupt is blocked, and may target abort the interrupt request. The IOMMU 26 may log the I/O page fault for the interrupt request in the event log buffer 44, and interrupt software to invoke the Hypervisor (block 174). The event log buffer entry may include the information used by the Hypervisor to route the interrupt request to the correct guest (e.g. complete address and device ID). The Hypervisor may read the event log from the event log buffer 44 and detect the I/O page fault (block 176). The Hypervisor may remap the interrupt to the appropriate guest for handling (block 178).
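The software remapping flow above can be modeled in a few lines. This is a sketch under simplifying assumptions: the `Iommu`, `Hypervisor`, and `EventLogEntry` classes are hypothetical, a set of blocked device IDs stands in for the device table and remapping table control bits, and a queue stands in for the event log buffer 44.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EventLogEntry:
    """Assumed shape of a logged I/O page fault for a blocked interrupt."""
    device_id: int
    address: int


class Iommu:
    def __init__(self):
        self.blocked_devices = set()   # stands in for the device table controls
        self.event_log = deque()       # stands in for the event log buffer

    def interrupt_request(self, device_id, address):
        """Return True if forwarded; otherwise log a fault and target abort."""
        if device_id in self.blocked_devices:
            self.event_log.append(EventLogEntry(device_id, address))  # block 174
            return False
        return True


class Hypervisor:
    def __init__(self, iommu, device_to_guest):
        self.iommu = iommu
        self.device_to_guest = device_to_guest   # device ID -> guest name
        self.delivered = []                      # (guest, address) pairs

    def configure(self):                         # block 170
        self.iommu.blocked_devices.update(self.device_to_guest)

    def service_event_log(self):                 # blocks 176 and 178
        while self.iommu.event_log:
            event = self.iommu.event_log.popleft()
            guest = self.device_to_guest[event.device_id]
            self.delivered.append((guest, event.address))
```

In this model the logged device ID and address are enough for the Hypervisor to choose the guest, mirroring the statement that the event log entry carries the information needed for routing. The step in which the IOMMU interrupts software to invoke the Hypervisor is represented only by the explicit call to `service_event_log`.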
Turning next to
In the embodiment of
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
|U.S. Classification||710/266, 710/260, 710/269|
|International Classification||G06F13/32, G06F13/24|
|Cooperative Classification||G06F12/1027, G06F12/1081, G06F2212/206, G06F13/24|
|European Classification||G06F12/10P, G06F13/24|
|Nov 13, 2006||AS||Assignment|
Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HUMMEL, MARK D.;LUECK, ANDREW W.;KEGEL, ANDREW G.;REEL/FRAME:018510/0874;SIGNING DATES FROM 20061106 TO 20061110
|Aug 18, 2009||AS||Assignment|
Owner name: GLOBALFOUNDRIES INC., CAYMAN ISLANDS
Free format text: AFFIRMATION OF PATENT ASSIGNMENT;ASSIGNOR:ADVANCED MICRO DEVICES, INC.;REEL/FRAME:023120/0426
Effective date: 20090630
|Jun 18, 2014||FPAY||Fee payment|
Year of fee payment: 4