|Publication number||US7861121 B2|
|Application number||US 11/479,268|
|Publication date||Dec 28, 2010|
|Filing date||Jun 30, 2006|
|Priority date||Nov 23, 1999|
|Also published as||US7111307, US20060248542|
|Original Assignee||Microsoft Corporation|
This application is a continuation of U.S. patent application Ser. No. 09/447,501, filed Nov. 23, 1999, now U.S. Pat. No. 7,111,307 B1, which is hereby incorporated by reference.
The invention relates generally to computer systems, and more particularly to an improved method and system for monitoring and verifying software components such as kernel-mode drivers.
In contemporary operating systems such as Microsoft Corporation's Windows® 2000, low-level (i.e., kernel mode) components, including drivers and the operating system itself, handle critical system operations. At the same time, for performance and architectural reasons, drivers typically load in an environment where any driver memory is accessible by any other driver. Furthermore, performance requirements keep operating system overhead to a minimum. Consequently, such components are highly privileged in the operations that they are allowed to perform, and moreover, do not have the same protection mechanisms as higher level (i.e., user mode) components. As a result, even the slightest error in a kernel component can corrupt the system and cause a system crash.
Determining the cause of a system crash so that an appropriate fix may be made has heretofore been a difficult, labor-intensive and somewhat unpredictable task, particularly since the actual component responsible for corrupting the system often appears to be substantially unrelated to the problem. For example, one way in which a kernel component can cause a system crash is related to the way in which pooled memory is arranged and used. For many reasons, including performance and efficiency, pooled memory is allocated by the system kernel as a block, (e.g., in multiples of thirty-two bytes), with a header (e.g., eight bytes) at the start of each block. For example, if forty-four bytes of pooled memory are required by a driver, sixty-four are allocated by the kernel, eight for the header, forty-four for the driver, with the remaining twelve unused. Among other information, the header includes information that tracks the block size. Then, when the memory is deallocated, the kernel looks to see if this block may be coalesced with any adjacent deallocated blocks, so that larger blocks of memory become available for future requests. If so, the header information including the block size is used to coalesce the adjacent blocks.
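The block-size arithmetic in the example above can be illustrated with a short sketch. It assumes only the example figures given in the text (an eight-byte header and blocks in multiples of thirty-two bytes); the function name and the returned field names are hypothetical and are not part of any kernel interface.

```python
HEADER_BYTES = 8    # example header size given in the text
GRANULARITY = 32    # example allocation multiple given in the text


def pool_block_layout(requested: int) -> dict:
    """Return the layout of one pooled block for a request of `requested` bytes."""
    needed = HEADER_BYTES + requested
    # Round the total up to the next multiple of the allocation granularity.
    block = -(-needed // GRANULARITY) * GRANULARITY
    return {
        "block": block,
        "header": HEADER_BYTES,
        "payload": requested,
        "unused": block - needed,
    }


if __name__ == "__main__":
    print(pool_block_layout(44))
    print(pool_block_layout(24))
```

For a forty-four byte request this yields a sixty-four byte block with twelve bytes unused, matching the figures in the text.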
However, while this mechanism is highly efficient in satisfying requests for memory allocations and then recombining deallocated memory, if an errant kernel component writes beyond its allocated memory block, it overwrites the header of the subsequent block. For example, if a driver requests twenty-four bytes, it will receive one thirty-two byte block, eight for the header followed by the requested twenty-four bytes. However, if the driver writes past the twenty-fourth byte, the driver will corrupt the next header, whereby the kernel may, for example, later coalesce the next block with an adjacent block even though the next block may be allocated to another kernel component. As can be appreciated, other types of errors may result from the corrupted header. In any event, the kernel or the component having the next block allocated to it (or even an entirely different component) will likely appear responsible for the crash, particularly if the problem caused by the errant driver in overwriting the header does not materialize until long after the errant driver has deallocated its memory block.
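The overrun described above can be simulated in a few lines. The following is a sketch only: the pool is modeled as a byte array, and the header layout (the block size stored as a little-endian 64-bit value) is a hypothetical choice made for illustration, since the text does not specify the header format.

```python
import struct

HEADER_BYTES = 8    # example header size given in the text
BLOCK_BYTES = 32    # one thirty-two byte block, as in the example


def make_pool(blocks: int) -> bytearray:
    """Build a pool of adjacent blocks, each starting with a size header."""
    pool = bytearray(blocks * BLOCK_BYTES)
    for i in range(blocks):
        # Hypothetical header layout: the block size as a little-endian 64-bit value.
        struct.pack_into("<Q", pool, i * BLOCK_BYTES, BLOCK_BYTES)
    return pool


def recorded_size(pool: bytearray, index: int) -> int:
    """Read the block size stored in the header of block `index`."""
    return struct.unpack_from("<Q", pool, index * BLOCK_BYTES)[0]


def driver_write(pool: bytearray, index: int, data: bytes) -> None:
    """Write into the payload of block `index` with no bounds check, as an errant driver would."""
    start = index * BLOCK_BYTES + HEADER_BYTES
    pool[start:start + len(data)] = data


pool = make_pool(2)
driver_write(pool, 0, b"\xAA" * 24)       # exactly fills the twenty-four byte payload
size_before = recorded_size(pool, 1)      # next header is still intact
driver_write(pool, 0, b"\xAA" * 25)       # one byte past the allocation
size_after = recorded_size(pool, 1)       # next header now holds a bogus size
```

Nothing fails at the moment of the errant write; the damage only surfaces when the corrupted size is later used, which is why the wrong component tends to appear responsible.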
Another way in which an errant driver may crash the system is when a driver frees pooled memory allocated thereto, but then later writes to it after the memory has been reallocated to another component, corrupting the other component's information. This may lead to a crash in which the other component appears responsible. Indeed, this post-deallocation writing can be a very subtle error, such as if the erroneous write occurs long after the initial deallocation, possibly after many other components have successfully used the same memory location. Note that such a post-deallocation write may also overwrite a header of another block of pooled memory, e.g., when smaller blocks are later allocated from a deallocated larger block.
Yet another type of error that a kernel component may make is failing to deallocate memory that the component no longer needs, often referred to as a “memory leak.” This can occur, for example, when a driver unloads but still has memory allocated thereto, or even when a driver is loaded but for some reason does not deallocate unneeded memory. Note that this can occur because of the many complex rules drivers need to follow in order to safely interact with other drivers and operating system components. For example, if two related components are relying on each other to deallocate the space, but neither component actually does deallocate it, a memory leak results. Memory leaks can be difficult to detect, as they slowly degrade machine performance until an out-of-memory error occurs.
Other kernel component errors involve lists of resources maintained by the kernel to facilitate driver operations, and the failure of the driver to properly delete its listed information when no longer needed. For example, a driver may request that the kernel keep timers for regularly generating events therefor, or create lookaside lists, which are fixed-sized blocks of pooled memory that can be used by a driver without the overhead of searching the pool for a matching size block, and thus are fast and efficient for repeated use. A driver may also fail to delete pending deferred procedure calls (DPCs), worker threads, queues and other resources that will cause problems when the driver unloads. Moreover, even when still loaded, the driver should delete items when no longer needed, e.g., a timer maintained by the kernel for a driver may cause a write to a block of memory no longer allocated to the driver. Other errors include drivers incorrectly specifying the interrupt request level (IRQL) for a requested operation, and spinlock errors, i.e., errors related to a mechanism via which only one processor in a multi-processor system can operate at a time, while a driver in control of the spinlock uses the operational processor to execute a critical section of code that cannot be interrupted.
Further complicating detection of the above errors, and identification of their source, is that the errors are often difficult to reproduce. For example, a driver may have a bug that does not arise unless memory is low, and then possibly only intermittently, whereby a test system will not reproduce the error because it does not reproduce the conditions.
In sum, kernel components such as drivers need to be privileged, which makes even slight errors therein capable of crashing the system, yet such errors are often difficult to detect, difficult to match to the source of the problem and/or difficult to reproduce.
Briefly, the present invention provides a method and system that enables monitoring of user-specified kernel mode components, to watch for select errors committed thereby. To this end, a kernel mode component such as a driver is identified to the kernel at the time the driver is loaded, along with information identifying the type or types of errors for which the driver is to be monitored. Calls by the identified driver to the kernel are re-vectored to a driver verifier component in the kernel, and the driver verifier component takes actions to monitor the driver based on the type or types of monitoring selected for that driver.
Actions that may be taken by the driver verifier include satisfying driver memory pool allocation requests from a special pool that is isolated and bounded by no access permissions. This ascertains whether a driver allocates a certain number of bytes and accesses bytes outside of that allocation. When a driver deallocates space, the space is marked as “No Access” to detect drivers that later access the deallocated space. Also, pool being freed is examined to ensure that no pending timers are inside the pool allocation.
The driver verifier may also be enabled to track a driver's use of pooled memory. To this end, the driver verifier maintains data structures that record information about each allocation, appropriately updating the information as memory is allocated and deallocated. When the driver is unloaded, the driver verifier checks that the driver's space is all deallocated, otherwise a memory leak is detected. Driver unload checking also detects drivers that unload without deleting kernel resources including lookaside lists, pending deferred procedure calls (DPCs), worker threads, queues, timers and other resources.
The driver verifier may also be enabled to simulate low resource conditions. One way to simulate low resource conditions is to provide extreme memory pressure on a specific driver, without affecting other drivers, and regardless of system memory size. This is accomplished by instructing memory management to invalidate the driver's pageable code and data, as well as system paged pool, code and data. This catches drivers that incorrectly hold spinlocks or raise the interrupt request level, and then access paged code or data. Another way in which the driver verifier may simulate low resource conditions to force driver errors is by randomly failing requests from the driver for pooled memory, thereby determining whether a driver can handle this low memory situation.
Further, the driver verifier validates the parameters of a kernel function call, whereby errors in spinlock, IRQL and pool allocation calls are detected. Input/Output (I/O) verification may also be selectively enabled.
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:
Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
With reference to
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35 (preferably Windows® 2000), one or more application programs 36, other program modules 37 and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in
When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Note that the present invention is described herein with respect to the Windows® 2000 (formerly Windows® NT®) operating system. However, as can be readily appreciated, the present invention is not limited to any particular operating system, but rather may be used with any operating system, and moreover, has many uses in general computing.
As will be understood, the present invention is primarily directed to the selective monitoring of drivers for detecting certain types of errors therein, thereby ultimately verifying (to a substantial probability) that a driver makes none of the errors for which tests are performed. Thus, for purposes of simplicity, the component (or components) being monitored for verification will ordinarily be described as a “driver” (or “drivers”). Nevertheless, it should be understood that the present invention is capable of monitoring other components, including the kernel itself, and as described below, may be extended via APIs or the like to other components to provide verification functionality for their subcomponents that do not directly call the kernel. Moreover, as will be understood, the present invention is extensible in another way, in that as additional tests for drivers are developed, those tests may be added to the present architecture that enables the selective monitoring of drivers.
Turning to the drawings and referring first to
The type of verification may be specified in the REG_DWORD key “\\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\VerifyDriverLevel.” The bitfield values for this key (which can be freely combined) are shown as hexadecimal in TABLE 1, and represent the type of tests to perform, in accordance with the present invention and as described in detail below:
Attempt to satisfy all of this driver's allocations from a special memory pool.
Apply memory pressure to this driver to validate IRQL usage in regards to accessing pageable code
Randomly fail various pool allocation requests. Note this is only done after the system has booted to a point wherein these failures can be treated as reasonable situations to be handled.
Enable pool allocation tracking. Every allocation must be freed before the driver unloads, or the system will bug check.
Enable the I/O verifier
The default value is three (3) if the key does not exist or no level of driver verification is specified. Lastly, verification may be executed via a command line.
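The way a freely combinable bitfield of this kind is decoded can be sketched as follows. Note that the hexadecimal values themselves are not reproduced in the text above, so the bit assignments in this sketch are assumptions chosen purely for illustration (one bit per test, in the order the tests are listed); the enumeration and function names are likewise hypothetical.

```python
from enum import IntFlag


class VerifyLevel(IntFlag):
    # ASSUMED bit assignments, for illustration only: the value column of
    # TABLE 1 is not reproduced in the text, so these are not authoritative.
    SPECIAL_POOL = 0x01
    MEMORY_PRESSURE = 0x02
    RANDOM_FAILURES = 0x04
    POOL_TRACKING = 0x08
    IO_VERIFIER = 0x10


DEFAULT_LEVEL = 3  # default stated in the text when the key is absent


def enabled_tests(level: int) -> list:
    """List the names of the tests whose bits are set in `level`."""
    return [flag.name for flag in VerifyLevel if level & flag]


if __name__ == "__main__":
    print(enabled_tests(DEFAULT_LEVEL))
    print(enabled_tests(VerifyLevel.RANDOM_FAILURES | VerifyLevel.POOL_TRACKING))
```

Because each test occupies its own bit, any combination of tests is expressed by a bitwise OR of the individual values, and each test can be checked independently with a bitwise AND.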
As shown in
In accordance with one aspect of the present invention and as generally represented in
To detect overruns, the previous page and the next page in the page pool 80 are marked inaccessible. Note that this is accomplished in Windows® 2000 via virtual memory management, wherein each virtual address is associated with a page table entry which comprises a physical address to which the virtual address maps, along with bits that control page access. Thus, the surrounding pages are marked “No Access” by the driver verifier 70 via the bit settings. Virtual memory management is further described in the references, “Inside Windows NT®,” by Helen Custer, Microsoft Press (1993); and “Inside Windows NT®, Second Edition” by David A. Solomon, Microsoft Press (1998), hereby incorporated by reference herein.
Attempts to access memory beyond the allocation buffer (within a page) are immediately detected as an access violation, as such an access is within the subsequent, “No Access” memory page. Note that writing before the beginning of the buffer will (presumably) alter the random data, and when the buffer is freed, this alteration will be detected. In either case, a bug check (e.g., having a value of 0xC1 in one implementation) is issued, whereby the user can debug (or record for later debugging) the error. In keeping with the present invention, the immediate violation detection of overruns helps identify the errant driver.
Note that when underrun detection is selected for drivers, (via the global flag), the allocated memory is instead aligned with the beginning of the page. With this setting, underruns cause an immediate bug check, while overruns (may) cause a bug check when the memory is freed. In actual implementations, underrun errors tend to occur less often than overrun errors.
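The effect of aligning an allocation against one edge of its page can be modeled with a small sketch. It assumes a 4096-byte page (the page size is not stated in the text), ignores any alignment padding, and does not model the fill pattern that is checked when the buffer is freed; the class and function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, not stated in the text


class AccessViolation(Exception):
    """Raised when an access lands on a page marked No Access."""


class SpecialPoolBuffer:
    """One allocation on its own page, with both neighbouring pages treated as No Access."""

    def __init__(self, size: int, detect_underrun: bool = False):
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("this sketch handles allocations of at most one page")
        self.size = size
        # End-align the buffer to catch overruns, or start-align it to catch underruns.
        self.start = 0 if detect_underrun else PAGE_SIZE - size

    def touch(self, index: int) -> int:
        """Access byte `index` of the buffer; the index may be negative or past the end."""
        offset = self.start + index
        if offset < 0 or offset >= PAGE_SIZE:
            raise AccessViolation(f"byte {index} falls on a No Access page")
        return offset


def faults(buffer: SpecialPoolBuffer, index: int) -> bool:
    """Report whether touching `index` triggers an immediate access violation."""
    try:
        buffer.touch(index)
    except AccessViolation:
        return True
    return False
```

With the default end alignment, `faults(SpecialPoolBuffer(24), 24)` is true while an access at index -1 stays within the same page and goes unnoticed until the buffer is freed; with `detect_underrun=True` the two cases are reversed, which is the trade-off the text describes.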
Further, note that each allocation from the special pool 80 uses one page of non-pageable memory and two pages of virtual address space, burdening system resources. If the special pool 80 is exhausted, memory is allocated in the standard way until the special pool 80 becomes available again, and thus depending on a system's resources, verifying multiple drivers in this manner at the same time may cause errors to be missed. A debugger extension can be used to monitor special pool use, and report whether the special pool covered all memory allocations or whether the special pool was exhausted. Note that the size of the pool may be automatically or manually configured based on the amount of physical memory on the system.
Another test of memory misuse that the driver verifier 70 performs is represented in
However, because memory space is finite, the system needs to reuse the special pool 80 at some time. To this end, as shown in
In accordance with another aspect of the present invention, other types of incorrect usage of pooled memory may also be detected for a specified driver. A first way in which an error may occur is by having a pending timer remain in deallocated pool. To detect this, the driver verifier 70 examines the memory pool deallocated by a driver being verified, and if timers are found, the driver verifier issues a bug check (e.g., 0xC7).
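The timer check amounts to a range test over the block being freed. The sketch below assumes the verifier has some list of addresses of active timer objects available to it; that list, the exception class, and the function names are hypothetical, while the bug check value 0xC7 is the example given in the text.

```python
class BugCheck(Exception):
    """Stands in for a system bug check; carries the bug check code."""

    def __init__(self, code: int, detail: str):
        super().__init__(f"bug check {code:#x}: {detail}")
        self.code = code


def timers_in_range(base: int, length: int, timer_addresses) -> list:
    """Return the timer addresses that lie inside [base, base + length)."""
    return [t for t in timer_addresses if base <= t < base + length]


def verify_free(base: int, length: int, timer_addresses) -> None:
    """Refuse to free a pool block that still contains an active timer."""
    stale = timers_in_range(base, length, timer_addresses)
    if stale:
        raise BugCheck(0xC7, f"{len(stale)} active timer(s) in pool being freed")
```

A block spanning addresses 0x1000 through 0x103F, for instance, would be rejected if an active timer were recorded at 0x1010, but not if the only timer were at 0x1040.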
Memory leaks are also a misuse of pooled memory, and the driver verifier 70 may be selectively enabled to test for such errors. To this end, as represented in
As generally represented in
The verification block 82 1 also includes a pointer to the outstanding allocations table 84 1 set up for this driver 74 1. The outstanding allocations table 84 1 tracks specific information about each pool allocation that the driver has been given that remains outstanding, i.e., has not yet been deallocated. The information includes the allocation's virtual address, length, and information useful in debugging such as per-process caller information and the tag of the driver that allocated the memory, e.g., “TCP” for TCP/IP drivers.
To track pool allocations, each new allocation by the driver 74 1 adds information to its allocation table 84 1, while each deallocation removes the information. Note that the information need not actually be erased from the table 84 1, as it is equivalent to effectively “remove” the information by marking it as deallocated. If the driver 74 1 unloads, the driver verifier 70 examines the outstanding allocations table 84 1 and issues a bug check if any memory allocated by the driver has not been deallocated. In this manner, memory leaks are detected and can be more easily debugged. The driver verifier 70 also attempts to page in all paged-out allocations by the driver 74 1 when the driver 74 1 is unloaded, making it easier to debug the driver 74 1.
Note that another driver may deallocate pooled memory for the driver 74 1. As understood from the above description, this deallocation by the other driver needs to be removed from the outstanding allocations table 84 1, otherwise the system will incorrectly generate a bug check when the driver 74 1 unloads. In order for the outstanding allocations table to properly reflect such a deallocation, at the time of allocation, any memory allocated to a driver with pool tracking on is tagged (via bits in the header). Anytime the kernel 64 deallocates memory with this tag, the driver verifier 70 is notified, whereby the allocation information is located and properly removed from the appropriate outstanding allocations table.
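The bookkeeping described above may be sketched as a per-driver table keyed by address. In this sketch, which is an illustration under stated assumptions rather than the actual data structures, a dictionary from address to owning driver stands in for the tag bits in the pool header, so that a free performed by any driver still finds the correct table; all class and method names are hypothetical.

```python
class BugCheck(Exception):
    """Stands in for the bug check issued when a leak is found at unload."""


class PoolTracker:
    """Per-driver outstanding allocation tables, updated on every allocate and free."""

    def __init__(self):
        self.outstanding = {}  # driver name -> {address: (length, tag)}
        self.owner = {}        # address -> driver name; stands in for the header tag bits

    def allocate(self, driver: str, address: int, length: int, tag: str) -> None:
        self.outstanding.setdefault(driver, {})[address] = (length, tag)
        self.owner[address] = driver

    def free(self, address: int) -> None:
        """Called for any free of tracked memory, whichever driver performs it."""
        driver = self.owner.pop(address, None)
        if driver is not None:
            del self.outstanding[driver][address]

    def unload(self, driver: str) -> None:
        """Check the driver's table at unload; anything left is a leak."""
        leaked = self.outstanding.pop(driver, {})
        if leaked:
            for address in leaked:
                self.owner.pop(address, None)
            raise BugCheck(f"{driver} unloaded with {len(leaked)} allocation(s) outstanding")
```

Keying the free path on the allocation itself rather than on the caller is what allows one driver to release memory on behalf of another without a false leak report at unload.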
In addition to detecting memory leaks on unload, the driver verifier 70 performs several other checks on unload. More particularly, the driver verifier looks for undeleted timers, pending DPCs, undeleted lookaside lists, undeleted worker threads, undeleted queues and other similar resources that remain. To this end, the driver verifier 70 examines resource lists 87 (
While a driver is loaded, other information may be evaluated by a user that may provide insight beyond that detected at unload time. For example, the user interface 60 is capable of displaying pool tracking data 88 for a given driver while the driver is loaded. In one implementation, the pool tracking data is displayable via a property page that shows the statistics gathered from the driver verifier. The counters shown on the page are related to the pool tracking flag of the verifier and are mostly per-driver counters, (e.g., current allocations, current allocated bytes, and so forth), with the specific driver being selectable via the user interface 60. In this manner, a tester may examine a loaded driver's usage of pooled memory, e.g., see whether relatively many allocations are outstanding when only relatively few are expected.
Other property pages are displayable, such as a “Driver Status” property page that provides an image of the current status of the driver verifier 70. For example, a user can see a list of drivers of which the verifier 70 is aware. The status can be “Loaded” (the driver is loaded and verified right now), “Unloaded” (the driver is not loaded now but it has been loaded at least once since reboot), or “Never Loaded” (the driver was never loaded, which may suggest that the driver's image file is corrupted or that the user specified a driver name that is missing from the system). A user may also view the current types of verification that are in effect.
The Global Counters property page 86 (
A settings page may be used to create and modify the driver verifier 70 settings, which are saved in the registry 62 as described above. The user can use the list to view the currently installed drivers in the system. At present, each driver can be in one of four possible states: “Verify Enabled” (the driver is currently verified), “Verify Disabled” (the driver is currently not verified), “Verify Enabled (Reboot Needed)” (the driver will be verified only after the next reboot), and “Verify Disabled (Reboot Needed)” (the driver is currently verified but will not be verified after the next reboot). The user can select one or more drivers from the list and switch the status. The user can also specify additional drivers to be verified after next reboot, such as when the user wants to install a new driver that is not already loaded. Lastly, a Modify Settings property page is provided for dynamically changing volatile driver verifier flags.
In accordance with another aspect of the present invention, the driver verifier 70 examines the function calls of each selected driver to monitor for certain actions that are forbidden. In one implementation, the driver verifier 70 performs this automatically for each driver being verified, although this checking alternatively may be made selectable. More particularly, as represented in
Another type of violation occurs when a driver attempts to raise or lower its interrupt request level (IRQL), i.e., by calling KeRaiseIrql or KeLowerIrql. For these types of calls, the driver verifier checks that a raise IRQL really is a raise (i.e., the current IRQL is less than the target IRQL) or that a lower IRQL really is a lower IRQL.
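The raise and lower checks reduce to a comparison between the current and target levels. The sketch below follows the condition stated in the text for a raise (the current IRQL is less than the target IRQL); applying the mirror-image strict comparison to a lower is an assumption, and the function and exception names are hypothetical stand-ins for the verifier's handling of KeRaiseIrql and KeLowerIrql calls.

```python
class IrqlViolation(Exception):
    """Raised when a raise or lower request does not move the IRQL in the stated direction."""


def verify_raise(current: int, target: int) -> int:
    """Accept a raise only if the current IRQL is less than the target IRQL."""
    if not current < target:
        raise IrqlViolation(f"raise from {current} to {target} is not a raise")
    return target


def verify_lower(current: int, target: int) -> int:
    """Accept a lower only if the target IRQL is less than the current IRQL (assumed strict)."""
    if not target < current:
        raise IrqlViolation(f"lower from {current} to {target} is not a lower")
    return target
```

For example, `verify_raise(0, 2)` returns the new level 2, whereas `verify_raise(2, 1)` is flagged because the requested "raise" would in fact lower the level.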
Other IRQL-related errors occur when paged and non-paged pool allocations and deallocations are made at the incorrect IRQL. Paged pool allocations and deallocations need to be made at the asynchronous procedure call level (APC_LEVEL) IRQL or below, while non-paged pool allocations and deallocations need to be made at the DISPATCH_LEVEL IRQL or below. Allocating or freeing paged pool at an IRQL above APC_LEVEL, or allocating or freeing non-paged pool at an IRQL above DISPATCH_LEVEL is detected as a violation.
Still other detected violations include acquiring or releasing a fast mutex at an IRQL above APC_LEVEL, acquiring or releasing a spin lock at an IRQL other than DISPATCH_LEVEL, and double release of a spinlock. Mutexes and spinlocks are described in the aforementioned references “Inside Windows NT®” and “Inside Windows NT®, Second Edition.” These violations similarly cause a bug check to be generated. Note that other types of violations may be added to those being monitored by the verifier 70.
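The level rules in the preceding paragraphs can be collected into a few predicates. This is a sketch only: the numeric values use the conventional ordering PASSIVE_LEVEL < APC_LEVEL < DISPATCH_LEVEL, the pool type is passed as a plain string, and all function and class names are hypothetical rather than kernel interfaces.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2


class Violation(Exception):
    """Stands in for the bug check generated on a detected violation."""


def pool_call_violation(pool_type: str, irql: int) -> bool:
    """Paged pool calls must be at or below APC_LEVEL, non-paged at or below DISPATCH_LEVEL."""
    limit = APC_LEVEL if pool_type == "paged" else DISPATCH_LEVEL
    return irql > limit


def fast_mutex_violation(irql: int) -> bool:
    """Fast mutexes may not be acquired or released above APC_LEVEL."""
    return irql > APC_LEVEL


def spin_lock_violation(irql: int) -> bool:
    """Per the text, spin lock calls at any IRQL other than DISPATCH_LEVEL are violations."""
    return irql != DISPATCH_LEVEL


class SpinLock:
    """Tracks whether the lock is held so that a double release can be detected."""

    def __init__(self):
        self.held = False

    def acquire(self, irql: int) -> None:
        if spin_lock_violation(irql):
            raise Violation("spin lock acquired at wrong IRQL")
        self.held = True

    def release(self, irql: int) -> None:
        if spin_lock_violation(irql):
            raise Violation("spin lock released at wrong IRQL")
        if not self.held:
            raise Violation("spin lock released twice")
        self.held = False
```

Expressing each rule as a separate predicate keeps the set open-ended, in line with the note that further violation types may be added to those being monitored.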
In accordance with another aspect of the present invention, the driver verifier can simulate extreme conditions, and thereby proactively force errors in a driver (e.g., 74) that may be otherwise difficult to reproduce. A first way in which this is accomplished is to randomly fail pool allocation requests (and other APIs). To this end, (after seven minutes or some other duration following system startup so as to accurately simulate a low-memory condition), any allocation request calls placed in a certain time window are failed, while others outside the window are not failed. For example, the driver verifier 70 may be configured to fail any call made within a one-second interval (that restarts every fifteen seconds), providing a generally pseudo-random nature to the failures. Other intervals and periods may be used. The injection of such allocation faults tests the driver's ability to react properly to low-memory conditions. Note that allocation requests marked MUST_SUCCEED (a maximum of one page of MUST_SUCCEED pool is permitted) are not subject to this action.
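The time-window scheme can be expressed as a simple predicate on the time since startup. The sketch below uses the example figures from the text (a seven-minute startup delay and a one-second window that restarts every fifteen seconds); placing the window at the start of each period is an assumption, and the function and constant names are hypothetical.

```python
STARTUP_GRACE_SECONDS = 7 * 60   # no failures are injected before this point
PERIOD_SECONDS = 15              # the failure window restarts this often
WINDOW_SECONDS = 1               # length of each failure window


def should_fail(seconds_since_boot: float, must_succeed: bool = False) -> bool:
    """Decide whether an allocation request made at this time is failed."""
    if must_succeed or seconds_since_boot < STARTUP_GRACE_SECONDS:
        return False
    # Assumed placement: the window occupies the first second of each period.
    phase = (seconds_since_boot - STARTUP_GRACE_SECONDS) % PERIOD_SECONDS
    return phase < WINDOW_SECONDS
```

Because the driver's calls are not synchronized with the window, which requests fall inside it varies from run to run, giving the generally pseudo-random character described above without any random number generator.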
Another way in which errors may be forced is to place extreme memory pressure on the driver by invalidating its pageable code. Although kernel-mode drivers are forbidden to access pageable memory at a high IRQL or while holding a spin lock, such an action might not be noticed if the page has not actually been trimmed (i.e., paged-out). To detect this, whenever the driver's IRQL is raised to DISPATCH_LEVEL or higher, or when a spin lock is requested, the driver verifier marks the driver's pageable code and data (as well as system pageable pool, code, and data) as trimmed. Thus, any attempt by the driver to access this memory indicates an attempt to access paged memory at the wrong IRQL, or while holding a spin lock, whereby the driver verifier issues a bug check.
Note that drivers that are not selected for verification will not be directly affected by this memory pressure since their IRQL raises will not cause this action. However, when a driver that is being verified raises the IRQL, the driver verifier 70 trims pages which may be used by drivers that are not being verified. As a result, errors by drivers that are not being verified may occasionally be detected by this action.
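The memory-pressure technique can be modeled by tracking which pageable pages are resident and trimming all of them whenever the IRQL is raised to DISPATCH_LEVEL or higher. The sketch below is a simplified model under stated assumptions: pages are represented by arbitrary labels, spin lock requests and system pageable pool are not modeled, and the class and method names are hypothetical.

```python
DISPATCH_LEVEL = 2


class BugCheck(Exception):
    """Stands in for the bug check issued on a paged access at high IRQL."""


class VerifiedDriver:
    """Tracks residency of a driver's pageable pages across IRQL changes."""

    def __init__(self, pageable_pages):
        self.irql = 0
        self.pageable = set(pageable_pages)
        self.resident = set(pageable_pages)

    def raise_irql(self, target: int) -> None:
        self.irql = target
        if target >= DISPATCH_LEVEL:
            # The verifier trims every pageable page, so that any later
            # access is forced to fault instead of silently succeeding.
            self.resident.clear()

    def lower_irql(self, target: int) -> None:
        self.irql = target

    def touch(self, page) -> None:
        """Access a page; pages outside the pageable set never fault."""
        if page in self.pageable and page not in self.resident:
            if self.irql >= DISPATCH_LEVEL:
                raise BugCheck(f"pageable page {page!r} accessed at IRQL {self.irql}")
            # At low IRQL an ordinary page fault simply brings the page back in.
            self.resident.add(page)
```

Without the forced trim, the access at high IRQL would succeed whenever the page happened to be resident, which is exactly why such bugs are otherwise hard to reproduce.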
Another aspect of the driver verifier 70 is that it is extensible and provides APIs so that other drivers can provide “mini-verifiers” for their subcomponents. For example, not all kernel drivers (e.g., display drivers, kernel-mode printer drivers, network mini-port drivers) are allowed to call the kernel directly for allocating pool. Because of this difference, the driver verifier treats graphics drivers somewhat differently than it treats other kernel-mode drivers.
By way of example, as generally shown in
To provide the above-described automated testing for the graphics drivers (e.g., 96), support for some of the driver verifier functions has been incorporated into the GDI 98, i.e., via API calls to the driver verifier 70, the GDI 98 may use the driver verifier facility to further verify video and print drivers. Note that the “ndis.sys” driver may do the same for network miniport drivers. Further, note that the driver verifier 70 may be set to verify the GDI driver 98 itself, although this has the effect of verifying all graphics drivers simultaneously, and thus to obtain more specific information about a graphics driver, the driver itself may be verified directly.
However, because graphics drivers are more restricted than other kernel-mode drivers, they require only a subset of the driver verifier functionality. For example, IRQL checking and I/O verification are not needed, and thus the above-described automatic checks which the driver verifier 70 usually performs (e.g., verification of IRQL and memory routines, checking freed memory pool for timers, and checking on driver unload) are not made when verifying a graphics driver. Similarly, the force IRQL checking option and I/O verifier option (described below) are not used for graphics drivers, and if selected, have no effect.
The other functionality provided by the driver verifier 70, namely using special pool, random failure of pool allocations, and pool tracking, are supported in the different graphics GDI callbacks. Table 2 lists the following GDI callback functions that are subject to the random failure test:
By way of summary, the general operation of the present invention is described below with respect to the flow diagrams of
Step 1100 of
If no errors are found (or testing is to continue despite an error), step 1208 is executed and represents the checking of the bitfield value for the driver level key to determine if random failures are enabled, along with a check as to whether the request corresponds to a request for pooled memory. If not, the process branches to the next test (as represented by
Step 1300 of
Step 1400 of
Once the various tests are established, the driver verifier 70 can detect driver errors and can issue appropriate bug checks. Table 3 below summarizes errors and bug check values issued therefor:
Lastly, it can be readily appreciated that the driver verifier 70 further provides a foundation for adding additional features in the future. In general, the driver verifier architecture is directed to making it easier for driver writers to validate their products, for systems to run much more reliably and to provide an easy definitive way to troubleshoot problems when they do occur. One such way in which the present invention may be extended is through an I/O verifier 100 that enables special IRP verification, as generally represented in
The I/O Verifier
The I/O verifier 100 tests for driver errors wherein I/O is accomplished by sending I/O Request Packets (IRPs) 102 to a stack of drivers 104 managing a particular piece of hardware. The proper handling of IRPs is required for a stable and functional operating system. In general, the I/O verifier 100 operates by activating hooks in the kernel 64 that allow it to monitor the IRP traffic throughout the system. The I/O verifier 100 also changes the manner in which I/O travels throughout the system, setting a series of traps to immediately catch errant behavior.
The I/O verifier 100 hooks functions that drivers call to manipulate IRPs. The functions are set forth in TABLE 4:
Sends or forwards an IRP to a driver
Finishes an IRP
In order to monitor the driver stacks 104 that receive IRPs 102 and to catch other bugs, the I/O verifier 100 also hooks the functions set forth in TABLE 5 below:
Adds driver to stack that
(removes driver from stack that
(removes an instantiation of a driver from memory)
(initializes a timer for a given driver stack)
The manner in which the I/O verifier “hooks” these functions depends on whether the kernel 64 makes internal use of the routine. Two methods are available.
In the re-vectoring method, a driver's requests to get an address for a kernel function at load-time are monitored. If the function and driver are to be hooked, an alternate function is supplied by a re-vectoring component for the driver being verified. Re-vectoring monitors load-time fixups between different components. As such, one disadvantage of re-vectoring is that it does not catch a component's call to itself. However, for certain functions the kernel reliance is not an issue, namely the IoInitializeTimer, IoBuildSynchronousFsdRequest, IoBuildAsynchronousFsdRequest and IoBuildDeviceIoRequest functions. As such, the I/O Verifier 100 hooks these functions via re-vectoring.
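By way of illustration only, the re-vectoring idea may be sketched as a toy model in Python rather than kernel code. Apart from IoInitializeTimer, every name below (the export and hook tables, resolve_import, load_driver, and the driver file names) is a hypothetical stand-in chosen for the example, and the real loader is certainly more involved.

```python
# Toy model of load-time re-vectoring (illustrative only, not kernel code).
# All names are hypothetical except IoInitializeTimer, which the text mentions.

calls = []  # records which implementation actually ran, in order


def io_initialize_timer(device):
    """Stand-in for the real kernel export."""
    calls.append(("kernel", device))
    return "STATUS_SUCCESS"


def verifier_io_initialize_timer(device):
    """Stand-in for the alternate function: observe the call, then pass it on."""
    calls.append(("verifier", device))
    return io_initialize_timer(device)


KERNEL_EXPORTS = {"IoInitializeTimer": io_initialize_timer}
HOOKS = {"IoInitializeTimer": verifier_io_initialize_timer}


def resolve_import(driver_name, function_name, verified_drivers):
    """Resolve one load-time fixup, substituting the hook for verified drivers."""
    if driver_name in verified_drivers and function_name in HOOKS:
        return HOOKS[function_name]
    return KERNEL_EXPORTS[function_name]


def load_driver(driver_name, imports, verified_drivers):
    """Build the driver's import table, as a loader would at load time."""
    return {name: resolve_import(driver_name, name, verified_drivers)
            for name in imports}


verified = {"suspect.sys"}
suspect = load_driver("suspect.sys", ["IoInitializeTimer"], verified)
trusted = load_driver("trusted.sys", ["IoInitializeTimer"], verified)

suspect["IoInitializeTimer"]("dev0")  # routed through the verifier hook
trusted["IoInitializeTimer"]("dev1")  # resolved directly to the kernel export
```

In this model, a direct call to io_initialize_timer from inside the "kernel" never passes through an import table, which mirrors the stated limitation that re-vectoring does not catch a component's call to itself.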
The second hooking technique requires the kernel 64 to supply hooks in its own functions, because kernel reliance on the remaining functions is an issue, as the entire lifetime of an IRP needs to be monitored. The second technique is thus used on these other functions, i.e., each function has a special callout available to the I/O Verifier 100.
The lifetime of an IRP starts when the IRP is allocated. Next, the request is written into the IRP and the IRP is sent to a driver 102. That driver either forwards the request to another driver, handles the request entirely within itself, or modifies the request before sending it on to another driver. Note that the request is independent of the call stack, as a driver in the stack 104 may choose to “pend” an IRP, i.e., to tell the initiator of the request that the call will be completed later.
To track IRPs throughout the system, the I/O verifier 100 maintains a set of structures that mirror the various aspects of an IRP. When an IRP is allocated, a tracking structure (IOV_REQUEST_PACKET) is created. This structure tracks the memory that encapsulates the IRP. An IOV_REQUEST_PACKET is “active” whenever the corresponding IRP may be sent to a driver. The IOV_REQUEST_PACKET is “non-active” when the corresponding IRP has been freed, but the trackable aspects of the IRP have not abated. When no trace of the IRP remains in the system, the IOV_REQUEST_PACKET becomes “dead” and the underlying structure is freed.
When the new IRP is sent to a stack, the request therein is noticed. In response, the I/O Verifier 100 creates a structure (IOV_SESSION_DATA) to track it, and attaches it to the IOV_REQUEST_PACKET that corresponds to the IRP. The IOV_SESSION_DATA is “active” when the request is being processed by drivers. When the request is completed, the IOV_SESSION_DATA is marked “non-active.” When the request is completed and all call stacks used to process the request have unwound, the request is marked “dead” and the IOV_SESSION_DATA tracking structure is freed.
The lifetimes of the IOV_REQUEST_PACKET and the IOV_SESSION_DATA are independent. If an IRP is created and immediately freed without ever being sent to a stack, an IOV_SESSION_DATA structure is never created. If the IRP is recycled upon completion of a request, the IOV_REQUEST_PACKET may pick up a new “active” IOV_SESSION_DATA before the old “non-active” IOV_SESSION_DATA transitions to the “dead” state. Alternatively, the IRP may be freed immediately upon completion of the request, in which case both the IOV_SESSION_DATA and the IOV_REQUEST_PACKET will be “non-active”.
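The independent lifetimes described above may be modeled, purely as a sketch, with two small Python classes. The class and method names are hypothetical, and the model assumes that the only "trackable aspects" keeping a freed IRP's tracking structure around are sessions that have not yet reached the "dead" state; the actual structures presumably track more than this.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    NON_ACTIVE = "non-active"
    DEAD = "dead"


class SessionData:
    """Models IOV_SESSION_DATA: tracks one request carried by an IRP."""

    def __init__(self, packet):
        self.packet = packet
        self.completed = False  # request has been completed
        self.unwound = False    # all call stacks for the request have unwound

    @property
    def state(self):
        if not self.completed:
            return State.ACTIVE
        return State.DEAD if self.unwound else State.NON_ACTIVE

    def complete_request(self):
        self.completed = True
        self.packet.reap()

    def stacks_unwound(self):
        self.unwound = True
        self.packet.reap()


class RequestPacket:
    """Models IOV_REQUEST_PACKET: tracks the memory that holds the IRP."""

    def __init__(self):
        self.irp_freed = False
        self.sessions = []

    @property
    def state(self):
        if not self.irp_freed:
            return State.ACTIVE
        # Assumption: a freed IRP leaves a trace only while sessions remain.
        return State.NON_ACTIVE if self.sessions else State.DEAD

    def send_to_stack(self):
        if self.irp_freed:
            raise RuntimeError("IRP already freed")
        session = SessionData(self)
        self.sessions.append(session)
        return session

    def free_irp(self):
        self.irp_freed = True

    def reap(self):
        """Drop sessions that have reached the dead state."""
        self.sessions = [s for s in self.sessions if s.state is not State.DEAD]


# Case 1: IRP created and freed without ever being sent to a stack.
never_sent = RequestPacket()
never_sent.free_irp()

# Case 2: IRP recycled on completion, so two sessions briefly coexist.
recycled = RequestPacket()
old = recycled.send_to_stack()
old.complete_request()
new = recycled.send_to_stack()
overlap = (old.state, new.state)

# Case 3: IRP freed on completion, before the call stacks have unwound.
freed = RequestPacket()
only = freed.send_to_stack()
only.complete_request()
freed.free_irp()
before_unwind = (only.state, freed.state)
only.stacks_unwound()
```

Tracking completion and stack unwinding as separate flags lets the model handle a pended request, where the call stack may unwind before the request completes.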
At present, when I/O verifier is enabled for a driver, the I/O verifier 100 detects forty different failures within drivers. These failures may be divided into two levels, in which Level 1 is a subset of Level 2 as set forth in TABLE 6 below:
Verifier level 1 detects:
1. Drivers calling IoFreeIrp on invalid or freed IRPs.
2. Drivers calling IoFreeIrp on IRPs that are still associated with a thread and thus will be freed when
3. Drivers calling IoCallDriver with invalid or freed
4. Drivers calling IoCallDriver with invalid or freed
5. Drivers having dispatch routines that return at IRQLs other than that at which they were called.
6. Drivers that complete IRPs that were already
7. Drivers that forget to remove cancel routines before completing IRPs.
8. Drivers that complete with −1 or STATUS_PENDING (which is illegal).
9. Drivers that complete IRPs from within their ISRs.
10. Drivers that pass bogus fields to the IoBuild . . . Irp
11. Drivers that reinitialize timer fields.
Verifier level 2 detects the above items and also:
12. Drivers that delete their device objects without first detaching them from the stack.
13. Drivers that detach device objects from a stack when they were never attached to anything in the first
14. Drivers that forget to remove cancel routines before forwarding IRPs.
15. Drivers that forward or complete IRPs not currently owned by them.
16. Drivers that copy entire stack locations and inadvertently copy the completion routine.
17. Drivers that free IRPs currently in use.
18. Drivers that call IoInitializeIrp on IRPs allocated
19. Drivers that fail to properly initialize IRP
20. Drivers that forward IRPs directly to the bottom on
21. Drivers that respond to IRPs to which they should
22. Drivers that forward failed IRPs where
23. Drivers that reset IRP statuses that they should
24. Drivers that do not handle required IRPs.
25. Drivers that fail to detach their device objects from the stack at the appropriate time
26. Drivers that fail to delete their device objects at the appropriate time
27. Drivers that don't fill out required dispatch
28. Drivers that don't properly handle WMI IRPs
29. Drivers that delete device objects at inappropriate
30. Drivers that detach their device objects at
31. Drivers that return statuses inconsistent with what the completion routine above them saw.
32. Drivers that return bogus or uninitialized values from their dispatch routines.
33. Drivers that return synchronously but forget to complete an IRP
34. Drivers that set pageable completion routines
35. Drivers that forget to migrate the pending bit in their completion routines
36. Drivers that forget to reference device objects
37. Drivers that complete IRPs without forwarding them
38. Drivers that incorrectly fill out certain PnP IRPs.
39. Drivers that create IRPs that are reserved for system use only.
40. Drivers that call IoCallDriver at invalid IRQLs, based on the major code.
Many of these checks (e.g., checks numbered 1-9, 11-14, 18, 27 and 40) involve spot checking various fields when a driver calls one of the monitored functions. These checks typically use the re-vectoring technique.
The remainder of the checks depend on complete knowledge of the IRP as it traveled throughout the system. For example, the check numbered 17 detects that an IRP has been freed when in use by checking to see if an IRP has an “active” IOV_SESSION_DATA structure associated with it.
In addition to monitoring I/O, the I/O verifier 100 actively changes the way in which I/O travels throughout the system to flush out errant behavior and make such behavior more readily detectable.
First, the I/O verifier 100 allocates IRPs from a special pool. As described above, special pool memory may be set by the I/O verifier 100 to detect attempts to access freed memory, and to detect over-writes. Both of these mistakes are common in IRP usage. To this end, when I/O Verifier is enabled (via the bitfield for a driver), all IRPs obtained through IoAllocateIrp are allocated from a special pool and their use is tracked.
Second, when a driver finishes with a request, the memory backing the IRP is typically still valid. Drivers that erroneously touch the IRP after completion may or may not corrupt memory in an easily detectable manner. The use of the special pool alone does not always catch this bug, as the IRP is often recycled instead of freed upon completion. To catch this, the I/O verifier 100 creates a copy of the IRP, called a surrogate, each time the IRP is forwarded. Upon completion of the surrogate, the original IRP is updated and the surrogate is immediately freed. If the surrogate IRP is allocated via special pool, then the above-mentioned bugs are immediately detected. Surrogate IRPs have their own IOV_REQUEST_PACKET tracking data, and the structure refers back to the IOV_REQUEST_PACKET associated with the original IRP.
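The surrogate technique may be illustrated with a small Python model, offered as a sketch under stated assumptions rather than as the actual mechanism. The Irp class, its read and write accessors, and forward_with_surrogate are hypothetical; the "freed" flag merely stands in for special pool memory that faults on access after being freed.

```python
class UseAfterFree(Exception):
    """Raised when a freed IRP is touched (stand-in for a special pool fault)."""


class Irp:
    """Toy IRP whose fields become inaccessible once the IRP is freed."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._freed = False

    def _check(self):
        if self._freed:
            raise UseAfterFree("IRP touched after it was freed")

    def read(self, name):
        self._check()
        return self._fields[name]

    def write(self, name, value):
        self._check()
        self._fields[name] = value

    def snapshot(self):
        self._check()
        return dict(self._fields)

    def free(self):
        self._freed = True


def forward_with_surrogate(original, lower_driver):
    """Forward a copy of the IRP, then fold the results back and free the copy."""
    surrogate = Irp(**original.snapshot())
    lower_driver(surrogate)
    # Completion of the surrogate: update the original, free the surrogate.
    for name, value in surrogate.snapshot().items():
        original.write(name, value)
    surrogate.free()


stash = []


def buggy_lower_driver(irp):
    irp.write("status", "STATUS_SUCCESS")
    stash.append(irp)  # bug: keeps a reference past completion


original = Irp(status="STATUS_PENDING")
forward_with_surrogate(original, buggy_lower_driver)

try:
    stash[0].write("status", "late write")  # touch after completion
    caught = False
except UseAfterFree:
    caught = True
```

Because the lower driver only ever sees the surrogate, its late write lands on freed memory even though the original IRP is still valid and may be recycled.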
Third, the driver handling the request may choose to return before the operation is completed. This is called “pending” an IRP. Drivers often send IRPs down their stack and do some processing when the IRP comes back up the stack. However, many fail to wait if the IRP was not handled synchronously (i.e., the IRP is still pending). To catch these bugs the I/O verifier 100 makes the IRPs appear to be handled asynchronously. While doing so, the I/O verifier 100 ensures that the IRP is not accessible, whereby erroneous behavior is immediately detectable.
Fourth, the code at a higher IRQL must finish before code at a lower IRQL is scheduled. IRPs may be completed at any IRQL between zero (0) and two (2). A common bug in drivers is to access pageable memory during completion of an IRP. Such an operation is illegal if the IRP is completed at level two (2). To flush out this type of bug, the I/O verifier 100 can choose to complete all IRPs at any level between 0 and 2.
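As a rough sketch of why forcing the completion level helps, the following Python model runs the same faulty completion routine at each of the three levels. The Processor class and complete_irp helper are hypothetical, and the level names PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL are the conventional Windows names for levels 0, 1 and 2, which the text above refers to only by number.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2


class PagedAccessAtDispatch(Exception):
    """Raised when pageable memory is touched at level 2."""


class Processor:
    """Toy processor that only remembers its current IRQL."""

    def __init__(self):
        self.irql = PASSIVE_LEVEL

    def touch_pageable_memory(self):
        if self.irql >= DISPATCH_LEVEL:
            raise PagedAccessAtDispatch("pageable memory touched at IRQL 2")


def complete_irp(cpu, completion_routine, irql):
    """Run the completion routine at the IRQL chosen by the verifier."""
    previous, cpu.irql = cpu.irql, irql
    try:
        completion_routine(cpu)
    finally:
        cpu.irql = previous  # always restore the caller's level


def buggy_completion(cpu):
    cpu.touch_pageable_memory()  # bug: illegal if completed at level 2


cpu = Processor()
exposed = {}
for level in (PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL):
    try:
        complete_irp(cpu, buggy_completion, level)
        exposed[level] = False
    except PagedAccessAtDispatch:
        exposed[level] = True
```

The bug is invisible whenever the IRP happens to complete at level 0 or 1, so a verifier that is free to pick level 2 makes it reproducible.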
Fifth, when completing an IRP, a driver must return an identical status in two different places. Drivers often make the mistake of returning two different statuses, one from the driver stack beneath them and one of their own. Unfortunately, such bugs are hard to detect, as typically the stack beneath them will use a matching return code. To flush these bugs out, the I/O verifier 100 may change the returned status of a driver by continually adding one (1) to the code at each layer. This technique is called status rotation.
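Status rotation may be sketched as follows, with heavy simplification. The model assumes statuses are plain integers, uses a single rotation between two layers, and treats the "two different places" as the dispatch routine's return value and a status field in the IRP; the driver functions and the run helper are hypothetical.

```python
def lower_driver(irp):
    """Bottom of the stack: completes the IRP with status 0."""
    irp["status"] = 0
    return 0


def correct_upper(irp, call_lower):
    """Propagates whatever status the stack below reported."""
    return call_lower(irp)


def buggy_upper(irp, call_lower):
    """Bug: ignores the lower stack's status and returns one of its own."""
    call_lower(irp)
    return 0


def run(upper, rotate):
    """Return True if the upper driver's two statuses agree."""
    irp = {"status": None}

    def call_lower(packet):
        status = lower_driver(packet)
        if rotate:
            # Verifier adds one to the status as it crosses the layer,
            # in both places the layer above can observe it.
            status += 1
            packet["status"] = status
        return status

    returned = upper(irp, call_lower)
    return returned == irp["status"]


results = {
    ("correct", False): run(correct_upper, rotate=False),
    ("buggy", False): run(buggy_upper, rotate=False),
    ("correct", True): run(correct_upper, rotate=True),
    ("buggy", True): run(buggy_upper, rotate=True),
}
```

Without rotation the faulty driver's own status happens to match the one beneath it, so the comparison passes; with rotation the coincidence is removed and only the driver that truly propagates the status stays consistent.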
Sixth, drivers sometimes return uninitialized status codes. In such an event, the code returned is read from a location on the call stack, with a value that is essentially random. Before calling into a driver, the I/O verifier 100 may first pre-initialize future stack locations to a known illegal value. If that value is returned after calling the driver, then this bug is immediately detected.
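The pre-initialization check may be illustrated with a minimal Python sketch. Since Python has no uninitialized stack slots, a dictionary entry stands in for the stack location that will hold the status; the sentinel value, exception type, and function names are hypothetical choices for the example.

```python
ILLEGAL_STATUS = 0xDEADBEEF  # hypothetical sentinel, not a value from the text


class UninitializedStatus(Exception):
    """Raised when a driver returns without ever setting a status."""


def call_driver(dispatch, take_error_path=False):
    """Poison the status slot, call the driver, then look for the poison."""
    frame = {"status": ILLEGAL_STATUS}  # pre-initialized "stack location"
    dispatch(frame, take_error_path)
    if frame["status"] == ILLEGAL_STATUS:
        raise UninitializedStatus("driver returned an uninitialized status")
    return frame["status"]


def good_dispatch(frame, take_error_path):
    frame["status"] = 1 if take_error_path else 0  # every path sets a status


def buggy_dispatch(frame, take_error_path):
    if not take_error_path:
        frame["status"] = 0
    # bug: the error path falls through without assigning a status


ok_status = call_driver(good_dispatch, take_error_path=True)
normal_path_status = call_driver(buggy_dispatch, take_error_path=False)

try:
    call_driver(buggy_dispatch, take_error_path=True)
    detected = False
except UninitializedStatus:
    detected = True
```

The faulty driver passes on its common path and is caught only when the path that skips the assignment runs, which is why a known illegal value is more useful than whatever happened to be on the stack.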
As can be seen from the foregoing detailed description, there is provided a method and system for monitoring and verifying drivers. The method and system are flexible, efficient, extensible and help detect (and produce) numerous errors in various types of drivers, thereby significantly helping to increase the reliability of a system. Indeed, in actual implementations, computer systems having verified drivers have increased reliability on the order of thirty to forty percent. Moreover, because no re-compilation or changes of any kind to the target drivers are required (i.e., the driver verifier can take action on unmodified driver binaries), the present invention provides an invaluable tool for system administrators and so forth (as well as developers), as they can easily verify drivers on existing systems without having to install an entire checked (i.e., debug) build, or indeed, debug components of any type.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
|U.S. Classification||714/41, 714/48, 719/327, 714/25, 714/47.1|
|International Classification||G06F3/00, G06F11/00|
|Oct 12, 2006||AS||Assignment|
Owner name: MICROSOFT CORPORATION, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, LANDY;ONEY, ADRIAN J.;REEL/FRAME:018381/0557;SIGNING DATES FROM 19991122 TO 19991123
|Jul 19, 2011||CC||Certificate of correction|
|May 28, 2014||FPAY||Fee payment|
Year of fee payment: 4
|Dec 9, 2014||AS||Assignment|
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034542/0001
Effective date: 20141014
|Oct 14, 2016||AS||Assignment|
Owner name: ZHIGU HOLDINGS LIMITED, CAYMAN ISLANDS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT TECHNOLOGY LICENSING, LLC;REEL/FRAME:040354/0001
Effective date: 20160516