|Publication number||US8103916 B2|
|Application number||US 12/325,522|
|Publication date||Jan 24, 2012|
|Filing date||Dec 1, 2008|
|Priority date||Dec 1, 2008|
|Also published as||EP2192488A1, US20100138699|
|Original Assignee||SAP AG|
The present disclosure relates generally to anomaly detection. In an example embodiment, the disclosure relates to scheduling of checks in computing systems.
A variety of checks may be run on a computing system to detect various glitches. For example, checks may be run to detect program crashes. In a hosted system environment with a large number of computers, such checks are typically automated and executed on a regular basis. Once a check detects a glitch, an incident report can be generated and proper actions can be taken to correct the glitch.
Unfortunately, the cumulative effect of running all the checks degrades system performance because running the checks consumes system resources. In an extreme example, a computing system could spend 100% of its capacity running the checks and thereby have no processing capacity available for other applications. On the other hand, if checks are never scheduled to run, no glitches are detected at all, which results in the degradation of system integrity.
The present disclosure is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
The description that follows includes illustrative systems, methods, techniques, instruction sequences, and computing machine program products that embody the present invention. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures and techniques have not been shown in detail.
The embodiments described herein provide various techniques for scheduling checks in computing systems. In an example, the scheduling may be based on minimizing costs associated with and without executing the checks. As explained in more detail below, the minimization of costs results in an optimal frequency that can be derived from an average time between detected anomalies and runtime of the checks. The checks may then be scheduled for execution based on this calculated, optimal frequency.
Each check 102 or 104 is scheduled for execution at various times t0-t3 for fixed periods of time 160 and 161. As used herein, a “runtime” of the check refers to the duration of the execution of a check, such as periods of time 160 and 161. As depicted in
The scheduling of the first check 102 and the second check 104 relative to each other can be based on minimizing the costs associated with and without executing the first check 102 and the second check 104. The cost of executing one or more checks refers to the price paid to execute the checks, which may be defined as a monetary cost, energy consumed, processor cycles, or other costs. On the other hand, the cost of not executing one or more checks refers to the price paid for not executing the checks, which effectively is the cost of not detecting the anomalies. As a result, the “cost of not executing a check” and the “cost of not detecting anomalies,” as used herein, may be used interchangeably. For example, the cost of not detecting anomalies may include the monetary cost spent to track, analyze, and fix the anomalies. In another example, the cost of not detecting the anomalies may include business lost as a result of the anomalies. It should be appreciated that in many examples, the cost of not detecting the anomalies cannot be automatically identified or detected by a computing system, but may instead be identified and provided by a user. As will be explained in more detail below, the first check 102 and the second check 104 may be scheduled relative to each other based on an optimal frequency that minimizes both the costs associated with and without executing the first and second checks 102 and 104.
As depicted in
The anomaly detection module 204 includes a variety of anomaly scanners or detectors that are configured to detect different anomalies. In the example of
Additionally, the anomaly detection module 204 includes an average time between anomalies identification module 210, which, as explained in more detail below, is configured to identify an average time between anomalies for each scanner or detector 206, 207, 208, or 209. Furthermore, the anomaly detection module 204 includes a scheduler module 212 that is configured to schedule execution of checks based on a calculated optimal frequency, which in turn is based on the average time between anomalies. As an example, the scheduler module 212 can calculate an optimal frequency such that the virus scanner 206 checks for likely or popular viruses more often when its runtime is short. When the runtime is longer, the scheduler module 212 may instead schedule the execution of the virus scanner 206 less frequently.
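The scheduling behavior attributed to the scheduler module 212 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the class and parameter names are hypothetical, and the frequency itself is derived later in the text from the average time between anomalies and the runtime.

```python
class SchedulerModule:
    """Minimal sketch of a scheduler that spaces check executions
    according to a given optimal frequency (hypothetical names)."""

    def next_execution(self, last_run, frequency_hz):
        # The period between executions is the reciprocal of the frequency.
        return last_run + 1.0 / frequency_hz


scheduler = SchedulerModule()
# A check with frequency 0.5 Hz runs every 2 seconds.
assert scheduler.next_execution(100.0, 0.5) == 102.0
```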
It should be appreciated that in other embodiments, the processing system 200 may include fewer, more, or different modules apart from those shown in
A frequency of the check can thereafter be calculated based on the average time between anomalies and the runtime of the check at 306. In general, this frequency is proportional to the average time and the runtime, which may be expressed as:
where the frequency f (e.g., in Hertz) is a square root of the average time between anomalies M (e.g., in seconds) divided by the runtime T (e.g., in CPU seconds). As explained in more detail below, the frequency may further be based on a cost of not executing the check. The check may then be scheduled for execution based on the calculated frequency at 308.
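The original image for Equation 1.0 is not reproduced above; reconstructed from the textual description (frequency f, average time between anomalies M, runtime T), it reads:

```latex
f = \sqrt{\frac{M}{T}} \qquad \text{(Equation 1.0, reconstructed)}
```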
The cost of not executing the check 404 may increase quadratically with the passing of time because, for example, the number of anomalies rises linearly with the time between two checks, and the average time that the anomalies persist also increases linearly. As depicted in
On the other hand, as depicted in
An optimum frequency 453 that minimizes both costs 402 and 404 can be derived from the plots of the cost of executing the check 402 and the cost of not executing the check 404. In particular, this frequency 453 is derived from an average cost that is based on a sum of the cost of executing the check 402 and the cost of not executing the check 404, which can be expressed as:
where the average cost A is approximately equal to the frequency of executing the check f multiplied by the runtime T, added to the cost of not executing the check e multiplied by the average time between anomalies M divided by twice the frequency f. The f*T term in Equation 2.0 is the cost of executing the check 402, while the e*M/(2f) term is the cost of not executing the check 404.
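The image for Equation 2.0 is likewise missing; a reconstruction consistent with the description of its terms is:

```latex
A \approx f \cdot T + \frac{e \cdot M}{2f} \qquad \text{(Equation 2.0, reconstructed)}
```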
In order to find the optimum frequency 453 that minimizes costs 402 and 404, a Lagrange multiplier may be applied to Equation 2.0 above to yield a frequency that is expressed as:
where, similar to Equation 1.0 above, the frequency f is proportional to a square root of the cost of not executing the check e multiplied by the average time between anomalies M divided by the runtime T. As an example, if anomalies occur at a rate of 20 anomalies per day, then executing the check twice a day will catch 10 anomalies per execution on average. If the check is instead executed four times a day, five anomalies will be detected per execution, which means that if the frequency of the check is doubled, the expected number of anomalies detected per execution is halved. That is, Equations 2.0 and 3.0 essentially convey that a check should be executed often if it detects many anomalies and is cheap to execute.
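The relationship between Equations 2.0 and 3.0 can be checked numerically. The sketch below (all parameter values are hypothetical, chosen only for illustration) evaluates the average cost A = f*T + e*M/(2f) and confirms that its minimum sits at the closed form f = sqrt(e*M/(2T)), which matches the proportionality stated for Equation 3.0:

```python
import math


def average_cost(f, T, e, M):
    """Average cost per Equation 2.0: execution cost f*T plus the
    cost of undetected anomalies e*M/(2*f)."""
    return f * T + e * M / (2 * f)


def optimal_frequency(T, e, M):
    """Closed-form minimizer of average_cost (Equation 3.0, up to the
    constant factor): f = sqrt(e*M/(2*T))."""
    return math.sqrt(e * M / (2 * T))


# Hypothetical parameters: runtime T = 4 CPU-seconds, cost factor
# e = 0.1, average time between anomalies M = 720 seconds.
T, e, M = 4.0, 0.1, 720.0
f_star = optimal_frequency(T, e, M)

# Numerically confirm that nearby frequencies cost no less.
candidates = [f_star * s for s in (0.5, 0.9, 1.0, 1.1, 2.0)]
best = min(candidates, key=lambda f: average_cost(f, T, e, M))
assert best == f_star
```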
In an embodiment, the average number of anomalies detected may then be calculated at 506 based on the number of anomalies detected and the fixed runtime, which may be expressed as:
where the average number of anomalies AN is the number of anomalies detected N divided by the fixed runtime R. Of course, the average time between anomalies, which is calculated at 508, is the inverse of Equation 4.0. For example, an execution of a check may detect four anomalies within a fixed runtime of an hour. The average time between anomalies is therefore 60 minutes/four anomalies, which equals 15 minutes per anomaly.
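The worked example above can be expressed directly in code: Equation 4.0 gives the average number of anomalies AN = N/R, and its inverse gives the average time between anomalies. This is a minimal sketch; the zero-anomaly fallback is an assumption added here, not part of the source:

```python
def average_time_between_anomalies(num_anomalies, fixed_runtime):
    """Inverse of Equation 4.0 (AN = N / R): average time between
    anomalies M = R / N."""
    if num_anomalies == 0:
        # No anomalies observed: fall back to the full observation
        # window (an assumption; the source does not cover this case).
        return fixed_runtime
    return fixed_runtime / num_anomalies


# Worked example from the text: four anomalies within a 60-minute runtime
# yield an average of 15 minutes per anomaly.
assert average_time_between_anomalies(4, 60) == 15
```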
In general, the frequency of a single check can be calculated at 606 based on a sum of a proportion of the average times and runtimes of the different checks, which may be expressed as:
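The image for Equation 5.0 is not reproduced; a reconstruction consistent with the description that follows (per-check index j, cost correction factor c) is:

```latex
f_j \propto c_j \sqrt{\frac{M_j}{T_j}} \qquad \text{(Equation 5.0, reconstructed)}
```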
where the frequency f for each check j is proportional to the cost correction factor c multiplied by the square root of the average time between anomalies M divided by the runtime T. Similar to Equation 2.0 above, the frequency is derived from a sum of average costs associated with the checks, which are based on costs associated with and without executing the checks, and the application of the Lagrange multiplier. The cost correction factor c depends on the costs of not executing the checks, which may be expressed as:
where the λ is expressed as:
The correction factor c as expressed in Equation 6.0 may be defined manually, but may also be decreased automatically as the computing system becomes more stable, depending on the criticality of the abnormalities.
It should be appreciated that if the cost of not executing a check e is not known for every check, then such cost may be set to an equal value for all checks. In such an example, the frequency becomes a relative frequency. That is, the frequency defined in Equation 5.0 is a frequency of a single check relative to other frequencies of other checks. For example, the relative frequency can specify how a check may be executed twice as often as another check. It should be noted that the cost of not executing the check e for certain types of checks, such as severe or critical abnormalities, may also be allowed to automatically increase as the computing system becomes more stable.
The average time between anomalies M for each check j may initially be set equally for all checks but, in an alternative embodiment, may then be adjusted according to observed errors. For example, as depicted in
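The text does not specify the adjustment rule itself; one plausible sketch is an exponential moving average that blends the running estimate with each newly observed gap between anomalies. This is an assumption introduced here for illustration, not the patent's method:

```python
def update_average_time(current_M, observed_gap, alpha=0.25):
    """One possible adjustment rule (exponential moving average; an
    assumption, not from the source): blend the running average time
    between anomalies with a newly observed gap."""
    return (1 - alpha) * current_M + alpha * observed_gap


# Starting estimate of 600 s, then an observed gap of 300 s.
M = update_average_time(600.0, 300.0)
assert M == 525.0
```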
Embodiments may also, for example, be deployed by Software-as-a-Service (SaaS), Application Service Provider (ASP), or utility computing providers, in addition to being sold or licensed via traditional channels. The machine is capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example processing system 700 includes processor 702 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), main memory 704 and static memory 706, which communicate with each other via bus 708. The processing system 700 may further include video display unit 710 (e.g., a plasma display, a liquid crystal display (LCD) or a cathode ray tube (CRT)). The processing system 700 also includes alphanumeric input device 712 (e.g., a keyboard), user interface (UI) navigation device 714 (e.g., a mouse), disk drive unit 716, signal generation device 718 (e.g., a speaker), and network interface device 720.
The disk drive unit 716 includes machine-readable medium 722 on which is stored one or more sets of instructions and data structures (e.g., software 724) embodying or utilized by any one or more of the methodologies or functions described herein. The software 724 may also reside, completely or at least partially, within main memory 704 and/or within processor 702 during execution thereof by processing system 700, with the main memory 704 and processor 702 also constituting machine-readable, tangible media.
The software 724 may further be transmitted or received over network 726 via network interface device 720 utilizing any one of a number of well-known transfer protocols (e.g., HTTP).
While machine-readable medium 722 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.
While the invention(s) is (are) described with reference to various implementations and exploitations, it will be understood that these embodiments are illustrative and that the scope of the invention(s) is not limited to them. In general, techniques for check scheduling may be implemented with facilities consistent with any hardware system or hardware systems defined herein. Many variations, modifications, additions, and improvements are possible.
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in the exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the invention(s).
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US4998208 *||Nov 14, 1988||Mar 5, 1991||The Standard Oil Company||Piping corrosion monitoring system calculating risk-level safety factor producing an inspection schedule|
|US5471629 *||Jul 20, 1992||Nov 28, 1995||Hewlett-Packard Company||Method of monitoring changes in an object-oriented database with tuned monitors|
|US5500941 *||Jul 6, 1994||Mar 19, 1996||Ericsson, S.A.||Optimum functional test method to determine the quality of a software system embedded in a large electronic system|
|US5586252 *||May 24, 1994||Dec 17, 1996||International Business Machines Corporation||System for failure mode and effects analysis|
|US6081771 *||Sep 8, 1998||Jun 27, 2000||Nec Corporation||Apparatus checking method and apparatus to which the same is applied|
|US6370656 *||Nov 19, 1998||Apr 9, 2002||Compaq Information Technologies, Group L. P.||Computer system with adaptive heartbeat|
|US6493836 *||Nov 30, 2000||Dec 10, 2002||Compaq Information Technologies Group, L.P.||Method and apparatus for scheduling and using memory calibrations to reduce memory errors in high speed memory devices|
|US6782496 *||Apr 13, 2001||Aug 24, 2004||Hewlett-Packard Development Company, L.P.||Adaptive heartbeats|
|US7472388 *||Mar 18, 2004||Dec 30, 2008||Hitachi, Ltd.||Job monitoring system for browsing a monitored status overlaps with an item of a pre-set browsing end date and time|
|US7596731 *||Jan 19, 2007||Sep 29, 2009||Marvell International Ltd.||Test time reduction algorithm|
|US20020198983 *||Jun 26, 2001||Dec 26, 2002||International Business Machines Corporation||Method and apparatus for dynamic configurable logging of activities in a distributed computing system|
|US20040181712 *||Dec 19, 2003||Sep 16, 2004||Shinya Taniguchi||Failure prediction system, failure prediction program, failure prediction method, device printer and device management server|
|US20050081114 *||Sep 26, 2003||Apr 14, 2005||Ackaret Jerry Don||Implementing memory failure analysis in a data processing system|
|US20050166089||Dec 22, 2004||Jul 28, 2005||Masanao Ito||Method for processing a diagnosis of a processor, information processing system and a diagnostic processing program|
|US20050246590 *||Apr 15, 2004||Nov 3, 2005||Lancaster Peter C||Efficient real-time analysis method of error logs for autonomous systems|
|US20060168473 *||Jan 25, 2005||Jul 27, 2006||International Business Machines Corporation||Method and system for deciding when to checkpoint an application based on risk analysis|
|US20070043536||Aug 16, 2006||Feb 22, 2007||First Data Corporation||Maintenance request systems and methods|
|US20080275985 *||Jul 16, 2008||Nov 6, 2008||International Business Machines Corporation||Systems, Methods and Computer Programs for Monitoring Distributed Resources in a Data Processing Environment|
|1||"European Application Serial No. 09013911.4, Extended European Search Report mailed Apr. 14, 2010", 8 Pgs.|
|2||Okamura, Hiroyuki, et al., "Availability optimization in operational software system with aperiodic time-based software system rejuvenation scheme", Software Reliability Engineering Workshops, (Nov. 11, 2008), 1-6.|
|3||Zeng, Fancong, et al., "A Reinforcement-Learning Approach to Failure-Detection Scheduling", QSIC '07. IEEE, (Oct. 11, 2007), 161-170.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US9348710 *||Jul 29, 2014||May 24, 2016||Saudi Arabian Oil Company||Proactive failure recovery model for distributed computing using a checkpoint frequency determined by a MTBF threshold|
|U.S. Classification||714/48, 714/55, 702/123|
|International Classification||G06F11/00, G06F11/34|
|Cooperative Classification||G06F11/0706, G06F11/0751|
|Mar 20, 2009||AS||Assignment|
Owner name: SAP AG, GERMANY
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KLEIN, UDO;REEL/FRAME:022432/0365
Effective date: 20081201
|Aug 26, 2014||AS||Assignment|
Owner name: SAP SE, GERMANY
Free format text: CHANGE OF NAME;ASSIGNOR:SAP AG;REEL/FRAME:033625/0334
Effective date: 20140707
|Jun 26, 2015||FPAY||Fee payment|
Year of fee payment: 4