|Publication number||US7779309 B2|
|Application number||US 11/936,533|
|Publication date||Aug 17, 2010|
|Filing date||Nov 7, 2007|
|Priority date||Nov 7, 2007|
|Also published as||US20090119545|
|Inventors||Bernard Pham, Eric B. Watson, Zhiyi Xie|
|Original Assignee||Workman Nydegger|
Recent trends in computer/network system design include the use of distributed application programs and distributed processing environments. In such environments, an organization might implement an application program, or even a system of application programs, that includes several different components that operate separately (or at least partly separately) from each other, often on different computer systems or servers entirely. For example, an organization might operate a single distributed email application program to handle the entire organization's needs, where the email application has several different components operating on one or multiple different servers.
In addition, there is often a great deal of dependency between components and different application programs. For example, one email application program component may be configured to operate appropriately only if a different application component used for network access is operating appropriately, which, in turn, might depend on the operability of one or more network firewall components. Thus, it is often the case that tasks performed by users in the organization, such as sending and receiving email, logging-in to the domain, etc., can depend on the individual operability of a long string of different applications and/or different application program components.
One can appreciate, therefore, that when an error occurs in any one of the different components needed for a particular task, it can be difficult to troubleshoot what exactly is happening, and/or how this may affect other applications, application components, or even end-user tasks. For example, conventional application programs may be configured to generate error reports that are sent to some application-specific error reception module. The generated error reports usually contain application-specific technical data, such as the filename of the failing component and a numerical error code associated with the failure event. In a large, enterprise-style organization, a system administrator specifically trained and hired to manage that particular application may then diagnose the error and determine a solution for it. Small and medium-sized organizations, however, usually do not have such specially trained application administrators who can do this level of work, particularly across several different application programs.
In particular, small and medium-sized organizations tend to have one or a few system administrators for managing all of the system resources, and such administrators tend to have generalized skill sets. Even where the administrators are specially trained in a specific application program, the organization will often need them to manage a range of different application programs and components with which they may be familiar only on a very basic level. As a result, when an application program in such an organization generates an error, the administrators often have difficulty troubleshooting and fixing the error reasonably quickly.
Such delays in fixing an error can be exacerbated by a number of factors. For example, system administrators in smaller organizations already tend to be stretched thin as they continually ensure that users have network login access, internet connectivity, and working telephone and email systems. Thus, when a system administrator in such an organization receives an error message of undetermined significance, it may not be readily apparent that the message relates to something of immediate concern (e.g., internet or email access), and the administrator may delay working on the error. In addition, there is no guarantee that the administrator will receive the error at all without in-depth searching, or that the administrator will be able to ascertain the error even when an end-user reports being unable to perform a routine task.
For example, a user may approach a system administrator with a problem logging-in to the network or accessing email. To identify how to fix the error for the user in a distributed application environment, the system administrator may need to check several different application program error logs or repositories. Although some application programs centralize or standardize much of this error reporting information, the system administrator may still have difficulty identifying what each error message means, and to which application program it relates. That is, even if the administrator is able to find an error report, there is no guarantee that the administrator will be able to deduce the relevant problems from it. Furthermore, different system administrators may interpret the same error reports differently, which can result in inconsistent or error-prone solutions the next time the error arises.
Thus, simply centralizing the error reporting is usually insufficient, particularly for generalized system administrators who may be untrained in each specific application program they manage. Accordingly, there are a number of difficulties with managing errors and functionality within small to medium-sized networks/organizations that can be addressed.
Implementations of the present invention provide systems, methods, and computer program products configured to efficiently report various computer system operability metrics in a human-readable, easy to understand way. For example, one implementation of a system is configured to collect status reports (e.g., error reports) from one or more application programs into a centralized location, and interpret the error reports in terms of generalized, end-user tasks. The system then associates the generalized end-user tasks, such as sending and receiving email, or logging-in to a network, etc. with a positive or negative (or undetermined) designation. The system further includes one or more user interfaces (e.g., a dashboard) that continually display updated system health information, which indicates which generalized end-user tasks may be working properly, or otherwise implicated by problems identified with application components.
For example, a method of automatically determining a positive or negative status of one or more generalized, end-user tasks can include identifying one or more end-user tasks to be performed in a computerized environment. In this case, each of the one or more end-user tasks involves execution of a corresponding set of one or more distributed application components. The method can also involve receiving one or more status reports from one or more distributed application programs. The one or more status reports provide information regarding execution of one or more distributed application program components.
In addition, the method can involve correlating the information of the one or more status reports with the one or more end-user tasks. Furthermore, the method can involve associating a positive, negative, or pending status with at least one of the one or more end-user tasks based on the correlated information. The associated status indicates whether the corresponding end-user task can be performed.
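The correlation step can be pictured as a small pure function. The sketch below is illustrative only: the task names, the hypothetical component identifiers (`db.accounts`, `security.auth`, and so on), and the precedence policy (any failing required component makes the task negative; otherwise any component without a report leaves it pending) are assumptions, since the text does not specify how conflicting or missing reports are resolved.

```python
from enum import Enum


class TaskStatus(Enum):
    POSITIVE = "positive"  # every required component reported as executing
    NEGATIVE = "negative"  # at least one required component reported a failure
    PENDING = "pending"    # no failure seen, but some component has not reported


# Hypothetical rules: each generalized end-user task -> components it requires.
TASK_COMPONENTS = {
    "log in to network": {"db.accounts", "security.auth"},
    "send/receive email": {"db.accounts", "licensing.check",
                           "email.transport", "security.gateway"},
}


def correlate(reports, task_components):
    """Return {task: TaskStatus} from (component, ok) status reports.

    Reports are processed in order, so a later report for the same
    component overrides an earlier one (e.g. a failure later fixed).
    """
    latest = {}
    for component, ok in reports:
        latest[component] = ok

    result = {}
    for task, required in task_components.items():
        if any(latest.get(c) is False for c in required):
            result[task] = TaskStatus.NEGATIVE
        elif all(latest.get(c) is True for c in required):
            result[task] = TaskStatus.POSITIVE
        else:
            result[task] = TaskStatus.PENDING
    return result
```

Giving "negative" precedence over "pending" means a single known failure is surfaced immediately rather than hidden behind components that simply have not reported yet.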
Similarly, a method of displaying one or more graphics that indicate whether generalized end-user tasks can be performed can include identifying one or more distributed application programs and a minimum set of one or more distributed application components corresponding to performance of any one or more generalized end-user tasks. The method can also include querying a centralized database comprising information corresponding to one or more status reports received from the one or more distributed application programs. In addition, the method can include determining from the results of the query a recent positive, negative, or pending status of at least one end-user task. Furthermore, the method can include displaying through a graphical user interface the positive, negative, or pending status indicator associated with the at least one end-user task. The positive, negative, or pending status indicator identifies whether users in the computerized environment can perform the end-user task.
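As a rough illustration of the display method, the following sketch stores status reports in an in-memory SQLite table standing in for the centralized database, queries the most recent report per component, and renders one text indicator per task. The table layout, the `[OK]`/`[X]`/`[?]` indicators, and all component names are hypothetical; a real dashboard would be graphical.

```python
import sqlite3

# Hypothetical schema for the centralized status database.
SCHEMA = """
CREATE TABLE status_reports (
    component   TEXT    NOT NULL,
    ok          INTEGER NOT NULL,  -- 1 = executing, 0 = failed
    reported_at INTEGER NOT NULL   -- e.g. Unix time of the report
)
"""

# Most recent report per component; if two reports share the latest
# timestamp, whichever row is read last wins when building the dict.
LATEST_SQL = """
SELECT r.component, r.ok
FROM status_reports AS r
JOIN (SELECT component, MAX(reported_at) AS ts
      FROM status_reports GROUP BY component) AS m
  ON r.component = m.component AND r.reported_at = m.ts
"""

INDICATOR = {"positive": "[OK]", "negative": "[X]", "pending": "[?]"}


def task_status(latest, components):
    """Negative if any component failed, positive if all executing, else pending."""
    states = [latest.get(c) for c in components]
    if any(s == 0 for s in states):
        return "negative"
    if all(s == 1 for s in states):
        return "positive"
    return "pending"


def render_dashboard(conn, task_components):
    """Return one indicator line per task, sorted by task name."""
    latest = dict(conn.execute(LATEST_SQL).fetchall())
    lines = []
    for task in sorted(task_components):
        status = task_status(latest, task_components[task])
        lines.append(f"{INDICATOR[status]} {task}: {status}")
    return lines
```

For example, after inserting a failure for `security.auth` at time 100 and a recovery at time 200, the login task renders as positive, because only the latest report per component is considered.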
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Implementations of the present invention extend to systems, methods, and computer program products configured to efficiently report various computer system operability metrics in a human-readable, easy to understand way. For example, one implementation of a system is configured to collect status reports (e.g., error reports) from one or more application programs into a centralized location, and interpret the error reports in terms of generalized, end-user tasks. The system then associates the generalized end-user tasks, such as sending and receiving email, or logging-in to a network, etc. with a positive or negative (or undetermined) designation. The system further includes one or more user interfaces (e.g., a dashboard) that continually display updated system health information, which indicates which generalized end-user tasks may be working properly, or otherwise implicated by problems identified with application components.
Accordingly, and as described more fully herein, implementations of the present invention present a “scenario-based health view” of a computerized system or computerized environment. That is, implementations of the present invention provide an end-to-end approach to resolving system errors, such as by aggregating relevant error events and tying them to operational business functions at a level of complexity appropriate for the technical staff of smaller organizations. An automated, all-up overview of the health of the important business functions can be very helpful to such technical staff, and is particularly valuable in such traditionally resource-constrained environments.
In general, and as also discussed more fully herein, there are at least four key operational business functions, referred to herein as “generalized end-user tasks,” that may be running or being performed in the computerized environment at any given time. These functions or end-user tasks can include logging-in to the network/domain, sending and receiving emails, accessing the internet, and performing or executing remote management. Each such business function or end-user task can then be associated with certain distributed application components that are essential to performing the function or task, such as described more fully herein.
The system is further configured with certain monitoring rules so that each application component necessary to perform a given end-user task generates critical errors when the component fails. In one implementation, the critical errors that are generated can be configured to include an explanation of failure, the time of failure, some recommended corrective action, the status of the alert, and/or the ability to resolve the alert. In addition, and as also described more fully herein, implementations of the present invention are further set up to evaluate events that have the appropriate level of complexity in order for a smaller company's technical staff to resolve the problem. In one implementation, this means that the system will primarily or only fire off critical alerts that have a clearly identified action for the smaller company technical staff to perform.
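The alert contents listed above, and the policy of firing only alerts that carry a clearly identified action, can be sketched as a record type plus a filter. The field names and the exact filter condition (active, resolvable, and with a recommended action) are assumptions chosen to mirror the text, not a documented format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CriticalAlert:
    # Hypothetical field names mirroring the alert contents described above.
    component: str
    explanation: str                   # explanation of failure
    failed_at: datetime                # time of failure
    recommended_action: Optional[str]  # recommended corrective action, if known
    status: str = "active"             # status of the alert
    resolvable: bool = False           # whether local staff can resolve it


def actionable(alerts):
    """Keep only active alerts that name a clear action staff can perform."""
    return [a for a in alerts
            if a.status == "active" and a.resolvable and a.recommended_action]
```

Filtering at this point, rather than in the user interface, keeps alerts without a clear remedy from ever reaching a generalist administrator's queue.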
As a preliminary matter, frequent reference is made herein to the term “error,” such as with the terms “error report” or “error reporting service.” For the purposes of this specification and claims, however, the term “error” is interchangeable with the term “status,” since an error is understood as a special form of status. That is, a distributed application program 125 in accordance with the present invention can be configured to send one or more status reports 130 onward that include the negative connotation of “error” reports, but can also or alternatively include positively connoted information. For example, the application program 125 can send a status report to indicate that previously failed components are functioning properly.
In addition, just as there can be any number of reasons (positive, negative, or undetermined) for sending a status report, there can also be a number of ways for sending a status report. For example, in at least one implementation, each distributed application program 125 (a-e, etc.) is configured to send the status reports 130 (a-b, etc.) as an XML (extensible markup language) document that has been formatted with certain, specifically-defined fields. Of course, other markup languages and virtually any number of network communication protocols can be used in accordance with the present invention to communicate status information. To ensure consistency between the application programs, however, implementations of the present invention include installing one or more additional components with each application program 125 to ensure that status reports are properly formatted before being received by the event manager module 120.
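A formatting check of the kind described, applied before a report reaches the event manager, might look like the following sketch using Python's standard `xml.etree.ElementTree`. The root element `statusReport`, the four required fields, and the allowed status values are hypothetical; the specification says only that the XML uses "certain, specifically-defined fields."

```python
import xml.etree.ElementTree as ET

# Hypothetical field set; the actual field names are not given in the text.
REQUIRED_FIELDS = ("application", "component", "status", "timestamp")
ALLOWED_STATUS = {"positive", "negative", "undetermined"}


def build_report(**fields):
    """Serialize a status report with the required fields as child elements."""
    root = ET.Element("statusReport")
    for name in REQUIRED_FIELDS:
        ET.SubElement(root, name).text = str(fields[name])
    return ET.tostring(root, encoding="unicode")


def validate_report(xml_text):
    """Parse a report and return its fields; raise ValueError if malformed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    if root.tag != "statusReport":
        raise ValueError(f"unexpected root element {root.tag!r}")
    fields = {}
    for name in REQUIRED_FIELDS:
        text = root.findtext(name)
        if not text:  # element missing (None) or empty ("")
            raise ValueError(f"missing field {name!r}")
        fields[name] = text
    if fields["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status {fields['status']!r}")
    return fields
```

Rejecting malformed reports at the source keeps every record in the central store queryable with the same fields, regardless of which application produced it.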
As previously mentioned, a distributed application program 125 can include any number of different application program components that may or may not be installed on the same server, or even in the same server domain. Thus, when a given distributed application program 125 sends a status report 130 to event manager module 120, the status report 130 may be based on the execution status of one particular application component, or on the execution status of several different application components for that particular application program (e.g., 125a). Furthermore, some application program 125 components for a given application program can be configured to send their own status report 130 to the error reporting service 105 separately. As such, discussion or illustration herein with respect to a single application program 125 sending a status report 130 is done primarily by way of convenience.
In any event, and as previously described,
Upon receipt of the given status reports 130,
As previously mentioned, the term “generalized end-user task” (e.g., 240,
For example, a generalized end-user task of logging-in to a network can involve components from at least DB application 125b (e.g., an account management database service) and a security application 125e. Similarly, a generalized end-user task of sending and receiving email messages can involve components from all of the illustrated applications. For example, to initiate an email client, one or more components from the DB application 125b and licensing application 125d may need to verify that the user account is valid, and that the user account qualifies for a license to use the email program at the user's computer. Similarly, components of applications 125a, 125c, and 125e may also need to be executed to ensure that email messages are properly formatted, that the messages are sent and received across any network security boundaries, and the like.
Accordingly, one can appreciate that there can be a large number of different application components that are used for each generalized end-user task, and that some of these components might be more critical than others. For example, email application 125a components for appropriately formatting an email message may not be as critical for ultimate end-use or functionality compared with security application 125e components that ensure the messages are actually transmitted or received over a network boundary. Similarly, formatting components may not be as critical as licensing application components, which, if not properly authenticated, could result in failure of the email application to initialize in the first instance. Of course, whether a particular application component for any given application is termed critical or less-critical may be a decision made by a network administrator, or even by the developer of a given distributed application program 125.
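One way to encode the critical/less-critical distinction is a flag on each task-to-component rule, so that only failures of critical components block a task while other failures are merely reported. The `Rule` type, the component names, and which components are flagged critical are all hypothetical; as noted above, that classification is a choice for an administrator or developer.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    task: str
    component: str
    critical: bool  # set by an administrator or the application's developer


# Hypothetical rules for the email task described above.
RULES = [
    Rule("send/receive email", "licensing.check", True),
    Rule("send/receive email", "security.gateway", True),
    Rule("send/receive email", "email.formatting", False),
]


def assess(task, failed_components, rules):
    """Split a task's failed components into blocking and non-blocking sets."""
    relevant = [r for r in rules if r.task == task]
    failed = set(failed_components)
    blocking = {r.component for r in relevant if r.critical and r.component in failed}
    minor = {r.component for r in relevant if not r.critical and r.component in failed}
    return blocking, minor
```

With these rules, a formatting failure alone yields no blocking components, so the email task would still be shown as available.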
In any event, the components or functions deemed critical for any given generalized end-user task can also be stored in a database. For example,
In any event, and in at least one implementation, event interpreter module 140 can use the critical application information from rules DB 115 to form the query of error DB 110. For example, a generalized end-user task for logging-in to the network may comprise execution of as many as ten different application components, of which only four or five are actually required for basic operation. In at least one implementation, therefore, the event interpreter module 140 can identify whatever set of application components is of interest from rules DB 115, and then query error DB 110 (or otherwise be set up to receive updates) to determine the operating status of the identified application components.
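The idea of forming the error-database query from the rules database can be sketched as a single SQL join, again using an in-memory SQLite database as a stand-in. The `rules` and `errors` table layouts and the function names are hypothetical; the query returns only critical components of a task whose most recent report is a failure.

```python
import sqlite3


def create_tables(conn):
    """Hypothetical stand-ins for the rules DB and the error DB."""
    conn.executescript("""
    CREATE TABLE rules  (task TEXT, component TEXT, critical INTEGER);
    CREATE TABLE errors (component TEXT, ok INTEGER, reported_at INTEGER);
    """)


def failing_critical_components(conn, task):
    """Critical components of `task` whose latest report is a failure."""
    sql = """
    SELECT DISTINCT r.component
    FROM rules AS r
    JOIN errors AS e ON e.component = r.component
    WHERE r.task = ? AND r.critical = 1 AND e.ok = 0
      AND e.reported_at = (SELECT MAX(e2.reported_at) FROM errors AS e2
                           WHERE e2.component = r.component)
    ORDER BY r.component
    """
    return [row[0] for row in conn.execute(sql, (task,))]
```

Restricting the query to critical components means non-essential components of a task, however many there are, never affect its reported status.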
The event interpreter module 140 can then correlate the status information received from, or queried in, error DB 110 with each generalized end-user task (e.g., as related by rules DB 115). For example, the event interpreter module 140 can associate errors in records 135a and 135b with one or more components of the DB application 125b and licensing application 125d. The event interpreter module 140 can then determine, such as based on information in rules DB 115, that these errors implicate the generalized end-user task of logging-in to the network. The event interpreter module 140 then prepares corresponding output for display. In particular, the event interpreter module 140 can prepare one or more reports with one or more critical error alerts, as well as an indication that these critical errors will impede user logins. The event interpreter module 140 then passes this information on to user interface module 145.
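Working in the opposite direction, from failed components to the tasks they implicate, produces the kind of plain-language alert described above. The rule table, component names, and message wording below are hypothetical illustrations.

```python
# Hypothetical task -> required-component rules, in the spirit of rules DB 115.
TASK_RULES = {
    "logging in to the network": {"db.accounts", "licensing.check"},
    "sending and receiving email": {"email.transport", "security.gateway"},
}


def implicated_tasks(failed_components, rules):
    """Map each task to the sorted list of failed components that block it."""
    failed = set(failed_components)
    hits = {}
    for task, required in rules.items():
        blocking = required & failed
        if blocking:
            hits[task] = sorted(blocking)
    return hits


def alert_messages(failed_components, rules):
    """One human-readable alert per implicated task, sorted by task name."""
    return [f"Critical errors in {', '.join(comps)} will impede {task}."
            for task, comps in sorted(implicated_tasks(failed_components, rules).items())]
```

Phrasing the alert in terms of the task ("will impede logging in") rather than the component tells a generalist administrator immediately why the error matters.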
User interface module 145 then prepares, formats, and otherwise passes the output of event interpreter module 140 to a computerized graphical output display. For example, the user interface module 145 can send one or more instructions to display the output from event interpreter module 140 as a selectable, interactive “dashboard” (e.g., 200,
In one implementation, the dashboard-style user interface is configured to immediately tell (or otherwise identify to) the user (e.g., network administrator) how critical errors in the system may be affecting the performance of generalized end-user tasks. For example, and continuing from the scenario above, the dashboard can include a main user interface, with selectable alert categories, which, when expanded, describe which (if any) application components and end-user tasks are associated therewith. The displayed data can be configured to change on the user interface depending on the results of the various, ongoing status changes within error DB 110.
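The selectable, expandable alert categories can be modeled independently of any GUI toolkit as grouping plus rendering. This text-only sketch is a hypothetical stand-in: the category names, the `+`/`-` collapse markers, and the `(component, task)` detail lines are illustrative choices.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Group (category, component, task) tuples by alert category."""
    groups = defaultdict(list)
    for category, component, task in alerts:
        groups[category].append((component, task))
    return dict(groups)


def render(groups, expanded=()):
    """One header per category; details appear only for expanded categories."""
    lines = []
    for category in sorted(groups):
        items = groups[category]
        marker = "-" if category in expanded else "+"
        lines.append(f"{marker} {category} ({len(items)})")
        if category in expanded:
            lines.extend(f"    {comp} -> {task}" for comp, task in items)
    return lines
```

Re-running `group_alerts` on fresh query results each time the error store changes keeps the rendered view in step with the latest statuses.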
For example, as shown in
In the illustrated implementation,
Along these lines,
Similarly along these lines,
In one implementation, the user can further select each of these different alerts in one form or another to identify a suggested course of action. For example, selection of alert 225a or 225b might result in the display of still another user interface, which details the application programs and/or file names of corresponding application program components that are causing the error. Furthermore, this interface can indicate what time the error occurred, and might further suggest possible solutions for the errors, such as rebooting the components, changing address or name information in the files, or the like. The network administrator can then use this information to quickly perform any fixes that are necessary on the identified files.
In at least one implementation, and upon fixing the errors, the relevant application programs 125 will eventually send a status report 130, or some other appropriate signal, to event manager module 120, which indicates a change in status (i.e., from “not executing” to “executing”). Event interpreter module 140 can then identify the status change, such as when performing a routine query of error DB 110 (or upon receiving an updated signal from error DB 110). Event interpreter module 140 can then send the change to user interface module 145, which in at least one implementation removes the alert from dashboard 200 to reflect that the error has been fixed.
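The status-change handling described above can be sketched as a pure update on the set of active alerts, plus a diff that tells the user interface what to add or remove. Keying alerts by component name and the function names are hypothetical simplifications.

```python
def apply_update(active_alerts, component, executing):
    """Return a new alert set after a status report for `component`.

    A "not executing" report raises an alert; an "executing" report clears it.
    The input set is not modified.
    """
    updated = set(active_alerts)
    if executing:
        updated.discard(component)
    else:
        updated.add(component)
    return updated


def changes(before, after):
    """Alerts the dashboard should add ("raised") or remove ("cleared")."""
    return {"raised": sorted(after - before), "cleared": sorted(before - after)}
```

Computing a diff rather than redrawing everything lets the dashboard remove a fixed alert without disturbing alerts that are still open.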
As such, one will appreciate that there are a number of ways such changes in status can be reflected in dashboard 200. For example, dashboard 200 can be configured to display only those alerts and/or categories for which there is relevant information to report. In particular, if there are no critical alerts and/or no pending alerts that require attention, the dashboard 200 of
In addition to the foregoing,
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US6675315||May 5, 2000||Jan 6, 2004||Oracle International Corp.||Diagnosing crashes in distributed computing systems|
|US6687847||Apr 19, 2000||Feb 3, 2004||Cornell Research Foundation, Inc.||Failure detector with consensus protocol|
|US6721907||Jun 12, 2002||Apr 13, 2004||Zambeel, Inc.||System and method for monitoring the state and operability of components in distributed computing systems|
|US6789114||Aug 5, 1998||Sep 7, 2004||Lucent Technologies Inc.||Methods and apparatus for managing middleware service in a distributed system|
|US6950867||Jul 31, 2000||Sep 27, 2005||Intertrust Technologies Corp.||System and method for managing transaction record delivery using an acknowledgement-monitoring process and a failure-recovery process with modifying the predefined fault condition|
|US6959265||Oct 7, 2003||Oct 25, 2005||Serden Technologies, Inc.||User-centric measurement of quality of service in a computer network|
|US7171672||Apr 24, 2002||Jan 30, 2007||Telefonaktiebolaget LM Ericsson (Publ)||Distributed application proxy generator|
|US20020073364 *||Oct 2, 2001||Jun 13, 2002||Tomoaki Katagiri||Fault notification method and related provider facility|
|US20040153703||Apr 22, 2003||Aug 5, 2004||Secure Resolutions, Inc.||Fault tolerant distributed computing applications|
|US20050114501||Nov 25, 2003||May 26, 2005||Raden Gary P.||Systems and methods for state management of networked systems|
|US20050216793||Mar 29, 2005||Sep 29, 2005||Gadi Entin||Method and apparatus for detecting abnormal behavior of enterprise software applications|
|1||"A Fault Detection Service for Wide Area Distributed Computations," by Paul Stelling and Craig Lee of The Aerospace Corporation, El Segundo, California; Ian Foster and Gregor Von Laszewski of Mathematics and Computer Science, Argonne National Laboratory, Argonne, Illinois; and Carl Kesselman of Information Sciences Institute, University of Southern California, Marina Del Rey, California, [online] [retrieved on Oct. 5, 2007], 11 pgs. Retrieved from the Internet: ftp://ftp.globus.org/pub/globus/papers/hbm.pdf.|
|2||"An Architecture-Based Approach to Self-Adaptive Software," by Peyman Oreizy, Michael M. Gorlick, Richard N. Taylor, Dennis Heimbigner, Gregory Johnson, Nenad Medvidovic, Alex Quillici, David S. Rosenblum and Alexander L. Wolf, May-Jun. 1999 IEEE Intelligent Systems, [online] [retrieved on Oct. 5, 2007], p. 54 through 62. Retrieved from the Internet: http://sunset.usc.edu/~neno/publications/ieee-is99.pdf.|
|3||"Standards Development for Condition-Based Maintenance Systems," by Michael Thurston and Mitchell Lebold, Date Unknown, Applied Research Laboratory, Penn State University, State College, Pennsylvania, [online] [retrieved on Oct. 5, 2007], 11 pgs. Retrieved from the Internet: http://www.osacbm.org/Documents/ConfPapers/MFPT2001-OSACBM-FinalPaper.pdf.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US20110145662 *||Dec 16, 2009||Jun 16, 2011||Microsoft Corporation||Coordination of error reporting among multiple managed runtimes in the same process|
|Cooperative Classification||G06F11/0715, G06F11/0769, H04L43/0817, G06Q10/10, H04L41/5074, G06F11/0709, G06F11/0784, H04L41/22|
|European Classification||G06F11/07P4A, G06F11/07P1A, G06F11/07P4F, G06Q10/10, H04L41/50J4, H04L41/22, H04L43/08D|
|Nov 7, 2007||AS||Assignment|
Owner name: MICROSOFT CORPORATION, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PHAM, BERNARD;WATSON, ERIC B.;XIE, ZHIYI;REEL/FRAME:020081/0243;SIGNING DATES FROM 20071106 TO 20071107
|Jan 28, 2014||FPAY||Fee payment|
Year of fee payment: 4
|Dec 9, 2014||AS||Assignment|
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034542/0001
Effective date: 20141014