Publication number: US20070174452 A1
Publication type: Application
Application number: US 11/709,096
Publication date: Jul 26, 2007
Filing date: Feb 21, 2007
Priority date: Aug 27, 1998
Also published as: US6513060, US20100070599
Publication number: 11709096, 709096, US 2007/0174452 A1, US 2007/174452 A1, US 20070174452 A1, US 20070174452A1, US 2007174452 A1, US 2007174452A1, US-A1-20070174452, US-A1-2007174452, US2007/0174452A1, US2007/174452A1, US20070174452 A1, US20070174452A1, US2007174452 A1, US2007174452A1
Inventors: Miles Nixon, Alan Moyer, Christopher Moyer
Original Assignee: Internetseer.Com Corp
System and method for monitoring informational resources
Abstract
A system and method are provided to monitor informational resources, such as websites. A plurality of host units support one or more informational resources accessible over a network, such as the Internet. A plurality of monitoring units located on a distributed computer system are coordinated to access the network and monitor the informational resources to determine if they are accessible and to evaluate their performance. Preferably, a central control unit manages the monitoring units.
Claims(2)
1. A method for use in conjunction with a distributed computer system, comprising the steps of:
(a) accessing a network by a plurality of host units, the network comprising a plurality of interconnected computers capable of sending and receiving data to and from one another;
(b) supporting by each host unit one or more informational resources accessible through the network;
(c) coordinating a plurality of monitoring units each located on a different computer in the distributed computer system;
(d) accessing the network by at least one monitoring unit; and
(e) monitoring the performance of at least one informational resource by the at least one monitoring unit.
2-28. (canceled)
Description
    TECHNICAL FIELD
  • [0001]
    The present invention relates generally to computers and software, and more specifically to a method and apparatus for monitoring informational resources, such as websites on the Internet or intranets.
  • BACKGROUND
  • [0002]
    The virtual explosion of technical advances in microelectronics, digital computers and software has changed the face of modern society. In fact, these technological advances have become so important and pervasive that this explosion is sometimes referred to as “the information revolution.” Through telephone lines, cables, satellite communications and the like, information and resources are ever increasingly being accessed and shared.
  • [0003]
    Informational resources, which are typically interactive in nature, are a commonly used vehicle to share information and resources. Informational resources can take a variety of forms, including but not limited to HTML (hypertext markup language), XML (extensible markup language), Java or ActiveX applets, still or moving graphics, audio, ASCII text, and the like. For instance, informational resources are often provided on the Internet as websites, on an intranet as a page or document, on an e-mail system as a mail request, and the like. Whatever the particular form of the informational resource, a computer or group of computers is programmed to support the informational resources.
  • SUMMARY OF THE INVENTION
  • [0004]
    An object of the invention is to provide a system and method for monitoring informational resources. Additional objectives, advantages and novel features of the invention will be set forth in the description that follows and, in part, will become apparent to those skilled in the art upon examining or practicing the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
  • [0005]
    One aspect of the invention is a method for use in conjunction with a distributed computer system. A plurality of host units access a network comprising a plurality of interconnected computers capable of sending and receiving data to and from one another. Each host unit supports one or more informational resources accessible through the network. A plurality of monitoring units, each located on a different computer in the distributed computer system, are coordinated and access the network. The performance (e.g. accessibility) of at least one informational resource is monitored by at least one monitoring unit.
  • [0006]
    Another aspect of the present invention is a computer system for use in conjunction with the Internet. A plurality of host computers each have access to the Internet and support a website on the Internet. A plurality of monitoring computers each have access to the Internet. Each of the monitoring computers is operative to transmit messages to and receive messages from one or more of the host computers through the Internet and to monitor the accessibility and performance of the corresponding host computers and supported websites. A managing computer has access to the monitoring computers. The managing computer is operative to transmit messages to and receive messages from the monitoring computers and to manage the monitoring computers.
  • [0007]
    Yet another aspect of the present invention is a method for monitoring an informational resource being supported by a host computer. The method comprises the steps of:
      • a) determining whether the host computer is pingable;
      • b) if the host computer is pingable, performing a ping operation comprising the steps of:
        • (i) sending a ping to the host computer;
        • (ii) determining whether the host computer responds to the ping;
        • (iii) if the host computer does not respond to the ping, sending a message;
      • c) attempting to access the informational resource; and
      • d) if the informational resource is not accessible, sending a message.
  • [0015]
    Still other aspects of the present invention will become apparent to those skilled in the art from the following description of a preferred embodiment, which is, by way of illustration, one of the best modes contemplated for carrying out the invention. As will be realized, the invention is capable of other different and obvious aspects, all without departing from the invention. Accordingly, the drawings and descriptions are illustrative in nature and not restrictive.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0016]
    The accompanying drawings, incorporated in and forming part of the specification, illustrate several aspects of the present invention and, together with their descriptions, serve to explain the principles of the invention. In the drawings:
  • [0017]
    FIG. 1 illustrates a computer system for monitoring informational resources;
  • [0018]
    FIG. 2 illustrates a flowchart of a method for monitoring informational resources;
  • [0019]
    FIG. 3 illustrates the interrelationship between various software components for monitoring informational resources;
  • [0020]
    FIG. 4 illustrates the components of a control unit; and
  • [0021]
    FIG. 5 illustrates the components of a monitor unit.
  • [0022]
    Reference will now be made to the present preferred embodiment of the invention, an example of which is illustrated in the accompanying drawings, wherein like numerals indicate the same element throughout the views.
  • DETAILED DESCRIPTION
  • [0023]
    Some aspects of the invention will be illustrated in the context of the Internet and websites. However, one with ordinary skill in the art will readily recognize that the invention has utility in the context of any network, including but not limited to intranets, and in connection with any informational resource. The Internet, which began in the late 1960s as an experimental project to link together Defense Department computers, has blossomed into a globally interconnected virtual community often referred to as Cyberspace. The Internet comprises more than 30,000 interconnected computer networks located in over 70 countries. The World Wide Web (or Web) was created by researchers in Switzerland and comprises a set of interlinked information resources, typically in the form of HTML-based files. These files, often called websites, web pages, or web documents, are located throughout the world on, and supported by, computers (or servers) that are connected to the Internet.
  • [0024]
    One aspect of the present invention is illustrated in FIG. 1. The domestic network 10 is connected to the Internet 20. The foreign networks 30 each have a host web server 32 connected to the Internet 20 over the connection 34. The domestic network 10 provides a monitoring system that will determine whether host web servers 32 have stopped functioning and do not respond to requests to provide their website data. Two elements of the monitoring system include a control console computer 14 and the monitoring computers 15. The control console 14 transmits messages to and receives messages from the monitoring computers 15 to manage and coordinate the monitoring computers 15. The monitoring computers 15 transmit messages to and receive messages from one or more host web servers 32 over the Internet 20. The monitoring computers 15 also monitor and determine the performance of the corresponding host web server 32. The monitoring system is scalable, in that the entire system may run on a single computer, or various components may be distributed on a number of computers. For instance, monitoring capacity can be increased by installing additional monitoring computers 15. The additional monitoring computers 15 automatically synchronize with a control console 14 and will be instructed which host web servers 32 to monitor. Also depicted in the domestic network 10 is an optional web server 16 which maintains and supports a domestic website. Information in the network 10 can be accessed remotely over the Internet 20 via the web server 16.
  • [0025]
    Various data tables containing monitoring information and report files resulting from the monitoring are stored on a file server 12 on an attached computer readable medium 13, shown here as a hard disk. A computer readable medium generally refers to anything which holds information readable by a computer, such as programs, data, files, etc. As one with ordinary skill in the art will readily appreciate, computer readable media can take a variety of forms, including magnetic storage (such as hard disk drives, floppy diskettes, etc.), optical storage (such as laser discs, compact discs, etc.), electronic storage (such as random access memory “RAM”, read only memory “ROM”, programmable read only memory “PROM”, flash memory, etc.), and the like. Certain types of computer readable media, which are sometimes described as being nonvolatile, can retain data in the absence of power so that the information is available when power is restored.
  • [0026]
    The control console 14 builds a table and schedule of host web servers 32 to be monitored from a web host database stored on the file server 12. The web host database is used by the control console 14 to determine which host web servers 32 are to be tested and when. When the predetermined time to check a host web server 32 is reached, the control console 14 contacts a monitoring computer 15 and instructs it to check the host web server. In its instructions to the monitoring computer 15, the control console 14 includes information such as “pingable”, timing information, when it expects to hear back from the monitoring computer 15, and the like. In its schedule table, the control console 14 marks the host web server 32 as being in an active monitoring state.
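    The scheduling behavior described in this paragraph can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the control console 14: the record fields (`host`, `pingable`, `interval_s`), the function names, and the `reply_timeout_s` parameter are assumptions introduced for the example, and check intervals are assumed to be greater than zero.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledCheck:
    due_at: float                           # time (seconds) at which the check is due
    host: str = field(compare=False)        # host web server address
    pingable: bool = field(compare=False)   # whether the host answers pings
    interval_s: float = field(compare=False)  # assumed > 0


def build_schedule(host_db, now):
    """Build a time-ordered queue of checks from web host database records."""
    queue = [
        ScheduledCheck(now + h["interval_s"], h["host"], h["pingable"], h["interval_s"])
        for h in host_db
    ]
    heapq.heapify(queue)
    return queue


def dispatch_due(queue, now, active, reply_timeout_s=30.0):
    """Pop every check that is due, mark its host active, and reschedule it.

    Returns the instructions that would be sent to a monitoring computer.
    """
    instructions = []
    while queue and queue[0].due_at <= now:
        check = heapq.heappop(queue)
        active.add(check.host)  # host is now in the active monitoring state
        instructions.append({
            "host": check.host,
            "pingable": check.pingable,
            "reply_by": now + reply_timeout_s,  # when a reply is expected
        })
        heapq.heappush(
            queue,
            ScheduledCheck(now + check.interval_s, check.host,
                           check.pingable, check.interval_s),
        )
    return instructions
```

    A priority queue keyed on the due time is one simple way to find the next host to check; the specification itself does not say how the schedule table is organized.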
  • [0027]
    The monitoring system will proactively monitor host web servers 32 and send alarm messages when predetermined conditions exist, preferably immediately after the condition has been detected. One example of such a condition is if a host web server 32 is down (does not respond to pings, if it is a pingable system). Another example is if the host web server software will not return its website data. Still another example is if predetermined response time performance parameters are exceeded. Beyond the above examples, other predetermined conditions could also trigger an alarm. Alarm messages are preferably directed to the host web server 32 owner and may take a variety of forms, including a pager alarm, e-mail, fax, voice phone, and the like.
  • [0028]
    FIG. 2 depicts one example of a method 40 for monitoring the performance of a website supported by a host web server. At step 41, a loop is started to check a list of one or more host web servers. For each host web server on the list, the IP address for the corresponding host web server is determined. At step 42, the method determines whether the host web server is “pingable.” For instance, to “ping” a TCP/IP computer is a standard method of determining whether a computer is active and communicating via TCP/IP, regardless of whether host web server software is active on it. Some TCP/IP computers have this feature intentionally disabled for various reasons, including some security issues, so it cannot be assumed that all host web servers are “pingable”. One way of making a “pingable” determination is to reference the web host database which contains information about host web servers, such as whether the host web server is “pingable” or not.
  • [0029]
    If the host web server is pingable, the method proceeds to step 43 where a ping is sent over the Internet, preferably from a monitoring computer. At step 44, if the host web server responds, the response time is sent to the control console, which stores the information in a report database on the file server 12. If the host web server is pingable, but does not respond to pings, at step 51 the monitoring computer performs a “trace route” operation to record if there is a break in the Internet as packets trace their way to the host web server. In step 52, the trace route information, such as the trace route time, is sent to the control console, which stores it in the report database on the file server 12 for later analysis.
  • [0030]
    At step 53, the method for sending an alert message is determined. For instance, the party responsible for operating the host web server can select a method to be contacted if the host web server is considered unavailable, which is stored in a database on the file server. Some preferred methods of contact include pager, e-mail, fax, voice phone, or the like. If a monitoring computer has determined that a host web server in the active monitoring state is unavailable, the monitoring computer reads the database to determine which method of contact the party responsible for operating the host web server has selected, and then contacts them accordingly in step 54.
  • [0031]
    After sending the initial alert message, the monitoring computer keeps this host web server in an active monitoring state and periodically starts this process over. This active monitoring state is continued until the host web server is either returned to service and responds to monitoring, or the party responsible for operating the host web server requests that monitoring and alerts be temporarily halted.
  • [0032]
    If the host web server responds to a ping or if the host web server is not pingable, the method proceeds to step 46 where an attempt is made to access the web page. The monitoring computer sends a request to the host web server to return its primary web page. At decision block 47, the monitoring computer determines whether the host web server responds. If the host web server returns its primary web page within a predetermined time period, it is considered available. At step 48 the monitoring computer records the time that the host web server was contacted and the response time to return the primary web page in the report database for the host web server and sends this information to the control console. The control console then stores it in the report database on the file server.
  • [0033]
    In step 48 the host web server can additionally transmit other data to the monitoring computer. For instance, a client agent can be embedded in the web page. Whenever a host web server returns its website to a requesting web browser, it executes the client agent which stores usage and other statistical information. The control console computer processes this information and stores it in the report database on the file server. This information is used in generating usage reports of the host web server.
  • [0034]
    If the host web server does not return its primary web page within the predetermined time period, it is considered unavailable and the monitoring computer proceeds to step 55 where the host web server software is determined to be down. The transactional information is stored in step 56, and the method continues to step 53.
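    The flow of FIG. 2 described in the preceding paragraphs can be summarized in a short sketch. It is a simplified illustration under stated assumptions rather than the method 40 itself: the callables `ping`, `trace_route`, `fetch_page`, and `send_alert` are hypothetical stand-ins for the network and alert operations, the status strings are invented for the example, and storing results in the report database is reduced to returning a dictionary.

```python
import time


def check_host(host, pingable, ping, trace_route, fetch_page, send_alert,
               page_timeout_s=10.0):
    """Run one pass of the monitoring flow for a single host.

    ping(host)            -> response time in ms, or None if there is no reply
    trace_route(host)     -> trace route information to be stored
    fetch_page(host, t)   -> page content, or None if not returned within t seconds
    send_alert(host, why) -> delivers the alert message
    All four callables are hypothetical and supplied by the caller.
    """
    result = {"host": host, "status": "available"}

    if pingable:                                  # step 42
        ping_ms = ping(host)                      # step 43
        if ping_ms is None:
            result["status"] = "host_down"
            result["trace"] = trace_route(host)   # steps 51 and 52
            send_alert(host, result["status"])    # steps 53 and 54
            return result
        result["ping_ms"] = ping_ms               # step 44

    started = time.monotonic()
    page = fetch_page(host, page_timeout_s)       # step 46
    elapsed_s = time.monotonic() - started

    if page is None:                              # decision block 47
        result["status"] = "web_server_down"      # steps 55 and 56
        send_alert(host, result["status"])        # steps 53 and 54
    else:
        result["page_s"] = elapsed_s              # step 48
    return result
```

    Passing the network operations in as functions keeps the decision logic separate from the transport, so the same flow can be exercised with stub functions.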
  • [0035]
    FIG. 3 depicts the interrelationship between various software components or units, which refer to a group of instructions, preferably located on a computer readable medium, that work in conjunction to achieve a desired result or perform one or more functions. The various units can be run from a single computer or as a distributed system on separate computers which communicate over a network, preferably a LAN, to act as a complete system. One advantage of operating as a distributed system is that the number of informational resources that can be monitored can be dramatically increased compared to a system operating on a single computer. The distributed system is fully scalable, so computers can freely be added or removed with minimal or no configuration modifications.
  • [0036]
    In one embodiment of a distributed system, the control unit 60 acts like a central manager and links most aspects of the monitoring system. Some of the functions of the control unit 60 include, but are not limited to:
      • Reading and configuring the other units;
      • Scheduling and dispatching monitoring activities to the monitoring units 61;
      • Monitoring the activity of the monitoring units 61;
      • Scheduling and dispatching alerts to the alert unit 64;
      • Storing monitoring and performance data on the data store 66; and
      • Communicating status and performance data to the report unit 65.
        Preferably, the control unit 60 is programmed in an object oriented environment. In such an environment, some of the software components of the control unit 60 include a Configuration object, a Scheduler object, and a Dispatcher object. However, one with ordinary skill in the art will recognize that the software components could be programmed using other development environments. Preferably, the control unit 60 is loaded and run on a server on a network.
  • [0043]
    The monitor units 61 communicate with and monitor the host units or hosts 67 via a network (e.g. the Internet 20). The host units 67 support and maintain one or more informational resources accessible over a network. The host units are preferably run on any pingable computer, including but not limited to Unix hosts, web servers, DNS servers, mail servers, FTP servers, news servers, and the like. The monitor units 61 have enough intelligence to conduct all monitoring of one or more host units 67. For instance, in the case of website informational resources, the host units 67 and the computers on which the host units 67 are run are monitored by the monitor units 61, including:
      • Pinging the web server;
      • Performing a trace route on web servers;
      • Accessing the website;
      • Monitoring the web server;
      • Monitoring web server performance;
      • Checking website for changes (checksum);
      • Hacking check (literal string check); and
      • Website link check.
        Some of the software components of each monitor unit 61 include a ServerConnection object and one or more MonitorTask objects. Preferably, the monitor units are loaded and run on client machines on a network.
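    Two of the checks listed above, the change check (checksum) and the hacking check (literal string check), lend themselves to a brief sketch. The specification does not name a checksum algorithm or say how the literal strings are chosen, so the use of SHA-256 and the idea of a list of strings that must remain present on the page are both assumptions made for illustration, as are the function names.

```python
import hashlib
from typing import List


def page_checksum(content: bytes) -> str:
    """Return a checksum of the page content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def page_changed(content: bytes, previous_checksum: str) -> bool:
    """Change check: True if the page differs from the previously stored checksum."""
    return page_checksum(content) != previous_checksum


def missing_literals(content: bytes, required: List[bytes]) -> List[bytes]:
    """Literal string check: return the expected strings not found in the page.

    A non-empty result may indicate that the page has been tampered with.
    """
    return [literal for literal in required if literal not in content]
```

    For instance, a monitor unit could store `page_checksum(...)` after one retrieval and call `page_changed(...)` on the next retrieval of the same page.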
  • [0052]
    The administrator unit or admin unit 62 provides administrative features. The admin unit 62 has three basic operating modes. Operator mode is a default mode when the admin unit 62 is first run. This is a protected mode for monitoring site operators. Access to lower level configuration data is not provided at this level. In the administration mode, a password must first be entered and validated. This protected mode provides access to all levels of configuration data. The admin unit 62 operates in the remote administration mode when the admin unit 62 detects that it is not running on the same network as the rest of the system units. The admin unit 62 then assumes that it is running from a remote location, such as over the Internet as indicated by reference 63. A remote access password is then required, in which case the user has rights equivalent to operator mode or administration mode, depending on the level of rights accorded the entered password. The admin unit 62 includes the following functions:
      • Provides a user interface for configuration files;
      • Displays reports and graphs of real-time status and performance;
      • Monitors configuration files for new or changed input from operators or customers, including input from host units 67, and notifies other units that changes have occurred; and
      • Monitors a domestic web server for real-time report requests.
        Some of the software components of the admin unit 62 include a Communications object (includes remote admin relay), a ReportDisplay object, and a MonitorConfig object.
  • [0057]
    The alert unit 64 creates and publishes alert messages, preferably in response to a request from the control unit 60. Some of the alert unit 64 functions include:
      • Generate numeric paging alert;
      • Generate e-mail alert;
      • Generate fax alert;
      • Generate voice phone alert, or send alert to operator to call an individual; and
      • Escalate alerts after a predetermined number of alerts.
        Information as to the type of alert message and the contents of the alert message is read from the data store 66 or provided to the alert unit 64 from the control unit 60. The major components of the alert unit 64 include a Communications object, a PagerAlert object, an EmailAlert object, a FaxAlert object, a VoiceAlert object, and an Escalation object. Preferably, the alert unit 64 is loaded and run on a server in a network, such as the same server on which the control unit 60 is loaded.
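    The escalation function listed above (escalating alerts after a predetermined number of alerts) might look something like the following sketch. The class name, the contact strings, and the rule that escalation starts once the count exceeds the threshold are assumptions for illustration; the specification does not describe the internals of the Escalation object.

```python
from collections import defaultdict


class AlertEscalator:
    """Counts alerts per host and switches contacts after a threshold."""

    def __init__(self, escalate_after, primary_contact, escalation_contact):
        self.escalate_after = escalate_after          # predetermined number of alerts
        self.primary_contact = primary_contact        # hypothetical contact label
        self.escalation_contact = escalation_contact  # hypothetical contact label
        self._counts = defaultdict(int)

    def next_contact(self, host):
        """Record one more alert for the host and return whom to notify."""
        self._counts[host] += 1
        if self._counts[host] > self.escalate_after:
            return self.escalation_contact
        return self.primary_contact

    def reset(self, host):
        """Clear the count, e.g. once the host has returned to service."""
        self._counts.pop(host, None)
```

    With `escalate_after=2`, the first two alerts for a host go to the primary contact and the third goes to the escalation contact.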
  • [0063]
    The report unit 65 generates and provides reporting features. Some of the functions of the report unit 65 include:
      • Scheduling report generation;
      • Generating reports and graphs for output to web server pages;
      • Generating reports and graphs for output to e-mail;
      • Generating reports and graphs for output to fax; and
      • Taking special report requests from other units and generating them.
        The major components of the report unit 65 include a Communications object, a ReportScheduler object, a GenerateReport object, a WebPageOutput object, an EmailOutput object, and a FaxOutput object. Preferably, the report unit 65 is loaded and run on a server in a network, such as the same server on which the control unit 60 is loaded.
  • [0069]
    One example of a report is a performance report comprising both text and graphs. The report unit 65 retrieves data from the data store 66 and processes the data for presentation in the performance report. Some examples of the information contained in a performance report include, but are not limited to:
      • Minimum, average, and maximum ping time to the computer on which the host unit 67 is running;
      • Dates and times that the host unit 67 was unavailable;
      • Minimum, average, and maximum time to retrieve the informational resource supported by the host unit 67;
      • Dates and times that the host unit 67 would not return its informational resource;
      • Number of hits (accesses) to the informational resource; and
      • Number of hits compared to one or more of the other informational resources being monitored by the system.
        Performance reports are generated periodically upon the instruction of the control unit 60, unless a critical condition exists. Alternatively, a performance report can be generated upon a user's command through the admin unit 62.
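    The minimum, average, and maximum figures and the unavailability times listed above can be derived from stored monitoring samples along the following lines. The sample layout, a list of (timestamp, response time) pairs in which a missing response time marks an unavailable host, is an assumption for this sketch and not the format used by the data store 66.

```python
from statistics import mean


def summarize(samples):
    """Summarize monitoring samples for a performance report.

    samples: list of (timestamp, response_ms) pairs, where response_ms is
    None when the host did not respond (assumed representation).
    """
    times = [ms for _, ms in samples if ms is not None]
    summary = {"unavailable_at": [ts for ts, ms in samples if ms is None]}
    if times:
        summary.update(min_ms=min(times), avg_ms=mean(times), max_ms=max(times))
    return summary
```

    The same function can be applied separately to ping samples and to page retrieval samples to produce both sets of statistics.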
  • [0076]
    The customer input/viewer unit or customer unit 68 provides a user interface to interact with the monitoring system. Preferably, the customer unit is a protected server side program running on a domestic web server. The customer unit interfaces with the remaining units through the admin unit 62. Some of the functionality of the customer unit includes allowing a user to:
      • Purchase monitoring services;
      • Add or remove host units to monitor;
      • Pause or restart monitoring; and
      • View reports.
        The major components of the customer unit 68 include a Communications object, an AddRemoveHost object, a PauseRestart object, and a ViewReport object.
  • [0081]
    Various configuration and data files are created and accessed by the monitoring system. For the purpose of managing a group of distributed computers which are working together to monitor hosts, the group will be called a Monitor Set. Ideally, a Monitor Set will have a directory set up in a centrally located position, such as the data store 66 on a centrally located file server, and contain all configuration and data files to be used. The following example illustrates a basic directory configuration:
    S:\Monitor
        Class               Java class code
        MSA                 Configuration and data files
            Cust_1          Customer data files
            Cust_2          Customer data files
            Cust_x          Customer data files
            OPERATIONS
  • [0082]
    In this example, the directory S:\Monitor\MSA refers to Monitor Set A. Within this directory are the following configuration and data files:
    UNIT-IP.DAT               Unit IP number master file.
    CU.DAT                    CU (Control Unit 60) configuration file.
    MU.DAT                    Common MU (Monitor Unit 61) data file.
    MU-xxx.xxx.xxx.xxx.DAT    Unique MU data file.
    LU.DAT                    LU (aLert Unit 64) configuration file.
    RU.DAT                    RU (Report Unit 65) configuration file.
    AU.DAT                    AU (Admin Unit 62) configuration file.
    CIVU.DAT                  CIVU (Customer Input/Viewer Unit 68) configuration file.
    DS.DAT                    DS (Data Store 66) configuration file.
    CUSTID.DAT                Customer ID master file.

    Under the MSA directory is one subdirectory for each customer to contain their files. The subdirectory is named by the customer id number, which is contained in the CUSTID.DAT file. For instance:
  • [0083]
    .\CID00000001 Subdirectory for customer number 1
  • [0084]
    Within the customer directory are the following files:
    CUSTOMER.DAT               General customer data
    xxx.xxx.xxx.xxx.HOST       Customer host configuration file. One per host.
    xxx.xxx.xxx.xxx.ABYPASS    File to indicate temporary alert bypass.
    xxx.xxx.xxx.xxx.MDATA      Customer host monitoring results data file.
    xxx.xxx.xxx.xxx.ALERT      Customer host alert record data file.
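    The naming scheme for the per-customer files can be illustrated with a few path helpers. This is a sketch under assumptions: the eight-digit zero padding is inferred only from the single CID00000001 example, and treating the mere presence of the .ABYPASS file as the bypass signal is a guess at how the "file to indicate temporary alert bypass" is used. The function names are hypothetical.

```python
from pathlib import Path

# Per-host file suffixes listed in the customer directory description.
HOST_SUFFIXES = ("HOST", "ABYPASS", "MDATA", "ALERT")


def customer_dir(monitor_set_dir, customer_number):
    """Return the customer subdirectory, e.g. <MSA>/CID00000001 (padding assumed)."""
    return Path(monitor_set_dir) / f"CID{customer_number:08d}"


def host_files(cust_dir, host_ip):
    """Map each suffix to the path of the corresponding file for one host."""
    return {suffix: Path(cust_dir) / f"{host_ip}.{suffix}" for suffix in HOST_SUFFIXES}


def alerts_bypassed(cust_dir, host_ip):
    """Assumed rule: alerts are bypassed while the .ABYPASS file exists."""
    return host_files(cust_dir, host_ip)["ABYPASS"].exists()
```

    Under this reading, pausing alerts for a host amounts to creating an empty marker file and resuming them amounts to deleting it.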
  • [0085]
    Also under the MSA directory is one subdirectory for business operation files. The subdirectory is named OPERATIONS. One OPERATIONS subdirectory exists for each Monitor Set.
  • [0086]
    .\OPERATIONS Subdirectory for operation data
  • [0087]
    Within this directory are the following files:
    OPERATIONS.DAT General operations data
    OPERATIONS.RESPONSE Operations configuration file.
    OPERATIONS.ABYPASS File to indicate temporary alert bypass.
    OPERATIONS.MDATA Operations results data file.
    OPERATIONS.ALERT Operations alert record data file.
  • [0088]
    In one embodiment, each unit has two possible configuration data files. For all units of a type (e.g. a monitor unit 61) there is a common configuration data file. Optionally, there may be additional unique configuration data files for individual units, having data that is unique for that particular unit. The common configuration data file is read first. Then, if it exists, the unique configuration data file is read and overwrites any values from the common configuration data file. The unique configuration data files contain the same type data as the common configuration data files, but may only contain data that changes, not the entire group of data.
  • [0089]
    The following is an example of how dual configuration files would be implemented for a hypothetical monitor unit 61. The common configuration data file could be configured:
    [cu_ip_number] xxx.xxx.xxx.xxx
    [num_simul_hosts] xx
    [num_ping_pkts] x
    [ping_pkt_len] xx
    [ping_timeout_ms] xxx
    [ping_interval_ms] xxx
  • [0090]
    The unique configuration data file for a given monitor unit 61 having a unique identifier of MU “_” xxx.xxx.xxx.xxx could be formatted as follows:
    [ping_pkt_len] xx
    [ping_timeout_ms] xxx

    Note that in this example, only two items would be overwritten from the unique configuration file. Also note that in each configuration data file, each line item is prefaced with an id tag. This is so the unique configuration data files only need to contain the information that changes.
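    The read-common-then-override behavior described above can be sketched as follows. The parser is a minimal illustration that assumes one `[id_tag] value` item per line, optional `//` trailing comments, and `/* ... */` comment blocks, as seen in the example files; the function names are hypothetical, and values are kept as strings rather than converted.

```python
import re

# One line item: an id tag in square brackets followed by its value.
LINE_RE = re.compile(r"^\[(?P<key>[^\]]+)\]\s*(?P<value>.*)$")


def parse_config(text):
    """Parse '[id_tag] value // comment' lines into a dict of strings."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)  # drop /* ... */ blocks
    values = {}
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing // comment
        match = LINE_RE.match(line)
        if match:
            values[match.group("key")] = match.group("value").strip()
    return values


def load_unit_config(common_text, unique_text=None):
    """Read the common file first, then let the unique file overwrite values."""
    config = parse_config(common_text)
    if unique_text is not None:
        config.update(parse_config(unique_text))
    return config
```

    Because each item carries its own id tag, a dictionary update is enough to apply the overrides. Note that this simple comment handling would truncate any value that itself contains `//`.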
  • [0091]
    Examples of the configuration and data files follow. In the following configuration data file examples, if the IP number for any of the individual units is ZERO, that indicates the unit is running on the same computer (internal unit) and there is no communication over the network to that unit.
  • [0092]
    The MSA directory includes a common units configuration data file (UNIT-IP.DAT). All unit IP numbers, except for monitoring units 61, are defined here. When a unit starts up, it checks this file for the other unit IP numbers. If the IP number of another unit is the same as its own IP number, then that unit is running on the same computer. If the IP number of another unit is different, then that unit is running on a different computer and the system is in a distributed mode. The following illustrates the format of UNIT-IP.DAT:
    /* UNIT-IP.DAT - Common Units Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [cu_ip_number] xxx.xxx.xxx.xxx
    [lu_ip_number] xxx.xxx.xxx.xxx
    [ru_ip_number] xxx.xxx.xxx.xxx
    [au_ip_number] xxx.xxx.xxx.xxx
    [civu_ip_number] xxx.xxx.xxx.xxx
    [ds_ip_number] xxx.xxx.xxx.xxx
    /*eof*/
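    The startup check described above, in which a unit compares the other units' IP numbers with its own, can be sketched as below. How an IP number of "ZERO" is written in the file is not specified, so accepting both `0` and `0.0.0.0` is an assumption, as are the function names and the `internal`/`remote` labels.

```python
def is_internal(unit_ip, own_ip):
    """True when the other unit runs on this computer.

    A unit is treated as internal if its IP number is zero (assumed to be
    written as "0" or "0.0.0.0") or matches this unit's own IP number.
    """
    return unit_ip in ("0", "0.0.0.0") or unit_ip == own_ip


def classify_units(unit_ips, own_ip):
    """Label each entry read from UNIT-IP.DAT as 'internal' or 'remote'."""
    return {
        name: "internal" if is_internal(ip, own_ip) else "remote"
        for name, ip in unit_ips.items()
    }
```

    A unit could use the result to decide whether to call another unit directly or to open a network connection to it.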
  • [0093]
    The MSA directory contains a control unit 60 configuration data file (CU.DAT). When the control unit 60 starts up, it checks UNIT-IP.DAT for the IP numbers of the other units. Then it checks this file for operating parameters. It knows nothing about any monitoring unit 61 until the monitoring unit 61 contacts the control unit 60 to be registered. The following illustrates the format of CU.DAT:
    /* CU.DAT
    Common Control Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [mu_status_retries] x // max count
    [mu_reset_timeout] x // seconds
    [mu_timeout_action] aaaaaa // coded action to take: send alert, etc.
    [mu_max_bad_chksums] x // max count
    [lu_status_retries] x // max count
    [lu_reset_timeout] x // seconds
    [lu_timeout_action] aaaaaa // coded action to take
    [au_status_retries] x // max count
    [au_reset_timeout] x // seconds
    [au_timeout_action] aaaaaa // coded action to take
    [ru_status_retries] x // max count
    [ru_reset_timeout] x // seconds
    [ru_timeout_action] aaaaaa // coded action
    [ds_status_retries] x // max count
    [ds_reset_timeout] x // seconds
    [ds_timeout_action] aaaaaa // coded action to take
    [scan_freq] 4,10,60 // scan frequencies
    /*eof*/
  • [0094]
    The MSA directory contains monitoring unit 61 configuration data files. When a monitoring unit 61 starts up, it checks UNIT-IP.DAT for the IP number of the control unit 60 and then registers itself to accept work. The monitoring unit 61 gets all other information about itself from the control unit 60. The following illustrates the format of the common configuration data file (MU.DAT) and the unique configuration data file:
    /* MU.DAT
    Common Monitor Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [num_simul_hosts] xx
    [num_ping_pkts] x // count
    [ping_pkt_len] xx // bytes
    [ping_timeout_ms] xxx // milliseconds
    [ping_interval_ms] xxx // milliseconds
    [max_idle_time_s] xx // seconds
    /*eof*/
    /* MU-xxx.xxx.xxx.xxx.DAT
    Unique Monitor Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    /*eof*/
  • [0095]
    The MSA directory contains an alert unit 64 configuration data file (LU.DAT). The following illustrates the format of LU.DAT:
    /* LU.DAT
    Common aLert Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [num_pager_enabled] true/false // numeric pager alerts enabled
    [alphanum_pager_enabled] true/false // alphanumeric pager alerts enabled
    [email_enabled] true/false // e-mail alerts enabled
    [email_server] xxx.xxxxxxxx.xxxx.xxx // e-mail server
    [email_server_port] // e-mail server port
    [fax_enabled] true/false // fax alerts enabled
    [voice_enabled] true/false // voice alerts enabled
    [escalation_enabled] true/false // alert escalation enabled
    [com_port] com2 // modem com port for pager and fax
    /*eof*/
  • [0096]
    The MSA directory contains a report unit 65 configuration data file (RU.DAT). The following illustrates the format of RU.DAT:
    /* RU.DAT
    Common Report Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [batch_dow] 1 // numeric day of week on which batch reports run; 1=Sunday, 7=Saturday
    [batch_tod] 0100 // mil time for time to run batch reports
    /*eof*/
  • [0097]
    The MSA directory contains an admin unit 62 configuration data file (AU.DAT). The following illustrates the format of AU.DAT:
    /* AU.DAT
    Common Admin Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [admin_passwd] @#$%%$^$ // password for admin level (encoded)
    [remote_passwd] ^&#%^#@$% // password for remote access (encoded)
    /*eof*/
  • [0098]
    The MSA directory contains a customer unit 68 configuration data file (CIVU.DAT). The following illustrates the format of CIVU.DAT:
    /* CIVU.DAT
    Common Customer Input/Viewer Unit Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [admin_passwd] @#$%%$^$ // password for CIVU admin level (encoded)
    [remote_passwd] ^&#%^#@$% // password for CIVU remote access (encoded)
    /*eof*/
  • [0099]
    The MSA directory contains a data store 66 configuration data file (DS.DAT). The following illustrates the format of DS.DAT:
    /* DS.DAT
    Common Data Store Configuration Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [ds_unc_loc] \\server\share // unc name for data store area
    /*eof*/
  • [0100]
    The OPERATIONS directory contains several operations-related data files. The following illustrates the format of some of those files:
    /* .\OPERATIONS\OPERATIONS.DAT
    Operations General Data File - One file per monitor set.
    This data is used for internal alerts; performance problems, etc.
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [cust_company] Quicksand Development // customer company name
    [cust_contact] Miles Nixon // customer contact name
    [cust_contact_pn] xxx.xxx.xxxx // customer contact phone number
    [access_login] aaaaaaa // customer login name
    [access_passwd] @#$%%$^$ // password for web server access (encoded)
    /*eof*/
    /* OPERATIONS.RESPONSE
    QSDev Configuration and Response Data File - One file per monitor set.
    This data is used for internal alerts; performance problems, etc.
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [num_pager_pn] xxxxxxxxxx,xxxxxxxxxx // numeric pager(s) phone number
    [num_pager_pin] xxxxxxxxxx,xxxxxxxxxx // numeric pager PIN(s)
    [num_pager_email] aaaaa@xxxxx.com,aaaa@xxxxx.com
    // numeric pager email address(es)
    [num_pager_msg] 64 911 // numeric pager message
    [alphanum_pager_pn] xxxxxxxxxx,xxxxxxxxxx // alphanumeric pager(s) phone number
    [alphanum_pager_pin] xxxxxxxxxx,xxxxxxxxxx // alphanumeric pager PIN(s)
    [alphanum_pager_email] aaaaa@xxxxx.com, aaaa@xxxxx.com
    // alphanumeric pager email address(es)
    [alphanum_pager_msg] ws 911 // alphanumeric pager message
    [email_address] aaaaa@xxxxxxx.xxx, aaaaa@xxxxxxx.xxx
    // email address(es)
    [email_msg] web server down // email additional message
    [fax_pn] xxxxxxxxxx,xxxxxxxxxx // fax phone number(s)
    [fax_msg] web server down // fax additional message
    [voice_email] aaaaa@xxxxx.com, aaaaa@xxxxx.com
    // voice email address(es)
    [voice_pn] xxxxxxxxxx,xxxxxxxxxx // voice phone number(s)
    [voice_msg] web server down // voice additional message
    [num_attempts] xx // number of attempts before escalation
    // zero indicates no escalation
    [esc_num_pager_pn] xxxxxxxxxx,xxxxxxxxxx // esc numeric pager phone number(s)
    [esc_num_pager_pin] xxxxxxxxxx,xxxxxxxxxx // esc numeric pager PIN(s)
    [esc_num_pager_email] aaaaa@xxxxx.com, aaaaa@xxxxx.com
    // esc numeric pager email address(es)
    [esc_num_pager_msg] 64 911 // esc numeric pager message
    [esc_alphanum_pager_pn] xxxxxxxxxx,xxxxxxxxxx
    // esc alphanumeric pager phone number(s)
    [esc_alphanum_pager_pin] xxxxxxxxxx,xxxxxxxxxx
    // esc alphanumeric pager PIN(s)
    [esc_alphanum_pager_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // esc alphanumeric pager email address
    [esc_alphanum_pager_msg] web server down // esc alphanumeric pager message
    [esc_email_address] aaaaa@xxxxxxx.xxx,aaaaa@xxxxxxx.xxx
    // esc email address(es)
    [esc_email_msg] web server down // esc email additional message
    [esc_fax_pn] xxxxxxxxxx,xxxxxxxxxx // esc fax phone number(s)
    [esc_fax_msg] web server down // esc fax additional message
    [esc_voice_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // esc voice email address(es)
    [esc_voice_pn] xxxxxxxxxx,xxxxxxxxxx // esc voice phone number(s)
    [esc_voice_msg] web server down // esc voice additional message
    /*eof*/
  • [0101]
    Customer data files are preferably maintained separately from the general customer information or billing data. The group of data files for each customer is kept in a separate subdirectory named using the customer ID number. The MSA directory contains a customer master identification data file (CUSTID.DAT). The following illustrates the format of CUSTID.DAT:
    /* CUSTID.DAT
    Customer ID Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [cid_00000001] ABC Corp. // Customer number 1
    [cid_00000002] XYZ Corp. // Customer number 2
    /*eof*/
  • [0102]
    Located in each separate customer subdirectory are several configuration and data files unique to the corresponding customer. These files include the following:
    /* .\CID_0000000x\CUSTOMER.DAT
    Customer General Data File
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    [cust_company] aaaaaaaaaaa // customer company name
    [cust_contact] aaaaaaaaaaaaaa // customer contact name
    [cust_contact_pn] xxx.xxx.xxxx // customer contact phone number
    [access_login] aaaaaaa // customer login name
    [access_passwd] @#$%%$^$ // password for web server access (encoded)
    [host] xxx.xxx.xxx.xxx // customer host to monitor
    [host] xxx.xxx.xxx.xxx // customer 2nd host to monitor, etc.
    /* .\CID_0000000x\xxx.xxx.xxx.xxx.HOST
    Customer HOST Configuration Data File - One file per host
    Created: mm/dd/yyyy
    Modified: mm/dd/yyyy
    */
    /*
    The first part is MONITORING data for this HOST
    */
    [dns_name] xxx.xxxxxxxxxxx.xxx // take your pick
    [mon_freq] 4 // monitoring times per hour. 4 is the default
    [rpt_freq] 1 // report times per week. 1 is the default
    [pingable] true/false //
    [ping_timeout] xxx // milliseconds
    [web_host] www.xxxxxxx.com // blank means no web page
    [web_timeout] xxx // milliseconds or seconds
    [web_pg_chksum] xxx // checksum of web page
    [web_pg_hack_data] “Case sensitive hack data” // exactly what it says
    [traceroute] true/false // traceroute or not
    [traceroute_hops] xx // max count
    /*
    The second part is RESPONSE data for this HOST
    */
    [num_pager_pn] xxxxxxxxxx,xxxxxxxxxx // numeric pager phone number(s)
    [num_pager_pin] xxxxxxxxxx,xxxxxxxxxx // numeric pager PIN(s)
    [num_pager_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // numeric pager email address(es)
    [num_pager_msg] 64 911 // numeric pager message
    [alphanum_pager_pn] xxxxxxxxxx,xxxxxxxxxx // alphanumeric pager phone number(s)
    [alphanum_pager_pin] xxxxxxxxxx,xxxxxxxxxx // alphanumeric pager PIN(s)
    [alphanum_pager_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // alphanumeric pager email address(es)
    [alphanum_pager_msg] ws 911 // alphanumeric pager message
    [email_address] aaaaa@xxxxxxx.xxx,aaaaa@xxxxxxx.xxx
    // email address(es)
    [email_msg] web server down // email additional message
    [fax_pn] xxxxxxxxxx,xxxxxxxxxx // fax phone number(s)
    [fax_msg] web server down // fax additional message
    [voice_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // voice email address(es)
    [voice_pn] xxxxxxxxxx // voice phone number
    [voice_msg] web server down // voice additional message
    [num_attempts] xx // number of attempts before escalation
    // zero indicates no escalation
    [esc_num_pager_pn] xxxxxxxxxx,xxxxxxxxxx // esc numeric pager phone number(s)
    [esc_num_pager_pin] xxxxxxxxxx,xxxxxxxxxx // esc numeric pager PIN(s)
    [esc_num_pager_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // esc numeric pager email address(es)
    [esc_num_pager_msg] 64 911 // esc numeric pager message
    [esc_alphanum_pager_pn] xxxxxxxxxx,xxxxxxxxxx
    // esc alphanumeric pager phone number(s)
    [esc_alphanum_pager_pin] xxxxxxxxxx,xxxxxxxxxx
    // esc alphanumeric pager PIN(s)
    [esc_alphanum_pager_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // esc alphanumeric pager email address
    [esc_alphanum_pager_msg] web server down // esc alphanumeric pager message
    [esc_email_address] aaaaa@xxxxxxx.xxx,aaaaa@xxxxxxx.xxx
    // esc email address(es)
    [esc_email_msg] web server down // esc email additional message
    [esc_fax_pn] xxxxxxxxxx,xxxxxxxxxx // esc fax phone number(s)
    [esc_fax_msg] web server down // esc fax additional message
    [esc_voice_email] aaaaa@xxxxx.com,aaaaa@xxxxx.com
    // esc voice email address(es)
    [esc_voice_pn] xxxxxxxxxx,xxxxxxxxxx // esc voice phone number(s)
    [esc_voice_msg] web server down // esc voice additional message
    /*eof*/
    /* .\CID_0000000x\xxx.xxx.xxx.xxx.ABYPASS
    Customer HOST Alert Bypass File
    This file contains no data.
    Its existence indicates that alerts for this host are temporarily being bypassed.
    */
    /* .\CID_0000000x\xxx.xxx.xxx.xxx.MDATA
    Customer HOST Monitor Results Data File - One file per host
    */
    /*
    This data is recorded by the control unit 60.
    Most of it is also passed to the alert unit 64 for alert processing.
    */
    /* Monitor data is comma delimited. There is one line (entry) per scan attempt.
    Data is as follows:
    yyyymmdd - year, month, day of scan.
    hhmmss.x - hour, minute, second, and tenth of second of scan.
    111 - number of pings sent. Zero indicates host was non-pingable.
    222,333,444 - resultant millisecond times of the pings.
    ss.x - seconds and tenth of second to return web page. Empty field indicates
    unsuccessful attempt to obtain web page.
    chksum - checksum of retrieved page (hex).
    h-y/n - y/n did hack text check out
    a-y/n - y/n was an alert sent
    trcrtdata - trace route data string.
    */
    yyyymmdd,hhmmss.x,111,222,333,444,ss.x,chksum,h-y/n,a-y/n,trcrtdata
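As an illustration of how one line of the monitor results file might be consumed, here is a minimal Java sketch. It is not the application's implementation. It assumes that the ping-count field tells the reader how many ping-time fields follow, that the hack and alert flags are stored as a literal `y` or `n`, and that everything after the alert flag is the trace route string; the class name `MonitorRecord` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for one comma-delimited line of a .MDATA file. */
class MonitorRecord {
    String date;                                   // yyyymmdd
    String time;                                   // hhmmss.x
    final List<Integer> pingMillis = new ArrayList<>();
    Double webSeconds;                             // null when the page was not retrieved
    String checksumHex;
    boolean hackTextOk;
    boolean alertSent;
    String traceroute;

    static MonitorRecord parse(String line) {
        String[] f = line.split(",", -1);          // -1 keeps empty fields
        if (f.length < 3) throw new IllegalArgumentException("too few fields: " + line);
        MonitorRecord r = new MonitorRecord();
        int i = 0;
        r.date = f[i++].trim();
        r.time = f[i++].trim();
        int pings = Integer.parseInt(f[i++].trim());   // zero = host was non-pingable
        if (f.length < i + pings + 5) {
            throw new IllegalArgumentException("too few fields: " + line);
        }
        for (int p = 0; p < pings; p++) {              // assumed: one time per ping sent
            r.pingMillis.add(Integer.parseInt(f[i++].trim()));
        }
        String web = f[i++].trim();
        r.webSeconds = web.isEmpty() ? null : Double.valueOf(web);
        r.checksumHex = f[i++].trim();
        r.hackTextOk = f[i++].trim().equalsIgnoreCase("y");
        r.alertSent = f[i++].trim().equalsIgnoreCase("y");
        // The remainder is the trace route string, which may itself contain commas.
        r.traceroute = String.join(",", Arrays.copyOfRange(f, i, f.length)).trim();
        return r;
    }
}
```

An empty web-time field maps to `null` here, matching the statement that an empty field indicates an unsuccessful attempt to obtain the web page.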
    /* .\CID_0000000x\xxx.xxx.xxx.xxx.ALERT
    Customer HOST ALERT Record Data File - One file per host
    */
    /*
    This data is recorded by the alert unit 64.
    */
    /* Alert data is comma delimited. There is one line entry per alert.
    Data is as follows:
    yyyymmdd - year, month, day of alert. (To point back to monitor data file.)
    hhmmss.x - hour, minute, second, and tenth of second of alert.
    yyyymmdd - year, month, day that alert was processed.
    hhmmss.x - hour, minute, second, and tenth of second that alert was processed.
    atype - alert type:
    1 - no ping response
    2 - web page not returned
    3 - web page chksum bad
    4 - web page hack alert
    alevel - alert level:
    0 - alert bypass enabled, no alert sent
    1 - normal
    2 - escalated
    ameth - alert method used:
    1 - numeric page
    2 - alphanumeric page
    3 - e-mail
    4 - fax
    5 - voice
    adata - alert data - e-mail address or phone number used
    */
    yyyymmdd, hhmmss.x, yyyymmdd, hhmmss.x,atype,alevel,ameth,adata
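The alert record can be decoded in the same way. The sketch below, again in Java and again only an illustration, maps the `atype`, `alevel`, and `ameth` codes listed above to their descriptions. The class name `AlertRecord` and its methods are hypothetical; the code tables are copied from the listing.

```java
/** Hypothetical decoder for one comma-delimited line of a .ALERT file. */
class AlertRecord {
    // Index = code value from the listing; null marks codes that are not defined.
    static final String[] TYPES = {null, "no ping response", "web page not returned",
                                   "web page chksum bad", "web page hack alert"};
    static final String[] LEVELS = {"alert bypass enabled, no alert sent", "normal", "escalated"};
    static final String[] METHODS = {null, "numeric page", "alphanumeric page",
                                     "e-mail", "fax", "voice"};

    final String alertDate, alertTime, processedDate, processedTime;
    final int type, level, method;
    final String data;                              // e-mail address or phone number used

    AlertRecord(String line) {
        String[] f = line.split(",", 8);            // limit 8 keeps adata in one piece
        if (f.length != 8) throw new IllegalArgumentException("expected 8 fields: " + line);
        alertDate = f[0].trim();
        alertTime = f[1].trim();
        processedDate = f[2].trim();
        processedTime = f[3].trim();
        type = Integer.parseInt(f[4].trim());
        level = Integer.parseInt(f[5].trim());
        method = Integer.parseInt(f[6].trim());
        data = f[7].trim();
    }

    private static String lookup(String[] table, int code) {
        if (code < 0 || code >= table.length || table[code] == null) {
            throw new IllegalArgumentException("unknown code " + code);
        }
        return table[code];
    }

    /** Human-readable summary of the coded fields. */
    String describe() {
        return lookup(TYPES, type) + " / " + lookup(LEVELS, level) + " / "
                + lookup(METHODS, method) + " -> " + data;
    }
}
```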
  • [0103]
    FIG. 4 depicts an example of a control unit 60 and some of its components. The Configuration object 71 administers the control unit 60. The Configuration object 71 communicates with the admin unit 62, which is preferably running on the same machine as the control unit 60. However, the admin unit 62 can be run remotely. The Scheduler object 72 builds a schedule of work to do based on data retrieved from the data store 66. The Scheduler object 72 maintains the job list and triggers the start of those jobs. The Dispatcher object 73 maintains communications with the monitor units 61. When a job needs to be done, the Scheduler object 72 sends the job information to the Dispatcher object 73. The Dispatcher object 73 maintains a queue of new jobs and communicates the jobs to the monitor units 61. The Dispatcher object 73 also maintains a list of active monitor units 61 and their status.
  • [0104]
    The Scheduler object 72 reads in all jobs and sets up a schedule in an internal data structure. The Scheduler object 72 is flexible enough that job information can be added and deleted dynamically. The Scheduler object 72 watches the system clock and sends appropriate jobs to the Dispatcher object 73. The Scheduler object 72 also staggers jobs. For instance, if the system is monitoring 2000 informational resources hourly, it is preferred that 2000 tasks are distributed over the hour instead of lumping the jobs all at once. When a new informational resource is added, the Scheduler object 72 determines the best spot to put it in the schedule.
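The staggering described above can be illustrated with a small calculation. This Java sketch assumes the simplest possible policy, an even spread of start offsets across the monitoring period; the text does not say how the Scheduler object 72 actually chooses slots, and the class and method names are hypothetical.

```java
/** Hypothetical helper that spreads job start times evenly across a monitoring period. */
class StaggeredSchedule {

    /**
     * Returns the offset, in milliseconds from the start of the period,
     * at which the job with the given zero-based index should start.
     */
    static long offsetMillis(int index, int jobCount, long periodMillis) {
        if (jobCount <= 0 || index < 0 || index >= jobCount || periodMillis <= 0) {
            throw new IllegalArgumentException("invalid schedule parameters");
        }
        return index * periodMillis / jobCount;   // long arithmetic: int * long -> long
    }
}
```

For the example in the text, 2000 informational resources monitored hourly, `offsetMillis(1, 2000, 3_600_000L)` is 1800, so consecutive jobs start 1.8 seconds apart rather than all at the top of the hour.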
  • [0105]
    The Dispatcher object 73 maintains a connection with each monitor unit 61. When a monitor unit 61 connects to the server, a new TCP/IP port is assigned to that connection. All communication passes on this port. In some implementations it may be necessary to use more than one port for each monitor unit 61. The Dispatcher object 73 manages all available monitor units 61. If a job is completed successfully by a monitor unit 61, the results are stored in the data store 66. If a job fails or the results indicate a critical condition, the data is stored and a message is sent to the alert unit 64. The Dispatcher object 73 also communicates with the report unit 65 with instructions to publish reports.
  • [0106]
    FIG. 5 illustrates an example of a monitor unit 61 and some of its components. The ServerConnection object 74 maintains a connection to the control unit 60 for instruction. When an instruction is received it creates a MonitorTask object 75 that processes that instruction. All communication between the monitor unit 61 and control unit 60 takes place through a TCP/IP network. The monitor unit 61 is multi-threaded, so many different tasks can execute concurrently. Each MonitorTask object 75 executes in its own thread, performing its task and sending results through the ServerConnection object 74.
  • [0107]
    When the monitor unit 61 is loaded, it first establishes a connection with the control unit 60. For configurations where multiple control units 60 exist in one monitoring system, the monitor unit 61 is configured with the IP address or DNS name of the corresponding control unit 60.
  • [0108]
    The ServerConnection object 74 polls a TCP/IP port waiting for messages. When a message is received, an appropriate MonitorTask object 75 is created and initialized with the data necessary to complete that task. Preferably, a MonitorTask base class contains the common data and functionality needed by all tasks, and a subclass is derived from it for each separate task, such as MonitorTaskPing, MonitorTaskTraceRoute, MonitorTaskURLCheck, MonitorTaskGetURL, and the like, thus taking advantage of object oriented programming in languages such as Java, C++, ActiveX, and the like.
  • [0109]
    MonitorTask objects 75 exist for all communications that travel over the Internet 20, including but not limited to pinging a host, performing a trace route to a host, checking the web server of a host for a reply, retrieving a document from a web server, retrieving information from a client program running on the host, sending a command to the web host to execute a program, and the like. When the MonitorTask object 75 has finished, it gives the results to ServerConnection object 74, which sends the results to the control unit 60 for further processing.
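The task structure described in the preceding paragraphs can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the implementation of the monitor unit 61: the subclass bodies are stubs that perform no network I/O, a `Consumer<String>` stands in for the ServerConnection object 74, and `MonitorUnitSketch` is a hypothetical driver. Only the names `MonitorTask`, `MonitorTaskPing`, and `MonitorTaskGetURL` come from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Common data and behavior shared by every monitoring task. */
abstract class MonitorTask implements Runnable {
    protected final String host;
    private final Consumer<String> resultSink;    // stands in for the ServerConnection object

    MonitorTask(String host, Consumer<String> resultSink) {
        this.host = host;
        this.resultSink = resultSink;
    }

    /** Each subclass performs one kind of check and returns a result string. */
    protected abstract String execute();

    @Override
    public final void run() {
        String result;
        try {
            result = execute();
        } catch (RuntimeException e) {
            result = "ERROR " + e.getMessage();
        }
        resultSink.accept(getClass().getSimpleName() + " " + host + ": " + result);
    }
}

class MonitorTaskPing extends MonitorTask {
    MonitorTaskPing(String host, Consumer<String> sink) { super(host, sink); }
    @Override protected String execute() { return "ping stub"; }    // placeholder only
}

class MonitorTaskGetURL extends MonitorTask {
    MonitorTaskGetURL(String host, Consumer<String> sink) { super(host, sink); }
    @Override protected String execute() { return "fetch stub"; }   // placeholder only
}

class MonitorUnitSketch {
    /** Runs a ping task and a fetch task per host, each in its own thread. */
    static List<String> runAll(List<String> hosts) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (String host : hosts) {
            threads.add(new Thread(new MonitorTaskPing(host, results::add)));
            threads.add(new Thread(new MonitorTaskGetURL(host, results::add)));
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) {
            try {
                t.join();                          // wait for every task to report
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return results;
    }
}
```

Because the tasks run concurrently, the order of the collected results is not fixed. Keeping `run` final in the base class means every subclass reports through the same path, which is one way to realize the "common functionality" the text mentions.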
  • [0110]
    As indicated above, the monitoring system preferably uses the standard TCP/IP protocol and tools to perform both its monitoring and to communicate between the distributed computers. Encapsulated within the TCP/IP data packet is a data language used to efficiently communicate between the computers participating in the system. Preferably, the packets are configured using the RMI (“Remote Method Invocation”) feature built into Java.
  • [0111]
    By way of illustration, the following text provides example dialogs between the monitor unit 61 (MU) and the control unit 60 (CU) during monitoring. The term “PCID” is a shorthand notation for Protocol Command Identifier. The characters “>” and “<” are used to indicate the direction in which a message is being transferred.
    Dialog of MU Boot Up
    MU CU PCID
    (MU boots up, reads CU IP number, reads its common config data, reads its unique config data [if exists].)
    ATT, CU! Registration request. My IP# is xxx.xxx.xxx.xxx. > 203
    < ACK! MUxxx.xxx.xxx.xxx, you're registered. 103
    ACK CU! Confirming. > 201

    MU reads the CU IP number from the UNIT-IP.DAT file when it boots up. It then reads its configuration data files from the file server. The CU keeps track of how many hosts the MU is currently monitoring and feeds it more after the MU returns host data.
  • [0112]
    For the rest of the examples, it is assumed that the MU and CU are already running and configured to communicate with each other.
    Dialog with Successful Monitor
    MU CU PCID
    < ATT MU! Monitor data: ping, web, hops, host name or ip, etc. 104
    ACK! Confirming. > 201
    (MU does its thing . . .)
    ATT, CU! I'm done and host is OK. Monitor data: ping time, web time, host name or ip > 204
    < ACK! Confirming 101
  • [0113]
    It is preferred to specify the CU from which the MU requests data rather than using broadcasts. That way, MUs can be grouped together by Monitor Sets. After the MU is registered with the CU, the CU controls and keeps track of which hosts the MU is currently handling. The MU and CU preferably confirm that each operation is complete; otherwise, the operation is retried. The MU contains enough logic to handle all of the monitoring itself. This dialog takes place with the MU's ServerConnection object for all monitor tasks.
    Dialog with Unsuccessful Monitor
    MU CU PCID
    < ATT MU! Monitor data: ping, web, hops, host name or ip, etc. 104
    ACK! Confirming > 201
    (MU does its thing . . .)
    ATT, CU! I'm done and host is BROKEN. Monitor data: ping time, web time, host name or ip > 205
    < ACK! Confirming 101
    (CU sends alert to LU) 113
  • [0114]
    If a host is broken, it is preferably monitored continuously on an accelerated schedule until: 1) the host comes back online or 2) the CU is told to pause or stop monitoring of the broken host. The CU is responsible for handling the accelerated and continuous monitoring of a broken host and for telling the MU how and when to do that.
    CU_MU Timeout Dialog
    CU MU PCID
    ATT, MU! Request Status. > 107
    (No response.)
    (Timeout period expires. CU will then try again as many times as defined in the master configuration file.)
    ATT, MU! Request Status. (2nd try) > 107
    (No response.)
    (Timeout period expires again. CU will then try again as many times as defined in the master configuration file.)
    ATT, MU! Request Status. (nth try) > 107
    (No response.)
    (Timeout period expires again. CU will now try to get the MU to reset itself.)
    ATT, MU! Request Restart > 108
    (No response.)
    (CU now waits for a certain period of time for the MU to reset itself and send a registration request.)
    (No response.)
    (CU determines that the MU is nonfunctional. CU now sends an alert to the aLert Unit for processing.) 114

    The number of times to retry getting status should be defined in a master configuration file. The action taken after the MU fails to respond n times may also be defined in the configuration file.
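The timeout dialog above amounts to a short escalation sequence: request status, retry, request a restart, then alert. The following Java sketch shows one way that sequence could be expressed. It is an assumption-laden illustration: the three callbacks are hypothetical stand-ins for sending PCID 107, sending PCID 108 and waiting for re-registration, and sending PCID 114 to the alert unit, and the retry count is read as "one initial attempt plus that many retries".

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the CU-side escalation shown in the timeout dialog. */
class MuWatchdog {
    enum Outcome { ALIVE, RESTARTED, ALERT_SENT }

    /**
     * @param statusRetries  retries after the first status request (e.g. mu_status_retries)
     * @param statusProbe    sends a status request; true if the MU answered before the timeout
     * @param restartProbe   sends a restart request; true if the MU re-registered in time
     * @param alert          notifies the alert unit that the MU is nonfunctional
     */
    static Outcome check(int statusRetries, BooleanSupplier statusProbe,
                         BooleanSupplier restartProbe, Runnable alert) {
        if (statusRetries < 0) throw new IllegalArgumentException("negative retry count");
        for (int attempt = 0; attempt <= statusRetries; attempt++) {
            if (statusProbe.getAsBoolean()) {
                return Outcome.ALIVE;              // MU answered, nothing more to do
            }
        }
        if (restartProbe.getAsBoolean()) {
            return Outcome.RESTARTED;              // MU reset itself and registered again
        }
        alert.run();                               // last resort: raise an alert
        return Outcome.ALERT_SENT;
    }
}
```

The text also allows the action after the final failure to be configurable (`mu_timeout_action`); passing it in as a `Runnable` is one simple way to leave that choice to the caller.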
  • [0115]
    By way of illustration, the following text provides an example communications protocol, based on the above examples of conversations between the MU and the CU. Construction of the packets will be at the field level in Java. The term “MSID” is a shorthand notation for Message Sequence Identifier.
  • [0000]
    PCID Number:
  • [0116]
    (all) 0xx series numbers (applies to all units)
  • [0117]
    CU: 1xx series numbers
  • [0118]
    MU: 2xx series numbers
  • [0119]
    LU: 3xx series numbers
  • [0120]
    RU: 4xx series numbers
  • [0121]
    AU: 5xx series numbers
  • [0122]
    DS: 6xx series numbers
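Because each unit type owns one block of one hundred PCID numbers, the originating unit can be derived from the hundreds digit. A minimal Java sketch of that lookup follows; the class and method names are hypothetical, and the unit abbreviations are the ones used in the list above.

```java
/** Hypothetical lookup from a PCID number to the unit series that owns it. */
class PcidSeries {

    /** Returns the unit abbreviation for a PCID, e.g. 203 -> "MU". */
    static String originator(int pcid) {
        if (pcid < 0 || pcid > 699) {
            throw new IllegalArgumentException("PCID outside defined series: " + pcid);
        }
        switch (pcid / 100) {
            case 0: return "ALL";   // 0xx applies to all units
            case 1: return "CU";
            case 2: return "MU";
            case 3: return "LU";
            case 4: return "RU";
            case 5: return "AU";
            default: return "DS";   // only 6 remains after the range check
        }
    }
}
```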
  • [0000]
    All communications between Units on the LAN will use IP numbers. For communications by MUs on the Internet, either the IP number or the DNS name can be used.
  • [0000]
    All Units (0xx)
  • [0123]
    ATT! SERIOUS Error. A serious error occurred somewhere (e.g. self-destruct initiated, etc.).
  • [0124]
    PCID: 000
  • [0125]
    Field 1: integer—PCID
  • [0126]
    Field 2: string —IP of originator
  • [0000]
    CU (1xx)
  • [0127]
    ATT! CU Error. Some kind of error occurred.
  • [0128]
    PCID: 100
  • [0129]
    Field 1: integer—PCID
  • [0130]
    Field 2: string—IP of originator
  • [0000]
    ACK MU! Message/Command Acknowledgment.
  • [0131]
    PCID: 101
  • [0132]
    Field 1: integer—PCID
  • [0133]
    Field 2: integer—MSID
  • [0000]
    NAK MU! Message/Command Negative Acknowledgment.
  • [0134]
    PCID: 102
  • [0135]
    Field 1: integer—PCID
  • [0136]
    Field 2: integer—MSID
  • [0000]
    ATT MU! Registration Confirmed.
  • [0137]
    PCID: 103
  • [0138]
    Field 1: integer—PCID
  • [0139]
    Field 2: integer—MSID
  • [0140]
    ATT MU! Monitor This Host.
    PCID: 104
    Field 1: integer - PCID
    Field 2: integer - MSID
    Field 3: integer - Pingable host - Zero = non-pingable; Non-zero = pingable, value is timeout
    Field 4: integer - Web host - Zero = not a web host; Non-zero = Web host, value is web page timeout
    Field 5: integer - Traceroute - Zero = do not traceroute; Non-zero = traceroute, value is number of hops
    Field 6: string - IP or DNS name of host (variable length data)
    Field 7: string - URL of web page to obtain (variable length data)
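To make the field-level construction concrete, the following Java sketch encodes and decodes the seven fields of the "Monitor This Host" message. The wire encoding is an assumption chosen for illustration: integers are written with `DataOutputStream.writeInt` and strings with `writeUTF`, which is not stated in the text (the text prefers RMI for packet handling). The class name `MonitorHostMessage` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical field-level encoding of the PCID 104 message. */
class MonitorHostMessage {
    static final int PCID = 104;
    final int msid;
    final int pingTimeout;       // zero = non-pingable host
    final int webTimeout;        // zero = not a web host
    final int tracerouteHops;    // zero = do not traceroute
    final String host;
    final String url;

    MonitorHostMessage(int msid, int pingTimeout, int webTimeout,
                       int tracerouteHops, String host, String url) {
        this.msid = msid;
        this.pingTimeout = pingTimeout;
        this.webTimeout = webTimeout;
        this.tracerouteHops = tracerouteHops;
        this.host = host;
        this.url = url;
    }

    /** Writes the fields in the order listed above. */
    byte[] encode() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(PCID);
            out.writeInt(msid);
            out.writeInt(pingTimeout);
            out.writeInt(webTimeout);
            out.writeInt(tracerouteHops);
            out.writeUTF(host);              // length-prefixed, so variable length is fine
            out.writeUTF(url);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back; rejects packets carrying a different PCID. */
    static MonitorHostMessage decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int pcid = in.readInt();
            if (pcid != PCID) throw new IllegalArgumentException("unexpected PCID " + pcid);
            // Java evaluates arguments left to right, so fields are read in wire order.
            return new MonitorHostMessage(in.readInt(), in.readInt(), in.readInt(),
                                          in.readInt(), in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isPingable() { return pingTimeout != 0; }
}
```

Using the timeout value itself as the "pingable" flag mirrors the field definitions above, where zero disables a check and any non-zero value both enables it and supplies its parameter.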

    ACK MU! Host Monitor Complete and Host is OK. Confirming.
  • [0141]
    PCID: 105
  • [0142]
    Field 1: integer—PCID
  • [0143]
    Field 2: integer—MSID
  • [0000]
    ACK MU! Host Monitor Complete and the Host was BROKEN. Confirming.
  • [0144]
    PCID: 106
  • [0145]
    Field 1: integer—PCID
  • [0146]
    Field 2: integer—MSID
  • [0000]
    ATT MU! Request Status.
  • [0147]
    PCID: 107
  • [0148]
    Field 1: integer—PCID
  • [0149]
    Field 2: integer—MSID
  • [0000]
    ATT MU! Restart (Restart Software.)
  • [0150]
    PCID: 108
  • [0151]
    Field 1: integer—PCID
  • [0152]
    Field 2: integer—MSID
  • [0000]
    ATT MU! Reboot (Reboot Hardware.)
  • [0153]
    PCID: 109
  • [0154]
    Field 1: integer—PCID
  • [0155]
    Field 2: integer—MSID
  • [0000]
    ATT MU! Pause Monitoring.
  • [0156]
    PCID: 110
  • [0157]
    Field 1: integer—PCID
  • [0158]
    Field 2: integer—MSID
  • [0000]
    ATT MU! Resume Monitoring.
  • [0159]
    PCID: 111
  • [0160]
    Field 1: integer—PCID
  • [0161]
    Field 2: integer—MSID
  • [0000]
    ACK MU! Standby (Response to MU Idle Inquiry if CU is not Ready to Send Another Host)
  • [0162]
    PCID: 112
  • [0163]
    Field 1: integer—PCID
  • [0164]
    Field 2: integer—MSID
  • [0165]
    ATT LU! We Have a BROKEN Host. Handle It.
    PCID: 113
    Field 1: integer - PCID
    Field 2: integer - MSID
    Field 3: integer - Ping time - Zero = host timed out on ping; Non-zero = ping time for host
    Field 4: integer - Web time - Zero = host timed out on web page retrieval; Non-zero = web page retrieval time
    Field 5: boolean - Check Sum Failed Alert
    Field 6: boolean - Hack String Failed Alert
    Field 7: string - Customer ID
    Field 8: string - IP or DNS name of broken host (variable length data)
    Field 9: string - Traceroute information (variable length data)

    ATT LU! We Have a BROKEN MU. Alert the Boss.
  • [0166]
    PCID: 114
  • [0167]
    Field 1: integer—PCID
  • [0168]
    Field 2: integer—MSID
  • [0169]
    Field 3: string—IP of broken MU
  • [0000]
    ATT LU! Incomplete Host Data. Alert the Boss.
  • [0170]
    PCID: 115
  • [0171]
    Field 1: integer—PCID
  • [0172]
    Field 2: integer—MSID
  • [0173]
    Field 3: string—Customer ID
  • [0174]
    Field 4: string—IP or DNS name of incomplete host (variable length data)
  • [0000]
    ATT MU! Change Your Configuration. New Parameters Follow.
  • [0175]
    PCID: 116
  • [0176]
    Field 1: integer—PCID
  • [0177]
    Field 2: integer—MSID
  • [0178]
    Field 3: string—MU Configuration Data File contents (variable length data)
  • [0000]
    ATT LU! System Performance WARNING. Performance Threshold Exceeded. Alert the Boss.
  • [0179]
    PCID: 117
  • [0180]
    Field 1: integer—PCID
  • [0181]
    Field 2: integer—MSID
  • [0182]
    Field 3: string—IP or DNS name of CU with the performance warning
  • [0183]
    Field 4: string—queue that exceeded performance threshold
  • [0000]
    ATT LU! System Performance PROBLEM. Performance Limits Exceeded. Alert the Boss.
  • [0184]
    PCID: 118
  • [0185]
    Field 1: integer—PCID
  • [0186]
    Field 2: integer—MSID
  • [0187]
    Field 3: string—IP or DNS name of CU with the performance problem
  • [0188]
    Field 4: string—queue that exceeded performance limits
  • [0000]
    ATT LU! BROKEN Host is Back Online. Cancel Alerts.
  • [0189]
    PCID: 119
  • [0190]
    Field 1: integer—PCID
  • [0191]
    Field 2: integer—MSID
  • [0192]
    Field 3: string—Customer ID
  • [0193]
    Field 4: string—IP or DNS name of broken host (variable length data)
  • [0000]
    MU (2xx)
  • [0000]
    ATT! MU Error. Some Kind of Error Occurred.
  • [0194]
    PCID: 200
  • [0195]
    Field 1: integer—PCID
  • [0196]
    Field 2: string—IP of originator
  • [0000]
    ACK! Message/Command Acknowledgement.
  • [0197]
    PCID: 201
  • [0198]
    Field 1: integer—PCID
  • [0199]
    Field 2: integer—MSID
  • [0000]
    NAK! Message/Command Negative Acknowledgement.
  • [0200]
    PCID: 202
  • [0201]
    Field 1: integer—PCID
  • [0202]
    Field 2: integer—MSID
  • [0000]
    ATT CU! Registration Request.
  • [0203]
    PCID: 203
  • [0204]
    Field 1: integer—PCID
  • [0205]
    Field 2: string—IP of originator
  • [0206]
    (No MSID at this point, this should be the only message outstanding for this MU.)
  • [0000]
    ATT CU! Host Monitor Complete and Host is OK.
  • [0207]
    PCID: 204
  • [0208]
    Field 1: integer—PCID
  • [0209]
    Field 2: integer—MSID
  • [0210]
    Field 3: integer—Ping time
  • [0211]
    Field 4: integer—Web time
  • [0212]
    ATT CU! Host Monitor Complete and the Host is BROKEN.
    PCID: 205
    Field 1: integer - PCID
    Field 2: integer - MSID
    Field 3: integer - Ping time - Zero = host timed out on ping; Non-zero = ping time for host
    Field 4: integer - Web time - Zero = host timed out on web page retrieval; Non-zero = web page retrieval time
    Field 5: string - IP or DNS name of broken host (variable length data)
    Field 6: string - Traceroute information (variable length data)

    ATT CU! Error! I already am at My Maximum Simultaneous Host Limit! What are you THINKING?
  • [0213]
    PCID: 206
  • [0214]
    Field 1: integer—PCID
  • [0215]
    Field 2: string—IP of originator
  • [0000]
    ATT CU! I'm Idle and You Haven't Responded in x Amount of Time. Request Response.
  • [0216]
    PCID: 207
  • [0217]
    Field 1: integer—PCID
  • [0218]
    Field 2: string—IP of originator
  • [0000]
    LU (3xx)
  • [0000]
    ATT! LU Error. Some Kind of Error Occurred.
  • [0219]
    PCID: 300
  • [0220]
    Field 1: integer—PCID
  • [0221]
    Field 2: string—IP of originator
  • [0000]
    ACK! Message/Command Acknowledgment.
  • [0222]
    PCID: 101
  • [0223]
    Field 1: integer—PCID
  • [0224]
    Field 2: integer—MSID
  • [0000]
    NAK! Message/Command Negative Acknowledgment.
  • [0225]
    PCID: 102
  • [0226]
    Field 1: integer—PCID
  • [0227]
    Field 2: integer—MSID
  • [0000]
    ACK CU! Confirming broken host.
  • [0228]
    PCID: 301
  • [0229]
    Field 1: integer—PCID
  • [0230]
    Field 2: integer—MSID
  • [0000]
    RU (4xx)
  • [0000]
    ATT! RU Error. Some Kind of Error Occurred.
  • [0231]
    PCID: 400
  • [0232]
    Field 1: integer—PCID
  • [0233]
    Field 2: string—IP of originator
  • [0000]
    ACK! Message/Command Acknowledgment.
  • [0234]
    PCID: 401
  • [0235]
    Field 1: integer—PCID
  • [0236]
    Field 2: integer—MSID
  • [0000]
    NAK! Message/Command Negative Acknowledgment.
  • [0237]
    PCID: 402
  • [0238]
    Field 1: integer—PCID
  • [0239]
    Field 2: integer—MSID
  • [0000]
    AU (5xx)
  • [0000]
    ATT! AU Error. Some Kind of Error Occurred.
  • [0240]
    PCID: 500
  • [0241]
    Field 1: integer—PCID
  • [0242]
    Field 2: string—IP of originator
  • [0000]
    ACK! Message/Command Acknowledgment.
  • [0243]
    PCID: 501
  • [0244]
    Field 1: integer—PCID
  • [0245]
    Field 2: integer—MSID
  • [0000]
    NAK! Message/Command Negative Acknowledgment.
  • [0246]
    PCID: 502
  • [0247]
    Field 1: integer—PCID
  • [0248]
    Field 2: integer—MSID
  • [0000]
    DS (6xx)
  • [0000]
    ATT! DS Error. Some Kind of Error Occurred.
  • [0249]
    PCID: 600
  • [0250]
    Field 1: integer—PCID
  • [0251]
    Field 2: string—IP of originator
  • [0000]
    ACK! Message/Command Acknowledgment.
  • [0252]
    PCID: 601
  • [0253]
    Field 1: integer—PCID
  • [0254]
    Field 2: integer—MSID
  • [0000]
    NAK! Message/Command Negative Acknowledgment.
  • [0255]
    PCID: 602
  • [0256]
    Field 1: integer—PCID
  • [0257]
    Field 2: integer—MSID
  • [0000]
    In one embodiment of the invention, a monitoring system is implemented as a distributed client-server system of Java processes communicating over TCP/IP. The monitoring workload is spread over multiple machines and controlled by one or more servers. Each client machine monitors its assigned hosts and reports the results to the server for processing. The server maintains a balanced workload across all the clients and logs the success or failure of the host monitoring. The server also triggers host downtime alerts and notifies operators of any potential problems within the system. The system is scalable as well as “plug and play”: any client that is started registers itself with the server and waits for work to be assigned. The server can control any number of clients, so adding another machine expands the processing capacity of the monitoring system. Since the entire system operates over TCP/IP networking, remote administration of the server over TCP/IP is possible. Administration changes are instantly transferred to each client.
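The message codes listed above are what the Java processes of this embodiment would exchange over TCP/IP. The following is a minimal sketch of how the codes shown in this section could be modelled. The names `Pcid`, `Message`, and `MessageDemo`, and the comma-separated wire form, are assumptions made for illustration; the text does not specify an encoding. Only the codes whose unit family is unambiguous from the headings (3xx LU, 4xx RU, 5xx AU, 6xx DS) are included.

```java
import java.util.Arrays;
import java.util.Optional;

/** Message codes from the LU, RU, AU and DS groups listed above. */
enum Pcid {
    LU_ERROR(300, false), LU_CONFIRM_BROKEN_HOST(301, true),
    RU_ERROR(400, false), RU_ACK(401, true), RU_NAK(402, true),
    AU_ERROR(500, false), AU_ACK(501, true), AU_NAK(502, true),
    DS_ERROR(600, false), DS_ACK(601, true), DS_NAK(602, true);

    final int code;
    /** true: Field 2 is an integer MSID; false: Field 2 is the originator's IP string. */
    final boolean field2IsMsid;

    Pcid(int code, boolean field2IsMsid) {
        this.code = code;
        this.field2IsMsid = field2IsMsid;
    }

    /** Unit family implied by the group headings: 3xx LU, 4xx RU, 5xx AU, 6xx DS. */
    String unit() {
        switch (code / 100) {
            case 3: return "LU";
            case 4: return "RU";
            case 5: return "AU";
            default: return "DS";
        }
    }

    static Optional<Pcid> fromCode(int code) {
        return Arrays.stream(values()).filter(p -> p.code == code).findFirst();
    }
}

/** A two-field message: Field 1 is the PCID, Field 2 is an MSID or an IP address. */
final class Message {
    final Pcid pcid;
    final String field2;

    Message(Pcid pcid, String field2) {
        this.pcid = pcid;
        this.field2 = field2;
    }

    /** Hypothetical wire form "PCID,field2"; the source does not define one. */
    String encode() {
        return pcid.code + "," + field2;
    }

    static Message decode(String line) {
        String[] parts = line.split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected two fields: " + line);
        }
        int code = Integer.parseInt(parts[0].trim());
        Pcid pcid = Pcid.fromCode(code)
                .orElseThrow(() -> new IllegalArgumentException("unknown PCID " + code));
        String field2 = parts[1].trim();
        if (pcid.field2IsMsid) {
            Integer.parseInt(field2); // rejects a non-integer MSID
        }
        return new Message(pcid, field2);
    }
}

class MessageDemo {
    public static void main(String[] args) {
        Message ack = Message.decode("401,17");
        // prints "RU_ACK from RU, MSID 17"
        System.out.println(ack.pcid + " from " + ack.pcid.unit() + ", MSID " + ack.field2);
    }
}
```

Deriving the unit from the hundreds digit keeps the sketch aligned with the group headings, so a new code in an existing family needs only one more enum constant.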
  • [0258]
    Because the server performs all disk I/O, the clients do not necessarily need access to the data store. Having a centralized point of administration and reporting helps minimize the problems that can arise when multiple machines and processes are generating data. Responsiveness is increased because changes can be instantly transferred to the clients. The system is highly scalable because of the automated nature of the server load balancing. Any new client will instantly be assigned work. Any failed client's work can be assigned to operational clients. Because all communication between the client and server travels over TCP/IP, clients can be located anywhere with an accessible TCP/IP address. Clients all over the world can be controlled by a single server or by multiple servers. Using Java provides instant networking capabilities and gives the added benefit of cross-platform deployment. Any machine with enough memory and disk space to run an operating system with a supported Java Virtual Machine can be used as a client.
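The load-balancing behaviour described above (new clients receive work at once, and a failed client's work moves to operational clients) can be sketched as follows. This is an illustrative sketch only: the class `WorkBalancer`, its method names, and the least-loaded assignment rule are assumptions, since the text does not state how the server chooses a client.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical server-side balancer; not the patent's stated algorithm. */
final class WorkBalancer {
    private final Map<String, List<String>> hostsByClient = new LinkedHashMap<>();
    private final Deque<String> unassigned = new ArrayDeque<>();

    /** A newly started client registers and the workload is rebalanced to include it. */
    void registerClient(String clientId) {
        if (hostsByClient.containsKey(clientId)) {
            return;
        }
        hostsByClient.put(clientId, new ArrayList<>());
        // Pool every assigned host, then hand the pool out again across all clients.
        for (List<String> assigned : hostsByClient.values()) {
            unassigned.addAll(assigned);
            assigned.clear();
        }
        drainUnassigned();
    }

    /** Queues a host; it waits until at least one client is registered. */
    void addHost(String host) {
        unassigned.add(host);
        drainUnassigned();
    }

    /** A failed client's hosts are handed to the remaining operational clients. */
    void clientFailed(String clientId) {
        List<String> orphaned = hostsByClient.remove(clientId);
        if (orphaned != null) {
            unassigned.addAll(orphaned);
            drainUnassigned();
        }
    }

    List<String> hostsOf(String clientId) {
        return new ArrayList<>(hostsByClient.getOrDefault(clientId, Collections.emptyList()));
    }

    /** Gives each waiting host to whichever client currently has the fewest hosts. */
    private void drainUnassigned() {
        while (!unassigned.isEmpty() && !hostsByClient.isEmpty()) {
            String target = null;
            for (Map.Entry<String, List<String>> e : hostsByClient.entrySet()) {
                if (target == null || e.getValue().size() < hostsByClient.get(target).size()) {
                    target = e.getKey();
                }
            }
            hostsByClient.get(target).add(unassigned.poll());
        }
    }
}

class WorkBalancerDemo {
    public static void main(String[] args) {
        WorkBalancer balancer = new WorkBalancer();
        balancer.addHost("a.example");
        balancer.addHost("b.example");
        balancer.registerClient("client-1");
        balancer.registerClient("client-2");
        System.out.println(balancer.hostsOf("client-1") + " " + balancer.hostsOf("client-2"));
    }
}
```

Keeping all assignment state on the server matches the centralized design in the text: clients hold no data store and need only report back.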
  • [0259]
    When started, the server initializes and reads in the current configuration. The list of hosts to be monitored is also loaded. The server then checks the network for available clients. Once the server has registered clients, it begins to give them work. All scheduling and load balancing takes place on the server. In one embodiment where multiple servers are used, each server can be controlled by a master server. In such an embodiment, the master server divides the host list among the servers, and all scheduling and client control still takes place at the original servers.
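For the multi-server embodiment, the division of the host list by a master server might look like the sketch below. The text does not say how the list is divided, so the round-robin rule, the class name `MasterServer`, and the method `divide` are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical master server that only partitions hosts; scheduling stays on each server. */
final class MasterServer {

    /**
     * Splits the host list into one share per server, round-robin,
     * so that share sizes differ by at most one host.
     */
    static List<List<String>> divide(List<String> hosts, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        List<List<String>> shares = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            shares.add(new ArrayList<>());
        }
        for (int i = 0; i < hosts.size(); i++) {
            shares.get(i % serverCount).add(hosts.get(i));
        }
        return shares;
    }
}

class MasterServerDemo {
    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("h0", "h1", "h2", "h3", "h4");
        // prints "[[h0, h2, h4], [h1, h3]]"
        System.out.println(MasterServer.divide(hosts, 2));
    }
}
```

Each share would then be loaded by its server exactly as a single-server host list is, which keeps scheduling and client control at the individual servers as the text describes.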
  • [0260]
    Clients are assigned a host list to monitor by the server. Alternatively, clients are assigned each monitoring task dynamically. One advantage of assigning a host list is that interactive network traffic is reduced and latency between monitoring tasks is also reduced. However, assigning a single host task at a time may be more reliable: if the client fails, only one task is interrupted. In an intermediate embodiment, small host lists are assigned to each client. The clients, after completing a monitoring task, report the results to the server.
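The three assignment styles above differ only in how many hosts are handed to a client at once. A minimal sketch, assuming a hypothetical `HostBatcher` helper that is not part of the described system: a batch size of 1 corresponds to dynamic single-task assignment, a batch size equal to the list length corresponds to one full host list, and a small value in between corresponds to the intermediate embodiment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that cuts a host list into assignment batches. */
final class HostBatcher {

    /** Cuts the host list into consecutive batches of at most batchSize hosts. */
    static List<List<String>> batches(List<String> hosts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < hosts.size(); start += batchSize) {
            int end = Math.min(start + batchSize, hosts.size());
            result.add(new ArrayList<>(hosts.subList(start, end)));
        }
        return result;
    }
}

class HostBatcherDemo {
    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("h0", "h1", "h2", "h3", "h4");
        // One assignment per batch: fewer batches mean less traffic,
        // but more tasks are interrupted if the client fails mid-batch.
        for (int size : new int[] {1, 2, 5}) {
            System.out.println("batch size " + size + " -> "
                    + HostBatcher.batches(hosts, size).size() + " assignments");
        }
    }
}
```

For five hosts this yields five, three, and one assignment respectively, which makes the trade-off concrete: the number of assignments falls as the batch grows, while the number of tasks exposed to a single client failure rises to the batch size.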
Classifications
U.S. Classification: 709/224
International Classification: H04L12/26, H04L12/24, G06F15/173, H04L29/08
Cooperative Classification: H04L67/16, H04L43/0811, H04L41/0681, H04L41/5083, H04L43/06, H04L43/0852, H04L41/5038, H04L43/10
European Classification: H04L41/50M1, H04L43/10, H04L41/50F, H04L41/06E, H04L29/08N15
Legal Events
Date: Feb 21, 2007
Code: AS
Event: Assignment
Description:
Owner name: INTERNETSEER.COM CORP., PENNSYLVANIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: QUICKSAND DEVELOPMENT LLC; REEL/FRAME: 019035/0981
Effective date: 20070213