US 20090245492 A1
Provided are methods, devices, and systems for maintaining a SIP survivable User Agent. The present invention is adapted to allow the User Agent to detect the status of the network, thus enabling the User Agent to perform failover/failback operations. The User Agent may be adapted to determine the status of the network based on its own monitoring mechanisms and/or based on messages received from other network components.
1. A method, comprising:
triggering a User Agent to monitor a status of at least one component in a communication system, wherein the at least one component comprises a first controller of the User Agent;
sending, by the User Agent, a Session Initiation Protocol (SIP) message to the at least one component; and
determining a status of the at least one component based on a response received at the User Agent.
2. The method of
3. The method of
4. The method of
referencing a list of controllers upon receiving the notification message;
identifying, from the list of controllers, at least one controller for which a status check will be performed, wherein the at least one controller includes the first controller; and
sending a SIP message to a component associated with the at least one controller.
5. The method of
6. The method of
7. The method of
8. A computer readable medium comprising processor executable instructions operable to perform the method of
9. A communication endpoint, comprising:
a discovery module operable, upon being triggered to monitor a status of at least one component in a communication system, to send a Session Initiation Protocol (SIP) message to the at least one component and then determine a status of the at least one component based on a received response, wherein the at least one component comprises a first controller of the endpoint.
10. The endpoint of
11. The endpoint of
12. The endpoint of
13. The endpoint of
14. The endpoint of
15. The endpoint of
16. A communication system, comprising:
means for triggering a User Agent to monitor a status of at least one component by causing the User Agent to send a Session Initiation Protocol (SIP) message to the at least one component; and
means for determining a status of the at least one component based on a response received at the User Agent.
17. The system of
18. The system of
19. The method of
20. The system of
21. The system of
The invention relates generally to communication networks and more specifically to survivable Session Initiation Protocol (SIP) networks.
Session Initiation Protocol (SIP) is an open signaling protocol for establishing many kinds of real-time communication sessions. Examples of the types of communication sessions that may be established using SIP include voice, video, games, applications, and/or instant messaging. These communication sessions may be carried out on any type of communication device such as a personal computer, laptop computer, Personal Digital Assistant (PDA), cellular phone, IM client, IP phone, traditional telephone, server applications, aggregates of applications, desktop applications, and so on.
One key feature of SIP is its ability to use an Address of Record (AOR) as a single unifying public address for all communications to end-users, applications, and service provider networks. Thus, in a world of SIP-enhanced communications, a user's AOR becomes their single address that links the user to all of the communication devices associated with the user. Using this AOR, a caller can reach any one of the user's communication devices, also referred to as User Agents (UAs), without having to know each of the unique device addresses or phone numbers.
Many SIP application servers exist for the purposes of enabling communications applications in a SIP environment and for serving as outbound proxies for a UA, thereby allowing complex networks to be built while hiding that complexity through proxies that devices use to connect into the network. One of the principal areas for such communications applications is call control of a SIP UA. There are currently very few solutions to the problem of providing a survivable SIP network configuration. One way survivability has been provided in the past was through the use of SIP proxies that are employed when there is no response to SIP signaling. The proxy can be used to route the signaling via one or more alternate routes in the network. The use of a separate proxy can become expensive since an additional network element other than the call controller or a gateway is required to provide survivability.
Other network server products provide geo-redundant configurations, such that the gateway is unlikely to encounter a network server failure due to the high availability of the network server. Like the use of proxies, this particular solution is relatively expensive since high availability servers need to be purchased and distributed throughout a network. Additional shortcomings of known current solutions include the fact that the network element (e.g., gateway) is not allowed to use an alternate path if the primary SIP signaling path is unavailable; such solutions require hot standby configurations with replication of data across servers; and they require primary and secondary call controllers to use exactly the same version of SIP and provide exactly the same set of SIP features to SIP endpoints.
In addition to these shortcomings, the logic to determine when a network failure has occurred has been traditionally placed in routers, which have the ability to check the IP layer of the network to determine if various network elements are operating properly. This failover/failback detection logic has been placed in the router to relieve the processing burden on the rest of the network components. One major shortcoming of this particular configuration is that the routers are unable to detect at the SIP application level whether a server or other network element is operational. There may be many instances when a server is operational at the IP layer level but the SIP controller is not operational. Routers and other network elements of the prior art heretofore have been unable to identify such failure conditions and would register such a server as operational.
These and other needs are addressed by various embodiments and configurations of the present invention. The present invention is directed generally to a system, device, and method for analyzing an availability of part or all of a communications network. The method generally comprises:
triggering a User Agent to monitor a status of at least one component in a communication system, wherein the at least one component comprises a first controller of the User Agent;
sending, by the User Agent, a Session Initiation Protocol (SIP) message to the at least one component; and
determining a status of the at least one component based on a response received at the User Agent.
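By way of illustration only, the sending and determining steps above might be sketched as follows in Python. The function names `build_options_probe` and `classify_response`, the choice of an OPTIONS request as the probe, and the rule that a timeout or a 5xx response means "out of service" are assumptions made for this sketch; the specification does not prescribe a particular message layout or classification policy.

```python
import uuid
from typing import Optional


def build_options_probe(controller_host: str, ua_host: str, user: str) -> str:
    """Build a minimal SIP OPTIONS request using RFC 3261 text framing."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie
    from_tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{controller_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {ua_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{ua_host}>;tag={from_tag}",
        f"To: <sip:{controller_host}>",
        f"Call-ID: {call_id}@{ua_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines)


def classify_response(raw: Optional[str]) -> str:
    """Map a raw SIP response (None meaning a timeout) to a status label.

    Assumed policy: no response or a 5xx response means out of service;
    any other well-formed response shows the SIP layer is answering.
    """
    if raw is None:
        return "out_of_service"
    fields = raw.split("\r\n", 1)[0].split(" ", 2)
    if len(fields) < 2 or fields[0] != "SIP/2.0" or not fields[1].isdigit():
        return "unknown"
    code = int(fields[1])
    return "out_of_service" if 500 <= code <= 599 else "in_service"
```

Transport is deliberately left out of the sketch; the probe text could be sent over any transport the User Agent already uses for SIP signaling, and the received text (or a timeout) passed to `classify_response`.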
In accordance with at least some embodiments of the present invention, the User Agent may be triggered to monitor the status of a component in a communication system by its internal logic and/or by receiving a prompting message from another component (e.g., a gateway or server) in the communication system. The component that prompts the endpoint to monitor the status of the communication system may also include logic that allows it to separately monitor the status of the communication system. Thus, the component may prompt the User Agent to monitor the status of the communication system in response to it detecting some sort of failure or other type of anomaly in the operation of the system. By combining the logic of the User Agent and the other component to monitor the status of the communication system, more perspectives of the system can be achieved, which means that a more accurate picture of the state of the system can be recovered.
Rather than trying to off-load the processing burdens associated with system monitoring activities from the User Agent to other system components, embodiments of the present invention utilize the User Agent to assist in the system monitoring procedures. This allows the User Agent to maintain its own view of the status of the communication system. In the event that another component attempts to force the User Agent to switch controllers because that component believes the controller has failed, the User Agent may be provided with the ability to check the status of the controller and refuse switching since it is able to verify the operational status of the controller itself. In other words, the User Agent does not have to rely on the world view from some other component, which may be wrong.
In accordance with at least some embodiments of the present invention, upon being triggered to monitor the status of the communication system, the User Agent may attempt to re-register with all of its active controllers as well as any of its passive controllers. Additionally, the User Agent may be prompted to re-register with any other controller that it has an awareness of, regardless of whether that particular controller is currently being used by the User Agent. The User Agent may identify such controllers from its internally maintained list of controllers that was populated during a discovery phase. In accordance with at least one embodiment of the present invention, the component may specify which controller the User Agent should re-register with. The controller may be specified by instructions included in a notification message sent from the component to the User Agent. Instructions included in the notification message may also comprise executable instructions that cause the User Agent to perform a predetermined sequence of actions. The instructions may also include health information related to the sending component.
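As a non-limiting sketch of the selection just described, the following Python fragment chooses which controllers to re-register with when the User Agent is triggered. The `Controller` record, its `role` labels, and the `requested` parameter (standing in for controller addresses named in a notification message) are hypothetical names introduced for illustration; the ordering of active before passive before merely known controllers is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Controller:
    address: str
    role: str  # assumed labels: "active", "passive", or "known"


def controllers_to_check(
    known: List[Controller], requested: Optional[List[str]] = None
) -> List[Controller]:
    """Return the controllers the User Agent should re-register with.

    If the notification named specific controllers, only those are returned.
    Otherwise every controller on the internal list is returned, with active
    controllers first, then passive ones, then any others.
    """
    if requested:
        wanted = set(requested)
        return [c for c in known if c.address in wanted]
    rank = {"active": 0, "passive": 1, "known": 2}
    return sorted(known, key=lambda c: rank.get(c.role, 3))
```

For example, a notification that names a single controller narrows the check to that controller, while an internally triggered check with no `requested` argument covers the whole list.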
“Monitoring”, as used herein, includes any type of function related to observing, recording, or detecting with instruments that have no effect upon the operation or condition of the element or group of elements that are being monitored.
As used herein, “at least one”, “one or more”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
The terms “a” or “an” entity refer to one or more of that entity. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein. It is also to be noted that the terms “comprising”, “including”, and “having” can be used interchangeably.
The term “automatic” and variations thereof, as used herein, refers to any process or operation done without material human input when the process or operation is performed. However, a process or operation can be automatic even if performance of the process or operation uses human input, whether material or immaterial, received before performance of the process or operation. Human input is deemed to be material if such input influences how the process or operation will be performed. Human input that consents to the performance of the process or operation is not deemed to be “material”.
The term “computer-readable medium” as used herein refers to any tangible storage and/or transmission medium that participates in providing instructions to a processor for execution. The computer-readable medium can be a serialized set of instructions encoded in a network transmission over an IP network (e.g., SOAP). Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, NVRAM, or magnetic or optical disks. Volatile media includes dynamic memory (e.g., RAM), such as main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, magneto-optical medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, solid state medium like a memory card, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read. A digital file attachment to e-mail or other self-contained information archive or set of archives is considered a distribution medium equivalent to a tangible storage medium. When the computer-readable media is configured as a database, it is to be understood that the database may be any type of database, such as relational, hierarchical, object-oriented, and/or the like. Accordingly, the invention is considered to include a tangible storage medium or distribution medium and prior art-recognized equivalents and successor media, in which the software implementations of the present invention are stored.
The terms “determine,” “calculate” and “compute,” and variations thereof, as used herein, are used interchangeably and include any type of methodology, process, mathematical operation or technique. More specifically, such terms may include interpreted rules like BPEL or a rules language where logic is not hard coded but represented in a rules file that can be read in, interpreted, compiled, and executed.
The term “module” or “tool” as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and software that is capable of performing the functionality associated with that element. Also, while the invention is described in terms of exemplary embodiments, it should be appreciated that individual aspects of the invention can be separately claimed.
The preceding is a simplified summary of the invention to provide an understanding of some aspects of the invention. This summary is neither an extensive nor exhaustive overview of the invention and its various embodiments. It is intended neither to identify key or critical elements of the invention nor to delineate the scope of the invention but to present selected concepts of the invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
The invention will be illustrated below in conjunction with an exemplary communication system. Although well suited for use with, e.g., a system using a server(s) and/or database(s), the invention is not limited to use with any particular type of communication system or configuration of system elements. Moreover, the term “database” as used herein may include not only relational database systems, but any computer storage mechanism, available in both hardware and software, in RAM or on a hard disk. Those skilled in the art will recognize that the disclosed techniques may be used in any communication application in which it is desirable to maintain a SIP survivable network.
The exemplary systems and methods of this invention will also be described in relation to analysis software, modules, and associated analysis hardware. However, to avoid unnecessarily obscuring the present invention, the following description omits well-known structures, components and devices that may be shown in block diagram form, are well known, or are otherwise summarized. Examples of such well-known structures include, without limitation, IP infrastructure, Layer 2 switches, Layer IP routers, IP level firewalls, Network Address Translation (NAT) devices, SIP Session Border Controllers (SBCs), etc.
For purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the present invention. It should be appreciated, however, that the present invention may be practiced in a variety of ways beyond the specific details set forth herein.
Referring initially to
The first 104 and second 152 networks may correspond to any type of known communications network or collection of communications equipment. The first network 104 may comprise a Local Area Network (LAN), a Wide Area Network (WAN), or any other type of layer 3 and layer 4 network as defined by the OSI model.
The second network 152 may comprise any type of information transportation medium and may use any type of protocols to transport messages between endpoints. The Internet is an example of the communication network 104 that constitutes an IP network consisting of many computers and other communication devices located all over the world, which are connected through many telephone systems and other means. Other examples of the second network 152 include, without limitation, a standard Plain Old Telephone System (POTS), an Integrated Services Digital Network (ISDN), the Public Switched Telephone Network (PSTN), a LAN, a WAN, a cellular communications network, and any other type of packet-switched or circuit-switched network known in the art. Both communication networks 104, 152 may include wired and/or wireless communication technologies.
SIP functions of the UA 108 may be provided by one or more servers 136, which are also connected to the first network 104. The UA 108 may also be controlled by other servers or communication devices external to the first network 104. For example, a gateway 148 connecting the first network 104 with the second network 152 may also be adapted to provide SIP control capabilities for the UA 108.
In addition to providing SIP functions, the server 136 may also include voice call software (e.g., VoIP software), video call software, IM software, voice messaging software (e.g., multi-media messaging such as audio and video messaging, IM messaging, etc.), recording software, an IP voice server, a fax server, a web server, an email server, call center application(s), and the like.
In accordance with embodiments of the present invention, the server 136 can include interfaces for various other protocols such as a Lightweight Directory Access Protocol (LDAP), H.248, H.323, Simple Mail Transfer Protocol (SMTP), Internet Message Access Protocol 4 (IMAP4), Integrated Services Digital Network (ISDN), E1/T1, HTTP, SOAP, XCAP, STUN, and analog line or trunk.
The server 136 may also include a PBX, an Automatic Call Distribution (ACD), an enterprise switch, or other type of communications system switch (e.g., any device capable of routing calls from one telephone to another, such as a complex machine (or series of them) in a central exchange that works by connecting two or more circuits together, each circuit being connected to a subscriber telephone, according to a dialed telephone number) or server, as well as other types of processor-based communication control devices such as media servers, computers, adjuncts, etc.
The gateway 148 is provided to act as a translation unit between disparate telecommunications networks such as PSTN; Next Generation Networks; 2G, 2.5G and 3G radio access networks; or PBX. One of the functions of the gateway 148 is to convert between the different transmission and coding techniques for the various networks. Media streaming functions such as echo cancellation, DTMF, and tone sender may also be supported by the gateway 148. The gateway 148 may further convert signals/messages from one network operational paradigm (e.g., transmission protocol) to another.
To provide SIP functionalities to the UA 108, the servers 136 and/or gateway 148 may comprise one or more controllers 140 a-N. The UA 108 may be adapted to register with one or more of the controllers 140 a-N at a time. As used herein, the terms “register” and “registration” refer to the SIP registration and network attachment method and process, which includes but is not limited to the sending and acknowledgement of the SIP REGISTER message, and may include other mechanisms such as the SUBSCRIBE message for subscriptions, querying using the OPTIONS message, as well as other non-SIP mechanisms such as firewall and NAT detection using the STUN protocol, HTTP queries, etc.
The controllers 140 a-N may correspond to applications or firmware residing on the server 136 and the controllers 140 a-N may be used to handle SIP messages directed to and received from the controlled UA 108. The SIP messages handled by the controllers 140 a-N may correspond to outbound SIP messages originated by the UA 108 directed toward another endpoint 156 or inbound SIP messages originated by another endpoint 156 and directed toward the UA 108. The controllers 140 a-N may operate at the application layer of the communication system 100.
In accordance with at least one embodiment of the present invention, the UA 108 may be capable of simultaneously registering with two or more controllers 140. Each of the controllers 140 with which the UA 108 is simultaneously registered may comprise different attributes and therefore may be capable of providing the UA 108 with different SIP functions. For instance, the UA 108 may be simultaneously registered with a first controller 140 a that uses extended SIP extensions (i.e., advanced feature sets) and a second controller 140 b that uses standard Internet Engineering Task Force (IETF) compliant SIP extensions for call processing according to one or more IETF RFCs on the SIP protocol, including but not limited to RFC 3261.
In a simultaneous registration configuration, the UA 108 may be capable of either an active-active registration or an active-standby registration. In an active-active registration, the UA 108 may accept SIP messages from either controller 140 and send SIP messages to either controller 140 without determining whether the message is being sent to or coming from a primary controller 140. In an active-standby registration configuration, however, the UA 108 may only use the active controller 140 for SIP signaling purposes unless the active controller 140 becomes inoperable. In such a configuration, if messages are received from the standby controller 140 and the UA 108 believes that the primary controller 140 is operational, then the UA 108 may send the SIP message back to the standby controller 140 for a re-route via the primary controller 140.
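The two registration configurations can be contrasted with a small decision function, given here as an illustrative sketch only. The function name `handle_inbound`, the mode strings, and the returned action labels are assumptions. The "reject" branch reflects the optional behavior this specification describes for a call arriving from the primary controller while the UA is operating in failover mode.

```python
def handle_inbound(mode: str, from_primary: bool, primary_believed_up: bool) -> str:
    """Decide what the UA does with an inbound SIP message.

    mode: "active-active" or "active-standby" (assumed labels).
    from_primary: True if the message came from the primary controller.
    primary_believed_up: the UA's own view of the primary controller.
    """
    if mode == "active-active":
        # Either controller is acceptable; no primary check is made.
        return "accept"
    if mode != "active-standby":
        raise ValueError(f"unknown registration mode: {mode!r}")
    if from_primary:
        # In failover mode a call from the primary may be rejected.
        return "accept" if primary_believed_up else "reject"
    # Message came from the standby controller.
    if primary_believed_up:
        # Send it back so the standby can re-route via the primary.
        return "return_for_reroute"
    return "accept"
```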
In accordance with further embodiments of the present invention, the UA 108 may execute a priority registration with the controllers 140 a-N, whereby the UA 108 is registered with a first controller 140 a unless the UA 108 determines that the first controller 140 a is out of service or otherwise unavailable to provide SIP functions, in which case the UA 108 may register with a second controller 140 b.
In accordance with still further embodiments of the present invention, a Peer-to-Peer SIP network configuration may be employed, in which case the UA 108 may register with a controller 140 on another endpoint or plurality of endpoints.
The UA 108 may comprise a memory 112 and a processor 124 for executing routines stored in memory 112 as well as processing incoming/outgoing SIP messages and media. The memory 112 may include a discovery module 116 and a list of controllers 120. The discovery module 116 may be employed to discover the devices in the system 100 that comprise controllers 140 capable of controlling the UA 108. In accordance with at least some embodiments of the present invention, the discovery module 116 may be adapted to send out a discovery request to which any available controllers 140 can respond. Based on the response(s) (or lack thereof) received from the controllers 140, the discovery module 116 can populate the list 120 with a number of controllers 140. The discovery module 116 may then be able to create an ordered list 128 that comprises a priority listing of the controllers 140 based on their respective attributes 132. The UA 108 may then determine which controller(s) 140 to register with based on the order of controllers 140 in the ordered list 128.
In addition to providing the ability to discover and arbitrate between controllers 140, the discovery module 116 may be further adapted to monitor the state of the communication system 100 to determine if there are any current network 104 failures, server 136 failures, gateway 148 failures, or any other type of failure which may affect the relationship between the UA 108 and its controller(s) 140. As will be discussed in further detail herein, the discovery module 116 may be adapted to proactively and reactively monitor the state of the system 100 and its components. While proactively monitoring the state of the system 100, the discovery module 116 may employ non-dialog SIP messages along with a predetermined logic to determine the state of the system 100. The UA 108 may additionally be prompted by another system 100 component (e.g., the gateway 148) to begin reactively monitoring the state of the system 100. Providing the UA 108 with the ability to monitor the state of the system 100 is a departure from network monitoring techniques of the prior art in that the logic of the UA 108 is used rather than off-loading the processing burden to another component, such as a router. This provides the surprising advantage of allowing each UA 108 to monitor the state of the system 100 independently, which provides more views of the system 100 and allows each UA 108 to maintain its personal records for its controllers 140. In other words, by using a combination of the intelligence from the UA 108 and other components in the system 100 (e.g., the gateway 148 and/or servers 136), a more accurate and up to date picture of the state of the system 100 can be obtained. Allowing the UA 108 to inspect the operability of the networking layer (e.g., by sending IP packets across the system 100), the SIP application layer (e.g., by sending SIP messages across the system 100), and the aggregate of the two can further increase the accuracy of the picture of the health of the system.
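The aggregate view of the networking layer and the SIP application layer might, for example, be combined as in the following Python sketch. The function name `aggregate_health` and the three status labels are hypothetical, and the rule that a SIP answer outweighs a failed IP-layer probe (for instance, where ICMP is filtered) is an assumption of this sketch rather than something the specification states.

```python
def aggregate_health(ip_reachable: bool, sip_responding: bool) -> str:
    """Combine an IP-layer probe result and a SIP-layer probe result.

    The case ip_reachable=True, sip_responding=False is the one an
    IP-only monitor such as a router would misreport as operational.
    """
    if sip_responding:
        # Assumed: a SIP answer is the stronger evidence of service.
        return "in_service"
    if ip_reachable:
        return "sip_controller_down"
    return "network_unreachable"
```

Distinguishing "sip_controller_down" from "network_unreachable" lets the UA tell a failed controller on a healthy host apart from a path failure, which may call for different failover choices.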
The servers 136 and gateway 148 may also comprise a discovery module 144 for assessing the state of the system 100. More specifically, each discovery module 144 may be used to independently monitor the state of various peer components (e.g., system 100 components that are adjacent to the device comprising the discovery module 144) as well as far end components (e.g., system 100 components that are not adjacent to the device comprising the discovery module 144). Each discovery module 144 may be utilized to independently assess the state of the system 100 and the components therein.
In accordance with at least some embodiments of the present invention, the discovery modules 144 may comprise a software module that is capable of being accessed by user agents 108 or other endpoints 156 in a clustered way (i.e., multiple servers 136 may be able to answer a question from an endpoint asking, “which controller do I use”) or as a singleton. The discovery modules 144 can be implemented using a multitude of protocols and can further support multiple protocols at once. More specifically, each discovery module 144 may be adapted to support one or more of SOAP/HTTP, SXAP (another XML over HTTP standard like SOAP) and even SIP (e.g., by providing the list of controllers 140 in the body of a SIP message).
The discovery module 144 may be adapted to determine its own list of controllers for a given user agent 108 (which it can ultimately provide to a user agent 108 upon request) via rules processing. The rules processing may be hard coded (e.g., Java code) or may comprise a rules engine that takes a script, such as XML, parses it, and executes it when it wants to determine which is the most appropriate controller 140 for the user agent 108 requesting a list of controllers. The types of rules that may be included in the algorithm include, but are not limited to:
Referring now to
When the discovery message is received by the other components in the system 100 (e.g., the servers 136 and/or gateway 148) comprising a controller 140, the component responds to the discovery message with a response message. Alternatively, if no component is currently available to respond to the discovery message then no response messages are sent back to the initiating endpoint. The response messages of the controller(s) 140 are then received at the initiating endpoint (step 208). As the endpoint receives these responses, or the lack of responses to directed discovery messages, the endpoint will employ its discovery module 116 to populate the list of controllers 120.
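One possible way to populate the list of controllers from responses, or from the lack of responses to directed discovery messages, is sketched below in Python. The function name `populate_controller_list` and the use of `None` to record a controller that did not answer are assumptions for illustration.

```python
from typing import Dict, Iterable, Mapping, Optional


def populate_controller_list(
    queried: Iterable[str], responses: Mapping[str, dict]
) -> Dict[str, Optional[dict]]:
    """Build a controller list keyed by address.

    queried: addresses that were sent a directed discovery message.
    responses: address -> attributes for every controller that answered
    (including any that answered without being queried directly).
    A value of None records that a queried controller did not respond.
    """
    controllers: Dict[str, Optional[dict]] = {addr: None for addr in queried}
    for addr, attrs in responses.items():
        controllers[addr] = dict(attrs)  # copy so later edits stay local
    return controllers
```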
The response messages may also include the attributes 132 of each responding controller 140. Examples of attribute information contained in the response message include, without limitation, the corresponding server's 136 or gateway's 148 processing capabilities, the SIP extensions (i.e., SIP functions) provided by the controller 140, the number of UAs currently registered with the controller 140, the proximity of the controller 140 to the endpoint (i.e., number of hops between the endpoint and the corresponding device), and so on. The discovery module 116 of the endpoint may utilize the attribute information to arbitrate the order of the controllers 140 in the ordered list 128 (step 212). The discovery module 116 may employ any type of known arbitration algorithm to determine the order of the controllers 140. For instance, the discovery module 116 may attempt to optimize all of the attributes listed for all controllers 140. Alternatively, the discovery module 116 may place the controller 140 with a selected attribute being best suited to the endpoint's needs highest in the ordered list 128.
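Because any known arbitration algorithm may be used, the following Python sketch shows only one hypothetical policy: controllers missing the fewest wanted SIP extensions come first, ties are broken by fewer hops, and then by fewer registered UAs. The `ControllerAttrs` record, its field names, and the tie-break order are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ControllerAttrs:
    address: str
    extensions: FrozenSet[str]  # SIP extensions the controller offers
    registered_uas: int         # UAs currently registered with it
    hops: int                   # proximity to the endpoint


def order_controllers(
    candidates: Iterable[ControllerAttrs], wanted_extensions: Iterable[str]
) -> List[ControllerAttrs]:
    """Return candidates as an ordered list, best suited first."""
    wanted = frozenset(wanted_extensions)

    def sort_key(c: ControllerAttrs):
        missing = len(wanted - c.extensions)
        return (missing, c.hops, c.registered_uas)

    return sorted(candidates, key=sort_key)
```

A policy that weights a single selected attribute, as in the alternative described above, could be obtained by changing `sort_key` to return only that attribute.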
After the discovery module 116 of the endpoint has generated the ordered list of controllers 128, the discovery module 116 selects one or more controllers 140 with which the endpoint will register (step 216). The controllers 140 may be selected based on their respective order in the ordered list of controllers 128. In accordance with at least one embodiment of the present invention, the discovery module 116 may select a controller 140 from a server 136 and a controller 140 from the gateway 148 to simultaneously register with. In an active-standby configuration, the endpoint may select the controller 140 of the server 136 as a primary controller and the controller 140 of the gateway 148 as a secondary or backup controller.
Following selection of the controller(s) 140, the endpoint is enabled to register with the selected controller(s) 140 (step 220). In accordance with at least some embodiments of the present invention, the endpoint may be adapted to register with controllers 140 having different capabilities and SIP features. For example, the endpoint may be allowed to simultaneously register with a first controller 140 using extended SIP extensions and a second controller 140 using standard SIP extensions. The extended SIP extensions may be used by the first controller 140 to provide additional features not achievable through the second controller 140.
A SIP REGISTER, SUBSCRIBE, or OPTIONS message, or other SIP signaling may be transmitted to detect if the primary controller 140 is still online and available to facilitate SIP messaging. For example, a SIP REGISTER message may be sent by the endpoint to the selected controller(s) 140 to initiate the registration process. In addition to providing the endpoint with the ability to register with its controller(s) 140, the SIP REGISTER message may also be used as a heartbeat for the controllers 140. The frequency of transmission of the SIP REGISTER message may be user configurable based on system demands. In other words, the endpoint may send refresh registrations to any controller 140 with which it is registered.
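The heartbeat use of refresh registrations might be tracked as in the following Python sketch. The class name `RegisterHeartbeat`, the configurable `interval_s`, and in particular the `max_missed` threshold (how many consecutive unanswered REGISTER messages mark a controller offline) are assumptions; the specification states only that the transmission frequency may be user configurable.

```python
class RegisterHeartbeat:
    """Track answered and unanswered refresh REGISTERs for one controller."""

    def __init__(self, interval_s: float, max_missed: int = 2) -> None:
        if interval_s <= 0 or max_missed < 1:
            raise ValueError("interval_s must be > 0 and max_missed >= 1")
        self.interval_s = interval_s
        self.max_missed = max_missed  # assumed threshold, not from the text
        self.missed = 0

    def next_due(self, last_sent_s: float) -> float:
        """Time at which the next refresh REGISTER should be sent."""
        return last_sent_s + self.interval_s

    def record(self, answered: bool) -> None:
        """Record the outcome of one refresh; an answer resets the count."""
        self.missed = 0 if answered else self.missed + 1

    @property
    def controller_online(self) -> bool:
        return self.missed < self.max_missed
```

Since the endpoint treats its registrations as independent, one tracker per controller would be kept, each with its own interval.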
In both the active-standby and active-active configurations, the endpoint may register concurrently with all of its selected controllers 140. During its operation, the endpoint may attempt to maintain concurrent active registrations with the controllers 140 (e.g., through the periodic transmission of subsequent SIP REGISTER messages). In an active-active configuration, the endpoint may be allowed to send/receive SIP messages to/from any of the controllers 140 with which it is registered. In such a configuration, the endpoint will be simultaneously registered with two or more controllers 140. This simultaneous registration will allow inbound SIP requests (e.g., SIP INVITE messages) from any one of the controllers 140 with which the endpoint is registered (e.g., either the first controller 140 a, the second controller 140 b, the third controller 140 c, etc.). If the endpoint is configured with a policy to use an active-active controller model, then the endpoint may treat an incoming call from a previously failed controller 140 as if it had come from a controller 140 in service. When this occurs, the endpoint may re-run the detection algorithm to determine whether the failed controller has gone back into service. The endpoint will be configured to allow receipt of such SIP messages from any controller 140. In accordance with at least some embodiments of the present invention, the endpoint will treat the registrations as independent registrations, maintaining each with the registration logic discussed herein. More specifically, the endpoint will be allowed to register the same AOR with each controller, although the endpoint implementation can be flexible to accommodate different AORs. This is also true for the active-standby configuration.
In an active-standby configuration, on the other hand, the endpoint may only be allowed to route outbound calls and receive inbound calls from its primary controller 140. While operational, this primary controller 140 may also be referred to as the active call controller 140. The other controller(s) 140 with which the endpoint is registered may be referred to as the inactive or standby controllers.
In the active-standby configuration, if the endpoint receives an inbound call from the secondary or standby controller 140 while the endpoint is in its primary mode (i.e., the endpoint believes the primary controller 140 is active), then the endpoint will:
(4) If the primary controller is no longer online, then the endpoint will provide standard call processing for the received call. As part of this process, the endpoint will failover to the secondary controller 140 from the primary controller 140 and refresh its registration with the secondary controller 140.
In the active-standby configuration, if the endpoint receives an inbound call from its primary controller 140 while it is operating in a failover mode (i.e., while it assumes that its primary controller 140 is out of service and is using the secondary controller 140 as the active controller), then the call may be rejected.
During a failure condition or any other time when the endpoint believes a system 100 component is out of service, the endpoint will not try to establish a real-time communications session with the primary controller 140. Additionally, the endpoint will failover to the second or next controller 140 on the ordered list of controllers 128. Call signaling routed via the survivable (i.e., secondary, tertiary, or backup) controller 140 will allow a user of the endpoint to make and receive new calls during the time of failure.
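By way of illustration only, the selection of the next controller from the ordered list of controllers 128 might be sketched as follows. The function name, the use of plain strings as controller identifiers, and the set of out-of-service markers are assumptions made for this sketch and are not taken from the described embodiments.

```python
from typing import Optional, Sequence, Set


def select_active_controller(ordered_controllers: Sequence[str],
                             out_of_service: Set[str]) -> Optional[str]:
    """Return the highest-priority controller not marked out of service.

    `ordered_controllers` stands in for the ordered list of controllers,
    highest priority first. Returns None if every controller is down.
    """
    for controller in ordered_controllers:
        if controller not in out_of_service:
            return controller
    return None


# Hypothetical usage: the primary is down, so the endpoint fails over
# to the next entry on its ordered list.
controllers = ["primary", "secondary", "tertiary"]
active = select_active_controller(controllers, {"primary"})
```

Because the list is walked in priority order each time, the same routine would also select the primary controller again once its out-of-service marker is cleared.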
As part of the failover/failback process, which will be described in further detail below, the endpoint will identify the address for the active controller 140 and further derive the attributes and associated features supported by the controller 140. In other words, the endpoint may be adapted to derive whether the controller 140 with which it is registered is designated as “extended/proprietary” or “basic SIP”, for example. The discovery module 116 of the endpoint may be adapted to discover whether a controller uses extended or basic SIP using the following logic:
The endpoint may use this information to identify the secondary controller 140 as a standard IETF SIP compliant controller 140. The endpoint will then display only basic SIP features on its User Interface (UI) while operating in failover mode. If the active controller 140 is designated as “extended” (e.g., because it properly responded to the SUBSCRIBE request), then the endpoint will use SIP with the extended extensions for call and feature processing and display a compatible UI.
Additional features that may be provided in an active-standby configuration include, without limitation:
After the endpoint has registered with its respective controller(s) 140, the method ends (step 224).
Referring now to
As can be appreciated by one skilled in the art, although monitoring techniques employing a SIP OPTIONS message are described herein, any other type of non-dialog SIP transaction may be used to monitor the state of the communication system 100 and the components therein. More specifically, a generic SIP message (e.g., an INFO METHOD, MESSAGE METHOD, or even a void SIP message) may be transmitted to invoke any type of SIP-based response. The SIP message transmitted by the gateway 148 may also include instructions to be executed by the message recipient. For instance, a SIP message may be generated comprising an eXtensible Markup Language (XML) message reporting the health of the gateway 148 and any other component monitored by the gateway 148, such as peer components, as well as actions to be taken by the recipient based on the reported health of the gateway 148.
After the gateway 148 sends out the SIP OPTIONS message it awaits receipt of a response (steps 308 and 312). The amount of time that the gateway 148 waits for receipt of the response may vary depending upon system 100 characteristics. In accordance with at least one embodiment of the present invention, the length of time that the gateway 148 waits may be determined by implementing a SIP Timer function such as SIP Timer B or SIP Timer F (SIP Timer B and F are standard SIP timers defined in RFC 3261, the entire contents of which are incorporated herein by this reference) that cancels SIP signaling transactions (such as a SIP INVITE message) if no SIP response is received within a predetermined number of seconds after the request was sent. Timer B or Timer F is canceled or short-circuited if any SIP response (even a 100 Trying) is received. If, however, Timer B fires, the gateway 148 may be required to cancel the offending SIP transaction and attempt to route the request using an alternative route. In accordance with at least some embodiments of the present invention, if all routing addresses known by the gateway 148 have been exhausted, then the SIP gateway 148 may respond to the original SIP signaling transaction with a 408 Request Timeout.
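One possible reading of the timeout and alternative-route behavior described above is sketched below. RFC 3261 defines Timer B and Timer F as 64*T1, with a default T1 of 500 ms; everything else here, including the `send` callback and the function name, is a hypothetical stand-in for whatever transport the gateway 148 actually uses.

```python
from typing import Callable, Optional, Sequence

T1_SECONDS = 0.5                      # RFC 3261 default round-trip estimate
TIMER_B_SECONDS = 64 * T1_SECONDS     # 32 s; Timer F has the same default

REQUEST_TIMEOUT = 408


def route_request(routes: Sequence[str],
                  send: Callable[[str, float], Optional[int]]) -> int:
    """Try each known route in order until one produces a response.

    `send(route, timeout)` is an assumed callback that returns the SIP
    status code received, or None if the timer fired with no response.
    If every route times out, the caller answers with 408.
    """
    for route in routes:
        status = send(route, TIMER_B_SECONDS)
        if status is not None:
            return status
    return REQUEST_TIMEOUT
```

In this sketch the timer value is passed to the transport rather than enforced here, so the same routine can be exercised with a fake `send` during testing.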
Once a response is received (e.g., either as an actual response or as a determination that no response was received within a predetermined amount of time), the method continues with the gateway 148 employing the discovery module 144 to determine whether the response corresponds to a failure condition (step 316). The response may indicate that a network 104 or system 100 component has failed if any of the following conditions are met:
With respect to condition (3), existing dialogs that already use the IP address in question should continue to use that address unless a failure occurs, but the endpoint should not use that address for new dialogs until it has recovered. In cases where an existing dialog is using a hostname instead of an IP address, the hostname will be resolved for each transaction, and the returned address would be the highest priority address that is available. If the hostname only resolves to a single IP address, and that address is marked as “overloaded,” then the request should still be sent to that address. The net effect of condition (3) is that the discovery module 144 of the gateway 148 should be aware of three states: available; out of service; and overloaded. Addresses that are marked as overloaded continue to receive subsequent requests within a dialog, but no new dialog requests.
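The three address states and the rule for overloaded addresses can be illustrated with a short sketch. The enum, the function name, and the `is_only_address` flag are assumptions introduced for illustration; in particular, treating the single-address case as a general override is one interpretation of the paragraph above.

```python
from enum import Enum


class AddressState(Enum):
    AVAILABLE = "available"
    OUT_OF_SERVICE = "out of service"
    OVERLOADED = "overloaded"


def may_send(state: AddressState,
             in_existing_dialog: bool,
             is_only_address: bool = False) -> bool:
    """Decide whether a request may be sent to an address.

    Overloaded addresses keep receiving in-dialog requests but no new
    dialog requests, unless the hostname resolves to this address alone.
    """
    if state is AddressState.AVAILABLE:
        return True
    if state is AddressState.OVERLOADED:
        return in_existing_dialog or is_only_address
    return False  # out of service
```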
If, based on the response to the OPTIONS message, the discovery module 144 of the gateway 148 determines that there is no failure condition and the system 100 is in a normal state of operation, then the discovery module 144 will continue by updating its records for the state of the system 100 to reflect the operation of the responding component, such as a server 136 (step 320). Once the state of the network has been updated, the method continues with the discovery module 144 determining whether it is time to send a new message (step 340). More particularly, the gateway 148 may use the periodic transmission of SIP OPTIONS messages to other SIP servers 136 as a heartbeat mechanism to determine if the other SIP servers 136 are active or not. The OPTIONS message may be sent to the other system 100 components at a predetermined interval. The length of the predetermined interval may be determined by implementing a SIP Timer B or a variant thereof. More specifically, the OPTIONS monitoring can be done at different intervals depending upon whether the gateway 148 detects a failure condition or not.
In accordance with at least some embodiments of the present invention, the proactive monitoring interval for the SIP OPTIONS message may be configurable within a range of about 60 to about 100,000 seconds and should use a uniform random time between 75% and 125% of the configured value between subsequent monitoring attempts. For instance, if the proactive monitoring interval is set to 60 seconds, then the actual interval between transmission of OPTIONS messages can be uniformly distributed between 45 and 75 seconds. This deliberate introduction of jitter in the proactive monitoring process allows the requests to remain unsynchronized thereby evenly spreading the load on the monitored components (i.e., servers 136) over time. In one embodiment, the proactive monitoring interval may be configured to be about 900 seconds or 15 minutes.
In accordance with at least some embodiments of the present invention, the reactive monitoring interval for the SIP OPTIONS message may be configurable within a range of about 10 to about 3,600 seconds and may also have a uniform random time between 75% and 125% of the configured value between subsequent monitoring attempts. The provision of a separate timer for the reactive monitoring (as opposed to the proactive monitoring) allows the gateway 148 to more quickly detect when the currently out of service component being monitored becomes available again. In one embodiment, the reactive monitoring interval may be configured to be about 120 seconds.
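The jittered proactive and reactive intervals described above might be computed as in the following sketch. The 900 second and 120 second defaults come from the embodiments above; the function names and the optional random-source parameter are assumptions made for illustration.

```python
import random
from typing import Optional

PROACTIVE_DEFAULT_SECONDS = 900   # about 15 minutes, per one embodiment
REACTIVE_DEFAULT_SECONDS = 120    # per one embodiment


def jittered_interval(configured_seconds: float,
                      rng: Optional[random.Random] = None) -> float:
    """Return a uniform random delay between 75% and 125% of the setting."""
    source = rng if rng is not None else random
    return source.uniform(0.75 * configured_seconds,
                          1.25 * configured_seconds)


def next_probe_delay(failure_detected: bool,
                     proactive_seconds: float = PROACTIVE_DEFAULT_SECONDS,
                     reactive_seconds: float = REACTIVE_DEFAULT_SECONDS) -> float:
    """Pick the reactive timer while a failure is outstanding."""
    configured = reactive_seconds if failure_detected else proactive_seconds
    return jittered_interval(configured)
```

With a configured value of 60 seconds, `jittered_interval(60)` yields a delay between 45 and 75 seconds, matching the example given for the proactive interval.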
While exemplary embodiments of the present invention are being described in connection with the use of SIP Timer B, any range of values, with or without a uniform random time modification of that value, may be implemented without departing from the principles of the present invention. Additionally, the monitoring intervals may be modified based on external business logic hooks. For example, if the monitoring mechanism is hooked into a bandwidth management system, the monitoring system may further adjust the monitoring interval by some coefficient that represents the available bandwidth in the network.
Referring back to step 316, if the discovery module 144 determines that the response corresponds to a failure condition, then the discovery module 144 will update its records of the state of the system 100 to reflect the failure (step 324). As a result of detecting this component failure, the gateway 148 will failover and begin operating in a failover state, especially if the failed component resides on a primary communication path.
During failover the discovery module 144 of the gateway 148 will begin reactively monitoring the components in the system 100 by continuing to send the OPTIONS message to the component identified as out of service (step 328). This reactive monitoring allows the discovery module 144 of the gateway 148 to detect when the component comes back online. When reactive monitoring begins, the gateway 148 will stop proactively monitoring the component and begin applying reactive monitoring rules. More specifically, during reactive monitoring, the discovery module 144 of the gateway 148 may initiate a maintenance test to determine whether or not the monitored component is available or out of service (step 336). The maintenance test may employ a monitoring algorithm similar to the proactive monitoring algorithm. The discovery module 144 of the gateway 148 will continue to apply this maintenance test until failback (i.e., until the status of the component changes from out of service to available) (step 332). During the maintenance test, the gateway 148 may transmit any type of SIP request (e.g., INVITE, SUBSCRIBE, NOTIFY, etc.) to the component currently marked as out of service. If one or more of the following conditions apply, then the discovery module 144 of the gateway 148 will continue to mark the component as out of service:
If none of the above-listed conditions applies during the maintenance test, then the discovery module 144 of the gateway 148 will determine that failback has occurred and the method will continue to step 320. In accordance with at least some embodiments of the present invention, the reactive monitoring and maintenance tests will continue to be applied until the component being monitored replies with any SIP response except 503 Service Unavailable to two consecutive OPTIONS request attempts. Once these conditions have been met, the gateway 148 may consider the component back in service, may revert to the proactive monitoring algorithm, can generate an appropriate Simple Network Management Protocol (SNMP) event, and, if applicable, begin utilizing the now active component.
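A minimal sketch of the failback rule, assuming that a missing response is represented as `None`, is shown below. The class and method names are hypothetical, and the sketch covers only the reactive phase: it does not model a return to the out-of-service state.

```python
from typing import Optional

SERVICE_UNAVAILABLE = 503


class FailbackTracker:
    """Tracks OPTIONS probe results for a component marked out of service."""

    REQUIRED_CONSECUTIVE = 2  # two consecutive acceptable responses

    def __init__(self) -> None:
        self.in_service = False
        self._consecutive_ok = 0

    def record_probe(self, status: Optional[int]) -> bool:
        """Record one probe result and return the current in-service flag.

        Any SIP response other than 503 counts toward failback; no
        response (None) or a 503 resets the count.
        """
        if status is None or status == SERVICE_UNAVAILABLE:
            self._consecutive_ok = 0
        else:
            self._consecutive_ok += 1
            if self._consecutive_ok >= self.REQUIRED_CONSECUTIVE:
                self.in_service = True
        return self.in_service
```

Note that under this rule even an error response such as 404 counts as evidence that the component is reachable, since only 503 is excluded.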
As can be appreciated by one skilled in the art, the gateway 148 is not the only system 100 component that may maintain path reallocation information. Rather, any SIP network element may be adapted to include path reallocation information and the like. For example, an endpoint such as the UA 108 may also maintain a path reallocation table that lists the communication paths that may be used if certain components are identified as out of service.
Referring now to
Thereafter, the gateway 148 determines whether the sending component is actually marked as out of service based on its internally maintained state tables (step 412). If the component is not marked as being out of service, then the method ends (step 424). Otherwise, the gateway 148 will initiate its maintenance test by sending SIP OPTIONS messages to the component (step 416). The gateway 148 may then update its records of the state of the component (step 420). This maintenance testing and state updating process is repeated until the component is determined to be back in service, at which point the method ends (step 424).
With reference to
If the primary signaling path is determined to be operational, then the gateway 148 routes the received signal to the target endpoint via the primary path (step 512) after which the method ends (step 536). If, however, the gateway 148 determines that the primary signaling path is unavailable for any reason (e.g., a network 104, server 136, or other component on the primary signaling path is out of service), then the method continues with the gateway 148 sending the signal to the target endpoint via the secondary signaling path (step 516). This particular step is performed based on the assumption that the secondary signaling path is not also unavailable. If the gateway determines that the secondary signaling path is also unavailable, then another backup signaling path that bypasses the failed component(s) is selected and used to send the signal to the endpoint.
After the call signal has been transmitted to the target endpoint via the secondary signaling path, the gateway 148 waits to determine whether a 305 redirect response (use proxy) message is received from the endpoint (step 520). In an active-standby configuration, the endpoint receiving the call signal via the second path may transmit a 305 redirect response if the endpoint believes the primary signaling path to be operational. This belief would be based on status monitoring operations performed at the endpoint rather than at the gateway 148. Thus, based on the different perspective of the system 100, the endpoint may have a different status marked for one or more components in the system 100. Accordingly, if a 305 redirect response is received by the gateway 148 based on its redirection of the call signal via the secondary path, then the gateway 148 will attempt to redirect the call signal via the primary signaling path (step 524). If the primary signaling path is determined to be available (step 528) due to the successful transmission of the call signal, then the method ends in step 536. If the call is rejected due to the unavailability of the primary signaling path, then the call may be rejected or sent back via the secondary signaling path (step 532). Additionally, the gateway 148 may send a NOTIFY message to the target endpoint instructing it to re-check its view of the state of the primary signaling path. Thereafter, or if no redirect response message is received from the endpoint, the method ends (step 536).
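The gateway-side handling of the 305 redirect response could be sketched roughly as follows. The `send_via` callback, the path labels, and the choice to reject (rather than resend via the secondary path) when the primary retry fails are assumptions for this sketch; the description above permits either outcome at step 532.

```python
from typing import Callable, Optional, Tuple

REDIRECT_USE_PROXY = 305


def deliver_call(send_via: Callable[[str], Optional[int]],
                 primary_available: bool) -> Tuple[str, Optional[int]]:
    """Route a call signal and react to a 305 from the endpoint.

    `send_via(path)` is an assumed callback returning the final SIP
    status code, or None if no response was received.
    Returns (outcome, status), where outcome names the path used or
    "rejected".
    """
    if primary_available:
        return ("primary", send_via("primary"))            # step 512
    status = send_via("secondary")                          # step 516
    if status != REDIRECT_USE_PROXY:                        # step 520
        return ("secondary", status)
    retry = send_via("primary")                             # step 524
    if retry is not None and 200 <= retry < 300:            # step 528
        return ("primary", retry)
    return ("rejected", retry)                              # step 532
```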
Referring now to
If the endpoint does not detect a failure itself, then the method continues with the endpoint determining whether it has been notified of a failure (step 612). Notification may be received at the endpoint in the form of a SIP NOTIFY message transmitted from the gateway 148, server 136, or some other component in the system 100 that is capable of monitoring the state of the system 100. The SIP NOTIFY message may indicate that the endpoint should failover, reboot, or refresh/renew its registration with all of its controllers 140 (thereby circumventing the logic of the endpoint). If no notification is received, then the method returns back to step 604.
If, however, the endpoint is notified of a failure in the system 100 (e.g., via receipt of a SIP NOTIFY message), then the endpoint determines whether the failure notification is correct (step 616). More specifically, the endpoint may check its internally maintained list of controllers 120, 128 to determine whether any of the controllers 140 is marked as inactive or otherwise out of service. Most times the endpoint will simply comply with the directions of the NOTIFY message. Under certain circumstances, however, the endpoint may determine that its view of the system 100 is the correct view and may respond to the NOTIFY message with a redirect response message (step 620). This redirect response may cause the initiating component of the NOTIFY message to check its view of the system 100. The method then returns to step 604.
If the endpoint decides that the failure notification is correct or decides to comply with the directions of the NOTIFY message, then the endpoint determines whether the NOTIFY message contained instructions (step 624). The instructions may be relatively general, such as an instruction to attempt to re-register with all controllers 140. Alternatively, the instructions may comprise an instruction set requiring the endpoint to try to re-register with a specific controller 140. The endpoint may attempt to refresh or register with the controller 140 on either long or short intervals. The length of the refresh/register interval may vary depending upon the instructions in the instruction set or other factors. The instructions may also contain health information for the entire system 100 as perceived by the component that transmitted the NOTIFY message. If the message does contain instructions, then the endpoint will execute the instructions (step 628).
After the instructions have been executed by the endpoint, or in the event that the message does not contain instructions, the method continues with the endpoint executing a failover such that it begins operating in a backup state (step 632). This step is also performed in the event that the endpoint detected a system 100 failure itself. During operation in the failover mode, the endpoint will utilize its backup controller 140 and/or utilize a secondary signaling path. In addition to operating in the failover mode, the endpoint may monitor for system 100 failback (step 636). The endpoint will determine that failback has occurred if the discovery module 116 detects that its primary controller 140 is available to resume its role as the active controller. This internal endpoint logic may be governed by the heartbeat/SIP REGISTER message that is periodically sent to all controllers 140 with which the endpoint is registered, independent of whether or not the controller 140 is active at any given time.
If the endpoint does not detect failback on its own, the method continues with the endpoint determining whether it has been notified of failback by another component (e.g., by receiving a NOTIFY message from the gateway 148, the server 136, etc.) (step 640). If the endpoint is not notified of failback, then the method returns to step 632. If, however, the endpoint does receive a NOTIFY message indicating failback, then the method continues by determining whether the NOTIFY message contained instructions for execution by the endpoint (step 644). If instructions were included in the NOTIFY message, then the instructions are executed by the endpoint (step 648). After execution of the instructions, if there were no instructions, or if failback was detected by the endpoint itself, the method returns to step 604.
In accordance with at least some embodiments of the present invention, the discovery module 116 may comprise a configurable parameter governing whether the failover or failback from primary to secondary controllers 140 is to be triggered by the automated endpoint detection or if it is only to be triggered manually (i.e., by receipt of a SIP NOTIFY message). The heartbeat monitoring mechanism employed by the discovery module 116 may be done at different intervals depending on whether the endpoint detects a failure condition or not.
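The configurable trigger parameter might be modeled as in the sketch below. The enum and function names are hypothetical, and the assumption that a received NOTIFY message triggers a change in both modes is one reading of the flowcharts described above; it also ignores the case in which the endpoint disputes the notification.

```python
from enum import Enum


class TriggerMode(Enum):
    AUTOMATIC = "automatic"  # endpoint's own detection may trigger a change
    MANUAL = "manual"        # only a received NOTIFY may trigger a change


def should_change_controller(mode: TriggerMode,
                             endpoint_detected_change: bool,
                             notify_received: bool) -> bool:
    """Decide whether the endpoint should failover or failback."""
    if notify_received:
        return True
    return mode is TriggerMode.AUTOMATIC and endpoint_detected_change
```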
With reference now to
The method remains in step 704 until it is determined that a notification message should be transmitted to an endpoint. After making such a determination, the method continues with the component generating the message (e.g., a NOTIFY message) (step 708). As can be appreciated, however, any other type of SIP or non-SIP message may be employed by the component to notify the endpoint. The notification may allow the component to trigger the endpoint to failover or failback in a system 100 failure or recovery condition. Since the endpoint may maintain an ordered list of controllers 128 and the endpoint has the ability to maintain active registrations with all of these controllers, the system 100 component can trigger the endpoint to take action when it detects a system 100 failure condition. The actions may be triggered by either transmitting a standard notification message (e.g., a standard SIP IETF compliant NOTIFY message) or an extended message containing a specific instruction set, for example. Accordingly, the method continues with the component determining whether the notification message should include instructions therein (step 712).
If no specific instructions are determined to be necessary, then the component can tell the endpoint to re-register with all of the controllers 140 in the ordered list of controllers 128 and send a generic notification message to the endpoint (step 720). Thus, any endpoint that receives such a notification will try to re-register with every controller 140 on its list. Based on the success of this attempted re-registration, the endpoint will be able to derive which controllers 140 are available for processing SIP transactions (e.g., supporting inbound and outbound SIP calls). The endpoint may then behave in accordance with this self-determined information using the highest priority controller 140 on the ordered list of controllers 128 as its primary controller. This particular mechanism may use standard SIP IETF compliant NOTIFY messages. The message may follow the SIP standard for NOTIFY message with the NOTIFY parameter “event”=“probation.” An example of such a NOTIFY message is provided below:
If, however, the component determines that specific instructions should be included in the notification message, then the component may add one or more instruction sets to the message (step 716) prior to sending the message to the endpoint (step 720). By incorporating instructions in the notification message, the component can tell the endpoint to begin using a specific controller 140 on the endpoint's list of controllers 120. In such a scenario, the endpoint depends on the component to tell it which controller 140 should be used, rather than relying on its own intelligence. This particular mechanism may use the standard SIP NOTIFY message with an extension. The NOTIFY message will generally follow the SIP standard for NOTIFY messages; additionally, it may use an exemplary profile event package format with an event name such as <eventName>changeServer</eventName> if the endpoint should failover or failback. The message may also include a timestamp and the address of the server 136 containing the particular controller 140 to which the endpoint should failover or failback. An example of such an extended SIP NOTIFY message is provided below with an XML instruction set:
As can be appreciated by one skilled in the art, the instructions may take many different forms other than an XML instruction set. For example, the instructions may comprise a predetermined trigger that corresponds to and causes execution of an algorithm or application already stored in memory 112 of the endpoint. Additionally, although the example NOTIFY message above included its instruction set in the body of the message, the instruction set may also be placed in the header or trailer of a notification message. This may vary depending upon the type of notification message employed. The instructions may be used to accomplish a number of different actions, such as conveying health information for the sending component and its state information for the rest of the system 100, as well as triggering the endpoint to perform a particular task.
While the above-described flowcharts have been discussed in relation to a particular sequence of events, it should be appreciated that changes to this sequence can occur without materially affecting the operation of the invention. Additionally, the exact sequence of events need not occur as set forth in the exemplary embodiments. The exemplary techniques illustrated herein are not limited to the specifically illustrated embodiments but can also be utilized with the other exemplary embodiments and each described feature is individually and separately claimable.
The systems, methods and protocols of this invention can be implemented on a special purpose computer in addition to or in place of the described communication equipment, a programmed microprocessor or microcontroller and peripheral integrated circuit element(s), an ASIC or other integrated circuit, a digital signal processor, a hard-wired electronic or logic circuit such as discrete element circuit, a programmable logic device such as PLD, PLA, FPGA, PAL, a communications device, such as a server, personal computer, any comparable means, or the like. In general, any device capable of implementing a state machine that is in turn capable of implementing the methodology illustrated herein can be used to implement the various communication methods, protocols and techniques according to this invention.
Furthermore, the disclosed methods may be readily implemented in software using procedural or object-oriented software development environments that provide portable source code that can be used on a variety of computer or workstation platforms. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits or VLSI design. Whether software or hardware is used to implement the systems in accordance with this invention is dependent on the speed and/or efficiency requirements of the system, the particular function, and the particular software or hardware systems or microprocessor or microcomputer systems being utilized. The analysis systems, methods and protocols illustrated herein can be readily implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the functional description provided herein and with a general basic knowledge of the communication arts.
Moreover, the disclosed methods may be readily implemented in software that can be stored on a storage medium, executed on a programmed general-purpose computer with the cooperation of a controller and memory, a special purpose computer, a microprocessor, or the like. In these instances, the systems and methods of this invention can be implemented as a program embedded on a personal computer, such as an applet, JAVA® or CGI script, as a resource residing on a server or computer workstation, as a routine embedded in a dedicated communication system or system component, or the like. The system can also be implemented by physically incorporating the system and/or method into a software and/or hardware system, such as the hardware and software systems of a communications device or system.
It is therefore apparent that there has been provided, in accordance with the present invention, systems, apparatuses and methods for maintaining a SIP survivable network and network components. While this invention has been described in conjunction with a number of embodiments, it is evident that many alternatives, modifications and variations would be or are apparent to those of ordinary skill in the applicable arts. Accordingly, it is intended to embrace all such alternatives, modifications, equivalents and variations that are within the spirit and scope of this invention.