APPLICATION BEHAVIORAL CLASSIFICATION
CROSS-REFERENCE TO RELATED APPLICATION
 This application claims the benefit of the filing date of U.S. Provisional Application No. 60/748,804, filed Dec. 9, 2005, the subject matter of which is incorporated herein by reference.
 The constant progress of communication systems that connect computers, particularly the explosion of the Internet and intranet networks, has resulted in the development of a new information era. With a single personal computer, a user may obtain a connection to the Internet and have direct access to a wide range of resources, including electronic business applications that provide a wide range of information and services. Solutions have been developed for rendering and accessing a huge number of resources. However, as more computers have become interconnected through various networks, abuse by malicious computer users has also increased. As a result, a number of tools or resources that identify potentially malicious software, generally referred to as malware, have been developed to protect computers from the growing abuse that is occurring on modern networks. As described herein, malware includes, but is certainly not limited to, spyware, adware, viruses, Trojans, worms, rootkits, and any other computer program or executable software code that performs actions that are malicious or not desirable to the user.
 Malware applications can be classified into a malware "family" if they correspond to malware variations originating from one source base and exhibit a set of consistent behaviors. Currently, some anti-malware systems are developed to classify a suspicious or unknown application into a known malware family and thereby identify an effective way to remove threats based on previous knowledge of the malware family. One approach is automatic malware classification, which uses one or more selected undesirable events indicative of a malware family to classify a malware application. However, this conventional automatic malware classification approach may provide only limited protection.
 Typically, conventional automatic malware classifications use a static analysis focusing on whether one or more selected undesirable activities have been detected. However, this static analysis does not detect a malware variation which has subtle differences in code flow and data but still shares common behavior patterns with its malware family. Thus, conventional automatic malware classifications may not be able to recognize common behavior patterns across malware variants or compilers and data/code variations within a malware family.
 This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
 In accordance with an aspect of the present invention, a computer implemented method for classifying an application into an application group based on behavior patterns of the application is provided. The method includes collecting an event sequence of an application and determining an application group corresponding to the application based on the collected event sequence. A set of application groups, each of which includes one or more member applications sharing a set of common behavior patterns, is obtained from a knowledge base. The set of common behavior patterns is represented by a representative event sequence. In an aspect of the computer implemented method, in order to determine an application group corresponding to the application, a similarity distance between the application and each application group is calculated by comparing the representative event sequence of each application group and the collected event sequence. After determining the application group for the application, the information about the determined application group is provided. If necessary, the application group is updated to include the application. If no application group is determined to correspond to the application, a determination is made as to whether a new application group needs to be created. Accordingly, a new application group to include the application may be created.
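 By way of illustration only, the nearest-group determination summarized above may be sketched as follows. The sketch assumes a Levenshtein edit distance as the similarity distance and a fixed threshold (`max_distance`) for deciding when a new group is created; the summary specifies neither, and all names and event tokens are hypothetical.

```python
from dataclasses import dataclass, field


def edit_distance(a, b):
    """Levenshtein distance between two event sequences (assumed measure)."""
    prev = list(range(len(b) + 1))
    for i, ea in enumerate(a, start=1):
        curr = [i]
        for j, eb in enumerate(b, start=1):
            cost = 0 if ea == eb else 1
            curr.append(min(prev[j] + 1,          # drop an event from a
                            curr[j - 1] + 1,      # add an event from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


@dataclass
class ApplicationGroup:
    name: str
    representative: list                      # representative event sequence
    members: list = field(default_factory=list)


def classify(app_name, events, groups, max_distance=2):
    """Assign the application to the nearest group, or create a new group."""
    scored = [(edit_distance(events, g.representative), g) for g in groups]
    distance, best = min(scored, key=lambda pair: pair[0], default=(None, None))
    if best is None or distance > max_distance:
        # No existing group is close enough: start a new group (hypothetical policy).
        best = ApplicationGroup(f"group-{len(groups) + 1}", list(events))
        groups.append(best)
    best.members.append(app_name)             # update the group to include the app
    return best


groups = [
    ApplicationGroup("family-a", ["open_file", "write_registry", "connect"]),
    ApplicationGroup("family-b", ["create_process", "read_file", "delete_file"]),
]
result = classify("sample.exe",
                  ["open_file", "write_registry", "send", "connect"], groups)
print(result.name)
```

 In this sketch the collected sequence differs from the representative sequence of "family-a" by a single inserted event, so the application is placed in that group; a sequence sharing no events with any representative sequence would instead start a new group.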
 In accordance with another aspect of the present invention, a computer system for an application group classification is provided. The computer system includes one or more databases and a computing device, in communication with the one or more databases. The databases include application groups and a set of application classifying rules where each application group has been classified based on the set of application classifying rules. The computing device receives a request to classify an application into a corresponding application group among the application groups and obtains an event sequence which was collected during the execution of the application. The application group with which the application is associated is determined by applying the set of application classifying rules to the obtained event sequence.
 In accordance with yet another aspect of the present invention, an application classification system is provided. The application classification system includes a knowledge base component, an event sequence component, and a classification component. The knowledge base component provides information about a plurality of application classifications and a set of classification rules. The classification component identifies an application classification in which the application is to be classified based on a runtime event sequence collected by the event sequence component.
DESCRIPTION OF THE DRAWINGS
 The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
 FIG. 1 is a pictorial diagram depicting an exemplary environment that facilitates an application classification via a service provider server in accordance with embodiments of the present invention;
 FIG. 2 is a pictorial diagram depicting an exemplary service provider server interacting with a client in accordance with embodiments of the present invention;
 FIG. 3 depicts exemplary event sequences of two applications that are captured by the exemplary service provider server;
 FIG. 4 is a pictorial diagram depicting classification of applications based on a similarity distance between a group and an application in accordance with embodiments of the present invention;
 FIG. 5 is a flow diagram depicting an application classifying routine for classifying an application into an appropriate group; and
 FIG. 6 is a flow diagram depicting a nearest clustering subroutine for calculating a similarity distance between an application and an application group in accordance with embodiments of the present invention.
 Generally described, embodiments of the present invention relate to a method and system for automatically classifying an application into an application group which is previously classified in a knowledge base. More specifically, a runtime behavior of an application is captured as a series of "events" which are monitored and recorded during the execution of the application. Each captured "event" represents a token at a particular point in a time sequence during the execution of the application. The series of events is analyzed to find a proper application group, a member application of which shares common runtime behavior patterns with the application. The application groups are previously classified in a knowledge base based on a large number of sample applications.
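 As a minimal sketch of how a runtime behavior might be represented as a time-ordered series of event tokens, consider the following. The `Event` and `EventRecorder` names and the token strings are hypothetical and are not taken from the described embodiments; an actual monitor would hook the executing application rather than receive tokens by hand.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Event:
    order: int   # position of the event in the time sequence
    token: str   # symbolic name of the observed runtime action


class EventRecorder:
    """Records events in the order they are observed during execution."""

    def __init__(self):
        self._clock = count()   # monotonically increasing sequence counter
        self.events = []

    def record(self, token):
        self.events.append(Event(next(self._clock), token))

    def sequence(self):
        # Return only the tokens, ordered by their position in time.
        return [e.token for e in sorted(self.events, key=lambda e: e.order)]


recorder = EventRecorder()
for token in ["open_file", "write_registry", "connect"]:
    recorder.record(token)
print(recorder.sequence())
```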
 The following detailed description describes exemplary embodiments of the invention. Although specific system configurations, screen displays, and flow diagrams are illustrated, it should be understood that the examples provided are not exhaustive and do not limit the present invention to the precise forms and embodiments disclosed. It should also be understood that the following description is presented largely in terms of logic operations that may be performed by conventional computer components. These computer components, which may be grouped at a single location or distributed over a wide area on a plurality of devices, generally include computer processors, memory storage devices, display devices, input devices, etc. In circumstances where the computer components are distributed, the computer components are accessible to each other via communication links.
 In the following description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the invention may be practiced without some or all of these specific details. In other instances, well-known process steps have not been described in detail in order not to obscure the invention.
 FIG. 1 depicts an illustrative networked environment 100 suitable for classifying an application into an application group based on previously known behavior patterns. The illustrative networked environment 100 of FIG. 1 includes one or more client devices, such as client devices 102-106, to which a service provider server 110 provides online or offline application classification services. The client devices 102-106 communicate with the service provider server 110 via a communication network, such as the Internet 108. The client devices 102-106 are typically computing devices including a variety of configurations or forms such as, but not limited to, laptop or tablet computers, personal computers, personal digital assistants (PDAs), hybrid PDA/mobile phones, mobile phones, workstations, and the like.
 In FIG. 1, the service provider server 110 may also include a knowledge base 122. The knowledge base 122 provides data storage of various types of data, information, etc. and may be organized as desired, preferably using data structures optimized for providing a set of application groups and a set of rules to classify a given application into one of the application groups. Each application group includes one or more member applications sharing a set of common behavior patterns. For example, the set of application groups may be previously classified based on known malware applications and their runtime behaviors. The process of classifying the known malware applications into groups may be done by analyzing the known malware applications to extract knowledge of the known malware applications into a hierarchical or structured format. Based on the learned knowledge, a set of classifying rules is determined to recognize familiar patterns and correlate similarities among the known malware applications. In this manner, common behavior patterns across malware variants or compilers and data/code variations within a malware family (group) can be detected.
 The knowledge base 122 may be built based on sample applications 140 via any suitable data classifying/mining methods, including, but not limited to, partitioning based on similarities, Bayesian classifiers, decision trees, and/or term vector/inverse index in conjunction with machine learning. In a preferred embodiment, the application groups in the knowledge base 122 may be classified (or trained) through a process of "clustering". A process of "clustering," as used herein, refers to a form of unsupervised learning, for example, a process to organize objects into groups whose members are similar in some way. Generally, a clustering process can be used to discover hidden semantic structure in a collection of unlabeled data. Through the process of clustering, the knowledge base 122 may be built to include a set of application groups based on sample applications 140. Each application group classified through the process of clustering may refer to a collection of applications which show "similar" behaviors among applications within the application group and show "dissimilar" behaviors from applications within other application groups. As will be understood by one of ordinary skill in the art, for the process of clustering, a defined distance measure may be utilized to combine applications into an application group (e.g., combine a malware into a malware family or a particular group in a malware family). Generally described, a distance measure may be defined to yield a degree of similarity between two applications. The examples and implementation of the distance measure will be discussed in greater detail below.
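 For illustration, one simple way to form application groups from unlabeled samples with a pluggable distance measure is single-linkage agglomerative clustering, sketched below. The Jaccard distance over event sets and the merge threshold are assumptions chosen for brevity, not the distance measure of the described embodiments, and the sample names and event tokens are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the sets of events (assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def cluster(samples, distance, threshold):
    """Merge the closest clusters until no pair is within the threshold.

    samples maps an application name to its event sequence; the result is a
    list of sets of application names, one set per application group.
    """
    clusters = [{name} for name in samples]

    def linkage(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(distance(samples[x], samples[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break                      # remaining clusters are all dissimilar
        clusters[i] |= clusters[j]     # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


samples = {
    "a1": ["open_file", "write_registry", "connect"],
    "a2": ["open_file", "write_registry", "connect", "send"],
    "b1": ["create_process", "delete_file"],
}
families = cluster(samples, jaccard_distance, threshold=0.5)
print(families)
```

 Here "a1" and "a2" share three of four distinct events (distance 0.25) and are merged, while "b1" shares none with either and remains in its own group.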
 The illustrative networked environment 100 also includes an enterprise system 130 coupled to a local knowl