WO2017061895A1

WO2017061895A1 - Method and system for automatic online identification of network traffic patterns

Info

Publication number: WO2017061895A1
Application number: PCT/RU2015/000659
Authority: WO
Inventors: Alexander Alexeevich SEROV; Valery Nikolaevitch GLUKHOV; Hongbo Zhang
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2015-10-09
Filing date: 2015-10-09
Publication date: 2017-04-13
Also published as: CN108028807B; CN108028807A

Abstract

A method (100) for automatic online identification of network traffic patterns includes: receiving (101) an input traffic stream (102) from a communication network; processing (103) the input traffic stream (Pac) by applying self-learning-based classification and on-the-fly classification to the input traffic stream (102), wherein the self-learning-based classification is performed in an online mode by computing a statistical model of the input traffic stream (102) on the basis of a pre-defined set of features and storing the statistical model in a data base which is configured to store results of a plurality of self-learning based classifications, and wherein the on-the-fly classification processes the input traffic stream (102) based on using a knowledge base comprising a predetermined set of rules for identifying the input traffic stream (102) and based on applying at least a subset of the statistical models stored in the data base; and identifying (105) a network traffic pattern (104) in the input traffic stream (102) based on the results of the on-the-fly classification and/or the self-learning-based classification.

Description

Method and system for automatic online identification of network traffic patterns

TECHNICAL FIELD

The present disclosure relates to a method and a system for automatic online identification of network traffic patterns. In particular, the present disclosure relates to a method and a system for automatic online identification of network traffic on the basis of statistical self-learning principles.

BACKGROUND

Network traffic identification is the problem of associating network traffic with the application, or set of applications, that generates it. It is one of the most important problems in the field of network management. Network operators must provide a defined level of Quality of Service (QoS), which is specified in a Service Level Agreement. A drop in the QoS indicators may cause financial losses for network providers, and the classification of network traffic has a large influence on QoS. Recent years have seen a dramatic increase in the number and variety of applications using the Internet and IP networks. These applications include the following types: real-time applications, for example voice and video streaming; corporate applications, for example Lotus Notes and database transactions; bulk data transfer, for example FTP and P2P file downloads; and interactive network applications, for example telnet, instant messaging and network games. Identifying network traffic plays an important role in the efficient and optimal allocation of network resources.

Another problem of network management is the increase in the number and variety of fraudulent or criminal activities that exploit the resources of telecommunication networks. Today, network management involves the continuous work of highly skilled specialists who are familiar with the software and hardware used in the controlled segments of computer networks. Modern techniques for traffic analysis are based on the results of manual data processing by these highly qualified analysts. The growth of telecommunication network infrastructure, accompanied by the growth of data transfer speeds, makes the state of the network highly dynamic. These two types of growth, together with a growing number of applications using network resources, make manual data processing during network monitoring more and more inefficient. The management of telecommunication networks today is highly dependent on the effectiveness of the tools used to automate the analysis of network events. The degree of automation of this analysis is now one of the key technical problems in the field of telecommunications.

Various categories/types of network traffic can be distinguished, for example, audio (e.g., VoIP traffic), video (e.g., video conference traffic), file transfer traffic, to name a few.

The hardest problems in the field of automatic analysis of network traffic are as follows: classifying network traffic streams in real time; automatically classifying network traffic to identify the application that generates the stream of packets under analysis; automatically identifying new applications and data transfer protocols, and automatically constructing models of these applications and protocols; developing universal tools that can solve the problem of traffic stream identification at different levels of the Open Systems Interconnection (OSI) model; and developing universal network management tools that can be part of both QoS solutions and network security solutions.

One of the most important directions of research and development (R&D) in the field of telecommunication network management is building a platform for applying the principles of Artificial Intelligence. Typical technical problems and drawbacks of existing methods used for the analysis of network traffic are as follows. First, data processing is split into several stages that are separated in time: the stage of machine learning and the stage of data stream classification. This separation makes manual data processing by experts necessary. Manual data processing increases the accuracy of the results, but it reduces the overall efficiency of the analysis tools. Second, the methods used to process data streams lack adaptability: a traffic classification system that is not adaptive cannot identify unknown types of applications and protocols and cannot detect new types of network attacks. Ultimately, this makes it impossible to fully automate data stream analysis. The growing dynamics of the network state require new types of analysis tools. These tools must be able to identify new types of network traffic in online mode, gather information about this traffic and use this information in the future. Third, the methods lack universality: numerical methods used for the analysis of traffic can usually be applied only at one particular level of the OSI model. This leads to a rather narrow field of application for such methods.

SUMMARY

It is the object of the invention to provide an efficient technique for network traffic analysis, in particular for automation of analytical processing of data streams in wireless and wired networks. This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

The present disclosure presents techniques for automating the analytical processing of data streams in wireless and wired networks. The disclosure is directed to the development of scalable and universal tools which may be applied both for host-level analysis and network-level analysis. The ability to reconfigure these tools makes it possible to use similar network analysis tools to solve quite different practical problems. The same tools, for example, may be used both for the detection of fraudulent use of network resources and for the identification of a behavior pattern that characterizes the use of network resources by a particular piece of software. Implementation of the present invention will increase the degree of automation of the tools used by the personnel responsible for network management.

The disclosure presents techniques for the automatic analysis of network traffic on the basis of an adaptive data processing technique. The general purpose of this analysis is to provide the highest possible level of Quality of Service for customers. The basic idea of the invention is to use stream-adaptive data processing and knowledge-adaptive data processing in the analysis of traffic. Knowledge-adaptive data processing is realized by applying a new machine self-learning technique which calculates a statistical model of the analyzed traffic stream on the basis of a pre-defined set of features. The pre-defined set of features consists of statistical characteristics of the traffic stream, such as packet length, packet inter-arrival time, etc., on the basis of which the statistical model is calculated; these features may be defined in advance by experts. The self-learning procedure is performed in online mode.
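
As an illustration only, the online calculation of a statistical model from the two features named above (packet length and packet inter-arrival time) could be sketched as follows. The class names `RunningStat` and `StreamModel` are hypothetical, and the use of Welford's running mean and variance is an assumption; the disclosure does not prescribe a particular estimator.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunningStat:
    """Running mean and variance of one feature (Welford's method, an assumed choice)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until two observations exist.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


@dataclass
class StreamModel:
    """Hypothetical statistical model: one RunningStat per pre-defined feature."""
    packet_length: RunningStat = field(default_factory=RunningStat)
    inter_arrival: RunningStat = field(default_factory=RunningStat)
    last_timestamp: Optional[float] = None

    def observe(self, timestamp: float, length: int) -> None:
        """Update the model with one packet, without storing the packet itself."""
        self.packet_length.update(float(length))
        if self.last_timestamp is not None:
            self.inter_arrival.update(timestamp - self.last_timestamp)
        self.last_timestamp = timestamp
```

Because each packet only updates a few counters, a model of this kind can be maintained while the stream is being received, which is what "online mode" requires.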

The technology of traffic identification presented in this disclosure includes the use of signature-based classification and statistics-based classification of network traffic inside a single workflow of data stream processing. The computational architecture of the presented numerical method is based on the use of two different technologies of traffic classification: on-the-fly classification and classification on the basis of results of self-learning. These two technologies may be applied sequentially in the analysis of the stream of packets. On-the-fly classification may include two stages of fast processing of the analyzed stream of packets. The first stage is based on the use of the Knowledge Base which includes the set of rules for the identification of the analyzed stream. The second stage is based on the application of some subset of the Database of the Results of Self Learning. This database includes the set of statistical models resulting from the process of online learning. Methods and systems as presented in this disclosure increase the efficiency of network analysis tools. The increase in efficiency may be defined by automatic online identification of network traffic patterns on the basis of implementation of self-learning techniques.

In order to describe the invention in detail, the following terms, abbreviations and notations will be used:

QoS: Quality of Service

OSI: Open Systems Interconnection

According to a first aspect, the invention relates to a method for automatic online identification of network traffic patterns, comprising: receiving an input traffic stream from a communication network; processing the input traffic stream by applying self-learning-based classification and on-the-fly classification to the input traffic stream, wherein the self-learning-based classification is performed in an online mode by computing a statistical model of the input traffic stream on the basis of a pre-defined set of features and storing the statistical model in a data base which is configured to store results of a plurality of self-learning based classifications, and wherein the on-the-fly classification processes the input traffic stream based on using a knowledge base comprising a predetermined set of rules for identifying the input traffic stream (Pac) and based on applying at least a subset of the statistical models stored in the data base; and identifying a network traffic pattern in the input traffic stream based on the results of the on-the-fly classification and/or the self-learning-based classification.

By applying self-learning-based classification and on-the-fly classification to the input traffic stream, the method provides an efficient technique for network traffic analysis, in particular for automation of analytical processing of data streams in wireless and wired networks.

Implementation of the method leads to a significant increase of the degree of automation in network traffic analysis applications. The set of network monitoring tools can automatically extract the models of network resources usage. This procedure of extraction may be realized at different levels of hierarchy of analyzed telecommunication network. Automatic multi-parameter analysis of data stream may be realized as a procedure carried out in a real-time parallel processing mode. Adaptive methods for automatic control and management of telecommunication networks may be implemented by applying such method.

Implementation of the method further allows the creation of scalable network monitoring tools. The same set of tools may be used both for host-level analysis and network-level analysis.

Implementation of the method allows the realization of a principally new set of software and hardware tools, in particular a principally new class of tools for the monitoring of the traffic of wired and wireless networks. The method may be applied in Self Organized Networks.

In a first possible implementation form of the method according to the first aspect, the data base is configured to store the following data: computed statistical models of the input traffic stream, statistical parameters of unidentified statistical models, unidentified input traffic streams. Preferably the data base stores data to be used for self-learning based classifications. This provides the advantage that these results can be reused in a later processing step of the method.

When identifying input traffic, the set of rules present in the knowledge base is applied first. If the input traffic cannot be identified based on these rules, the statistical model of this traffic is calculated, and an attempt is made to identify the obtained model on the basis of the traffic models stored in the data base. If this attempt is unsuccessful, the traffic cannot be identified, and the statistical parameters which define the model of this unidentified input traffic are stored in the data base. In other words, the input traffic which could not be identified is recorded together with the corresponding statistical model, to be processed later, e.g., by human experts.
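
The order of operations described in the preceding paragraph can be sketched as follows. This is a minimal sketch under stated assumptions: the function `identify`, the representation of a rule as a callable returning a label or `None`, the `compute_model` callback and the `matches` comparison are all hypothetical and are not defined in the disclosure.

```python
from typing import Callable, Dict, List, Optional, Tuple

Model = Tuple[float, ...]               # assumed: a model is a tuple of statistics
Rule = Callable[[dict], Optional[str]]  # assumed: a rule returns a label or None


def identify(
    stream_info: dict,
    compute_model: Callable[[], Model],
    knowledge_base: List[Rule],
    learned_models: Dict[str, Model],
    matches: Callable[[Model, Model], bool],
    unidentified: List[Model],
) -> Tuple[str, Optional[str]]:
    """Return (source of identification, label) for one input traffic stream."""
    # Step 1: apply the rules of the knowledge base first.
    for rule in knowledge_base:
        label = rule(stream_info)
        if label is not None:
            return "knowledge_base", label

    # Step 2: only if the rules fail, calculate the statistical model and
    # compare it with the traffic models stored in the data base.
    model = compute_model()
    for label, stored in learned_models.items():
        if matches(model, stored):
            return "self_learning", label

    # Step 3: record the unidentified model for later processing,
    # e.g. by human experts.
    unidentified.append(model)
    return "unidentified", None
```

Passing `compute_model` as a callback keeps the model calculation lazy, so it is skipped whenever a rule already identifies the traffic.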

In a second possible implementation form of the method according to the first aspect as such or according to the first implementation form of the first aspect, the identifying the network traffic pattern provides an identification of a class of information policy for the input traffic stream.

Internet provider companies frequently set up a Service Level Agreement (SLA), which defines requirements related to the quality of service provided by the companies. For example, there may be a commitment about the speed of data transfer for various applications (i.e., traffic shaping). For example, some packets (for example, those generated by an internet browser) may be delayed in transfer and others (for example, streaming video) may be accelerated. Rules of traffic shaping are set up according to an information policy: a particular information policy is stated for a particular traffic type. To apply a particular information policy it is, however, necessary to understand what type of traffic (i.e., streaming video, chatting service, browser, etc.) is coming through the network. Hence, the identification of the network traffic pattern is necessary to identify which type of information policy must be applied to the incoming traffic. This provides the advantage that, by identifying the network traffic pattern, the network provider may have information for accurately designing its network based on particular classes of input traffic streams.

In a third possible implementation form of the method according to the second implementation form of the first aspect, the classes of information policy are identified based on the network traffic pattern identification, and comprise at least a first policy class, if the statistical model of traffic is identified with the knowledge base, a second policy class, if the statistical model of traffic is identified with the results of the plurality of self-learning based classifications, and a third policy class, if the statistical model of traffic is not identified.

Two scenarios are possible: either the traffic model is identified or the traffic model is not identified. If the traffic model is not identified, it will be necessary to apply the information policy associated with an unknown traffic model, that is, an information policy from the third class. If the traffic model is identified, there are again two scenarios possible, leading to two different classes of information policy to be applied. In case the traffic model is identified via the knowledge base (i.e., rule-based identification), that is, the traffic can be directly recognized and classified, an information policy of the first class is applied. If the traffic itself cannot be identified, but the underlying statistical model can be identified based on a traffic model in the data base, then an information policy of the second class is applied. Such information policy is pre-defined by human experts, and includes rules such as "any unknown traffic type must be stopped" (a requirement from information security) or "any unknown traffic type must be transferred with the speed of data transferring currently used".
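
The mapping from the identification outcome to the three policy classes can be written down directly. The names `PolicyClass` and `select_policy_class` are hypothetical and serve only to illustrate the case distinction above.

```python
from enum import Enum


class PolicyClass(Enum):
    FIRST = "identified with the knowledge base"
    SECOND = "identified with the results of self-learning"
    THIRD = "not identified"


def select_policy_class(matched_rules: bool, matched_learned_model: bool) -> PolicyClass:
    """Select the class of information policy from the identification outcome."""
    if matched_rules:
        # Rule-based identification takes precedence because it is attempted first.
        return PolicyClass.FIRST
    if matched_learned_model:
        return PolicyClass.SECOND
    return PolicyClass.THIRD
```

The concrete action attached to each class (for example stopping unknown traffic) remains a decision of the human experts who define the policy.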

This provides the advantage that the method has enough flexibility for analyzing the network traffic: the statistical model of traffic may be identified with the knowledge base and/or by the results of self-learning.

In a fourth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the self-learning-based classification and the on-the-fly classification are sequentially applied for the on-the-fly classification of the input traffic stream.

This provides the advantage that by applying both the self-learning-based classification and the on-the-fly classification in a sequential manner, network analysis can be improved because more information is available.

In a fifth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the input traffic stream comprises a stream of data packets, in particular IP packets.

This provides the advantage that the method can be applied to data networks, in particular IP networks.

In a sixth possible implementation form of the method according to the fifth implementation form of the first aspect, the method comprises filtering the input traffic stream before processing the input traffic stream, wherein the filtering is based on at least one of the following filtering criteria: a predetermined IP source address, a predetermined IP destination address, a predetermined IP source port number, a predetermined IP destination port number, and/or a predetermined data transfer protocol.

This provides the advantage that predefined network traffic, for example the data traffic between a given source and a given destination, can be efficiently analyzed.
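
A filter over the five criteria listed above might look like the following sketch. The `Packet` and `Filter` types and their field names are hypothetical; a criterion left as `None` is treated as "not used", which is one possible reading of "at least one of the following filtering criteria".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int


@dataclass(frozen=True)
class Filter:
    """Each criterion is optional; None means the criterion is not applied."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def accepts(self, p: Packet) -> bool:
        criteria = (
            (self.src_ip, p.src_ip),
            (self.dst_ip, p.dst_ip),
            (self.src_port, p.src_port),
            (self.dst_port, p.dst_port),
            (self.protocol, p.protocol),
        )
        return all(wanted is None or wanted == actual for wanted, actual in criteria)


def filter_stream(packets: Iterable[Packet], flt: Filter) -> Iterator[Packet]:
    """Lazily yield only the packets that satisfy every configured criterion."""
    return (p for p in packets if flt.accepts(p))
```

For example, `Filter(src_ip="10.0.0.1", dst_ip="10.0.0.2")` would restrict the analysis to traffic from one host to another.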

In a seventh possible implementation form of the method according to the sixth or the fifth implementation form of the first aspect, the on-the-fly classification is processed based on reception of a predetermined number of data packets.

This provides the advantage that the method provides a flexible and adjustable analysis of data packets.

In an eighth possible implementation form of the method according to any one of the fifth to the seventh implementation form of the first aspect, the self-learning-based classification is performed on a greater number of received data packets than the predetermined number of data packets.

This provides the advantage that the efficiency of self-learning can be improved if a high number of data packets is used as input.
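
The relationship between the two packet counts can be illustrated as follows. This is a sketch only: the function `process_stream`, its parameter names and the callbacks `classify_fast` and `learn` are hypothetical, and the disclosure does not state how the two thresholds are chosen.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

P = TypeVar("P")


def process_stream(
    packets: Iterable[P],
    on_the_fly_after: int,
    self_learn_after: int,
    classify_fast: Callable[[List[P]], Optional[str]],
    learn: Callable[[List[P]], None],
) -> Optional[str]:
    """Run on-the-fly classification early and self-learning on a longer prefix."""
    if not 0 < on_the_fly_after < self_learn_after:
        raise ValueError("self-learning must use more packets than on-the-fly classification")

    buffer: List[P] = []
    result: Optional[str] = None
    for packet in packets:
        buffer.append(packet)
        if len(buffer) == on_the_fly_after:
            # Fast decision after the predetermined number of packets.
            result = classify_fast(list(buffer))
        elif len(buffer) == self_learn_after:
            # Self-learning sees the larger sample, then processing stops.
            learn(list(buffer))
            break
    return result
```

If the stream ends before `self_learn_after` packets arrive, this sketch simply skips the learning step; a real system might instead learn from whatever was received.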

In a ninth possible implementation form of the method according to any of the fifth to the eighth implementation forms of the first aspect, the statistical model comprises a tuple of predetermined length, wherein each element of the tuple describes the statistical distribution of a unique quantity which characterizes the input traffic stream.

This provides the advantage that the statistical model can be efficiently represented on a processor.

In a tenth possible implementation form of the method according to the ninth implementation form of the first aspect, the unique quantities characterizing the input traffic stream are divided into the following two categories: a first category comprising quantities characterizing single data packets of the stream of data packets, a second category comprising quantities characterizing the stream of data packets as a whole.

This provides the advantage that by using these two categories, the input traffic stream can be accurately analyzed.

In an eleventh possible implementation form of the method according to the tenth implementation form of the first aspect, the quantities of the first category comprise one of the following: mean length of a packet, packet inter-arrival time, and the quantities of the second category comprise one of the following: flow duration, number of transferred packets.

This provides the advantage that these quantities can be easily provided by inspection of the data packet.

In a twelfth possible implementation form of the method according to the tenth or eleventh implementation form of the first aspect, the first category and the second category are ordered inside the tuple such that the quantities of the first category are arranged before the quantities of the second category in the tuple.

This provides the advantage that by using such an ordering scheme the first category and the second category can be efficiently accessed.
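
One way to picture the tuple layout of the ninth to twelfth implementation forms is sketched below. Summarizing each distribution by a mean and a variance is an assumption made for brevity, and the names `Distribution`, `LAYOUT`, `build_model`, `per_packet_part` and `per_stream_part` are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Distribution(NamedTuple):
    """Assumed summary of one quantity's statistical distribution."""
    mean: float
    variance: float


# First category: quantities characterizing single data packets.
PER_PACKET = ("packet_length", "inter_arrival_time")
# Second category: quantities characterizing the stream as a whole.
PER_STREAM = ("flow_duration", "packet_count")
# Fixed tuple layout: first category before second category.
LAYOUT = PER_PACKET + PER_STREAM


def build_model(stats: Dict[str, Distribution]) -> Tuple[Distribution, ...]:
    """Arrange the per-quantity distributions in the fixed tuple order."""
    return tuple(stats[name] for name in LAYOUT)


def per_packet_part(model: Tuple[Distribution, ...]) -> Tuple[Distribution, ...]:
    return model[: len(PER_PACKET)]


def per_stream_part(model: Tuple[Distribution, ...]) -> Tuple[Distribution, ...]:
    return model[len(PER_PACKET):]
```

With this ordering each category is a contiguous slice of the tuple, so it can be selected with a single slicing operation.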

In a thirteenth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the knowledge base comprises results of analytical processing of traffic flow by using expert systems in offline mode.

This provides the advantage that the analysis of network traffic can be improved when expert systems in offline mode are available.

In a fourteenth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the knowledge base comprises results of analytical processing of traffic flow by human experts in offline mode.

This provides the advantage that the analysis of network traffic can be improved when using knowledge of human experts in offline mode.

In a fifteenth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the data base is configured to store the set of decision rules for rule-based classifications.

This provides the advantage that these data can be reused in a later processing step of the method.

According to a second aspect, the invention relates to a system for automatic online identification of network traffic patterns, comprising: a data buffer for receiving an input traffic stream from a communication network; a data base configured to store a set of statistical models; a knowledge base comprising a predetermined set of rules for identifying the input traffic stream; and a processor configured to process the input traffic stream by applying self-learning-based classification and on-the-fly classification to the input traffic stream and to identify a network traffic pattern in the input traffic stream based on the results of the on-the-fly classification and the self-learning-based classification, wherein the self-learning-based classification is performed in an online mode by computing a statistical model of the input traffic stream (Pac) on the basis of a pre-defined set of features and storing the statistical model in the data base, and wherein the on-the-fly classification processes the input traffic stream (Pac) based on using the knowledge base and based on applying at least a subset of the set of statistical models stored in the data base.

By applying self-learning-based classification and on-the-fly classification to the input traffic stream, the system provides an efficient technique for network traffic analysis, in particular for automation of analytical processing of data streams in wireless and wired networks. The system can automatically extract the models of network resources usage. The system further allows the creation of scalable network monitoring tools in which the same set of tools may be used both for host-level analysis and network-level analysis. The system can be efficiently applied for monitoring the traffic of wired and wireless networks and may also be applied in Self Organized Networks.

According to a third aspect, the invention relates to a method of machine self-learning on the basis of a previously calculated statistical model of a processed stream of traffic, which includes: initializing a set of candidate models with elements of the database of results of self-learning; for each element of the set of candidate models, iteratively formulating and verifying a statistical hypothesis, wherein the statistical hypothesis states that the given element of the set of candidate models and the statistical model to be identified belong to the same stochastic process; calculating the result of the identification on the basis of the result of the verification of the statistical hypothesis; and initializing a new element of the database of results of self-learning on the basis of the input statistical model if this model is not identified with the above-mentioned database and the procedure of machine self-learning is permitted.
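
The hypothesis-testing loop of the third aspect can be sketched as below. The disclosure does not name a specific statistical test; the two-sample Kolmogorov-Smirnov test used here, the representation of a model as raw samples of a single feature, and the function names `ks_statistic`, `same_process` and `identify_or_learn` are all assumptions made for illustration.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical distribution functions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def same_process(a: Sequence[float], b: Sequence[float], alpha: float = 0.05) -> bool:
    """Accept the hypothesis 'same stochastic process' unless the KS test rejects it."""
    n, m = len(a), len(b)
    # Large-sample critical value of the two-sample KS test.
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


def identify_or_learn(
    sample: Sequence[float],
    database: List[List[float]],
    learning_permitted: bool = True,
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the index of the matching stored model, or None if unidentified."""
    for index, candidate in enumerate(database):
        if same_process(sample, candidate, alpha):
            return index
    if learning_permitted:
        # Self-learning: the unidentified model becomes a new database element.
        database.append(list(sample))
    return None
```

In this sketch an unidentified model is stored immediately, so a later stream with the same statistics is matched against the newly learned element.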

According to a fourth aspect, the invention relates to a computer implemented method for automatic online identification of network traffic patterns comprising the steps of: calculating a statistical model of traffic; on-the-fly identification of the statistical model; identification of the statistical model of traffic with the database of results of self-learning; machine self-learning on the basis of the previously calculated statistical model; and calculating the class of information policy on the basis of the results of the statistical model identification.

In a first possible implementation form of the computer implemented method according to the fourth aspect, calculating the statistical model of traffic includes: initializing a data structure representing a new statistical model on the basis of the set of rules, wherein each rule defines the way a single statistical characteristic is calculated on the basis of the values of particular fields of the processed network packet and the fields of the statistical model calculated in the previous step of traffic stream processing; updating each statistical characteristic of the statistical model on the basis of a particular rule from the set of rules used for on-the-fly identification of the model; and updating each statistical characteristic of the statistical model on the basis of a particular rule from the set of rules used for the calculation of the element of the database representing the results of self-learning.

In a second possible implementation form of the computer implemented method according to the fourth aspect, on-the-fly identification of the statistical model includes: a method of identification of the statistical model on the basis of the Knowledge Base; and a method of identification of the statistical model by the use of the database of results of self-learning.

In a third possible implementation form of the computer implemented method according to the second implementation form of the fourth aspect, the identification of the statistical model on the basis of the Knowledge Base includes: initializing the set of candidate models with all elements of the Knowledge Base; iteratively updating the set of candidate models on the basis of identification rules stored in the Knowledge Base; and calculating the result of the identification on the basis of the number of models finally represented in the candidate set.
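
The candidate-set procedure of the third implementation form can be sketched as follows. Representing an identification rule as an allowed value range per feature, and treating the model as identified only when exactly one candidate remains, are assumptions; the function name `identify_with_knowledge_base` is hypothetical.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]  # assumed rule form: inclusive (low, high) bounds


def identify_with_knowledge_base(
    model: Dict[str, float],
    knowledge_base: Dict[str, Dict[str, Range]],
) -> Optional[str]:
    """Narrow the candidate set feature by feature and report a unique survivor."""
    # Initialize the candidate set with all elements of the Knowledge Base.
    candidates = set(knowledge_base)

    # Iteratively update the candidate set using the identification rules.
    for feature, value in model.items():
        candidates = {
            label
            for label in candidates
            if feature not in knowledge_base[label]
            or knowledge_base[label][feature][0] <= value <= knowledge_base[label][feature][1]
        }
        if not candidates:
            break

    # The result depends on how many models remain in the candidate set.
    return next(iter(candidates)) if len(candidates) == 1 else None
```

A candidate with no rule for a given feature is kept, so in this sketch sparse knowledge base entries never eliminate themselves.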

In a fourth possible implementation form of the computer implemented method according to the second implementation form of the fourth aspect, the identification of statistical model by the use of the database of results of self-learning includes: Initializing the set of candidate models with elements of the database of self-learning results; For each element of the set of candidate models iteratively formulating and verifying statistical hypothesis, wherein the mentioned hypothesis formulates the fact that the definite element of the set of candidate models and statistical model to be identified belong to the same stochastic process; and Calculating the result of the identification on the basis of result of verification of statistical hypothesis.

Methods and systems according to the disclosure may show three kinds of effects, as described in the following. The first is the creation of new means for the control of telecommunication networks which have a set of features not yet realized elsewhere. Practical implementation of the present invention leads to a significant increase in the degree of automation of network traffic analysis applications. Implementation of methods and systems according to the disclosure enables the set of network monitoring tools to automatically extract the models of network resources usage. This procedure of extraction may be realized at different levels of the hierarchy of the analyzed telecommunication network. Automatic multi-parameter analysis of a data stream may be realized as a procedure carried out in a real-time parallel processing mode. The ability of the presented technique to be reconfigured makes it possible to realize adaptive methods for automatic control and management of telecommunication networks. The second effect is the creation of scalable network monitoring tools: the same set of tools may be used both for host-level analysis and network-level analysis. The third effect is the realization of a fundamentally new set of software and hardware tools. The main prospective result of the implementation of methods and systems according to the disclosure is the creation of a fundamentally new class of tools intended for monitoring the traffic of wired and wireless networks. The use of adaptive control for network traffic according to the disclosure may be applied in Self Organized Networks.

The presented invention may be used in a very wide range of network analysis applications, for example: Automatic identification of patterns characterizing network behavior of users and automatic profiling of these patterns; Automatic detection of situations which characterize high risk of network attacks; Automatic detection of unauthorized intruders in the network; Automatic detection of the cases characterizing the fraudulent use of hardware or software tools; and Automatic detection of situations which characterize high risk of failure in the monitored network segment.

BRIEF DESCRIPTION OF THE DRAWINGS

Further embodiments of the invention will be described with respect to the following figures, in which:

Fig. 1 shows a schematic diagram illustrating a method 100 for automatic online identification of network traffic patterns according to an implementation form;

Fig. 2 shows a schematic diagram illustrating a system 200 for automatic online identification of network traffic patterns according to an implementation form;

Fig. 3 shows a sequence diagram illustrating an exemplary main cycle 300 of input traffic stream processing according to an implementation form;

Fig. 4 shows a sequence diagram illustrating an exemplary main algorithm 400 of network traffic model identification according to an implementation form;

Fig. 5 shows a sequence diagram illustrating an exemplary algorithm 500 of traffic statistical model calculation according to an implementation form;

Fig. 6 shows a sequence diagram illustrating an exemplary part 600 of an algorithm of traffic statistical model on-the-fly identification using the Knowledge Base according to an implementation form;

Fig. 7 shows a sequence diagram illustrating an exemplary part 700 of the algorithm of traffic statistical model on-the-fly identification using the database of results of self-learning according to an implementation form; and

Fig. 8 shows a sequence diagram illustrating an exemplary algorithm 800 of traffic statistical model identification with database of results of self-learning according to an implementation form.

DETAILED DESCRIPTION OF EMBODIMENTS

In the following detailed description, reference is made to the accompanying drawings, which form a part thereof, and in which is shown by way of illustration specific aspects in which the disclosure may be practiced. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

It is understood that comments made in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.

Fig. 1 shows a schematic diagram illustrating a method 100 for automatic online identification of network traffic patterns according to an implementation form.

The method includes receiving 101 an input traffic stream 102 from a communication network.

The method includes processing 103 the input traffic stream 102 by applying self-learning-based classification and on-the-fly classification to the input traffic stream 102. The self-learning-based classification is performed in an online mode by computing a statistical model of the input traffic stream 102 on the basis of a pre-defined set of features and storing the statistical model in a data base which is configured to store results of a plurality of self-learning based classifications. The on-the-fly classification processes the input traffic stream 102 based on using a knowledge base comprising a predetermined set of rules for identifying the input traffic stream 102 and based on applying at least a subset of the statistical models stored in the data base.

The method further includes identifying 105 a network traffic pattern 104 in the input traffic stream 102 based on the results of the on-the-fly classification and/or the self-learning-based classification.

The data base may be configured to store the following results of self-learning based classifications: computed statistical models of the input traffic stream 102, statistical parameters of unidentified statistical models, and unidentified input traffic streams 102.

The identifying 105 the network traffic pattern 104 may provide an identification of a class of information policy for the input traffic stream 102.

The class of information policy may include at least the following three classes: statistical model of traffic is identified with the knowledge base, statistical model of traffic is identified with the results of the plurality of self-learning based classifications, and statistical model of traffic is not identified. The self-learning-based classification and the on-the-fly classification may be jointly applied to the input traffic stream 102. The self-learning-based classification and the on-the-fly classification may be sequentially applied to the input traffic stream 102.

The input traffic stream 102 may include a stream of data packets, in particular Internet Protocol, IP packets.

The method 100 may include filtering the input traffic stream 102 before processing the input traffic stream 102. The filtering may be based on at least one of the following filtering criteria: a predetermined IP source address, a predetermined IP destination address, a predetermined IP source port number, a predetermined IP destination port number and a predetermined data transfer protocol. The on-the-fly classification may be processed based on reception of a predetermined number Nid of data packets.

The self-learning-based classification may be performed on a greater number of received data packets than the predetermined number Nid of data packets.

The statistical model may include a tuple of predetermined length. Each element of the tuple may describe the statistical distribution of a unique quantity which characterizes the input traffic stream 102. The unique quantities characterizing the input traffic stream 102 may be divided into the following two categories: a first category including quantities characterizing single data packets of the stream of data packets and a second category including quantities characterizing the stream of data packets as a whole.

The quantities of the first category may include a mean length of a packet and/or a packet inter-arrival time. The quantities of the second category may include flow duration and/or number of transferred packets. The first category and the second category may be ordered inside the tuple such that the quantities of the first category are arranged before the quantities of the second categories in the tuple. The knowledge base may include results of analytical processing of traffic flow by using expert systems in offline mode.

The method 100 may be implemented on a system 200 described below with respect to Fig. 2 and may implement the algorithms 300, 400, 500, 600, 700, 800 described below with respect to Figures 3 to 8.

Fig. 2 shows a schematic diagram illustrating a system 200 for automatic online identification of network traffic patterns according to an implementation form. The system 200 for automatic online identification of network traffic patterns includes a data buffer 201 for receiving an input traffic stream 202 from a communication network; a data base 207 to store a set of statistical models; a knowledge base 205 including a predetermined set of rules for identifying the input traffic stream 202; and a processor 203.

The processor 203 processes the input traffic stream 202 by applying self-learning-based classification 211 and on-the-fly classification 209 to the input traffic stream 202 and identifies a network traffic pattern 204 in the input traffic stream 202 based on the results of the on-the-fly classification 209 and the self-learning-based classification 211.

The self-learning-based classification 211 is performed in an online mode by computing a statistical model of the input traffic stream 202 on the basis of a pre-defined set of features and storing the statistical model in the data base 207.

The on-the-fly classification 209 processes the input traffic stream 202 based on using the knowledge base 205 and based on applying at least a subset of the set of statistical models stored in the data base 207.

The system 200 may apply the method 100 described above with respect to Fig. 1 and the algorithms 300, 400, 500, 600, 700, 800 described below with respect to Figures 3 to 8.

Fig. 3 shows a sequence diagram illustrating an exemplary main cycle 300 of input traffic stream processing according to an implementation form. Fig. 3 illustrates an embodiment of the algorithm 300 realizing the main cycle of traffic processing. Immediately after the start 301 of the algorithm, the initialization procedure 302 is performed. Within the cycle, a check is made for the presence of a packet to be analyzed. In this embodiment of the algorithm it is assumed that the input stream of packets arriving from the telecommunication network is already filtered. This means that the input stream includes only packets grouped together in accordance with some external logic. In one embodiment the filtering of the flow may be done on the basis of some fixed value of the following tuple: {IPSrc, IPDst, SrcPort, DstPort, Protocol}, where IPSrc is the IP address of the source of the packet, IPDst is the IP address of the destination of the packet, SrcPort is the source port number, DstPort is the destination port number, and Protocol is the data transfer protocol. Within this document, the plurality of packets to be processed by the algorithm and used for the calculation of an appropriate statistical model (SModel) is called the flow.
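The flow filtering on the tuple {IPSrc, IPDst, SrcPort, DstPort, Protocol} can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the names Packet, FlowKey and filter_flow are hypothetical, and treating an unset field as a wildcard is an assumption that goes beyond the "fixed value of the tuple" mentioned in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; only the fields needed for filtering and statistics."""
    ip_src: str
    ip_dst: str
    src_port: int
    dst_port: int
    protocol: str
    length: int       # packet length in bytes
    timestamp: float  # arrival time in seconds


@dataclass(frozen=True)
class FlowKey:
    """Fixed value of {IPSrc, IPDst, SrcPort, DstPort, Protocol}.

    Assumption: a field left as None acts as a wildcard.
    """
    ip_src: Optional[str] = None
    ip_dst: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, pac: Packet) -> bool:
        pairs = (
            (self.ip_src, pac.ip_src),
            (self.ip_dst, pac.ip_dst),
            (self.src_port, pac.src_port),
            (self.dst_port, pac.dst_port),
            (self.protocol, pac.protocol),
        )
        # A packet belongs to the flow if every fixed field agrees.
        return all(want is None or want == got for want, got in pairs)


def filter_flow(packets: Iterable[Packet], key: FlowKey) -> Iterator[Packet]:
    """Yield only the packets that belong to the flow described by key."""
    return (p for p in packets if key.matches(p))
```

For example, FlowKey(ip_src="10.0.0.1", dst_port=80) would select all packets sent from that address to port 80, regardless of the remaining fields.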

If the algorithm 300 represented in Fig. 3 receives a packet, this packet is passed for processing to the main algorithm 400 (see Fig. 4) of network traffic model identification. The value of the counter N = 0 shows that the processing of the stream has not yet started. After initialization 303 of this counter, the algorithm 300 enters the main loop of processing new data packets. If there is a new packet to be processed 304, this packet is transferred to the main algorithm 400 of network traffic model identification. After the end of data processing, the algorithm checks the completion 308 of the calculation of the statistical model. If the model is calculated, then the class of information policy is defined for the processed stream of packets. In this case the results of the analysis are transmitted to the set of tools 309 responsible for the management of the traffic stream. In this algorithm the logical variable ExitFlag 305 is used to terminate the processing of the input data stream. This variable is managed by logic that is external to the described numerical method.
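The main cycle described above can be outlined in a few lines. The sketch below is a simplification under stated assumptions: main_cycle, identify, on_policy and exit_flag are hypothetical names, identify stands in for the main algorithm 400 and is assumed to return a policy class once the model is calculated (and None otherwise), and resetting the counter after a result is an assumption about how a new stream is started.

```python
from typing import Any, Callable, Iterable, Optional


def main_cycle(
    packets: Iterable[Any],
    identify: Callable[[Any, int], Optional[str]],
    on_policy: Callable[[str], None],
    exit_flag: Callable[[], bool] = lambda: False,
) -> int:
    """Feed packets one by one to the identification step.

    Returns the value of the packet counter N when the cycle ends.
    """
    n = 0  # N = 0: processing of the stream has not started yet
    for pac in packets:
        if exit_flag():            # externally managed ExitFlag
            break
        policy = identify(pac, n)  # stand-in for main algorithm 400
        n += 1
        if policy is not None:     # statistical model calculated
            on_policy(policy)      # hand over to traffic management tools
            n = 0                  # assumption: the next packet starts a new stream
    return n
```

Passing the external logic as callables keeps the loop itself independent of how packets are captured and how the traffic management tools consume the result.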

Fig. 4 shows a sequence diagram illustrating an exemplary main algorithm 400 of network traffic model identification according to an implementation form. Input data for this algorithm are as follows: the data packet (Pac) received for analysis; the set of rules (SRule) which are used for the calculation of the statistical model (SModel) of the analyzed stream of packets; and the number of packets (Nid) used for on-the-fly identification of the stream. During data processing this algorithm has access to the following data stores:

Knowledge Base; Database of Statistical Parameters of Unidentified Models; Database of Streams of Unidentified Traffic; and Database of Results of Self Learning. This algorithm uses the value of the logical variable LFlag, which is set outside the logic of the presented numerical method. The result of data processing by the algorithm is the class of information policy which is calculated for the analyzed stream of traffic.

Immediately after the start 401 of data processing, the main algorithm 400 initializes the data structures needed to start the processing of a new traffic stream. The value 402 of the counter N = 0 shows that the processing of the stream has not yet started. In this case the set of preparatory procedures 403 is executed. These procedures may include, in particular, a check of the state of the Knowledge Base, which is intended to keep traffic identification rules. If the Knowledge Base does not contain any elements when the preparatory procedures are executed, the self-learning logic may be initialized by default. The learning process in the presented numerical method is managed by means of the Boolean variable LFlag. If the Knowledge Base does not contain structures that could be used to identify the flow (the Knowledge Base is empty), the logic of the preparatory operations may assign the value true to the variable LFlag. Within this method this means learning on the traffic stream. Moreover, in one embodiment the logic of the preparatory operations can initialize the buffer into which the packets belonging to the processed data stream are copied.

After the end of the preparatory procedures 403, the algorithm 400 performs the check 404 of whether the recognition of the current traffic flow is accomplished. The integer constant Nid is defined by logic which is external to this numerical method. The value of this constant is equal to the number of packets of the stream which must be used for the identification of traffic in on-the-fly mode. If the counter 404 of packets N is less than the value of Nid, the algorithm runs the procedure 405 of the calculation of the statistical model of the stream under analysis. After that the counter of packets is incremented 408 and the verification 409 of the condition for finishing the on-the-fly mode of data stream processing is performed. If N = Nid, the algorithm 400 launches the procedure 410 of on-the-fly identification of the statistical model. After that it verifies 411 whether SModel is identified. The presented embodiment of the algorithm 400 also realizes a machine self-learning procedure. This self-learning procedure is realized by the branch of the algorithm which matches the condition LFlag = true 405. This branch is used by the algorithm 400 when the counter 404 of packets is greater than Nid. The learning procedure 406 is performed on the whole set of packets of the analyzed stream. The check "Is flow closed?" 415 verifies whether the analyzed stream of packets has been closed. Completing the calculation 406 of the statistical model of the flow (SModel) in learning mode is possible only under the condition that all packets of the stream under analysis have been received. This is because, in the general case, a part of the model parameters can be calculated only after the packet stream has been received in full. In one embodiment, these parameters include the total number of packets in the flow and other parameters which characterize the flow as a whole.

The branch of the algorithm associated with an affirmative answer to the question "Is flow closed?" 415 describes the actions which are performed on the calculated statistical model of the whole stream of packets. At the first stage of this branch the algorithm updates 416 the database of statistical parameters. The presented numerical method includes identification 417 of the traffic by the use of the database of the results of self-learning.

The identifier of the class of information policy is calculated 412 on the basis of the results obtained during the identification 410 of the statistical model SModel. The class of information policy defines the rules for processing the network traffic. These rules are defined for each identified class of traffic stream. In the presented numerical method it is assumed that these rules are defined by some outside logic. In particular, the set of classes may include the traffic class Unidentified. In an implementation form of the method 400, the calculation of the information policy takes into account three different situations which can result from statistical model identification: Statistical model of traffic is identified with Knowledge Base; Statistical model of traffic is identified with Database of Results of Self Learning; and Statistical model of traffic is not identified. Each of these three cases is characterized by a separate class of information policy.
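The three situations listed above map naturally onto an enumeration. The following is a minimal sketch: PolicyClass and policy_class are hypothetical names, and giving the Knowledge Base precedence over the self-learning database is an assumption based on the description that the self-learning database is consulted when Knowledge Base identification is unsuccessful.

```python
from enum import Enum


class PolicyClass(Enum):
    """Three classes of information policy, one per identification outcome."""
    IDENTIFIED_WITH_KNOWLEDGE_BASE = 1
    IDENTIFIED_WITH_SELF_LEARNING = 2
    UNIDENTIFIED = 3


def policy_class(kb_match: bool, self_learning_match: bool) -> PolicyClass:
    """Select the policy class from the two identification results.

    Assumption: a Knowledge Base match takes precedence.
    """
    if kb_match:
        return PolicyClass.IDENTIFIED_WITH_KNOWLEDGE_BASE
    if self_learning_match:
        return PolicyClass.IDENTIFIED_WITH_SELF_LEARNING
    return PolicyClass.UNIDENTIFIED
```

The concrete processing rules attached to each class are left to outside logic, as in the description.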

The final stage of data processing by the algorithm represented in Fig. 4 includes completion operations 413. In one embodiment of the algorithm 400 these completion operations 413 can contain the procedures of cleanup of memory occupied by data which are used by the main algorithm. This algorithmic module can also contain procedures of processing of the buffer which is used for storing of analyzed data stream. If traffic has been identified, this buffer is released 414, otherwise it is stored in a database for later analysis. If the processing of data stream still is not finished 419 the algorithm increases the value of counter 418 and returns to the start of the stream processing loop 302 described above with respect to Fig. 3.

Fig. 5 shows a sequence diagram illustrating an exemplary algorithm 500 of traffic statistical model calculation according to an implementation form. The input data for this algorithm are as follows: the data packet Pac; the index N of the data packet in the sequence of processed packets of the stream; the statistical model SModel calculated at the previous stage of processing this stream; and a set of rules SRule for calculating the fields of the statistical model. The result of the algorithm is a statistical model of the analyzed stream, updated according to the results of processing Pac.

In the general case SModel can be represented mathematically as a tuple of length (Nd + Ne). Each element SModel(i) of this tuple describes the statistical distribution of some unique quantity which characterizes the flow of traffic. Elements of the statistical model may be divided into the following two categories in accordance with the nature of their computation and use: a first category of quantities by which the flow can be characterized before its closing, which in one embodiment may include, for example, the mean length of the packet or the packet inter-arrival time; and a second category of quantities which characterize a flow only as a whole. The values of the second category may be computed only after the closing of the flow. In one embodiment, these may include Flow Duration or Number of Transferred Packets.

In the represented sequence diagram 500 it is assumed that the mentioned categories of quantities are ordered inside the tuple SModel: the first Nd elements can characterize the stream in the on-the-fly mode of data processing. The next Ne elements can be computed only after the closing of the data stream. In one embodiment each element of the tuple SModel can itself be represented mathematically as a tuple. Each tuple SModel(i) characterizes a definite statistical distribution and may comprise its own number of elements. In one embodiment SModel(i) may include the mean value and the variance. In one embodiment the set of rules SRule may be represented as a tuple. In the represented block diagram it is assumed that SRule includes two consecutive parts: DFea and EFea. DFea is a tuple that contains identifiers of the rules for the calculation of the first Nd elements of SModel, which may be used for on-the-fly identification of the stream. The length of the tuple DFea is equal to Nd. EFea is a tuple that contains identifiers of the rules for the calculation of the last Ne elements of the statistical model SModel. The length of the tuple EFea is equal to Ne.

Immediately after the beginning 501 of the run, the algorithm 500 checks whether the processing of a new stream is starting. If the algorithm starts the calculation of a new statistical model (N = 0) 502, it initializes 503 the data structures of the model SModel. After initializing the counter (i) of the fields of the statistical model tuple, the algorithm enters the loop 506 that updates the fields of the tuple SModel.

This procedure of update 506 is carried out on the basis of the fields of the packet Pac. Processing of these fields is defined by the rule DFea(i), and by the data SModel(i), obtained after processing of the previous packet of this flow. Implementation of the method of calculation of the statistical model SModel may include calculating a set of statistical characteristics of the distributions of a given set of values. The composition of this set of characteristics can include a plurality of, e.g., the packet length, and the time interval between the arrival of packets (packet inter-arrival time). The set of statistical characteristics for each of these variables in one embodiment can include the mean value, the variance value and the values of the central moments of statistical distribution.
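The per-packet update of the on-the-fly fields and the computation of the flow-level fields at closing can be illustrated as follows. This is a sketch under explicit assumptions rather than the disclosed procedure: RunningStat, SModel and update_smodel are hypothetical names, only two on-the-fly quantities (packet length and inter-arrival time) and two flow-level quantities (duration and packet count) are modelled, and the mean and variance are maintained with Welford's online algorithm, which is a technique chosen here for illustration and not named in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunningStat:
    """Mean and variance updated one sample at a time (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 when no samples have been seen.
        return self.m2 / self.count if self.count else 0.0


@dataclass
class SModel:
    # On-the-fly fields (the "first Nd elements" in this sketch).
    length: RunningStat = field(default_factory=RunningStat)
    inter_arrival: RunningStat = field(default_factory=RunningStat)
    # Flow-level fields (the "last Ne elements"), known only after closing.
    flow_duration: Optional[float] = None
    packet_count: Optional[int] = None
    # Bookkeeping needed to derive inter-arrival time and duration.
    first_ts: Optional[float] = None
    last_ts: Optional[float] = None


def update_smodel(model: SModel, length: int, timestamp: float,
                  flow_closed: bool = False) -> SModel:
    """Update the model with one packet; fill flow-level fields on closing."""
    if model.first_ts is None:
        model.first_ts = timestamp
    else:
        model.inter_arrival.update(timestamp - model.last_ts)
    model.last_ts = timestamp
    model.length.update(length)
    if flow_closed:  # last packet of the stream
        model.flow_duration = timestamp - model.first_ts
        model.packet_count = model.length.count
    return model
```

Because each update needs only the previous model state and the current packet, the packets themselves do not have to be retained in order to maintain these statistics.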

In the described algorithm it is assumed that the elements of the tuples DFea and EFea are numbered starting from zero. Upon the completion of the cycle, the algorithm checks the condition of closing 508 of the stream under analysis. If the stream is closed and this is the last packet 509 in the stream, it becomes possible to use the rules of the tuple EFea for computing the last portion of the statistical model SModel. The result of this algorithm is a tuple SModel 511. Each field of this tuple is a statistical characteristic, the formation of which is specified by the tuple SRule.

Fig. 6 shows a sequence diagram illustrating an exemplary part 600 of an algorithm of traffic statistical model on-the-fly identification using the Knowledge Base according to an implementation form. Fig. 7 shows a sequence diagram illustrating an exemplary part 700 of the algorithm of traffic statistical model on-the-fly identification using the database of results of self-learning according to an implementation form. Fig. 6 includes the part 600 of this algorithm which uses the data of Knowledge Base for the identification of the traffic and Fig. 7 includes the part 700 of this algorithm which uses the Database of Results of Self Learning for fast identification. Input data for the algorithm are as follows: the tuple SModel calculated during the run of the algorithm of traffic statistical model calculation. Within this algorithm it is assumed that the process of identification involves only first Nd data fields of statistical model SModel. During its work this algorithm uses data stored in Knowledge Base and data stored in the Database of Results of Self Learning. The result of the algorithm is the information about the identification of statistical model of traffic.

Immediately after initialization 602 of the counter (i) of fields of the statistical model SModel, the algorithm 600 performs the initialization 603 of the set {KBSet}. In one embodiment each element of this set is a tuple which represents a definite element of the Knowledge Base. Each element of any tuple KBSet(j) is the identifier of a rule on the basis of which a check can be carried out on whether some object belongs to a set of corresponding objects. In some implementations of the identification algorithm, for example, it may be verified that the integer value SModel(i) lies within a certain range of values. In the next step of the algorithm the number of elements (Lkb) in the set {KBSet} is calculated 604. Then the algorithm enters the outer loop 606, which corresponds to a sequential scan of the fields of SModel. After initialization 607 of the counter of elements of the set {KBSet}, the algorithm enters the inner loop 608 of data processing. In the cycle on the variable j, the correspondence of the value SModel(i) to the condition 616, which is formulated as the i-th element of the tuple KBSet(j), is verified. In case of a discrepancy the tuple is removed 617 from the set {KBSet}.
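The elimination of Knowledge Base candidates field by field can be sketched as below. The sketch assumes the simplest rule type mentioned in the text, a range check on SModel(i), and scalar model fields; the names identify_with_kb and Range and the example class labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds for one field


def identify_with_kb(smodel: Sequence[float],
                     kb: Dict[str, Sequence[Range]],
                     nd: int) -> List[str]:
    """Return the labels of Knowledge Base entries consistent with SModel.

    Only the first nd fields of smodel take part in the identification.
    """
    candidates = dict(kb)                    # working copy, plays the role of {KBSet}
    for i in range(nd):                      # outer loop over fields of SModel
        for label in list(candidates):       # inner loop over remaining entries
            low, high = candidates[label][i]
            if not (low <= smodel[i] <= high):
                del candidates[label]        # discrepancy: drop the entry
        if not candidates:                   # Lkb = 0: nothing left to check
            break
    return list(candidates)
```

An empty result corresponds to Lkb = 0, a single label to a unique identification, and several labels to an ambiguous match.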

The completion of data processing by the identification algorithm is characterized by the following two basic situations that may arise in the analysis. First, the algorithm terminates if at some point it becomes clear that the set {KBSet} is empty: Lkb = 0. This situation corresponds to the case when the model SModel is not identified 614: the processed stream of packets has no analogue in the Knowledge Base. Second, the algorithm of on-the-fly identification stops after processing all Nd fields of SModel. In this case one of three different results is possible; these are presented in the block diagram as outputs from the module 'switch'. If at some point of the processing of the statistical model data it turns out that the Knowledge Base contains no information about the analyzed traffic 614 (Lkb = 0), the analysis moves to the branch 615 of the algorithm which carries out identification by means of the Database of Results of Self Learning 700 (see Fig. 7).

The algorithm 700 represented in Fig. 7 is a continuation of the algorithm 600 of on-the-fly statistical model identification. This algorithm 700 starts its work in a situation where the identification of SModel using the Knowledge Base is unsuccessful 614. At the beginning 615 of this algorithm 700 the initialization procedure is performed. This procedure includes the initialization 701 of the index (i) of the statistical model from the Database of Results of Self Learning and of the total number of statistical models (Ksm) in a subset Cs of this database. In its work this algorithm uses the set Cs of statistical models 703.

After calculating 704 the number (Ksm) of these models in the database, a cycle starts in which the following data processing is performed. For each of the models from the Database of Results of Self Learning, the null hypothesis H0 is formulated 706. It states that the statistical models KBSet(i) and SModel, built on two different sets of statistical samples, belong to the same stochastic process. The algorithm of verification 707 of the null hypothesis H0 can be based on the use of well-known non-parametric statistical tests. In one embodiment the null hypothesis may be verified by means of one of the following statistical tests: Rosenbaum Q-criterion, Mann-Whitney U-test, Kruskal-Wallis test, Pearson's chi-squared test, Kolmogorov-Smirnov test, Anderson-Darling test, F-test (Fisher's criterion). The choice of a particular statistical criterion is defined by the features of the particular task and by the set of statistical parameters used. The embodiment of the algorithm 700 represented in Fig. 7 completes the processing of data when all statistical models represented in the subset Cs have been analyzed.
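As an illustration of verifying H0 with one of the listed tests, the two-sample Kolmogorov-Smirnov test can be sketched in pure Python. The sketch makes assumptions beyond the text: it compares raw samples of a single quantity (for example packet lengths) rather than stored model summaries, it uses the standard large-sample approximation for the critical value, and the names ks_statistic and same_process are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum distance between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)  # ECDF of a at x
        fb = bisect_right(b_sorted, x) / len(b_sorted)  # ECDF of b at x
        d = max(d, abs(fa - fb))
    return d


def same_process(a: Sequence[float], b: Sequence[float],
                 alpha: float = 0.05) -> bool:
    """True if H0 (both samples come from the same distribution) is not rejected.

    Uses the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m))
    with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical
```

Not rejecting H0 is treated here as the "confirmation" referred to in the text; for small samples an exact test or a library implementation would be preferable to the asymptotic threshold.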

The completion of data processing by the identification algorithm is characterized by the following three basic situations that may arise in the analysis. First, the investigated statistical model is not identified because there are no similar statistical models 704 in the Database of Results of Self Learning: Lkb = 0. Second, SModel is identified 713: Lkb = 1. Third, the investigated statistical model is not identified 712 because there are several similar statistical models in the Database of Results of Self Learning: Lkb > 1.

Fig. 8 shows a sequence diagram illustrating an exemplary algorithm 800 of traffic statistical model identification with database of results of self-learning according to an implementation form. Input data for the algorithm 800 are as follows: the tuple SModel calculated during the run of the algorithm of traffic statistical model calculation; and the Boolean variable FOff. The value of this variable is used to set the mode of the search for the statistical model SModel in the Database of Results of Self Learning. In particular, the value FOff = true makes searching in offline mode possible. In some embodiments of the numerical method an additional parameter may be included in the list of input data. For example, a time stamp may be used as an additional input parameter to optimize the statistical model search. During its work this algorithm 800 uses data stored in the Database of Results of Self Learning. The results of data processing by the algorithm are as follows: information about the identification of the statistical model of traffic; and modification of the Database of Results of Self Learning (if the process of learning is permitted).

Immediately after the start 801 of data processing, the algorithm initializes 802 the variable Lkb, which is used as a counter of the number of statistical models from the database that are similar to SModel. At this step the Boolean variable FCrt is also initialized; it is used to control access to two sets of models from the database: {Crt} and {Coff}. The set KBSet is used by the algorithm as temporary storage for models extracted from the database. In the next step of the algorithm the set {KBSet} is initialized 803 with models from the Crt subset of the database. After calculating the number (Ksm) of these models, a cycle starts in which the following data processing is performed. For each of the models from the Database of Results of Self Learning, the null hypothesis H0 is formulated 807. It states that the statistical models KBSet(i) and SModel, built on two different sets of statistical samples, belong to the same stochastic process. The algorithm of verification of the null hypothesis H0 can be based on the use of known non-parametric statistical tests. In one embodiment the null hypothesis may be verified 808 by means of one of the following statistical tests: Rosenbaum Q-criterion, Mann-Whitney U-test, Kruskal-Wallis test, Pearson's chi-squared test, Kolmogorov-Smirnov test, Anderson-Darling test, F-test (Fisher's criterion). The choice of a particular statistical criterion is defined by the features of the particular task and by the set of statistical parameters used. The cycle of statistical hypothesis verification 808 is exited either on reaching the end of KBSet or directly after the confirmation of the null hypothesis. If at this stage the null hypothesis is confirmed (Lkb = 1), the statistical model from the database is updated 811 based on the data of SModel and the algorithm completes its work.
If the null hypothesis was not confirmed for any element of the set Crt (Lkb < 1), then the algorithm checks the possibility of continuing the search in offline mode. If the conditions of use of the method allow a search in offline mode (FOff = true) 816, then the algorithm described above is repeated for the subset {Coff} of statistical models from the Database of Results of Self Learning. If the statistical model has not been identified by the described algorithm (with both subsets Crt and Coff), a new element is initialized 817 in the Database of Results of Self Learning on the basis of the SModel data.
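The search order described for Fig. 8 (first Crt, then optionally Coff, then creation of a new element) can be sketched as follows. The sketch rests on several assumptions: the subsets are plain Python lists, the hypothesis test and the model update are supplied as callables (is_same and update, both hypothetical), learning is taken to be permitted, and a newly created element is placed in Crt, whereas the disclosure leaves the placement to a separate algorithm.

```python
from typing import Callable, List, TypeVar

M = TypeVar("M")  # type of a stored statistical model


def identify_with_self_learning(
    smodel: M,
    crt: List[M],
    coff: List[M],
    is_same: Callable[[M, M], bool],
    update: Callable[[M, M], M],
    f_off: bool = False,
) -> bool:
    """Search Crt, then Coff if f_off is set; learn a new model on failure.

    Returns True if a stored model was found (and updated), False otherwise.
    """
    subsets = [crt] + ([coff] if f_off else [])
    for subset in subsets:
        for i, stored in enumerate(subset):
            if is_same(stored, smodel):              # H0 confirmed, Lkb = 1
                subset[i] = update(stored, smodel)   # refresh the stored model
                return True
    # Not identified in any searched subset: initialize a new element.
    crt.append(smodel)  # assumption: new models start in Crt
    return False
```

The search stops at the first confirmed hypothesis, which mirrors the exit from the verification cycle directly after confirmation.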

The numerical method described in this disclosure (Figures 3 to 8) uses four data repositories during its work. Two of them are used as primary storages, and the other two are auxiliary ones. The primary storages are used in the procedures of traffic pattern identification. The auxiliary storages are used to store data which may be used later by human experts during their exploratory work on new patterns of traffic.

In the following, the Knowledge Base is described. This primary repository is used in the disclosed method to store data structures underpinning the identification of statistical models of network traffic. In the described method, it is assumed that the Knowledge Base contains the results of analytical processing of traffic flow obtained during the work of experts or during the work of special numerical tools. This work of network analysts may include, in particular, the use of certain specialized automated data processing means in offline mode. Working with the Knowledge Base aims to classify the packet stream on the basis of the set of classification rules. In this regard, the implementation of the Knowledge Base can be based on one of the modern methods of classification. Selection of a particular method may aim to optimize the use of the method described in this disclosure. The speed of the search in the Knowledge Base, for example, may be used as one of the optimization parameters. The architecture of the Knowledge Base depends on the specific purposes of the application of this traffic analysis method.

However, used data structures should provide an automatic verification of compliance of statistical model SModel to certain classification rule. In this method, it is assumed that the Knowledge Base is available only in read mode. In the following, the Database of Statistical Parameters of Unidentified Models is described. This auxiliary repository is used in the disclosed method to store the set of statistical models of traffic which are not identified by means of Knowledge Base. Each model stored in this DB is associated with definite data stream stored in the database of traffic streams. The presence of this database allows to perform the processing of unrecognized models in off-line mode. In one implementation for example, it may be done by using the methods of Unsupervised Learning class (Clustering, Self-Organized Maps, Singular Value Decomposition, etc.). Data structure used to store the element of database of statistical parameters should be chosen so as to ensure storing of a tuple which has predetermined length. Definite embodiment of the element of mentioned database is determined by the requirements of the application of the disclosed method and by the set of statistical quantities used in this method. Particularly if each unidentified model has time label it becomes possible to use a plurality of methods traditionally used for the analysis of time sequences (correlation analysis, analysis of covariance, etc). In current numerical method it is supposed that database of statistical parameters may be accessible in write mode.

In the following, the Database of Streams of Unidentified Traffic is described. This auxiliary repository is used in the disclosed method to store the streams of traffic which were not identified by the current numerical method. Each element stored in this database is associated with a definite element of the database of parameters of unidentified models. The data structure used to store an element of the database of streams of unidentified traffic should be chosen so as to ensure the storing of the whole aggregate of network packets of an unidentified stream of traffic. This database is necessary to enable the analytical work on deriving the set of identification rules which are used for the construction of the Knowledge Base. In the disclosed numerical method it is supposed that the database of streams of unidentified traffic may be accessible in write mode.
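The association between the two auxiliary repositories can be sketched as two stores that share a key. The following Python fragment is an illustration under stated assumptions only: the class name `UnidentifiedArchive`, the integer key, the method names and the use of raw `bytes` objects for packets are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple


class UnidentifiedArchive:
    """Two write-accessible stores linked by a shared integer key (assumed design)."""

    def __init__(self, model_len: int) -> None:
        self.model_len = model_len
        self.models: Dict[int, Tuple[float, Tuple[float, ...]]] = {}
        self.streams: Dict[int, List[bytes]] = {}
        self._next_key = 0

    def add(self, time_label: float, smodel: Tuple[float, ...],
            packets: List[bytes]) -> int:
        if len(smodel) != self.model_len:
            raise ValueError("statistical model must have the predetermined length")
        key = self._next_key
        self._next_key += 1
        # The same key is used in both stores, so each stream can be traced
        # to its model parameters and vice versa.
        self.models[key] = (time_label, tuple(smodel))
        self.streams[key] = list(packets)  # the whole packet aggregate is kept
        return key

    def series(self, index: int) -> List[Tuple[float, float]]:
        """Time-ordered values of one tuple element."""
        return sorted((t, m[index]) for t, m in self.models.values())


# Hypothetical data: two unidentified streams with made-up packet bytes.
archive = UnidentifiedArchive(model_len=2)
k0 = archive.add(5.0, (600.0, 0.5), [b"\x45\x00", b"\x45\x01"])
k1 = archive.add(2.0, (580.0, 0.4), [b"\x45\x02"])
print(archive.series(0))
print(len(archive.streams[k0]))
```

An analyst working offline could use `series` for time-sequence analysis of one parameter and then fetch the packets of any interesting record through the shared key.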

In the following, the Database of Results of Self Learning is described. This primary repository is used in the disclosed method to store the results of traffic stream processing by the self-learning algorithm. This algorithm is applied if the procedure of identification with the Knowledge Base is unsuccessful. It implements the procedure of self-learning based on the predefined set of statistical parameters. In the current description of the numerical method, the implementation of the algorithm is presented on the basis of the principles of statistical self-learning. In the disclosure it is supposed that the Database of Results of Self Learning is divided into three parts: Cs, Crt and Coff. Such a division aims to optimize the accuracy and temporal characteristics of the implementation of this numerical method. Processing of models of the subsets Cs and Crt is performed in online mode (see Fig. 7, Fig. 8). These models are stored so as to minimize the access time. The need to separate the statistical models which are accessed online into two sets is associated with the following circumstances. First, technically the storing of these two types of models may be realized so that the access speed is quite high but different for each of them. Second, a separate realization of model processing for Cs and Crt gives additional flexibility: the methods of statistical data processing applied to each of these subsets may be different. This circumstance can be directly related to the following condition on the composition of the subsets Cs and Crt: these sets can have a non-empty intersection. Processing of models from the subset Coff is performed in offline mode. It is supposed that the interrelation between the composition of the subsets Crt and Coff is dynamic. A separate algorithm, which is outside the scope of the current disclosure, is responsible for deciding to which of these sets each particular statistical model should belong.
In particular, the ratio between the numbers of elements in Cs, Crt and Coff should be defined by the requirements on the speed of solving the identification problem itself. The mentioned algorithm can order the elements in each of these subsets according to optimization criteria which are defined by the identification problem being solved. Issues associated with the architectural embodiment of the database of the results of self-learning are beyond the scope of this disclosure. In this disclosure it is supposed that the database of self-learning may be accessed in read and write modes.
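The division into Cs, Crt and Coff can be illustrated by the following minimal Python sketch. It is a sketch under stated assumptions and not the disclosed implementation: the class name `SelfLearningDB`, the method names, the pattern labels and, in particular, the matching criterion (Euclidean distance below a threshold) are hypothetical stand-ins, since the actual statistical comparison and the algorithm that redistributes models between Crt and Coff are not reproduced here.

```python
import math
from typing import Dict, List, Optional, Tuple

SModel = Tuple[float, ...]


class SelfLearningDB:
    """Results of self-learning, split into Cs, Crt (online) and Coff (offline)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed match tolerance
        self.parts: Dict[str, List[Tuple[str, SModel]]] = {
            "Cs": [], "Crt": [], "Coff": []
        }

    def write(self, part: str, label: str, smodel: SModel) -> None:
        # Write access; the same model may be written to both Cs and Crt.
        self.parts[part].append((label, smodel))

    def lookup_online(self, smodel: SModel) -> Optional[str]:
        # Only Cs and Crt are consulted online; Coff is never touched here.
        for part in ("Cs", "Crt"):
            for label, stored in self.parts[part]:
                if math.dist(stored, smodel) <= self.threshold:
                    return label
        return None

    def move(self, label: str, src: str, dst: str) -> None:
        # Hook for the external, out-of-scope algorithm that rebalances the parts.
        keep, moved = [], []
        for entry in self.parts[src]:
            (moved if entry[0] == label else keep).append(entry)
        self.parts[src] = keep
        self.parts[dst].extend(moved)


# Hypothetical models and labels, chosen only to exercise the three parts.
db = SelfLearningDB(threshold=1.0)
db.write("Cs", "pattern-A", (100.0, 0.1))
db.write("Coff", "pattern-B", (500.0, 0.3))
print(db.lookup_online((100.5, 0.1)))  # found in Cs
print(db.lookup_online((500.0, 0.3)))  # stored only in Coff, so not found online
db.move("pattern-B", "Coff", "Crt")
print(db.lookup_online((500.0, 0.3)))  # found after the move to Crt
```

Keeping the three parts as separate containers leaves room for different storage technologies and different processing methods per part, which is the flexibility the text above refers to.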

The present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, cause at least one computer to execute the performing and computing steps described herein, in particular the method 100 as described above with respect to Fig. 1 or the algorithms 300, 400, 500, 600, 700, 800 described above with respect to Figures 3 to 8. Such a computer program product may include a readable storage medium storing program code thereon for use by a computer. The program code may perform the method 100 as described above with respect to Fig. 1 or the algorithms 300, 400, 500, 600, 700, 800 described above with respect to Figures 3 to 8.

While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations, such a feature or aspect may be combined with one or more other features or aspects of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with", or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary", "for example" and "e.g." are merely meant as an example, rather than the best or optimal. The terms "coupled" and "connected", along with derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact or are not in direct contact with each other.

Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein. Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence. Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims

1. A method (100) for automatic online identification of network traffic patterns, comprising: receiving (101) an input traffic stream (102) from a communication network; processing (103) the input traffic stream (102) by applying self-learning-based classification and on-the-fly classification to the input traffic stream (102), wherein the self-learning-based classification is performed in an online mode by computing a statistical model of the input traffic stream (102) on the basis of a pre-defined set of features and storing the statistical model in a data base which is configured to store results of a plurality of self-learning based classifications, and wherein the on-the-fly classification processes the input traffic stream (102) based on using a knowledge base comprising a predetermined set of rules for identifying the input traffic stream (102) and based on applying at least a subset of the statistical models stored in the data base; and identifying (105) a network traffic pattern (104) in the input traffic stream (102) based on the results of the on-the-fly classification and/or the self-learning-based classification.

2. The method (100) of claim 1, wherein the data base is configured to store the following data: computed statistical models of the input traffic stream (102), statistical parameters of unidentified statistical models, unidentified input traffic streams (102).

3. The method (100) of claim 1 or 2, wherein the identifying (105) the network traffic pattern (104) provides identification of a class of information policy for the input traffic stream (102).

4. The method (100) of claim 3, wherein classes of information policy are identified based on the network traffic pattern identification, and comprise at least: a first policy class, if the statistical model of traffic is identified with the knowledge base, a second policy class, if the statistical model of traffic is identified with the results of the plurality of self-learning based classifications, and a third policy class, if the statistical model of traffic is not identified.

5. The method (100) of one of the preceding claims, wherein the self-learning-based classification and the on-the-fly classification are sequentially applied for the on-the-fly classification of the input traffic stream (102).

6. The method (100) of one of the preceding claims, wherein the input traffic stream (102) comprises a stream of data packets, in particular IP packets.

7. The method (100) of claim 6, comprising: filtering the input traffic stream (102) before processing the input traffic stream (102), wherein the filtering is based on at least one of the following filtering criteria: a predetermined IP source address, a predetermined IP destination address, a predetermined IP source port number, a predetermined IP destination port number, and/or a predetermined data transfer protocol.

8. The method (100) of claim 6 or 7, wherein the on-the-fly classification is processed based on reception of a predetermined number (Nid) of data packets.

9. The method (100) of one of claims 6 to 8, wherein the self-learning-based classification is performed on a greater number of received data packets than the predetermined number (Nid) of data packets.

10. The method (100) of one of claims 6 to 9, wherein the statistical model comprises a tuple of predetermined length, wherein each element of the tuple describes the statistical distribution of a unique quantity which characterizes the input traffic stream (102).

11. The method (100) of claim 10, wherein the unique quantities characterizing the input traffic stream (102) are divided into the following two categories: a first category comprising quantities characterizing single data packets of the stream of data packets, a second category comprising quantities characterizing the stream of data packets as a whole.

12. The method (100) of claim 11, wherein the quantities of the first category comprise one of the following: mean length of a packet, packet inter-arrival time, and wherein the quantities of the second category comprise one of the following: flow duration, number of transferred packets.

13. The method (100) of claim 11 or 12, wherein the first category and the second category are ordered inside the statistical model tuple such that the quantities of the first category are arranged before the quantities of the second category in the statistical model tuple.

14. The method (100) of one of the preceding claims, wherein the knowledge base comprises results of analytical processing of traffic flow by using expert systems in offline mode.

15. A system (200) for automatic online identification of network traffic patterns, comprising: a data buffer (201) for receiving an input traffic stream (202) from a communication network; a data base (207) configured to store a set of statistical models; a knowledge base (205) comprising a predetermined set of rules for identifying the input traffic stream (202); and a processor (203) configured to process the input traffic stream (202) by applying self-learning-based classification (211) and on-the-fly classification (209) to the input traffic stream (202) and to identify a network traffic pattern (204) in the input traffic stream (202) based on the results of the on-the-fly classification (209) and the self-learning-based classification (211), wherein the on-the-fly classification (209) processes the input traffic stream (202) based on using the knowledge base (205) and based on applying at least a subset of the set of statistical models stored in the data base (207); and wherein the self-learning-based classification (211) is performed in an online mode by computing a statistical model of the input traffic stream (202) on the basis of a predefined set of features and storing the statistical model in the data base (207).