US 20100284283 A1 Abstract A method of detecting anomalies in a communication system, includes: providing a first packet flow portion and a second packet flow portion; extracting samples of a numerical feature associated with a traffic status of the first and second packet flow portions; computing from said extracted samples a first statistical dispersion quantity and a second statistical dispersion quantity of the numerical feature associated with the first and second packet flow portions, respectively; computing from the dispersion quantities a variation quantity representing a dispersion change from the first packet flow portion to the second packet flow portion; comparing the variation quantity with a comparison value; and detecting an anomaly in the system in response to said comparison.
Claims(26) 1-25. (canceled)26. A method of detecting anomalies in a communication system, comprising:
providing a first packet flow portion and a second packet flow portion; extracting samples of a numerical feature associated with a traffic status of the first and second packet flow portions; computing from said extracted samples a first statistical dispersion quantity and a second statistical dispersion quantity of the numerical feature associated with the first and second packet flow portions, respectively; computing from said dispersion quantities a variation quantity representing a dispersion change from the first packet flow portion to the second packet flow portion; comparing the variation quantity with a comparison value; and detecting an anomaly in the system in response to said comparison. 27. The detection method of 28. The detection method of 29. The detection method of 30. The detection method of packet size in bytes; total number of packets in a time interval of length; total number of layer 3 bytes in a time interval of length; average packet size in a time interval of length, expressed in bytes; packet rate in a time interval of length; and byte rate in a time interval of length. 31. The detection method of computing a summation of the samples associated with one of said first and second packet flow portions; computing a mean value of the samples associated with one of said first and second packet flow portions; computing a summation of the squared distances from the mean value of the samples associated with one of said first and second packet flow portions; and computing each of said first and second variances from the corresponding summation of the squared distances from the mean value. 32. 
The detection method of defining a first time window comprising the first packet flow portion and an associated first sample segment of the numerical feature; and defining a second time window comprising the second flow portion and an associated second sample segment of the numerical feature, wherein said first statistical dispersion quantity and said second statistical dispersion quantity are computed from the first and second sample segments, respectively. 33. The detection method of 34. The detection method of 35. The detection method of defining further first and second windows by sliding the first and second windows by said delay; and repeating the method to detect an anomaly applying the method to further first and second packet flow portions corresponding to said further first and second windows, respectively. 36. The detection method of 37. The detection method of defining further first and second sample segments by sliding the first and second sample segments by said delay; and repeating the method to detect an anomaly applying the method to further first and second sample segments. 38. The detection method of 39. The detection method of extracting further samples of a further numerical feature associated with a traffic status of the first and second packet flow portions; computing from said further samples additional statistical dispersion quantities of said further numerical feature associated with the first and second packet flow portions; and computing a further variation quantity representing another dispersion change from the first packet flow portion to the second packet flow portion. 40. The detection method of computing a first variation quantity from said dispersion quantities; and combining the first variation quantity and the further variation quantity to obtain said variation quantity. 41. 
The detection method of comparing the further variation quantity with a further comparison value; and detecting an anomaly in the system in response to said comparison of the further variation quantity with the further comparison value. 42. The detection method of 43. The detection method of updating the computed first statistical dispersion quantity taking into account samples of the second segment not included in the first segment. 44. The detection method of selecting the comparison value from: a fixed value, a variable value, an adaptive value, and a value depending on historical traffic data. 45. The detection method of 46. The detection method of aggregating samples of the numerical feature values of different network flows according to selected packet parameters; and applying the method to said aggregated samples. 47. An apparatus capable of detecting anomalies in a packet switched communication system, comprising:
a collection module capable of storing samples of a numerical packet feature associated with traffic status of a first packet flow portion and a second packet flow portion; a computing module capable of being arranged so as to:
compute from said samples a first statistical dispersion quantity and a second statistical dispersion quantity of the numerical feature associated with the first and second packet flow portions, respectively; and
compute from said dispersion quantities a variation quantity representing a dispersion change from the first packet flow portion to the second packet flow portion; and
a detection module arranged so as to:
compare the variation quantity with a comparison value; and
detect an anomaly in the system in response to said comparison.
48. The apparatus of 49. A packet switched communication system comprising:
an extractor module capable of extracting samples of a numerical feature associated with a traffic status of a first packet flow portion and a second packet flow portion; and an apparatus capable of detecting anomalies connected to said extractor module and arranged in accordance with the apparatus capable of detecting anomalies in a packet switched communication system of 50. A computer program product comprising program codes capable of performing the detection method of detecting anomalies in a communication system according to Description 1. Technical Field The present invention relates to anomaly detection on packet switched communication systems. Particularly, the present invention is related to statistical methods for detecting network traffic anomalies due to network attacks or to communication system failures. 2. Description of the Related Art Several types of attacks are known, such as: (distributed) denial of service ((D)DoS) attacks, scanning attacks, SPAM or SPIT attacks, and malicious software attacks. Denial-of-Service (DoS) attacks and, in particular, distributed DoS (DDoS) attacks are commonly regarded as a major threat to the Internet. A DoS attack is an attack on a computer system network that causes a loss of service or network connectivity to legitimate users, that is, unavailability of services. Most common DoS attacks aim at exhausting the computational resources, such as connection bandwidth, memory space, or CPU time, for example, by flooding a target network node by valid or invalid requests and/or messages. They can also cause disruption of network components or disruption of configuration information, such as routing information, or can aim at disabling an application making it unusable. In particular, the network components (e.g., servers, proxies, gateways, routers, switches, hubs, etc.) 
may be disrupted by malicious software attacks, for example, by exploiting buffer overflows or vulnerabilities of the underlying operating system or firmware. A DDoS attack is a DoS attack that, instead of using a single computer as a base of attack, uses multiple compromised computers simultaneously, possibly a large or a very large number of them (e.g., millions), thus amplifying the effect. Altogether, they flood the network with an overwhelming number of packets which exhaust the network or application resources. In particular, the packets may be targeting one particular network node causing it to crash, reboot, or exhaust the computational resources. The compromised computers, which are called zombies, are typically infected by malicious software (worm, virus, or Trojan) in a preliminary stage of the attack, which involves scanning a large number of computers searching for those vulnerable. The attack itself is then launched at a later time, either automatically or by a direct action of the attacker. (D)DoS attacks are especially dangerous for Voice over IP (VoIP) applications, e.g., based on the Session Initiation Protocol (SIP). In particular, the underlying SIP network dealing only with SIP signalling packets is potentially vulnerable to request or message flooding attacks, spoofed SIP messages, malformed SIP messages, and reflection DDoS attacks. Reflection DDoS attacks work by generating fake SIP requests, as an example, with a spoofed (i.e. simulated) source IP address which falsely identify a victim node as the sender, and by sending or multicasting said SIP requests to a large number of SIP network nodes, which all respond to the victim node, and repeatedly so if they do not get a reply, hence achieving an amplification effect. SPAM attacks consist in sending unsolicited electronic messages (e.g., through E-mail over the Internet), with commercial or other content, to numerous indiscriminate recipients. 
Analogously, SPIT (SPam over Internet Telephony) attacks consist in sending SPAM voice messages in VoIP networks. Malicious software attacks consist in sending malicious software, such as viruses, worms, Trojans, or spyware, to numerous indiscriminate recipients, frequently in a covert manner. Scanning or probing attacks over the Internet consist in sending request messages in large quantities to numerous indiscriminate recipients and collecting the information from the provoked response messages, particularly, in order to detect vulnerabilities to be used in subsequent attacks. For example, in port scanning attacks, the collected information consists of the port numbers used by the recipients. Attack detection techniques are known which utilize a description (signature) of a particular attack (e.g., a virus, worm, or other malicious software) and decide if the observed traffic data is consistent with this description or not; the attack is declared in the case of detected consistency. Furthermore, anomaly detection techniques are known which utilize a description (profile) of normal/standard traffic, rather than anomalous attack traffic, and decide if the observed traffic data is consistent with this description or not; an attack or anomalous traffic is declared in the case of detected inconsistency. Unlike attack detection techniques, anomaly detection techniques do not require prior knowledge of particular attacks and as such are in principle capable of detecting previously unknown attacks. However, they typically have non-zero false-negative rates, in the sense that they can fail to declare an existing attack. They also typically have higher false-positive rates, in the sense that they can declare anomalous traffic in the absence of attacks. Anomaly detection techniques can essentially be classified into two categories: rule-based techniques and statistic-based or statistical techniques. 
Rule-based techniques describe the normal behavior in terms of certain static rules or certain logic and can essentially be stateless or stateful. In particular, such rules can be derived from protocol specifications. On the other hand, statistical anomaly detection techniques describe the normal behavior in terms of the probability distributions of certain variables, called statistics, depending on the chosen data features or parameters. Paper “DDoS detection and wavelets”, L. Li and G. Lee, Telecommunication Systems-Modeling, Analysis, Design and Management, vol. 28, no. 3-4, pp. 435-451, 2005, discloses a method comprising the step of dynamically applying a discrete wavelet transform to overlapping sliding windows of the byte rate curves in time and looking for sudden changes in the logarithms of the associated energy distribution coefficients in order to detect DDoS attacks. US-A-2004-0220984 describes a method wherein the packet and byte rates are considered as functions of time and, at each time, the mean values and variances of these rates are estimated by using historical data, possibly as Exponentially Weighted Moving Averages (EWMAs), and then a given sample of traffic at a given time is classified by comparing its packet and byte rates with a threshold being proportional to the sum, at the given time, of the historical mean value and the historical standard deviation (i.e., the square root of the variance) multiplied by a positive constant. Anomalous traffic is declared if the threshold is exceeded, i.e., if the observed sample of traffic is classified as an outlier. U.S. Pat. No. 6,601,014 B1 discloses a method where the mean value and the variance are estimated as the EWMAs, with different, but mutually related associated constants. Article “EWMA techniques for computer intrusion detection through anomalous changes in event intensity”, N. Ye, C. Borror, and Y. Zhang, Qual. Reliab. Engng. Int., vol. 18, pp. 
443-451, 2002, describes a method wherein EWMA techniques are applied for dynamically estimating the mean values and variances of the event intensity process derived from the audit trail data describing the activities on a host machine in a computer network. Anomaly detection is based on the outlier classification principle, where the thresholds are determined under certain probabilistic models for the event intensity process. Alternatively, anomaly detection is based on the estimated variance only, which is compared with a reference value and an alert is then declared if the ratio of the two values is too large or too small. Paper “Statistical traffic identification method based on flow-level behavior for fair VoIP service”, T. Okabe, T. Kitamura, and T. Shizuno, Proceedings of the 1st IEEE Workshop on VoIP Management and Security, Vancouver, Canada, April 2006, pp. 33-38, describes a flow identification method, for VoIP media traffic, using the flow statistics such as the minimal and maximal values of the packet inter-arrival time and some characteristics of the packet size distribution comprising the minimal, maximal, average, and median values as well as the total number of different packet sizes occurring in a flow. The statistics are calculated and compared with reference patterns on short time intervals (e.g., 1 second long) and the verification results are averaged over a longer time interval in order to classify a given flow. Article “Load characterization and anomaly detection for voice over IP traffic”, M. Mandjes, I. Saniee, and A. L. Stolyar, IEEE Transactions on Neural Networks, vol. 16, no. 5, pp. 1019-1026, September 2005, describes a method relating to VoIP data traffic that consists in computing the empirical variance estimates of the normalized byte rate on overlapping windows and comparing them with predicted variances that are theoretically obtained under probabilistic models for the number of calls per second. 
At any time, an anomaly is declared if the ratio of the empirical and theoretical variances is greater than a threshold, which falls in the range between one and two. The Applicant has observed that the known solutions are not satisfactory with respect to the achieved false-negative and false-positive rates, computational complexity and memory requirements. This could be due to the fact that it is difficult for the normal traffic in communications networks to be described by stable probability distributions. Moreover, it is difficult to define statistical models describing the communication networks that would give rise to sufficiently low false-positive and false-negative rates. It should be also noticed that the complexity of the statistical methods of the prior art techniques may be unacceptably high for high-speed and high-volume communications networks. The Applicant has noticed that there is a need in the field for achieving an anomaly detection method providing increased reliability and, preferably, reduced computational complexity and memory requirements. In accordance with a particular embodiment, the Applicant has observed that advantages can be obtained by monitoring the statistical behavior of numerical packet features associated with two packet flow portions lying in corresponding time windows that are moving in time. A numerical packet feature (as an example, the byte rate) is any quantity extracted from network packets that can be expressed as numerical data by a real, rational, or integer number in such a way that the feature values can be regarded as mutually close if the difference of the corresponding numbers is relatively small in absolute value. 
An object of the present invention is a method of detecting anomalies as defined by the appended independent claim The characteristics and the advantages of the present invention will be better understood from the following detailed description of embodiments thereof, which is given by way of illustrative and non-limiting example with reference to the annexed drawings, in which: Hereinafter, a communication system and several embodiments of a statistical anomaly detection method will be described. In particular, the anomalous traffic to be detected can be due to (D)DoS attacks, SPAM and/or SPIT attacks, scanning attacks, as well as malicious software attacks. It should be noticed that the teachings of the present invention can also be applied to detect anomalous traffic due to failures in hardware apparatuses or in software modules operating in the communication system. The particular communication system As known, the Open Systems Interconnection Basic Reference Model (OSI Reference Model or OSI Model for short) is a layered, abstract description for communications and computer network protocol design. It is also called the OSI seven layer model since it defines the following layers: application ( Layers Subsequently, in an extracting step -
- “size in bytes” of an IP packet is the total number of layer 3 bytes in a packet;
- “total number of packets” (N_{packet}) and “total number of layer 3 bytes” (N_{byte}) in a considered elementary time interval of length ΔT; these two features are statistics (i.e., numerical data) regarding a flow;
- “average packet size” in an interval of length ΔT, in bytes, is computed as N_{size}=N_{byte}/N_{packet}, provided that N_{packet}>0;
- “packet rate” R_{packet}=N_{packet}/ΔT is the number of packets per second;
- “byte rate” R_{byte}=N_{byte}/ΔT is the number of bytes per second.
The average packet size can also be expressed as N_{size}=R_{byte}/R_{packet}. The reciprocal of R_{packet} is the average inter-arrival time between two successive packets in a flow.
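As an illustration only, the feature definitions above can be sketched in Python. The function name `interval_features`, the `IntervalFeatures` record, and the representation of a packet as a `(timestamp, size)` pair are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class IntervalFeatures:
    n_packet: int             # total number of packets in the interval
    n_byte: int               # total number of layer 3 bytes in the interval
    n_size: Optional[float]   # average packet size in bytes, None if no packets
    r_packet: float           # packet rate, packets per second
    r_byte: float             # byte rate, bytes per second


def interval_features(packets: Iterable[Tuple[float, int]],
                      t_start: float,
                      delta_t: float) -> IntervalFeatures:
    """Compute the flow features for one elementary interval of length delta_t.

    `packets` is a hypothetical input format: (timestamp in seconds,
    layer 3 size in bytes) for each packet of the monitored flow.
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    sizes = [size for ts, size in packets
             if t_start <= ts < t_start + delta_t]
    n_packet = len(sizes)
    n_byte = sum(sizes)
    # N_size is defined only when at least one packet was observed.
    n_size = n_byte / n_packet if n_packet > 0 else None
    return IntervalFeatures(n_packet, n_byte, n_size,
                            n_packet / delta_t, n_byte / delta_t)
```

Calling `interval_features` once per elementary interval yields one sample of each feature per interval; any of these sample sequences may then serve as the numerical feature analysed by the method.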
It is observed that the length ΔT essentially specifies the time resolution with which the traffic is monitored and analyzed and can be static or dynamic. The starting and ending times of the first and the last packet in a flow, respectively, as well as the total number of monitored flows can also be extracted in step Furthermore, it is noticed that the extracting step In a computing step A statistical dispersion quantity of a set of numerical data is a measure of how the observed numerical values in the data set are dispersed from each other with respect to the Euclidean or other related metrics among real numbers. Particularly, a statistical dispersion quantity is a real number that is equal to zero if all the data values are identical, and generally increases as the data values become more dispersed. Examples of statistical dispersion quantity are: the variance defined as the mean squared deviation of the numerical values from their arithmetic mean value; the standard deviation defined as the square root of the variance; the mean absolute deviation of the numerical values from their arithmetic mean value; the minimum mean squared deviation of the numerical values from any affine approximation, where the optimal affine approximation of numerical data, minimizing this mean squared deviation, can be determined by linear regression techniques as will be clarified later. According to a particular embodiment, the first Dq In a further computing step The variation quantity Δ is compared, in a comparison step Following a positive (Yes) or negative (No) anomaly detection, the detection method Due to the fact that the computation of the dispersion quantities (particularly, the two variances) is performed in delayed time intervals, the method Any such feature can be used to detect a respective anomaly. 
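By way of non-limiting illustration, the examples of statistical dispersion quantity listed above (variance, standard deviation, mean absolute deviation, and minimum mean squared deviation from an affine approximation) can be sketched as follows. The function names are hypothetical, and the affine fit assumes regularly spaced samples indexed 0, 1, 2, and so on, which is an assumption of this sketch.

```python
import math
from typing import Sequence


def variance(x: Sequence[float]) -> float:
    """Mean squared deviation of the values from their arithmetic mean."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def standard_deviation(x: Sequence[float]) -> float:
    """Square root of the variance."""
    return math.sqrt(variance(x))


def mean_absolute_deviation(x: Sequence[float]) -> float:
    """Mean absolute deviation of the values from their arithmetic mean."""
    mean = sum(x) / len(x)
    return sum(abs(v - mean) for v in x) / len(x)


def min_msd_affine(x: Sequence[float]) -> float:
    """Minimum mean squared deviation from the best affine fit a + b*i.

    The optimal slope is obtained by ordinary linear regression of the
    values on their sample index i = 0..n-1 (assumed regular sampling).
    """
    n = len(x)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    s_tt = sum((i - t_mean) ** 2 for i in range(n))
    s_tx = sum((i - t_mean) * (v - x_mean) for i, v in enumerate(x))
    slope = s_tx / s_tt
    return sum((v - x_mean - slope * (i - t_mean)) ** 2
               for i, v in enumerate(x)) / n
```

Each function returns zero when all values are identical. The affine variant additionally returns zero for data lying exactly on a straight line, so a steady trend does not by itself register as dispersion.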
In particular, the average packet size N Moreover, in addition to the average packet size N The detection criteria for a detection method employing a plurality of numerical packet features will be described in greater detail later, with reference to a fifth embodiment and to The features N Alternatively, the features for all the flows monitored can be grouped together, in particular, by distinguishing the direction of flows, regardless of the particular source/destination IP addresses. This type of grouping is interesting for a high-level analysis which does not pay attention to particular nodes or users, but rather to the network traffic as a whole. Instead of the IP addresses, the feature grouping can be made according to the port numbers, which are indicative of the applications of the packets transmitted. With reference to the numerical feature selection, it is also possible to extract and use information contained in other layers such as the application layer (layer A first embodiment As regards step Accordingly, two successive windows of (approximately) the same length T are shifted τ units of time from each other and hence overlap over T−τ units of time. In this embodiment, at any given time, the packet flow portion PFP As shown by means of functional blocks in corresponding to a j -
- i indicates a sample number,
- j indicates a window number,
- m_{j} indicates the end point of a window,
- n_{j} indicates the number of samples in a window.
The number of samples n_{j} in the window/segment is in general variable.
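The window indexing defined above can be illustrated by a minimal sketch. It assumes, for simplicity, regular sampling with a fixed number of samples per window (whereas n_{j} is in general variable) and a shift expressed in whole samples; the function name `sliding_segments` and its parameters are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_segments(samples: Sequence[float],
                     window_len: int,
                     shift: int) -> Iterator[Tuple[int, int, List[float]]]:
    """Yield (j, m_j, segment) for successive overlapping windows.

    j       -- window number, starting at 1
    m_j     -- 1-based index of the last sample in window j
    segment -- the window_len samples ending at m_j

    window_len plays the role of T/ΔT and shift the role of τ/ΔT;
    both are assumed here to be fixed positive integers.
    """
    if window_len <= 0 or shift <= 0:
        raise ValueError("window_len and shift must be positive")
    j = 1
    end = window_len
    while end <= len(samples):
        yield j, end, list(samples[end - window_len:end])
        j += 1
        end += shift
```

With `shift` smaller than `window_len`, two successive segments share `window_len - shift` samples, mirroring the overlap of T−τ units of time between successive windows.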
In a computing step
Here, S
According to a further embodiment, for obtaining unbiased estimates of variance, it is possible to divide by n Analogously, a segment of numerical features samples corresponding to a (j+1) In a computing step
According to another particular example, the relative squared difference δ
If formula (8) is applied, then the variances according to expression (5) are not explicitly computed. However, it should be noticed that computation (7) in terms of the variances (i.e., mean squared deviations or normalized Σ In a comparison step With reference to the threshold definition and according to an example, the threshold θ may be a fixed value as it relates to the relative instead of absolute change of variances, and not to the change of mean values, which is expected to be considerable even for normal traffic. More precisely, if the samples are drawn independently according to the same probability distribution, then, even if two successive segments are not overlapping (i.e., if τ=T), the relative squared difference of variances δ is bounded most of the time by a small value inversely proportional to the number of samples in a segment, independently of the variance of the samples. In accordance with another example, to account for changes of variance in normal traffic, the threshold θ could be determined possibly from historical data for normal traffic, at a considered network node, in order to keep the false-positive rate reasonably low. In particular, it may depend on the time of the day. Particularly, the threshold can be chosen irrespective of any statistical model estimating the traffic behavior. Given an appropriate value of the threshold θ, it is then expected that the probability that the threshold is exceeded, i.e., the false-positive rate, is low for normal traffic, whereas at times where there is a change from normal traffic to anomalous traffic, it is expected that the threshold fails to be exceeded only with a low probability, i.e., with a low false-negative rate. It is noticed that the method is robust as the changes are related to variances and not to the mean values. 
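The detection step discussed above can be sketched as follows, purely as an example. The variance is computed through the summation of the samples, the mean value, and the summation of squared distances from the mean. Expressions (7) and (8) are not reproduced in the present text, so the normalisation of the squared difference by the product of the two variances is an assumption of this sketch, chosen because it is symmetric in the two variances. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def segment_variance(segment: Sequence[float]) -> float:
    """Variance of a sample segment as a mean squared deviation."""
    n = len(segment)
    total = sum(segment)                           # summation of the samples
    mean = total / n                               # mean value
    sq = sum((v - mean) ** 2 for v in segment)     # squared distances from mean
    return sq / n


def relative_squared_difference(var_a: float, var_b: float) -> float:
    """Assumed form: (var_b - var_a)^2 / (var_a * var_b).

    Symmetric in its arguments, so an increase and a decrease of
    variance by the same factor give the same value.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("both variances must be positive")
    return (var_b - var_a) ** 2 / (var_a * var_b)


def detect(first: Sequence[float],
           second: Sequence[float],
           theta: float) -> Tuple[bool, float]:
    """Compare the variation quantity with the threshold theta."""
    delta = relative_squared_difference(segment_variance(first),
                                        segment_variance(second))
    return delta > theta, delta
```

Because the quantity is a ratio, scaling all samples by a constant leaves it unchanged, which is consistent with the remark that a fixed threshold may be used independently of the variance of the samples. Segments of constant samples have zero variance and are rejected here; a practical implementation would need its own policy for that case.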
Moreover, it is robust since the method It should be noticed that the value of the delay or shift τ determines the resolution of the above proposed statistical anomaly detection method In particular, it is suggested that using the average packet size N According to a second example of the detection method According to this second example, in step where the first segment (10) is the initial part of the second segment, without the ending part (x Moreover, in step
Moreover, also in step
In step
As is clear from expression (17), similarly as in the first embodiment, the relative squared difference of variances δ Namely, for normal traffic, the variances for the two segments under consideration (such as the windows W It should be observed that this second embodiment of the method In a third embodiment of the detection method At each time, the packet flow portions PFP which is associated with the j which is associated with the (j+1) In step According to a standard EWMA technique, an iterative-recursive computation of the mean value and the variance of the data for every new data sample is performed. The quantities μ with the initial values μ The explicit solution for the mean value is given by:
and the explicit solution for the variance is given by:
According to this embodiment, the variance at time k measures the exponentially weighted average deviation of the initial k data samples from the corresponding mean values at the same times. Using the above identified formulas (20)-(23) the variances of the first and second segments (18) and (19) are computed by setting k=m In the step
In accordance with a fourth embodiment which is alternative to the above described third embodiment, the mean value μ with the initial values μ The explicit solution for the mean value is still given by:
whereas, the explicit solution for the second moment is given by:
In the case when α=β, the explicit solution for the variance is given by:
Therefore, according to this fourth embodiment, the variance at time k measures the exponentially weighted average deviation of the initial k data samples from the overall mean value at time k, which corresponds to the whole moving window at time k. The fourth embodiment is hence more sensitive than the third embodiment with respect to detecting anomalous changes in traffic data. For both the third and fourth embodiments, the values of the constants α and β determine the effective number of past samples influencing the variance and mean value estimates, respectively. More precisely, these numbers increase as the constants α and β decrease. In particular, the values of the constants α and β close to one, which are typically used in standard applications of the EWMA technique (e.g., for detecting the data outliers or for reducing the noise in given data), do not appear to be suitable for the described detection method, where smaller values of the constants α and β are preferred. In any case, it is preferable to choose the constants α and β in accordance with the statistical properties of the normal traffic. In general, the faster the variance is expected to vary in normal traffic, the larger the constants should be chosen. Also for the fourth embodiment described with reference to formulas (25)-(30), the relative squared difference of variances for two segments (x The Applicant observes that the above description of the methods applying the EWMA techniques enables efficient computation, by using the corresponding recursions. These recursions allow a considerable data memory reduction in comparison with the first two embodiments. 
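The recursive character of the third and fourth embodiments can be illustrated by the following sketch. Formulas (20)-(30) are not reproduced in the present text, so the recursions below are the standard EWMA forms and are an assumption: the constant is taken as the weight of the newest sample, the initial values default to zero, and the fourth variant uses a single constant (the case α=β). The function names are hypothetical.

```python
from typing import List, Sequence


def ewma_variance_third(samples: Sequence[float], alpha: float, beta: float,
                        mu0: float = 0.0, var0: float = 0.0) -> List[float]:
    """Variance as EWMA of squared deviations from the running mean.

    beta weights the mean update, alpha weights the variance update.
    Each deviation is taken from the mean value at the same time.
    """
    mu, var = mu0, var0
    out = []
    for x in samples:
        mu = (1 - beta) * mu + beta * x
        var = (1 - alpha) * var + alpha * (x - mu) ** 2
        out.append(var)
    return out


def ewma_variance_fourth(samples: Sequence[float], alpha: float,
                         mu0: float = 0.0, m2_0: float = 0.0) -> List[float]:
    """Variance as EWMA second moment minus squared EWMA mean (alpha = beta)."""
    mu, m2 = mu0, m2_0
    out = []
    for x in samples:
        mu = (1 - alpha) * mu + alpha * x
        m2 = (1 - alpha) * m2 + alpha * x * x
        # Clamp at zero to absorb floating-point rounding.
        out.append(max(m2 - mu * mu, 0.0))
    return out
```

Only the current mean and variance (or second moment) are kept between samples, which illustrates the memory reduction relative to storing whole sliding windows. The variance values at the end points of two successive segments can then be fed to the same relative comparison used in the other embodiments.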
The Applicant has performed computer simulations relating to the above described third and fourth embodiments of the detection method In a fifth embodiment Moreover, for each considered numerical packet feature, a relative squared difference of variances δ It should be noticed that the combination step Moreover, different decision criteria may be employed. According to an example, a total variation quantity Δ According to yet another example, the result of the comparison of the variation quantity (such as the relative squared differences of variances) with a threshold is logically combined with changes in other statistical quantities, e.g., the mean values of the numerical packet features chosen. The Applicant notices that the use of the relative squared differences δ expressed by formulas (7), (8), (17), and (24) are particularly advantageous as it has the capacity for detecting an increase as well as a decrease of variance, as can be readily seen from the following equivalent expression:
According to another example of method
or a decrease of variance (i.e., σ
According to a sixth embodiment of method More precisely, for a sliding window segment (x
For a shortened sliding window segment given by the expression (10), instead of computing only the variance by using the expressions (12)-(15) as in the second embodiment, a first statistical dispersion quantity {circumflex over (ε)} A second statistical dispersion quantity {circumflex over (ε)}
The first and the second statistical dispersion quantities {circumflex over (ε)} This sixth embodiment is particularly suitable to detect anomalies in normal traffic data that is non-stationary or correlated on the sliding windows considered, e.g., when the sliding window duration T is relatively long and, in particular, when the elementary time interval ΔT is relatively long (e.g., ΔT=5 min). Namely, the variance σ In a more general setting, the statistical dispersion quantity ε The formula (34) is valid under the assumption of regular data sampling, i.e., that the elementary time intervals have a fixed duration ΔT. The expression is readily generalized to deal with irregular data sampling, by substituting the irregular normalized timings of the data samples in the considered sliding window for the regular normalized timings i−m The generalized mean squared deviation is used in the same way as the variance. If the data is stationary and uncorrelated, then ε A seventh embodiment refers to the same definition of sliding windows as described in relation to the first embodiment According to the seventh embodiment, instead of re-computing the variance for each new sliding segment by using the expressions (2)-(5) and the relative squared difference of variances by using the expression (7) or (8), the variance already computed for the preceding segment as well as the corresponding relative squared difference of variances are being updated. This approach allows one to save in computations. With reference to the data memory requirements, also for the seventh embodiment all the data samples belonging to a preceding sliding window for which the variance was previously computed need to be stored. As described with reference to the first embodiment Also, the following auxiliary value is defined as:
The seventh embodiment includes an initial step in which, with reference to a sliding window with index j=1, the following quantities are computed: S
It is noted that (42) and (43) can equivalently be defined as follows: x′ The relative squared difference of variances can then be computed as:
In the case when the numbers of samples in two successive segments are equal, i.e., n and the update expressions and the expression for the relative squared difference then simplify into:
An eighth embodiment refers to sliding windows of the type described for the second embodiment ( According to the eighth embodiment, another, even more efficient way of performing the computations is to compute the variances for the shortened sliding segments (10), for j=1,2,3, . . . , by using adapted update expressions from the seventh embodiment. Then, for each j=1,2,3, . . . , the variance of segment (11) is computed from the variance of segment (10), together with the corresponding relative squared difference of the two variances, either by using the adapted update expressions or, alternatively, by using the following simplified and numerically more convenient expressions, in which Δn
In accordance with a ninth embodiment, employing the minimum mean squared deviation ε
More precisely, for the windows according to the first embodiment
Similarly, for the windows according to the second embodiment described above, the arithmetic mean associated with (x
The findings and teachings of the present invention show many advantages. Theoretical considerations and the simulations made (e.g., Moreover, it should be observed that the example of Furthermore, the methods described above are mathematically relatively simple, sufficiently robust to changes inherent to normal traffic, and yet capable of detecting anomalous traffic due to attacks such as (D)DoS attacks, SPAM and SPIT attacks, and scanning attacks, as well as massive malicious software attacks. For example, the detection methods described above do not need complex computations in contrast with the wavelet computation disclosed in the above cited paper “DDoS detection and wavelets”, L. Li and G. Lee. As such, they are suitable to be applied in high-speed and high-volume communication networks. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.