US 7697418 B2
A method for detecting anomalies in traffic patterns and a traffic anomalies detector are presented. The method and the detector are based on estimating the fan-in of a node, i.e. the number of distinct sources sending traffic to a node, based on infrequent, periodic sampling. Destinations with an abnormally large fan-in are likely to be the target of an attack, or to be downloading large amounts of material with a P2P application. The method and the anomalies detector are extremely simple to implement and exhibit excellent performance on real network traces.
1. A method performed by an anomalies detector for tracking anomalous activity in a data packet network, the method comprising:
sampling, at the anomalies detector, over a predetermined time window a number of sampled packets, for determining a packet source address (PSA) and a packet destination address (PDA) of each said sampled packet; and
determining at least one of:
a fan-in count for said PDA by incrementing a counter for said PDA whenever said PSA of said sampled packet is not the same as a most recently seen source address (MRSS) for said PDA, and
a fan-out count for said PSA by incrementing a counter for said PSA whenever said PDA of said sampled packet is not the same as a most recently seen destination address (MRSD) for said PSA.
2. The method of
3. The method of
providing a plurality of observation points throughout said network; and
generating at each said observation point a list with partial fan-in counts for a specified number of target destination addresses.
4. The method of
automatically transmitting said list from each said observation point to an inspection facility at the end of said time window; and
at said inspection facility, for each target destination address, determining an estimated value of the total number of packets Fanin(d) destined to said target destination address by aggregating said partial fan-in counts that have said target destination address.
5. The method of
6. The method of
providing a plurality of observation points throughout said network; and
generating at each said observation point a list with partial fan-out counts for a specified number of suspect source addresses.
7. The method of
automatically transmitting said list from each said observation point to an inspection facility at the end of said time window; and
at said inspection facility, for each suspect source address, determining an estimated value of a total number of packets transmitted from said suspect source address by aggregating said partial fan-out counts that have said suspect source address.
8. A traffic anomalies detector for tracking anomalous activity in a data packet network, the detector comprising:
a sampling unit that samples a number of sampled packets seen by said detector over a predetermined time window;
a storing means that maintains at least one of:
a fan-in count and a most recently seen source address (MRSS) for at least one observed destination address, and
a fan-out count and a most recently seen destination address (MRSD) for at least one observed source address; and
an address resolving processor that determines a packet source address (PSA) and a packet destination address (PDA) of each said sampled packet and performs at least one of:
when said storing means maintains a fan-in count and an MRSS for said PDA, setting said MRSS equal to said PSA whenever said PSA is not the same as said MRSS, and
when said storing means maintains a fan-out count and an MRSD for said PSA, setting said MRSD equal to said PDA whenever said PDA is not the same as said MRSD.
9. The detector of
10. The detector of
one row for each destination address identified by said address resolving processor;
a first column identifying the destination address of each sampled packet;
a second column identifying the most recently seen source address for the respective destination address in the first column; and
a counter column that provides an approximation to the number of source addresses that have said destination address in said first column.
11. The detector of
12. The detector of
13. The detector of
14. The detector of
one row for each source address identified by said address resolving processor;
a first column identifying the source of each sampled packet;
a second column identifying the most recently seen destination address for the respective source address in the first column; and
a counter column that provides an approximation to the number of destination addresses that have said source address in said first column.
15. The detector of
16. The detector of
17. The detector of
The invention is directed to communication networks and in particular to a method for estimating traffic anomalies at a node of a communication network.
Detailed visibility into individual users and business applications using the global network is essential for optimizing performance and delivering network services to business users. In general, current network monitoring tools are able to collect a large amount of data from various information sources distributed throughout the network. For example, Snort Intrusion System for TCP (SIFT) uses an information dissemination server which accepts long-term user queries, collects new documents from information sources, matches the documents against the queries, and continuously updates the users with relevant information. SIFT is able to process over 40,000 worldwide subscriptions and over 80,000 daily documents.
Also, tracking and monitoring traffic in communication networks is particularly relevant for network vendors who wish to provide access to information on their high-end routers; they must therefore devise scalable and efficient algorithms to deal with the limited per-packet processing time available. Traffic monitoring tools are also useful to network providers, as they allow them to filter information relevant to implementing cost saving measures by optimizing network resource utilization, detecting high-cost network traffic, or tracking down anomalous activity in a network, etc. For example, in order to protect their network and systems today, network providers deploy a layered defense model, which includes firewalls, anti-virus systems, access management and intrusion detection systems (IDS). The capacity to detect as fast as possible the propagation of malware and to react efficiently to on-going attacks inside the network in order to protect the network infrastructure is becoming a real challenge for network operators.
Network performance monitoring mechanisms need to perform traffic analysis in a non-invasive way with respect to the observed networking environment. Detecting attacks and peer-to-peer (P2P) traffic is a major problem for network managers seeking to better utilize and protect their networks. Providing information that helps them do this at minimal cost may be a key differentiator between the services a network may offer to users.
From a security point of view, a relevant metric to detect malware is the number of distinct sources sending traffic to a monitored destination, referred to as “node fan-in”. Destinations with an abnormally large fan-in are likely to be the target of an attack, or to be downloading large amounts of material with a peer-to-peer application (e.g. BitTorrent). By interchanging the roles of source and destination, the same approach determines the sources with the highest number of distinct destinations; this number is known as “node fan-out”. Sources with an abnormally large fan-out may be attempting to spread a worm or virus.
Some of the tools used today for establishing the node fan-in or fan-out perform monitoring of all packets arriving at a node. These tools require that the respective node be equipped with sophisticated hardware/software for packet inspection at high speed. In addition, these tools require a large amount of memory for maintaining the tables with destination/source information for each packet. Evidently, looking at every packet arriving at a node is not practical for large traffic volumes and nodes that are not equipped with sophisticated, expensive hardware components.
Other current methods of traffic monitoring are for example “linear counting” (described by Whang, K.-Y., Zanden, B. T. V., and Taylor, H. M. in “A linear-time probabilistic counting algorithm for database applications”), or “loglog counting” (see details at http://algo.inria.fr/flajolet/Publications/DuFI03-LNCS.pdf), or “Superspreader algorithms” (see details at http://reports-archive.adm.cs.cmu.edu/anon/2004/CMU-CS-04-142.pdf), to list the most relevant. However, all these tools and algorithms have a number of drawbacks that dissuade their use on a large scale: they do not necessarily work with sampled data, are complicated, and require extensive additional programming.
A need has arisen for both the users and network operators to have better mechanisms to monitor network performance, filter network traffic, and troubleshoot network congestion, without introducing any additional traffic on the communication network. This is especially relevant to Internet providers that must comply with SLAs (Service Level Agreements) provided to customers. As Internet architecture evolves, the SLAs now include requirements on the quality of service such as jitter, throughput, one-way packet delay, and packet loss ratio. Additionally, the need to monitor network traffic is prevalent for the underlying Internet protocol enabling the World Wide Web.
In particular, there is a need to provide a tool for estimating the destinations with the highest fan-in and/or sources with the highest fan-out that operates with high accuracy and provides instant feedback. Such tools also need to operate in high-speed routers at line speed, without the need for additional complex HW/SW at the network nodes. There is also a need to provide a solution that is extremely simple to implement and exhibits excellent performance on real network traces.
It is an object of the invention to provide a method for identifying, within a communication network, the destinations with the highest fan-in, or the sources with the highest fan-out.
It is another object of the invention to provide a fan-in/fan-out method that is simple to implement, works well with sampled data, and does not require equipping the network nodes with complex additional software applications or sophisticated equipment.
Accordingly, the invention provides a method of monitoring traffic for tracking anomalous activity in a data packet network, comprising: a) selecting an observation point in the network; b) at the observation point, sampling over a predetermined time window T, every n-th data packet, for determining the source address and destination address of the sampled packet; and c) determining one of: a fan-in count for each destination address seen at the observation point, a fan-out count for each source address seen at the observation point and both a fan-in and a fan-out count for each destination and respectively source addresses seen at the observation point.
The invention is also directed to a traffic anomalies detector for tracking anomalous activity in a data packet network, comprising: a sampling unit for sampling every n-th data packet seen by the detector over a predetermined time window; an address resolving processor for determining the source address and destination address of each sampled packet; and storing means for maintaining one of a fan-in count for each destination address identified by the address resolving processor, a fan-out count for each source address identified by the address resolving processor, and both a fan-in and a fan-out count for each destination and respectively source addresses.
Detecting attacks and P2P traffic is a huge problem for network managers. Therefore, information that may help detect malware patterns may be very useful as it will enable the network managers to better utilize and protect their networks. Advantageously, the method of the invention enables identification of traffic anomalies with a high probability, even in networks equipped with high-speed routers, and provides almost instant feedback on possible malware.
Another advantage of the invention is that it provides a solution that is extremely simple to implement, does not require additional complex HW/SW at the network nodes, and exhibits excellent results on real network traces.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of the preferred embodiments, as illustrated in the appended drawings, where:
As indicated above, the invention provides a method for identifying traffic anomalies based on estimating the fan-in at network nodes, i.e. the number of distinct sources sending traffic to a node. Destinations with an abnormally large fan-in are likely to be the target of an attack (e.g. a large D/DoS attack on the node, or a worm in progress), or to be downloading large amounts of material with a P2P application, such as illegal copying of video content. Traffic anomalies may equally be identified based on the estimated fan-out at network nodes, by interchanging the roles of source and destination. A high fan-out at a node will identify a worm propagating from that node or the source of a P2P download. In the following, we will refer to these two traffic patterns identified by the instant method collectively as “malware”, even if P2P traffic may not necessarily indicate an attack on the respective node/user. Also, in this specification, the invention is described for fan-in estimation; fan-out is analogous.
Let's consider a network 5 where a plurality of nodes communicate between them as shown in
The fan-in of an address (address d in this example) is defined as the number of different nodes (sources s) that send at least one message to address d from the observed message population. In order to observe the messages with destination d, one or more nodes are provided with anomalies detectors according to the invention. In this example, an anomalies detector 50 connected at node i will detect and count message m1(s1,d) and message m2(s2,d). As such, the fan-in of destination address d measured at observation point i is Fanin(d),i=2, since detector 50 measures two messages with this destination. It is apparent that not all messages for node d pass over the observation point i. If an additional anomalies detector 50-1 is provided at node c, a third message with destination d will be detected at this node, so that Fanin(d),c=1. The partial fan-in measurements from nodes i and c can be aggregated to obtain the fan-in for destination d. In this example, Fanin(d)=2+1=3.
Similarly, the fan-out of an address s is defined as the number of different nodes which receive at least one message from address s from the observed message population.
The anomalies detector 50 of the invention is designed to be simple enough to require a small number of operations on every sampled message so that these operations can be implemented in a slow memory access time environment and can be used for monitoring traffic at a network monitoring point where the incoming traffic rate is very large. As a result, the traffic anomaly detector may be used in places where processing resources are scarce, as a first line of anomaly detection. For example, it may be used at a node to allow basic detection of malware, or to establish if a certain flow should be sent to more sophisticated applications for a further in-depth inspection. In this latter case, the traffic anomaly detector may be provided on a DPI (deep packet inspection) card present at the respective node, and then appropriate actions can be taken for the respective flow (filtering, etc.).
The inspection facility 60 could also be provided on a DPI card on one of the nodes, or could be an application running on an NMS. This will allow more efficient use of processing resources by pre-filtering the traffic for the entire network based on information collected from a selected number of individual nodes.
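The aggregation step in this example can be sketched as follows. This is a minimal illustration, assuming each observation point reports its partial counts as a mapping from destination address to count; the function name `aggregate_fan_in` and the dictionary format are hypothetical and are not taken from the specification.

```python
from collections import Counter

def aggregate_fan_in(partial_lists):
    """Sum the partial fan-in counts reported by several observation points.

    partial_lists: iterable of dicts {destination address: partial count}
    (hypothetical report format, one dict per observation point).
    """
    total = Counter()
    for partial in partial_lists:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)

# Values from the example: node i reports 2 for destination d, node c reports 1.
fanin_at_i = {"d": 2}
fanin_at_c = {"d": 1}
print(aggregate_fan_in([fanin_at_i, fanin_at_c]))  # {'d': 3}
```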
It is evident that the accuracy of the Fanin(d) or Fanout(s) measurement increases with the number of observation points, which means that more nodes may be equipped with anomalies detectors 50.
The traffic anomalies detector is based on infrequent, periodic sampling. The anomalies detection is estimated over a time window T, which is configurable by the network operator: each node may use a different time window for collecting the measurements, according to its operating parameters. Also configurable is the number of destinations with the largest fan-in supervised; the repeat offenders may be defined using a “repeat offender” threshold, as seen later. For example, each node may assemble a list 20 (see
For fan-in observation, memory 15 maintains two data structures, namely a destinations table 10 and optionally, a flow table 20. The destinations table 10 has one row for each expected destination, and three columns: a first column identifies the destination d of the sample, a second column identifies the most recently seen source (mrss) for the respective destination, also referred to as the previous source, and a counter column (c) that provides an approximation to the number of sources that have the same destination. For fan-out observation, a sources table 10′ (not shown) is used, which has one row for each expected source, and three columns: a first column identifies the source s of the sample, a second column identifies the most recently seen destination (mrsd) for the respective source, and the counter column (c) that provides an approximation to the number of destinations that have the same source.
Preferably, the table 10 is implemented with a hash table, where each destination/source address for every message m(s,d) observed by the anomalies detector is associated with the record in table 10 with index H(d) for the selected hash function. The size |HT| of the destinations table 10 is much smaller than the number N of network nodes. A hash table is used for memory savings; a table with as many rows as there are possible IP addresses would be huge and difficult to maintain. Other data structures may be equally used, the invention not being limited to keeping the destination information in hash tables. In any case,
The flow table 20 is optionally provided for tracking the address of the nodes with a potentially large fan-in/fan-out. A destination/source address is included in this table if the count in table 10 is larger than the configurable threshold Th.
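One possible layout of the destinations table and the threshold-based flow table is sketched below. This is a sketch under stated assumptions, not the patented implementation: the hash function (CRC-32 modulo the table size), the class and method names, and the choice to let colliding addresses share and relabel a row are all assumptions, since the specification leaves them open.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Row:
    d: str = ""     # destination address held in this row
    mrss: str = ""  # most recently seen source for d
    c: int = 0      # approximate fan-in counter

class DestinationsTable:
    def __init__(self, size):
        # |HT| rows, far fewer than the number of possible addresses
        self.rows = [Row() for _ in range(size)]

    def row_for(self, d):
        # H(d): hypothetical hash choice, CRC-32 modulo the table size
        row = self.rows[zlib.crc32(d.encode()) % len(self.rows)]
        row.d = d  # assumption: colliding addresses share (and relabel) a row
        return row

    def flow_table(self, th):
        # addresses whose counter is larger than the configurable threshold Th
        return [(r.d, r.c) for r in self.rows if r.c > th]

table = DestinationsTable(size=1024)
table.row_for("10.0.0.1").c = 7
print(table.flow_table(th=5))  # [('10.0.0.1', 7)]
print(table.flow_table(th=7))  # []
```

A fixed-size array keeps memory bounded regardless of how many addresses appear, at the cost of occasional collisions between unrelated addresses.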
Sampling unit 12 collects every nth packet arriving at the observation point i and provides it to the address resolving processor 14. Experiments show that good results are obtained when between e.g. 1,000 and 5,000 packets have been sampled. Results were considered “good” when confirmed using known malware detection algorithms, such as those described by Xu et al., in “Joint Data Streaming And Sampling Techniques For Detection Of Super Sources And Destinations”, Technical Report, College of Computing, Georgia Institute of Technology, July 2005, and by Flajolet and Martin, in “Probabilistic Count Algorithm For Data Base Applications”, Journal Of Computer And System Sciences, 31(2):182-209, October 1985.
This experimental observation may be used to configure the duration of the time window T. For example, at a traffic rate of 1 Gbps, assuming 576B packets and a 1:8192 sampling rate, 5,000 packets would be available every 188 seconds (5000×8192×576×8/1000000000). In many cases, good results are obtained after only 1,000 packets have been sampled.
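The window calculation above can be reproduced with a few lines of code. This is a minimal sketch assuming a fully utilized link and a uniform packet size, as in the example; the function name `window_seconds` and its parameter names are illustrative only.

```python
def window_seconds(target_samples, sampling_interval, packet_bytes, link_bps):
    """Seconds needed to collect target_samples at 1:sampling_interval sampling.

    Assumes the link is fully utilized and every packet has packet_bytes bytes.
    """
    bits_needed = target_samples * sampling_interval * packet_bytes * 8
    return bits_needed / link_bps

print(window_seconds(5000, 8192, 576, 1e9))  # about 188.7 seconds
print(window_seconds(1000, 8192, 576, 1e9))  # about 37.7 seconds
```

At lower utilization the window grows proportionally, and with smaller packets it shrinks, which is why T is left configurable.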
In real life most of the packets are actually very small (i.e. about 64 bytes). An operator would have to tune this parameter to his choice of accuracy vs. timeliness based on his experience and requirements. In addition, as this is intended as the ‘first line of defense’, a little inaccuracy in the results can be accepted, because it would likely be used to identify nodes that are consistently misbehaving, and then these nodes would be subject to further analysis (via DPI or other methods). As a practical example, the operator would use the anomalies detector in an initial configuration, let's say for a traffic rate of 1 Gbps, 1:8192 sampling rate, T=3 minutes. Then, if the system as a whole is providing the measurements too slowly, the operator would change T to two minutes, at a cost of decreased accuracy.
Returning now to
Table management unit 16 performs management of the data in the tables, such as table initialization at start-up and initialization of the counters after expiration of the time window T, and also performs general control functions for the traffic anomalies detector 50. Table management unit 16 also initiates transfer of data from destinations table 10 into flow table 20, by selecting the destinations with a counter higher than threshold Th. The number m of records transferred from table 10 into table 20 is configurable. As indicated above, this option may be used if it is desired to keep the information about the offenders at the node. The address resolving processor 14 at the node is also capable of clearing certain repeat offenders from the list. For example, the repeat offenders may be analyzed locally and determined to be legitimate, and thus exempt from further analysis.
Preferably, once the window T has ended, all the information in the destinations table 10 is transmitted to the table management unit 16, which inserts the table and other related information (such as date and time, node location, etc.) into more complex tables to be further analyzed by humans, or various monitoring systems in an NMS. Table 10 with the additional related information may be sent to the inspection facility 60 at the end of every time window T, or just the flow table 20 with the related information may be sent to the inspection facility 60 at regular intervals of time.
Next, the address resolving processor 14 checks whether the most recently seen source mrss for destination d is the same as the source of the current packet. If it is different, as shown by branch “No” of decision block 36, the most recently seen source for destination d is set to the source of the current packet, step 37. The counter c for this destination is incremented, step 38, to show that an additional packet was collected from the flow destined to d. We note that the value in the counter may be greater than the actual fan-in of the destination, since a source that reappears after a different source is counted again, but experiments have shown that the ordering is approximately the same as that of the actual fan-in. If, on the other hand, s=mrss, branch “Yes” of decision block 36, the counter management unit 18 leaves c unchanged.
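The per-packet update described in steps 36 to 38 can be sketched as follows. This is an illustration only: a plain dictionary stands in for the hash-indexed destinations table, and treating the first packet for a new destination as a change of source (so that its counter starts at 1) is an assumption.

```python
def update_fan_in(table, s, d):
    """Process one sampled packet (s, d); table maps d -> [mrss, c]."""
    entry = table.setdefault(d, [None, 0])  # assumption: new rows start empty
    if entry[0] != s:   # decision block 36: source differs from mrss
        entry[0] = s    # step 37: mrss is set to the current source
        entry[1] += 1   # step 38: counter c is incremented
    return entry[1]     # otherwise c is left unchanged

table = {}
for s, d in [("s1", "d"), ("s1", "d"), ("s2", "d"), ("s1", "d")]:
    update_fan_in(table, s, d)
print(table["d"])  # ['s1', 3]
```

In this run the counter reaches 3 although only two distinct sources sent to d, because s1 is counted again when it reappears after s2. The counter therefore approximates the fan-in rather than measuring it exactly.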
At the end of the time window, as shown by block 39, m (say m=10) records for the destinations with the largest value in the counter column are stored in the flow table 20, step 40, and the table 10 is emptied, step 31. If t<T, branch “No” of decision block 39, the next sample is collected by the sampling unit 12 and processed as described above.
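A whole measurement window, from sampling to selection of the top m records, might look like the sketch below. It assumes the window is delimited by the list of packets passed in rather than by a timer, and it uses a plain dictionary for table 10; the function name `run_window` is hypothetical.

```python
import heapq

def run_window(packets, n, m):
    """Sample every n-th packet of one window and return the top-m (d, c) records."""
    table = {}  # d -> (mrss, c); stands in for destinations table 10
    for i, (s, d) in enumerate(packets, start=1):
        if i % n:
            continue  # not a sampled packet
        mrss, c = table.get(d, (None, 0))
        if s != mrss:
            table[d] = (s, c + 1)
    counts = ((d, c) for d, (_, c) in table.items())
    top = heapq.nlargest(m, counts, key=lambda record: record[1])  # step 40
    table.clear()  # table emptied before the next window
    return top

packets = [("a", "x"), ("b", "x"), ("c", "x"), ("a", "y"), ("a", "y")]
print(run_window(packets, n=1, m=1))  # [('x', 3)]
```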
Results of an experiment performed with the anomalies detector on a trace of 100 k packets with approximately 25 k nodes are provided next. The experiment was run on a trace from the University of Memphis' OC-3c link to Abilene's KSCY (Kansas City). The mixture of packet sizes was typical for the Internet, with roughly 50% of the packets being 64 B long, 25% being 1500 B long, and an average packet size of approximately 500 B. Roughly 25,000 distinct addresses were represented in the trace, which has about 100,000 packets. A sampling rate of 1:100 was used. At the full bandwidth of the link, the trace would have represented about 2.5 seconds, but typical utilization on this link is roughly 10%, thus the trace covered about 25 seconds.
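The duration figures quoted for the trace can be checked with a short calculation. The sketch below assumes the standard OC-3c line rate of 155.52 Mbps, which the text does not state explicitly; the function name and parameters are illustrative.

```python
OC3C_BPS = 155.52e6  # assumed OC-3c line rate in bits per second

def trace_seconds(packets, avg_packet_bytes, utilization=1.0, link_bps=OC3C_BPS):
    """Time span covered by a trace at the given link utilization."""
    return packets * avg_packet_bytes * 8 / (link_bps * utilization)

print(round(trace_seconds(100_000, 500), 1))        # 2.6 (full line rate)
print(round(trace_seconds(100_000, 500, 0.10), 1))  # 25.7 (10% utilization)
print(100_000 // 100)                               # 1000 packets sampled at 1:100
```

At 1:100 sampling the detector therefore worked from roughly 1,000 sampled packets, which is at the lower end of the 1,000 to 5,000 range reported earlier as giving good results.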
The detector returned the top ten fan-out counts. Then the trace was analyzed by looking at every packet, and the actual top-ten fan-outs were noted. The two lists agreed on nine candidates shown in the table below.
Further experiments have indicated that what is most significant for accuracy is the number of packets sampled. Thus, trade-offs of sampling rate vs. utilization vs. time window, may be used advantageously.
As an example, the fan-out of each source address was measured at a particular router on the two dates Jul. 11, 2001 and Jul. 13, 2001 using the anomalies detector of the invention. If the source address (converted to an integer, rather than the normal four-byte representation) is plotted on the horizontal axis, the vertical axis represents the fan-out as measured by this method. A horizontal line would indicate a ‘normal’ maximum fan-out that would be expected, and may be obtained by observing the maximum fan-out over many days. Fan-outs that are abnormally high as compared with previous days may be detected. The addresses corresponding to these abnormally high fan-outs are determined from the horizontal axis, and then further investigation of what was happening with these addresses can take place. When the method was run on real traces from Jul. 11, 2001 and Jul. 13, 2001, an anomaly was caught due to the release of the famous “Code Red” worm.
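The comparison against a ‘normal’ maximum can be sketched as a simple baseline check. The function name `abnormal_sources`, the data layout, and all numbers below are made up for illustration; they are not values from the traces discussed above.

```python
def abnormal_sources(history, today):
    """Return sources whose fan-out today exceeds the historical maximum.

    history: list of per-day dicts {source address: measured fan-out}
    today:   dict for the day under test
    """
    baseline = max(max(day.values()) for day in history)  # the horizontal line
    return {s: c for s, c in today.items() if c > baseline}

# Synthetic example values (not taken from the real traces).
history = [{"10.0.0.1": 12, "10.0.0.2": 30}, {"10.0.0.1": 15, "10.0.0.3": 28}]
today = {"10.0.0.1": 14, "10.0.0.4": 250}
print(abnormal_sources(history, today))  # {'10.0.0.4': 250}
```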