Introduction

High Bandwidth Intrusion Analyst Challenges

Defeating malicious attempts to attack any network is difficult. The attacker has all the advantages of stealth, surprise, tenacity, and often even skill. Defending a network becomes even more difficult as the scale of the network increases. Today’s fast-paced, information-intensive society requires that every member of an organization have a connection to the Internet. This is true of everything from small families, to medium-sized businesses of a few hundred employees, to the largest government organizations employing millions of people. Each of these connected computers provides an opportunity for an attacker to sneak into your network to wreak havoc or steal vital proprietary or classified information.

Historically, IP addresses were doled out by the InterNIC in class-sized blocks. The IPv4 address space is divided into five classes (A through E). Classes A, B, and C are assigned to networks of hosts, ranging from a Class A network of roughly 16.7 million addresses, to a Class B network of 65,536 addresses, to a Class C network of 256 addresses (two fewer usable hosts in each case, since the network and broadcast addresses are reserved); Class D is reserved for multicast and Class E for experimental use. Protecting even a small Class C network can be a daunting task for an individual intrusion analyst. Sadly, in today’s economy few Class B-sized organizations can afford to pay more than one intrusion analyst, and the poor analyst is usually worked to death. Even Class A-sized organizations may have only a small team of analysts dedicated to protecting the network from threats both foreign and domestic.
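The sizes of the three host-bearing classes follow directly from their classful prefix lengths (/8 for Class A, /16 for Class B, /24 for Class C). The short Python sketch below, which uses only the standard `ipaddress` module, computes the total and usable address counts for each; `classful_sizes` is a hypothetical helper written for this illustration, and it assumes the usual convention that the all-zeros network address and the all-ones broadcast address cannot be assigned to hosts.

```python
import ipaddress

# Classful networks expressed as CIDR prefix lengths. Classes D (multicast)
# and E (experimental) are not assigned as networks of hosts.
CLASSFUL_PREFIXES = {"A": 8, "B": 16, "C": 24}


def classful_sizes(cls: str) -> tuple[int, int]:
    """Return (total addresses, usable host addresses) for a classful network."""
    prefix = CLASSFUL_PREFIXES[cls]
    # Any network with this prefix length has the same number of addresses.
    net = ipaddress.ip_network(f"0.0.0.0/{prefix}")
    total = net.num_addresses
    # The network (all-zeros) and broadcast (all-ones) addresses are reserved.
    return total, total - 2


for cls in "ABC":
    total, usable = classful_sizes(cls)
    print(f"Class {cls}: {total:,} addresses, {usable:,} usable hosts")
```

It prints 16,777,216 addresses (16,777,214 usable) for Class A, 65,536 (65,534) for Class B, and 256 (254) for Class C.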

The large number of hosts on a network is only half the problem for intrusion detection in a large organization. The other major challenge is the high bandwidth that a large set of hosts requires. Even a small Class C-sized network can command bandwidth in the T1 to T3 range (approximately 1.54 to 45 Mb/s, or 16.6 to 486 gigabytes/day at full utilization), depending on the applications its residents need. Larger organizations, such as Class B and Class A-sized networks, typically require multiple T3s and even OC-3s (155 Mb/s, approximately 1.67 terabytes/day) or OC-12s (622 Mb/s, approximately 6.72 terabytes/day) to serve their customers. Inspecting such large volumes of traffic at line rate to determine whether intrusions are present in the data stream is nearly impossible. Few modern devices can meaningfully process that much data. Devices that can handle the data rate, such as routers, typically perform only a single, simple function, such as examining a packet’s header to determine its next hop. Inspecting the same traffic with a Network Intrusion Detection System (NIDS) at these rates is close to impossible.
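The bytes-per-day figures above are straightforward unit conversions. As a sketch, assuming the link is saturated around the clock (so the results are upper bounds) and using decimal units (1 GB = 10^9 bytes), they can be reproduced as follows; the helper name `bytes_per_day` is hypothetical.

```python
SECONDS_PER_DAY = 86_400

# Nominal line rates in megabits per second, as quoted in the text.
LINE_RATES_MBPS = {"T1": 1.54, "T3": 45, "OC-3": 155, "OC-12": 622}


def bytes_per_day(rate_mbps: float) -> float:
    """Convert a sustained line rate (Mb/s) into bytes transferred per day."""
    bits_per_day = rate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8  # 8 bits per byte


for name, rate in LINE_RATES_MBPS.items():
    gb = bytes_per_day(rate) / 1e9  # decimal gigabytes
    print(f"{name:>6}: {rate:>7} Mb/s -> {gb:,.1f} GB/day")
```

Running it gives about 16.6 GB/day for a T1, 486.0 GB/day for a T3, 1,674.0 GB/day (about 1.67 TB) for an OC-3, and 6,717.6 GB/day (about 6.72 TB) for an OC-12.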

Organizations with large bandwidth and large numbers of hosts usually wish to take active measures to protect their assets. Examples include large global businesses such as AT&T, GE, or Bank of America, which stand to lose billions of dollars if their networks are penetrated and proprietary or financial information is exfiltrated. Large universities must not only protect their research and development but also provide a safe environment for students to explore and communicate. Lastly, militaries around the world rely on the Internet to transmit logistics information and communications; if these assets aren’t protected, soldiers and sailors could perish in combat.

Each of these large organizations typically hires a small number of network intrusion analysts to help defend its network. In fact, more often than not, network administrators, who already have more than enough on their plates, are conscripted into this role. Unfortunately, the number of qualified intrusion analysts is fairly small, probably on the order of a few tens of thousands worldwide. This small body of individuals is expected to keep up with every newly reported vulnerability while also analyzing the never-ending stream of audit data produced by their organizations’ NIDS, firewalls, and host-based intrusion detection systems. The situation resembles law enforcement in any large metropolitan city: there are only a limited number of police officers and prosecutors with the time and energy to pursue an ever-changing body of criminals intent on committing bigger and sneakier crimes. More often than not they are overworked and overlook crimes that could have been prevented had they had more help.

Not only are the number of hosts and the amount of bandwidth major obstacles to good intrusion analysis, but the traffic transiting the protected network itself greatly complicates an intrusion analyst’s life. Frequently that traffic is not the type the network architects designed the network to handle. Applications like chat (AOL Instant Messenger), games (Battlefield 1942), and, worst of all, P2P file sharing (Kazaa, Morpheus) can easily ‘clog’ a network, depriving the members of the organization of the bandwidth they need to perform mission-critical functions.

These non-mission-related applications complicate a network intrusion analyst’s job in several ways. Most NIDS work by searching incoming traffic for keywords or signatures. These keywords frequently appear in ordinary chat conversations, resulting in an inordinate number of false positives. It is also difficult, at the high bandwidths mentioned above, to parse and prioritize the data streaming across the network in order to help the analyst sort out his/her workload. Lastly, these applications are typically forbidden by the organization’s network administrator, and users of such prohibited applications will frequently do whatever it takes to hide their use. For example, illicit users will often configure their P2P software to communicate over port 80 so as to hide their traffic in the glut of web traffic that normally traverses port 80. Network defenders often must spend an inordinate amount of time tracking down these non-standard uses of P2P in order to ferret out the offenders and remove them from the network.
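To make the two problems concrete, the sketch below shows (a) a deliberately naive keyword matcher of the kind described above, which fires on an innocent chat message that merely mentions a suspicious string, and (b) a simple heuristic for spotting non-HTTP traffic hiding on port 80 by checking whether a client’s first payload bytes begin with an HTTP request method. The signature list, the function names, and the heuristic itself are illustrative assumptions, not the rules of any particular NIDS.

```python
# Hypothetical, deliberately naive signature set mapping byte patterns to
# alert names. Real NIDS rules add context such as ports and flow direction.
SIGNATURES = {
    b"/etc/passwd": "possible password file access",
    b"cmd.exe": "possible Windows command shell access",
}

# Request methods defined for HTTP/1.1; a genuine HTTP client's first
# bytes on port 80 should normally begin with one of these.
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"TRACE ", b"CONNECT ")


def match_signatures(payload: bytes) -> list[str]:
    """Return every alert whose pattern appears anywhere in the payload."""
    return [name for pattern, name in SIGNATURES.items() if pattern in payload]


def suspicious_port80(first_client_payload: bytes) -> bool:
    """Heuristic: flag port-80 sessions whose client does not speak HTTP."""
    return not first_client_payload.startswith(HTTP_METHODS)


# A harmless chat message still trips the naive keyword rule.
chat = b"did you hear cmd.exe is being attacked by that new worm?"
print(match_signatures(chat))

# A Gnutella-style handshake sent to port 80 is flagged; a real GET is not.
print(suspicious_port80(b"GNUTELLA CONNECT/0.6\r\n"))
print(suspicious_port80(b"GET /index.html HTTP/1.1\r\n"))
```

The chat message produces a command-shell alert even though nothing malicious happened, which is exactly the false-positive burden described above.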

But the straw that most often breaks the network defender’s back is the lack of good tools. Because of the massive amount of data an intrusion analyst is responsible for analyzing, finding good tools is often a major challenge. Software designed to help the intrusion analyst is often written by non-analysts, which confounds their efforts and slows them down. Visualization tools are rarely up to the task of displaying the thousands or millions of elements that a typical network may include. The network appliances designed to assist analysts often fall short as well. For example, Cisco has recently provided a feature on its routers, called NetFlow, that reports the 5-tuple (source IP, destination IP, source port, destination port, and protocol) and timing of each session that has passed through the router. This metadata is often touted as a data-reduction solution that gives an analyst everything he/she needs to detect intrusions. Unfortunately, NetFlow records do not include any of the payload of the sessions that have traversed the router. Without this payload, an analyst is hard pressed to prove whether his/her conclusions about a particular intrusion event are true.
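The limitation is easy to see in a simplified model of flow export. The sketch below is an illustration under stated assumptions, not Cisco’s actual NetFlow record format (which carries additional fields such as interface indexes and type of service): it collapses packets that share a 5-tuple into one record holding timestamps and packet/byte counts, and the payload bytes are counted but never stored. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class FlowRecord:
    """Simplified flow summary: note that there is no payload field."""
    key: FiveTuple
    first_seen: float
    last_seen: float
    packets: int = 0
    payload_bytes: int = 0


def aggregate(packets: Iterable[tuple[float, FiveTuple, bytes]]) -> list[FlowRecord]:
    """Collapse (timestamp, 5-tuple, payload) packets into per-flow records."""
    flows: dict[FiveTuple, FlowRecord] = {}
    for ts, key, payload in packets:
        rec = flows.get(key)
        if rec is None:
            rec = flows[key] = FlowRecord(key, first_seen=ts, last_seen=ts)
        rec.first_seen = min(rec.first_seen, ts)
        rec.last_seen = max(rec.last_seen, ts)
        rec.packets += 1
        rec.payload_bytes += len(payload)  # only the size survives, not the content
    return list(flows.values())


web = FiveTuple("192.0.2.10", "10.0.0.5", 40000, 80, "TCP")
dns = FiveTuple("192.0.2.10", "10.0.0.53", 40001, 53, "UDP")
records = aggregate([
    (0.0, web, b"GET /scripts/..%c1%1c../winnt/system32/cmd.exe HTTP/1.0\r\n"),
    (0.2, web, b"\r\n"),
    (0.5, dns, b"\x12\x34"),
])
for r in records:
    print(r)
```

The web flow’s record shows two packets and a byte count, but the directory-traversal request that would prove the attack is gone.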

Even NIDS, which are designed from the ground up to help an analyst discover intrusions in network data, often create more frustration than help for a network defender. Most NIDS are not designed to handle the extremely high data rates used by today’s Class A and Class B enterprises. Many of today’s commercial-grade NIDS claim to be Gigabit capable but are in truth merely Gigabit enabled: they can accept data at Gigabit line rates, but frequently bog down when particularly large traffic spikes crash against them. The solution most often suggested is to deploy multiple NIDS at different strategic locations on the network to spread out the load. Unfortunately, this often creates more problems than it solves, because the number of alerts, both true positives and false positives, grows rapidly with each added sensor.

High Bandwidth Intrusion Analyst Solutions

The best solution developed so far to help mitigate all the difficulties related to high bandwidth intrusion detection is the defense in depth model. The idea is similar to that of a medieval castle. Such a castle had several defense mechanisms that individually wouldn’t have kept out invaders, but when combined provided a substantial defense. Defensive structures like a moat, a drawbridge, a portcullis, inner and outer walls, sentries and an inner and outer bailey all provided layers of defense which protected the castle and the tenants within as they went about their daily lives.

Today’s modern castles are best protected by router access control lists, firewalls, NIDS, and host-based IDS. Deploying all of these devices allows a network defender to rely on several avenues of protection instead of placing all of his/her eggs in one basket. Routers act as the drawbridge in the illustration above: they provide a single point of entry, controlled by the network administrator, that allows traffic over only a restricted set of ports and protocols. Juniper’s line of routers, for example, provides a host of filtering options. Firewalls are often used in conjunction with a router to control access to a network more finely. Where a router may be too busy routing traffic to keep track of connection state across the network boundary, a stateful firewall can judge the validity of certain types of traffic based on whether a host inside the network initiated the connection in question. Check Point’s FireWall-1 is an example of this type of firewall. The NIDS plays the role of the sentries watching travelers coming in and out of the castle. Depending on the bandwidth of a particular site, a NIDS can examine all of the incoming and outgoing packets and determine whether a particular packet contains malicious or benign content. ISS RealSecure is one of the world’s most popular IDS solutions. The last line of defense is a host-based IDS. Unlike the previous three elements, it is not a separate device monitoring the network; instead, it is software that runs on a host, constantly monitoring the host’s audit logs, files, or processes for malicious actions and alerting the network defender when illicit activity is taking place. Tripwire, from Tripwire, Inc., is the classic example when host-based IDSs are discussed.
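The difference between a router’s static filtering and a firewall’s connection tracking can be sketched in a few lines. In this hypothetical model (`Packet` and `LayeredFilter` are illustrative names), an inbound packet is allowed either because its destination port is on a static allow list, as with a router ACL, or because it matches a connection previously opened by an inside host, as with a stateful firewall. Real firewalls also track TCP flags, timeouts, and much more.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    inbound: bool  # True if the packet arrives from outside the network


class LayeredFilter:
    """Toy model: router-style static ACL plus firewall-style state table."""

    def __init__(self, allowed_inbound_ports):
        self.allowed_inbound_ports = set(allowed_inbound_ports)
        self.state = set()  # (inside_ip, inside_port, outside_ip, outside_port)

    def permit(self, pkt: Packet) -> bool:
        if not pkt.inbound:
            # Remember connections opened by inside hosts so replies can return.
            self.state.add((pkt.src, pkt.sport, pkt.dst, pkt.dport))
            return True
        # Is this a reply to a connection that an inside host initiated?
        if (pkt.dst, pkt.dport, pkt.src, pkt.sport) in self.state:
            return True
        # Otherwise apply the static ACL on the destination port.
        return pkt.dport in self.allowed_inbound_ports


fw = LayeredFilter(allowed_inbound_ports={80})
print(fw.permit(Packet("10.0.0.5", 40000, "198.51.100.7", 53, inbound=False)))  # outbound query
print(fw.permit(Packet("198.51.100.7", 53, "10.0.0.5", 40000, inbound=True)))   # its reply
print(fw.permit(Packet("203.0.113.9", 53, "10.0.0.5", 40000, inbound=True)))    # unsolicited
print(fw.permit(Packet("203.0.113.9", 5555, "10.0.0.80", 80, inbound=True)))    # public web server
```

The unsolicited packet is refused only because of the state table; a router ACL alone would have to either open port 40000 to everyone or block the legitimate reply.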

While all of these devices are necessary for a defense-in-depth solution, perhaps the most important is the NIDS. Although a NIDS has several fundamental shortcomings, including the difficulty with high traffic loads mentioned above, it is still the best tool available for helping the analyst examine the traffic flowing across the network he/she is trying to defend. None of the other tools gives the network defender the same opportunity to catch an attack while it is ongoing, or even before it succeeds. A NIDS also has an ace in the hole that the other solutions lack: invisibility. If deployed properly, a NIDS can be completely invisible to an attacker seeking to damage a network. Just as a drug dealer is more likely to be caught in a sting operation than by a cop standing in front of a police station, an adversary is more likely to be caught by the IDS he/she doesn’t know about than by the firewall he/she can discover and avoid. Lastly, unlike the other devices in the defense-in-depth strategy above, a NIDS is designed specifically to assist the network defender. Security is not a side effect of the device but the stated goal and purpose of every NIDS. A properly implemented NIDS can discover many types of intrusions and intrusion-related activities, from buffer overflows to scanning. For these reasons and others, research focused on enhancing the NIDS element of a defense-in-depth architecture is likely to provide the most benefit to the analyst.

Hacker Methodology

Before this discussion of improving the network defender’s toolset continues, it will be useful to describe the generic five-step attack model an adversary follows. This model applies to essentially all attackers, regardless of their skill, specific attack, or stated goal. To clarify the model, each step is illustrated by the process a thief goes through while trying to steal jewels from a safe inside a house.

Step 1. Before a thief can steal anything, he/she has to know where it is located. This requires some sort of surveillance of the house where the safe is kept, including activities like finding the house on a map, driving past it, and taking pictures at different times of the day and week in order to learn as much as possible about the target. The thief’s main purpose is to find a weakness in the house’s defenses. In a similar manner, every network attacker must first find the computer he/she intends to attack before actually launching the attack. In the cyber world this is accomplished through network probing and scanning: sending specially formatted datagrams (as in a SYN scan) to elicit a response from the targeted host that reveals the existence of some vulnerable piece of software. We will cover this step in more detail later in the thesis.

Step 2. After finishing his/her surveillance, the thief next makes an initial penetration into the house. For instance, the thief may have discovered that there is no alarm on the rear window over the garage, so he/she breaks the window and climbs inside. Likewise, the cyber-attacker must select an exploit from his/her bag of tricks and execute it against the victim host in an effort to establish a foothold. An example might be any of the recently published buffer overflow vulnerabilities in Microsoft’s Internet Information Server (IIS) web server. Such an initial exploit often grants the attacker only user-level access to the host.

Step 3. Once the thief has made it into the house, he/she must locate the safe and crack it to get the jewels he/she was after in the first place. In the cyber world, a host’s jewels are typically protected by the root or administrator account. To gain access to them, the attacker must execute further commands or exploits designed to escalate his/her privileges on the system. Without this privilege escalation, the attacker is limited in the damage he/she can inflict.

Step 4. Once the thief has broken into the safe, he/she can begin removing the jewels. A good thief will also try to cover his/her tracks so that the owner won’t discover the jewels are missing until long after the thief is gone. Similarly, the cyber adversary will quickly cover his/her tracks, for example by erasing evidence of the intrusion from the host’s logs. Attackers also frequently patch the vulnerability they used to break into the host, to keep other would-be attackers from finding their newly acquired victim.

Step 5. Of course, a good thief won’t stop at one set of jewels. He/she will always look for other houses with safes full of jewels waiting to be liberated. Since neighbors frequently leave keys with each other in case they get locked out, a good thief will scour the house he/she has just robbed in search of the neighbors’ keys, which give the thief an easy way into the next target. A cyber attacker will almost always take the same step, for similar reasons: he/she will want to expand his/her power base of victim hosts, or will need to keep harvesting important data to sell later to the highest bidder. In the cyber world, many hosts on the same LAN share trust relationships designed to make it easier to access the LAN’s resources from anywhere on the LAN. These trust relationships can be exploited to let the attacker take over the other machines on the network quickly and quietly.

Signpost for paper

The problem this research seeks to solve is reducing the intrusion analyst’s workload while allowing him/her to discover network attacks against his/her domain more accurately. One approach is to examine the elements of the five-step attack model described above and take advantage of the constrained path that every network attacker must follow to carry out his/her goals. It turns out that one of the best ways to help the intrusion analyst is to reduce the false positives he/she faces every time he/she examines the output of the NIDS. This can be accomplished by examining the scanning data that constantly bombards the network and using that data to predict which alerts are more likely than others to be accurate. The network defender can then spend most of his/her time on the alerts that will bear the most fruit.
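One simple way to apply this idea is sketched below, under stated assumptions: scan records and alerts have already been reduced to source/destination IP pairs, and an alert is ranked higher if the same source was previously seen scanning the same target. The function `prioritize` and its three-level score are hypothetical illustrations, not the methodology developed later in this thesis, and the sketch ignores timing (a real implementation would require the scan to precede the alert).

```python
from collections import defaultdict


def prioritize(scans, alerts):
    """
    scans:  iterable of (src_ip, dst_ip) pairs taken from scan records
    alerts: list of dicts with at least 'src' and 'dst' keys
    Returns the alerts sorted so that those whose source had already
    scanned the same target come first.
    """
    scanned = defaultdict(set)
    for src, dst in scans:
        scanned[src].add(dst)

    def score(alert):
        if alert["dst"] in scanned.get(alert["src"], ()):
            return 2  # this source scanned this exact target
        if alert["src"] in scanned:
            return 1  # a known scanner, but of other targets
        return 0      # no scanning history for this source

    # sorted() is stable, so equally scored alerts keep their original order.
    return sorted(alerts, key=score, reverse=True)


scans = [("198.51.100.4", "10.0.0.1"), ("203.0.113.7", "10.0.0.9")]
alerts = [
    {"src": "192.0.2.33", "dst": "10.0.0.1", "msg": "alert A"},
    {"src": "203.0.113.7", "dst": "10.0.0.1", "msg": "alert B"},
    {"src": "198.51.100.4", "dst": "10.0.0.1", "msg": "alert C"},
]
for a in prioritize(scans, alerts):
    print(a["msg"])
```

Here alert C is listed first because its source scanned 10.0.0.1 before attacking it, while alert A, from a source with no scanning history, drops to the bottom of the queue.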

To illustrate this methodology, the researcher acquired two months’ worth of Snort logs from the primary University of Maryland, Baltimore County (UMBC) NIDS. Snort is an open source IDS that the university has deployed to discover network attacks against its enterprise. The IDS records alerts for both regular attack data and scanning data. Using several Perl scripts, the researcher parsed the logs, graphed the results for easy analysis, and developed a methodology that can assist any high-bandwidth intrusion analyst in the pursuit of his/her mission to defend his/her network.
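The researcher’s scripts were written in Perl, and the exact log format used at UMBC is not reproduced here. Purely as an illustration, the Python sketch below assumes Snort’s single-line “fast” alert format and extracts the signature ID, message, protocol, and endpoints from each line, skipping lines that do not match; the regular expression, the function name, and the sample lines are all invented for this example. Counting alerts per message is then a one-liner with `collections.Counter`.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Assumed layout of one Snort fast-alert line, e.g.
# 10/12-14:21:03.123456  [**] [1:1002:2] MSG [**] [Classification: ...] [Priority: 1] {TCP} a.b.c.d:p -> e.f.g.h:q
FAST_ALERT = re.compile(
    r"^(?P<ts>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+\[\*\*\]\s+"
    r"\[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\]\s+(?P<msg>.*?)\s+\[\*\*\]"
    r".*?\{(?P<proto>\w+)\}\s+"
    r"(?P<src>[\d.]+)(?::(?P<sport>\d+))?\s+->\s+"
    r"(?P<dst>[\d.]+)(?::(?P<dport>\d+))?"
)


def parse_fast_alerts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict of named fields per line that matches the assumed format."""
    for line in lines:
        m = FAST_ALERT.match(line)
        if m:
            yield m.groupdict()


sample = [
    "10/12-14:21:03.123456  [**] [1:1002:2] WEB-IIS cmd.exe access [**] "
    "[Classification: Web Application Attack] [Priority: 1] {TCP} 192.0.2.5:3349 -> 10.0.0.8:80",
    "10/12-14:22:10.000001  [**] [1:469:3] ICMP PING NMAP [**] "
    "[Classification: Attempted Information Leak] [Priority: 2] {ICMP} 192.0.2.5 -> 10.0.0.8",
    "this line is not an alert",
]
alerts = list(parse_fast_alerts(sample))
print(Counter(a["msg"] for a in alerts))
```

ICMP alerts carry no ports, so the port groups are optional and come back as `None` for such lines.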

To clarify this research, the following elements will be examined. First, a review of previous related work in the intrusion detection field will be provided to lend context to this discussion. Next, network scanning and probing will be discussed in greater depth to illustrate why focusing on scanning records is both possible and useful. A discussion of network intrusion analysis with a NIDS in general, and Snort in particular, will help the reader understand why the researcher chose this solution. Afterward, the researcher’s experiment will be described and the results of the analysis presented. Finally, some conclusions and future avenues of research will be offered.

A Description of Scanning

TCP/IP Background

As previously described, all attackers must follow a basic five-step plan in order to successfully compromise a victim host. The first step of that plan is network scanning or probing. Some definitional matters should be clarified before we continue. [fydor] defines network scanning as “the process by which an interested party discovers exploitable hosts by sending IP datagrams to elicit a response from the remote host or computer”. Others, such as [Northcutt], call this network mapping, in which an adversary sends packets to every host on a network, seeking out machines that are active and responsive to stimuli. Whatever the label, discovering which IP addresses belong to operational computers on a network is the first step of any intrusion. After a set of live hosts has been discovered, an adversary typically performs a host scan, or probe. This activity is similar to network scanning or mapping, but instead of targeting every machine on a network, the adversary probes every port on an individual computer, seeking information about which services that host offers that might be exploited. Often an attacker will combine network mapping and probing by searching entire networks for hosts that respond on a limited set of ports. This speeds up the attacker’s job, so that he/she can more quickly determine which hosts are most likely to fall to his/her particular exploit.
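The simplest form of host probe is the full TCP connect() scan, which needs no special privileges because it uses the operating system’s ordinary socket interface. The sketch below is a minimal illustration (the function name `probe_ports` is hypothetical) that tries each port in turn and records whether the connection was accepted; here it is exercised only against the local machine.

```python
import socket


def probe_ports(host: str, ports, timeout: float = 0.5) -> dict[int, bool]:
    """Full TCP connect() probe: map each port to True if a connection was accepted."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success and an error number otherwise
            # (e.g. ECONNREFUSED when the host answers with a RST).
            results[port] = s.connect_ex((host, port)) == 0
    return results


if __name__ == "__main__":
    print(probe_ports("127.0.0.1", [22, 80, 443]))
```

Because every probe completes the handshake, this variant is the easiest for the target to log, which is one reason attackers prefer the half-open techniques described in the next section.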

The very nature of the Transmission Control Protocol/Internet Protocol (TCP/IP) suite makes it susceptible to network scanning and probing. Consider the TCP three-way handshake: every host wishing to establish a reliable connection must be able to contact a foreign host and be assured that the connection has been accepted. In the handshake, the initiating host sends a TCP packet with the SYNchronize (SYN) flag set to a second host offering a service that the initiator would like to access. The server responds with a TCP packet with the SYN and ACKnowledgement (ACK) flags set. Finally, the initiating host replies to the server with a third TCP packet with the ACK flag set. Unfortunately, this built-in sequence of challenges and responses gives the adversary a legitimate route for determining which services are available on any host that offers them to remote users. The potential victim host has no choice but to respond to any and all connection attempts, or risk turning away valid users along with the malicious ones.
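Why a host has no choice but to answer can be seen from how TCP treats a segment that arrives for a port with no existing connection. The function below is a simplified model of the rules in RFC 793 (section 3.9, for the CLOSED and LISTEN states); it ignores sequence numbers and options, and the function name is hypothetical. A listening port answers a SYN with SYN-ACK, while a closed port answers with a reset (RST), so either way the sender learns something.

```python
from typing import AbstractSet, Optional


def tcp_reply(flags: AbstractSet[str], port_listening: bool) -> Optional[frozenset]:
    """Flags a host sends back for a segment arriving at a port with no
    existing connection (simplified from RFC 793, section 3.9)."""
    if "RST" in flags:
        return None                       # a reset is never answered
    if not port_listening:                # CLOSED: everything else gets a reset
        return frozenset({"RST"}) if "ACK" in flags else frozenset({"RST", "ACK"})
    if "ACK" in flags:                    # LISTEN, but an unexpected ACK
        return frozenset({"RST"})
    if "SYN" in flags:                    # LISTEN and a connection request
        return frozenset({"SYN", "ACK"})
    return None                           # e.g. a lone FIN is silently dropped


cases = [({"SYN"}, True), ({"SYN"}, False), ({"FIN"}, True), ({"SYN", "ACK"}, True)]
for flags, listening in cases:
    state = "listening" if listening else "closed"
    print(sorted(flags), state, "->", tcp_reply(flags, listening))
```

The silent drop of a lone FIN by a listening port, versus a reset from a closed one, is the same asymmetry that other stealthier TCP scan types exploit.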

The second most commonly used protocol on the Internet is the User Datagram Protocol (UDP). Unlike TCP, UDP is a connectionless, unreliable protocol that does not guarantee that communication will be delivered. Thus UDP has no handshake to coordinate communication between parties. Despite this, services offered over UDP can still be queried by malcontents seeking to attack a remote host. When a source host wishes to use a service offered over UDP, it sends a single packet requesting the service from the remote server. If the service is available and understands the request, the remote host responds with the requested data. If the remote server does not offer the requested service, it normally sends back an ICMP port unreachable error message (ICMP is discussed momentarily). Thus, whether or not a service is available, an attacker can usually query a remote server and infer the availability of the service from the response or from its absence, although silence is ambiguous: it may mean an open service that ignored the query or a firewall that dropped it. As with TCP, hosts providing UDP-based services cannot refuse incoming requests or suppress outgoing error messages without also hindering legitimate users.
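The three possible outcomes of a UDP query can be observed from ordinary user-level code on many systems. In the sketch below (a hypothetical helper, not a feature of any particular scanner), a reply means the service is open, an ICMP port unreachable error surfaces as `ConnectionRefusedError`, and silence is reported as the ambiguous “open|filtered”. This relies on the common BSD-socket behavior of reporting ICMP errors to connected UDP sockets, which may differ across platforms and is affected by ICMP rate limiting.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"probe", timeout: float = 1.0) -> str:
    """Classify a UDP port as 'open', 'closed', or 'open|filtered'."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        # Connecting a UDP socket only fixes the peer address (no handshake),
        # but it lets the kernel report ICMP errors from that peer to us.
        s.connect((host, port))
        try:
            s.send(payload)
            s.recv(1024)
            return "open"             # the service answered
        except ConnectionRefusedError:
            return "closed"           # ICMP port unreachable came back
        except socket.timeout:
            return "open|filtered"    # silence: ignored, dropped, or rate-limited
```

The third outcome is why UDP scans are slow in practice: a scanner must wait out the timeout, and often retry, before concluding anything about a silent port.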

The third of the most popular protocols in the TCP/IP suite is the Internet Control Message Protocol (ICMP). ICMP provides a medium for transmitting error and status messages among hosts on the Internet. As seen above, ICMP is used by other protocols to communicate error conditions. ICMP can also be used to solicit information from remote hosts. This information is typically diagnostic in nature and includes timestamp requests, address mask requests, and, most popular of all, the echo request. The echo request, used by the PING program, lets hosts on the Internet discover whether a remote host is available for communication. The reader can quickly see that such a capability lends itself very easily to the attacker’s cause. Disabling ICMP is, of course, a course of action that many networks take. However, there are drawbacks to removing or blocking a protocol designed to help the benign denizens of the Internet, such as longer waits for transmission timeouts and an inability to perform legitimate testing of a remote network to determine whether it requires repair.
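To show how little an echo request involves, the sketch below builds the eight-byte ICMP echo request header described in RFC 792 (type 8, code 0, checksum, identifier, sequence number) followed by an optional payload, and computes the standard Internet checksum (the one’s complement of the one’s complement sum of 16-bit words, per RFC 1071). Actually sending it would require a raw socket and administrator privileges, so the example only constructs the bytes; the function names are hypothetical.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request (code 0)


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP echo request with a valid checksum."""
    # First pack with a zero checksum, then fill in the computed value.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"abcd")
print(packet.hex())
```

A receiver validates the packet by running the same checksum over the whole message, including the checksum field; a correct packet yields zero.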

In fact, it bears mentioning here that scanning/probing in and of itself is not necessarily illegal. In 2000, U.S. District Court Judge Thomas Thrash ruled that merely scanning a foreign host or network was not an illegal act. Judge Thrash decided that scanning violated neither the Georgia Computer Systems Protection Act nor the federal Computer Fraud and Abuse Act, because both acts are designed to protect computers from malicious attack or unauthorized entry, and a scan by its very nature is not an intrusion. The adversary does not need to gain access to the host, but merely needs to query it. To carry on the earlier illustration of the thief, a scan is analogous to rattling the doorknobs or tapping on the windows of a house that a thief is considering breaking into. While this action may forecast the thief’s intent, the action itself is not illegal, and under this ruling neither is its equivalent in the cyber world.

Types of Scans

There are many different types of scans that an attacker can mount against a victim network or host, depending on what information the attacker is seeking and how sneaky he/she wishes to be. They range from the simple and quick half-open SYN scan to the much more complicated and stealthy null-host scan. Other scan types can reveal which operating system (OS) a victim host is running, or search out a specific service to be exploited. This discussion describes a representative of each of these classes of scans. For a more detailed treatise on all of the possible scanning techniques, the reader is encouraged to review [],[], and [].

The half-open SYN scan gets its name from the action it takes. As mentioned earlier, all TCP communications are preceded by the three-way handshake. Since every TCP server must respond to a SYN packet on an open port with a SYN-ACK packet, an adversary can send a SYN packet to any host it wishes to probe, await the SYN-ACK response, and then never send the expected ACK (in practice, the scanning host typically answers the SYN-ACK with a RST instead). By committing to only half of the connection, the attacker discovers what he/she is looking for without spending time, on the platform he/she is launching the attack from, completing the full three-way handshake. This scan type reliably tells an adversary which ports on a given host have services listening. It is very easy to execute and is usually the default scan type in scanning tools. The drawback, from the attacker’s perspective, is that many hosts and firewalls on the Internet log such failed half-open connections to help the local system administrator troubleshoot his/her network. Therefore the predominant users of this type of scan are so-called ‘script kiddies’ and, more recently, malicious code known as worms, which have used this technique to discover new victims automatically. Neither type of adversary is interested in subtlety or stealth; both want to take out targets of opportunity as rapidly as possible, before they get caught. Half-open SYN scans provide a very quick result to the hurried, careless attacker.
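The scanner’s side of a half-open scan reduces to interpreting the single reply to each SYN probe. The sketch below follows the usual interpretation (as documented for Nmap’s SYN scan): SYN-ACK means open, RST means closed, and no answer means the probe was probably filtered. Packet crafting and capture are left out, and the function name is hypothetical.

```python
from typing import AbstractSet, Optional


def classify_syn_probe(reply_flags: Optional[AbstractSet[str]]) -> str:
    """Interpret the reply to one SYN probe the way a half-open scanner does.
    reply_flags is the set of TCP flags in the reply, or None if nothing came back."""
    if reply_flags is None:
        return "filtered"      # no answer: dropped by a firewall, or lost
    if {"SYN", "ACK"} <= reply_flags:
        return "open"          # the scanner then sends RST instead of the final ACK
    if "RST" in reply_flags:
        return "closed"        # the port answered with a reset
    return "unexpected"


for reply in ({"SYN", "ACK"}, {"RST", "ACK"}, None):
    print(reply, "->", classify_syn_probe(reply))
```

A real scanner would also retransmit before declaring a port filtered, since a single lost packet looks the same as a firewall drop.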

The null-host or dumb-host scan (also known as an idle scan) is a much more sophisticated scan requiring three hosts instead of the typical two (attacker and victim). The third host is a live host somewhere else on the Internet that is currently not sending out any other packets; such null-hosts are easy to find in the middle of the night in small church offices, for instance. The basic scenario is as follows. The attacker sends spoofed SYN packets to the victim host that appear to request a connection from the null-host. Spoofed packets are packets sent by one host bearing another host’s IP address, in this case to trick the victim host into communicating with the null-host instead of the attacking host. This makes the scan very stealthy: it will not reveal the attacking host in any logs the victim host may gather, because all communication with the victim host appears to originate from the null-host. If the port or service is available on the victim host, the victim sends a SYN-ACK packet back toward the null-host, and the null-host responds with a ReSeT (RST) packet indicating that it does not wish to communicate with the victim host. If the port or service is not open, the victim host simply sends a RST packet to the null-host, which the null-host drops without response. While these actions are not overly complicated, the final step requires the attacker to observe the IP-ID field of the packets the null-host sends. Many operating systems increment a single global IP-ID counter by one for every packet sent, so the attacker repeatedly probes the null-host (for example with pings) and records the IP-ID in each reply. If the IP-IDs increment one at a time, the null-host sent nothing else in between, and the attacker surmises that the port/service is unavailable. If the IP-IDs increment by some larger interval, the null-host must have sent additional RST packets to the victim, and the port/service is assumed to be available for exploitation. Obviously this type of scan is a much more time-consuming and detail-oriented process. Attackers committing this type of scan are interested in keeping their activities well under the radar of most intrusion analysts, and, given all the complications the typical intrusion analyst faces (described earlier), they generally succeed. Such attackers are typically classified as ‘skilled hackers’ or state-sponsored hackers; either classification implies an attacker who is well trained and patient in achieving his/her goal.
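The inference step of the null-host scan is simple arithmetic on the null-host’s IP-ID values. Assuming the null-host uses a single, sequentially incrementing 16-bit IP-ID counter, the attacker compares the IP-ID seen in a reply just before the spoofed SYN with the one seen just after it. The hypothetical helper below performs that comparison, allowing for wrap-around at 65,536.

```python
def idle_scan_verdict(ipid_before: int, ipid_after: int) -> str:
    """Infer the victim port's state from two consecutive null-host IP-IDs."""
    delta = (ipid_after - ipid_before) % 65536  # IP-ID is a 16-bit counter
    if delta == 1:
        return "closed"        # only the reply to our own probe used an ID
    if delta == 2:
        return "open"          # the null-host also sent a RST to the victim
    return "inconclusive"      # null-host was busy, or its IDs are not sequential


print(idle_scan_verdict(1000, 1001))
print(idle_scan_verdict(1000, 1002))
print(idle_scan_verdict(65535, 1))
print(idle_scan_verdict(1000, 1007))
```

The "inconclusive" case is why the technique needs a genuinely idle host: any unrelated traffic from the null-host inflates the difference and hides the answer.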

The next major type of scan we will examine is the operating system (OS) fingerprinting scan. The standards that all hosts communicating over the Internet should adhere to are defined in the Request For Comments (RFC) document series. These documents lay out very specific details that all OS implementers should follow to ensure that their products can communicate with OSs developed by other parties. Unfortunately, the world’s OS developers rarely follow the RFCs to the letter, and where the RFCs leave choices open, different vendors choose differently. This creates a wide variety of possible responses to the same stimuli, which allows attackers to pinpoint, with an amazing degree of accuracy, the operating system run by practically any host on the Internet. The attacker sends specifically formatted packets, in any of a variety of protocols, to a victim host. If the protocol of choice is TCP, the attacker may send an unusual combination of flags (e.g., SYN-FIN) to which different OSs respond in distinctive, fingerprintable ways. In fact, [ofir arkin] has shown that in some cases a single ICMP packet is enough to determine the OS of a particular host. Most tools make OS fingerprinting very simple, and, depending on the attacker’s patience, it can be carried out either in a loud, obvious manner or in a stealthier manner that is harder to detect. As mentioned before, the adversary willing to wait longer is typically the more skilled and better funded attacker, and is more likely to take the stealthy approach to OS detection. Armed with knowledge of the victim host’s OS, the attacker can ensure that the exploit he/she is about to execute will work against it.
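One of the simplest fingerprinting clues is the IP time-to-live (TTL) value in a reply, because operating systems start from different default TTLs and each router hop decrements the value by one. The toy lookup below rounds an observed TTL up to the nearest common default and maps it to a likely OS family. The table is a deliberate oversimplification for illustration (some systems use other defaults, and defaults can be changed), which is why real tools such as Nmap compare many response features at once.

```python
# Toy table of common default initial TTLs (a deliberate oversimplification).
DEFAULT_TTLS = {
    64: "Linux or another Unix-like system",
    128: "Microsoft Windows",
    255: "a network device such as a router, or some Unix variants",
}


def guess_os_family(observed_ttl: int) -> str:
    """Round the observed TTL up to the nearest common default and look it up."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("observed TTL must be between 1 and 255")
    # Smallest common default that is not below the observed value.
    initial = min(t for t in DEFAULT_TTLS if t >= observed_ttl)
    hops = initial - observed_ttl  # estimated router hops, if the guess is right
    return f"{DEFAULT_TTLS[initial]} (about {hops} hops away)"


for ttl in (57, 117, 250):
    print(ttl, "->", guess_os_family(ttl))
```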

The final major type of scan is the application-specific scan. Using one of the scan types described previously, an attacker can quickly sweep an entire network looking for any host offering a service that the attacker wishes to exploit. These scans are typically very noisy and easy to observe in network traffic. In a log, the intrusion analyst will typically see connection attempts to every IP address in an entire netblock, all on the same port. Hosts offering the service the attacker is looking for respond in the manner described for the previous scans, leaving the attacker with a list of hosts that can all be quickly compromised with the same exploit. This type of scan is usually used by the low-end ‘script kiddie’ described earlier or by a malicious worm of one kind or another.
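Because an application-specific scan touches many hosts on a single port, it is easy to spot with a simple count over connection logs. The sketch below assumes connection records have already been reduced to (source IP, destination IP, destination port) triples and flags any source that contacted at least a threshold number of distinct hosts on the same port. The threshold of 20 and the function name are arbitrary, hypothetical choices, and a practical detector would also apply a time window.

```python
from collections import defaultdict


def find_horizontal_scans(connections, threshold=20):
    """connections: iterable of (src_ip, dst_ip, dst_port) triples.
    Returns {(src_ip, dst_port): distinct destination count} for every source
    that touched at least `threshold` distinct hosts on the same port."""
    targets = defaultdict(set)
    for src, dst, port in connections:
        targets[(src, port)].add(dst)
    return {key: len(dsts) for key, dsts in targets.items() if len(dsts) >= threshold}


# One source sweeps 30 hosts on port 80; another repeatedly visits a single host.
sweep = [("203.0.113.5", f"10.0.0.{i}", 80) for i in range(1, 31)]
busy_client = [("198.51.100.2", "10.0.0.1", 80)] * 50
print(find_horizontal_scans(sweep + busy_client))
```

Counting distinct destinations rather than raw connections keeps a heavy but legitimate client of one server from being mistaken for a scanner.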

Scan Tools

Since almost all scans require multiple packets to be sent to the victim host or network, a plethora of software tools has been developed to automate the task. Some of these tools and their intended uses are described here. The tools vary in sophistication, and any capable attacker can wield them.

All scanning tools share several common features. They are usually very small applications that can be transferred quickly across a network. They typically offer several different types of scanning, as well as the ability to run these scans in parallel. More often than not the tools are designed to be easy to use, and many even come with a graphical user interface (GUI). Some of the best-known tools are Nmap, Nessus, and Grim’s Ping, which we describe in the following section. For more detailed descriptions of other scanning tools, please see [ofir arkin?] and [].

Nmap, short for network mapper, is probably the world’s most famous scanning tool. Developed by Fyodor in the late 1990s, it combines many scan types, an easy-to-use interface, and a fast scanning engine. Nmap can perform eleven different types of scanning, ranging from the noisy full TCP connect scan to the stealthier Null scan. The tool has been ported to just about every OS available, making Nmap available to every Tom, Dick, and Harry on the Internet. Please see [nmap man page] for more information. While this tool is labeled a security scanning tool, which gives it a more benign sound, like most security tools it is much more often used by network attackers than by network defenders.

Another tool that was designed for use by defenders but is most often abused by attackers is the Nessus scanner. Nessus is one in a long line of security scanners originally created to search a network for hosts exposed to a database of known vulnerabilities. Its predecessors SATAN, SAINT, and SARA all fall into the same category: tools designed for the security-aware network administrator, yet co-opted for evil by the network attacker. Nessus has separate client and server components, so a scan can be driven from anywhere on the local network. It could be considered much more dangerous than a tool like Nmap: while Nmap can find services running on remote hosts, its wielder must know what he/she is looking for, whereas Nessus can discover vulnerable hosts without the user having any foreknowledge whatsoever. Nessus is made even easier to use by its prepackaged GUI and graphical report generator. Tools such as these make it easy for even the most casual attacker to scan a network in search of opportunities for mischief.

While tools like Nessus and Nmap show increasing sophistication, a tool that packages remote host scanning as part of an all-in-one exploit tool can be even more dangerous. These tools are more sophisticated in the actions they can take, yet are still quite easy to use. There are plenty of examples of such tools designed to scan an entire network for a specific service to exploit. For our purposes we’ll examine Grim’s Ping. This tool scans networks looking for anonymous File Transfer Protocol (FTP) servers. Once it finds one, it uploads several files to determine whether the FTP server is suitable for further abuse. The tool first uploads a file called 1mbtest, which is about a megabyte in size; this tells the attacker exactly how long it takes to move a megabyte’s worth of data to the victim server. The tool then creates several directories on the FTP server to hold data, typically pirated multimedia files or games. After an attacker has used the victim host for storage, he/she may decide to continue through the five step attack methodology mentioned earlier and upload an exploit to escalate his/her privileges for further nefarious activity.
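Because Grim’s Ping leaves a recognizable trail, a defender could look for it in FTP server logs. This sketch assumes a simplified, hypothetical log format of (client IP, FTP command, argument, duration in seconds) tuples in time order. STOR and MKD are the standard FTP upload and make-directory commands, but the detection rule (an upload of 1mbtest followed by a directory creation) and the 1 MiB file size are assumptions drawn from the description above. The function also reports the upload rate the attacker would have learned from timing the test file.

```python
MARKER_FILE = "1mbtest"   # test file name the text attributes to Grim's Ping
MEGABYTE = 1024 * 1024    # assumption: "about a megabyte" treated as exactly 1 MiB


def flag_grims_ping(ftp_log):
    """Return {client_ip: upload rate in bytes/s} for clients that uploaded the
    marker file and then created at least one directory.

    ftp_log: (client_ip, command, argument, duration_seconds) tuples in time
    order; this is a simplified, hypothetical log format.
    """
    upload_rate = {}  # clients that have uploaded the marker file so far
    flagged = {}
    for client, command, argument, seconds in ftp_log:
        filename = argument.rsplit("/", 1)[-1]
        if command == "STOR" and filename == MARKER_FILE and seconds > 0:
            upload_rate[client] = MEGABYTE / seconds
        elif command == "MKD" and client in upload_rate:
            flagged[client] = upload_rate[client]
    return flagged


log = [
    ("192.0.2.7", "STOR", "/incoming/1mbtest", 8.0),
    ("192.0.2.7", "MKD", "/incoming/.stash", 0.0),
    ("192.0.2.8", "STOR", "/incoming/report.pdf", 3.0),
]
print(flag_grims_ping(log))  # {'192.0.2.7': 131072.0}
```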

Analyzing NIDS Alerts

Generic NIDS Description

A network intrusion detection system (NIDS) is typically deployed as an appliance at a chokepoint within the network. An appliance is a computer that performs a single purpose, and a chokepoint is a place in the network through which a large share of the traffic must flow. The most common chokepoint for a NIDS is the gateway between the local LAN/MAN and the Internet. Positioning the NIDS at this point is a double-edged sword. On one hand, this location gives the NIDS the broadest possible view of the traffic flowing into and out of the network; if properly configured, the NIDS can detect any attack entering or leaving the network. On the other hand, the NIDS is exposed to a massive amount of traffic that it must sift through to find the malicious activity. As mentioned in the introduction, a NIDS may have to process anywhere from roughly sixteen to more than six thousand gigabytes of data a day. With such a high volume of data passing before it, a NIDS can often misidentify benign traffic as malicious or fail to notice malicious traffic passing right in front of it.
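The daily volumes quoted above follow from simple unit conversion. This sketch assumes the link runs at full capacity around the clock and uses decimal units (1 GB = 1,000 MB); the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day(megabits_per_second: float) -> float:
    """Convert a sustained link rate in Mb/s to decimal gigabytes per day."""
    megabytes_per_second = megabits_per_second / 8  # 8 bits per byte
    return megabytes_per_second * SECONDS_PER_DAY / 1000


for name, rate in [("T1", 1.544), ("T3", 45), ("OC-3", 155), ("OC-12", 622)]:
    print(f"{name:6} {rate:>7} Mb/s -> {gb_per_day(rate):8.1f} GB/day")
```

For a fully loaded T1 this gives about 16.7 GB/day and for an OC-12 about 6,718 GB/day, which brackets the range quoted above.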

Before we study the problem of these misidentifications, it will be instructive to gain a basic understanding of what a NIDS is designed to do. Two generic types of NIDS are available to both the research and commercial communities: signature based and anomaly based systems. A signature based NIDS examines each packet it encounters, searching for attributes or features that match any of a wide set of signatures on file within the NIDS. To process a packet, the NIDS must pick it off the wire and dissect it according to its protocol specifications. Once the packet is dissected, the NIDS compares its features against the list of signatures in its database. If any part of the packet matches a signature, an action is taken according to the rules of that signature. The most common action is the alert or alarm action, which publishes a message either to a local file system or to a remote alarm console. The message typically contains the name of the signature matched, the parties involved in the communication (identified by IP address or DNS name), and the time of the infraction. Most tools also provide a link from the alert to the offending traffic that caused the alert to ‘fire’. An example of a signature a NIDS could search for is any ICMP packet containing the string ‘skillz’. Such packets are attributed to the Stacheldraht Distributed Denial of Service (DDoS) tool and indicate that an attacker is trying to build a DDoS network using hosts from your network. If the NIDS detected a packet matching this signature, it would immediately raise an alarm, prompting the analyst to investigate the matter further.
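The core of signature matching can be sketched in a few lines. This is not Snort’s rule language or engine: the Packet and Signature types, the rule name, and the example addresses are hypothetical, and a real NIDS decodes headers itself and uses much faster multi-pattern search. The sketch only shows the idea of checking each decoded packet against every signature on file, using the ‘skillz’ example above.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    protocol: str   # e.g. "icmp", "tcp", "udp"
    src: str
    dst: str
    payload: bytes


@dataclass
class Signature:
    name: str
    protocol: str
    content: bytes  # byte string that must appear somewhere in the payload


def match(packet, signatures):
    """Return an alert string for every signature the packet matches."""
    alerts = []
    for sig in signatures:
        if sig.protocol == packet.protocol and sig.content in packet.payload:
            alerts.append(f"{sig.name}: {packet.src} -> {packet.dst}")
    return alerts


# Hypothetical rule modeled on the example in the text.
rules = [Signature("DDOS Stacheldraht skillz", "icmp", b"skillz")]
pkt = Packet("icmp", "198.51.100.20", "10.1.2.3", b"\x00\x00skillz\x00")
print(match(pkt, rules))  # ['DDOS Stacheldraht skillz: 198.51.100.20 -> 10.1.2.3']
```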

The second generic type of NIDS is the anomaly based system. These systems focus on anomalous patterns in the flow of data passing by the sensor. Typically an anomaly based system must be given a profile of what ‘normal’ traffic looks like on a particular link before it can be put to use. Depending on the sophistication of the NIDS, profiles can be built for the individual hosts it is designated to protect, for specific ports and the traffic flowing to or from them, or for specific protocols. Whenever activity falls outside the profile the NIDS has been given, it publishes an alert. Like the signature based NIDS, the anomaly based NIDS delivers these alerts either to a local file system or to a remote analysis console, usually at the analyst’s discretion. For example, if host x.x.x.x normally receives no connections to its telnet service from outside the local network, then any host on the outside Internet seeking a connection to x.x.x.x on port 23 would raise an alarm. As with signature based systems, such an alert would lead the intrusion analyst to investigate a possible intrusion attempt against x.x.x.x.
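A profile of the kind described above can be as simple as the set of (host, port) pairs that normally receive connections from outside. This sketch assumes the local network is 10.0.0.0/8 and that a training period of known-good traffic is available; both assumptions and the function names are illustrative, and real anomaly based systems build far richer statistical profiles.

```python
import ipaddress

LOCAL_NET = ipaddress.ip_network("10.0.0.0/8")  # assumption: the protected network


def is_internal(ip: str) -> bool:
    return ipaddress.ip_address(ip) in LOCAL_NET


def build_profile(training_connections):
    """Learn which (host, port) pairs normally receive connections from outside."""
    return {
        (dst, port)
        for src, dst, port in training_connections
        if not is_internal(src)
    }


def check(connection, profile):
    """Return an alert string if an external connection falls outside the profile."""
    src, dst, port = connection
    if not is_internal(src) and (dst, port) not in profile:
        return f"anomaly: {src} -> {dst}:{port} is outside the profile"
    return None


training = [("203.0.113.5", "10.0.0.7", 80), ("10.0.0.9", "10.0.0.7", 23)]
profile = build_profile(training)
print(check(("198.51.100.1", "10.0.0.7", 23), profile))  # external telnet: alert
print(check(("198.51.100.1", "10.0.0.7", 80), profile))  # normal web traffic: None
```

Internal telnet use during training does not make port 23 ‘normal’ for outside sources, which mirrors the x.x.x.x example above.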

In either type of NIDS, the alerting or logging engine typically reports alerts on a graduated scale of severity, often low, medium, and high. Events placed in the low category are typically scans and probes, or connection attempts from the outside Internet to interactive services on the local network. Events in the medium bucket are usually things like viruses transiting the gateway or violations of company policy, such as viewing child pornography. The high category is typically reserved for ongoing or highly destructive malicious events; packets that contain buffer overflows or are part of a DDoS attack generally garner a high rating. This system of prioritization is a double-edged sword. First, alerts are typically classified by the manufacturer of the NIDS, who may or may not share your concerns about particular types of malicious traffic and their impact on your network. Second, the intrusion analyst becomes desensitized to the lower level alerts. Due to the high volume of alerts published by most NIDS, the analyst will generally concern him/herself only with the highest priority alerts, usually ignoring the alerts that are most likely to help him/her predict attacks and subsequently defend the network.
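One way around vendor-assigned priorities is a local override table that re-ranks alerts according to the site’s own concerns. The alert names and severities below are hypothetical; the sketch only shows the site’s table being consulted before the vendor’s.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}

# Hypothetical vendor defaults.
VENDOR_SEVERITY = {
    "scan-probe": "low",
    "policy-violation": "medium",
    "buffer-overflow": "high",
}

# Hypothetical site overrides: this site treats scans as early warnings.
LOCAL_OVERRIDES = {"scan-probe": "medium"}


def local_severity(alert_name: str) -> str:
    """Site override first, then the vendor default, then 'low'."""
    return LOCAL_OVERRIDES.get(alert_name, VENDOR_SEVERITY.get(alert_name, "low"))


def triage(alert_names):
    """Sort alerts from most to least severe under the local policy."""
    return sorted(alert_names, key=lambda name: SEVERITY_RANK[local_severity(name)], reverse=True)


print(triage(["scan-probe", "buffer-overflow", "unknown-event"]))
# ['buffer-overflow', 'scan-probe', 'unknown-event']
```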

Regardless of their alert prioritization scheme, the two types of NIDS have other advantages and disadvantages depending on how the technology is applied. Perhaps the greatest advantage of an anomaly based NIDS is its ability to discover never-before-seen attacks that violate the normal profile of the network it protects. Anomaly based NIDS are frequently deployed in high bandwidth environments because they can sample the traffic and check the samples for conformity to a predetermined profile. Such systems are great at discovering when a Denial of Service (DoS) attack is targeting your network, but they often fall short when quieter, more subtle attacks are levied against the network. For instance, a single virus or a small buffer overflow will pass right through an anomaly based NIDS with nary a peep, because such attacks typically fit within the established profile and do not trigger an alert. Often the anomaly based NIDS won’t discover such an attack until stage five of the five step hacker methodology begins and the victimized host starts scanning for its next victim.
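The sampling approach mentioned above can be sketched as systematic 1-in-N sampling followed by a rate check per destination. The sample interval and the packets-per-second threshold are hypothetical values, and the input is simplified to the list of destination IPs seen during one time window.

```python
from collections import Counter


def sampled_rate_alerts(packets, window_seconds, sample_every=100, threshold_pps=10_000):
    """Estimate per-destination packet rates from a 1-in-N sample and flag likely floods.

    packets: destination IPs observed during one window, in arrival order.
    Returns {dst: estimated packets per second} for destinations above the threshold.
    """
    sampled = Counter(dst for i, dst in enumerate(packets) if i % sample_every == 0)
    alerts = {}
    for dst, count in sampled.items():
        estimated_pps = count * sample_every / window_seconds  # scale the sample back up
        if estimated_pps > threshold_pps:
            alerts[dst] = estimated_pps
    return alerts


traffic = ["10.0.0.1"] * 200_000 + ["10.0.0.2"] * 1_000
print(sampled_rate_alerts(traffic, window_seconds=10))  # {'10.0.0.1': 20000.0}
```

Only one packet in a hundred is inspected, which is why the flood stands out while a single small exploit packet would most likely never be sampled at all.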

The signature based NIDS has its own advantages and disadvantages. Its greatest strength is the ability to discover stealthy attacks that may involve only a few packets. Because a signature based NIDS is designed to examine every packet that comes down the wire, it can frequently catch quiet attacks that match a predetermined signature. Signature based NIDS are also quite good at enforcing company policy: signatures guarding against pornography or similar abuses can be written to alert network administrators to the misuse of company resources. Signature based NIDS can also discover attacks while they are ongoing, giving the network defender the opportunity to stop an attack quickly before it can spread. Unfortunately, signature based NIDS have two gaping holes that many attackers slip through unnoticed. The first is their inability to detect never-before-seen exploits: since a signature based NIDS needs a signature before it can look for malicious traffic, any traffic that does not match a known signature will sneak right past it. The second is the high traffic volume problem. Because each packet must be examined to decide whether it is malicious, a NIDS can quickly get bogged down and begin dropping packets without analyzing them if the traffic volume grows too large. At that point even malicious content that matches a signature the NIDS has on board will be missed, and attackers will be able to have their way with the target network.

Of course, the greatest difficulty a network defender faces when relying on a NIDS as his/her frontline attack detection tool is false positives and false negatives. False positives are the most common ailment an intrusion analyst suffers from. These are alerts in which the NIDS claims to have detected malicious traffic within a data stream, but which a human, on close analysis, proves to be false. False positives can require an inordinate amount of time to ‘chase down’ before the intrusion analyst is assured that the alert is not a true positive, and during this time any true positives the NIDS publishes may go unnoticed. A second major problem is the gradual acclimation an alarm analyst develops the longer he/she interprets a NIDS log full of false positives. Imagine a boy crying wolf ten times a day and you get the general idea.

Equally dangerous is the false negative. False negatives are alerts that never fire because the NIDS examines traffic and decides that it is not malicious even though it actually is. They are nearly impossible to avoid, because detection signatures can never be absolutely perfect. Because these alarms are never published, the analyst cannot deal with their impact; he/she does not even know they are there. The best remedy for both false positives and false negatives is to construct intrusion signatures and anomaly profiles that are as accurate as possible. Axelsson [] argues that unless the false positive rate is below 1 in 100,000 ‘events’, a NIDS is nearly impossible to manage effectively, because the humans watching it will quickly become desensitized to the alerts that could have revealed truly malicious activity. Here an event is a transaction that is either malicious or benign.
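The reasoning behind a limit like Axelsson’s is the base-rate fallacy, which Bayes’ theorem makes concrete: when intrusions are rare, even a small false alarm rate means most alarms are false. This sketch computes the probability that an alarm reflects a real intrusion; the base rate and the rates below are hypothetical values chosen for illustration, not figures from Axelsson’s paper.

```python
def p_intrusion_given_alarm(base_rate, detection_rate, false_alarm_rate):
    """Bayes' theorem: P(I | A) = P(I)P(A | I) / (P(I)P(A | I) + P(not I)P(A | not I))."""
    true_alarms = base_rate * detection_rate
    false_alarms = (1 - base_rate) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical: 1 event in 100,000 is malicious and every intrusion is detected.
for fa_rate in (1e-3, 1e-5):
    p = p_intrusion_given_alarm(1e-5, 1.0, fa_rate)
    print(f"false alarm rate {fa_rate:g}: P(intrusion | alarm) = {p:.4f}")
```

With a false alarm rate of 1 in 1,000, fewer than 1 alarm in 100 is real; only when the rate falls to about 1 in 100,000 does roughly half of the alarms point to a real intrusion.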

Brief Description of SNORT

For the purposes of this thesis, the researcher had full access to the Snort logs of a major university. Since the output of this NIDS was examined as part of this thesis, some details of how Snort works are in order.

Snort is the product of Marty Roesch’s mission to create an open-source, lightweight NIDS. It has become wildly popular and is now deployed at countless organizations of every size. Although it was not designed specifically for high bandwidth environments, Snort performs well even at high traffic ingestion rates because it uses a simple but fast decoder and detection engine. The basic Snort architecture has three main parts: the packet decoder, the detection engine, and the logging and alerting system. The packet decoder is built on libpcap and can capture TCP/IP traffic at a blinding rate. The packet data is decoded layer by layer up through the OSI model until it is passed to the detection engine. Before the engine compares any of the signatures in its database to the packets, the packet data passes through a number of user-configurable preprocessors. These preprocessors can reassemble TCP packets into sessions, handle fragmented traffic, and even detect scans and probes. After the preprocessors have formatted the packet data to make it easier to search, the detection engine examines the data for contents that match any of the signatures in its database. If a signature matches, the action the signature describes is carried out by Snort’s third component, the logging and alerting system. This system captures the data related to the alert and stores it on the hard drive. It also publishes alerts either to an area of the file system for examination by the intrusion analyst or to a remote analysis console through syslog or SMB messages.
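The decoder, preprocessor, detection, and output stages described above can be modeled as a simple pipeline. This is a toy sketch, not Snort’s code: the pipe-delimited ‘raw’ packet format, the URI-decoding preprocessor (standing in for the kind of normalization preprocessors perform), and the rule are all hypothetical. It shows why normalization happens before detection: an encoded request only matches the rule after it has been decoded.

```python
from urllib.parse import unquote


def decode(raw: str) -> dict:
    """Toy decoder: 'proto|src|dst|payload' -> fields (stands in for capture plus protocol decoding)."""
    proto, src, dst, payload = raw.split("|", 3)
    return {"proto": proto, "src": src, "dst": dst, "payload": payload}


def uri_decode(packet: dict):
    """Toy preprocessor: undo %-encoding so rules see the normalized text."""
    packet["payload"] = unquote(packet["payload"])
    return packet


def run(raw_packets, preprocessors, rules):
    """Pass each packet through decode -> preprocessors -> detection -> output."""
    alerts = []  # output stage: collected alert lines
    for raw in raw_packets:
        packet = decode(raw)
        for preprocess in preprocessors:
            packet = preprocess(packet)
            if packet is None:  # a preprocessor may hold a packet back (e.g. awaiting reassembly)
                break
        if packet is None:
            continue
        for rule in rules:  # detection engine
            if rule["proto"] == packet["proto"] and rule["content"] in packet["payload"]:
                alerts.append(f"{rule['msg']} {packet['src']} -> {packet['dst']}")
    return alerts


rules = [{"msg": "WEB passwd access", "proto": "tcp", "content": "/etc/passwd"}]  # hypothetical rule
traffic = ["tcp|198.51.100.21|10.1.2.4|GET /cgi-bin/..%2F..%2Fetc%2Fpasswd HTTP/1.0"]
print(run(traffic, [uri_decode], rules))  # ['WEB passwd access 198.51.100.21 -> 10.1.2.4']
print(run(traffic, [], rules))            # [] (the encoded request slips past)
```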

Of particular interest to this research is the scan preprocessor known as portscan2. Authored by Jed Haile, this preprocessor works with the stream4 and conversation preprocessors to perform a degree of stateful portscan detection. It lets the NIDS administrator configure the maximum number of scanners and targets tracked, the target and port limits, and a timeout. By adjusting settings such as the number of targets a foreign IP may connect to within the timeout, the administrator can make portscan2 detect portscans at a finer or coarser granularity. The administrator must then choose between a setting that picks up both small and large portscans, and therefore publishes more alerts, and one that picks out only much larger scans, and therefore publishes fewer. This decision is a difficult one, because the two sizes of scans indicate malicious intent from two different groups of people, both equally damaging.
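The trade-off between target limits, port limits, and the timeout can be illustrated with a sliding-window counter per source. This sketch is loosely modeled on the settings described above but is not portscan2’s implementation; the class, parameter names, and default values are hypothetical, and it omits the caps on the number of scanners and targets tracked that a real preprocessor needs to bound memory.

```python
from collections import defaultdict, deque


class ScanDetector:
    """Flag a source that contacts too many hosts or ports within `timeout` seconds."""

    def __init__(self, target_limit=5, port_limit=20, timeout=60):
        self.target_limit = target_limit
        self.port_limit = port_limit
        self.timeout = timeout
        self.events = defaultdict(deque)  # src -> deque of (time, dst, port), oldest first

    def observe(self, t, src, dst, port):
        """Record one connection attempt; return an alert string or None."""
        window = self.events[src]
        window.append((t, dst, port))
        while window and t - window[0][0] > self.timeout:  # expire old events
            window.popleft()
        targets = {d for _, d, _ in window}
        ports = {p for _, _, p in window}
        if len(targets) > self.target_limit or len(ports) > self.port_limit:
            return f"portscan: {src} ({len(targets)} hosts, {len(ports)} ports within {self.timeout}s)"
        return None


detector = ScanDetector(target_limit=3, port_limit=10, timeout=60)
for t, host in enumerate(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]):
    print(detector.observe(t, "203.0.113.9", host, 22))
# None three times, then an alert once a fourth host is touched within 60 s
```

Lowering the limits or lengthening the timeout catches smaller and slower scans at the cost of more alerts, which is the trade-off the administrator faces.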

As mentioned previously, Snort’s logging/alerting subsystem is responsible for publishing alerts to the intrusion analyst. Depending on the source of the alert, it publishes alerts to different files within the local file system or to the remote analysis console. Typically the subsystem publishes its alerts to the /var/log/snort/alerts file. A new alerts file is started each day, and it can contain various levels of detail. On a high bandwidth network like UMBC’s, Snort publishes only the time and date of the infraction, the alert title, and the actors involved in the alert. The logging/alerting subsystem also takes input from any of the preprocessors bundled with Snort. In our case, the portscan2 preprocessor reports alerts through the logging/alerting subsystem directly to the /var/log/snort/scans file. As with the alerts file, a new scans file is started each day, so scan alerts published on previous days are not overwritten.
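Because the daily alerts file records only the time and date, the alert title, and the actors, each line can be split into those fields with one regular expression. The line format assumed here is a simplified, hypothetical one resembling Snort’s one-line alert output (‘[**]’ around the title, ‘source -> destination’ at the end); a real deployment would adjust the pattern to its actual output format.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical line format: "MM/DD-HH:MM:SS[.ffffff] [**] title [**] src[:port] -> dst[:port]"
ALERT_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+"
    r"\[\*\*\]\s+(?P<title>.+?)\s+\[\*\*\]\s+"
    r"(?P<src>[\d.]+)(?::\d+)?\s+->\s+(?P<dst>[\d.]+)(?::\d+)?\s*$"
)


class Alert(NamedTuple):
    stamp: str
    title: str
    src: str
    dst: str


def parse_alert(line: str) -> Optional[Alert]:
    """Split one alert line into its fields; return None if it does not match."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    return Alert(m["stamp"], m["title"], m["src"], m["dst"])


print(parse_alert("11/05-14:02:31.123456 [**] SCAN probe example [**] 203.0.113.9 -> 10.1.2.3"))
```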

Parsing Logs and Comparing Scan Logs to Alert Logs

Once the logs are generated by a NIDS, what can an intrusion analyst do with them? More often than not, an intrusion analyst uses the alerts and traffic captures provided by the NIDS as pointers for further research. For example, before the campus network defender can go kick down the door of some unsuspecting student, the defender needs to gather all the facts and assemble a case. This usually requires some research into the type of exploit being alerted on, the history of the victim host, and its role on the network. As the reader may surmise, this type of analysis is time consuming. In fact, performing all of it in real time is nearly impossible, especially when the network being defended is Class B (65,536 hosts) in size, as is the case at most major universities and large companies.

An intrusion analyst’s greatest accomplishment is detecting an attack in progress and stopping it before more damage can occur. This requires the analyst to be somewhat predictive, or at least finely attuned to indications and warnings of a pending attack. The best way to be proactive is to do as much analysis as possible on every incoming datagram, which provides the best possible intelligence about what is happening on the network at any given time. Unfortunately, the more processing a packet goes through, the longer it takes for the results to reach the analyst. If the processing time exceeds some acceptable limit, the analyst can no longer claim to be performing real-time analysis, and his/her value is significantly reduced. The only way to mitigate this problem is to redefine what real time means. Typically this definition is refined by weighing the manpower required to do the analysis, the cost of paying the analysts and providing the equipment and software they need, and the criticality of what the organization is trying to protect. For example, a small church office with a limited budget can probably survive with real time defined as several hours or days, but a large bank must define real time in minutes or risk a massive breach and the loss of money and credibility.

Many commercial NIDS come with tool suites that help the analyst stay as close to real time as possible, through visualization tools, alert correlation, and statistical summaries of incoming alarms. In the opinion of the author, the best tool would be the one most likely to sift through all the alarms being generated and present to the analyst only the alerts most likely to relate to actual malicious traffic. This can be accomplished by parsing the alert logs generated by the NIDS to find the attackers and victims involved in scanning activity. Since all network attacks are preceded by network scanning, an adversary has to perform this precursor action before he/she can actually attack the network. If a NIDS analyst wants to maximize his/her chances of finding malicious activity, he/she should be given a tool that uncovers the recently scanned hosts and then looks for alerts whose source or destination IP address matches one of those hosts. As noted in [honeynet project paper], these types of indications and warnings can be used to forecast incoming attacks with some degree of confidence. A tool that provided such indications could drastically shorten an intrusion analyst’s working definition of real time by presenting only alerts the analyst could actually act on, thereby reducing wasted time and allowing him/her to evaluate an alarm set more quickly and generate a report or take some other action relating to the incident.
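The scan-to-alert correlation proposed above can be sketched as follows. The sketch assumes both logs have already been parsed into tuples with numeric timestamps in seconds; the function name and the one-day default for ‘recently scanned’ are hypothetical choices, and the alert titles in the example are placeholders.

```python
from bisect import bisect_right
from collections import defaultdict


def correlate(scans, alerts, window_seconds=86_400):
    """Keep alerts whose source or destination was involved in a scan shortly before.

    scans:  iterable of (time, scanner_ip, target_ip)
    alerts: iterable of (time, title, src_ip, dst_ip)
    """
    scan_times = defaultdict(list)  # ip -> sorted times it appeared in a scan
    for t, scanner, target in scans:
        scan_times[scanner].append(t)
        scan_times[target].append(t)
    for times in scan_times.values():
        times.sort()

    def scanned_recently(ip, t):
        times = scan_times.get(ip, [])
        i = bisect_right(times, t)  # times[:i] are scans at or before t
        return i > 0 and t - times[i - 1] <= window_seconds

    return [a for a in alerts if scanned_recently(a[2], a[0]) or scanned_recently(a[3], a[0])]


scans = [(1_000, "203.0.113.9", "10.1.2.3")]
alerts = [
    (5_000, "exploit attempt", "203.0.113.9", "10.1.2.3"),   # follows the scan: kept
    (5_000, "unrelated alert", "198.51.100.7", "10.9.9.9"),  # no scan involvement
    (500, "before the scan", "203.0.113.9", "10.1.2.3"),     # precedes the scan
    (91_000, "too late", "203.0.113.9", "10.1.2.3"),         # outside the one-day window
]
print(correlate(scans, alerts))  # only the first alert survives
```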

The best way to tackle this problem is, of course, to write software that automates this repetitive, labor intensive process. Human beings are primarily visual creatures: we thrive by analyzing visual data and formulating actions and reactions based on past experience. To capitalize on this predisposition, many developers have tried to create visualization tools to help intrusion analysts interpret intrusion data. Unfortunately, these tools only serve as a distraction to a skilled intrusion analyst. They frequently cannot model the massive amount of traffic passing through a large gateway; the difficulty arises when the tool tries to render the millions of different connections that may exist at a large gateway at any given time. By the time the rendering completes, several minutes (typically on the order of 20 to 40) have usually passed, leaving all hope of real-time analysis in the dust. Even if a visualization tool could render all the data an analyst needs to draw a rational conclusion about a particular attack sequence, the small, highly dangerous malicious events are usually lost in the noise of a crowded display. A better solution is a tool that doesn’t waste time rendering a pretty picture, but instead parses the NIDS logs quickly and draws logical conclusions from the available data that the intrusion analyst can act upon with confidence.

This tool could be written in several languages, including PERL, C, and possibly XSB. PERL would probably be the best choice, because it is a scripting language designed to parse text quickly and report on the parsed information. Most NIDS dump their alerts to some form of text file, either for later processing by other tools or for visual inspection by the intrusion analyst. These alerts can be processed in near real time to give a network defender a glimpse of what might be coming down the pipe in the next days, weeks, or even months, depending on the sophistication of the scan being carried out.

C would also perform quite well in this scenario; however, there are some drawbacks to using C in this context. C was not designed to be a text parsing language, so any such functionality would have to be developed from scratch or assembled from a library that might not be very efficient. On the other hand, because C is a compiled language rather than an interpreted one, it is plausible that a highly optimized C program could do the work of a more complicated PERL program in less time.

Lastly, performing scan alert analysis with a language like XSB puts an interesting spin on the entire process. XSB is a logic programming language modeled after Prolog. It extends Prolog with features such as HiLog’s higher-order syntax and, most importantly, tabling, which saves the solutions to subgoals in a table as they are evaluated. Tabling allows XSB programs to run very quickly and to return answers to queries that would never have been found (for example, because of infinite recursion) if the code and query had been written in standard Prolog. NIDS like Snort can export their alert data in XML format, and it is only a hop and a skip to translate XML into RDFS or DAML+OIL documents, which could then be reasoned over with software written in XSB. Such a scenario could further reduce the analyst’s working definition of real time. Not only could it save the analyst time, but a logic based solution could also lend more legitimacy to an analyst’s claim that an event had actually taken place. This is important because intrusion analysts are frequently disregarded when they present their analysis to network and system administrators who are loath to admit an oversight on their part.

Perhaps the best solution would be one that ties all of these programming languages together. Such a solution could play on the strengths of some of its parts while masking the potential drawbacks of other parts at the same time. This uber tool would probably be written in C and designed to call PERL functions for text parsing, and XSB functions for logic evaluations. Unfortunately development and deployment of such a tool is beyond the scope of this current research.

Predictive Analysis / Attack Forecasting

Today’s fast-paced Internet is still very much like a scene out of an old-fashioned WWII movie: the good guy who can predict when the bad guy is going to come around the corner wins every time. Unfortunately, in the digital world it’s not quite as easy as watching for tanks coming over the hill. Nonetheless, as we have seen, all computer attackers have to follow a generic five step attack plan if they want to be successful. Network defenders can take advantage of their adversaries’ adherence to this plan by examining the phases of a network attack in progress to predict when the next stage is most likely to occur.

There are generally two approaches to predictive analysis: raw statistical analysis and advanced data mining. While data mining would generally be the computer scientist’s first choice, these techniques often fall short when real-time analysis is required. Papers like [data mining for net id] suggest a rather rigorous procedure for discovering previously unknown attacks in both test data and live traffic, but make no claims about how long such algorithms and techniques take to complete their analysis. Often the first approach, using simple statistical evaluations of NIDS logs to find the most common scanners and victims in a particular time period, can yield a trove of predictive, false positive reducing information.
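The simple statistical approach can be as basic as counting how often each scanner and each victim appears in a period’s scan alerts. This sketch assumes the scan log has already been reduced to (scanner IP, victim IP) pairs; the function name and the n parameter are illustrative.

```python
from collections import Counter


def top_talkers(scan_records, n=3):
    """Return the n most frequent scanners and the n most frequent victims.

    scan_records: iterable of (scanner_ip, victim_ip) pairs from one time period.
    """
    scanners = Counter()
    victims = Counter()
    for scanner, victim in scan_records:
        scanners[scanner] += 1
        victims[victim] += 1
    return scanners.most_common(n), victims.most_common(n)


records = [
    ("203.0.113.9", "10.0.0.5"),
    ("203.0.113.9", "10.0.0.6"),
    ("198.51.100.4", "10.0.0.5"),
    ("203.0.113.9", "10.0.0.5"),
]
top_scanners, top_victims = top_talkers(records)
print(top_scanners)  # [('203.0.113.9', 3), ('198.51.100.4', 1)]
print(top_victims)   # [('10.0.0.5', 3), ('10.0.0.6', 1)]
```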

Most prediction techniques in use today work by looking at patterns of previous events in an attempt to predict the future. The weatherman consults his almanac; the stock picker consults his performance charts for the last 36 months. Why shouldn’t the network defender be able to examine his/her recently published scan alerts to see when an attack could be coming and against what set of services? Unfortunately, even the simplest ideas can break down when faced with complicated details, and the low & slow scan and subsequent attack is one of those details. Low and slow scans are a class of scan that takes place over such a long period of time that the evidence of its occurrence is easily lost in the noise of normal, legitimate traffic. On small networks (say, fewer than 50 computers) it might be possible to keep the NIDS logs for an extended period of time. On a huge network, however, it is nearly impossible to keep all the records needed to figure out who is perpetrating a low & slow scan and against what. Summaries, or records that capture particularly weird traffic, could help track some of these scans, but it would still be difficult, because on a large network the amount of ‘weird’ traffic is often measured in megabytes per day.
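The summarization idea above can be sketched by keeping, for each day, only the set of distinct (destination, port) pairs each source touched, then merging those compact summaries over a long window. A source that never looks busy on any single day but accumulates many distinct targets over weeks is a low & slow candidate. The thresholds and function names are hypothetical.

```python
from collections import defaultdict


def daily_summary(records):
    """Collapse one day's (src, dst, port) records into src -> set of (dst, port)."""
    summary = defaultdict(set)
    for src, dst, port in records:
        summary[src].add((dst, port))
    return summary


def low_and_slow(daily_summaries, daily_limit=5, total_limit=20):
    """Flag sources that never exceed daily_limit on a single day but whose distinct
    (dst, port) pairs across all retained days exceed total_limit."""
    totals = defaultdict(set)
    peak = defaultdict(int)
    for summary in daily_summaries:
        for src, pairs in summary.items():
            totals[src] |= pairs
            peak[src] = max(peak[src], len(pairs))
    return sorted(
        src for src, pairs in totals.items()
        if peak[src] <= daily_limit and len(pairs) > total_limit
    )


# 30 days: one source probes a new port each day; another host repeats normal web traffic.
days = [
    daily_summary([("203.0.113.9", "10.0.0.5", 1000 + d), ("192.0.2.1", "10.0.0.80", 80)])
    for d in range(30)
]
# A noisy one-day scan is left to the ordinary scan detectors.
days.append(daily_summary([("198.51.100.4", "10.0.0.6", p) for p in range(50)]))
print(low_and_slow(days))  # ['203.0.113.9']
```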

Reducing false positives by focusing on alerts related to previously discovered scans is the best possible avenue for predicting when an attack is coming and what form it might take. Often even the skill level of the attacker, and therefore the likely timing of the attack, can be determined by examining the amount and type of incoming scanning. As mentioned before, NIDS analysts are used to analyzing only the highest priority alarms. Wouldn’t a NIDS that automatically raised the priority of any alert with a previously associated scan serve this purpose? It would produce a reduced set of alerts with a higher likelihood of representing real attacks, rewarding the NIDS analyst with less data to cover and a better chance of catching a bad guy.
