Beating Ping: A Bandwidth Measurement Exploit



A Preliminary Study of an Exploit-based Bandwidth/Network Analysis Framework

Master’s Project

Matthew Weaver

University of Colorado at Colorado Springs

April 2007

“I can’t be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on.” – Donald Knuth

Abstract

Background: Traceroute and Ping are the two most common tools for basic bandwidth analysis and network troubleshooting. However, they only work in networks where every node strictly adheres to the network protocol standards maintained by the IETF (Internet Engineering Task Force). An exploit, on the other hand, could be used to infect machines on a network and provide metrics while overcoming the problems inherent in the traditional approach.

Results: After examining real-world exploits, we devised a similar Windows service and exploited it with a buffer overflow. This lets us infect the service and perform our testing.

Conclusion: Limited success on Windows platforms indicates that this is a feasible hypothesis: one can measure bandwidth using an exploit. However, real world conditions will prevent this from ever being used outside of an academic environment.

Table of Contents

A Preliminary Study of an Exploit-based Bandwidth/Network Analysis Framework

Abstract

Preface

Disclaimer

Introduction

Exploits

Worms

The Buffer Overflow Exploit

Network Protocols

Sockets

Winsock

COM in Windows

Doing some Recon

Hypothesis

Related Work

A Framework for Bandwidth Testing

Implementation

Results

Network Scenario 1

Network Scenario 2

Network Scenario 3

Results Analysis

Conclusion

Future Work

References

Demonstration

Preface

Writing shell code for Windows is a very dangerous proposition; I crashed a lot of machines before I simplified my approach. You have to hand it to those who are willing to spend hours perusing and testing Windows services, looking for flaws, not to mention writing and testing shell code and exploits.

I am certainly not done with this theory. I am not sure whether I would continue to pursue it in an academic setting, but I am still convinced I can improve on it. That kind of thinking is what makes this all take so long.

This project has produced the sniFF framework, which can be used standalone. However, due to the ever-changing nature of exploits, it is recommended that further research be done with the Metasploit framework[1].

All that said, I appreciate the help and cooperation of all of my friends, family, coworkers, and especially Dr Chow. At least reading this paper should help you sleep.

Matt Weaver

Spring 2007

Disclaimer

This paper discusses the creation of worms, the generation of shell code, and Windows x86 assembly. All discussion and included code are intended for educational purposes only. The author is not responsible for any damage done to networks or network systems.

As an aside, I urge caution with using the included exploit. It proved to be extremely unstable in testing.

Introduction

Computer viruses date back to the early days of UNIX. In the late seventies and early eighties, network administrators at MIT used worms to communicate and accomplish tasks on the network. Some of these early worms did fairly mundane tasks: backing up systems, sending messages, etc. In the late eighties, with the rise of networks (including the various incarnations of DARPAnet) vulnerabilities were bound to arise.

Modern computer systems run many network based services. A typical install of Windows XP runs services to check for updates with Microsoft servers around the world, network services to keep the IP address up to date, inter operating system messaging for asynchronous processing, and a host of other potential “holes”. Each update to these systems or additional products installed creates more potential vulnerabilities (Kienzle 8).

Networks themselves are collections of machines, routers, devices, firewalls, and other miscellaneous hardware, which makes the diagnosis of problems very difficult. Most of this is due to the ever-evolving area of network security: smarter routers, more firewalls, Intrusion Detection Systems (IDSs), subnets, and tunneling. All of this can make network troubleshooting a black art. Even the simplest problem, say out-of-date DNS data, can be difficult to pinpoint, more so on large corporate networks where resources may span thousands of miles. Running a simple Traceroute between two IP addresses can be an infuriating task.

To that end, most of these problems are the result of trying to mitigate security problems. However, despite these precautions, computer networks are infected with viruses every day. New ones are written as soon as old ones are discovered. This results in a slippery slope of constant patches to software and operating systems.

Viruses break into systems by exploiting weaknesses in running software: invariably through various network visible services. Since viruses still infect networks, is it possible to modify an exploit or author a new one that can be used to measure bandwidth to machines on a network?

This proves to be a dangerous proposition as much as a novel idea: Windows viruses are very unstable, and the more complex they are, the more likely they are to fail, causing unstable behavior, crashes, or a permanently crippled Windows install.

Exploits

Due to the complexity of modern software and the hierarchy of software running on modern computers, glitches and bugs are inevitable. A given program can only do what it is written to do; however, this may not be what is intended:

“A man is walking through the woods, and he finds a magic lamp on the ground. Instinctively, he picks the lamp up and rubs the side of it with his sleeve, and out pops a genie. The genie thanks the man for freeing him and offers to grant him three wishes. The man is ecstatic and knows exactly what he wants.

‘First,’ says the man, ‘I want a billion dollars.’

The genie snaps his fingers, and a briefcase full of money materializes out of thin air.

The man is wide-eyed in amazement and continues, ‘Next, I want a Ferrari.’

The genie snaps his fingers and a Ferrari appears from a puff of smoke.

The man continues, ‘Finally, I want to be irresistible to women.’

The genie snaps his fingers and the man turns into a box of chocolates.” (Erickson, 12)

In the same way, software regularly has to interact across networks and run services that expose interfaces that may allow behavior unintended by the developers. But they operate as they are coded to.

These glitches are what exploits expose.

Worms

The term ‘worm’ originated with author John Brunner, in his novel The Shockwave Rider (first published in 1975). Two researchers from Xerox PARC, John F. Shoch and Jon A. Hupp, used the term in their paper The Worm Programs (Comm ACM, 25(3):172-180, 1982).

“These kinds of programs represent one of the most interesting and challenging forms of what was once called distributed computing. Unfortunately, that particular phrase has already been co-opted by those who market fairly ordinary terminal systems; thus, we prefer to characterize these as programs which span machine boundaries or distributed computations.” - Hupp and Shoch, 172

Shoch and Hupp took a very informal “stab” at creating a worm.

“Our work did not start out specifically addressing formal conceptual models, verifiable control algorithms, or language features for distributed computation, but our experience provides some interesting insights on these questions and helps to focus attention on some fruitful areas for further research.” - Hupp and Shoch, 173

Their approach is not fundamentally different from the approaches taken in the twenty-five years since. But the reason that academic research in the area of “worms” continues this ad hoc approach has less to do with the concepts behind a “worm” than with the changing nature of computing security.

The worms first utilized by researchers in the early 80s were driven by a combination of curiosity and usefulness. For example, researchers at MIT used a ‘Town Crier’ worm to send messages to all users on the network.

Early worms did not face the barriers to access that a modern worm designer faces. Simple Berkeley-standard C programs could open a socket on a machine without having to fight through firewalls.

The first worm of import in the ever-evolving realm of security was the infamous “Morris Worm”, written by Robert Tappan Morris, then a student at Cornell University, and launched on November 2, 1988. Through simple dictionary password attacks, the Morris Worm was intended to “map” the known Internet, sending vital information back to the root of the infection (if we imagine the propagation path as a tree).

The Morris Worm, however, was riddled with design flaws. As a result, machines would be reinfected ad nauseam until they crashed. At this point in history, real security was never a concern: the Internet was used largely by academic institutions and government agencies, and widespread public access was not anticipated. The Morris Worm therefore took advantage of flaws in Unix mail services and poor password choices for privileged accounts (who knew people would use the same root passwords?).

The Buffer Overflow Exploit

Many network exposed services are written in languages that do not perform bounds checking on requests. When a request or call passes more data in than expected, that data overwrites the memory in the stack, effectively putting that “extra data” where it can be executed.

Here is an example of “unsafe” code in C:

Because “strcpy” does no bounds checking, this program promptly crashes at run time. As you can see, the string passed in is far too large to fit into 5 characters (really 4, since C strings have to end in ‘\0’, the null terminator).

What makes this so dangerous is that most network-level programming is done in C. Since C has limited string manipulation, “strcpy” is used frequently.
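The pattern described above can be sketched in a few lines. This is an illustrative reconstruction, not the project's original listing: the buffer size of 5 comes from the text, while the helper name bounded_copy and the sample strings are assumptions made for the example. The unsafe call is left in a comment, and the function shows a bounds-checked alternative.

```c
#include <stddef.h>
#include <string.h>

/*
 * The unsafe pattern from the text (do not run):
 *
 *     char buf[5];
 *     strcpy(buf, "far too long");   // writes past the end of buf
 *
 * bounded_copy is a hypothetical helper that truncates instead.
 * It returns 1 if the whole string fit and 0 if it was cut short.
 */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t src_len;
    size_t to_copy;
    int fits;

    if (dst_size == 0)
        return 0;

    src_len = strlen(src);
    fits = src_len < dst_size;              /* leave room for '\0' */
    to_copy = fits ? src_len : dst_size - 1;

    memcpy(dst, src, to_copy);
    dst[to_copy] = '\0';                    /* always terminate */
    return fits;
}
```

With a 5-byte buffer, a 4-character string fits exactly, and anything longer is truncated to 4 characters plus the terminator rather than spilling onto the stack.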

So what can be done with this kind of code? At runtime, the stack holds a function's local buffers next to the saved return address that tells the processor where to continue executing. To take advantage of this, we need to copy in our own binary so that the excess overwrites that address and contains instructions of our own.

Since this project focused on exploiting a Windows vulnerability, we will write a very simple program to push onto the stack. After the program is compiled and linked, we want to disassemble it and find the assembly instructions. A Windows environment has no preinstalled tools to handle this, so you will most likely need to install Cygwin with GCC and the GDB debugger. After code has been compiled, it can be disassembled in GDB, and GDB will also give us the binary that represents that code. This binary is called “shell code”.

First, we write a simple program, ‘Hello World’:

This generates a block of binary that looks like this in C:

The binary instructions are not quite right. What if another program is trying to execute in the space we want to occupy? If this happens at run time a segmentation fault will be thrown by the operating system. Not a good thing.

In his paper “Smashing the Stack for Fun and Profit”, Aleph One points this out. Of course, it’s still not that simple. More work has to be done to get the correct binary to run our code. This is a tedious process: even small programs take a bit of memory magic to process.

It gets worse on an x86 Windows box.

In Linux distros, to perform most tasks, all one has to do is make the appropriate system call. In Windows, a bit more work is involved:

“The problem is, though, that system call numbers are prone to change between versions of Windows whereas Linux system call numbers are set in stone. This difference is the source of the problem with writing reliable shell code for Windows and for this reason it is generally considered ‘bad practice’ to write code for Windows that uses system calls directly vice going through the native user-mode abstraction layer supplied by ntdll.dll.” (Skape, 6)

To do anything useful in Windows, one first has to locate kernel32.dll. Fortunately, through this mechanism you can load any system libraries you may know about. Of course, if you can’t do that, other problems may arise. And even the address of kernel32.dll varies based upon the version of Windows, other software that may have changed that location, etc. Once again, it’s a little more difficult than you might suspect to do exactly what you want.

But it is possible. The sheer number of buffer overflow exploits that have wreaked havoc on Windows systems shows that it can be done. However, most of these are rather unstable. Why? Part of it is everything mentioned above. The other part, the worse part, is that Windows shell code is much more difficult to write than Linux shell code. It’s a process that is very easy to mess up.

Network Protocols

Communication between two machines or devices is only possible if they can both speak the same language and if they both can talk to each other. Some of this is hardware, some of this falls to something above the physical layer. Modern systems communicate using the protocol system(s) defined by the Open Systems Interconnection (OSI) seven layer model:

Figure 1: Seven layer model

At the lowest end of the model is the physical connection required to transmit the data. Towards the middle, we cross over into a gray area that is controlled by either hardware (firmware) or software running on the host system, until we are purely at the data layer. Connecting to a machine to infect it is more than just sending data to another machine.

To infect a machine, it is not enough to connect to it, nor is it enough to pack the payload of the data with the exploit. What are we typically seeking to do? The most effective and useful exploits look for the classic security flaw: the buffer overflow.

Sockets

In a modern OS, network services run low-level code to handle the lower levels of the communications protocols. This is done through socket programming, usually in C and based on the Berkeley socket programming model. There are some pitfalls to this, which enable us to exploit systems, but we will address this later.

There are many types of sockets, which deal with a variety of protocols in the seven layer model. For this research there are really only two of concern: stream sockets and datagram sockets. A stream socket is used by most network services for a variety of reasons, transmission order and delivery guarantees being two of the most important. By order we mean that if one service opens a socket with a machine and sends two packets containing the payload data “Fred” and “Barney”, they will arrive in that order. Not only that, but stream sockets can be secured. Web browsers use stream sockets to transfer text (HTTP in the application layer). FTP uses stream sockets as well. Why? Stream sockets run TCP (RFC-793[2]). TCP requires a three-way handshake: SYN, SYN-ACK, ACK. First a machine must verify that the target machine is there and that it is ready for a connection. Basically, like a normal human conversation.

Datagram sockets, on the other hand, utilize UDP (RFC-768[3]). UDP differs in that there is no handshake involved. UDP data is sent whether or not the target exists or responds. As a result UDP has much less overhead than TCP. UDP can carry much of the same payload data, but because no handshaking is involved, you can’t guarantee that the target received it, or in which order it was received. UDP is still widely used, notably for DNS queries (whether or not this is prudent).

Winsock

As Windows has evolved, the fundamental specification for network programming has resulted in the development of Winsock, short for Windows Sockets. Winsock is a specification first developed in 1992. Originally, Windows provided little to no networking support, most of it based on NetBIOS (from IBM). Microsoft had attempted to define their own network protocol, ignoring the TCP/IP stack.

Winsock consists of two interfaces, an API[4] for programmers seeking to write network based applications and an SPI (Service Provider Interface) that allows developers to also add modules to support new protocols.

At first, Microsoft just followed the Berkeley convention: the methods and methodology were identical to the common Berkeley UNIX implementation that most other vendors had adopted. However, Microsoft soon chose to add custom functionality and provide wrappers over the standard Berkeley code, in addition to a list of custom exceptions.

The additional capabilities built into the current Winsock API (Winsock 2) are critical for this project, to allow us to safely measure bandwidth, as we will explore in the implementation.

Internet Control Message Protocol

Modern measurement tools depend on the Internet Control Message Protocol (ICMP) as described in RFC 792[5]. ICMP is designed to let hosts and routers report errors and exchange diagnostic information about IP traffic. ICMP message types include destination unreachable, source quench, redirect, timestamp, timestamp reply, information request, information reply, and the two of most significance here, echo and time exceeded. These messages generally have a simple format.

An echo message is carried directly in an IP datagram and looks like this:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Identifier          |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Data ...
+-+-+-+-+-

Here the type is either 8 for an echo message or 0 for an echo reply message. The remaining fields are typical of such headers. However, not all ICMP messages look alike. Time exceeded messages return a header that looks more like this:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             unused                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Internet Header + 64 bits of Original Data Datagram      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The idea behind the time exceeded message is that a router (this can be a simple hardware router, a computer doing routing, an LVS director, etc.) should discard any datagram whose Time To Live (TTL) field has expired. Once the datagram has been discarded, a message should be sent to the datagram's source using ICMP. This makes the time exceeded message an ideal diagnostic tool.
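As a rough illustration of the echo layout shown earlier, the sketch below packs an echo request (type 8, code 0) into a byte buffer and fills in the checksum, which RFC 792 defines as the 16-bit one's complement of the one's complement sum of the ICMP message. The function names, buffer handling, and sample values are assumptions made for this example and are not taken from the project's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's complement checksum over len bytes, treating the data as
 * big-endian 16-bit words (an odd trailing byte is padded with 0). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)
        sum += (uint32_t)(data[i] << 8);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: writes an ICMP echo request into out and
 * returns its length, or 0 if out is too small. */
size_t build_echo_request(uint8_t *out, size_t out_size,
                          uint16_t id, uint16_t seq,
                          const uint8_t *payload, size_t payload_len)
{
    size_t total = 8 + payload_len;         /* 8-byte ICMP header */
    uint16_t checksum;

    if (out_size < total)
        return 0;

    out[0] = 8;                             /* Type: echo */
    out[1] = 0;                             /* Code */
    out[2] = 0;                             /* Checksum, zero for now */
    out[3] = 0;
    out[4] = (uint8_t)(id >> 8);            /* Identifier */
    out[5] = (uint8_t)(id & 0xFF);
    out[6] = (uint8_t)(seq >> 8);           /* Sequence Number */
    out[7] = (uint8_t)(seq & 0xFF);
    if (payload_len > 0)
        memcpy(out + 8, payload, payload_len);

    checksum = inet_checksum(out, total);
    out[2] = (uint8_t)(checksum >> 8);
    out[3] = (uint8_t)(checksum & 0xFF);
    return total;
}
```

A receiver can validate a message by running inet_checksum over the whole message with the checksum field included; a result of 0 indicates that the message arrived intact.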

Ping

Ping is a simple diagnostic utility first included on UNIX systems in the early eighties. Ping is not a robust tool and is limited in what it can tell the end user of a system, but it is essential nonetheless. It is so essential for network diagnostics on the administrative side that it has been included with almost every Windows-based operating system in the last ten years.

A simple ping request on a Windows or UNIX machine can indicate whether a component is up or not.

Ping works as follows: after DNS information is obtained about the target of a ping, we send out a data packet containing an ICMP echo request. When that packet reaches its destination, the destination host generates an echo reply and sends it back. Ping then repeats the process. On a UNIX system the user can specify the number of times to send the packet and even the sending pattern.

Ping is a very limited tool. It can be used to estimate bandwidth, but is not a precise tool for measuring throughput because we cannot see the bottlenecks in the system. This brings us to the next tool: Traceroute.

Traceroute

Traceroute was written by Van Jacobson in 1988 to solve persistent network problems. The idea behind Traceroute is to send out data that makes each node in a path identify itself. In this way we can see where a system is slowest. Traceroute is the most common tool used for network analysis. It too is included in most modern operating systems and is also encapsulated in more advanced network tools (such as NetTools).

Traceroute depends on the ICMP time exceeded message. Traceroute sends out a series of packets with very small TTL field values. When the first packet reaches the first routing node, its TTL expires, so the node discards it and sends an ICMP time exceeded message back to the source. Traceroute then increments the TTL and sends the packet out again. The slightly longer TTL lets the packet pass the previous routing node and expire at the next one. This continues until we exceed a limit that the user can specify (the default is 30 on most systems) or until we get an ICMP port unreachable message (or, with ICMP-based probes, an echo reply), which indicates that we have reached our destination. We call these steps hops.
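The TTL loop can be illustrated without sending any packets. The sketch below simulates a path with a fixed number of routers: a probe whose TTL is no larger than the router count "expires" at the router with that number, and any larger TTL reaches the destination. All names here (send_probe, trace_route, the reply types) are hypothetical and exist only for this example; a real tool would replace send_probe with actual socket I/O.

```c
#include <stddef.h>

typedef enum { REPLY_TIME_EXCEEDED, REPLY_DESTINATION } reply_kind;

typedef struct {
    reply_kind kind;
    int node;               /* 1-based position of the replying node */
} probe_reply;

/* Simulated probe: each router decrements the TTL by one, so the
 * probe dies at router number ttl, or reaches the destination. */
probe_reply send_probe(int routers_on_path, int ttl)
{
    probe_reply reply;
    if (ttl <= routers_on_path) {
        reply.kind = REPLY_TIME_EXCEEDED;
        reply.node = ttl;
    } else {
        reply.kind = REPLY_DESTINATION;
        reply.node = routers_on_path + 1;
    }
    return reply;
}

/* Raises the TTL one hop at a time, recording who answered.
 * Returns the hop count to the destination, or -1 if max_hops
 * was exceeded first. */
int trace_route(int routers_on_path, int max_hops,
                int *hops, size_t hops_cap)
{
    int ttl;
    for (ttl = 1; ttl <= max_hops; ttl++) {
        probe_reply reply = send_probe(routers_on_path, ttl);
        if ((size_t)(ttl - 1) < hops_cap)
            hops[ttl - 1] = reply.node;
        if (reply.kind == REPLY_DESTINATION)
            return ttl;
    }
    return -1;
}
```

With three routers on the path, the destination answers on the fourth probe; with a hop limit smaller than the path length, the trace gives up and reports -1.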

In practice many difficulties arise with using Traceroute. For one, not all packets are sent on the same path. The routers at each stage determine where the packets are sent, so each packet may not take the same route even if the end destination is the same. Another problem is buggy TCP/IP implementations in these systems. Because TTL values are not always respected and ICMP is a little-used format, it is possible to miss entire nodes because a probe timed out.

At its heart, Traceroute is a simple idea in execution. It attempts to fool the system into revealing each component. What in essence we are doing is removing the transparency of the layers of systems on the Internet. This is a little broad of a statement, so let me clarify. We are not removing the transparency in layers such as IP, UDP, and Voice over IP, etc. What we are doing is attempting to remove the transparency between source and destination.

While Traceroute is a very complete tool, it is also limited in the content of its diagnostics. Van Jacobson recognized this and in 1997 formulated an improved version that is meant to give more meaningful feedback about each hop. He called this Pathchar.

COM in Windows

The Component Object Model (COM) was introduced by Microsoft in 1993. The goal of COM is to reuse software components between applications both inside and outside of a single Windows instance. COM itself is sort of a blanket term to refer to several technologies: OLE, OLE Automation, COM, COM+, and DCOM technologies. In the .NET framework, many COM features are integrated into the CLR (Common Language Runtime).

COM, like many technologies, is not without numerous faults. It is its very capabilities, sharing components across process and network boundaries, that leave Windows open to attacks from outsiders.

If COM is so susceptible to attacks, why is it integrated into Windows? Jack Koziol and company have a simple explanation:

“You should remember that Microsoft’s position on software has always been to distribute binary packages for money and build an economy to support it. Therefore, every Microsoft software architecture supports this model. You can build a fairly complex application entirely by buying third-party COM modules from various vendors...” - Koziol, 110

COM objects, specifically Distributed COM (DCOM) objects, have to be identifiable both across the network and inside the operating system. When a COM object is registered in Windows, an index to its IDL (Interface Definition Language) file is maintained. The IDL is similar to a header file in C++.

Here is an example:

In this example, there are a few things to note. The most critical is the “uuid” field, also called a GUID (Globally Unique Identifier). This number is crucial: it must be distinct from that of every other COM object, because this identifier is how we refer to this specific COM object.
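To make the identifier concrete, here is a minimal sketch that parses a GUID from its usual 36-character text form into its four component fields. The struct layout mirrors the common 32-16-16-64 bit split; the type name guid_t, the function parse_guid, and the sample value used to exercise it are made up for this example and do not correspond to any real COM interface.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} guid_t;

/* Parses "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" (hex digits).
 * Returns 1 on success, 0 if the text is not in that shape. */
int parse_guid(const char *text, guid_t *out)
{
    unsigned int d1, d2, d3, b[8];
    int consumed = 0;
    int i;

    if (strlen(text) != 36)
        return 0;
    if (sscanf(text, "%8x-%4x-%4x-%2x%2x-%2x%2x%2x%2x%2x%2x%n",
               &d1, &d2, &d3,
               &b[0], &b[1], &b[2], &b[3],
               &b[4], &b[5], &b[6], &b[7], &consumed) != 11)
        return 0;
    if (consumed != 36)                     /* reject short groups */
        return 0;

    out->data1 = (uint32_t)d1;
    out->data2 = (uint16_t)d2;
    out->data3 = (uint16_t)d3;
    for (i = 0; i < 8; i++)
        out->data4[i] = (uint8_t)b[i];
    return 1;
}
```

Two parsed identifiers can then be compared field by field, which is all that "distinct from every other COM object" amounts to at the byte level.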

COM objects create instances through factories (this is similar to the classic “Factory” design pattern). Internally, the COM object will create new instances through a factory and track those instances in memory. It can also reuse and recycle old instances to reduce the performance hit you take every time a new instance is created.

COM services can be called in one of two ways: being loaded into the process space in a DLL or launched as a service (Koziol, 111).

In Windows, an endpoint mapper runs and facilitates requests across a network.

While this makes it easy for a developer to look up an arbitrary COM service, it also makes it horribly and uniquely flawed: anyone can find COM services on a given machine.

“The AT services is listening on the following named pipe RPC services, which you can connect to over ports 445 or 139. It is also listening on TCP port 1025 and UDP port 1034 for DCE-RPC calls.” (Koziol, 112)

Hearing it is one thing, seeing it for yourself is another.

Doing some Recon

DCE-RPC enables anyone to find COM services running on a given machine. To do this, we can either author our own tool (intensive), or use one that is already provided.

For this project we will use the DCE-RPC browser built into the RAZOR utility package.

Running RAZOR's rpcdump tool on a local address, we generate a substantial amount of information about many COM objects that we can access.

In fact, the file is over 74k. Here are the first few entries:

You can feed the interface GUIDs into a fuzzer tool to “inspect” each resource, and search it for flaws.

Hypothesis

The goal of this project is to determine whether one can successfully infect a machine to measure bandwidth. More to the point, our goal is to infect a network and measure bandwidth across it. This allows us to measure performance and provide a sort of map, while sidestepping the flaws in ping and other tools. To that end, the only requirement is to transfer data and, somehow, measure it. Of course, this is only worthwhile if we can transmit large amounts of data.
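The measurement itself reduces to simple arithmetic: bits transferred divided by elapsed time. A minimal sketch, assuming the transfer size and elapsed time have already been captured by whatever mechanism moves the data; the function name and the choice of bits per second as the unit are assumptions made for this example.

```c
#include <stddef.h>

/* Returns throughput in bits per second, or -1.0 when the elapsed
 * time is not positive and no rate can be computed. */
double throughput_bps(size_t bytes_transferred, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return -1.0;
    return (double)bytes_transferred * 8.0 / elapsed_seconds;
}
```

For instance, 1,000,000 bytes moved in 2 seconds works out to 4,000,000 bits per second. Larger transfers tend to dilute fixed per-connection overhead, which is one reason a large payload matters here.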

Related Work

One could argue that the Morris worm itself was written in the exact same vein: to map a network. In that case the network was the Internet as a whole, and as a worm it was fairly primitive. In 2006, two researchers (Guanling Chen and Robert S. Gray) published a paper entitled “Simulating non-scanning worms on peer-to-peer networks”.

Their goal was to simulate worms infecting machines via peer-to-peer (p2p) networks (such as: Kazaa, Gnutella, etc). Because modern worms, indeed the type of exploit and worm we are writing, do not blindly scan ports, IDS measures in that regard fail. (If a given IP attempts to connect to every port on another IP, it's obviously port scanning.)

They propose three different types of worms that could potentially infect any computer on these networks.

One thing they fail to acknowledge is that, as most users are running Windows, a worm is hampered by Windows security features. So a worm may not be able to do much that is useful, only what amounts to vandalism.

This is the constraint in writing any worm: infecting a system via an overflow does not necessarily give you free rein to execute whatever code you see fit. There are limits to what you can do, enforced by the “patch on a patch” security measures.

Yet, the idea is disturbing: one could wreak havoc on a large network with the right virus. This illustrates the earnest need for sufficient security.

A Framework for Bandwidth Testing

After several months of attempting to author exploits from scratch, it became apparent that this would be more useful as a framework for testing, and a basis for exploit development.

The arguments are simple, in light of the nature of modern systems:

• Exploits are hard to keep up with.

• A hard and fast “one exploit fits them all” rule is not reasonable.

• Windows shell code is very, very difficult to write.

• Windows exploits make systems unstable.

As such, the framework can be used on its own, extended, or used in conjunction with Metasploit.

Client Sim/Shell code

To run the client we can either allow the developer to implement a new exploit or to execute a simulator that just runs our shell code. The shell code we use is generated as just a strict translation of assembly to hex.

What does it do?

Well, it’s really simple. It downloads and executes any program you specify, from cmd.exe. That’s it. The concept is rather novel, and no source seemed able to agree on whether it originated from the security community (the good hackers) or from malicious individuals. This makes things very easy, and was a bit of a boon to the development of this exploit.

This way, we only have to write one bit of shell code… EVER. Since it will run any application it can download, we only need to include the proper URL. This also means that you can update the executable without having to alter your shell code.

Let’s step back for a minute and resume our discussion of Windows shell code, since we are on the topic.

It’s nasty, it’s unstable, and it’s really, really difficult to write. In my other life, I write gobs of Java/C#/C++, every day. Once you get used to it, you just begin to think that way, sort of translating UML to source on the fly.

Shell code is a whole other world, but (paradoxically) shellcode is not the difficult part. The true difficulty in Windows lies in the location of addresses (the memory locations where we load functions). However that does not mean writing good shellcode is easy.

Basically, you need to get up to speed on x86 assembly and the quirks of whichever assembler you decide to use. The best is said to be TASM (Borland Turbo Assembler), but its cost is a bit prohibitive. We used MASM (Microsoft Macro Assembler), which is free from MSDN. NASM (Netwide Assembler) was a big one, but the project no longer appears to be active (the binaries are still available from SourceForge). Depending on the assembler and the type of instructions you are using, you may need to install a different linker. In our case, the MASM toolchain had to be “hacked” to use a 16-bit linker, to avoid handling all the segmentation issues of the 32-bit instruction set. This is largely due to Microsoft’s backwards compatibility with 16-bit binaries. However, if we are being honest, this is really due to the fact that a good bit of Microsoft’s code still runs 16-bit binaries.

All that aside, there is another approach that is wildly successful for Linux implementations: the GNU Debugger (GDB). Code compiled with GCC (the GNU Compiler Collection) can be disassembled, and GDB will pull out the hex for you. This, combined with the static system call numbers, makes Linux shell code a relative breeze to write… sort of algorithmic.

On the other hand, Windows shell code is rather nightmarish. First of all, you really don’t want to disassemble an executable in Windows. Chances are you will have a lot of junk. This would be ok if it was just a matter of inserting the appropriate jump instructions so your code stays in the shell… however it’s easier to just write it yourself.

There are two options for getting your system call addresses. The “good” way is to locate the DLL you need starting from kernel32, which is the only one that is “guaranteed” to be loaded. The bad way is to find the address ahead of time and then hardcode it.

There is an easy way to do this using two C calls (windows.h): LoadLibrary and GetProcAddress.

With that in mind… here is a basic example:

It’s really simple, it will just spit out “hello” (calling echo on the cmd line). It does this by calling “WinExec”, which is located at 401870.

Now I can pretty much guarantee that if you translate this to shell code, it won’t work. Why? Well, those addresses probably won’t work on your machine. Funnily enough, they might not work on my machine in a few hours. This is because of that little thing we talked about earlier… Windows loading DLLs as needed. (WinExec is also deprecated; the newer NT-based call is CreateProcess.)

Now you could take this into an assembly debugger and grab the hex, which would be the shell code. However, this instruction address thing is a BIG problem that needs to be addressed.

Skape talks about several solutions to this problem, the newest one being a “peek” at the executing threads:

“The process of determining the kernel32.dll base address via this mechanism is to extract the top of the stack by using a pointer stored in the Thread Environment Block (TEB). Each executing thread has its own corresponding TEB with information unique to that thread.” (Skape, 10)

From here we can peek into kernel32.dll. We still need to find our bearings in the maze of instructions; Skape discusses this in an earlier section. The method for doing this consists of “walking” the memory addresses 64 KB at a time. In Windows, executable images are marked off with the characters “MZ” and align on 64 KB boundaries. Basically, when you find one of these, you know you are in kernel32.dll.

There are several alternative methods in common use, but there is a basic analogy in higher level language. It works kind of like reflective inspection in Java: when Java dynamically loads a class and instantiates it, you can find the parent by purposely throwing an exception, and sifting through the stack to find the parent class. Hokey, but it’s the same concept behind finding the address for your instruction. And this kind of work is all about hokey.

Despite advances in shell code writing, none of the common methods are foolproof. Windows may not have underlying pointers to a given DLL. In other words, kernel32.dll may not be able to find a given method.

All that said, we get some shell code to download and run our application. If you look, you can find “simple” shell code online. Exploit development depends a lot on previous work: especially in the Windows world.

Linux vs. Windows

Original thoughts on this concept were to develop an exploit for both Windows and Linux. This isn’t really a simple task. Besides the aforementioned problems, Winsock only parallels the Berkeley implementation of sockets. Some odd differences pop up, requiring some very different code snippets. One example is the packet fragmentation that occurs under Winsock: breaking up packets without telling you.

Unfortunately, at this moment in time, all of the sniFF framework is written for Windows. Future work may expand to using Mono (portable .NET implementation).

IDS, Virus Scans, Routers

The example exploit used will be picked up by various security counter-measures. Most virus scans recognize the shell code for the well known exploits (like LSASS), in all its incarnations (several versions are available from different developers).

Many voices in the exploit community have some thoughts as to how one might beat an IDS (say, Snort).

In “Advances in Windows Shell code” (Phrack 62), some of the common Snort rules are discussed. Specifically, there is a Snort rule that pertains to cmd line execution. Since this is where almost all buffer overflow exploits do most of their processing, this is of particular concern for exploit designers. The suggestion from sk is to add the “/k” to your cmd.exe call. Whether or not this works remains to be seen: sniFF was not tested against any IDS, so I am unconvinced.

Advice

For those who wish to extend this framework, here is a bit of advice:

1) You will need a good x86 assembly reference. Unfortunately, much assembly is written for MIPS, but x86 processors have significant differences (16 and 32 bit backwards compatibility) as well as a much larger library of instructions.

2) Stand on the shoulders of giants, but be careful. Most exploits are rip offs of existing exploits. However, a lot of the exploit code you will find wreaks more havoc than one might expect (SRVSVC corrupts easily).

3) Test on Virtual Machines. Segmentation faults are bad enough; more serious exploits can screw up registry entries and often make Windows “unstartable”. Authentication exploits are the worst at this: in a commercial setting, corruption can damage LDAP or other directory service settings, making it impossible to log in.

4) Slow and steady. It’s all slow. Windows exploit authoring is very tedious. Finding exploits is very difficult.

Implementation

A little review of sockets in winsock

To facilitate communication, Winsock binds a socket to a port. Sockets are the software analog to ports… sockets in the Berkeley model are treated EXACTLY like file pointers. You can write or read data from a socket.

When a socket is created, Winsock (according to the Berkeley model) supports two different types of sockets: streams and datagrams. Streams are TCP (Transmission Control Protocol) whilst datagrams are UDP (User Datagram Protocol). As mentioned before, UDP is “send and forget”; TCP is “send and tell me if it gets there”. TCP facilitates resend.
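As a rough illustration of the two socket types, the sketch below creates one of each using Python's standard socket module, which follows the same Berkeley model that Winsock parallels. It is not part of sniFF, and the function name is hypothetical.

```python
import socket

def open_stream_and_datagram():
    """Create one TCP (stream) and one UDP (datagram) socket and report their types."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # connection-oriented, reliable
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # connectionless, "send and forget"
    try:
        return tcp.type, udp.type
    finally:
        # Neither socket is bound or connected; just release the handles.
        tcp.close()
        udp.close()
```

The only difference at creation time is the type constant; the delivery guarantees follow from that choice.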

There is implicit behavior in Winsock that means something very, very important for this research: winsock will get your data there or tell you if it has to break the connection.

If you have opened a socket, you can write data to it, until whatever reads the socket decides to break it off.

Let us put it this way: we can write data to an open socket all day and every last byte will be delivered, even if nothing useful is done with it. That is until the server decides to break it off, for whatever reason.

This wasn’t and isn’t readily apparent. In some ways, this is a weakness of Winsock, but it’s not as easy to get a socket as it sounds.

If a client merely tries to bind to an open port and open a socket… it will fail. Windows is not in the business of handing connections over to anyone. You can only programmatically bind to a port if the system makes it available.

This is part of how the base part of the exploit works, it has to open a port to talk on. So we pass code to the machine that does this and leaves it in a state where, indeed, our specified port is open for business.

1) Theoretically, at this point we could upload shell code to do anything we wanted. It’s no longer buffer overflow code, it’s just garden variety shell code. The initial exploit has to do all of the nitty gritty jump offset calculations so we don’t fragment the stack, but at this point we can do what we want.

It seemed like a good time to just start loading shell code and doing exactly what we want. Unfortunately, this didn’t prove to be a good thing to do. While it is possible to load shell code, it’s dangerous. If it’s not done right, we can make the system unstable. As a novice shell code developer, this was not something I was prepared to do.

The type 2 attack does exactly this: it sends shell code to download the remote client executable, which will try to talk back on a specified port.

Basically, we generate a very large text file and attempt to break it up and send it to the waiting socket, timing the length of the whole trip. Winsock will tell us how much data it successfully sends, and any errors we get.

So we perform a simple calculation (this function is in the sniFFUtil class):

bytes sent/(time t2 – time t1) = transfer rate

This will be forwarded back to the root instance (the pretty UI) for interpretation.

Remember, we are depending on the code in execution to not “puke” when we send it data.

It doesn’t, but it reads the packets off and disposes of them, since it can’t understand them.
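The measurement idea can be tried safely on a single machine. The sketch below is an assumption-laden stand-in, not the sniFF code: a local "sink" thread plays the part of the listening service by reading and discarding whatever arrives, while the sender times how long a payload takes to be consumed. All function names, the loopback address, and the 300,000-byte default are illustrative choices.

```python
import socket
import threading
import time

def run_discard_sink(server_sock):
    """Accept one connection, then read and throw away data until the peer finishes."""
    conn, _ = server_sock.accept()
    with conn:
        while conn.recv(65536):
            pass  # the data is not understood, only drained

def measure_transfer_rate(host, port, payload):
    """Send payload over TCP; return (bytes_sent, bytes_per_second)."""
    with socket.create_connection((host, port)) as sock:
        t1 = time.perf_counter()
        sock.sendall(payload)            # raises an exception if the connection breaks
        sock.shutdown(socket.SHUT_WR)    # tell the sink we are done writing
        sock.recv(1)                     # returns b"" once the sink has drained and closed
        t2 = time.perf_counter()
    return len(payload), len(payload) / (t2 - t1)

def demo(payload_size=300_000):
    """Run one measurement against a local discard sink on an ephemeral port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    sink = threading.Thread(target=run_discard_sink, args=(server,), daemon=True)
    sink.start()
    sent, rate = measure_transfer_rate("127.0.0.1", port, b"x" * payload_size)
    sink.join(timeout=5)
    server.close()
    return sent, rate
```

Calling demo() returns the number of bytes sent and a bytes/second figure. On loopback the rate says nothing about a real network; the point is only the shape of the measurement.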

A malicious user could perform a DoS attack by flooding lots and lots of data to this open port. House of dabus intended to merely telnet to the port and wreak havoc on the machine, which you can do as well.

After the attack, anyone can bind to the port, until the socket is closed.

2) There is shell code included to run the download and execute code as well. This is explained, in detail, in the framework section.

A Server, A Utility, and more?

If we think of this project as an attempt to inspect a given network, that is, any size LAN, what we need is a central “place” to record our data.

To that end, we are going to build a utility for all of the clients to report back to.

The idea is that we can do several things:

• Load any executable as an attack that accepts the following parameters:

← Host IP

← Target IP

← Port

• Display a “TreeView” of infected nodes and their children, indicating connection status.

• Run a server that will record information from a client:

← IP

← If it infected a child.

← Data rate

← Time alive.

Underneath the slick UI, we are really building several components:

• sniFF – the UI uses a structure that contains:

← A server to connect and interpret client (infected node) messages.

← An XML file that can store attacks as executables.

← A way to interpret messages to populate the TreeView that serves as a map for the network, displaying the current status of each child node.

• sniFFLibrary – A library of common components:

← ChildNode – a class to run on an infected client. If implemented in an exe, it will run as a server for further nodes, and can also infect new targets by randomly generating IPs.

← sniFFMessages (and sniFFMessageFactory) – a message class that encapsulates message information: IP addresses, transfer rates, etc.

← sniFFUtil – misc methods that are useful for all classes.

• Bad Service – A generic Windows service we built to illustrate buffer overflows from the inside out.

• sniFFClient – a client exe that wraps functionality from the sniFFLibrary.dll.

sniFF

The main client runs the UI above. In addition, it allows a user to use other exploits than the one we will be using. Of course, for those exploits to work, certain conditions must be met. Since that is just an additional feature, we won’t spend too much time discussing it here.

[pic]

When sniFF starts up, the user can punch in initial IP addresses himself or randomly generate each IP. The user can also use a conventional “ping” to test if the machine is visible or to compare performance.

The user then selects the operating system and the attack name, as specified in an Attacks.xml file.

At the same time, an “AttackServer” is running in a separate thread to receive and process requests from infected clients (like looking up their parent for them). AttackServer updates the TreeView with the status of each machine. A red node indicates failure. Green means the node was infected.

Of course, there are two problems we have that we sort of hack around:

1) If a child node tries to infect a machine (thus making the machine a leaf in our tree, assuming it has not already been infected), we will not really know if it fails. So it will not be red. The red nodes only illustrate nodes that failed at the root or nodes that quit responding (timed out, other failure, etc).

2) Clients are downloaded and executed from a common URL, but they are not dynamically recompiled to include their parent IP or the port that the parent is listening on. We solve this by providing a simple lookup service as part of the AttackServer. The AttackServer also listens on another port, assuming all requests are strings (don’t worry, we are careful to size check, so we don’t have our own host of vulnerable services), and tries to find who infected the requesting client based on message data from the clients (clients pass back a list of nodes they will try to infect, and we have to resolve conflicts). More on this later.

So the AttackServer is sort of a jack of all trades: reading TCP packets and pulling out messages, converting those messages to objects and then updating its map (the TreeView). This is the essence of the sniFF root client, as far as we are concerned.

sniFFLibrary

Instead of detailing a simple class (the exe that will run this code), we focus on a library (dll) of common sniFF behaviors. This code is what will be built into our simple executable, and it will do the bulk of the work.

The library also contains many resources for generating and parsing messages: a sniFFMessage class represents individual messages, while a sniFFMessageFactory can take a string and create all of the messages encapsulated within.

Messages grow downwards:

message 4; message 3; message 2; message 1

The first message is the last one we put on there, but since these are back-propagated, that is (technically) a parent node. So the furthest child builds a message and sends it back to its parent, who does the same thing until it hits a root ChildNode instance (we don’t use this here) or the sniFF client UI application.

Messages from each machine are delimited by semi-colons, their attributes are delimited by ampersands and we do not allow break lines (new line characters will cause the message to be truncated).
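A minimal sketch of this wire format, written in Python rather than as the actual sniFFMessageFactory: messages are split on semicolons, attributes on ampersands, and anything after the first newline is dropped. The function names are hypothetical, and since the source does not specify what the individual attributes look like, they are kept as plain strings.

```python
def parse_messages(raw):
    """Split a raw string into messages, each a list of attribute strings."""
    first_line = raw.split("\n", 1)[0].rstrip("\r")  # a newline truncates the message
    messages = [m.strip() for m in first_line.split(";") if m.strip()]
    return [m.split("&") for m in messages]

def prepend_message(raw, attributes):
    """Put this node's message in front of what its children sent (newest first)."""
    own = "&".join(attributes)
    return own + ";" + raw if raw else own
```

For example, parse_messages("10.0.0.3&52;10.0.0.2&48") yields two messages with two attributes each, and a parent calling prepend_message on a child's string places its own entry first, matching the "message 4; message 3; ..." ordering.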

The library itself, contains many classes and common utilities:

[pic]

The core of the library is the ChildNode class. This is the class that should be hooked into a downloadable client executable. That executable can be very tiny: it is really just a main that will load the ChildNode with the root parent (the sniFF client UI machine) IP and port.

ChildNode works like this:

1) Check to see if a file was written to the C drive that matches the value in FILE_FLAG_NAME. If it’s present or we get an error, quit.

2) Connect to the MASTER_NODE (this is the statically compiled IP that refers to your root UI app) on the MASTER_NODE_QUERY_PORT.

a. Attempt to lookup parent IP.

b. If not found, use the MASTER_NODE as parent. This might not always be accurate but gives us a better idea of how well the exploit and clients are working.

c. If there are any problems: terminate the client.

3) Generate a list of IP addresses to infect, the method built into the sniFFUtil class attempts to do this intelligently:

a. First try your nearest neighbors in a given Octet. Alternate between adding n and subtracting n (where n is ever increasing).

b. If you exceed the upper bound on an Octet (greater than 255), then randomly generate an Octet value between your current Octet and 0 (go lower).

c. Likewise if you go below the minimum bound (negative numbers), randomly generate an Octet value between your current Octet and 255 (go up).

d. This may result in many collisions, but unlike the Morris worm, we are extremely careful to kill the client.

4) Generate a list of ports to attempt.

5) Start the server portion of the client in a separate thread. Much like the AttackServer, this will:

a. Connect to its parent (this may be the root), if it is not the root (we don’t really use this in this project, but I built it in anyway).

b. Send initial payload, calculate time. This payload is generated on the fly by the sniFFUtil GeneratePayload method. In the default setting, this is roughly 300k.

c. Send the message that includes:

i. Your IP.

ii. Any infected children

iii. Transfer Rate

iv. Time Alive (in Ticks, a Microsoft unit of measure: 1 Tick = 100 nanoseconds).

v. Extra data

d. Then wait to accept connections from children. We do have a five minute timeout, so if no children can be infected in five minutes (this can be changed), we shut down.

e. Accept test connections (bandwidth test with the payload) from children.

f. Next forward any packets from children, appending your message info.

6) In the main thread, attempt to infect children.

If anything goes wrong, we terminate. The goal is not to make the mistake of destroying a system that we infect.
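The "wait, then shut down" behaviour in step 5d is ordinary server-side timeout handling. As a sketch only, assuming a plain TCP listening socket and a hypothetical helper name, it could look like this in Python, with the five-minute limit as the default:

```python
import socket

def accept_child(server_sock, timeout_seconds=300.0):
    """Wait for one incoming connection; return None if nobody connects in time."""
    server_sock.settimeout(timeout_seconds)
    try:
        conn, addr = server_sock.accept()
    except socket.timeout:
        return None  # caller treats this as the signal to shut down cleanly
    return conn, addr
```

Returning None rather than raising keeps the "if anything goes wrong, we terminate" policy in one place, at the caller.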

Omitting the IP and port generation steps, it works something like this:

[pic]

Aside from some underlying details, that’s it.

Target

Our target is a little service we cooked up to replicate and illustrate vulnerabilities. For simplicity’s sake, let’s talk about it as a simple Windows Service.

Basically it accepts connections from a user and then expects the user to pass in a date time as a character array (string). It checks the format of the date and sends back a “true” string if it is formatted correctly and a “false” string if it is not.

This simple service is not quite as silly as it sounds; keep in mind that this is the kind of network available service that Windows often provides. LSASS (the service that Sasser attacked) provided a positive time based token (as packet data) to users. The Slammer worm exploited a very similar function (relating to lookups) in SQLServer: SQLServer provided a service that would indicate whether or not it was in charge of a certain database, and if it did not have the database but knew where it was, it would provide the name of the other machine.

Like LSASS our flaw is that we do not check the size of the “date string” before we try to process it. So this simple “MM:DD:YYYY:HH:MM” is what we expect, but if we go beyond that, it will blindly try to process it. Then we overflow the buffer, inject our command to download and execute the client, and there you have it: infected machine!
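For contrast, the check the service leaves out is small. The sketch below shows, in Python rather than the service's own language, what validating the input before processing might look like: reject anything longer than the expected 16 characters, then confirm the format and that the fields form a real date. The function name and the exact rules are assumptions for illustration, not the bad service's code.

```python
import re
from datetime import datetime

EXPECTED_LENGTH = len("MM:DD:YYYY:HH:MM")  # 16 characters
DATE_PATTERN = re.compile(r"\d{2}:\d{2}:\d{4}:\d{2}:\d{2}")

def is_valid_date_string(data):
    """Return True only for a well-formed, correctly sized date string."""
    if len(data) != EXPECTED_LENGTH:      # size check first, before any parsing
        return False
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%m:%d:%Y:%H:%M")  # rejects month 13, hour 25, etc.
    except ValueError:
        return False
    return True
```

With the length test in front, an oversized request is refused before it is ever copied anywhere, which is exactly the step our service skips.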

The reasoning in providing my own service was the result of some very frustrating experiences trying to use known Windows OS level exploits… exploits that we know exist on most Windows installs.

This had several problems:

1) Windows XP is annoyingly good at patching itself, whether you tell it to or not.

2) There aren’t really that many OS level exploits. They usually exist in some other product that registers as a service or COM object on a machine. This (as mentioned before) is the brilliance and idiocy of the COM model. To Microsoft’s credit, as we show, anyone can write a bad service.

3) The dynamically loading memory model inherited from VMS prevents us from easily finding instructions without large, complex shellcode.

4) Known exploits usually hard code instructions (Sasser and Slammer did this, Koziol insists that this is the trademark of Russian hackers).

So we write our own.

This also illustrates the fundamental flaw in this whole project: why would we leave a bad service running?

Smashing the Stack: Windows Redux

Going back to Aleph One's infamous “Smashing the Stack” paper, we can learn a good bit about the concept of smashing the stack. But where it falls short makes it more of an academic exercise. The paper is completely correct, but from our standpoint, we need to convert this to a “Windows” paradigm.

We've already talked about the problem of finding instruction addresses in Windows ad nauseam. It's a painful situation, and one we are going to... ignore. That's right, for our purposes, we are just going to concentrate on a single version of Windows and hard code our addresses.

From a conceptual standpoint, we don't really know much about a given service in Windows. We can get the IDL or find examples of how to connect to and utilize a service. We can identify the input and output, and we can do some self-testing to find out if it's vulnerable. We can, through a combination of testing and tools, determine the size of the buffer we are going to “explode”.

So we know our buffer's size. That's step one.

We also know that our payload shellcode works, it downloads and executes our test program. We can test it all day long, by popping it into some basic C and creating a function pointer to it, and calling said function. This can be done all day long, but it doesn't get the shellcode to magically execute on the target machine.

To do that we need to put our payload into a larger construct, designed to work in our vulnerable service.

The first thing to consider, is how the service deals with the message it gets... what does that look like in memory?

If we think about the stack at runtime, remember that there are a few general registers we need to constantly be aware of: where they point to and what they actually do.

• Base Pointer (EBP) – Points to the base of the local execution frame, that is, the beginning of the current frame.

• Instruction Pointer (EIP) – Points to the current instruction.

• Stack Pointer (ESP) – Points to the top of the stack.

When the code, wherever it resides, goes to execute the “strcpy” call, it will execute a series of instructions:

The last three instructions are the ones we are most interested in.

Notice that the call instruction lists an address, 4110AA: this is the address of the “strcpy”. (If you are using the disassemble view in Visual Studio, you can actually go back through the stack to find when this was loaded.)

The arguments have been pushed on the stack and the method is called.

A basic approach to short-circuiting this service could be as simple as incrementing the return address. This is roughly the method that Aleph One outlines.

We don't really want that. Instead, we are going to create a “NOP sled” around our payload.

Basically, before we get to any of the instructions in our payload, there will be many NOP instructions. This is good, because all we really have to do at that point is jump into the NOP sled. After that we can hit NOPs until we run into our payload instructions.

Let's step back for a moment or two and think about the stack.

Remember our earlier look at the concept of a buffer overflow. When a function returns from a call, it's redirected back to the next instruction after the “call” instruction. So this is how incrementing the return address would nicely mess with that. You can add 8 (each instruction is a WORD) a few times and get all kinds of interesting results.

But we don't really know what is going on in our service, behind the scenes. Most likely we won't have source code, so what do we do?

Well, if we could somehow execute another instruction, a JMP, we could somehow get into our payload. So how would we do that?

“kernel32.dll” keeps some instructions loaded, to do basic functions. We will need to use one to perform the jump to somewhere in our NOP sled.

To find instruction addresses in kernel32.dll, we first need to find the base address for kernel32.dll. This can be done simply by using the dumpbin utility.

First, run dumpbin with the “/all” flag.

Besides generating copious amounts of text, this will generate a chunk of header data.

The image base points to the address that corresponds to the beginning of kernel32.dll. All other instructions can be indexed from here.

Now we can do an “/exports” dump and get:

To find the address of a given instruction, say “AddAtomW” we add the offset to the base address: 7C800000 + 00022469 = 7C822469.

If we're feeling brave, we can ask for a disassemble dump as well.

Using the jump instruction to go to the memory location in ESI, we should be able to hit the NOP sled and execute our “Download and Execute” or any other payload.

Payload Options

We use download and execute, but this doesn't work on all systems. There are file, port, and threading permissions that may make this useless. In fact, even when Metasploit is used to launch a popular exploit against a given machine, the only payload that works may be one that connects back to a netcat listener.

Because of this we have to be extremely specific when using an exploit. Even our “bad_service” will not work properly on the wrong version of Windows, with the wrong version installed.

Operating System Configuration

In the Windows world, an exploit needs to be small so it's not flagged by an IDS, but a robust exploit can run on many versions of Windows. Even a small exploit contains a good bit of guessing. As a result, when writing or using an existing exploit, one has to be very careful and conscientious about version compatibility.

Notes on Windows Security

Windows is inherently at a disadvantage when compared to other Operating Systems. While Unix (including the BSD based Mac OS X as well as Solaris, etc) and Linux have a weakness in the form of easier to write shellcode (fixed address instructions, less invasive security on the call stack), Windows has not had a major revision since NT 3.5. NT was originally based on OS/2 3.0, a joint Microsoft/IBM project.

When Microsoft decided to take the baseline elsewhere, they hired Dave Cutler who had designed and built VMS for Digital. One of the major VMS-like features of Windows is the dynamic addressing that gives us such a headache.

On the surface, dynamic library loading gives a slight advantage. COM makes it easy to reuse service/components. Many of these things sound very useful.

The problem is a reluctance to “fix what isn't broken”, where the only standard for “isn't broken” is “still works”.

The TCP/IP implementation in Winsock (as mentioned earlier) is still based on NetBIOS. SQLServer is still a port of Sybase.

Too much of the Windows security model consists of patching bad components and obfuscation. Case in point: the best Windows IDE for exploit writing is Visual C++ 6.0. It has a runtime stack disassembler, integrates with MASM and, in general, seems very sound. However, due to losing the Java lawsuit to Sun, Microsoft will no longer sell any Visual Studio 6 software. Most of this is referred to as “security through obfuscation”.

In the world of .NET, COM has been modified, and older COM objects often cease to operate correctly. There is a feature built into the .NET runtime to do real time marshaling between newer and older services, but it too is full of potential security holes, some of which have yet to be exploited.

The real danger in Windows is not that you can write bad services, it's that so many still exist. As Ross Anderson puts it in his book, there is no security through obfuscation. That's a lesson Microsoft has yet to take to heart.

Results

The bandwidth analysis portion of the current framework is centered around two issues:

• Connectivity: can we connect to a machine at a given IP address? This is fundamental, of course, to the whole paradigm of the framework. Failure to receive data from a client indicates a lack of connectivity: nodes that never successfully infect and return data are, in effect, not active. Hence, no connectivity.

• Availability via transmission rate: our transmission rate is extremely basic:

Transmission rate (in bytes/second) = (total bytes transferred) / (time_2 - time_1)

Time, in the .NET/Win32 world, is measured in ticks, so we have to convert ticks to seconds. One tick is 100 nanoseconds, or 1e-7 seconds.
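A worked version of the calculation, as a Python sketch rather than the sniFFUtil method itself. It takes 1 tick = 100 nanoseconds (the figure given in the ChildNode description), so there are 10,000,000 ticks per second; the function and constant names are hypothetical.

```python
TICKS_PER_SECOND = 10_000_000  # 1 tick = 100 ns = 1e-7 s

def ticks_to_seconds(ticks):
    """Convert a tick count to seconds."""
    return ticks / TICKS_PER_SECOND

def transmission_rate(total_bytes, ticks_1, ticks_2):
    """Bytes per second for a transfer that started at ticks_1 and ended at ticks_2."""
    elapsed = ticks_to_seconds(ticks_2 - ticks_1)
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return total_bytes / elapsed
```

As an illustrative figure only, a 3,900,000-byte payload that took 600,000,000 ticks (60 seconds) works out to 65,000 bytes/second.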

Keep in mind, this is not the most strenuous bandwidth testing we could be doing. It’s more an indicator of what could potentially be done with this framework.

Network Scenario 1

The first scenario involves five identical machines:

|IP Address |Operating System |Attack |Transfer Rate |Payload |VMWare |

|128.198.60.131 |XP SP 0 |Custom Bad Service |48 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.132 |XP SP 0 |Custom Bad Service |64 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.133 |XP SP 0 |Custom Bad Service |54 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.134 |XP SP 0 |Custom Bad Service |56 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.135 |XP SP 0 |Custom Bad Service |46 kb/s |3900000 bytes |Server 1.2 |

The boxes are configured on a single network, sharing the same physical network card, but using different IP addresses.

Transfer times will vary, according to current network traffic. This is not an isolated system. They are all of the same order of magnitude, which is what we should expect. The median value for this scenario is 54 kb/s.

Note: these values are in kilobytes/second, not the more common “bits”.

Network Scenario 2

The second scenario involves five identical machines:

|IP Address |Operating System |Attack |Transfer Rate |Payload |VMWare |

|128.198.60.131 |XP SP 0 |LSASS via Metasploit|32 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.132 |XP SP 0 |LSASS via Metasploit|54 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.133 |XP SP 0 |LSASS via Metasploit|47 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.134 |XP SP 0 |LSASS via Metasploit|45 kb/s |3900000 bytes |Server 1.2 |

|128.198.60.135 |XP SP 0 |LSASS via Metasploit|52 kb/s |3900000 bytes |Server 1.2 |

The boxes are configured on a single network, sharing the same physical network card, but using different IP addresses.

Again, transfer times are still in the same ballpark. We could “cherry pick” the best results, but that would be a little gratuitous. As the network traffic varied (a fairly heavy usage period), we see varying results. These machines are basically identical, running the same client, and passing the same data. This second set just gives us more data.

Note: these values are in kilobytes/second, not the more common “bits”.

Network Scenario 3

The last scenario involves five identical machines, but is done with Ping, instead of an exploit-based test:

|IP Address |Operating System |Attack |Transfer Rate |Payload (bytes) |VMWare |

|128.198.60.131 |XP SP 0 |ping |8.1 kb/s |65500 |Server 1.2 |

|128.198.60.131 |XP SP 0 |ping |3.4 kb/s |65500 |Server 1.2 |

|128.198.60.131 |XP SP 0 |ping |32.7 kb/s |65500 |Server 1.2 |

|128.198.60.131 |XP SP 0 |ping |32.7 kb/s |65500 |Server 1.2 |

|128.198.60.132 |XP SP 0 |ping |3.8 kb/s |65500 |Server 1.2 |

|128.198.60.132 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.132 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.132 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.133 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.133 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.133 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.133 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.134 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.134 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.134 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.134 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.135 |XP SP 0 |ping |1.9 kb/s |65500 |Server 1.2 |

|128.198.60.135 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.135 |XP SP 0 |ping |21.8 kb/s |65500 |Server 1.2 |

|128.198.60.135 |XP SP 0 |ping |13.1 kb/s |65500 |Server 1.2 |

The boxes are configured on a single network, sharing the same physical network card, but using different IP addresses.

Note: these values are in kilobytes/second, not the more common “bits”.

Results Analysis

The results using ping are lower, in part because the Windows ping implementation limits the payload size to 65,500 bytes. So more data is transmitted by our test clients, and the results for ping are somewhat different. This illustrates the differences between ICMP and TCP (ping uses ICMP echo messages, while our clients use TCP) as well as the priority the operating system gives to an application (the downloaded payload executables) versus a low level service.

However, these figures do not differ by orders of magnitude. Instead, the differences are closer to a factor of two (with the average value for ping being around 20 kb/s).
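The summary figures can be rechecked directly from the three tables. The short Python sketch below uses only the transfer rates reported above; the variable and function names are my own.

```python
from statistics import mean, median

# Transfer rates (kilobytes/second) copied from the three scenario tables.
scenario_1 = [48, 64, 54, 56, 46]
scenario_2 = [32, 54, 47, 45, 52]
ping = [
    8.1, 3.4, 32.7, 32.7,     # 128.198.60.131
    3.8, 21.8, 21.8, 21.8,    # 128.198.60.132
    21.8, 21.8, 21.8, 21.8,   # 128.198.60.133
    21.8, 21.8, 21.8, 21.8,   # 128.198.60.134
    1.9, 21.8, 21.8, 13.1,    # 128.198.60.135
]

def ratio_to_ping(rates):
    """How many times faster the mean exploit-based rate is than the mean ping rate."""
    return mean(rates) / mean(ping)
```

The medians come out at 54 and 47 for the two exploit-based scenarios and 21.8 for ping. The ping mean is just under 19, which puts both exploit-based scenarios between two and three times the ping figure.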

As far as a basic test of connectivity and availability, these results are promising. There is plenty here to work with to forgo the use of ping and traceroute. However, serious network analysis will require fleshing out the client-side executables to perform more elaborate tests.

Conclusion

The answer to the basic hypothesis, “Can we write an exploit to measure bandwidth?”, based on this research is yes. Indeed, we can take an exploit, modify it, and measure bandwidth. Despite this success, testing shows that this is dangerous.

Over the time spent developing this project, I became aware of the difficulties in finding default Windows services that could be properly exploited.

Initially, I attempted to exploit LSASS, as it is a default service in Windows. However, since the Sasser worm rained down terror on the IT world, a patch was released.

Oddly enough, this patch does not fix the bug. Instead, it adjusts the “threading apartment state” in Windows. In other words, it isolates the localized stack to prevent overflows. If you can find or hard code addresses for your instructions you too can “crash LSASS”. However, Windows will restart it in short order.

Truth be told, many exploits exist; however, very few target default Windows services. Most of the common Windows exploits are in larger programs that typically will not be installed across a wide user base: SQL Server, Windows Server, Active Directory, etc.

The obvious solution would be to find a new exploit by researching new viruses or examining new software for potential holes. There are several problems with this:

Security: any exploit known that could be used should really be patched. Responsible IT teams would not willingly leave a hole in their network, no matter how useful it is. It is more likely that newer alternatives to ping and traceroute should be developed.

Virus Evolution: viruses are based on exploits that are old ideas. The buffer overflow exploit is so well known that it is only a matter of time before major vendors and, more importantly, developers become fully aware of the programming pitfalls that leave a service (software) open for exploitation.

“…it is clear that the vast majority of worms are derivative in nature. These worms have little or no originality and can often be prevented simply by protecting against the last worm. If this study has revealed anything in terms of trends, it is that the history of malicious code is, for the most part, evolutionary. By defending against yesterday's attacks, you can effectively protect against the vast majority of tomorrow's threats. This is not simply a case of closing the barn door after the horse has bolted – it is closing the barn door after one horse has bolted but before the other 99 get the idea to follow him.” (Kienzle, 9)

Development: if a new exploit is discovered, considerable work has to be done to use it to measure bandwidth. It’s not just a matter of writing a program; it’s a matter of writing shell code, which is a much more difficult process, both time consuming and involving a fair amount of guesswork. In Windows systems, getting the shell code to work is a trial-and-error process that is likely to make the system unstable, not because of an algorithmic flaw but because of the way Windows loads DLLs. Again, the ethical concerns arise: do we want to make our systems unstable?

While it is indeed possible to use an exploit to measure bandwidth, it is unlikely that this would be useful in a real network. With these concerns in mind, it is somewhat irresponsible to infect a network merely to diagnose network problems when other solutions are available. As research into networks progresses, newer measurement techniques will be refined. Even more pertinent is the reality that network protocols will need to evolve to remedy other problems, and that evolution should eliminate many of the problems described here.

What can we say?

Ping and traceroute are obsolete. But anything that allows any system to perform bandwidth testing is a potential security hole. A known “bad service” shouldn’t be allowed to exist on machines in a network. Ping and traceroute can be abused to cause a DoS attack.

The solution, I am afraid, would be a service specifically for bandwidth testing that requires authentication. It would be entirely external to user access and must be written in a modern, memory-safe language to prevent buffer overflows.
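To make the last point concrete, the following minimal sketch shows in C the kind of bounds check that a modern language performs automatically and that an unchecked strcpy call omits. The function name checked_copy and its return convention are hypothetical and are not part of the framework.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, including the terminating NUL.
   Returns 0 on success and -1 if the input is rejected; dst is left
   untouched on rejection. (Hypothetical helper for illustration.) */
int checked_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t len = strlen(src);
    if (len >= dst_size)        /* no room for the string plus its NUL */
        return -1;

    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL */
    return 0;
}
```

With a five-byte buffer, checked_copy accepts "abcd" and rejects any longer string instead of writing past the end of the buffer, which is the behavior a service exposed to untrusted network input needs.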

Future Work

The framework is open to additions. With permission from the University, I plan on releasing it under the GPL (GNU General Public License). Continued work on example implementations (to utilize the random IP generation), the server, the GUI, and the calculation capabilities will extend its usefulness. This is really only the tip of the iceberg: more work can be done to fool intrusion detection systems or to implement some type of IP tunneling. Since it’s a never-ending task, it makes more sense to create a community around it (if there is enough interest).

Combined with Metasploit, it’s already good to go for penetration testing and experimentation on Windows networks. This only requires some minor recompilation of the lightweight framework components.

In the long run, for benevolent users, this isn’t really applicable outside of the lab. However, the same concepts could easily be used for DoS-style attacks, which is always the case with this type of work.

References

Aleph One. Smashing the Stack For Fun and Profit. Phrack (Volume seven, issue forty-nine; November 8, 1996).

Anderson, Ross. Security Engineering: A Guide to Building Dependable Distributed Systems. New York, NY: John Wiley & Sons, Inc., 2001.

Blum, Richard. C# Network Programming. Alameda, CA: Sybex, 2003.

Chen, G. and Gray, R. S. 2006. Simulating non-scanning worms on peer-to-peer networks. In Proceedings of the 1st international Conference on Scalable information Systems (Hong Kong, May 30 - June 01, 2006). InfoScale '06, vol. 152. ACM Press, New York, NY, 29. DOI=

Erickson, Jon. Hacking: The Art of Exploitation. San Francisco: No Starch Press, 2003.

Davis, Ralph. Win32 Network Programming. New York: Addison-Wesley, 1996.

Irvine, Kip R. Assembly Language For Intel-Based Computers. Upper Saddle River, New Jersey: Prentice Hall, 2006.

Kienzle, D. M. and Elder, M. C. 2003. Recent worms: a survey and trends. In Proceedings of the 2003 ACM Workshop on Rapid Malcode (Washington, DC, USA, October 27 - 27, 2003). WORM '03. ACM Press, New York, NY, 1-10. DOI=

Koziol, Jack et al. The Shellcoder’s Handbook. Indianapolis: Wiley Publishing, 2004.

Petkov, K. 2005. Overcoming programming flaws: indexing of common software vulnerabilities. In Proceedings of the 2nd Annual Conference on information Security Curriculum Development (Kennesaw, Georgia, September 23 - 24, 2005). InfoSecCD '05. ACM Press, New York, NY, 127-134. DOI=

Pfleeger, Charles P. and Shari Lawrence Pfleeger. Security in Computing. Upper Saddle River, New Jersey: Prentice Hall, 2003.

Skape. Understanding Windows Shell code. December 6, 2003. . 16 July 2006

Spangler, Ryan. Analysis of the Microsoft Windows LSASS Exploit. Packet Watch.

Sk. Advances in Windows Shell code. Phrack (Volume seven, issue sixty-two; June 22, 2004).

Zou, C. C., Gong, W., and Towsley, D. 2002. Code red worm propagation modeling and analysis. In Proceedings of the 9th ACM Conference on Computer and Communications Security (Washington, DC, USA, November 18 - 22, 2002). V. Atluri, Ed. CCS '02. ACM Press, New York, NY, 138-147. DOI=

Demonstration

The demonstration will show a typical XP machine, go through basic Windows security, and perform several system tests.

1. Scan COM objects on the target machine.
   1. Run fuzzer to demonstrate recon.
   2. Show output.
2. Start up sniFF.
3. Start up Ethereal.
4. Attack a 3 node network of XP machines in the same configuration.
   1. Do the first machine by hand.
      1. Input IP.
      2. Select OS.
      3. Select attack.
   2. Autogenerate the rest.
      1. Show success/failure of our infection algorithm.
   3. Show data from each in the treeview.
   4. Perform Ping to compare results.
   5. Infect using Metasploit and a lite client. Compare results.
      1. Use LSASS with Exec Download payload.
5. Focus on a single machine:
   1. Run Ethereal to show packet data.
   2. Explain the different thread/connections/ports.
6. Attack a 3 node network of mixed machines.
   1. Show failures.
7. Explain pluggable exploit architecture.
8. Demonstrate exploit upgrade.

Explain weaknesses and difficulties:

• Why are some attacks failing?

• What happens to the service after infection?


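#include <stdlib.h>
#include <string.h>

/* Deliberately vulnerable: strcpy() performs no bounds checking. */
void bad_function(char *string)
{
    char buffer[5];
    strcpy(buffer, string);   /* overflows buffer when string exceeds 4 characters */
}

int main()
{
    char a_large_string[] = "This string is far too big. Far, far too big.";
    bad_function(a_large_string);
    exit(0);
}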

#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Hello World.");
    exit(0);
}

0x401290 : 0x55
(gdb)
0x401291 : 0x89
(gdb)
0x401292 : 0xe5
(gdb)
0x401293 : 0x83
(gdb)
0x401294 : 0xec
(gdb)
0x401295 : 0x08
(gdb)
0x401296 : 0x83
(gdb)
0x401297 : 0xe4
(gdb)
0x401298 : 0xf0



[
    object,
    uuid(33477d48-e629-11db-8314-0800200c9a66),
    helpstring("interface IGedankenService provides access to a fake service, which follows the normal parameters")
]
interface IGedankenService : IUnknown
{
    HRESULT Foo(char *value);
};

IfId: 3c4728c5-f0ab-448b-bda1-6ce01eb0a6d5 version 1.0
Annotation: DHCP Client LRPC Endpoint
UUID: 00000000-0000-0000-0000-000000000000
Binding: ncalrpc:[dhcpcsvc]

IfId: 4b112204-0e19-11d3-b42b-0000f81feb9f version 1.0
Annotation:
UUID: 00000000-0000-0000-0000-000000000000
Binding: ncacn_np:\\\\EZWEAVE-HOME[\\PIPE\\DAV RPC SERVICE]

RpcMgmtInqIfIds succeeded
Interfaces: 9
    c8cb7687-e6d3-11d2-a958-00c04f682e16 v1.0
    338cd001-2244-31f1-aaaa-900038001003 v1.0
    4b112204-0e19-11d3-b42b-0000f81feb9f v1.0
    00000134-0000-0000-c000-000000000046 v0.0
    18f70770-8e64-11cf-9af1-0020af6e72f4 v0.0
    00000131-0000-0000-c000-000000000046 v0.0
    00000143-0000-0000-c000-000000000046 v0.0
    00000132-0000-0000-c000-000000000046 v0.0
    8fb6d884-2388-11d0-8c35-00c04fda2795 v4.1

004113EF  8B 45 08             mov   eax,dword ptr [str]
004113F2  50                   push  eax
004113F3  8D 8D 2C F8 FF FF    lea   ecx,[buffer]
004113F9  51                   push  ecx
004113FA  E8 AB FC FF FF       call  @ILT+165(_strcpy) (4110AAh)
004113FF  83 C4 08             add   esp,8

C:\WINDOWS\system32>dumpbin /all kernel32.dll > out.txt

Dump of file kernel32.dll
File Type: DLL

Section contains the following Exports for KERNEL32.dll
           0 characteristics
    41107EE1 time date stamp Wed Aug 04 00:14:57 2004
        0.00 version
           1 ordinal base
         949 number of functions
         949 number of names

    ordinal  hint  name
          1     0  ActivateActCtx (0000A634)
          2     1  AddAtomA (000392A3)
          3     2  AddAtomW (00022469)
          4     3  AddConsoleAliasA (0007096F)
          5     4  AddConsoleAliasW (00070931)
          6     5  AddLocalAlternateComputerNameA (00058D0A)

OPTIONAL HEADER VALUES
         10B magic #
        7.10 linker version
       81E00 size of code
       6FE00 size of initialized data
           0 size of uninitialized data
        B436 address of entry point
        1000 base of code
       7F000 base of data
             ----- new -----
    7C800000 image base
        1000 section alignment
         200 file alignment
           3 subsystem (Windows CUI)
        5.01 operating system version
        5.01 image version
        4.00 subsystem version
       F4000 size of image
         400 size of headers
       FF848 checksum

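.586p
.model flat
.STACK 100
.code
start:
    xor eax,eax          ; eax = 0
    jmp short GtCm       ; jump ahead so that the call below pushes the string address
retv:
    pop ecx              ; ecx = address of the command string
    mov [ecx+23],al      ; source operand missing in the original; al (zero) is assumed, to NUL-terminate the string
    push eax             ; second argument: 0
    push ecx             ; first argument: pointer to the command string
    mov eax,401870h      ; hard-coded address of the function to call
    call eax
GtCm:
    call retv            ; pushes the address of the string that follows
    db "cmd.exe /c echo hello 0xN"
END start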
