A Buffer Overflow Study - SJSU



A Buffer Overflow Study

Hui Zhu

Nov. 15, 2002

Introduction

Buffer overflow is an old topic, but it has been the most common form of security attack over the last decade [1]. Moreover, buffer overflow vulnerabilities dominate the area of remote network penetration.

Here I present how buffer overflows work and how they can compromise the security of a system or a network, and then focus on some existing protection techniques.

What Is a Buffer Overflow

A buffer is a contiguous region of memory that holds multiple items of the same data type. A buffer overflow occurs when a program writes past the bounds of a buffer and overwrites the adjacent memory.

To understand how buffer overflow works, we first have a look at the memory organization:

0xFFFF (high addresses)  ...  0x0000 (low addresses)

| Arguments & Environments | Stack | Heap | BSS | Data | Code |

When a program is executed, its various elements (instructions, variables...) are mapped in memory, in a structured manner.

• The highest zones contain the process environment as well as its arguments.

• The next part of the memory consists of two sections, the stack and the heap, which are allocated at run time.

o The stack is used to store function arguments, local variables, and the information needed to restore the stack state after a function call returns. The stack is accessed in LIFO (Last In, First Out) order and grows toward the low memory addresses.

o Dynamically allocated variables are found in the heap; typically, a pointer returned by a call to the malloc function refers to a heap address.

• The BSS and Data sections are dedicated to global variables, and their sizes are fixed at compilation time.

• The last memory section, Code, contains instructions (i.e. the program code) and may include read-only data.
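The regions listed above can be observed from inside a C program. The following is a minimal sketch, not a definitive tool: the names region_sample and sample_regions are hypothetical, and which section the compiler actually chooses for each global is an assumption that holds on typical toolchains only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int initialized_global = 42; /* initialized: typically placed in Data */
static int zeroed_global;           /* uninitialized: typically placed in BSS */

/* Addresses of one sample object from each writable region. */
struct region_sample {
    uintptr_t stack_addr;
    uintptr_t heap_addr;
    uintptr_t bss_addr;
    uintptr_t data_addr;
};

/* Fills *out with sample addresses; returns 0 on success, -1 if malloc fails. */
int sample_regions(struct region_sample *out)
{
    int local = 0;   /* automatic variable: lives on the stack */
    int *heap_obj;

    assert(out != NULL);
    heap_obj = malloc(sizeof *heap_obj); /* dynamic allocation: lives on the heap */
    if (heap_obj == NULL)
        return -1;

    out->stack_addr = (uintptr_t)(void *)&local;
    out->heap_addr  = (uintptr_t)(void *)heap_obj;
    out->bss_addr   = (uintptr_t)(void *)&zeroed_global;
    out->data_addr  = (uintptr_t)(void *)&initialized_global;

    free(heap_obj);
    return 0;
}
```

Printing the four values usually shows the stack address well above the others, but the exact ordering depends on the platform and on address randomization, so no particular output should be expected.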

Since the stack and the heap are allocated at run time and can be written with user-supplied data, these regions of memory can be exploited through buffer overflows. Buffer overflows occur mainly because of the C language and partly because of poor programming practices. C lacks array bounds checking, and the culture of C programmers encourages a performance-oriented style that avoids error checking where possible. For instance, many standard C library functions, such as gets and strcpy, perform no bounds checking.

How a Buffer Overflow Turns Into a Security Problem

Reading or writing past the end of a buffer can cause a number of diverse (and often unanticipated) behaviors: 1) programs can act in strange ways, 2) programs can fail completely, or 3) programs can proceed without any noticeable difference in execution.

Likewise, when a buffer overflows, the excess data may trample on other meaningful data that the program needs later, which can lead to a security problem. Usually, an attacker tries to exploit such a vulnerability in one of the following two ways:

1. In the simplest case, consider a Boolean flag allocated in memory directly after a buffer. Say that the flag determines whether or not the user running the program can access critical files. The default value of the flag is 'F'. If a malicious user can write past the end of the buffer, then he can change the flag value to 'T' and access these files illegally.

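[Figure: a buffer immediately followed by a Boolean flag holding 'F'; writing a run of 'T' characters past the end of the buffer overwrites the flag]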

2. A more common and more serious security problem caused by buffer overflows is stack smashing. Stack-smashing attacks target a specific programming fault: careless use of data buffers allocated on the program's run-time stack, namely local variables and function arguments. A creative attacker can take advantage of a buffer overflow vulnerability through stack smashing and then run arbitrary code.

• The attacker can insert some attack code somewhere and overwrite the stack in such a way that control passes to the attack code.

When a function is called in C, the caller begins by pushing the function parameters onto the stack. The caller then pushes the address of its next instruction (the address where execution should continue when the function returns) onto the stack and jumps to the function. The callee, in turn, makes room on the stack for its local variables. In most computer architectures the stack grows from high memory addresses to low, so the memory layout after the call will look something like this:

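[Figure: stack layout after the call, from high to low addresses: function parameters, return address, locals; the stack grows toward the low addresses]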

By overflowing a buffer in the local variables, the attacker can overwrite the return address. This means that when the function is done, it will not return to the caller; instead, it will jump to an address determined by the attacker.

• The attacker can also call any function in the program or in the libraries used by it, and specify arbitrary parameters. Thus, it is not necessary for the attacker to inject his own code into the buffer, since he can usually find an existing function with sufficiently devastating effects.
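Both cases above, the overwritten flag and the overwritten return address, can be modelled safely inside one ordinary byte array. This is a toy sketch under stated assumptions: the struct toy_frame layout (an 8-byte buffer, then the flag, then a 4-byte "return address") and all function names are hypothetical, and the layout does not match any real compiler's stack frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    BUF_SIZE   = 8,               /* size of the vulnerable buffer */
    FLAG_OFF   = BUF_SIZE,        /* the flag sits directly after the buffer */
    RET_OFF    = BUF_SIZE + 1,    /* toy "return address" follows the flag */
    FRAME_SIZE = BUF_SIZE + 1 + 4
};

/* Simulated memory: buffer, flag and return address packed back to back. */
struct toy_frame {
    unsigned char bytes[FRAME_SIZE];
};

void frame_init(struct toy_frame *f, uint32_t ret_addr)
{
    assert(f != NULL);
    memset(f->bytes, 0, sizeof f->bytes);
    f->bytes[FLAG_OFF] = 'F';
    memcpy(f->bytes + RET_OFF, &ret_addr, sizeof ret_addr);
}

/* Models an unchecked copy such as strcpy(): n is never compared with
 * BUF_SIZE. It is clamped to FRAME_SIZE only so that the simulation
 * itself stays inside its own array. */
void unchecked_write(struct toy_frame *f, const unsigned char *src, size_t n)
{
    assert(f != NULL && src != NULL);
    if (n > (size_t)FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(f->bytes, src, n);
}

/* Bounds-checked variant: rejects input that does not fit the buffer. */
int checked_write(struct toy_frame *f, const unsigned char *src, size_t n)
{
    assert(f != NULL && src != NULL);
    if (n > (size_t)BUF_SIZE)
        return -1;
    memcpy(f->bytes, src, n);
    return 0;
}

unsigned char frame_flag(const struct toy_frame *f)
{
    return f->bytes[FLAG_OFF];
}

uint32_t frame_ret(const struct toy_frame *f)
{
    uint32_t v;
    memcpy(&v, f->bytes + RET_OFF, sizeof v);
    return v;
}
```

Writing nine 'T' bytes with unchecked_write flips the flag from 'F' to 'T'; writing thirteen bytes also replaces the stored return address, while checked_write refuses both inputs.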

Commonly, attackers exploit buffer overflows to get an interactive session (shell) on the machine under the user ID of root. The injected attack code is usually a short sequence of instructions that spawns a shell. The most common type of shell code is a plain execve() call that executes /bin/sh. If the program being exploited runs with a high privilege level (such as root or administrator), then the attacker gets that privilege in the interactive session.

The most famous example of a buffer overflow is the Internet worm written by Cornell graduate student Robert Morris. In November 1988, about 6,000 systems were shut down by the worm, cutting off roughly half of all traffic on the Internet at that time. The worm exploited the Unix finger service (fingerd). Fingerd is a daemon that responds to requests for a listing of current users, or for specific information about a particular user. It reads its input from the network and sends its output to the network. On many systems, it ran as the superuser or some other privileged user. The daemon uses gets() to read the data from the client. Since gets() does no bounds checking on its argument, which here is a 512-byte array allocated on the stack, a longer input message writes past the end of the array and changes the return address. If the appropriate code is loaded into the buffer, that code can be executed with the privileges of the fingerd daemon [2].

Engineering such an attack from scratch is non-trivial. Often, the attacks are based on reverse-engineering the attacked program, so as to determine the exact offset from the buffer to the return address in the stack frame, and the offset from the return address to the injected attack code. However, it is possible to soften these exacting requirements:

• The location of the return address can be approximated by simply repeating the desired return address several times in the approximate region of the return address.

• The offset to the attack code can be approximated by prepending the attack code with an arbitrary number of NOP instructions. The overwritten return address need only jump into the middle of the field of NOPs to hit the target.

The cook-book descriptions of stack-smashing attacks have made construction of buffer overflow exploits quite easy. The only remaining work for a would-be attacker is to find a poorly protected buffer in a privileged program and construct an exploit. Hundreds of such exploits have been reported in recent years.
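The two approximations described above, repeating the return address and padding the attack code with NOPs, can be expressed as simple offset arithmetic. The sketch below is an illustrative model only: the payload shape (NOP sled, then attack code, then repeated 4-byte addresses), the struct payload_layout, and both function names are assumptions made for this example, and no real payload is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy payload shape: [ NOP sled | attack code | return address, repeated ] */
struct payload_layout {
    size_t sled_len;    /* bytes of NOP padding at the start of the buffer */
    size_t code_len;    /* bytes of attack code following the sled */
    size_t ret_copies;  /* number of 4-byte copies of the guessed address */
};

/* Returns 1 if the saved return address, located ret_offset bytes after the
 * start of the buffer, is exactly covered by one of the repeated copies.
 * A misaligned slot would receive pieces of two copies, so it counts as 0. */
int slot_overwritten(const struct payload_layout *p, size_t ret_offset)
{
    size_t start, end;

    assert(p != NULL);
    start = p->sled_len + p->code_len;
    end = start + 4 * p->ret_copies;
    return ret_offset >= start
        && ret_offset + 4 <= end
        && (ret_offset - start) % 4 == 0;
}

/* Returns 1 if a jump to guessed_addr reaches the attack code: any address
 * inside the sled slides forward to the code, and the first byte of the
 * code itself also works. */
int jump_reaches_code(const struct payload_layout *p,
                      uint32_t buffer_addr, uint32_t guessed_addr)
{
    assert(p != NULL);
    if (guessed_addr < buffer_addr)
        return 0;
    return (guessed_addr - buffer_addr) <= p->sled_len;
}
```

With a 64-byte sled, 65 different guessed addresses succeed instead of exactly one, and with 8 repeated copies the return address slot may lie at any of 8 aligned offsets, which is why the attacker no longer needs the exact values.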

Techniques of Avoiding Buffer Overflow

Modern Programming Languages

Most modern programming languages are essentially immune to this problem, either because they automatically resize arrays (e.g., Perl), or because they normally detect and prevent buffer overflows (e.g., Ada95 and Java). However, the C language provides no protection against such problems, and C++ can easily be used in ways that cause this problem too.

Careful Use of C/C++ Library Functions

C users must avoid using functions that do not check bounds unless they have ensured that the bounds will never be exceeded. Functions to avoid in most cases include strcpy(), strcat(), sprintf(), and gets(). These should be replaced with functions such as strncpy(), strncat(), snprintf(), and fgets(), respectively. Note that strncpy() does not NUL-terminate the destination when the source is at least as long as the given size, so the caller must add the terminator explicitly. Other functions that may permit buffer overruns include fscanf(), scanf(), vsprintf(), realpath(), getopt(), getpass(), streadd(), strecpy(), and strtrns().
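As a minimal sketch of the replacements named above, the two helpers below copy a string into a fixed-size buffer without writing past its end. The helper names copy_string and copy_strncpy are hypothetical; only snprintf() and strncpy() come from the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dst using snprintf(), which never writes more than
 * dst_size bytes and always NUL-terminates when dst_size > 0.
 * Returns 0 if src fit, 1 if it was truncated, -1 on error. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;
    n = snprintf(dst, dst_size, "%s", src);
    if (n < 0)
        return -1;
    return (size_t)n >= dst_size ? 1 : 0;
}

/* Same copy with strncpy(): the terminator must be written by hand,
 * because strncpy() omits it when src does not fit. */
void copy_strncpy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

Checking the return value of copy_string matters: silent truncation is safer than an overflow, but it can still change the meaning of the data, for example by cutting a file name short.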

Static and Dynamically Allocated Buffers

The fact that a buffer is of a fixed length may be exploitable. An alternative is to dynamically (re-)allocate all strings instead of using fixed-size buffers. This general approach is recommended by the GNU [3] programming guidelines, mainly because it permits programs to handle arbitrarily sized inputs (until they run out of memory).

However, one must be prepared for dynamic allocation to fail. The program must be designed to be fail-safe when memory is exhausted, and memory may be exhausted at some point in the program other than the portion where you are worried about buffer overflows. Also, since dynamic reallocation may cause memory to be allocated inefficiently, it is entirely possible to run out of memory even though there is enough virtual memory available for the program to continue. In addition, before running out of memory the program will probably use a great deal of virtual memory, easily resulting in "thrashing", a situation in which the system spends all its time just paging in and out. This can have the effect of a denial of service attack.
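The dynamic approach, including the requirement to survive allocation failure, might look like the following sketch. The struct dynstr type and its functions are hypothetical names for this example; the growth policy (doubling from 16 bytes) is an arbitrary assumption, not something the GNU guidelines prescribe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A growable, always NUL-terminated string. */
struct dynstr {
    char  *data;  /* NULL until the first successful append */
    size_t len;   /* bytes in use, not counting the terminator */
    size_t cap;   /* bytes allocated */
};

void dynstr_init(struct dynstr *s)
{
    assert(s != NULL);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}

/* Appends n bytes from src. Returns 0 on success, or -1 if the size would
 * overflow or realloc() fails; on failure the old contents stay valid. */
int dynstr_append(struct dynstr *s, const char *src, size_t n)
{
    size_t needed;

    assert(s != NULL && (src != NULL || n == 0));
    if (n > SIZE_MAX - s->len - 1)
        return -1;                      /* len + n + 1 would wrap around */
    needed = s->len + n + 1;

    if (needed > s->cap) {
        size_t new_cap = s->cap ? s->cap : 16;
        char *p;

        while (new_cap < needed) {
            if (new_cap > SIZE_MAX / 2) {
                new_cap = needed;       /* cannot double any further */
                break;
            }
            new_cap *= 2;
        }
        p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;                  /* s->data is untouched */
        s->data = p;
        s->cap = new_cap;
    }

    if (n > 0)
        memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

void dynstr_free(struct dynstr *s)
{
    assert(s != NULL);
    free(s->data);
    dynstr_init(s);
}
```

A caller that reads untrusted input would normally also enforce an upper limit on the total length, since unbounded growth is what leads to the memory exhaustion and thrashing described above.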

Newer Libraries

Newer libraries for C include the strlcpy() and strlcat() functions. Both take the full size of the destination buffer as a parameter (not the maximum number of characters to be copied) and guarantee to NUL-terminate the result (as long as the size is larger than 0). One nuisance is that these functions are not installed by default on most systems.
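Where these functions are missing, their contract is small enough to sketch. The function below, under the hypothetical name toy_strlcpy, is an illustration of the strlcpy() semantics described above and is not the library's own implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst, where size is the FULL size of dst.
 * Writes at most size - 1 characters plus a terminating NUL (if size > 0)
 * and returns strlen(src), so a result >= size signals truncation. */
size_t toy_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len;

    assert(src != NULL && (dst != NULL || size == 0));
    src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A typical call is `if (toy_strlcpy(buf, input, sizeof buf) >= sizeof buf)` followed by whatever truncation handling the program requires.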

Compilation Solutions in C/C++

Some newer compilers, or extensions to them, can perform bounds checking [4]. Such tools provide one more layer of defense, but it is not wise to depend on this technique as your sole defense.

Non-executable User Stack Area [5]

An operating system kernel can be modified so that the stack segment is not executable. Even with a non-executable stack, however, an attacker can still proceed as follows:

1. Overflow the buffer on the stack so that the return address is overwritten with a pointer to the system() library function.

2. Fill the next four bytes with an arbitrary value (the "return pointer" for the system() call, which the attacker does not care about).

3. Set the following four bytes to a pointer to some place in the shared library that contains the string "/bin/sh".

The attacker does not have to write any code. The only thing he needs to know is where the library is loaded by default, so he simply selects one specific, commonly used version to target. If the vulnerable program runs as root, the result is a root shell on the system, and the attack is fairly trivial to carry out. In short, the non-executable stack may catch a few attacks against old binaries that have security problems, but the basic problem is that the binaries allow their stacks to be overwritten. If they allow that, then they allow the above exploit. Adapting an existing exploit probably takes about five lines of changes, plus a small program to find out where in the address space the shared libraries tend to be loaded.

No set-user-id Programs?

An attacker targets set-user-id (suid) programs so that a successful exploit gives him root privileges and lets him do arbitrary things. Consequently, some "people believe that if their program is not running suid root, they don't have to worry about security problems in their code, since the program can't be leveraged to achieve greater access levels. That idea has some merit, but is still a risky proposition. For one thing, you never know who is going to take your program and set the suid bit on the binary. When people can't get something to work properly, they get desperate. We've seen this sort of situation lead to entire directories of programs needlessly set setuid root." [6]

But any successful buffer overflow attack will give attackers more privileges than they previously had. Usually, such attacks involve the network. For example, a buffer overflow in a network server program that can be tickled by outside users may provide an attacker with a login on the machine. The resulting session has the privileges of the process running the compromised network service. This type of attack happens all the time. Even when such services don't run as root, as soon as a cracker gets an interactive shell on a machine, it is usually only a matter of time before the machine is "owned" -- that is, the attacker gains complete control over the machine, such as root access on a UNIX box or administrator access on a Windows NT box. Such control is typically garnered by running a different exploit through the interactive shell to escalate privileges.

Conclusion

In short, it's better to work first on developing a correct program that defends itself against buffer overflows. Then, after you've done this, by all means use techniques and tools as an additional safety net.

Notes:

1.

2.

3. The GNU Project was launched in 1984 to develop a complete Unix-like operating system that is free software. For more information, please go to website:

4. Visit http://www-ala.doc.ic.ac.uk/~phjk/BoundsChecking.html

5. For more information, go to linux/README

6.


-----------------------

Figure labels recovered from the original diagrams: Buffer, Boolean flag, F, T T T T T T … T, 0X0000, 0XFFFF, Stack Growth, Attack Code.
