


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Design Automation Conference ‘03, June 2-6, 2003, Anaheim, CA.

Copyright 2003 ACM 1-58113-000-0/00/0000…$5.00.

An Extensible System-On-Chip Internet Firewall


ABSTRACT

A single-chip firewall has been implemented that performs packet filtering, content scanning, and per-flow queuing of Internet packets at Gigabit/second rates. All of the packet processing operations are performed using reconfigurable hardware within a single Xilinx Virtex XCV2000E Field Programmable Gate Array (FPGA). The System-On-Chip (SOC) firewall processes the headers of Internet packets in hardware using layered protocol wrappers. It filters packets using rules stored in Content Addressable Memories (CAMs) and scans packet payloads for keywords using a hardware-based regular expression matching circuit. Lastly, the SOC firewall integrates a per-flow queuing module to mitigate the effect of Denial of Service attacks. Additional features can be added to the firewall by dynamically reconfiguring the FPGA hardware.

Categories and Subject Descriptors

I.5.3 [Pattern Recognition]: Design Methodology; B.4.1 [Data Communications]: Input/Output Devices; C.2.1 [Computer-Communication Networks]: Network Architecture and Design

General Terms

Design, Experimentation, Network Security

Keywords

System On Chip, FPGA, Internet, Firewall, Packet Scanning, Per-Flow Queuing, Network Intrusion Detection

INTRODUCTION

As the Internet has grown, demand for network security has increased significantly. Internet-connected machines are continuously the targets of malicious attacks launched from machines around the world. Internal hosts can be protected from remote attacks by filtering traffic through a firewall. As shown in Figure 1, firewalls typically reside between the backbone switches and the internal hosts. Firewalls drop packets that are known to be malicious and rate-limit flows that attempt to transmit excessively large amounts of traffic. By placing multiple firewalls throughout a network, individual subnets can be isolated from each other and protected from other hosts on the Internet.

Recently, new types of firewalls have been introduced with an increasing set of features. While some types of attacks can be thwarted by dropping packets based on the values in their headers, new firewalls must scan the bytes in the packet payloads as well. Further, new firewalls need to defend internal hosts from Denial of Service (DoS) attacks, which occur when remote machines flood a victim host with traffic at high rates [1]. Few existing firewalls can scan the full packet payload or protect against DoS attacks. Of the systems that do, most run in software and are too slow to perform these functions at high line rates [3]. There is therefore a need for hardware-accelerated firewalls that maintain high throughput.

Custom Integrated Circuits (ICs) can implement firewall functions at Gigabit/second rates. They achieve high throughput by performing operations in parallel and by processing packets in deep pipelines. In the past, hardware-based packet processing systems required multiple ASICs to filter and forward packets. Today, a single integrated circuit with tens of millions of transistors can implement a firewall as a System On Chip (SOC). One challenge in building firewalls is making the device capable of protecting against both current and future threats [6]. Reconfigurable hardware provides both the logic density needed to implement a complex firewall and the flexibility to be reconfigured to implement new functions.

SYSTEM ON CHIP FIREWALL

A System-On-Chip Internet firewall has been implemented on a Xilinx Virtex XCV2000E FPGA. In order to protect against current threats, the SOC firewall integrates circuits to filter headers, scan payloads, and buffer traffic. In order to protect against future threats, the SOC is extensible, allowing new packet processing hardware modules to be inserted.

The top-level architecture of the System On Chip firewall is shown in Figure 2. When data first enters the SOC, a set of layered protocol wrappers parses the headers of the Internet packets. Next, the payload scanner examines the content of the packets to identify keywords and/or regular expressions. The CAM filter then compares the fields in the packet header with a set of rules stored in a Ternary Content Addressable Memory (TCAM). Some rules cause the CAM filter to drop packets outright, while other rules classify the packet and assign it a flow identifier. After classification, the queue manager schedules when packets are transmitted from the flow buffer, which stores packets in off-chip memory. Once a packet is scheduled, its data is read from the flow buffer and transmitted out of the firewall. Additional features can be added to the system by inserting blocks along the data processing path.
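The processing order above can be modeled in software as a chain of stages, each of which either passes a (possibly annotated) packet onward or drops it. The sketch below is a behavioral model only, not the VHDL of the SOC firewall: the `Packet` record, the two stage functions with their placeholder rules, and the `ttl_check` extension are all hypothetical, and the protocol wrappers and queue manager are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    # Hypothetical, simplified packet record used only in this model.
    header: dict
    payload: bytes
    content_vector: int = 0          # bits set by the payload scanner
    flow_id: Optional[int] = None    # assigned by the CAM filter


# A stage returns the (possibly annotated) packet, or None to drop it.
Stage = Callable[[Packet], Optional[Packet]]


def payload_scanner(pkt: Packet) -> Optional[Packet]:
    # Placeholder keyword check standing in for the regular expression engines.
    if b"MAKE MONEY FAST" in pkt.payload.upper():
        pkt.content_vector |= 0b1
    return pkt


def cam_filter(pkt: Packet) -> Optional[Packet]:
    # Placeholder rules: drop flagged packets, otherwise derive a flow ID.
    if pkt.content_vector & 0b1:
        return None
    pkt.flow_id = pkt.header.get("dst_port", 0) % 1024
    return pkt


def ttl_check(pkt: Packet) -> Optional[Packet]:
    # Hypothetical extension module: drop packets whose TTL has expired.
    return None if pkt.header.get("ttl", 64) == 0 else pkt


def run_pipeline(stages: List[Stage], pkt: Packet) -> Optional[Packet]:
    for stage in stages:
        result = stage(pkt)
        if result is None:           # dropped: later stages never see it
            return None
        pkt = result
    return pkt


core = [payload_scanner, cam_filter]
# Insert the extension between the payload scanner and the CAM filter.
extended = core[:1] + [ttl_check] + core[1:]
```

Because every stage shares one signature, adding a feature in this model is a list insertion, which mirrors the idea of inserting a block along the hardware data path.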

1 Header Processing

Internet Protocol packets contain both a header and a payload. The header contains multiple fields that specify the type of packet, its protocol, where the packet came from, where it is destined, its length, and other options relevant to the Internet protocols.
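The header fields listed above occupy fixed offsets in an IPv4 header (RFC 791), and TCP and UDP both carry their source and destination ports in the first four bytes of the transport header. The sketch below uses Python's standard `struct` module to extract the five fields that the CAM filter later compares. It assumes an IPv4 packet carrying TCP or UDP; `FiveTuple` and `parse_ipv4` are hypothetical names.

```python
import struct
from typing import NamedTuple, Tuple


class FiveTuple(NamedTuple):
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int


def parse_ipv4(packet: bytes) -> Tuple[FiveTuple, int]:
    """Return (five-tuple, total length) for an IPv4 packet carrying TCP or UDP."""
    version_ihl, _tos, total_length = struct.unpack_from("!BBH", packet, 0)
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4        # IHL counts 32-bit words
    protocol = packet[9]                          # e.g. 6 = TCP, 17 = UDP
    src = ".".join(str(b) for b in packet[12:16])
    dst = ".".join(str(b) for b in packet[16:20])
    # TCP and UDP both start with the source and destination ports.
    src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return FiveTuple(src, dst, src_port, dst_port, protocol), total_length
```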

1 Layered Protocol Wrappers

To simplify processing of the protocol fields on the SOC firewall, a set of layered protocol wrappers was implemented to process protocols at multiple layers [2]. At the lowest layer, complete frames are reassembled from short cells and segmented back into cells. At the network layer, the fields of Internet Protocol (IP) packets are computed and verified. At the highest layer of protocol processing, the user-level data is separated from the headers and transport fields used by the network.
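As an illustration of a field that is computed and verified at the IP layer, the IPv4 header checksum is the one's complement of the one's-complement sum of the header's 16-bit words (RFC 791, RFC 1071). We assume here that this checksum is among the fields the IP wrapper handles; the sketch is a software model, and its function names are hypothetical.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum of data, folding carries back in (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)      # end-around carry
    return total


def ipv4_checksum(header: bytes) -> int:
    """Checksum for bytes 10-11; the header's own checksum field must be zero."""
    return ~ones_complement_sum16(header) & 0xFFFF


def ipv4_header_ok(header: bytes) -> bool:
    # Summing a valid header, checksum field included, yields 0xFFFF.
    return ones_complement_sum16(header) == 0xFFFF
```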

2 Content Addressable Memory Filters

Once the header has been processed, a Ternary Content Addressable Memory (TCAM) classifies the packet as belonging to a specific flow. A diagram of a two-entry TCAM is shown in Figure 3. When a packet arrives, its source address, destination address, source port, destination port, and protocol are simultaneously compared with the value fields in all rows of the TCAM. After the bits are compared, a mask register is applied to the resulting vector to select which bits of each row must match and which bits can be ignored. If all unmasked bit locations match, that row of the TCAM is considered a match. The flow identifier associated with the highest-priority matching row is then assigned to the packet.
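The match rule above can be expressed compactly: XOR the key with a row's value to find the differing bits, AND the result with the row's mask, and declare a match if nothing remains. The sketch below models this in Python. The hardware compares all rows in parallel and selects the highest-priority match, whereas this model scans rows sequentially in priority order; the 104-bit key layout, the example rules, and all names are hypothetical.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class TcamRow(NamedTuple):
    value: int     # bit pattern to match where mask is 1
    mask: int      # 1 = compare this bit, 0 = don't care
    flow_id: int


def pack_key(src: int, dst: int, sport: int, dport: int, proto: int) -> int:
    # Hypothetical 104-bit layout: src(32) | dst(32) | sport(16) | dport(16) | proto(8)
    return (src << 72) | (dst << 40) | (sport << 24) | (dport << 8) | proto


def tcam_lookup(rows: List[TcamRow], key: int) -> Optional[int]:
    # Rows are listed in priority order, so the first match wins.
    for row in rows:
        if ((key ^ row.value) & row.mask) == 0:   # all unmasked bits equal
            return row.flow_id
    return None


def ip(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


# Example rules: TCP traffic to port 80 on 10.0.0.5 is flow 7; all else is flow 0.
web = TcamRow(value=pack_key(0, ip("10.0.0.5"), 0, 80, 6),
              mask=pack_key(0, 0xFFFFFFFF, 0, 0xFFFF, 0xFF),
              flow_id=7)
default = TcamRow(value=0, mask=0, flow_id=0)
rules = [web, default]
```

The all-zero mask in `default` makes every bit a "don't care", so placing it last gives a catch-all rule.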

2 Payload Processing

Many types of Internet traffic cannot be classified by examining packet headers alone. For example, the KaZaA program sometimes disguises its packet headers so that they appear to come from a web server. For network administrators concerned with the security of their networks, it is important to classify packets based on their content rather than only on the values that appear in the packet headers.

1 Regular Expression Matching

In order to scan the payload of packets, a regular expression matching circuit was implemented. Regular expressions provide a shorthand means to specify a literal string, a single wildcard character (specified by ‘?’), or an arbitrary string of characters (specified by ‘*’). For example, the expression “{A|a}lbert ? {E|e}instein” matches all four capitalization variants of the name Albert Einstein and allows the middle initial to be any character.
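The notation above differs from the syntax of most software regular expression libraries, where `?` and `*` are quantifiers rather than wildcards. To make the example testable in software, the sketch below translates it into Python `re` syntax. The translation rules, including reading `*` as zero or more characters, are our assumptions about the notation, and `to_python_regex` is a hypothetical helper; the hardware compiles expressions into state machines instead.

```python
import re


def to_python_regex(expr: str) -> str:
    """Translate the assumed notation into Python re syntax.

    {x|y} -> alternation, ? -> any single character,
    * -> any string (assumed zero or more characters), else literal.
    """
    out = []
    i = 0
    while i < len(expr):
        c = expr[i]
        if c == "{":
            j = expr.index("}", i)
            alts = expr[i + 1:j].split("|")
            out.append("(?:" + "|".join(re.escape(a) for a in alts) + ")")
            i = j + 1
            continue
        if c == "?":
            out.append(".")
        elif c == "*":
            out.append(".*")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


pattern = re.compile(to_python_regex("{A|a}lbert ? {E|e}instein"), re.DOTALL)
```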

2 Implementation of the Payload Scanner

To generate high-speed hardware that searches for regular expressions, a design flow was created that automatically generates finite state machines from regular expression specifications. A match is detected when the sequence of arriving bytes causes the state machine to reach a matching state. To scan for multiple regular expressions, a sequence of scanning engines is instantiated in a pipeline. To achieve higher throughput, multiple pipelines operate in parallel. A payload scanner searching for eight regular expressions (RE1-RE8) using four parallel pipelines is illustrated in Figure 4.
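The design flow above compiles each expression into a state machine that consumes one byte per step. For the simplest case, a literal keyword, such a machine can be built with the Knuth-Morris-Pratt construction; the sketch below does so in Python and folds ASCII letters to lower case to mimic a case-insensitive match. This is a stand-in technique for a single literal string, not the paper's compiler, which handles full regular expressions, and the function names are hypothetical.

```python
from typing import List


def build_dfa(keyword: bytes) -> List[List[int]]:
    """KMP automaton: state k means the last k input bytes equal keyword[:k]."""
    m = len(keyword)
    dfa = [[0] * 256 for _ in range(m + 1)]
    dfa[0][keyword[0]] = 1
    restart = 0                               # state for the longest matched border
    for k in range(1, m + 1):
        dfa[k] = dfa[restart][:]              # mismatches behave like the restart state
        if k < m:
            dfa[k][keyword[k]] = k + 1        # advance on the next keyword byte
            restart = dfa[restart][keyword[k]]
    return dfa


def scan(dfa: List[List[int]], data: bytes) -> List[int]:
    """Consume one byte per step; return the offsets where a match ends."""
    accept = len(dfa) - 1
    state, hits = 0, []
    for i, b in enumerate(data):
        if 0x41 <= b <= 0x5A:                 # fold 'A'-'Z' to 'a'-'z'
            b += 0x20
        state = dfa[state][b]
        if state == accept:
            hits.append(i)
    return hits


# The keyword must be lower-case, because input bytes are folded to lower case.
call_now = build_dfa(b"call now")
```

In hardware, such a transition table would become next-state logic evaluated once per clock, and chaining several engines and replicating the chain gives a structure like the one shown in Figure 4.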

3 Application using the Payload Scanner

A payload processing circuit has been implemented on the SOC firewall that scans email for unwanted messages, commonly referred to as SPAM. By scanning packets as they pass through the network, it is possible to identify SPAM before the message is forwarded to the destination host. To implement the SPAM filter, eight categories of phrases were identified that contain terms commonly appearing in SPAM, such as “CALL NOW” and “MAKE MONEY FAST”. In total, a set of 34 case-insensitive regular expressions was specified. These expressions were compiled into hardware and programmed into the FPGA to scan packets as they passed through the firewall. Whenever a regular expression search engine found matching content, it set a bit in a content vector, as shown in Figure 4. A rule was then programmed into the TCAM to drop every message that contained SPAM.
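The SPAM rule above combines two pieces: a content vector with one bit per category, and a TCAM rule on those bits. Because a ternary row can only require specific bits to have specific values, "any category matched" is modeled below as one row per category bit. That encoding, the bit assignment, and the function names are our assumptions; only the two phrases are quoted from the text, and the remaining six categories are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical category table: the two phrases are quoted in the text,
# the bit assignment is illustrative, and other categories are omitted.
CATEGORIES: Dict[int, List[bytes]] = {
    0: [b"call now"],
    1: [b"make money fast"],
}


def content_vector(payload: bytes) -> int:
    """Set bit i when any phrase of category i occurs (case-insensitive)."""
    text = payload.lower()
    vec = 0
    for bit, phrases in CATEGORIES.items():
        if any(p in text for p in phrases):
            vec |= 1 << bit
    return vec


# One ternary row per category: require that bit to be 1, ignore the others.
DROP_ROWS: List[Tuple[int, int]] = [(1 << b, 1 << b) for b in CATEGORIES]


def should_drop(vec: int) -> bool:
    return any(((vec ^ value) & mask) == 0 for value, mask in DROP_ROWS)
```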

3 Flow Buffering

To provide Quality of Service (QoS) for traffic that passes through the network, the SOC firewall performs both class-based and per-flow queuing. Class-based queuing allows certain types of traffic to receive better service than other traffic. Per-flow queuing ensures that no single traffic flow can consume all of the network’s bandwidth.

To support multiple classes of service, the firewall organizes traffic flows into four classes. Traffic from flows in certain classes is given priority over traffic from flows in other classes. Per-flow queuing is supported by managing multiple linked lists of packets. All queue management and tracking of free memory space is performed in FPGA hardware using constant-time linked-list data structures maintained in SRAM.
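Constant-time linked lists of this kind can be modeled with one array of next pointers shared by every flow's list and by a free list of unused slots, so each en-queue or de-queue touches a fixed number of entries. In the sketch below, the `next` array plays the role of the pointers kept in SRAM and the `data` array stands in for packet storage in SDRAM. The one-packet-per-slot granularity, the class name, and the slot count are illustrative assumptions rather than details of the hardware.

```python
from typing import Dict, List, Optional


class LinkedQueueMemory:
    """Per-flow FIFO lists and a free list sharing one next-pointer array."""

    NIL = -1

    def __init__(self, num_slots: int) -> None:
        self.next: List[int] = list(range(1, num_slots)) + [self.NIL]
        self.data: List[Optional[bytes]] = [None] * num_slots
        self.free_head = 0 if num_slots else self.NIL
        self.head: Dict[int, int] = {}   # flow ID -> first slot
        self.tail: Dict[int, int] = {}   # flow ID -> last slot

    def enqueue(self, flow: int, packet: bytes) -> bool:
        slot = self.free_head
        if slot == self.NIL:
            return False                      # buffer full: packet dropped
        self.free_head = self.next[slot]      # pop a slot off the free list
        self.data[slot], self.next[slot] = packet, self.NIL
        if flow in self.tail:
            self.next[self.tail[flow]] = slot # append after the current tail
        else:
            self.head[flow] = slot            # first packet of this flow
        self.tail[flow] = slot
        return True

    def dequeue(self, flow: int) -> Optional[bytes]:
        slot = self.head.get(flow, self.NIL)
        if slot == self.NIL:
            return None
        packet, nxt = self.data[slot], self.next[slot]
        if nxt == self.NIL:
            del self.head[flow], self.tail[flow]   # flow is now empty
        else:
            self.head[flow] = nxt
        self.data[slot], self.next[slot] = None, self.free_head
        self.free_head = slot                 # push the slot back onto the free list
        return packet
```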

A diagram of the flow buffer and queue manager is shown in Figure 5. The queue manager includes circuits to en-queue traffic flows, de-queue traffic flows, and schedule flows for transmission. Within the scheduler, four separate queues of flows are maintained, one for each class.

When a packet arrives, its data is delivered to the flow buffer and its flow identifier is passed to the En-queue FSM in the Queue Manager. Using the flow ID, the En-queue FSM reads the flow’s state from SRAM. As shown in Figure 6, each entry of the flow state table contains pointers to the head and tail of a linked list of packets stored in SDRAM, as well as counters that track the number of packets written to and read from that flow.

Meanwhile, the flow buffer stores the packet in memory at the location specified by the flow’s tail pointer. The flow buffer includes a controller that stores packets in Synchronous Dynamic Random Access Memory (SDRAM) [4]. After a packet is written to the next available memory location, the tail pointer value is passed to the queue manager to identify the next free memory location.

Within each class of traffic, the queue manager performs round-robin scheduling of individual flows. When a packet arrives for a flow that has no packets already buffered, the flow identifier is inserted into the scheduling queue for that packet’s class of service and the flow state table is updated. When a packet arrives for a flow that is already scheduled, the packet is simply appended to the flow’s linked list and the packet write count is incremented.

To transmit packets, the de-queue FSM reads a flow ID from the scheduler. The scheduler selects the next waiting flow in the highest-priority class that has traffic queued. The de-queue FSM then reads the flow state to obtain a pointer to the head of that flow’s packet list in the flow buffer. At the same time, the flow identifier is removed from the head of the scheduler’s queue; it re-enters the tail of the same queue if the flow has additional packets to transmit.
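The scheduling policy described above, strict priority between classes and round-robin among flows within a class, can be modeled with one FIFO of flow IDs per class. The sketch below treats class 0 as the highest priority and makes one packet-level decision per call; both choices, the class name, and the backlog counter are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class ClassScheduler:
    """Strict priority across classes, round-robin among flows within a class."""

    def __init__(self, num_classes: int = 4) -> None:
        # One FIFO of flow IDs per class; index 0 is assumed highest priority.
        self.queues: List[Deque[int]] = [deque() for _ in range(num_classes)]
        self.backlog: Dict[int, int] = {}      # flow ID -> packets buffered

    def packet_arrived(self, flow: int, cls: int) -> None:
        if self.backlog.get(flow, 0) == 0:
            self.queues[cls].append(flow)      # nothing buffered yet: schedule the flow
        self.backlog[flow] = self.backlog.get(flow, 0) + 1

    def next_flow(self) -> Optional[int]:
        """Return the flow whose head packet is sent next, or None if idle."""
        for queue in self.queues:              # first non-empty class wins
            if queue:
                flow = queue.popleft()
                self.backlog[flow] -= 1
                if self.backlog[flow] > 0:
                    queue.append(flow)         # re-enter the tail of the same queue
                return flow
        return None
```

Under this model a flow that keeps sending gets at most one packet per round of its class, while a continuously busy higher class can still starve the lower classes, which is a property of the strict-priority assumption.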

EXTENSIBLE FEATURES

By implementing the firewall in an FPGA, new functions can be added by inserting new packet processing modules along the data paths within the SOC.

1 Other Modules developed for the SOC Firewall

In addition to the core features described in this paper, several other modules have been prototyped on the SOC firewall. Extensible modules have been implemented that perform the following functions:

• Virus blocking

• Content filtering

• Denial of Service Protection

• Decryption of AES and 3DES packets

• Bitmap image filtering

• Network Address Translation (NAT)

• Domain Name System (DNS) caching

• IP Version 6 (IPv6) tunneling

• Resource Reservation Protocol (RSVP)

Some or all of these modules can be compiled into a top-level implementation of the SOC firewall. All modules use standard interfaces so that components can be integrated in a common System On Chip.

2 Open Interfaces on the SOC Firewall

To facilitate the integration of these extensible modules into the SOC firewall, standard interfaces were defined at the top level of the SOC firewall circuit. A diagram of the top level of the firewall with one extensible module is shown in Figure 7. Each red line indicates an interface where an extensible module can be attached. For example, a module inserted between the payload scanner and the CAM filter could check other properties of the packet before the packet is forwarded to the CAM table for lookup. The interfaces to the SRAM and SDRAM controllers allow a module to access external memory if needed. Other modules can attach to some or all of the other interfaces. The SOC firewall is typically built with all of the core features and a mixture of one or more extensible modules. The number of extensible modules that can be compiled into the SOC firewall is limited only by the size of the FPGA.

DESIGN FLOW

The SOC firewall is quickly synthesized into hardware through the use of multiple design automation tools. The design flow shown in Figure 8 depicts how the SOC firewall is compiled, verified, synthesized, placed and routed, and tested. The total time to iterate from a source code modification to in-system testing of the SOC firewall with actual network traffic is approximately half an hour.

1 Verification of the SOC Firewall

The first step in building the SOC firewall is compiling the VHDL source code. Once the code is compiled, a quick simulation of a few packets containing a mixture of legitimate and malicious traffic is used to verify that the circuit correctly processes packets. To save time, exhaustive testing is performed at speed in hardware after synthesis.

2 Synthesis of the SOC Firewall

The firewall is synthesized into hardware circuit components using Synplicity’s Synplify Pro tool. The resulting EDIF netlist is generated within a few minutes and then fed into Xilinx’s back-end design flow. Pin locations specified in a constraint file map the pins of the SOC firewall to the appropriate pins of the Xilinx Virtex XCV2000E FPGA. Next, the place and route tool is run to implement the FPGA circuit. Finally, a bitstream containing the configuration data is generated. The placement and routing phase of the design flow typically runs for 20-25 minutes on a Gigahertz-class processor.

3 Testing the SOC Firewall on the FPX Platform

The Field Programmable Port Extender (FPX) platform was used to evaluate the performance of the SOC firewall with real Internet traffic. The FPX is an open hardware platform that includes two multi-gigabit/second network interfaces and dynamically reconfigurable hardware that can be reprogrammed over the network [7]. To implement the SOC firewall, the bitfile for the circuit was uploaded into the Virtex XCV2000E on the FPX for in-system testing [8].

To verify the operation of the SOC firewall with high-throughput traffic, actual network traffic was sent to the hardware over the network from remote hosts. Malicious packets were dropped, SPAM was rate-limited, and all other flows received a fair share of bandwidth.


Figure 9: Implementation on FPX Platform

RESULTS

The core components of the SOC firewall, including the layered protocol wrappers, the TCAM packet filters, a pipeline of regular expression matching engines to detect SPAM, and the per-flow packet buffer with the SDRAM controller, were synthesized into the Reprogrammable Application Device (RAD) of the Field Programmable Port Extender (FPX).

1 Device Utilization

Placement was constrained using Synplicity’s Amplify tool to lock the location of modules into specific regions of the FPGA. A view of the placed and routed Xilinx Virtex XCV2000E is shown in Figure 9. Note that the center region of the chip was left available for insertion of extensible modules.

The results after place and route for the synthesized SOC firewall on the Xilinx Virtex XCV2000E are listed in Table 1. The core logic occupied 43% of the logic slices and 39% of the block RAMs.

Table 1. SOC Firewall Implementation Statistics

|Resource |Virtex XCV2000E Device Utilization |Utilization Percentage |
|Logic Slices |8342 out of 19200 |43% |
|BlockRAMs |63 out of 160 |39% |
|External IOBs |286 out of 512 |55% |

2 Throughput

The components of the SOC firewall that implement the protocol wrappers, CAM filter, flow buffer, and queue manager were synthesized and operated at 62.5 MHz. Each of these components processes 32 bits of data every cycle, giving the SOC firewall a throughput of 32*62.5 MHz = 2 Gigabits/second.
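Written out with units, the core throughput is the datapath width w times the clock frequency f_clk (symbols introduced here for illustration):

```latex
\begin{aligned}
T_{\text{core}} &= w \cdot f_{\text{clk}}
  = 32\ \tfrac{\text{bit}}{\text{cycle}} \times 62.5 \times 10^{6}\ \tfrac{\text{cycle}}{\text{s}} \\
  &= 2 \times 10^{9}\ \text{bit/s} = 2\ \text{Gb/s}
\end{aligned}
```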

The regular expression scanning circuit for the SPAM pipeline was synthesized to run at 37 MHz. Since each pipeline processes 8 bits of data per cycle, the throughput of the SPAM filter is 8*37 MHz = 296 Megabits/second per pipeline. By running 8 SPAM pipelines in parallel, the payload matching circuit achieves a throughput of 8*8*37 MHz = 2.368 Gigabits/second.
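The same arithmetic for the payload scanner, with N parallel pipelines (our symbol), also shows how many pipelines are needed to keep pace with the 2 Gigabit/second core datapath:

```latex
\begin{aligned}
T_{\text{pipe}} &= 8\ \tfrac{\text{bit}}{\text{cycle}} \times 37 \times 10^{6}\ \tfrac{\text{cycle}}{\text{s}} = 296\ \text{Mb/s} \\
T_{\text{scan}} &= N \cdot T_{\text{pipe}} = 8 \times 296\ \text{Mb/s} = 2.368\ \text{Gb/s} \\
N_{\min} &= \left\lceil \frac{2000\ \text{Mb/s}}{296\ \text{Mb/s}} \right\rceil = \lceil 6.76 \rceil = 7
\end{aligned}
```

By these figures, eight pipelines exceed the 2 Gigabit/second core rate with some headroom, so the slower 37 MHz clock of the scanner does not limit the firewall's overall throughput.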

CONCLUSIONS

An extensible firewall has been implemented as a reconfigurable System On Chip (SOC). In addition to the standard features implemented by other Internet firewalls, the SOC firewall can perform high-throughput payload scanning and per-flow queuing. The circuit was implemented on a Xilinx Virtex XCV2000E FPGA, and the resulting bitfile was tested on the Field Programmable Port Extender (FPX) network platform.

By using parallel hardware and deeply pipelined circuits, the SOC firewall can process protocol headers with TCAMs and scan the entire payload using regular expression matching at 2 Gigabits/second. Denial of Service attacks are mitigated through class-based and per-flow queuing. A region of gates in the FPGA was left available for extensible plug-in modules. By adding modules to the hardware, the firewall can protect a network against additional threats.

ACKNOWLEDGMENTS

--- for contribution to the NCHARGE control software, ---- for contribution to the SDRAM flow buffer, --- for work on the Network Interface Device. --- for work on the regular expression matching circuit and enhancements to the protocol wrappers.

REFERENCES

[1] Jose Brustoloni, Protecting Electronic Commerce from Distributed Denial-of-Service Attacks, Proceedings of the 11th International World Wide Web Conference (WWW2002), ACM, Honolulu, HI, May 2002.

[2] Florian Braun, John Lockwood, and Marcel Waldvogel, Protocol Wrappers for Layered Network Packet Processing in Reconfigurable Hardware, IEEE Micro, Volume 22, Number 3, Feb 2002, pp. 66-74.

[3] R. Franklin, D. Carver, and B. L. Hutchings, Assisting Network Intrusion Detection with Reconfigurable Hardware, FCCM'02.

[4] Sarang Dharmapurikar and John Lockwood, Synthesizable Design of a Multi-Module Memory Controller, Technical Report WUCS-01-26, Department of Computer Science, Washington University, October 2001.

[5] Edson L. Horta, John W. Lockwood, David E. Taylor, and David Parlour, Dynamic Hardware Plugins in an FPGA with Partial Run-time Reconfiguration, Design Automation Conference (DAC), New Orleans, LA, June 10-14, 2002, Paper 24.2.

[6] John W. Lockwood, Evolvable Internet Hardware Platforms, NASA/DoD Workshop on Evolvable Hardware (EHW'01), Long Beach, CA, July 12-14, 2001, pp. 271-279.

[7] John W. Lockwood, Jon S. Turner, and David E. Taylor, Field Programmable Port Extender (FPX) for Distributed Routing and Queuing, ACM International Symposium on Field Programmable Gate Arrays (FPGA'2000), Monterey, CA, February 2000, pp. 137-144.

[8] Todd Sproull, John W. Lockwood, and David E. Taylor, Control and Configuration Software for a Reconfigurable Networking Hardware Platform, IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, CA, April 24, 2002.

Figure 1: Internet Firewall Configuration

Figure 2: Block Diagram of System-On-Chip Firewall

Figure 3: Ternary CAM Filter

Figure 4: Regular Expression (RE) Payload Scanner (labels: Incoming Packets, Dispatcher, four pipelines of RE1-RE8 scanning engines, Collector, Outgoing Packets)

Figure 5: Flow Buffer and Queue Manager

Figure 7: Extensible Interfaces on the SOC Firewall (Shown in Red)

Figure 8: CAD Flow for the SOC Firewall

Figure 9: FPGA Layout of the SOC Firewall
