


Range Encoding for Range Matching Using a TCAM Coprocessor

Hao Che

The Department of Computer Science and Engineering

The University of Texas at Arlington

hche@cse.uta.edu

Abstract

One of the most critical resource management issues in using a TCAM for packet classification is how to effectively support rules with ranges, known as range matching. Since, in general, multiple TCAM entries have to be allocated to represent a rule with ranges, this raises the question of whether a TCAM can support range matching efficiently.

In this paper, an efficient range encoding scheme is introduced to allow one TCAM entry per rule for range matching in a TCAM coprocessor. The scheme allows one to select memory-hungry ranges to be encoded while retaining full control over the range code size. It does not assume the availability of any special hardware to assist the range encoding, other than the TCAM coprocessor itself. Hence, the scheme can be readily implemented in a fully programmable network processor using a TCAM coprocessor for packet classification. Based on an analysis of the available statistics on range patterns, the scheme is found to be highly effective in improving TCAM storage efficiency.

1. Introduction

A hardware-based approach for packet classification is to use a ternary content addressable memory (TCAM) coprocessor to offload the packet classification tasks from a fully programmable network processor (FPNP). While waiting for the TCAM coprocessor to return an action for a matched rule for one thread, the FPNP performs a context switch to pick up a different thread for processing without wasting clock cycles. A TCAM coprocessor finds a matched rule in O(1) clock cycles and therefore offers the highest possible lookup/matching performance [1] (for more details on how an FPNP and its TCAM coprocessor work together, please refer to [13]). Indeed, packet processing at OC-192 line rate using an FPNP and its TCAM coprocessor has been reported [2].

One of the most critical resource management issues for packet classification using a TCAM is how to effectively support rules with ranges, known as range matching. Since, in general, multiple TCAM entries have to be allocated to represent a rule with ranges, this raises the question of whether a TCAM can support range matching efficiently. For example, four sample policy filtering (PF) tables acquired from some policy gateways [8] showed that about 50% of the PF rules have their port numbers expressed as “>1023”. A rule with this port range takes up 6 entries in the TCAM [3]. Obviously, it would be costly in terms of memory consumption to support range matching directly in a TCAM, given that TCAM memory resources are generally constrained.

As FPNPs and TCAM coprocessors become widely available and fully integrated [4, 5, 10], there is an increasingly pressing need to address the range matching issue in the context of an FPNP using a TCAM coprocessor for packet classification. In response to this need, this paper proposes a range encoding scheme to solve this issue. It is purposely designed for immediate implementation in any existing FPNP using a TCAM coprocessor for packet classification.

The approach taken is based on the idea of a bit-map, i.e., mapping a range to an encoded bit. Bit-map based range encoding for packet classification was originally proposed by Lakshman and Stiliadis [12]. The application of bit-map based range encoding for packet classification using a TCAM has also been reported [11] [3]. As in [12], the encoding scheme proposed by van Lunteren and Engbersen [11] assumes the availability of multiple micro-engines to assist the range encoding in parallel. Since, in general, off-the-shelf FPNPs do not have multiple micro-engines embedded to serve this purpose, the approach proposed in [11] cannot be applied to existing FPNPs. The encoding scheme proposed by Liu [3] demonstrated the potential benefit of range encoding in reducing TCAM resource consumption. However, since in Liu’s scheme a rule sub-field to be encoded is eliminated altogether, all distinct values of that sub-field appearing in the PF table must be encoded. Since the number of distinct values for a sub-field in a PF table can be large and may change from time to time, the scheme is of limited practical use. For example, the statistical analysis in [7] showed that of the 4 ISP access control lists studied, 2 have more than 139 unique port numbers. This implies that for a five-tuple PF table, more than 139 - 16 = 123 bits need to be added to each 104-bit rule to encode a 16-bit port sub-field, more than doubling the PF table size. Hence, the solution itself may create the same problem it attempts to solve.

The scheme proposed in this paper differs from the existing ones in its viability for immediate implementation in any existing FPNP using a TCAM coprocessor for packet classification. First, the scheme does not make any assumption about the availability of extra hardware, such as on-chip/off-chip memories or multiple micro-engines, to aid the range encoding. Second, the scheme allows one to have full control over the code size or encoded rule size. This is highly desirable for any hardware solution using memories with finite word size to accommodate the encoded rule tables. Third, the scheme takes into account the impact of the TCAM memory structure on the efficiency of the encoding scheme. Finally, it applies to both existing types of TCAM coprocessors, i.e., weighted TCAM (WEITCAM) and ordered TCAM (OTCAM) coprocessors [13].

The rest of the paper is organized as follows. Section 2 describes the range encoding scheme. Section 3 analyzes the performance of the scheme and proposes future research work.

2. A Range Encoding Scheme

2.1 General Description

To allow full control over whether a range is to be encoded or not, and hence over the range code size or encoded rule size, our approach, unlike the existing approaches, does not attempt to eliminate a rule sub-field altogether from an encoded rule by encoding all the distinct values of that sub-field appearing in the PF table. Instead, all the sub-fields in a rule are kept, with the addition of a code sub-field encoding a selected number of popular ranges. By doing so, a rule whose ranges are encoded takes up only one TCAM entry, whereas a rule with a range that is not encoded may still take up multiple TCAM entries. In what follows, the encoding scheme is explained in detail.

The scheme involves both rule encoding and search key encoding. The rule encoding is done by software, and the encoded rules and the associated actions are written into the TCAM and its associated memory, respectively, in a TCAM coprocessor. It allows any given number K of rule sub-fields and any number lk of selected ranges in the k-th sub-field to be encoded, where k = 1, 2, …, K. For example, in a typical 5-tuple {source IP address, destination IP address, source port, destination port, protocol number} PF table, one may set K = 2, with k = 1 and 2 for the source port and destination port sub-fields, respectively. Suppose l1 = 2 source port ranges {>1023, [6000, 6003]} and l2 = 3 destination port ranges {>1023, [256, 612], [20, 24]} are selected to be encoded. With a bit-map, a code vector of size lc = l1 + l2 = 5 is appended to each rule to generate an encoded rule. All the rules containing the selected source/destination range values are encoded into the code vector. The exact rules for the code vector encoding will be presented in the next subsection.

The search key encoding is done at wire speed by the FPNP for every packet to be classified. It involves matching the k-th sub-field in the search key against all the selected ranges and their intersecting ranges in the k-th range table, where k = 1, 2, …, K. For the above example, there are two range tables containing range entries {>1023, [6000, 6003]} and {>1023, [256, 612], [20, 24]}, respectively. A match of a particular range in the k-th range table (i.e., the value of the k-th sub-field in the search key falls into a range in the k-th range table) causes an intermediate lc-bit index vector to be returned to the FPNP. Upon receiving a returned intermediate index vector, the FPNP updates an index vector (initially set to NULL) by performing an OR operation between the current index vector and the returned intermediate index vector. Next, the encoded search key is formed by appending the final index vector to the original search key. Finally, this encoded search key is used to match against the encoded PF table in the TCAM coprocessor. The exact rules for the index vector encoding will be presented in the next subsection.
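
To make this flow concrete, below is a minimal sketch of the FPNP-side search key encoding, using the 5-bit example above. The toy range tables, their intermediate index vectors, and the helper names are illustrative assumptions; in the actual scheme each range-table lookup is a TCAM search rather than a software loop.

```python
# Toy range table k: list of (low, high, intermediate index vector), most
# specific (highest priority) range first, mimicking the TCAM's priority match.
RANGE_TABLES = {
    1: [(6000, 6003, 0b11000),      # [6000,6003], a sub-range of ">1023"
        (1024, 65535, 0b10000)],    # ">1023" (source port)
    2: [(1024, 65535, 0b00100),     # ">1023" (destination port)
        (256, 612, 0b00010),        # [256,612]
        (20, 24, 0b00001)],         # [20,24]
}

def range_table_lookup(k, value):
    """Return the intermediate index vector of the matched range, or None."""
    for lo, hi, vec in RANGE_TABLES[k]:
        if lo <= value <= hi:
            return vec
    return None

def encode_search_key(subfields):
    index_vector = 0                              # initially NULL (all zeros)
    for k, value in subfields.items():            # one lookup per encoded sub-field
        vec = range_table_lookup(k, value)
        if vec is not None:
            index_vector |= vec                   # OR in the returned vector
    return index_vector

# source port 6001 (falls in both source ranges), destination port 400 (in [256,612])
print(format(encode_search_key({1: 6001, 2: 400}), "05b"))   # -> 11010
```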

In summary, a PF table lookup with K sub-fields encoded requires K range table lookups for the search key encoding plus 1 encoded PF table lookup.

To allow deterministic search performance, a possible solution for the search key encoding is to perform direct range table indexing using an off-chip memory, as suggested by Liu [3]. Namely, allow the k-th sub-field size worth of memory entries to be allocated for the k-th range table, with each entry containing an intermediate index vector corresponding to the range the memory address of the entry falls into. This, however, is not applicable to existing FPNPs, such as the Intel IXP1200 and AMCC nP7120, which support 32- and/or 64-bit external memory interfaces. For example, to support a 32-bit sub-field range table for direct indexing, a 2^32 x 32 or 2^32 x 64 bit memory space needs to be allocated, which is prohibitively large.
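
As a quick check of the figure quoted above, the following snippet works out the space required for direct indexing of a 32-bit sub-field, with one intermediate index vector stored per possible value; it is only a worked calculation, not part of the scheme.

```python
# Direct indexing needs one entry per possible 32-bit sub-field value.
ENTRIES = 2 ** 32
for vector_bits in (32, 64):
    gib = ENTRIES * vector_bits / 8 / 2 ** 30
    print(f"2^32 x {vector_bits} bits = {gib:.0f} GiB")   # 16 GiB and 32 GiB
```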

To ensure the applicability of the proposed range encoding scheme, we propose to use the TCAM coprocessor itself for the search key encoding. This involves adding K range tables and the associated intermediate index vectors to the TCAM and the associated memory, respectively, in the TCAM coprocessor. However, the encoding scheme does not exclude the possibility of using an on-chip TCAM for the search key encoding, provided that the FPNP in use has an on-chip TCAM in it.

Figure 1 describes how the memories in a TCAM coprocessor are partitioned to enable range encoding. In the k-th range table, mk ranges are listed, composed of lk selected ranges and their intersections for the k-th sub-field, where k = 1, …, K. Each range in the range table maps to an index vector in an associated memory. Note that since each range in a range table needs to be represented by multiple TCAM entries (not shown in Fig. 1), the corresponding intermediate index vector must be duplicated for every entry belonging to the same range. In the encoded PF table, a code vector (cv) is appended to each rule to form an encoded rule. Each encoded rule is mapped to an action in the associated memory.

The number of bits lc allocated to the code vector can be less than a slot, one slot, or multiple slots, and it is a design parameter. Here a slot is defined as the TCAM width, i.e., the word size associated with each entry. Each bit in the code vector is used to encode a distinct range, so a total of lc ranges can be encoded. In the case given in Fig. 1, each encoded rule (i.e., a rule plus its code vector) in the encoded rule table occupies three slots. As an example, a five-tuple rule needs 104 bits of memory space, so for a TCAM with a 64-bit slot size, two slots or 128 bits of memory space are needed to accommodate a rule. This allows lc = 24 + 64nc bits to be allocated for the code vector, where nc is the number of extra slots allocated for the code vector. Note that the number of slots that can be allocated to each rule entry in a TCAM coprocessor is vendor specific, e.g., 1, 2, and 4-slot rule entries. This puts a constraint on the actual values the parameter nc may take.
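
The slot arithmetic above can be summarized by a small sketch; the helper name is ours, and the numbers correspond to the 5-tuple example (104-bit rule, 64-bit slot).

```python
RULE_BITS, SLOT_BITS = 104, 64

def code_vector_bits(n_c):
    """lc = leftover bits in the rule's last slot + 64 bits per extra slot n_c."""
    base_slots = -(-RULE_BITS // SLOT_BITS)           # ceil(104/64) = 2 slots
    leftover = base_slots * SLOT_BITS - RULE_BITS     # 128 - 104 = 24 bits
    return leftover + SLOT_BITS * n_c

print(code_vector_bits(0))   # 24 -> encoded rule still fits in 2 slots
print(code_vector_bits(1))   # 88 -> encoded rule occupies 3 slots, as in Fig. 1
```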

Corresponding to the encoded rule, a search key is appended with an index vector of lc bits in length to form an encoded search key, which is used to match against the encoded rule table in the (K+1)-th step. It takes K TCAM lookups to generate the index vector, one for each sub-field, as shown in Fig. 1.

Initially the index vector is set to null. In the k-th step, the FPNP passes the k-th sub-field from the search key to the TCAM coprocessor to match against range table k, where k = 1, 2, …, K. If a match is found, the corresponding intermediate index vector for the matched range is returned to the FPNP. Otherwise, no intermediate index vector is returned and the FPNP proceeds with the range encoding for the (k+1)-th sub-field. If the FPNP does receive an intermediate index vector from the TCAM coprocessor, it updates the old index vector by performing an OR operation between the returned intermediate index vector and the old one. This procedure is repeated for all K steps to generate the final index vector.

Note that using the TCAM for the search key encoding provides an immediate solution applicable to all FPNPs using a TCAM as a coprocessor. However, heavy use of the TCAM coprocessor for the search key encoding may result in reduced packet classification performance due to TCAM access contention. For example, for a 100 MHz TCAM coprocessor, up to 100/ls Mpps lookup performance can be achieved for an encoded search key ls slots in length. Since wire-speed forwarding at OC-192 line rate requires about 25 Mpps processing rate in the worst case, assuming a minimum packet size of 40 bytes, a TCAM coprocessor dedicated to PF table lookup can allow up to 2/ls lookups per packet, or K = 2/ls - 1, without compromising the throughput performance. For an FPNP supporting OC-48 line rate, this number increases to 8/ls, or K = 8/ls - 1. Hence, when using the scheme proposed in this paper, care must be taken in selecting both the code size lc and the value of K. Encoding more sub-fields, and more ranges in a given sub-field, leads to heavier TCAM access contention, although it gives better memory efficiency. The key is to achieve a balance between lookup performance and memory efficiency.

2.2 Encoding Rules

In what follows, we discuss the encoding rules for the index vector, code vector, and rules separately.

A. Index Vector Encoding Rules. To demonstrate the basic concept, let us first show, by example, how the index vector is encoded, given that a set of ranges for a single sub-field is encoded. In this example, we have 7 ranges to be encoded, i.e., {R1, R2, R3, R4, R5, R6, R7}. Some ranges are sub-ranges of others and some ranges overlap with one another. There are two overlapping rule groups (ORGs), i.e., {R1, R2, R3} and {R4, R5, R6, R7}.

As shown in Fig. 2, R3 is a sub-range of R2, and R2 is in turn a sub-range of R1. R4 to R7 are overlapping ranges. In our index vector encoding, the index vector for the intersection of two ranges is different from the index vector for either of the component ranges. Hence, the intersections R45 (of R4 and R5) and R47 (of R4 and R7) need to be listed in the range table. For the 7 ranges in this example, a 7-bit index vector is used to encode the ranges {R1, R2, R3, R4, R5, R6, R7}, as listed in Table 1.

Table 1. Index Vector Encoding

|Range |Weight |Index Vector |
|R1 |0 |{1000000} |
|R2 |1 |{1100000} |
|R3 |2 |{1110000} |
|R4 |0 |{0001000} |
|R5 |0 |{0000100} |
|R45 |1 |{0001100} |
|R6 |2 |{0001110} |
|R7 |1 |{0000101} |
|R47 |2 |{0001101} |

The encoding rules used to generate the above index vector are stated as follows:

1) The index vector for range Ri must have its i-th bit set to 1.

2) The common area Rp1,p2,…,pn of n overlapping ranges Rp1, Rp2, …, Rpn needs to be expressed as a separate range. The corresponding index vector must have its p1-th, p2-th, …, pn-th bits set to 1. This range must have a higher matching priority or weight value than Rp1, Rp2, …, Rpn.

3) If a range Ri has Rj as its super range, its index vector must have its j-th bit set to 1 and Ri’s weight/priority must be higher than Rj’s.

4) All other bits in the index vector must be set to 0s.

As can be seen in the above table, for R3, the third bit of the index vector is set to 1 according to rule (1). Also, since R3 has both R1 and R2 as its super ranges, according to rule (3), the first and second bits of the index vector are also set to 1. Finally, according to rule (4), the rest of the bits in the index vector for R3 are set to zero.

We also note that according to the index vector encoding rules, a sub-range has higher matching priority or weight value than its super-range, e.g., R1 versus R2. Also the intersection of overlapping ranges has higher matching priority or weight value than its component ranges, e.g., R45 versus R4 or R5.

With weights assigned as in Table 1 for a WEITCAM [6], the ranges can be placed at any available memory location in the block allocated for the range table. However, for an OTCAM [6], the ordering of ranges with a priority relationship in Table 1 must be maintained.
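
The index vector encoding rules above can be captured by a short sketch. The numeric intervals below are purely hypothetical: they are chosen only to reproduce the containment and overlap pattern of Fig. 2 (R3 inside R2 inside R1, and R4 to R7 overlapping), so the printed output reproduces the rows of Table 1.

```python
BASE = {1: (0, 100), 2: (10, 80), 3: (20, 60),     # R3 in R2 in R1
        4: (200, 300), 5: (250, 400),              # R4 and R5 overlap
        6: (255, 270), 7: (280, 350)}              # R6 in R4 and R5, R7 in R5

def contains(a, b):
    """True if interval a fully contains interval b."""
    return a[0] <= b[0] and b[1] <= a[1]

def intersection(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

# range-table entries: the base ranges plus every partial-overlap intersection
entries = {f"R{i}": BASE[i] for i in BASE}
for i in BASE:
    for j in BASE:
        if i < j and not contains(BASE[i], BASE[j]) and not contains(BASE[j], BASE[i]):
            common = intersection(BASE[i], BASE[j])
            if common:
                entries[f"R{i}{j}"] = common       # e.g. R45, R47

def index_vector(entry):
    # rules (1)-(4): bit i is set iff base range Ri covers this entry
    return "".join("1" if contains(BASE[i], entry) else "0" for i in sorted(BASE))

def weight(entry):
    # a range's weight must exceed that of every table entry strictly containing it
    outer = [e for e in entries.values() if e != entry and contains(e, entry)]
    return 1 + max(weight(e) for e in outer) if outer else 0

for name in sorted(entries):
    print(name, weight(entries[name]), index_vector(entries[name]))  # rows of Table 1
```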

B. Code Vector Encoding Rules. Using the same example for the index vector encoding, the code vector for {R1, R2, R3, R4, R5, R6, R7} is encoded as in Table 2.

Table 2. Code Vector Encoding

|Range |Code Vector |
|R1 |{1**0000} |
|R2 |{11*0000} |
|R3 |{1110000} |
|R4 |{0001***} |
|R5 |{000*1**} |
|R6 |{0001110} |
|R7 |{000*101} |

The code vector encoding rules used to generate the above code vectors can be formally stated as follows:

1) The code vector for range Ri must have its i-th bit set to 1.

2) If a range Ri has Rj as its sub range, its code vector must have its j-th bit wildcarded, denoted as “*”.

3) If a range Ri has Rj as its super range, its code vector must have its j-th bit set to 1.

4) If Ri ∩ Rj ≠ ∅, with neither range containing the other, the code vector for Ri must have its j-th bit wildcarded.

5) All other bits in the code vector must be set to 0s.

For example, the code vector for R5 has its 5th bit set to 1 according to rule (1). Since R5 has R6 and R7 as its sub ranges, the 6th and 7th bits of the code vector are wildcarded, according to rule (2). Also since R5 intersects with R4, according to rule (4), the 4th bit of the code vector is also wildcarded. Finally, all other bits in the code vector are set to 0s, according to rule (5).

The proposed index vector and code vector encoding rules ensure that the index vector for a search key sub-field, whose value falls into an area belonging to multiple encoded overlapping ranges, matches and only matches the code vectors of those encoded overlapping ranges.
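
A companion sketch for the code vector encoding rules follows, reusing the same hypothetical intervals (re-declared here so the snippet runs on its own); the final check illustrates the matching property just stated, using the index vector {0001101}.

```python
BASE = {1: (0, 100), 2: (10, 80), 3: (20, 60),
        4: (200, 300), 5: (250, 400), 6: (255, 270), 7: (280, 350)}

def contains(a, b):
    return a[0] <= b[0] and b[1] <= a[1]

def overlaps(a, b):
    return max(a[0], b[0]) <= min(a[1], b[1])

def code_vector(i):
    bits = []
    for j in sorted(BASE):
        if j == i:
            bits.append("1")                      # rule (1): own bit set
        elif contains(BASE[i], BASE[j]):
            bits.append("*")                      # rule (2): Rj is a sub range of Ri
        elif contains(BASE[j], BASE[i]):
            bits.append("1")                      # rule (3): Rj is a super range of Ri
        elif overlaps(BASE[i], BASE[j]):
            bits.append("*")                      # rule (4): Ri and Rj partially overlap
        else:
            bits.append("0")                      # rule (5): all other bits are 0
    return "".join(bits)

def ternary_match(index_vec, code_vec):
    """A '*' in the code vector matches either bit of the index vector."""
    return all(c in ("*", b) for b, c in zip(index_vec, code_vec))

for i in sorted(BASE):
    print(f"R{i}", code_vector(i))                # reproduces Table 2

# The index vector {0001101} (a sub-field value falling in R4, R5 and R7,
# i.e. in R47) matches exactly the code vectors of R4, R5 and R7:
print([f"R{i}" for i in sorted(BASE) if ternary_match("0001101", code_vector(i))])
```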

C. Rule Encoding Rules. Finally, two encoding rules for encoded PF rules must be enforced.

Rule Encoding Rules:

1) In an encoded rule, a sub-field is insignificant and must be wildcarded, if the range in that sub-field is encoded.

2) For a rule without having any of the sub-fields encoded, the corresponding code vector is insignificant and must be either set to null or wildcarded.

Rule (1) must be enforced simply because the range information is already encoded in the code vector. Rule (2) allows fixed length rule matching.
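
As a small illustration of these two rules, the sketch below builds one encoded rule of each kind, assuming a single encoded port sub-field and the 7-bit code vectors of Table 2; the field layout and values are illustrative only.

```python
PORT_WILDCARD = "*" * 16
CV_WILDCARD = "*" * 7

# Rule (1): if the rule's port range is encoded (say it is range R4), the port
# sub-field is wildcarded and the range lives only in the code vector.
rule_with_encoded_range = {"port": PORT_WILDCARD, "cv": "0001***"}

# Rule (2): a rule with no encoded sub-field keeps its exact port value and
# wildcards the code vector, so every encoded rule has the same fixed length.
rule_without_encoded_range = {"port": format(80, "016b"), "cv": CV_WILDCARD}

print(rule_with_encoded_range)
print(rule_without_encoded_range)
```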

3. Performance Analysis

Studies of real-world PF tables show that the port range “>1023” constitutes a significant portion of the total number of ranges in practice. For example, [9] found that this range appears in 9% of the PF rules, and [8] even indicated that 50% of the PF rules have this port range sub-field. [9] also reported that the second most popular port range is 20-24. With this information, we can build a rather realistic scenario to test the performance of our scheme.

In Table 3, we made up a list of possible “popular” port ranges, each with a percentage of no less than 5% of the total number of ranges appearing in a hypothetical PF table. Based on the reported data, we set “>1023” and “20-24” as the most popular ones, taking up 50% and 20% of the total ranges, respectively. The rest of the listed top ranges are also likely to appear frequently in a PF table. For example, the range “6000-6063” represents the X-win application. The number of TCAM entries required for each popular range is also given.

Table 3. Popular port ranges in a hypothetical PF table

|Range |Port Range |TCAM Entries |Percentage (%) |
|A1 |>1023 |6 |50 |
|A2 |20 - 24 |2 |20 |
|... |... |... |... |

For nr > 1, the worst-case TCAM occupancy expansion for our scheme is 50%, which occurs at nr = 2.

Note that in the above analysis we assumed that nr + 1 is a valid TCAM entry size. In practice, the worst-case performance can be worse if nr + 1 is not a valid rule entry size for the TCAM coprocessor in use.

In this paper, the impact of range updates on the TCAM lookup performance was not studied. In previous work [14], we proposed a general policy rule table update algorithm that has zero impact on TCAM lookup performance. As future work, the possibility of applying this previous result to allow efficient range updates for the proposed range encoding scheme will be explored.

References:

[1] P. Gupta and N. McKeown, “Algorithms for Packet Classification,” IEEE Network, March 2001.

[2] “AMCC Ships 10-Gbit/s Processor,” Light Reading, March 25, 2002.

[3] Huan Liu, “Efficient Mapping of Range Classifier into Ternary-CAM,” in the Proceedings of the 10th Symposium on High Performance Interconnects, August, 2002.

[4] A. Gottlieb, “Optimizing Next-Generation Application-Dependent Packet Classification,” .

[5] “IDT Introduces IP Coprocessors with Seamless QDR Interface to Intel’s New Network Processors,” , Feb. 2002.

[6] H. Che, “Encoded Range Matching for a TCAM Coprocessor,” http://crystal.uta.edu/~hche/index.

[7] M. E. Kounavis, A. Kumar, H. Vin, R. Yavatkar and A. T. Campbell, “Directions in Packet Classification for Network Processors,” Second Workshop on Network Processors (NP2), February 8-9, 2003

[8] F. Baboescu and G. Varghese, “Scalable Packet Classification,” in the Proceedings of ACM SIGCOMM’01, Sept. 2001.

[9] P. Gupta and N. McKeown, “Packet Classification on Multiple Fields,” in the Proceedings of ACM SIGCOMM’99, Sept. 1999.

[10] “IDT Samples Industry’s First Network Search Engines with a Fully Integrated Interface for AMCC Network Processors,”

[11] J. van Lunteren and A.P.J. Engbersen, "Dynamic multi-field packet classification," the Proceedings of the IEEE Global Telecommunications Conference GLOBECOM'02, vol. 3, pp. 2215 -2219, November 2002. (An extended version of this paper in: IEEE Journal of Selected Areas in Communications, vol. 21, no. 4, pp. 560- 571, May 2003)

[12] T. Lakshman and D. Stiliadis, “High-speed policy-based packet forwarding using efficient multi-dimensional range matching,” ACM Computer Communication Review, vol. 28, no. 4, pp. 203-214, October 1998.

[13] H. Che, Y. Wang, and Z. J. Wang, “A Rule Grouping Technique for Weight-Based TCAM Coprocessors,” in the Proceedings of IEEE Hot Interconnects 2003, August 2003.

[14] Z. J. Wang, H. Che, M. Kumar, and S. Das, “Consistent TCAM Policy Table Update with Zero Impact on Data Path Processing,” submitted for publication.

Fig. 1 Encoding scheme: K+1 TCAM searches for PF table matching in a TCAM coprocessor

Fig. 2 Example ranges for encoding
