Fast RegEx Parsing on GPUs

FAST REGEX PARSING ON GPUS

David Lehavi & Sagi Schein HP Labs Israel

1? Copy?rigChotp2yr0ig11htH2e0w1l1etHt-ePwalcektat-rPdaDckeavredloDpemveenlotpCmoemnpt aCnoym, Lp.aPn.y, L.P.

PLEDGE

? Although this is a lecture about GPUs, this is the only piece of cool graphics you'll see:

(picture from Assassin's Creed Revelations)

2 ? Copyright 2011 Hewlett-Packard Development Company, L.P.

REGULAR EXPRESSIONS (RE)

? One of the most common programming languages.

o Standard syntax. o Platform independent.

.*a(bb)+a

? Commonly used for "text crunching applications":

o Security applications. o Network monitoring. o Database queries (SQL, Google's BigTable, ...).

? Common scenario: many strings & REs. o Pitfall: high run-time variance. o Throughput run time are critical. o Latency is not always critical.

3 ? Copyright 2011 Hewlett-Packard Development Company, L.P.

REGEXS AN AUTOMATON

Ken Thompson's approach

? Transform the regex to an automaton. o Parse to an abstract syntax tree. o "Translate" to an automaton concat

*

a

+

a

anything

concat

b b

4 ? Copyright 2011 Hewlett-Packard Development Company, L.P.

ANY

.*a(bb)+a

a b b

b a

WHEN AUTOMATONS BECOME DATA

Using virtual machines

? Virtual machines are not always VMware, or JVM, or the likes of them.

? Virtual machine "runs" on

o the string (data)

o and a table representing the automaton (bytecode).

? Looks only at the next character.

Input string cdabbbba

? Maintaine "Front" of active states.

NFA byte code

VM

next

a

1

b

2

b

3

a

End

ANY b -

next 0 1 -

5 ? Copyright 2011 Hewlett-Packard Development Company, L.P.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download