Regular Expressions: The Complete Tutorial - Clemson University

Regular Expressions

The Complete Tutorial

Jan Goyvaerts

Copyright © 2006, 2007 Jan Goyvaerts. All rights reserved.

Last updated July 2007.

No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic,

mechanical, photocopying, recording, or otherwise, without written permission from the author.

This book is published exclusively at

Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is

implied. The information is provided on an “as is” basis. The author and the publisher shall have neither liability nor

responsibility to any person or entity with respect to any loss or damages arising from the information contained in this

book.

Table of Contents

Tutorial

1. Regular Expression Tutorial

2. Literal Characters

3. First Look at How a Regex Engine Works Internally

4. Character Classes or Character Sets

5. The Dot Matches (Almost) Any Character

6. Start of String and End of String Anchors

7. Word Boundaries

8. Alternation with The Vertical Bar or Pipe Symbol

9. Optional Items

10. Repetition with Star and Plus

11. Use Round Brackets for Grouping

12. Named Capturing Groups

13. Unicode Regular Expressions

14. Regex Matching Modes

15. Possessive Quantifiers

16. Atomic Grouping

17. Lookahead and Lookbehind Zero-Width Assertions

18. Testing The Same Part of a String for More Than One Requirement

19. Continuing at The End of The Previous Match

20. If-Then-Else Conditionals in Regular Expressions

21. XML Schema Character Classes

22. POSIX Bracket Expressions

23. Adding Comments to Regular Expressions

24. Free-Spacing Regular Expressions

Examples

1. Sample Regular Expressions

2. Matching Floating Point Numbers with a Regular Expression

3. How to Find or Validate an Email Address

4. Matching a Valid Date

5. Matching Whole Lines of Text

6. Deleting Duplicate Lines From a File

8. Find Two Words Near Each Other

9. Runaway Regular Expressions: Catastrophic Backtracking

10. Repeating a Capturing Group vs. Capturing a Repeated Group

Tools & Languages

1. Specialized Tools and Utilities for Working with Regular Expressions

2. Using Regular Expressions with Delphi for .NET and Win32

3. EditPad Pro: Convenient Text Editor with Full Regular Expression Support

4. What Is grep?

5. Using Regular Expressions in Java

6. Java Demo Application using Regular Expressions

7. Using Regular Expressions with JavaScript and ECMAScript

8. JavaScript RegExp Example: Regular Expression Tester

9. MySQL Regular Expressions with The REGEXP Operator

10. Using Regular Expressions with The Microsoft .NET Framework

11. C# Demo Application

12. Oracle Database 10g Regular Expressions

13. The PCRE Open Source Regex Library

14. Perl’s Rich Support for Regular Expressions

15. PHP Provides Three Sets of Regular Expression Functions

16. POSIX Basic Regular Expressions

17. PostgreSQL Has Three Regular Expression Flavors

18. PowerGREP: Taking grep Beyond The Command Line

19. Python’s re Module

20. How to Use Regular Expressions in REALbasic

21. RegexBuddy: Your Perfect Companion for Working with Regular Expressions

22. Using Regular Expressions with Ruby

23. Tcl Has Three Regular Expression Flavors

24. VBScript’s Regular Expression Support

25. VBScript RegExp Example: Regular Expression Tester

26. How to Use Regular Expressions in Visual Basic

27. XML Schema Regular Expressions

Reference

1. Basic Syntax Reference

2. Advanced Syntax Reference

3. Unicode Syntax Reference

4. Syntax Reference for Specific Regex Flavors

5. Regular Expression Flavor Comparison

6. Replacement Text Reference

Introduction

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You

can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations

such as *.txt to find all text files in a file manager. The regex equivalent is «.*\.txt».
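
To make the comparison concrete, here is a minimal sketch in Python (a language chosen only for illustration; the file names are made up). It applies the wildcard through the standard fnmatch module and the regex through the re module, and assumes the regex must cover the whole file name, as a wildcard does.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
names = ["notes.txt", "report.txt", "image.png", "txt", "archive.txt.bak"]

# Wildcard matching, roughly as a file manager would apply it.
by_wildcard = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# The regex from the text. fullmatch() requires the whole name to match,
# which mirrors how a wildcard is applied to a complete file name.
pattern = re.compile(r".*\.txt")
by_regex = [n for n in names if pattern.fullmatch(n)]

print(by_wildcard)  # ['notes.txt', 'report.txt']
print(by_regex)     # ['notes.txt', 'report.txt']
```

Using re.search() instead of fullmatch() would also accept archive.txt.bak, because a regex is allowed to match anywhere in the text unless it is anchored.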

But you can do much more with regular expressions. In a text editor like EditPad Pro or a specialized text

processing tool like PowerGREP, you could use the regular expression «\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b» to search for an email address. Any email address, to be exact. A very similar regular

expression (replace the first \b with ^ and the last one with $) can be used by a programmer to check if the

user entered a properly formatted email address, in just one line of code, whether that code is written in Perl,

PHP, Java, a .NET language or a multitude of other languages.
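
As a rough sketch of both uses, the Python snippet below searches a text for addresses with the \b version and validates a single input with the ^ and $ version. Several details are assumptions made for this illustration rather than anything the text prescribes: the choice of Python, the sample addresses, the helper name is_valid, the case-insensitive flag (the pattern itself lists only uppercase letters), and a domain class written as [A-Z0-9.-] so that hyphenated domain names are accepted.

```python
import re

# Core of the pattern; the domain class includes a hyphen (an assumption
# made here so that hyphenated domain names are accepted).
EMAIL = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"

# Searching: \b word boundaries let the address sit inside longer text.
find_email = re.compile(r"\b" + EMAIL + r"\b", re.IGNORECASE)

# Validating: ^ and $ require the whole input to be one address.
valid_email = re.compile(r"^" + EMAIL + r"$", re.IGNORECASE)

text = "Contact jane.doe@example.com or support@example.org for help."
found = find_email.findall(text)
print(found)  # ['jane.doe@example.com', 'support@example.org']


def is_valid(user_input):
    """Hypothetical helper: True if the input looks like an email address."""
    return valid_email.search(user_input) is not None


print(is_valid("jane.doe@example.com"))      # True
print(is_valid("see jane.doe@example.com"))  # False
```

The check itself is the single valid_email.search() call; everything else is setup and demonstration.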

Complete Regular Expression Tutorial

Do not worry if the above example or the quick start make little sense to you. Any non-trivial regex looks

daunting to anybody not familiar with them. But with just a bit of experience, you will soon be able to craft

your own regular expressions like you have never done anything else. The tutorial in this book explains

everything bit by bit.

This tutorial is unusual in that it not only explains the regex syntax, but also describes in detail how the

regex engine actually goes about its work. You will learn quite a lot, even if you have already been using

regular expressions for some time. This will help you to understand quickly why a particular regex does not

do what you initially expected, saving you lots of guesswork and head scratching when writing more complex

regexes.

Applications & Languages That Support Regexes

There are many software applications and programming languages that support regular expressions. If you are

a programmer, you can save yourself lots of time and effort. You can often accomplish with a single regular

expression in one or a few lines of code what would otherwise take dozens or hundreds.

Not Only for Programmers

If you are not a programmer, you can use regular expressions in many situations just as well. They will make

finding information a lot easier. You can use them in powerful search and replace operations to quickly make

changes across large numbers of files. A simple example is «gr[ae]y», which will find both spellings of the

word grey in one operation, instead of two. There are many text editors and search and replace tools with

decent regex support.
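
A text editor would run this search through its own dialog, but the same idea can be sketched in a few lines of Python (the sample sentence is made up for illustration): one pattern finds both spellings, and a single replace operation normalizes them.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "The gray cat sat on a grey mat next to a green door."

# One search finds both spellings; "green" is not matched because
# the character class [ae] must be followed directly by "y".
spellings = re.findall(r"gr[ae]y", text)
print(spellings)  # ['gray', 'grey']

# One search-and-replace operation normalizes both spellings.
normalized = re.sub(r"gr[ae]y", "grey", text)
print(normalized)  # The grey cat sat on a grey mat next to a green door.
```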
