Java - Regular Expressions


Copyright © tutorials

Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.

The java.util.regex package primarily consists of the following three classes:

Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which then returns a Pattern object. These methods accept a regular expression as the first argument.

Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.

PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
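The three classes can be seen working together in a minimal sketch. The class name RegexClassesDemo and the helper names firstDigits and isValidRegex are hypothetical, chosen only for illustration; the calls Pattern.compile, Pattern.matcher, Matcher.find, and Matcher.group are the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesDemo {

    // Hypothetical helper: returns the first run of digits, or null if there is none.
    static String firstDigits(String input) {
        Pattern pattern = Pattern.compile("\\d+");  // no public constructor; use compile
        Matcher matcher = pattern.matcher(input);   // a Matcher is obtained from a Pattern
        return matcher.find() ? matcher.group() : null;
    }

    // Hypothetical helper: true if the expression compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked: the compiler does not force callers to catch it.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("Order QT3000")); // 3000
        System.out.println(isValidRegex("(dog)"));       // true
        System.out.println(isValidRegex("(dog"));        // false (unclosed group)
    }
}
```

Because PatternSyntaxException is unchecked, catching it is optional; it is mostly worth doing when the expression comes from user input rather than from a string literal.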

Capturing Groups:

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

1. ((A)(B(C)))
2. (A)
3. (B(C))
4. (C)

To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
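The numbering rule and the groupCount behaviour can be checked with a short sketch. GroupCountDemo and its groups helper are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {

    // Hypothetical helper: returns group 0 followed by every numbered group
    // for a full match, or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1]; // +1 because group 0 is not counted
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        System.out.println("groupCount: " + m.groupCount()); // 4

        String[] all = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < all.length; i++) {
            System.out.println("group " + i + ": " + all[i]);
        }
        // group 0: ABC, group 1: ABC, group 2: A, group 3: BC, group 4: C
    }
}
```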


The following example illustrates how to find a digit string in a given alphanumeric string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find( )) {
            // Group 0 is the whole match; groups 1 and 2 are the first two captures.
            System.out.println("Found value: " + m.group(0) );
            System.out.println("Found value: " + m.group(1) );
            System.out.println("Found value: " + m.group(2) );
        } else {
            System.out.println("NO MATCH");
        }
    }
}


This would produce the following result:

Found value: This order was placed for QT3000! OK?

Found value: This order was placed for QT300

Found value: 0
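The output shows group 2 holding only "0" rather than "3000". The leading (.*) is greedy, so it consumes as much as it can and leaves \d+ a single digit. As a hedged illustration, the sketch below compares the original expression with a variant that uses the reluctant quantifier .*? in the first group. The variant, the class name GreedyVsReluctantDemo, and the helper digitsCapturedBy are not part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctantDemo {

    static final String LINE = "This order was placed for QT3000! OK?";

    // Hypothetical helper: returns the text captured by group 2 (the \d+ group),
    // or null if the expression does not match LINE.
    static String digitsCapturedBy(String regex) {
        Matcher m = Pattern.compile(regex).matcher(LINE);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Greedy first group: backtracks only far enough to give \d+ one digit.
        System.out.println(digitsCapturedBy("(.*)(\\d+)(.*)"));  // 0

        // Reluctant first group: stops at the first digit, so \d+ takes the whole number.
        System.out.println(digitsCapturedBy("(.*?)(\\d+)(.*)")); // 3000
    }
}
```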

Regular Expression Syntax:

The following table lists the regular expression metacharacter syntax available in Java:

Subexpression    Matches

^                Matches the beginning of a line (the beginning of the input unless MULTILINE mode is enabled).
$                Matches the end of a line (the end of the input unless MULTILINE mode is enabled).
.                Matches any single character except a line terminator. The DOTALL flag ((?s) or Pattern.DOTALL) allows it to match line terminators as well.
[...]            Matches any single character in brackets.
[^...]           Matches any single character not in brackets.
\A               Matches the beginning of the entire string.
\z               Matches the end of the entire string.
\Z               Matches the end of the entire string, except for an allowable final line terminator. If a newline ends the string, it matches just before the newline.
re*              Matches 0 or more occurrences of the preceding expression.
re+              Matches 1 or more occurrences of the preceding expression.
re?              Matches 0 or 1 occurrence of the preceding expression.
re{n}            Matches exactly n occurrences of the preceding expression.
re{n,}           Matches n or more occurrences of the preceding expression.
re{n,m}          Matches at least n and at most m occurrences of the preceding expression.
a|b              Matches either a or b.
(re)             Groups regular expressions and remembers the matched text.
(?:re)           Groups regular expressions without remembering the matched text.
(?>re)           Matches an independent pattern without backtracking.
\w               Matches word characters.
\W               Matches nonword characters.
\s               Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\S               Matches nonwhitespace.
\d               Matches digits. Equivalent to [0-9].
\D               Matches nondigits.
\G               Matches the point where the last match finished.
\n               Back-reference to capture group number "n" (for example, \1).
\b               Matches word boundaries. Unlike Perl, Java does not accept \b inside brackets as a backspace (0x08).
\B               Matches nonword boundaries.
\n, \t, etc.     Matches newlines, carriage returns, tabs, etc.
\Q               Escapes (quotes) all characters up to \E.
\E               Ends quoting begun with \Q.
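A few of the table entries can be exercised with the static Pattern.matches method, which compiles an expression and tests it against the whole input. This is a minimal sketch; the class name SyntaxDemo, the helper fullMatch, and the sample strings are hypothetical. Note that every backslash in the table must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class SyntaxDemo {

    // Hypothetical helper: true if the whole input matches the expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \d and re{n}: three digits, a hyphen, four digits.
        System.out.println(fullMatch("\\d{3}-\\d{4}", "555-1234"));   // true

        // a|b: alternation.
        System.out.println(fullMatch("cat|dog", "dog"));              // true

        // re?: the "u" is optional.
        System.out.println(fullMatch("colou?r", "color"));            // true

        // [^...] and re+: one or more non-digits.
        System.out.println(fullMatch("[^0-9]+", "abc"));              // true

        // (re) with back-reference \1: the same word twice.
        System.out.println(fullMatch("(\\w+) \\1", "hello hello"));   // true
        System.out.println(fullMatch("(\\w+) \\1", "hello world"));   // false

        // \Q...\E: the "+" is treated literally instead of as a quantifier.
        System.out.println(fullMatch("\\Q1+1=2\\E", "1+1=2"));        // true
        System.out.println(fullMatch("1+1=2", "1+1=2"));              // false

        // (?:re) groups without capturing, so only (b) is counted.
        System.out.println(Pattern.compile("(?:a)(b)").matcher("").groupCount()); // 1
    }
}
```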

Methods of the Matcher Class:

Here is a list of useful instance methods:

Index Methods:

Index methods provide useful index values that show precisely where the match was found in the input string:

Methods with Description

public int start()
Returns the start index of the previous match.

public int start(int group)
Returns the start index of the subsequence captured by the given group during the previous match operation.

public int end()
Returns the offset after the last character matched.

public int end(int group)
Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
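The group-aware variants can be illustrated with a short sketch. The class name GroupIndexDemo, the helper span, and the sample input "id=42;" are hypothetical; start(int) and end(int) are the Matcher methods listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {

    // Hypothetical helper: returns {start, end} of the given group in the
    // first match, or null if the expression is not found in the input.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)=(\\d+)";
        String input = "id=42;";

        for (int group = 0; group <= 2; group++) {
            int[] s = span(regex, input, group);
            // end is exclusive, so substring(start, end) is exactly the group text.
            System.out.println("group " + group + ": [" + s[0] + ", " + s[1] + ") = "
                    + input.substring(s[0], s[1]));
        }
        // group 0: [0, 5) = id=42
        // group 1: [0, 2) = id
        // group 2: [3, 5) = 42
    }
}
```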

Study Methods:

Study methods review the input string and return a Boolean indicating whether or not the pattern is found:

Methods with Description

public boolean lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.

public boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

public boolean matches()
Attempts to match the entire region against the pattern.
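The four study methods behave differently on the same input, which a small sketch can make concrete. StudyMethodsDemo, its helper names, and the input "cat concat" are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudyMethodsDemo {

    static final Pattern CAT = Pattern.compile("cat");

    // True if the input starts with a match (the rest may be anything).
    static boolean startsWithMatch(String input) {
        return CAT.matcher(input).lookingAt();
    }

    // True only if the entire input is a match.
    static boolean wholeInputMatches(String input) {
        return CAT.matcher(input).matches();
    }

    // Counts matches by calling find() until it returns false.
    static int countMatches(String input) {
        Matcher m = CAT.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Start index of the first match at or after 'from', or -1 if there is none.
    static int findFrom(String input, int from) {
        Matcher m = CAT.matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "cat concat";
        System.out.println(startsWithMatch(input));    // true
        System.out.println(wholeInputMatches(input));  // false
        System.out.println(countMatches(input));       // 2
        System.out.println(findFrom(input, 1));        // 7
    }
}
```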

Replacement Methods:

Replacement methods are useful methods for replacing text in an input string:

Methods with Description

public Matcher appendReplacement(StringBuffer sb, String replacement)
Implements a non-terminal append-and-replace step.

public StringBuffer appendTail(StringBuffer sb)
Implements a terminal append-and-replace step.

public String replaceAll(String replacement)
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

public String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.

public static String quoteReplacement(String s)
Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement in the appendReplacement method of the Matcher class.
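The replacement methods can be sketched as follows. The class name ReplacementDemo, the helper names, and the sample strings are hypothetical. The appendReplacement and appendTail pair is shown in its usual loop form, which is useful when each replacement has to be computed from the matched text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementDemo {

    // replaceAll: every "dog" becomes "cat".
    static String replaceEveryDog(String input) {
        return Pattern.compile("dog").matcher(input).replaceAll("cat");
    }

    // replaceFirst: only the first "dog" becomes "cat".
    static String replaceFirstDog(String input) {
        return Pattern.compile("dog").matcher(input).replaceFirst("cat");
    }

    // appendReplacement / appendTail: doubles every number in the input.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Non-terminal step: copies the text before the match, then the replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Terminal step: copies whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    // quoteReplacement: "$5" would otherwise be read as a reference to group 5.
    static String fillInPrice(String input) {
        return Pattern.compile("PRICE").matcher(input)
                .replaceAll(Matcher.quoteReplacement("$5"));
    }

    public static void main(String[] args) {
        System.out.println(replaceEveryDog("dog dog"));               // cat cat
        System.out.println(replaceFirstDog("dog dog"));               // cat dog
        System.out.println(doubleNumbers("2 apples and 10 pears"));   // 4 apples and 20 pears
        System.out.println(fillInPrice("Cost: PRICE"));               // Cost: $5
    }
}
```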


The start and end Methods:

The following example counts the number of times the word "cat" appears in the input string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT =
                                    "cat cat cat cattie cat";

    public static void main( String args[] ){
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        while(m.find()) {
            count++; // one more whole-word match found
            System.out.println("Match number "+count);
            System.out.println("start(): "+m.start());
            System.out.println("end(): "+m.end());
        }
    }
}

This would produce the following result:

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22





You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring in a longer word, which is why "cattie" is skipped. It also gives some useful information about where in the input string the match has occurred.

The start method returns the start index of the subsequence matched by the previous match operation, and end returns the index of the last character matched, plus one.

The matches and lookingAt Methods:

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.

Both methods always start at the beginning of the input string. Here is an example explaining the functionality:

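import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main( String args[] ){
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: "+REGEX);
        System.out.println("Current INPUT is: "+INPUT);

        // The rest of this listing is missing from the source; the two calls
        // below are an assumed completion based on the description above.
        System.out.println("lookingAt(): "+matcher.lookingAt()); // true: input starts with "foo"
        System.out.println("matches(): "+matcher.matches());     // false: input is longer than "foo"
    }
}
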
