Java - Regular Expressions

Copyright © tutorials

Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn. A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data. The java.util.regex package primarily consists of the following three classes:

Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.

PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
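The way the three classes fit together can be sketched in a few lines. The class name PatternBasics and the helpers containsMatch and isValidRegex below are hypothetical names chosen for illustration; only Pattern.compile, Pattern.matcher, Matcher.find, and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternBasics {

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex); // static factory; Pattern has no public constructor
        Matcher m = p.matcher(input);       // a Matcher is obtained from the Pattern
        return m.find();
    }

    // Hypothetical helper: true if the regex compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception raised for malformed patterns.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("\\d+", "order 66")); // true
        System.out.println(isValidRegex("(unclosed"));         // false
    }
}
```

Because PatternSyntaxException is unchecked, catching it is optional; doing so is mainly useful when the pattern comes from user input.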

Capturing Groups:

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g". Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

((A)(B(C)))
(A)
(B(C))
(C)

To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern. There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
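The numbering rule can be checked with a short sketch. The class GroupNumbering and the helper groupsOf are hypothetical names; the sketch assumes the input "ABC" so that the whole expression matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {

    // Hypothetical helper: returns groups 0..groupCount() for a full match,
    // or an empty array if the input does not match.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, so one extra slot is needed.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("((A)(B(C)))", "ABC");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + i + ": " + groups[i]);
        }
        // group 0: ABC
        // group 1: ABC
        // group 2: A
        // group 3: BC
        // group 4: C
    }
}
```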

Example:

The following example illustrates how to find a digit string in a given alphanumeric string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    public static void main(String[] args) {

        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            // Group 0 is the whole match; groups 1 and 2 are the first two capturing groups.
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}

This would produce the following result:

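Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0

Because the first (.*) is greedy, it consumes as many characters as it can, leaving only the final digit 0 for (\d+).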

Regular Expression Syntax:

The following table lists the regular expression metacharacter syntax available in Java:

Subexpression   Matches

^               Matches beginning of line; without the MULTILINE flag, only the beginning of the input.
$               Matches end of line; without the MULTILINE flag, only the end of the input (before a final line terminator, if any).
.               Matches any single character except newline. Using the s (DOTALL) option allows it to match newline as well.
[...]           Matches any single character in brackets.
[^...]          Matches any single character not in brackets.
\A              Matches beginning of entire string.
\z              Matches end of entire string.
\Z              Matches end of entire string except allowable final line terminator. If a newline exists, it matches just before the newline.
re*             Matches 0 or more occurrences of preceding expression.
re+             Matches 1 or more occurrences of preceding expression.
re?             Matches 0 or 1 occurrence of preceding expression.
re{n}           Matches exactly n occurrences of preceding expression.
re{n,}          Matches n or more occurrences of preceding expression.
re{n,m}         Matches at least n and at most m occurrences of preceding expression.
a|b             Matches either a or b.
(re)            Groups regular expressions and remembers matched text.
(?:re)          Groups regular expressions without remembering matched text.
(?>re)          Matches independent pattern without backtracking.
\w              Matches word characters.
\W              Matches nonword characters.
\s              Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\S              Matches nonwhitespace.
\d              Matches digits. Equivalent to [0-9].
\D              Matches nondigits.
\G              Matches point where last match finished.
\n              Back-reference to capture group number n, where n is a digit (for example, \1).
\b              Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\B              Matches nonword boundaries.
\n, \t, etc.    Matches newlines, carriage returns, tabs, etc.
\Q              Escapes (quotes) all characters up to \E.
\E              Ends quoting begun with \Q.
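A few of the constructs listed above can be tried out directly. The following is a minimal sketch, assuming a hypothetical class SyntaxSampler with a helper fullMatch that wraps the static Pattern.matches method; the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

public class SyntaxSampler {

    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // re{n,m}: between two and four digits.
        System.out.println(fullMatch("\\d{2,4}", "123"));            // true

        // \1: back-reference to whatever group 1 captured.
        System.out.println(fullMatch("(\\w+) \\1", "hello hello"));  // true

        // \Q...\E: the dot is quoted, so it only matches a literal '.'.
        System.out.println(fullMatch("\\Qa.b\\E", "a.b"));           // true
        System.out.println(fullMatch("\\Qa.b\\E", "axb"));           // false

        // '.' skips newlines unless DOTALL is enabled, here via the inline flag (?s).
        System.out.println(fullMatch("a.b", "a\nb"));                // false
        System.out.println(fullMatch("(?s)a.b", "a\nb"));            // true
    }
}
```

Note that each backslash in a regular expression has to be doubled inside a Java string literal, which is why \d appears as "\\d" in the code.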

Methods of the Matcher Class:

Here is a list of useful instance methods:

Index Methods:

Index methods provide useful index values that show precisely where the match was found in the input string:

SN  Methods with Description

1   public int start()
    Returns the start index of the previous match.

2   public int start(int group)
    Returns the start index of the subsequence captured by the given group during the previous match operation.

3   public int end()
    Returns the offset after the last character matched.

4   public int end(int group)
    Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
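The group-based variants can be illustrated with a small sketch. The class GroupOffsets, the helper span, and the sample input "id=42;" are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsets {

    // Hypothetical helper: returns {start, end} of the given group in the
    // first match, or null if the pattern is not found.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)=(\\d+)";
        String input = "id=42;";
        for (int group = 0; group <= 2; group++) {
            int[] s = span(regex, input, group);
            // end is exclusive, so substring(start, end) yields the captured text.
            System.out.println("group " + group + ": [" + s[0] + ", " + s[1] + ") "
                    + input.substring(s[0], s[1]));
        }
        // group 0: [0, 5) id=42
        // group 1: [0, 2) id
        // group 2: [3, 5) 42
    }
}
```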

Study Methods:

Study methods review the input string and return a Boolean indicating whether or not the pattern is found:

SN  Methods with Description

1   public boolean lookingAt()
    Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

2   public boolean find()
    Attempts to find the next subsequence of the input sequence that matches the pattern.

3   public boolean find(int start)
    Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

4   public boolean matches()
    Attempts to match the entire region against the pattern.
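The differences between these methods can be seen side by side in a short sketch. The class StudyMethods, its three helpers, and the input "foobar foo" are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudyMethods {

    // Hypothetical helper: true if the pattern matches a prefix of the input.
    static boolean prefixMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Hypothetical helper: true only if the pattern matches the entire input.
    static boolean wholeMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: start index of the first match at or after 'from',
    // or -1 if there is none.
    static int findFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "foobar foo";
        System.out.println(prefixMatches("foo", input)); // true
        System.out.println(wholeMatches("foo", input));  // false
        System.out.println(findFrom("foo", input, 1));   // 7
        System.out.println(findFrom("foo", input, 8));   // -1
    }
}
```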

Replacement Methods:

Replacement methods are useful methods for replacing text in an input string:

SN  Methods with Description

1   public Matcher appendReplacement(StringBuffer sb, String replacement)
    Implements a non-terminal append-and-replace step.

2   public StringBuffer appendTail(StringBuffer sb)
    Implements a terminal append-and-replace step.

3   public String replaceAll(String replacement)
    Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

4   public String replaceFirst(String replacement)
    Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.

5   public static String quoteReplacement(String s)
    Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class.
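How these methods combine may be easier to see in code. The sketch below assumes a hypothetical class ReplacementDemo with two helpers, upperCaseMatches and maskDigits; the sample strings are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementDemo {

    // Hypothetical helper: upper-cases every match using the
    // appendReplacement / appendTail loop.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any '$' or '\' in the computed text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    // Hypothetical helper: replaces each run of digits with '#'.
    static String maskDigits(String input) {
        return Pattern.compile("\\d+").matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("cat", "one cat, two cats")); // one CAT, two CATs
        System.out.println(maskDigits("a1b22"));                          // a#b#

        Matcher m = Pattern.compile("\\d+").matcher("a1b22");
        System.out.println(m.replaceFirst("#"));                          // a#b22

        // Without quoting, "$1" would be read as a reference to group 1.
        System.out.println(m.replaceAll(Matcher.quoteReplacement("$1"))); // a$1b$1
    }
}
```

The explicit loop is only needed when the replacement text is computed per match; for a fixed replacement, replaceAll and replaceFirst are simpler.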

The start and end Methods:

The following example counts the number of times the word "cat" appears in the input string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        // Each successful find() advances to the next whole-word match.
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}

This would produce the following result:

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22

You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred.

The start method returns the start index of the previous match (or, when given a group number, of the subsequence captured by that group during the previous match operation), and end returns the index of the last character matched, plus one.

The matches and lookingAt Methods:

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.

Both methods always start at the beginning of the input string. Here is an example explaining the functionality:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {

    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

................
................

In order to avoid copyright disputes, this page is only a partial summary.
