A regular expression (regex) is a search pattern for strings. Using a regex, we can find a single match or multiple matches. We can look for any kind of match in a string, e.g. a single character, a fixed string, or a complex pattern of characters such as an email address, SSN or domain name.
1. Regular expressions
Regular expressions are the key to powerful, flexible, and efficient text processing. They allow you to describe and parse text. Regular expressions can add, remove, isolate, and generally fold, spindle, and mutilate all kinds of text and data.
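1.1. Metacharacters and literals
Full regular expressions are composed of two types of characters: special characters called metacharacters (such as '^', '$', '.' and '|'), and ordinary text characters called literals.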
Regular expressions gain their usefulness from the expressive power that metacharacters provide. We can think of literal text as the words and metacharacters as the grammar. The words are combined with the grammar according to a set of rules to create an expression that communicates an idea.
1.2. Java Regex Example
Let’s see a quick Java example that uses a regex, for reference.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Compile a pattern that matches either name
        Pattern pattern = Pattern.compile("Alex|Brian");
        Matcher matcher = pattern.matcher("Generally, Alex and Brian share a great bonding.");

        // find() advances to the next match on every call
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(" - " + matcher.group());
        }
    }
}
Program output.
Start index: 11 End index: 15  - Alex
Start index: 20 End index: 25  - Brian
2. Regex Metacharacters
Let’s explore the commonly used metacharacters to understand them better.
2.1. Start and End of the Line
The start and end of a line are represented by the '^' (caret) and '$' (dollar) signs. The caret and dollar are special in that they match a position in the line rather than any actual text characters.
For example, the regular expression “cat” finds ‘cat’ anywhere in the string, but “^cat” matches only if ‘cat’ is at the beginning of the line, e.g. in lines starting with words like ‘category’ or ‘catalogue’.
Similarly, “cat$” matches only if ‘cat’ is at the end of the line, e.g. in lines ending with words like ‘scat’.
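The anchor behaviour described above can be checked with a small sketch. The class name AnchorDemo and the helper found() are illustrative names, not part of the original tutorial; the helper simply wraps Pattern.compile(...).matcher(...).find().

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat", "concatenate")); // unanchored: 'cat' may appear anywhere
        System.out.println(found("^cat", "category"));   // anchored at the start of the line
        System.out.println(found("^cat", "scat"));       // 'cat' is not at the start
        System.out.println(found("cat$", "scat"));       // anchored at the end of the line
        System.out.println(found("cat$", "category"));   // 'cat' is not at the end
    }
}
```

Because '^' and '$' match positions, not characters, the matched text is still just "cat" in every successful case.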
2.2. Character Classes
The regular-expression construct "[···]", usually called a character class, lets us list the characters we want to allow at that point in the match. Character classes are useful in creating spell-checkers.
For example, while “e” matches just an e, and “a” matches just an a, the regular expression [ea] matches either. For instance, sep[ea]r[ea]te will match all of the words “seperate”, “separate”, “separete” and “seperete”.
Another example is allowing capitalization of a word’s first letter, e.g. [Ss]mith will allow both smith and Smith.
Similarly, <[hH][123456]> will match the opening heading tags <h1> through <h6>, written with either a lowercase or an uppercase H.
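As a rough illustration of the character classes above, the following sketch tests whole strings with Pattern.matches(). The class name CharClassDemo and the helper matchesWhole() are made up for this example.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Each [ea] accepts exactly one character, either 'e' or 'a'
        for (String word : new String[] {"separate", "seperate", "separete", "seperete", "sepirate"}) {
            System.out.println(word + " -> " + matchesWhole("sep[ea]r[ea]te", word));
        }

        // Optional capitalization of the first letter only
        System.out.println(matchesWhole("[Ss]mith", "Smith"));
        System.out.println(matchesWhole("[Ss]mith", "SMITH"));

        // Heading tags h1 to h6, in either case
        System.out.println(matchesWhole("<[hH][123456]>", "<H3>"));
        System.out.println(matchesWhole("<[hH][123456]>", "<h7>"));
    }
}
```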
A dash '-' indicates a range of characters. <[hH][1-6]> is equivalent to <[hH][123456]>. Other useful character ranges are [0-9] and [a-z], which match digits and English lowercase letters.
We can specify multiple ranges in a single construct, e.g. [0123456789abcdefABCDEF] can be written as [0-9a-fA-F]. Note that the order in which the ranges are given doesn’t matter.
Note that a dash is a metacharacter only within a character class; elsewhere it matches the normal dash character. Also, if it is the first character listed in the class, it can’t possibly indicate a range, so it is not a metacharacter in that case.
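The range rules above can be sketched as follows. RangeDemo and matchesWhole() are illustrative names, and the sample inputs are invented for the example.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Several ranges in one class; their order does not matter
        System.out.println(matchesWhole("[0-9a-fA-F]+", "CafeBabe"));
        System.out.println(matchesWhole("[a-fA-F0-9]+", "CafeBabe"));
        System.out.println(matchesWhole("[0-9a-fA-F]+", "xyz"));

        // A dash listed first in the class is a literal dash, not a range
        System.out.println(matchesWhole("[-0-9]+", "555-0100"));

        // Outside a character class the dash is always literal
        System.out.println(matchesWhole("a-z", "a-z"));
        System.out.println(matchesWhole("a-z", "m"));
    }
}
```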
2.2.2. Negated character classes
If we use the negation sign ( ^ ) as the first character in a character class, then the class matches any character that isn’t listed, e.g. [^1-6] matches a character that’s not 1 through 6.
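A short sketch of negation follows, using the illustrative names NegationDemo and matchesWhole(). Note that a negated class still has to consume one character, so it does not match an empty string.

```java
import java.util.regex.Pattern;

class NegationDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[^1-6]", "7")); // not in 1..6
        System.out.println(matchesWhole("[^1-6]", "a")); // any other character qualifies
        System.out.println(matchesWhole("[^1-6]", "3")); // excluded by the class
        System.out.println(matchesWhole("[^1-6]", ""));  // nothing to consume

        // A caret that is not first in the class is an ordinary character
        System.out.println(matchesWhole("[1^]", "^"));
    }
}
```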
The metacharacter '.' is a shorthand for a character class that matches any character (in Java, any character except a line terminator unless the DOTALL flag is set). Note that dots are not metacharacters when they are used within character classes; within a character class, a dot is a simple character only.
For example, 06.24.2019 will match 06/24/2019, 06-24-2019 or 06.24.2019. But 06[.]24[.]2019 will match only 06.24.2019.
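The date example can be reproduced with a minimal sketch. DotDemo and matchesWhole() are illustrative names only.

```java
import java.util.regex.Pattern;

class DotDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped dot accepts any single separator character
        System.out.println(matchesWhole("06.24.2019", "06/24/2019"));
        System.out.println(matchesWhole("06.24.2019", "06-24-2019"));
        System.out.println(matchesWhole("06.24.2019", "06.24.2019"));

        // Inside a character class the dot is literal
        System.out.println(matchesWhole("06[.]24[.]2019", "06.24.2019"));
        System.out.println(matchesWhole("06[.]24[.]2019", "06/24/2019"));
    }
}
```

An escaped dot (written "\\." in a Java string literal) behaves the same way as "[.]".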
The pipe symbol '|' allows you to combine multiple expressions into a single expression that matches any of the individual ones.
For example, “Alex” and “Brian” are separate expressions, but "Alex|Brian" is one expression that matches either of them.
Similar to the dot, the pipe is not a metacharacter when it is used within a character class; there it is a simple character only.
For example, to match the words “First” or “1st”, we can write the regex “(First|1st)” or, in shorthand, "(Fir|1)st".
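A small sketch of alternation, including the shorthand form and a pipe inside a character class. AlternationDemo and matchesWhole() are illustrative names.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Both spellings of the expression accept the same two words
        for (String regex : new String[] {"(First|1st)", "(Fir|1)st"}) {
            System.out.println(regex + " First -> " + matchesWhole(regex, "First"));
            System.out.println(regex + " 1st   -> " + matchesWhole(regex, "1st"));
            System.out.println(regex + " 2nd   -> " + matchesWhole(regex, "2nd"));
        }

        // Inside a character class the pipe is just a character
        System.out.println(matchesWhole("[a|b]", "|"));
    }
}
```

The parentheses limit the scope of the alternation, which is what lets the common suffix "st" be written once.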
Java has inbuilt APIs (java.util.regex) to work with regular expressions. We do not need any 3rd party library to run a regex against a string in Java.
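The Java Regex API provides 1 interface and 3 classes:
Pattern: a regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression.
Matcher: the class that performs match operations on a character sequence by interpreting a Pattern.
PatternSyntaxException: an unchecked exception thrown to indicate a syntax error in a regular-expression pattern.
MatchResult: the interface that represents the result of a match operation.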
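Pattern p = Pattern.compile("abc");
Matcher m = p.matcher("abcabcabcd");
boolean whole = m.matches();                    // false: matches() requires the entire input to match
boolean found = p.matcher("abcabcabcd").find(); // true: find() looks for the pattern anywhere in the input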
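Matcher offers several ways to apply the same pattern, and they differ in how much of the input must match. The sketch below contrasts matches(), lookingAt() and find(); the class name MatchModesDemo and its helper methods are illustrative, not part of the tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesDemo {
    private static final Pattern ABC = Pattern.compile("abc");

    // matches(): the whole input must match the pattern
    static boolean wholeMatch(String input) {
        return ABC.matcher(input).matches();
    }

    // lookingAt(): the input must start with a match
    static boolean prefixMatch(String input) {
        return ABC.matcher(input).lookingAt();
    }

    // find(): scans for the next match; called repeatedly to count all matches
    static int countMatches(String input) {
        Matcher m = ABC.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "abcabcabcd";
        System.out.println(wholeMatch(input));   // false
        System.out.println(prefixMatch(input));  // true
        System.out.println(countMatches(input)); // 3
    }
}
```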
Let’s look at these classes and their important methods in more detail.
3.1. Pattern class
It represents the compiled representation of a regular expression. To use the Java regex API, we must first compile the regular expression into an instance of this class.
After compilation, its instance can be used to create a Matcher object that can match lines/strings against the regular expression.
Note that many matchers can share the same pattern. State information during processing is kept inside the Matcher instance.
Instances of this class are immutable and are safe for use by multiple concurrent threads.
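One common way to take advantage of this is to compile a pattern once, keep it in a constant, and create a fresh Matcher for each use. The sketch below assumes a hypothetical SharedPatternDemo class; the parallel stream is only there to show the shared Pattern being used from several threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedPatternDemo {
    // Compiled once; a Pattern is immutable, so it can be shared freely
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Each call creates its own Matcher, which holds the per-match state
    static boolean isNumber(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Filters the inputs in parallel while sharing the single Pattern
    static List<String> numbersOnly(List<String> inputs) {
        return inputs.parallelStream()
                .filter(SharedPatternDemo::isNumber)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(numbersOnly(Arrays.asList("42", "x1", "2019", ""))); // [42, 2019]
    }
}
```

Only the Pattern is shared here; no Matcher instance ever crosses a thread boundary.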
3.2. Matcher class
It is the main class that performs match operations on a string/line by interpreting a Pattern. Once created, a matcher can be used to perform the different kinds of match operations.
This class also defines methods for replacing matched sub-sequences with new strings whose contents can, if desired, be computed from the match result.
Instances of this class are not thread-safe.
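The replacement methods mentioned above can be sketched as follows. ReplaceDemo, mask() and shout() are illustrative names; the sketch uses replaceAll() for a fixed replacement and the appendReplacement()/appendTail() pair when the new text is computed from each match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    private static final Pattern NAME = Pattern.compile("Alex|Brian");

    // Replaces every match with the same fixed string
    static String mask(String input) {
        return NAME.matcher(input).replaceAll("***");
    }

    // Replaces every match with text computed from the match itself
    static String shout(String input) {
        Matcher m = NAME.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement() keeps '$' and '\' in the computed text literal
            m.appendReplacement(out, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("Alex and Brian"));  // *** and ***
        System.out.println(shout("Alex and Brian")); // ALEX and BRIAN
    }
}
```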
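The start() method returns the start index of the match found by the find() method.
The end() method returns the end index of the match found by the find() method, i.e. the index of the character just after the last matching character.
Matcher implements the MatchResult interface, which exposes these match details.
Read the examples given below to understand how regular expressions are used to solve specific problems in applications.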
Regular Expression for Email Address
Learn to match email addresses using regular expressions in Java
^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$
Regular Expression for Password Validation
Learn to match passwords using regular expressions in Java
((?=.*[a-z])(?=.*\\d)(?=.*[@#$%])(?=.*[A-Z]).{6,16})
Regular Expression for Trademark Symbol
Learn to match the trademark symbol using regular expressions in Java
\u2122
Regular Expression for Any Currency Symbol
Learn to match currency symbols using regular expressions in Java
\\p{Sc}
Regular Expression for Any Character in “Greek Extended” or Greek script
Learn to match characters in Greek Extended and Greek script using regular expressions in Java
\\p{InGreek} and \\p{InGreekExtended}
Regular Expression for North American Phone Numbers
Learn to match North American phone numbers using regular expressions in Java
^\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})$
Regular Expression for International Phone Numbers
Learn to match international phone numbers using regular expressions in Java
^\\+(?:[0-9] ?){6,14}[0-9]$
Regular Expression for Date Formats
Learn to match date formats using regular expressions in Java
^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$
Regular Expression for Social Security Numbers (SSN)
Learn to match SSNs using regular expressions in Java
^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$
Regular Expression for International Standard Book Numbers (ISBNs)
Learn to match ISBNs using regular expressions in Java
^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$
Regular Expression for US Postal Zip Codes
Learn to match US postal codes using regular expressions in Java
^[0-9]{5}(?:-[0-9]{4})?$
Regular Expression for Canadian Postal Zip Codes
Learn to match Canadian postal codes using regular expressions in Java
^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
Regular Expression for U.K. Postal Codes (Postcodes)
Learn to match U.K. postal codes using regular expressions in Java
^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$
Regular Expression for Credit Card Numbers
Learn to match credit card numbers using regular expressions in Java
^(?:(?<visa>4[0-9]{12}(?:[0-9]{3})?)|(?<mastercard>5[1-5][0-9]{14})|(?<discover>6(?:011|5[0-9]{2})[0-9]{12})|(?<amex>3[47][0-9]{13})|(?<diners>3(?:0[0-5]|[68][0-9])?[0-9]{11})|(?<jcb>(?:2131|1800|35[0-9]{3})[0-9]{11}))$
More Regular Expression Examples
Match Start or End of String (Line Anchors)
Match any character or set of characters
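To show how such patterns are typically applied, here is a minimal validation sketch using three of the expressions listed above (email address, SSN and US ZIP code). The class name ValidatorDemo, the helper isValid() and all sample inputs are invented for illustration; the patterns are only a syntactic check and do not prove that an address or number actually exists.

```java
import java.util.regex.Pattern;

class ValidatorDemo {
    // Patterns taken from the list above, compiled once and reused
    static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$");
    static final Pattern SSN =
            Pattern.compile("^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$");
    static final Pattern US_ZIP =
            Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");

    // Hypothetical helper: true if the whole input matches the given pattern
    static boolean isValid(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(EMAIL, "user@example.com")); // well-formed address
        System.out.println(isValid(EMAIL, "user.example.com")); // missing '@'
        System.out.println(isValid(SSN, "123-45-6789"));        // allowed area, group and serial
        System.out.println(isValid(SSN, "000-12-3456"));        // area 000 is rejected by the lookahead
        System.out.println(isValid(US_ZIP, "12345-6789"));      // ZIP+4 form
        System.out.println(isValid(US_ZIP, "1234"));            // too short
    }
}
```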
Drop me your questions related to this Java regex tutorial in the comments.
Happy Learning !!