A regular expression is a way to match patterns in data using placeholder characters, called operators.
Elasticsearch supports regular expressions in the following queries: regexp and query_string.
Elasticsearch uses Apache Lucene's regular expression engine to parse these queries.
Lucene's regular expression engine supports all Unicode characters. However, the following characters are reserved as operators:
. ? + * | { } [ ] ( ) " \
Depending on the optional operators enabled, the following characters may also be reserved:
# @ & < > ~
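To use one of these characters literally, escape it with a preceding backslash or surround it with double quotes. For example, \@ matches a literal @, \\ matches a literal \, and "john@smith.com" matches the literal string john@smith.com.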
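When patterns are built from user input, the backslash escaping described above can be automated. The following is a minimal sketch, not part of Elasticsearch or any client library: the helper name escape_lucene_regex is hypothetical, and it assumes you want to escape the optional-operator characters (# @ & < > ~) as well, whether or not those operators are enabled.

```python
# Always-reserved characters from the text, plus the optional-operator
# characters (# @ & < > ~), escaped as well to be safe (an assumption).
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text: str) -> str:
    """Backslash-escape every reserved character in text (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_lucene_regex("john@smith.com"))  # john\@smith\.com
```

The result is the regular expression as Lucene should see it. It still needs JSON string escaping before it goes into a request body.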
Note
The backslash is an escape character in both JSON strings and regular expressions, so a literal backslash in a query must be escaped at both levels, unless you use a language client, which takes care of this. For example, the string a\b needs to be indexed as "a\\b":
PUT my-index-000001/_doc/1
{ "my_field": "a\\b" }
This document matches the following regexp query:
GET my-index-000001/_search
{ "query": { "regexp": { "my_field.keyword": "a\\\\.*" } } }
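The two escaping layers in the query above can be reproduced with any JSON serializer. This sketch uses Python's standard json module, not an Elasticsearch client: the pattern is written once as the regular expression should read (one escaped backslash), and the serializer adds the JSON layer.

```python
import json

# The regex as Lucene should see it: 'a', an escaped backslash, then any characters.
pattern = r"a\\.*"

# json.dumps adds the JSON layer, doubling each backslash again.
body = json.dumps({"query": {"regexp": {"my_field.keyword": pattern}}})
print(body)  # {"query": {"regexp": {"my_field.keyword": "a\\\\.*"}}}
```

Writing the pattern as a raw string keeps the regular expression layer readable and leaves the JSON layer to the serializer, which is the step a language client would typically perform for you.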
Lucene's regular expression engine does not use the Perl Compatible Regular Expressions (PCRE) library, but it does support the following standard operators.
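.
Matches any character. For example, ab. matches 'aba', 'abb', and 'abz'.
?
Repeats the preceding character zero or one times, which makes it optional. For example, abc? matches 'ab' and 'abc'.
+
Repeats the preceding character one or more times. For example, ab+ matches 'ab', 'abb', and 'abbb'.
*
Repeats the preceding character zero or more times. For example, ab* matches 'a', 'ab', and 'abb'.
{}
Sets the minimum and maximum number of times the preceding character can repeat. For example, a{2} matches 'aa', a{2,4} matches 'aa', 'aaa', and 'aaaa', and a{2,} matches 'a' repeated two or more times.
|
OR operator. The match succeeds if the pattern on either the left side or the right side matches. For example, abc|xyz matches 'abc' and 'xyz'.
( … )
Forms a group. You can use a group to treat part of the expression as a single character. For example, abc(def)? matches 'abc' and 'abcdef' but not 'abcd'.
[ … ]
Matches one of the characters in the brackets. For example, [abc] matches 'a', 'b', and 'c'.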
Inside the brackets, - indicates a range unless - is the first character or escaped. For example, [a-c] matches 'a', 'b', or 'c', while [-abc] and [abc\-] each match '-', 'a', 'b', or 'c'.
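A ^ before a character in the brackets negates the character or range. For example:
[^abc]      # matches any character except 'a', 'b', or 'c'
[^a-c]      # matches any character except 'a', 'b', or 'c'
[^-abc]     # matches any character except '-', 'a', 'b', or 'c'
[^abc\-]    # matches any character except 'a', 'b', 'c', or '-'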
Note
Character range classes such as [a-c] do not behave as expected when using case_insensitive: true; they remain case sensitive. For example, [a-c]+ with case_insensitive: true matches strings containing only the characters 'a', 'b', and 'c', but not 'A', 'B', or 'C'. To match both uppercase and lowercase characters, list both cases in the class, for example [a-zA-Z].
This is due to a known limitation in Lucene's regular expression engine. See Lucene issue #14378 for details.
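One way to apply the workaround consistently is to generate the two-case class instead of typing it by hand. The following sketch assumes single ASCII letters as range endpoints; the helper name both_cases_range is hypothetical and not part of any Elasticsearch API.

```python
def both_cases_range(start: str, end: str) -> str:
    """Build a character class covering an ASCII letter range in both cases.

    Hypothetical helper: it only handles single ASCII letters.
    """
    for ch in (start, end):
        if len(ch) != 1 or not (ch.isascii() and ch.isalpha()):
            raise ValueError("expected single ASCII letters")
    return f"[{start.lower()}-{end.lower()}{start.upper()}-{end.upper()}]"


print(both_cases_range("a", "c") + "+")  # [a-cA-C]+
```

A class built this way does not depend on case_insensitive for the range itself.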
You can use the flags parameter to enable more optional operators for Lucene's regular expression engine.
To enable multiple operators, use a | separator. For example, a flags value of COMPLEMENT|INTERVAL enables the COMPLEMENT and INTERVAL operators.
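As an illustration of the | separator, the sketch below builds a regexp query body with a flags value assembled from a list. The helper name regexp_query, the field name my_field, and the pattern foo<1-100> are illustrative assumptions; the value and flags parameters are those of the regexp query.

```python
import json


def regexp_query(field, value, flags):
    """Build a regexp query body with the given optional operators enabled."""
    return {"query": {"regexp": {field: {"value": value, "flags": "|".join(flags)}}}}


# Enable only the COMPLEMENT and INTERVAL operators for this query.
body = regexp_query("my_field", "foo<1-100>", ["COMPLEMENT", "INTERVAL"])
print(json.dumps(body, indent=2))
```

Listing only the operators a pattern needs leaves the other optional characters unreserved, so they do not have to be escaped.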
ALL (Default)
Enables all optional operators.
"" (empty string)
Alias for the ALL value.
COMPLEMENT
Enables the ~ operator. You can use ~ to negate the shortest following pattern. For example, a~bc matches 'adc' and 'aec' but not 'abc'.
EMPTY
Enables the # (empty language) operator. The # operator doesn't match any string, not even an empty string.
If you create regular expressions by programmatically combining values, you can pass # to specify "no string." This lets you avoid accidentally matching empty strings or other unwanted strings.
INTERVAL
Enables the <> operators. You can use <> to match a numeric range. For example, foo<1-100> matches 'foo1', 'foo2', and so on up to 'foo100'.
INTERSECTION
Enables the & operator, which acts as an AND operator. The match succeeds only if the patterns on both the left side and the right side match. For example, aaa.+&.+bbb matches 'aaabbb'.
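ANYSTRING
Enables the @ operator. You can use @ to match any entire string.
You can combine the @ operator with the & and ~ operators to create "everything except" logic. For example, @&~(abc.+) matches every term except those consisting of 'abc' followed by one or more characters.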
NONE
Disables all optional operators.
Lucene's regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.
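The whole-string requirement can be illustrated by analogy with Python's re.fullmatch. Python's re module is a different engine from Lucene's, so this sketch only mirrors the anchoring behavior for simple patterns that mean the same in both. The helper name matches_whole_term is hypothetical.

```python
import re


def matches_whole_term(pattern: str, term: str) -> bool:
    """Return True only if pattern covers the entire term (analogy, not Lucene)."""
    return re.fullmatch(pattern, term) is not None


print(matches_whole_term(r"ab", "abc"))    # False: 'c' is left unmatched
print(matches_whole_term(r"ab.*", "abc"))  # True: behaves like a prefix match
print(matches_whole_term(r".*bc", "abc"))  # True: behaves like a suffix match
```

To match part of a term, add .* before or after the pattern rather than reaching for ^ or $.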