This chapter discusses regular expressions (regexp) and related features in detail. As discussed in earlier chapters:
/searchpattern search the given pattern in the forward direction
?searchpattern search the given pattern in the backward direction
:range s/searchpattern/replacestring/flags search and replace
    :s is short for the :substitute command
    replacestring portion is optional if you are not using flags
Documentation links:
:h :substitute for the :substitute command
Recall that you need to add the / prefix for built-in help on regular expressions, :h /^ for example.

Flags
g replace all occurrences within a matching line
c ask for confirmation before each replacement
i ignore case for searchpattern
I don't ignore case for searchpattern
These flags are applicable for the substitute command but not the / or ? searches. Flags can also be combined, for example:
s/cat/Dog/gi replace every occurrence of cat with Dog
    Cat, cAt, CAT, etc will also be replaced
    i doesn't affect the case of the replacement string
See :h s_flags for a complete list of flags and more details about them.

Anchors
By default, regexp will match anywhere in the text. You can use line and word anchors to specify additional restrictions regarding the position of matches. These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as metacharacters in regular expressions parlance. In case you need to match those characters literally, you need to escape them with a \ character (discussed in the Escaping metacharacters section later in this chapter).
^ restricts the match to the start-of-line
    ^This matches This is a sample but not Do This
$ restricts the match to the end-of-line
    )$ matches apple (5) but not def greeting():
^$ match empty lines
\<pattern restricts the match to the start of a word
    \<his matches his or to-his or history but not this or _hist
pattern\> restricts the match to the end of a word
    his\> matches his or to-his or this but not history or _hist
\<pattern\> restricts the match between the start of a word and end of a word
    \<his\> matches his or to-his but not this or history or _hist
End-of-line can be \r (carriage return), \n (newline) or \r\n depending on your operating system and the fileformat setting.
See :h pattern-atoms for more details.
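The anchor examples above can be checked from Vim script. The following is only a sketch, assuming default settings ('ignorecase' off and the default 'iskeyword' value, which decides what counts as a word character). The =~# operator is used here as a script-level stand-in for the / search; it returns 1 for a match and 0 otherwise.

```vim
" Line anchors: expected results are 1, 0, 1, 0 (as stated in the list above)
echo 'This is a sample' =~# '^This'
echo 'Do This' =~# '^This'
echo 'apple (5)' =~# ')$'
echo 'def greeting():' =~# ')$'

" Word anchors: whole word 'his' only
" expected: [1, 1, 0, 0, 0]
echo map(['his', 'to-his', 'this', 'history', '_hist'], 'v:val =~# ''\<his\>''')
```

Saving this to a file and running :source on it prints one result per echo.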

Dot metacharacter
. match any single character other than end-of-line
    c.t matches cat or cot or c2t or c^t or c.t or c;t but not cant or act or sit
\_. match any single character, including end-of-line
As seen above, matching the end-of-line character requires special attention. That is why examples and descriptions in this chapter assume you are operating line wise unless otherwise mentioned. You'll later see how \_ is used in many more places to include end-of-line in the matches.

Greedy Quantifiers
Quantifiers can be applied to literal characters, the dot metacharacter, groups, backreferences and character classes. Basic examples are shown below, more will be discussed in the sections to follow.
* match zero or more times
    abc* matches ab or abc or abccc or abcccccc but not bc
    Error.*valid matches Error: invalid input but not valid Error
    s/a.*b/X/ replaces table bottle bus with tXus
\+ match one or more times
    abc\+ matches abc or abccc but not ab or bc
\? match zero or one times
    \= can also be used, helpful if you are searching backwards with the ? command
    abc\? matches ab or abc. This will match abccc or abcccccc as well, but only the abc portion
    s/abc\?/X/ replaces abcc with Xc
\{m,n} match m to n times (inclusive)
    ab\{1,4}c matches abc or abbc or xabbbcz but not ac or abbbbbc
    \{m,n\} can also be used (ending brace is escaped)
\{m,} match at least m times
    ab\{3,}c matches xabbbcz or abbbbbc but not ac or abc or abbc
\{,n} match up to n times (including 0 times)
    ab\{,2}c matches abc or ac or abbc but not xabbbcz or abbbbbc
\{n} match exactly n times
    ab\{3}c matches xabbbcz but not abbc or abbbbbc
Greedy quantifiers will consume as much as possible, provided the overall pattern is also matched. That's how the Error.*valid example worked. If .* had consumed everything after Error, there wouldn't be any more characters to try to match valid. How the regexp engine handles matching a varying number of characters depends on the implementation details (backtracking, NFA, etc).
See :h pattern-overview for more details.
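To see the greedy behavior described above outside of an interactive session, here is a small sketch in Vim script. It assumes default settings ('ignorecase' off) and uses the built-in substitute() and matchstr() functions as stand-ins for the s command and the / search. The sample strings are the ones from the list above.

```vim
" .* grabs as much as possible, then gives back just enough for the final b
" expected: tXus
echo substitute('table bottle bus', 'a.*b', 'X', '')

" \? takes the optional c when it is available
" expected: Xc
echo substitute('abcc', 'abc\?', 'X', '')

" .* backs off until valid can still be matched
" expected: Error: invalid
echo matchstr('Error: invalid input', 'Error.*valid')

" \{1,4} allows one to four b characters
" expected: abbbc
echo matchstr('xabbbcz', 'ab\{1,4}c')
" five b characters: no match, so an empty string is printed
echo matchstr('abbbbbc', 'ab\{1,4}c')
```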
If you are familiar with other regular expression flavors like Perl, Python, etc, you'd be surprised by the use of \ in the above examples. If you use the \v very magic modifier (discussed later in this chapter), the \ won't be needed.

Non-greedy Quantifiers
Non-greedy quantifiers match as minimally as possible, provided the overall pattern is also matched.
\{-} match zero or more times as minimally as possible
    s/t.\{-}a/X/g replaces that is quite a fabricated tale with XX fabricaXle
        matching portions are tha, t is quite a and ted ta
    s/t.*a/X/g replaces that is quite a fabricated tale with Xle since * is greedy
\{-m,n} match m to n times as minimally as possible
    m or n can be left out as seen in the previous section
    s/.\{-2,5}/X/ replaces 123456789 with X3456789 (here . matched 2 times)
    s/.\{-2,5}6/X/ replaces 123456789 with X789 (here . matched 5 times)
See :h pattern-overview and stackoverflow: non-greedy matching for more details.

Character Classes
To create a custom placeholder for a limited set of characters, you can enclose them inside the [] metacharacters. Character classes have their own versions of metacharacters and provide special predefined sets for common use cases.
[aeiou] match any lowercase vowel character
[^aeiou] match any character other than lowercase vowels
[a-d] match any of a or b or c or d
    the range metacharacter - can be applied between any two characters
\a match any alphabet character [a-zA-Z]
\A match other than alphabets [^a-zA-Z]
\l match lowercase alphabets [a-z]
\L match other than lowercase alphabets [^a-z]
\u match uppercase alphabets [A-Z]
\U match other than uppercase alphabets [^A-Z]
\d match any digit character [0-9]
\D match other than digits [^0-9]
\o match any octal character [0-7]
\O match other than octals [^0-7]
\x match any hexadecimal character [0-9a-fA-F]
\X match other than hexadecimals [^0-9a-fA-F]
\h match alphabets and underscore [a-zA-Z_]
\H match other than alphabets and underscore [^a-zA-Z_]
\w match any word character (alphabets, digits, underscore) [a-zA-Z0-9_]
\W match other than word characters [^a-zA-Z0-9_]
\s match space and tab characters [ \t]
\S match other than space and tab characters [^ \t]
Here are some examples with character classes:
c[ou]t matches cot or cut
\<[ot][on]\> matches oo or on or to or tn as whole words only
^[on]\{2,}$ matches no or non or noon or on etc as whole lines only
s/"[^"]\+"/X/g replaces "mango" and "(guava)" with X and X
s/\d\+/-/g replaces Sample123string777numbers with Sample-string-numbers
s/\<0*[1-9]\d\{2,}\>/X/g replaces 0501 035 26 98234 with X 035 26 X (numbers >=100 with optional leading zeros)
s/\W\+/ /g replaces load2;err_msg--\ant with load2 err_msg ant
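The substitution examples above can be reproduced with the substitute() function. This is a sketch under default settings ('ignorecase' off), with substitute() standing in for the s command. The patterns and sample strings come from the list above, and single-quoted strings are used so that each backslash reaches the regexp engine unchanged.

```vim
" expected: Sample-string-numbers
echo substitute('Sample123string777numbers', '\d\+', '-', 'g')

" expected: X and X
echo substitute('"mango" and "(guava)"', '"[^"]\+"', 'X', 'g')

" only numbers >= 100 (leading zeros allowed) are replaced
" expected: X 035 26 X
echo substitute('0501 035 26 98234', '\<0*[1-9]\d\{2,}\>', 'X', 'g')

" runs of non-word characters collapse to a single space
" expected: load2 err_msg ant
echo substitute('load2;err_msg--\ant', '\W\+', ' ', 'g')
```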
To include the end-of-line character, use \_ instead of \ for any of the above escape sequences. For example, \_s will help you match across lines. Similarly, use \_[] for bracketed classes.
The above escape sequences do not have special meaning within bracketed classes. For example, [\d\s] will only match \ or d or s. You can use named character sets in such scenarios. For example, [[:digit:][:blank:]] to match digits or space or tab characters. See :h :alnum: for full list and more details.
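As a quick illustration of named character sets inside a bracketed class, here is a small sketch that uses substitute() as a stand-in for the s command. The sample string is made up for this example. It is written in double quotes so that \t becomes a real tab character, while the pattern stays in single quotes.

```vim
" digits, spaces and tabs are collapsed into a single -
" expected: a-b-c
echo substitute("a1 b22\tc", '[[:digit:][:blank:]]\+', '-', 'g')

" outside brackets, the \s escape sequence covers space and tab
" expected: a-b
echo substitute("a \t b", '\s\+', '-', 'g')
```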
The predefined sets are also better in terms of performance compared to bracketed versions. And there are more such sets than the ones discussed above. See :h character-classes for more details.

Alternation and Grouping
Alternation helps you to match multiple terms and they can have their own anchors as well (since each alternative is a regexp pattern). Often, there are some common things among the regular expression alternatives. In such cases, you can group them using a pair of parentheses metacharacters. Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions.
\| match either of the specified patterns
    min\|max matches min or max
    one\|two\|three matches one or two or three
    \<par\>\|er$ matches the whole word par or a line ending with er
\(pattern\) group a pattern to apply quantifiers, create a terser regexp by taking out common elements, etc
    a\(123\|456\)b is equivalent to a123b\|a456b
    hand\(y\|ful\) matches handy or handful
    hand\(y\|ful\)\? matches hand or handy or handful
    \(to\)\+ matches to or toto or tototo and so on
    re\(leas\|ceiv\)\?ed matches reed or released or received
There can be tricky situations when using alternation. Say, you want to match are or spared. Which one should get precedence? The bigger word spared, the substring are inside it, or something else? The alternative which matches earliest in the input gets precedence, irrespective of the order of the alternatives.
s/are\|spared/X/g replaces rare spared area with rX X Xa
    s/spared\|are/X/g will also give the same result
In case of matches starting from the same location, for example spa and spared, the leftmost alternative gets precedence. Sort by longest term first if you don't want shorter terms to take precedence.
s/spa\|spared/**/g replaces spared spare with **red **re
s/spared\|spa/**/g replaces spared spare with ** **re
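The precedence rules above are easy to confirm from Vim script. This is a minimal sketch, assuming default settings ('ignorecase' off), with substitute() as a stand-in for the s command and =~# as a stand-in for a search. The word relayed in the last list is an extra sample added for contrast.

```vim
" earliest match in the input wins, whatever the order of alternatives
" expected (both lines): rX X Xa
echo substitute('rare spared area', 'are\|spared', 'X', 'g')
echo substitute('rare spared area', 'spared\|are', 'X', 'g')

" same start location: the leftmost alternative wins
" expected: **red **re
echo substitute('spared spare', 'spa\|spared', '**', 'g')
" expected: ** **re
echo substitute('spared spare', 'spared\|spa', '**', 'g')

" optional group, anchored to the whole string
" expected: [1, 1, 1, 0]
echo map(['reed', 'released', 'received', 'relayed'], 'v:val =~# ''^re\(leas\|ceiv\)\?ed$''')
```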

Backreferences
The groupings seen in the previous section are also known as capture groups. The string captured by these groups can be referred to later using a backreference \N where N is the capture group you want. Backreferences can be used in both search and replacement sections.
\(pattern\) capture group for later use via backreferences
\%(pattern\) non-capturing group
leftmost group is 1, second leftmost group is 2 and so on (maximum 9 groups)
\1 backreference to the first capture group
\2 backreference to the second capture group
\9 backreference to the ninth capture group
& or \0 backreference to the entire matched portion
Here are some examples:
\(\a\)\1 matches two consecutive repeated alphabets like ee, TT, pp and so on
    \a refers to [a-zA-Z]
\(\a\)\1\+ matches two or more consecutive repeated alphabets like ee, ttttt, PPPPPPPP and so on
s/\d\+/(&)/g replaces 52 apples 31 mangoes with (52) apples (31) mangoes (surround digits with parentheses)
s/\(\w\+\),\(\w\+\)/\2,\1/g replaces good,bad 42,24 with bad,good 24,42 (swap words separated by comma)
s/\(_\)\?_/\1/g replaces _fig __123__ _bat_ with fig _123_ bat (reduce __ to _ and delete if it is a single _)
s/\(\d\+\)\%(abc\)\+\(\d\+\)/\2:\1/ replaces 12abcabcabc24 with 24:12 (match digits separated by one or more abc sequences, swap the numbers with : as the separator)
    a non-capturing group is used for abc since it isn't needed later
    s/\(\d\+\)\(abc\)\+\(\d\+\)/\3:\1/ does the same if only capturing groups are used
Referring to the text matched by a capture group with a quantifier will give only the last match, not the entire match. Use a capture group around the grouping and quantifier together to get the entire matching portion. In such cases, the inner grouping is an ideal candidate for a non-capturing group.
s/a \(\d\{3}\)\+/b (\1)/ replaces a 123456789 with b (789)
    a 4839235 will be replaced with b (923)5
s/a \(\%(\d\{3}\)\+\)/b (\1)/ replaces a 123456789 with b (123456789)
    a 4839235 will be replaced with b (483923)5
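The difference between capturing the last repetition and capturing the whole repeated portion is easier to see side by side. This is a sketch using substitute() as a stand-in for the s command, with the sample strings from the examples above.

```vim
" swap the two captured words
" expected: bad,good 24,42
echo substitute('good,bad 42,24', '\(\w\+\),\(\w\+\)', '\2,\1', 'g')

" & refers to the entire matched portion
" expected: (52) apples (31) mangoes
echo substitute('52 apples 31 mangoes', '\d\+', '(&)', 'g')

" quantified capture group keeps only the last repetition
" expected: b (789)
echo substitute('a 123456789', 'a \(\d\{3}\)\+', 'b (\1)', '')

" outer capture group around a non-capturing group keeps everything
" expected: b (123456789)
echo substitute('a 123456789', 'a \(\%(\d\{3}\)\+\)', 'b (\1)', '')
```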

Lookarounds
Lookarounds help to create custom anchors and add conditions within the searchpattern. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions.
Vim's syntax differs from what is usually found in programming languages like Perl, Python and JavaScript. The syntax starting with \@ is always added as a suffix to the pattern atom used in the assertion. For example, (?!\d) and (?<=pat.*) in other languages are specified as \d\@! and \(pat.*\)\@<= respectively in Vim.
\@! negative lookahead assertion
    ice\d\@! matches ice as long as it is not immediately followed by a digit character, for example ice or iced! or icet5 or ice.123 but not ice42 or ice123
    s/ice\d\@!/X/g replaces iceiceice2 with XXice2
    s/par\(.*\<par\>\)\@!/X/g replaces par with X as long as whole word par is not present later in the line, for example parse and par and sparse is converted to parse and X and sXse
    at\(\(go\)\@!.\)*par matches cat,dog,parrot but not cat,god,parrot (i.e. match at followed by par as long as go isn't present in between, this is an example of negating a grouping)
\@<! negative lookbehind assertion
    _\@<!ice matches ice as long as it is not immediately preceded by a _ character, for example ice or _(ice) or 42ice but not _ice
    \(cat.*\)\@<!dog matches dog as long as cat is not present earlier in the line, for example fox,parrot,dog,cat but not fox,cat,dog,parrot
\@= positive lookahead assertion
    ice\d\@= matches ice as long as it is immediately followed by a digit character, for example ice42 or ice123 but not ice or iced! or icet5 or ice.123
    s/ice\d\@=/X/g replaces ice ice_2 ice2 iced with ice ice_2 X2 iced
\@<= positive lookbehind assertion
    _\@<=ice matches ice as long as it is immediately preceded by a _ character, for example _ice or (_ice) but not ice or _(ice) or 42ice
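A few of the lookaround examples above, written as a Vim script sketch. It assumes default settings ('ignorecase' off, default 'iskeyword'), with substitute() standing in for the s command and =~# standing in for a search.

```vim
" negative lookahead: the last ice is followed by a digit, so it is kept
" expected: XXice2
echo substitute('iceiceice2', 'ice\d\@!', 'X', 'g')

" positive lookahead: only ice followed by a digit is replaced
" expected: ice ice_2 X2 iced
echo substitute('ice ice_2 ice2 iced', 'ice\d\@=', 'X', 'g')

" replace par only if whole word par does not occur later in the line
" expected: parse and X and sXse
echo substitute('parse and par and sparse', 'par\(.*\<par\>\)\@!', 'X', 'g')

" negative lookbehind: ice must not be immediately preceded by _
" expected: [1, 1, 1, 0]
echo map(['ice', '_(ice)', '42ice', '_ice'], 'v:val =~# ''_\@<!ice''')
```

Note that the digit checked by the lookahead is not consumed, which is why 2 survives in X2.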
You can also specify the number of bytes to search for lookbehind patterns. This will significantly speed up the matching process. You have to specify the number between the @ and < characters. For example, _\@1<=ice will look back only one byte before ice for matching purposes. \(cat.*\)\@10<!dog will look back only ten bytes before dog to check the given assertion.

Atomic Grouping
As discussed earlier, both greedy and non-greedy quantifiers will try to satisfy the overall pattern by varying the number of characters matched by the quantifiers. You can use atomic grouping to safeguard a pattern from further backtracking. Similar to lookarounds, you need to use \@> as a suffix, for example \(pattern\)\@>.
s/\(0*\)\@>\d\{3,\}/(&)/g replaces only numbers >= 100 irrespective of any number of leading zeros, for example 0501 035 154 is converted to (0501) 035 (154)
    \(0*\)\@> matches the 0 character zero or more times, but it will not give up this portion to satisfy the overall pattern
    s/0*\d\{3,\}/(&)/g replaces 0501 035 154 with (0501) (035) (154) (here 035 is matched because 0* will match zero times to satisfy the overall pattern)
s/\(::.\{-}::\)\@>par// replaces fig::1::spar::2::par::3 with fig::1::spar::3
    \(::.\{-}::\)\@> will match only from :: to the very next ::
    s/::.\{-}::par// replaces fig::1::spar::2::par::3 with fig::3 (matches from the first :: to the first occurrence of ::par)

Set start and end of the match
Some of the positive lookbehind and lookahead usage can be replaced with \zs and \ze respectively.
\zs set the start of the match (portion before \zs won't be part of the match)
    s/\<\w\zs\w*\W*//g replaces sea eat car rat eel tea with secret
    same as s/\(\<\w\)\@<=\w*\W*//g or s/\(\<\w\)\w*\W*/\1/g
\ze set the end of the match (portion after \ze won't be part of the match)
    s/ice\ze\d/X/g replaces ice ice_2 ice2 iced with ice ice_2 X2 iced
    same as s/ice\d\@=/X/g or s/ice\(\d\)/X\1/g
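The \zs and \ze examples above can be compared against their capture group equivalents in a short sketch. It assumes default settings ('ignorecase' off, default 'iskeyword') and uses substitute() as a stand-in for the s command.

```vim
" keep the first character of each word, drop the rest
" expected: secret
echo substitute('sea eat car rat eel tea', '\<\w\zs\w*\W*', '', 'g')
" capture group version of the same idea
" expected: secret
echo substitute('sea eat car rat eel tea', '\(\<\w\)\w*\W*', '\1', 'g')

" the digit after \ze is required but not replaced
" expected: ice ice_2 X2 iced
echo substitute('ice ice_2 ice2 iced', 'ice\ze\d', 'X', 'g')
```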
As per :h \zs and :h \ze, these "Can be used multiple times, the last one encountered in a matching branch is used."

Magic modifiers
These escape sequences change certain aspects of the syntax and behavior of the search pattern that comes after such a modifier. You can use multiple such modifiers as needed for particular sections of the pattern.

Magic and nomagic
\m magic mode (this is the default setting)
\M nomagic mode
    ., * and ~ are no longer metacharacters (compared to magic mode)
    \., \* and \~ will make them behave as metacharacters
    ^ and $ would still behave as metacharacters
    \Ma.b matches only a.b
    \Ma\.b matches a.b as well as a=b or a<b or acb etc

Very magic
The default syntax of Vim regexp has only a few metacharacters like ., *, ^ and $. If you are familiar with regexp usage in programming languages such as Perl, Python and JavaScript, you can use \v to get a similar syntax in Vim. This will allow the use of more metacharacters such as (), {}, +, ? and so on without having to prefix them with a \ metacharacter. From :h magic documentation:
    Use of \v means that after it, all ASCII characters except 0-9, a-z, A-Z and _ have special meaning
\v<his> matches his or to-his but not this or history or _hist
a<b.*\v<end> matches c=a<b #end but not c=a<b #bending
    \v is used after a<b to avoid having to escape the first <
\vone|two|three matches one or two or three
\vabc+ matches abc or abccc but not ab or bc
s/\vabc?/X/ replaces abcc with Xc
s/\vt.{-}a/X/g replaces that is quite a fabricated tale with XX fabricaXle
\vab{3}c matches xabbbcz but not abbc or abbbbbc
s/\v(\w+),(\w+)/\2,\1/g replaces good,bad 42,24 with bad,good 24,42
    same as s/\(\w\+\),\(\w\+\)/\2,\1/g
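To confirm that the very magic forms behave like their default-mode counterparts, here is a small sketch. It assumes default settings ('ignorecase' off, default 'iskeyword'), with substitute() standing in for the s command and =~# standing in for a search.

```vim
" very magic and default syntax give the same result
" expected (both lines): bad,good 24,42
echo substitute('good,bad 42,24', '\v(\w+),(\w+)', '\2,\1', 'g')
echo substitute('good,bad 42,24', '\(\w\+\),\(\w\+\)', '\2,\1', 'g')

" non-greedy quantifier in very magic mode
" expected: XX fabricaXle
echo substitute('that is quite a fabricated tale', '\vt.{-}a', 'X', 'g')

" \v placed mid-pattern: a<b stays literal, <end> becomes word anchors
" expected: 1 then 0
echo 'c=a<b #end' =~# 'a<b.*\v<end>'
echo 'c=a<b #bending' =~# 'a<b.*\v<end>'
```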

Very nomagic
From :h magic documentation:
    Use of \V means that after it, only a backslash and terminating character (usually / or ?) have special meaning
\V^.*{}$ matches ^.*{}$ literally
\V^.*{}$\.\*abcd matches ^.*{}$ literally only if abcd is found later in the line
    \V^.*{}$\m.*abcd can also be used
\V\^This matches This is a sample but not Do This
\V)\$ matches apple (5) but not def greeting():
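The very nomagic examples above can be checked with the =~# operator as a stand-in for a search. This is a sketch assuming default settings; the first sample string is made up for this illustration.

```vim
" after \V, the characters ^ . * { } $ are all literal
" expected: 1
echo 'x = ^.*{}$ + 1' =~# '\V^.*{}$'

" \^ and \$ act as line anchors after \V
" expected: 1, 0, 1, 0
echo 'This is a sample' =~# '\V\^This'
echo 'Do This' =~# '\V\^This'
echo 'apple (5)' =~# '\V)\$'
echo 'def greeting():' =~# '\V)\$'
```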

Case sensitivity
These will override flags and settings, if any. Unlike the magic modifiers, you cannot apply \c or \C for a specific portion of the pattern.
\c case insensitive search
    \cthis matches this or This or THIs and so on
    th\cis or this\c and so on will also result in the same behavior
\C case sensitive search
    \Cthis match exactly this but not This or THIs and so on
    th\Cis or this\C and so on will also result in the same behavior
s/\Ccat/dog/gi replaces cat Cat CAT with dog Cat CAT since the i flag gets overridden

Changing case using escape sequences
These can be used in the replacement section:
\u Uppercases the next character
\U UPPERCASES the following characters
\l lowercases the next character
\L lowercases the following characters
\e or \E will end further case changes
\L or \U will also override any existing conversion
Examples:
s/\<\l/\u&/g replaces hello. how are you? with Hello. How Are You?
    \l in the search section is equivalent to [a-z]
s/\<\L/\l&/g replaces HELLO. HOW ARE YOU? with hELLO. hOW aRE yOU?
    \L in the search section is equivalent to [^a-z]
s/\v(\l)_(\l)/\1\u\2/g replaces aug_price next_line with augPrice nextLine
s/.*/\L&/ replaces HaVE a nICe dAy with have a nice day
s/\a\+/\u\L&/g replaces HeLLo:bYe gOoD:beTTEr with Hello:Bye Good:Better
    s/\a\+/\L\u&/g can also be used in this case
s/\v(\a+)(:\a+)/\L\1\U\2/g replaces Hi:bYe gOoD:baD with hi:BYE good:BAD
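The case-changing escapes also work in the replacement argument of the substitute() function, so the examples above can be tried as a script. This is a sketch assuming default settings ('ignorecase' off), with substitute() as a stand-in for the s command.

```vim
" \u affects only the next character
" expected: Hello. How Are You?
echo substitute('hello. how are you?', '\<\l', '\u&', 'g')

" snake_case to camelCase for single underscores
" expected: augPrice nextLine
echo substitute('aug_price next_line', '\v(\l)_(\l)', '\1\u\2', 'g')

" \L lowercases everything, \u then uppercases the first character
" expected: Hello:Bye Good:Better
echo substitute('HeLLo:bYe gOoD:beTTEr', '\a\+', '\u\L&', 'g')

" \U overrides the earlier \L for the second group
" expected: hi:BYE good:BAD
echo substitute('Hi:bYe gOoD:baD', '\v(\a+)(:\a+)', '\L\1\U\2', 'g')
```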

Alternate delimiters
From :h substitute documentation:
    Instead of the / which surrounds the pattern and replacement string, you can use any other single-byte character, but not an alphanumeric character, \, " or |. This is useful if you want to include a / in the search pattern or replacement string.
s#/home/learnbyexample/#\~/# replaces /home/learnbyexample/reports with ~/reports
    same as s/\/home\/learnbyexample\//\~\//

Escape sequences
Certain characters like tab, carriage return, newline, etc have escape sequences to represent them. Additionally, any character can be represented using its codepoint value in decimal, octal and hexadecimal formats. Unlike character set escape sequences like \w, these can be used inside character classes as well. If the escape sequences behave differently in searchpattern and replacestring portions, they'll be highlighted in the descriptions below.
\t tab character
\b backspace character
\r matches carriage return for searchpattern, produces newline for replacestring
\n matches end-of-line for searchpattern, produces ASCII NUL for replacestring
    \n can also match \r or \r\n (where \r is carriage return) depending upon the fileformat setting
\%d matches character specified by decimal digits
    \%d39 matches the single quote character
\%o matches character specified by octal digits
    \%o47 matches the single quote character
\%x matches character specified by hexadecimal digits (max 2 digits)
    \%x27 matches the single quote character
\%u matches character specified by hexadecimal digits (max 4 digits)
\%U matches character specified by hexadecimal digits (max 8 digits)
Using \% sequences to insert characters in replacestring hasn't been implemented yet. See vi.stackexchange: Replace with hex character for workarounds.
See ASCII code table for a handy cheatsheet with all the ASCII characters and conversion tables. See codepoints for Unicode characters.
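The three codepoint forms listed above for the single quote character can be compared in a short sketch. It uses the =~# operator as a stand-in for a search. The sample strings are made up for this illustration and are written in double quotes so that ' and \t can appear in them.

```vim
" decimal 39, octal 47 and hexadecimal 27 all refer to the single quote
" expected: 1, 1, 1
echo "it's" =~# '\%d39'
echo "it's" =~# '\%o47'
echo "it's" =~# '\%x27'

" \t matches a tab, both on its own and inside a bracketed class
" expected: 1, 1, 0
echo "a\tb" =~# 'a\tb'
echo "a\tb" =~# 'a[\t ]b'
echo 'a-b' =~# 'a\tb'
```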

Escaping metacharacters
To match the metacharacters literally (including character class metacharacters like -), i.e. to remove their special meaning, prefix those characters with a \ (backslash) character. To indicate a literal \ character, use \\. Depending on the pattern, you can also use a different magic modifier to reduce the need for escaping. Assume default magicness for the below examples unless otherwise specified.
^ and $ do not require escaping if they are used out of position
    b^2 matches a^2 + b^2 - C*3
    $4 matches this ebook is priced $40
    \^super matches ^superscript (you need the \ here since ^ is at the customary position)
[ and ] do not require escaping if only one of them is used
    b[1 matches ab[123
    42] matches xyz42] =
    b\[123] or b[123\] matches ab[123] = d
[ in the substitute command requires careful consideration
    s/b[1/X/ replaces b[1/X/ with nothing
    s/b\[1/X/ replaces ab[123 with aX23
\Va*b.c or a\*b\.c matches a*b.c
& in the replacement section requires escaping to represent it literally
    s/and/\&/ replaces apple and mango with apple & mango
The following can be used to match character class metacharacters literally in addition to escaping them with a \ character:
- can be specified at the start or end of the list, for example [-0-5] and [a-z-]
^ should be other than the first character, for example [+a^.]
] should be the first character, for example []a-z] and [^]a]
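Some of the escaping rules above, expressed as a Vim script sketch. It assumes default magicness and default settings ('ignorecase' off), with =~# standing in for a search and substitute() standing in for the s command. The strings a+b.c and x-5 are extra samples added for contrast.

```vim
" ^ and $ are literal when used out of position
" expected: 1, 1, 1
echo 'a^2 + b^2 - C*3' =~# 'b^2'
echo 'this ebook is priced $40' =~# '$4'
echo '^superscript' =~# '\^super'

" escaped [ and escaped & are literal
" expected: aX23
echo substitute('ab[123', 'b\[1', 'X', '')
" expected: apple & mango
echo substitute('apple and mango', 'and', '\&', '')

" \V avoids escaping * and . individually
" expected: 1 then 0
echo 'a*b.c' =~# '\Va*b.c'
echo 'a+b.c' =~# '\Va*b.c'

" - at the start of a bracketed class is literal
" expected: x-X
echo substitute('x-5', '[-0-5]\+$', '-X', '')
```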

Replacement expressions
\= when replacestring starts with \=, it is treated as an expression
s/date:\zs/\=strftime("%Y-%m-%d")/ appends the current date
    for example, changes date: to date:2024-06-25
s/\d\+/\=submatch(0)*2/g multiplies matching numbers by 2
    for example, changes 4 and 10 to 8 and 20
    submatch() function is similar to backreferences, 0 gives the entire matched string, 1 refers to the first capture group and so on
s/\(.*\)\zs/\=" = " . eval(submatch(1))/ appends result of an expression
    for example, changes 10 * 2 - 3 to 10 * 2 - 3 = 17
    . is the string concatenation operator
    eval() here executes the contents of the first capture group as an expression
s/"[^"]\+"/\=substitute(submatch(0), '[aeiou]', '\u&', 'g')/g affects vowels only inside double quotes
    for example, changes "mango" and "guava" to "mAngO" and "gUAvA"
    substitute() function works similarly to the s command
    the first argument is the text to work on; the second, third and fourth arguments correspond to the searchpattern, replacestring and flags of the s command
perldo s/\d+/$&*2/ge changes 4 and 10 to 8 and 20
    useful if the perl interface is available with your Vim installation
    the default range is 1,$ (the s command works only on the current line by default)
See :h usr_41.txt for details about Vim script.
See :h sub-replace-expression for more details.
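Replacement expressions are also accepted by the substitute() function, which makes them easy to try out in a script. This is a sketch that uses substitute() as a stand-in for the s command. The toupper() call is an illustrative choice for this sketch and does not appear in the list above.

```vim
" submatch(0) is the entire match; the string is converted to a number by *
" expected: 8 and 20
echo substitute('4 and 10', '\d\+', '\=submatch(0) * 2', 'g')

" any expression can be used, here a built-in function on the matched text
" expected: "MANGO" and "GUAVA"
echo substitute('"mango" and "guava"', '"[^"]\+"', '\=toupper(submatch(0))', 'g')
```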
See also stackoverflow: find all occurrences and replace with user input.

Miscellaneous
\%V match inside the visual area only
    s/\%V10/20/g replaces 10 with 20 only inside the visual area
    without \%V, the replacement would happen anywhere on the lines covered by the visual selection
\%[set] match zero or more of these characters in the same order, as much as possible
    spa\%[red] matches spa or spar or spare or spared (longest match wins)
        same as \vspa(red|re|r)? or \vspa(red?|r)? and so on
    ap\%[[pt]ly] matches ap or app or appl or apply or apt or aptl or aptly
\_^ and \_$ restrict the match to start-of-line and end-of-line respectively, useful for multiline patterns
\%^ and \%$ restrict the match to start-of-file and end-of-file respectively
~ represents the last replacement string
    s/apple/banana/ followed by /~ will search for banana
    s/apple/banana/ followed by s/fig/(~)/ will use (banana) as the replacement string