Showing content from https://learnbyexample.github.io/vim_reference/Regular-Expressions.html below:

Regular Expressions - Vim Reference Guide

Regular Expressions

This chapter discusses regular expressions (regexp) and related features in detail.

Documentation links:

Recall that you need to add the / prefix for built-in help on regular expressions, :h /^ for example.

Flags

These flags apply to the substitute command but not to the / or ? searches. Flags can also be combined, for example:
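As a minimal sketch, here are a few of the flags documented under :h s_flags, alone and combined. The words cat and dog are placeholders chosen for illustration.

```vim
" g  replace all occurrences within a line, not just the first one
:s/cat/dog/g
" c  ask for confirmation before each replacement
:s/cat/dog/c
" i  ignore case for the search pattern
:s/cat/dog/i
" I  match case for the search pattern
:s/cat/dog/I
" n  report the number of matches without substituting anything
:s/cat//gn
" combined: every case-insensitive match, with confirmation
:s/cat/dog/gic
```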

See :h s_flags for a complete list of flags and more details about them.

Anchors

By default, regexp will match anywhere in the text. You can use line and word anchors to specify additional restrictions regarding the position of matches. These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as metacharacters in regular expressions parlance. In case you need to match those characters literally, you need to escape them with a \ character (discussed in the Escaping metacharacters section later in this chapter).

End-of-line can be \r (carriage return), \n (newline) or \r\n depending on your operating system and the fileformat setting.

See :h pattern-atoms for more details.

As seen above, matching the end-of-line character requires special attention, which is why the examples and descriptions in this chapter assume you are operating line-wise unless otherwise mentioned. You'll later see how \_ is used in many more places to include end-of-line in the matches.
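The anchors can be sketched as follows. The sample lines in the comments are assumptions made up for illustration.

```vim
" ^ restricts the match to the start of a line (special only at the
" start of a pattern or an alternative)
:s/^par/X/
" $ restricts the match to the end of a line (special only at the
" end of a pattern or an alternative)
:s/par$/X/
" \< and \> restrict the match to the start and end of a word
" assumed line: 'par spar apparent par' becomes 'X spar apparent X'
:s/\<par\>/X/g
" \_^ and \_$ work as line anchors anywhere in the pattern
" \n matches the end-of-line itself: join the current line with the next
:s/\n/ /
```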

Greedy Quantifiers

Quantifiers can be applied to literal characters, the dot metacharacter, groups, backreferences and character classes. Basic examples are shown below, more will be discussed in the sections to follow.
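A short sketch of the greedy quantifiers in the default magic mode, using sample lines assumed for illustration:

```vim
" *       zero or more times
" \+      one or more times
" \? \=   zero or one time (\? cannot be used with the ? backward search)
" \{m,n}  m to n times; \{m,}, \{,n} and \{n} are also valid

" assumed line: 'Error: invalid, not a valid input' becomes 'X input'
:s/Error.*valid/X/
" assumed line: 'ac abc abbc abbbc' becomes 'ac X X abbbc'
:s/ab\{1,2}c/X/g
" assumed line: 'a ab abb' becomes 'X X Xb'
:s/ab\?/X/g
```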

Greedy quantifiers will consume as much as possible, provided the overall pattern is also matched. That's how the Error.*valid example worked. If .* had consumed everything after Error, there wouldn't be any more characters to try to match valid. How the regexp engine handles matching a varying number of characters depends on the implementation details (backtracking, NFA, etc).

See :h pattern-overview for more details.

If you are familiar with other regular expression flavors such as Perl or Python, you may be surprised by the use of \ in the above examples. If you use the \v very magic modifier (discussed later in this chapter), the \ won't be needed.

Non-greedy Quantifiers

Non-greedy quantifiers match as minimally as possible, provided the overall pattern is also matched.
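The following sketch contrasts non-greedy and greedy matching on the same line. The sample line is an assumption for illustration.

```vim
" \{-}     zero or more times, as few as possible (non-greedy *)
" \{-1,}   one or more times, as few as possible (non-greedy \+)
" \{-,1}   zero or one time, as few as possible (non-greedy \?)
" \{-m,n}  m to n times, as few as possible

" assumed line: 'green:3.14:teal::brown' becomes 'Xteal::brown'
:s/.\{-}:.\{-}:/X/
" the greedy version changes the same line to 'Xbrown'
:s/.*:.*:/X/
```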

See :h pattern-overview and stackoverflow: non-greedy matching for more details.

Character Classes

To create a custom placeholder for a limited set of characters, you can enclose them inside the [] metacharacters. Character classes have their own versions of metacharacters and provide special predefined sets for common use cases.

Here are some examples with character classes:
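A minimal sketch of bracketed classes and a few of the predefined sets from :h character-classes. The sample lines are assumed for illustration.

```vim
" [aeiou]  any one of the listed characters
" [a-f]    any character in the given range
" [^,]     any character other than those listed
" \d [0-9]   \w [0-9A-Za-z_]   \s space or tab   \a [A-Za-z]
" \l [a-z]   \u [A-Z]          \x [0-9A-Fa-f]    \h [A-Za-z_]
" uppercase versions such as \D, \W and \S match the opposite set

" assumed line: 'bead cage fig deaf' becomes 'X cage fig X'
:s/\<[a-f]\+\>/X/g
" assumed line: 'apple,fig,42' becomes 'X,X,X'
:s/[^,]\+/X/g
" assumed line: 'apple 42 fig 7' becomes 'apple X fig X'
:s/\d\+/X/g
" remove trailing whitespace, if any
:s/\s\+$//
```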

To include the end-of-line character, use \_ instead of \ for any of the above escape sequences. For example, \_s will help you match across lines. Similarly, use \_[] for bracketed classes.

The above escape sequences do not have special meaning within bracketed classes. For example, [\d\s] will only match \ or d or s. You can use named character sets in such scenarios. For example, [[:digit:][:blank:]] to match digits or space or tab characters. See :h :alnum: for full list and more details.
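As a small illustration of named character sets inside a bracketed class (the sample line is an assumption):

```vim
" assumed line: 'a1 2b' becomes 'aXb'
:s/[[:digit:][:blank:]]\+/X/g
```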

The predefined sets are also better in terms of performance compared to bracketed versions. And there are more such sets than the ones discussed above. See :h character-classes for more details.

Alternation and Grouping

Alternation helps you to match multiple terms and they can have their own anchors as well (since each alternative is a regexp pattern). Often, there are some common things among the regular expression alternatives. In such cases, you can group them using a pair of parentheses metacharacters. Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions.

There can be tricky situations when using alternation. Say you want to match are or spared. Which one should get precedence: the bigger word spared, the substring are inside it, or something else? The alternative which matches earliest in the input gets precedence, irrespective of the order of the alternatives.

In case of matches starting from the same location, for example spa and spared, the leftmost alternative gets precedence. Sort by longest term first if you don't want shorter terms to take precedence.
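The precedence rules above can be sketched with a few substitutions. In the default magic mode, alternation is written as \| and grouping as \( and \). The sample lines are assumed for illustration.

```vim
" grouping the common portion
" assumed line: 'handy handful hands' becomes 'X X hands'
:s/hand\(y\|ful\)/X/g

" the earliest match wins, irrespective of the order of alternatives
" assumed line: 'rare spared area' becomes 'rX X Xa'
:s/are\|spared/X/g

" same starting location: the leftmost alternative wins
" assumed line: 'spared' becomes 'Xred'
:s/spa\|spared/X/
" longest term first: 'spared' becomes 'X'
:s/spared\|spa/X/
```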

Backreference

The groupings seen in the previous section are also known as capture groups. The string captured by these groups can be referred to later using a backreference \N where N is the capture group you want. Backreferences can be used in both search and replacement sections.

Here are some examples:
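A few hedged illustrations of backreferences in the replacement and search sections, with sample lines assumed for the purpose:

```vim
" swap two comma separated fields
" assumed line: 'good,bad' becomes 'bad,good'
:s/\(\w\+\),\(\w\+\)/\2,\1/

" & (or \0) refers to the entire matched text
" assumed line: 'apple fig' becomes '[apple] [fig]'
:s/\w\+/[&]/g

" backreference inside the search pattern: remove a repeated word
" assumed line: 'the the cat' becomes 'the cat'
:s/\<\(\w\+\) \1\>/\1/

" \%( and \) form a non-capturing group
" assumed line: 'handy handful hands' becomes 'X X hands'
:s/hand\%(y\|ful\)/X/g
```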

Referring to the text matched by a capture group with a quantifier will give only the last match, not the entire match. Use a capture group around the grouping and quantifier together to get the entire matching portion. In such cases, the inner grouping is an ideal candidate for a non-capturing group.
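The difference can be sketched as follows, using an assumed sample line:

```vim
" assumed line: 'hi,42,bye,'
" \1 holds only the last repetition, so the line becomes 'bye,'
:s/\(\w\+,\)\+/\1/

" an outer capture group holds the entire portion and the inner
" group is made non-capturing, so the line becomes '[hi,42,bye,]'
:s/\(\%(\w\+,\)\+\)/[\1]/
```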

Lookarounds

Lookarounds help to create custom anchors and add conditions within the searchpattern. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions.

Vim's syntax is different from those usually found in programming languages like Perl, Python and JavaScript. The syntax starting with \@ is always added as a suffix to the pattern atom used in the assertion. For example, (?!\d) and (?<=pat.*) in other languages are specified as \d\@! and \(pat.*\)\@<= respectively in Vim.
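Here is a minimal sketch of the four lookaround forms. The sample lines are assumptions for illustration.

```vim
" \@=   positive lookahead      \@!   negative lookahead
" \@<=  positive lookbehind     \@<!  negative lookbehind

" cat only if it is not followed by a digit
" assumed line: 'cat1 cat catfish' becomes 'cat1 X Xfish'
:s/cat\d\@!/X/g
" cat only if it is followed by fish
" assumed line: 'cat1 cat catfish' becomes 'cat1 cat Xfish'
:s/cat\(fish\)\@=/X/g

" ice only if it is preceded by _
" assumed line: 'ice _ice dice' becomes 'ice _X dice'
:s/_\@<=ice/X/g
" ice only if it is not preceded by _
" assumed line: 'ice _ice dice' becomes 'X _ice dX'
:s/_\@<!ice/X/g
```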

You can also specify the number of bytes to search for lookbehind patterns. This will significantly speed up the matching process. You have to specify the number between the @ and < characters. For example, _\@1<=ice will look back only one byte before ice for matching purposes. \(cat.*\)\@10<!dog will look back only ten bytes before dog to check the given assertion.

Atomic Grouping

As discussed earlier, both greedy and non-greedy quantifiers will try to satisfy the overall pattern by varying the number of characters matched by the quantifiers. You can use atomic grouping to safeguard a pattern from further backtracking. Similar to lookarounds, you need to use \@> as a suffix, for example \(pattern\)\@>.
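As a sketch, suppose you want to mark numbers of at least 100 regardless of leading zeros. The sample line is an assumption for illustration.

```vim
" assumed line: '0501 035 154 12 26 98234'

" without atomic grouping, 0* gives back the leading zero of 035
" so that \d\{3,} can match
" result: '(0501) (035) (154) 12 26 (98234)'
:s/0*\d\{3,}/(&)/g

" with atomic grouping, the zeros consumed by 0* are never given back
" result: '(0501) 035 (154) 12 26 (98234)'
:s/\(0*\)\@>\d\{3,}/(&)/g
```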

Set start and end of the match

Some of the positive lookbehind and lookahead usage can be replaced with \zs and \ze respectively.
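For instance, the following sketches use \zs and \ze in place of _\@<=ice and cat\(fish\)\@= respectively. The sample lines are assumed.

```vim
" \zs sets the start of the match
" assumed line: 'ice _ice dice' becomes 'ice _X dice'
:s/_\zsice/X/g

" \ze sets the end of the match
" assumed line: 'cat1 cat catfish' becomes 'cat1 cat Xfish'
:s/cat\zefish/X/g
```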

As per :h \zs and :h \ze, these "Can be used multiple times, the last one encountered in a matching branch is used."

Magic modifiers

These escape sequences change certain aspects of the syntax and behavior of the search pattern that comes after such a modifier. You can use multiple such modifiers as needed for particular sections of the pattern.

Magic and nomagic

Very magic

The default syntax of Vim regexp has only a few metacharacters like ., *, ^ and $. If you are familiar with regexp usage in programming languages such as Perl, Python and JavaScript, you can use \v to get a similar syntax in Vim. This will allow the use of more metacharacters such as (), {}, +, ? and so on without having to prefix them with a \ metacharacter. From :h magic documentation:

Use of \v means that after it, all ASCII characters except 0-9, a-z, A-Z and _ have special meaning
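A brief comparison of the default and very magic syntax, with sample lines assumed for illustration:

```vim
" default magic mode
" assumed line: 'handy handful hands' becomes 'X X hands'
:s/hand\(y\|ful\)/X/g
" the same pattern in very magic mode
:s/\vhand(y|ful)/X/g

" quantifiers don't need the \ prefix either
" assumed line: 'ac abc abbc abbbc' becomes 'ac X X abbbc'
:s/\vab{1,2}c/X/g

" < and > are the word anchors in this mode
" assumed line: 'par spar apparent par' becomes 'X spar apparent X'
:s/\v<par>/X/g
```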

Very nomagic

From :h magic documentation:

Use of \V means that after it, only a backslash and terminating character (usually / or ?) have special meaning
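A small sketch of very nomagic mode, using an assumed sample line:

```vim
" . and * are matched literally after \V
" assumed line: 'a.b*c axbc' becomes 'X axbc'
:s/\Va.b*c/X/g

" prefix a \ to get the special meaning back, for example \. and \*
" assumed line: 'a.b*c axbc' becomes 'a.b*c X'
:s/\Va\.bc/X/g
```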

Case sensitivity

The \c and \C modifiers will override flags and settings, if any. Unlike the magic modifiers, you cannot apply \c or \C for a specific portion of the pattern.
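A minimal illustration of the two modifiers, using an assumed sample line:

```vim
" \c ignores case for the whole pattern
" assumed line: 'Cat cat CAT' becomes 'X X X'
:s/\ccat/X/g

" \C matches case for the whole pattern
" assumed line: 'Cat cat CAT' becomes 'Cat X CAT'
:s/\Ccat/X/g
```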

Changing Case

These can be used in the replacement section:

Examples:
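A hedged sketch of the case-changing sequences described under :h sub-replace-special, with sample lines assumed for illustration:

```vim
" \u  uppercase the next character
" \U  uppercase the following characters, until \E or \e
" \l  lowercase the next character
" \L  lowercase the following characters, until \E or \e

" assumed line: 'hello world' becomes 'Hello World'
:s/\<\l/\u&/g
" assumed line: 'hello world' becomes 'HELLO world'
:s/\w\+/\U&/
" assumed line: 'HeLLo' becomes 'Hello'
:s/\(\a\)\(\a*\)/\u\1\L\2/
```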

Alternate delimiters

From :h substitute documentation:

Instead of the / which surrounds the pattern and replacement string, you can use any other single-byte character, but not an alphanumeric character, \, " or |. This is useful if you want to include a / in the search pattern or replacement string.
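For example, compare these two equivalent substitutions. The paths are placeholders chosen for illustration.

```vim
" with / as the delimiter, every / in the text has to be escaped
:s/\/usr\/local\/bin/\/opt\/bin/

" the same substitution with # as the delimiter
:s#/usr/local/bin#/opt/bin#
```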

Escape sequences

Certain characters like tab, carriage return, newline, etc have escape sequences to represent them. Additionally, any character can be represented using its codepoint value in decimal, octal and hexadecimal formats. Unlike character set escape sequences like \w, these can be used inside character classes as well. If the escape sequences behave differently in searchpattern and replacestring portions, they'll be highlighted in the descriptions below.
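Here is a short sketch of some of these sequences, based on :h /\t, :h /\%d and :h sub-replace-special. The sample lines are assumed.

```vim
" \t tab    \e escape    \b backspace    \r carriage return
" \n matches end-of-line in searchpattern
" \r inserts a line break in replacestring (\n inserts a null byte)
" \%d123  character given by a decimal codepoint
" \%o40   character given by an octal codepoint
" \%x2a   character given by up to 2 hexadecimal digits
" \%u20ac up to 4 hexadecimal digits, \%U up to 8

" replace every tab character with a single space
:s/\t/ /g
" split the assumed line 'a,b,c' into three lines
:s/,/\r/g
" assumed line: 'a*b' becomes 'a-b' (hex 2a is the * character)
:s/\%x2a/-/g
```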

Using \% sequences to insert characters in replacestring hasn't been implemented yet. See vi.stackexchange: Replace with hex character for workarounds.

See ASCII code table for a handy cheatsheet with all the ASCII characters and conversion tables. See codepoints for Unicode characters.

Escaping metacharacters

To match the metacharacters literally (including character class metacharacters like -), i.e. to remove their special meaning, prefix those characters with a \ (backslash) character. To indicate a literal \ character, use \\. Depending on the pattern, you can also use a different magic modifier to reduce the need for escaping. Assume default magicness for the below examples unless otherwise specified.
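A few illustrative cases, with sample lines assumed for the purpose:

```vim
" unescaped . matches any character
" assumed line: 'a.b axb' becomes 'X X'
:s/a.b/X/g
" \. matches only a literal dot
" assumed line: 'a.b axb' becomes 'X axb'
:s/a\.b/X/g

" \\ matches a literal backslash
" assumed line: 'a\b' becomes 'a/b'
:s#\\#/#

" & is special in the replacement section, use \& for a literal &
" assumed line: 'salt and pepper' becomes 'salt & pepper'
:s/and/\&/
```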

The following can be used to match character class metacharacters literally in addition to escaping them with a \ character:
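As a sketch of the placement rules described under :h /[] for -, ^ and ] inside a bracketed class (sample lines are assumed):

```vim
" - as the first or the last character
" assumed line: 'two-three 4' becomes 'X 4'
:s/[a-z-]\+/X/

" ^ anywhere other than the first character
" assumed line: 'a^2=b' becomes 'aX2Xb'
:s/[=^]/X/g

" ] as the first character (after ^ if the class is negated)
" assumed line: 'a[1]' becomes 'X[1X'
:s/[]a]/X/g
```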

Replacement expressions

See :h usr_41.txt for details about Vim script.

See :h sub-replace-expression for more details.
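As a minimal sketch: when the replacement section starts with \=, the rest is evaluated as a Vim script expression. submatch(0) gives the entire matched text and submatch(N) gives the Nth capture group. The sample lines are assumed for illustration.

```vim
" double every number
" assumed line: '3 apples and 12 figs' becomes '6 apples and 24 figs'
:s/\d\+/\=submatch(0) * 2/g

" append the length of the first word
" assumed line: 'hi there' becomes 'hi (2) there'
:s/\w\+/\=submatch(0) . ' (' . strlen(submatch(0)) . ')'/

" prefix every line in the file with its line number
:%s/^/\=line('.') . '. '/
```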

See also stackoverflow: find all occurrences and replace with user input.

Miscellaneous

Further Reading
