Showing content from https://github.com/Roy-Orbison/proposal-regex-escaping below:

Roy-Orbison/proposal-regex-escaping: Proposal for investigating RegExp escaping for the ECMAScript standard

This ECMAScript proposal seeks to investigate the problem area of escaping a string for use inside a Regular Expression.

This is a stage 1 proposal, awaiting implementation and more input. Please see the issues to get involved.

We often want to build a regular expression out of a string without treating special characters from the string as special regular expression tokens. For example, if we want to replace all occurrences of the string let text = "Hello." which we got from the user, we might be tempted to do ourLongText.replace(new RegExp(text, "g"), ""). However, this would match . against any character rather than matching it against a literal dot.
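The pitfall can be reproduced in a few lines. This is a minimal sketch: the sample value of ourLongText and the "Bye" replacement are assumptions made for illustration, and the escaped pattern is written by hand rather than produced by any proposed API.

```javascript
// Sample data (assumed for illustration; not from the proposal).
const ourLongText = "Hello. Hellox Hello!";
const text = "Hello."; // stands in for user-supplied input

// Unescaped: "." acts as a wildcard, so "Hellox" and "Hello!" match as well.
const naive = ourLongText.replace(new RegExp(text, "g"), "Bye");
console.log(naive); // Bye Bye Bye

// Escaped by hand: "\\." in the string literal yields the pattern Hello\.
const escapedByHand = ourLongText.replace(new RegExp("Hello\\.", "g"), "Bye");
console.log(escapedByHand); // Bye Hellox Hello!
```

Only the second call replaces the exact text the user typed.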

This is commonly desired functionality, as can be seen from this years-old es-discuss thread. Standardizing it would be very useful to developers, and would spare them from writing subpar implementations that miss edge cases.

This would be a RegExp.escape function, such that strings can be escaped in order to be used inside regular expressions:

const str = prompt("Please enter a string");
const escaped = RegExp.escape(str);
const re = new RegExp(escaped, 'g'); // special regexp tokens in str are now matched literally
console.log(ourLongText.replace(re, '')); // removes every occurrence of str
RegExp.escape("The Quick Brown Fox"); // "The Quick Brown Fox"
RegExp.escape("Buy it. use it. break it. fix it.") // "Buy it\. use it\. break it\. fix it\."
RegExp.escape("(*.*)"); // "\(\*\.\*\)"
RegExp.escape("。^・ェ・^。") // "。\^・ェ・\^。"
RegExp.escape("😊 *_* +_+ ... 👍"); // "😊 \*_\* \+_\+ \.\.\. 👍"
RegExp.escape(String.raw`\d \D (?:)`); // "\\d \\D \(\?\:\)"

An alternative would be a template tag function, for example RegExp.tag, that produces a regular expression and escapes the interpolated values:

const str = prompt("Please enter a string");
const re = RegExp.tag`/${str}/g`;
console.log(ourLongText.replace(re, '')); // removes every occurrence of str
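To make the template tag idea concrete, here is a rough userland sketch. The names regexTag and escapeForRegExp are hypothetical, the `/pattern/flags` parsing is an assumption about how such a tag might read its input, and the escaping uses a hard-coded character list; none of this is the proposal's specified behavior.

```javascript
// Hypothetical helper: backslash-escape the current regexp syntax characters.
function escapeForRegExp(value) {
  return String(value).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Hypothetical tag: static parts are used verbatim, interpolated values are escaped.
function regexTag(strings, ...values) {
  const raw = strings.raw; // raw text keeps backslashes exactly as typed
  let source = raw[0];
  values.forEach((value, i) => {
    source += escapeForRegExp(value) + raw[i + 1];
  });
  // The template itself must start with "/" and end with "/flags".
  const tail = /\/([a-z]*)$/.exec(raw[raw.length - 1]);
  if (!raw[0].startsWith("/") || !tail) {
    throw new SyntaxError("expected a template of the form /pattern/flags");
  }
  const body = source.slice(1, source.length - tail[0].length);
  return new RegExp(body, tail[1]);
}

const userInput = "Hello."; // stands in for prompt(...)
const tagged = regexTag`/${userInput}/g`;
console.log(tagged.source); // Hello\.
console.log("Hello. Hellox".replace(tagged, "Bye")); // Bye Hellox
```

Because the delimiters are read from the static parts of the template only, a slash inside the interpolated value cannot end the pattern early.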

The list of escaped characters should be kept in sync with what the regular expression grammar considers to be syntax characters that need escaping. For this reason, instead of hard-coding the list of escaped characters, we escape the characters that the engine recognizes as SyntaxCharacters. For example, if regexp comments are ever added to the specification (presumably under a flag), this ensures that they are properly escaped. Additionally, named capture groups must be accounted for.
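For illustration only, a userland approximation has to do exactly what the paragraph above advises against: hard-code the list. The sketch below assumes the SyntaxCharacter set of the current ECMAScript grammar (^ $ \ . * + ? ( ) [ ] { } |); the function name escapeSyntaxCharacters is hypothetical. It escapes nothing outside that set, so a character such as : is left untouched.

```javascript
// Hypothetical userland helper: prefixes each SyntaxCharacter with a backslash.
// The character list is hard-coded, so it would go stale if the grammar changed.
function escapeSyntaxCharacters(text) {
  return String(text).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

console.log(escapeSyntaxCharacters("The Quick Brown Fox")); // The Quick Brown Fox
console.log(escapeSyntaxCharacters("(*.*)"));               // \(\*\.\*\)
console.log(escapeSyntaxCharacters("。^・ェ・^。"));          // 。\^・ェ・\^。

// The escaped text can be embedded in a larger pattern and matches literally.
const literalDots = new RegExp("^" + escapeSyntaxCharacters("...") + "$");
console.log(literalDots.test("..."), literalDots.test("abc")); // true false
```

A built-in that consults the engine's own grammar would not need this list to be maintained by hand.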

Other languages offer comparable functionality. They differ in what exactly they escape (e.g. Perl does something different from C#), but they all have the same goal.

We've had a meeting about this subject, whose notes include a more detailed writeup of what other languages do, and the pros and cons thereof.

The one obscure case that might suggest a reason for escaping is preventing user-supplied numbers from forming a range, as in new RegExp('a{' + RegExp.escape('3,5') + '}'). Escaping does not lead to a clearly safer result there: it causes the sequence {3\,5} to be treated as a literal, rather than, say, throwing on bad input in a way that an application could recover from.
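The difference can be observed directly. In this sketch the escaped form `3\,5` is written by hand to mirror the sequence discussed above (it is an assumption about what an escaping function would emit), and the patterns are compiled without the u flag.

```javascript
// Unescaped: the user-supplied "3,5" completes a real quantifier, a{3,5}.
const asRange = new RegExp("a{" + "3,5" + "}");
console.log(asRange.test("aaa"));    // true
console.log(asRange.test("a{3,5}")); // false

// Hand-escaped comma: a{3\,5} is no longer a quantifier and matches literally.
const asLiteral = new RegExp("a{" + "3\\,5" + "}");
console.log(asLiteral.test("a{3,5}")); // true
console.log(asLiteral.test("aaa"));    // false

// Unescaped bad input can fail loudly, which an application can catch.
let rejected = false;
try {
  new RegExp("a{" + "5,3" + "}"); // quantifier bounds out of order
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```

With escaping, the out-of-order input would silently become a literal match instead of raising an error.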

