ECMAScript RegExp Match Indices provide additional information about the start and end indices of captured substrings relative to the start of the input string.
A polyfill can be found in the regexp-match-indices package on NPM.
NOTE: This proposal was previously named "RegExp Match Array Offsets", but has been renamed to more accurately represent the current status of the proposal.
Stage: 4
Champion: Ron Buckton (@rbuckton)
For detailed status of this proposal see TODO, below.
Today, ECMAScript RegExp objects can provide information about a match when calling the exec method. This result is an Array containing information about the substrings that were matched, along with additional properties to indicate the input string, the index in the input at which the match was found, as well as a groups object containing the substrings for any named capture groups.
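As a rough illustration of what exec reports today, the sketch below inspects a single match result. The pattern, the input string, and the `summary` object are made up for this example and do not come from the proposal.

```javascript
// What exec() reports without the proposed `indices` property.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/;
const dateInput = "on 2021-07 we shipped";
const dateMatch = dateRe.exec(dateInput);

const summary = {
  whole: dateMatch[0],           // entire matched substring: "2021-07"
  firstCapture: dateMatch[1],    // first capture group: "2021"
  index: dateMatch.index,        // start of the whole match in the input: 3
  input: dateMatch.input,        // the original input string
  month: dateMatch.groups.month, // named capture group substring: "07"
};
console.log(summary);
```

Note that only the whole match has a position (index); the capture groups are reported as substrings with no offsets.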
However, there are several more advanced scenarios where this information is not sufficient. For example, an ECMAScript implementation of TextMate Language syntax highlighting needs not only the index of the match, but also the start and end indices of individual capture groups.
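To see why the match index alone falls short, consider one possible workaround: guessing a capture group's offset by searching for its text inside the whole match. The `guessGroupStart` helper below is hypothetical and written only for this illustration; it is not part of the proposal.

```javascript
// Hypothetical workaround: locate a capture group by searching for its text
// inside the whole match. This finds the FIRST occurrence of the text, which
// is not necessarily where the group actually matched.
function guessGroupStart(match, groupNumber) {
  const text = match[groupNumber];
  if (text === undefined) return undefined; // group did not participate
  return match.index + match[0].indexOf(text);
}

// The capture group matches the last "a" (position 3 in "xaaa"),
// but the search finds the first "a" of the match instead.
const guessMatch = /aa(a)/.exec("xaaa");
const guessed = guessGroupStart(guessMatch, 1);
console.log(guessed); // 1, although the group really starts at 3
```

Whenever the captured text also occurs earlier in the match, a search like this returns the wrong position, so the offsets have to come from the engine itself.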
As such, we propose the adoption of an additional indices property on the array result (the substrings array) of the RegExpBuiltInExec abstract operation (and thus the result from RegExp.prototype.exec(), String.prototype.match, etc.). This property would itself be an indices array containing a pair of start and end indices for each captured substring. Any unmatched capture groups would be undefined, similar to their corresponding element in the substrings array. In addition, the indices array would itself have a groups property containing the start and end indices for each named capture group.
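A minimal sketch of the resulting shape, assuming an engine that implements this proposal (including the d flag covered in the note that follows). The pattern and the input string are invented for this example.

```javascript
const kvText = "key=value";

// Without the `d` flag, no `indices` property is added to the result.
const plain = /(?<name>\w+)=(?<val>\w+)/.exec(kvText);

// With the `d` flag, each capture gets a [start, end] pair.
const withIndices = /(?<name>\w+)=(?<val>\w+)/d.exec(kvText);

console.log(plain.indices);                   // undefined
console.log(withIndices.indices[0]);          // [0, 9]  whole match
console.log(withIndices.indices[1]);          // [0, 3]  first capture
console.log(withIndices.indices.groups.val);  // [4, 9]  named capture

// String.prototype.match with a non-global RegExp returns the same kind of result.
const viaMatch = kvText.match(/(?<name>\w+)=/d);
console.log(viaMatch.indices.groups.name);    // [0, 3]
```

Each pair uses the same convention as String.prototype.slice: the start index is inclusive and the end index is exclusive.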
NOTE: For performance reasons, indices will only be added to the result if the d flag is specified.
Why Use d For the RegExp Flag
We chose d due to its presence in the word indices, which is the basis for the naming of the feature (i.e., lastIndex on a RegExp, index on a match, etc.). The character i is already in use for ignore-case, and n has precedent in other engines for handling capturing vs. non-capturing groups. This is similar to the "sticky" flag using the y character, since s was used for dot-all.
Why not use o and offsets instead of d and indices? Our goal is to align the name of the property with the existing nomenclature on RegExp (i.e., lastIndex and index).
Does d have a different meaning in other engines? Yes and no. For the few engines that do have a d flag (Onigmo, Perl, and java.util.regex), the meanings differ. Onigmo and Perl both use the d flag for backwards-compatibility (and Perl's documentation seems strongly worded towards discouraging its use), while java.util.regex uses d for the treatment of new-line handling. You can find a full list of the flags supported by 46 different RegExp engines in flags_comparison.md.
captureIndices property
Capture.Index Property
Matcher.start(int) Method
const re1 = /a+(?<Z>z)?/d;

// indices are relative to start of the input string:
const s1 = "xaaaz";
const m1 = re1.exec(s1);
m1.indices[0][0] === 1;
m1.indices[0][1] === 5;
s1.slice(...m1.indices[0]) === "aaaz";

m1.indices[1][0] === 4;
m1.indices[1][1] === 5;
s1.slice(...m1.indices[1]) === "z";

m1.indices.groups["Z"][0] === 4;
m1.indices.groups["Z"][1] === 5;
s1.slice(...m1.indices.groups["Z"]) === "z";

// capture groups that are not matched return `undefined`:
const m2 = re1.exec("xaaay");
m2.indices[1] === undefined;
m2.indices.groups["Z"] === undefined;
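Building on the example above, a tool such as a syntax highlighter could turn indices into a list of spans. The sketch below is one way this might look; the `captureSpans` helper and the shape of the objects it returns are hypothetical and not part of the proposal.

```javascript
// Hypothetical helper: list the span of every capture group that matched,
// so each one could be highlighted separately.
function captureSpans(regexp, input) {
  // `indices` is only populated when the `d` flag is set; add it if missing.
  const flags = regexp.flags.includes("d") ? regexp.flags : regexp.flags + "d";
  const match = new RegExp(regexp.source, flags).exec(input);
  if (match === null) return [];

  const spans = [];
  match.indices.forEach((pair, group) => {
    if (pair === undefined) return; // unmatched capture group
    const [start, end] = pair;
    spans.push({ group, start, end, text: input.slice(start, end) });
  });
  return spans;
}

console.log(captureSpans(/a+(?<Z>z)?/, "xaaaz"));
// [ { group: 0, start: 1, end: 5, text: "aaaz" },
//   { group: 1, start: 4, end: 5, text: "z" } ]

console.log(captureSpans(/a+(?<Z>z)?/, "xaaay"));
// [ { group: 0, start: 1, end: 4, text: "aaa" } ]
```

The helper rebuilds the RegExp rather than mutating the caller's object, so a pattern written without the d flag can still be used.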
The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:
Stage 1 Entrance Criteria