
Showing content from https://blog.stevenlevithan.com/archives/npcg-javascript below:

Non-Participating Groups: A Cross-Browser Mess

Cross-browser issues surrounding the handling of regular expression non-participating capturing groups (which I'll call NPCGs) present several challenges. The standard sucks to begin with, and the three biggest browsers (IE, Firefox, Safari) each disrespect the rules in their own unique ways.

First, I should explain what NPCGs are, as it seems that even some experienced regex users aren't fully aware of the concept or don't fully understand it. Assuming you're already familiar with the idea of capturing and non-capturing parentheses (see this page if you need a refresher), note that NPCGs are different from groups which capture a zero-length value (i.e., an empty string). This is probably easiest to explain by showing some examples…

The following regexes all potentially contain NPCGs (depending on the data they are run over), because the capturing groups are not required to participate:

On the other hand, these will never contain an NPCG, because although they are allowed to match a zero-length value, the capturing groups are required to participate:
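
As a sketch of the two categories, the snippet below runs a few patterns over the string "y" and reports whether capturing group 1 participated. The patterns are illustrative ones picked for this example, and the check assumes a spec-compliant engine, which reports a non-participating group as undefined in the match array and a participating-but-empty group as "".

```javascript
// Illustrative patterns (chosen for this sketch), each run against "y".
// Group 1 here is optional as a whole, so it may not participate.
const mayNotParticipate = [/(x)?y/, /(x)*y/, /(x)|y/];
// Group 1 here must participate, but is allowed to match "".
const alwaysParticipates = [/(x?)y/, /(x*)y/, /(x|)y/];

// Returns true if group 1 took part in the match of `re` against `input`.
function groupParticipated(re, input) {
  const m = re.exec(input);
  if (m === null) {
    throw new Error(`${re} did not match ${JSON.stringify(input)}`);
  }
  // Per the spec, a non-participating group's slot holds undefined.
  return m[1] !== undefined;
}

for (const re of [...mayNotParticipate, ...alwaysParticipates]) {
  console.log(String(re), groupParticipated(re, "y"));
}
// In a spec-compliant engine the first three log false, the last three true.
```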

So, what's the difference between an NPCG and a group which captures an empty string? I guess that's up to the regex library, but typically, backreferences to NPCGs are assigned a special null or undefined value.
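
In JavaScript that special value is undefined. A small helper can make the three possible states of a capture slot explicit. The name captureState is hypothetical, and the helper assumes the engine reports NPCGs as undefined the way ECMA-262 requires (older IE versions did not do so for exec(), as discussed below).

```javascript
// Hypothetical helper: classify capture n of a match result.
// Relies on the engine reporting NPCGs as undefined, as ECMA-262 requires.
function captureState(match, n) {
  const value = match[n];
  if (value === undefined) return "non-participating";
  if (value === "") return "empty";
  return "captured";
}

const npcg = /(x)?y/.exec("y");   // group 1 is optional and absent
const empty = /(x?)y/.exec("y");  // group 1 participates, matching ""
const full = /(x?)y/.exec("xy");  // group 1 participates, matching "x"

console.log(captureState(npcg, 1));  // → non-participating
console.log(captureState(empty, 1)); // → empty
console.log(captureState(full, 1));  // → captured
```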

Following are the ECMA-262v3 rules (paraphrased) for how NPCGs should be handled in JavaScript:

References: ECMA-262v3 sections 15.5.4.11, 15.5.4.14, 15.10.2.1, 15.10.2.3, 15.10.2.8, 15.10.2.9.
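
The standard behaviors can be exercised directly. The checks below are a sketch of my own rather than a restatement of the rules list; they assume a spec-compliant engine, and the expected values are those in the ECMA-262v3 column of the results table. Later editions of the spec keep these rules, so current engines such as V8 (Node.js) should produce the same values.

```javascript
// Each entry: a description, the actual result, and the ECMA-262 expected result.
// Assumes a spec-compliant engine.
const checks = [
  // A backreference to an NPCG matches the empty string, so it always succeeds.
  ["backreference to NPCG", /(x)?\1y/.test("y"), true],
  // exec() and match() report an NPCG as undefined.
  ["exec() slot", /(x)?y/.exec("y")[1], undefined],
  // "$1" in a replacement string becomes "" when group 1 is an NPCG.
  ['replace() with "$1"', "y".replace(/(x)?y/, "$1"), ""],
  // A replacement function receives undefined for an NPCG.
  ["replace() callback arg", "y".replace(/(x)?y/, (m, p1) => String(p1)), "undefined"],
  // split() splices captures, including undefined ones, into the result.
  ["split() length", "y".split(/(x)?y/).length, 3],
  ["split() middle slot", "y".split(/(x)?y/)[1], undefined],
];

for (const [label, actual, expected] of checks) {
  const ok = Object.is(actual, expected);
  console.log(`${ok ? "ok  " : "FAIL"} ${label}: ${String(actual)}`);
}
```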

Unfortunately, actual browser handling of NPCGs is all over the place, resulting in numerous cross-browser differences which can easily lead to subtle (or not so subtle) bugs in your code if you don't know what you're doing. E.g., Firefox incorrectly uses an empty string with the replace() and split() methods, but correctly uses undefined with the exec() method. Conversely, IE correctly uses undefined with the replace() method, incorrectly uses an empty string with the exec() method, and incorrectly returns neither with the split() method, since it doesn't splice backreferences into the resulting array.

As for the handling of backreferences to non-participating groups within regexes (e.g., /(x)?\1y/.test("y")), Safari uses the more sensible, non-ECMA-compliant approach (returning false for the previous bit of code), while IE, Firefox, and Opera follow the standard. (If you use /(x?)\1y/.test("y") instead, all four browsers will correctly return true.)
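
Since each browser fails in its own way, one option is to probe for the specific deviations described above rather than sniffing browser names. The function below is a sketch of such a probe; detectNpcgQuirks and its property names are hypothetical, and each property tests one of the deviations just listed. In a spec-compliant engine every property is false.

```javascript
// Hypothetical feature probe for the NPCG deviations described above.
// Each property is true when the engine deviates from ECMA-262.
function detectNpcgQuirks() {
  return {
    // IE-style: exec() reports an NPCG as "" instead of undefined.
    execUsesEmptyString: /(x)?y/.exec("y")[1] === "",
    // Firefox-style: replacement functions receive "" instead of undefined.
    replaceCallbackUsesEmptyString:
      "y".replace(/(x)?y/, (m, p1) => (p1 === "" ? "empty" : "other")) === "empty",
    // Firefox-style: split() splices "" instead of undefined.
    splitUsesEmptyString: "y".split(/(x)?y/)[1] === "",
    // split() leaves captures out of the result (IE returned an empty array).
    splitOmitsCaptures: "y".split(/(x)?y/).length !== 3,
    // Safari-style: a backreference to an NPCG fails instead of matching "".
    backreferenceToNpcgFails: !/(x)?\1y/.test("y"),
  };
}

console.log(detectNpcgQuirks());
```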

Several times I've seen people encounter these differences and diagnose them incorrectly, not having understood the root cause. A recent instance is what prompted this writeup.

Here are cross-browser results from each of the regex and regex-using methods when NPCGs have an impact on the outcome:

| Code | ECMA-262v3 | IE 5.5 – 7 | Firefox 2.0.0.6 | Opera 9.23 | Safari 3.0.3 |
|---|---|---|---|---|---|
| /(x)?\1y/.test("y") | true | true | true | true | false |
| /(x)?\1y/.exec("y") | ["y", undefined] | ["y", ""] | ["y", undefined] | ["y", undefined] | null |
| /(x)?y/.exec("y") | ["y", undefined] | ["y", ""] | ["y", undefined] | ["y", undefined] | ["y", undefined] |
| "y".match(/(x)?\1y/) | ["y", undefined] | ["y", ""] | ["y", undefined] | ["y", undefined] | null |
| "y".match(/(x)?y/) | ["y", undefined] | ["y", ""] | ["y", undefined] | ["y", undefined] | ["y", undefined] |
| "y".match(/(x)?\1y/g) | ["y"] | ["y"] | ["y"] | ["y"] | null |
| "y".split(/(x)?\1y/) | ["", undefined, ""] | [ ] | ["", "", ""] | ["", undefined, ""] | ["y"] |
| "y".split(/(x)?y/) | ["", undefined, ""] | [ ] | ["", "", ""] | ["", undefined, ""] | ["", ""] |
| "y".search(/(x)?\1y/) | 0 | 0 | 0 | 0 | -1 |
| "y".replace(/(x)?\1y/, "z") | "z" | "z" | "z" | "z" | "y" |
| "y".replace(/(x)?y/, "$1") | "" | "" | "" | "" | "" |
| "y".replace(/(x)?\1y/, function($0, $1){ return String($1); }) | "undefined" | "undefined" | "" | "undefined" | "y" |
| "y".replace(/(x)?y/, function($0, $1){ return String($1); }) | "undefined" | "undefined" | "" | "undefined" | "" |
| "y".replace(/(x)?y/, function($0, $1){ return $1; }) | "undefined" | "" | "" | "undefined" | "" |
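
To compare a current engine against the table, its rows can be re-evaluated with a small harness. This is a sketch, not the original test page: the show() formatter is an assumption of mine that prints values in the same notation the table uses. In a spec-compliant engine the output should match the ECMA-262v3 column.

```javascript
// Formats a value in the table's notation: strings quoted, arrays in
// brackets, everything else (true, 0, undefined, null) via String().
function show(value) {
  if (Array.isArray(value)) return "[" + Array.from(value, (v) => show(v)).join(", ") + "]";
  if (typeof value === "string") return JSON.stringify(value);
  return String(value);
}

// The table's rows: [code as printed in the table, function that runs it].
const rows = [
  ['/(x)?\\1y/.test("y")', () => /(x)?\1y/.test("y")],
  ['/(x)?\\1y/.exec("y")', () => /(x)?\1y/.exec("y")],
  ['/(x)?y/.exec("y")', () => /(x)?y/.exec("y")],
  ['"y".match(/(x)?\\1y/)', () => "y".match(/(x)?\1y/)],
  ['"y".match(/(x)?y/)', () => "y".match(/(x)?y/)],
  ['"y".match(/(x)?\\1y/g)', () => "y".match(/(x)?\1y/g)],
  ['"y".split(/(x)?\\1y/)', () => "y".split(/(x)?\1y/)],
  ['"y".split(/(x)?y/)', () => "y".split(/(x)?y/)],
  ['"y".search(/(x)?\\1y/)', () => "y".search(/(x)?\1y/)],
  ['"y".replace(/(x)?\\1y/, "z")', () => "y".replace(/(x)?\1y/, "z")],
  ['"y".replace(/(x)?y/, "$1")', () => "y".replace(/(x)?y/, "$1")],
  ['"y".replace(/(x)?\\1y/, fn -> String($1))',
    () => "y".replace(/(x)?\1y/, function ($0, $1) { return String($1); })],
  ['"y".replace(/(x)?y/, fn -> String($1))',
    () => "y".replace(/(x)?y/, function ($0, $1) { return String($1); })],
  ['"y".replace(/(x)?y/, fn -> $1)',
    () => "y".replace(/(x)?y/, function ($0, $1) { return $1; })],
];

for (const [code, run] of rows) {
  console.log(code.padEnd(44), show(run()));
}
```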

(Run the tests in your browser.)

The workaround for this mess is to avoid creating any potential for non-participating capturing groups, unless you know exactly what you're doing. Although that shouldn't be necessary, NPCGs are usually easy to avoid anyway. See the examples near the top of this post.
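
Here is a sketch of two ways this can look in practice. First, for a single optional token, moving the quantifier inside the group means the group always participates: /(x?)y/ captures "x" wherever /(x)?y/ would, and "" where /(x)?y/ would leave an NPCG. (This swap is not equivalent for every quantifier; for example, (x)* captures only the last repetition while (x*) captures the whole run.) Second, when a pattern can't easily be rewritten, captures can be read through a helper that treats undefined and "" alike; the name capturedText is hypothetical.

```javascript
// 1. Rewrite so the group must participate: the quantifier moves inside.
//    /(x)?y/ can leave group 1 non-participating; /(x?)y/ cannot.
const risky = /(x)?y/;
const safe = /(x?)y/;

// 2. Hypothetical helper: normalize capture n so callers see "" whether the
//    engine reported an NPCG as undefined (per the spec) or as "" (IE's exec()).
function capturedText(match, n) {
  const value = match[n];
  return value === undefined ? "" : value;
}

for (const input of ["y", "xy"]) {
  const fromSafe = safe.exec(input)[1];
  const fromRisky = capturedText(risky.exec(input), 1);
  console.log(input, JSON.stringify(fromSafe), JSON.stringify(fromRisky));
}
```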

Edit (2007-08-16): I've updated this post with data from the newest versions of the listed browsers. The original data contained a few false negatives for Opera and Safari which resulted from a faulty library used to generate the results.

