I chatted with @wycats and @gibson042 with regard to sequence properties after the TC39 meeting this week. CC @mathiasbynens, @macchiati, @markusicu.
One intuition that we had when thinking about sequence properties was that users might like to think of sequence properties as describing grapheme clusters. For example, a sequence property like RGI_Emoji_ZWJ_Sequence would describe a single emoji grapheme. This also leads intuitively to the negation of sequence properties: it would match any grapheme that is not described by the sequence property.
However, my understanding is that this is not the mental model used in the Unicode proposal. In that model, a sequence property may or may not describe a grapheme cluster, so negating it is inherently meaningless.
One aspect of sequence properties as proposed that I find confusing is that they do not seem to be necessarily "greedy". For example, if you had Emoji-ZWJ-Emoji, would the sequence property be just as happy matching only the first Emoji as it would matching the whole grapheme? I find that behavior difficult to reason about. If this is true, I wonder if you've considered offering both a greedy and a non-greedy mode?
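To make the greedy question concrete, here is a minimal TypeScript sketch. It assumes, purely for illustration, that a sequence property behaves like an ordered alternation over the strings it contains; that is not the proposal's defined semantics, and the names `MAN`, `family`, `longestFirst`, `shortestFirst`, and `matchLength` are hypothetical.

```typescript
// Code points for a man-woman-girl family emoji joined by ZWJ.
const MAN = "\u{1F468}";
const WOMAN = "\u{1F469}";
const GIRL = "\u{1F467}";
const ZWJ = "\u200D";
const family = MAN + ZWJ + WOMAN + ZWJ + GIRL;

// Assumption: model a sequence property as an alternation of its members.
// None of these characters are regex metacharacters, so interpolation is safe.
// "Greedy" here means the longest member is tried first.
const longestFirst = new RegExp(`^(?:${family}|${MAN})`, "u");
// "Non-greedy" here means the shortest member is tried first.
const shortestFirst = new RegExp(`^(?:${MAN}|${family})`, "u");

// Returns the length of the match in UTF-16 code units, or -1 if no match.
function matchLength(re: RegExp, input: string): number {
  const m = re.exec(input);
  return m === null ? -1 : m[0].length;
}

console.log(matchLength(longestFirst, family));  // whole ZWJ sequence
console.log(matchLength(shortestFirst, family)); // only the leading emoji
```

Under this model the same input produces two different matches depending only on the order of the alternatives, which is the ambiguity I am asking about.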
Another idea that was brought up was to add a "grapheme mode" to regular expressions, similar to the "unicode mode" that operates on code points rather than code units. In this new mode, sequence properties would behave basically the same as code point properties, including the ability to negate them. That would be out of scope for this proposal, but it's something to keep in mind to make sure that the design of this proposal would be compatible with a possible future grapheme mode.
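As a rough illustration of the analogy, this TypeScript sketch shows how the existing "unicode mode" changes what `.` matches. The variable names are hypothetical, and the grapheme-mode behavior described in the final comment is speculation about a mode that does not exist today.

```typescript
// A single emoji code point outside the BMP (two UTF-16 code units).
const man = "\u{1F468}";
// A five-code-point ZWJ sequence that renders as one grapheme.
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// Without the u flag, `.` matches one UTF-16 code unit.
const unitMode = new RegExp("^.$");
// With the u flag, `.` matches one code point.
const codePointMode = new RegExp("^.$", "u");

console.log(unitMode.test(man));              // false: two code units
console.log(codePointMode.test(man));         // true: one code point
console.log(codePointMode.test(familyEmoji)); // false: five code points

// Assumption: in a hypothetical grapheme mode, `.` would match one grapheme
// cluster, so the last test would presumably succeed.
```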