Baseline 2024
Newly available
The Intl.Segmenter object enables locale-sensitive text segmentation, letting you extract meaningful items (graphemes, words, or sentences) from a string.
const segmenterFr = new Intl.Segmenter("fr", { granularity: "word" });
const string1 = "Que ma joie demeure";
const iterator1 = segmenterFr.segment(string1)[Symbol.iterator]();
console.log(iterator1.next().value.segment);
// Expected output: 'Que'
console.log(iterator1.next().value.segment);
// Expected output: ' '
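The granularity option selects which of the three kinds of item is returned. The following is a minimal sketch that counts segments at each granularity; the helper name countSegments and the sample strings are illustrative assumptions, not part of the reference itself.

```javascript
// Hypothetical helper: counts the segments produced at a given granularity.
function countSegments(text, locale, granularity) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text)).length;
}

// "é" written as "e" followed by a combining acute accent (U+0301):
// two UTF-16 code units, but a single grapheme.
const cafe = "cafe\u0301";
console.log(cafe.length); // 5
console.log(countSegments(cafe, "fr", "grapheme")); // 4

// Word granularity also returns the spaces between words as segments.
console.log(countSegments("Que ma joie demeure", "fr", "word")); // 7

console.log(countSegments("It works. Does it? Yes!", "en", "sentence")); // 3
```

Counting graphemes rather than code units is what keeps an accented letter from being reported as two characters.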
Constructor
Intl.Segmenter()
Creates a new Intl.Segmenter object.
Static methods
Intl.Segmenter.supportedLocalesOf()
Returns an array containing those of the provided locales that are supported without having to fall back to the runtime's default locale.
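As a rough illustration of supportedLocalesOf(), the sketch below passes a few sample locale tags (chosen here as an assumption for the example) and prints whichever ones the runtime supports. The result depends on the locale data shipped with the runtime, so no fixed output is shown.

```javascript
// Sample locale tags, chosen only for illustration.
const requested = ["fr", "ja-JP", "de"];

// Returns the subset of the requested tags that the runtime supports
// without falling back to its default locale.
const supported = Intl.Segmenter.supportedLocalesOf(requested);

// The contents depend on the runtime's locale data.
console.log(supported);
```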
Instance properties
These properties are defined on Intl.Segmenter.prototype and shared by all Intl.Segmenter instances.
Intl.Segmenter.prototype.constructor
The constructor function that created the instance object. For Intl.Segmenter instances, the initial value is the Intl.Segmenter constructor.
Intl.Segmenter.prototype[Symbol.toStringTag]
The initial value of the [Symbol.toStringTag] property is the string "Intl.Segmenter". This property is used in Object.prototype.toString().
Instance methods
Intl.Segmenter.prototype.resolvedOptions()
Returns a new object with properties reflecting the locale and granularity options computed during initialization of this Intl.Segmenter object.
Intl.Segmenter.prototype.segment()
Returns a new iterable Segments instance representing the segments of a string according to the locale and granularity of this Intl.Segmenter instance.
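The two instance methods can be sketched together as follows. The variable names and the sample sentence are assumptions made for the example; the resolved locale is printed without an expected value because it depends on the runtime's locale data.

```javascript
const segmenter = new Intl.Segmenter("fr", { granularity: "word" });

// resolvedOptions() reports the options computed at construction time.
const options = segmenter.resolvedOptions();
console.log(options.granularity); // "word"
console.log(options.locale); // Resolved locale; depends on the runtime.

// Without an explicit option, granularity defaults to "grapheme".
const defaultGranularity = new Intl.Segmenter("fr").resolvedOptions().granularity;
console.log(defaultGranularity); // "grapheme"

// The Segments object is iterable; each item carries the segment text
// and its starting index in the input string.
const input = "Que ma joie demeure";
const items = Array.from(segmenter.segment(input));
for (const { segment, index } of items) {
  console.log(index, JSON.stringify(segment));
}
```

Concatenating the segment values in order reproduces the input string, since spaces and punctuation are returned as segments too.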
If we used String.prototype.split(" ") to segment a text into words, we would not get the correct result when the text's language does not use whitespace between words (as is the case for Japanese, Chinese, Thai, Lao, Khmer, Myanmar, etc.).
const str = "吾輩は猫である。名前はたぬき。";
console.table(str.split(" "));
// ['吾輩は猫である。名前はたぬき。']
// The two sentences are not correctly segmented.
const str = "吾輩は猫である。名前はたぬき。";
const segmenterJa = new Intl.Segmenter("ja-JP", { granularity: "word" });
const segments = segmenterJa.segment(str);
console.table(Array.from(segments));
// [{segment: '吾輩', index: 0, input: '吾輩は猫である。名前はたぬき。', isWordLike: true},
// etc.
// ]
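Because word granularity also returns spaces and punctuation as segments, a common follow-up is to keep only the entries whose isWordLike flag is true. The sketch below assumes a hypothetical helper named wordsOf; the exact Japanese word boundaries depend on the runtime's dictionary data, so no output is shown for that call.

```javascript
// Hypothetical helper: returns only the word-like segments of a text.
function wordsOf(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((item) => item.isWordLike)
    .map((item) => item.segment);
}

console.log(wordsOf("Que ma joie demeure", "fr"));
// ['Que', 'ma', 'joie', 'demeure']

// Punctuation such as "。" is not word-like, so it is filtered out.
console.log(wordsOf("吾輩は猫である。名前はたぬき。", "ja-JP"));
```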