Natural language parser, for Latin-script languages, that produces nlcst.
This package exposes a parser that takes Latin-script natural language and produces a syntax tree.
If you want to handle natural language as syntax trees manually, use this.
Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.
Whether the text is Old English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job at tokenizing it.
For English and Dutch, you can instead use parse-english and parse-dutch.
It also works, to a lesser degree, for Latin-like scripts such as Cyrillic (“привет”), Georgian (“გამარჯობა”), and Armenian (“Բարեւ”).
This package is ESM only. In Node.js (version 16+), install with npm:
npm install parse-latin
In Deno with esm.sh:
import {ParseLatin} from 'https://esm.sh/parse-latin@7'
In browsers with esm.sh:
<script type="module">
  import {ParseLatin} from 'https://esm.sh/parse-latin@7?bundle'
</script>
import {ParseLatin} from 'parse-latin'
import {inspect} from 'unist-util-inspect'

const tree = new ParseLatin().parse('A simple sentence.')

console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)
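To handle such a tree manually, it can be walked like any nested object. The sketch below is a minimal illustration, not part of this package: the tree is written out by hand to mirror the output above (positions omitted), and `toText` and `collect` are hypothetical helper names. In practice, ecosystem utilities such as unist-util-visit and nlcst-to-string cover the same ground.

```javascript
// Hand-written copy of the tree shown above, without `position` fields,
// so that this example runs without installing anything.
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the `value` of every literal under `node`.
function toText(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(toText).join('')
}

// Hypothetical helper: collect the text of every node with the given `type`.
function collect(node, type, found = []) {
  if (node.type === type) found.push(toText(node))
  for (const child of node.children || []) collect(child, type, found)
  return found
}

const words = collect(tree, 'WordNode')

console.log(words) // [ 'A', 'simple', 'sentence' ]
console.log(toText(tree)) // A simple sentence.
```

Parent nodes carry `children` and literal nodes carry `value`, so two small recursive functions are enough to read words or rebuild the source text.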
This package exports the identifier ParseLatin. There is no default export.
new ParseLatin(): create a new parser.
parse(value): turn natural language into a syntax tree.
Parameters: value (string, optional), the value to parse.
Returns: the tree (RootNode).
👉 Note: The easiest way to see how parse-latin parses is to use the online parser demo, which shows the syntax tree corresponding to the typed text.
parse-latin splits text into white space, punctuation, symbol, and word tokens.
Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.
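Some punctuation marks are part of the word they occur in, such as non-profit, she’s, G.I., 11:00, N/A, &c, nineteenth- and…
Some periods do not mark a sentence end, such as 1., e.g., id.
A sentence end does not always fall directly after the terminal mark, such as .), ."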
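The two stages described above (split into tokens, then merge) can be sketched in a few lines. This is a deliberately simplified illustration and not parse-latin's actual rules: the character classes are assumptions, symbols are lumped in with punctuation, and `tokenize`, `mergeInnerWordPunctuation`, and `innerMarks` are hypothetical names. It only covers the simplest merge, where a single mark sits between two word tokens.

```javascript
// Assumption: white space is anything matching \s, a word is a run of
// Unicode letters, marks, or numbers, and any other character is treated
// as one punctuation token. parse-latin's real classes are more refined.
const pattern = /(\s+)|([\p{L}\p{M}\p{N}]+)|([^\s\p{L}\p{M}\p{N}])/gu

// Stage 1: split text into flat tokens.
function tokenize(text) {
  const tokens = []
  for (const match of text.matchAll(pattern)) {
    let type = 'Punctuation'
    if (match[1] !== undefined) type = 'WhiteSpace'
    else if (match[2] !== undefined) type = 'Word'
    tokens.push({type, value: match[0]})
  }
  return tokens
}

// Assumption: only these marks may join two word tokens in this sketch.
const innerMarks = new Set(['-', '’', "'", '/', ':'])

// Stage 2: fold `Word Mark Word` runs into a single word token.
function mergeInnerWordPunctuation(tokens) {
  const result = []
  let index = 0
  while (index < tokens.length) {
    const token = tokens[index]
    const previous = result[result.length - 1]
    const next = tokens[index + 1]
    if (
      token.type === 'Punctuation' &&
      innerMarks.has(token.value) &&
      previous && previous.type === 'Word' &&
      next && next.type === 'Word'
    ) {
      // Append the mark and the following word to the previous word.
      previous.value += token.value + next.value
      index += 2
    } else {
      // Copy the token so the input array is left untouched.
      result.push({...token})
      index++
    }
  }
  return result
}

const merged = mergeInnerWordPunctuation(tokenize('A non-profit.'))

console.log(merged.map((token) => token.value)) // [ 'A', ' ', 'non-profit', '.' ]
```

Cases such as G.I., e.g., or a sentence end hidden behind a closing quote need more context than a three-token window, which is why the real parser applies a series of tree manipulations rather than a single pass like this one.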
This package is fully typed with TypeScript. It exports no additional types.
Projects maintained by me are compatible with maintained versions of Node.js.
When I cut a new major release, I drop support for unmaintained versions of Node. This means I try to keep the current release line, parse-latin@^7, compatible with Node.js 16.
This package is safe.
parse-english: English (natural language) parser
parse-dutch: Dutch (natural language) parser
Yes please! See How to Contribute to Open Source.