Showing content from https://github.com/wooorm/parse-latin below:

wooorm/parse-latin: Latin-script (natural language) parser

Natural language parser, for Latin-script languages, that produces nlcst.

This package exposes a parser that takes Latin-script natural language and produces a syntax tree.

If you want to handle natural language as syntax trees manually, use this.

Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.

Whether the text is Old English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job of tokenizing it.

For English and Dutch, you can instead use parse-english and parse-dutch.

To some extent, you can also use this for Latin-like scripts, such as Cyrillic (“привет”), Georgian (“გამარჯობა”), and Armenian (“Բարեւ”).

This package is ESM only. In Node.js (version 16+), install with npm:

npm install parse-latin

In Deno with esm.sh:

import {ParseLatin} from 'https://esm.sh/parse-latin@7'

In browsers with esm.sh:

<script type="module">
  import {ParseLatin} from 'https://esm.sh/parse-latin@7?bundle'
</script>

import {ParseLatin} from 'parse-latin'
import {inspect} from 'unist-util-inspect'

const tree = new ParseLatin().parse('A simple sentence.')

console.log(inspect(tree))

Yields:

RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)

This package exports the identifier ParseLatin. There is no default export.

new ParseLatin() creates a new parser.

The parser's parse(value) method turns natural language (a string) into a syntax tree.

It returns the tree (a RootNode).
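The returned tree is made of plain nested objects: parent nodes carry `children`, and literal nodes carry a `value`. As a minimal sketch of working with such a tree by hand, the snippet below collects the words of a sentence. The helpers `textOf` and `collectWords` are hypothetical names introduced for illustration and are not part of parse-latin. The tree is hand-written to mirror the output shown earlier (positional info omitted), so the snippet runs without installing anything.

```javascript
// Hand-written stand-in for the tree that
// `new ParseLatin().parse('A simple sentence.')` yields (positions omitted).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate all literal values found under a node.
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(textOf).join('')
}

// Hypothetical helper: collect the text of every WordNode, depth-first.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(textOf(node))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }
  return words
}

console.log(collectWords(tree)) // [ 'A', 'simple', 'sentence' ]
console.log(textOf(tree)) // A simple sentence.
```

In a real project, existing utilities such as unist-util-visit and nlcst-to-string probably cover the same ground more robustly than these hand-rolled helpers.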

👉 Note: The easiest way to see how parse-latin parses is to use the online parser demo, which shows the syntax tree corresponding to the typed text.

parse-latin first splits text into white space, punctuation, symbol, and word tokens.

Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.
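To make the first step more concrete, here is a rough sketch of splitting text into the four token kinds. It is not parse-latin's actual implementation: the function name `tokenize` and the character classes (Unicode letters, marks, and numbers for words, Unicode punctuation for punctuation, everything else for symbols) are assumptions chosen for illustration, and the real rules are likely more nuanced. The sketch also skips the second step of merging tokens into sentences and paragraphs.

```javascript
// Simplified sketch, NOT parse-latin's real rules. Assumed classes:
//   white space: \s
//   word:        Unicode letters, marks, and numbers
//   punctuation: Unicode punctuation (\p{P})
//   symbol:      anything else
const tokenPattern =
  /(\s+)|([\p{L}\p{M}\p{N}]+)|(\p{P}+)|([^\s\p{L}\p{M}\p{N}\p{P}]+)/gu

function tokenize(text) {
  const tokens = []
  for (const match of text.matchAll(tokenPattern)) {
    // Exactly one capture group is defined per match.
    const [value, whiteSpace, word, punctuation] = match
    let type = 'SymbolNode'
    if (whiteSpace !== undefined) type = 'WhiteSpaceNode'
    else if (word !== undefined) type = 'WordNode'
    else if (punctuation !== undefined) type = 'PunctuationNode'
    tokens.push({type, value})
  }
  return tokens
}

console.log(tokenize('Où sont les toilettes?'))
```

Every character falls into exactly one of the four alternatives, so joining the token values back together reproduces the input.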

This package is fully typed with TypeScript. It exports no additional types.

Projects maintained by me are compatible with maintained versions of Node.js.

When I cut a new major release, I drop support for unmaintained versions of Node. This means I try to keep the current release line, parse-latin@^7, compatible with Node.js 16.

This package is safe.

Yes please! See How to Contribute to Open Source.

MIT © Titus Wormer

