Showing content from http://reference.wolfram.com/language/ref/netencoder/Tokens.html below:

Tokens—Wolfram Documentation

NetEncoder["Tokens"]

represents an encoder that converts the words in a string to a sequence of integer codes using a standard English vocabulary.

NetEncoder[{"Tokens","language"}]

represents an encoder that uses a standard vocabulary for the given language.

NetEncoder[{"Tokens",{token1,token2,…}}]

represents an encoder that uses a specified list of tokens as the vocabulary.

NetEncoder[{"Tokens",…,"param"->value}]

represents an encoder in which additional parameters have been specified.

Examples

Basic Examples  (1)

Create a token encoder for English text:

Encode an English sentence:

Out-of-vocabulary words are encoded as the maximum code:

By default, words are detected using a simple regular expression:

The list of words can be explicitly passed using TextElement:
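
The original page's example code did not survive extraction. As a rough illustration of the behavior described above, here is a minimal Python sketch of a vocabulary-based token encoder. It is a toy model, not Wolfram's implementation: the function name `make_encoder`, the 1-based numbering, and the `\w+` word pattern are all assumptions made for the example.

```python
import re

def make_encoder(vocab):
    """Return a function that maps a string to a list of integer codes.

    Assumptions (not taken from the Wolfram documentation):
    codes start at 1, and words are found with the regex \\w+.
    """
    index = {tok: i for i, tok in enumerate(vocab, start=1)}
    oov_code = len(vocab) + 1  # all out-of-vocabulary words share the maximum code

    def encode(text):
        words = re.findall(r"\w+", text)  # simple regex-based word detection
        return [index.get(w, oov_code) for w in words]

    return encode

encode = make_encoder(["the", "cat", "sat"])
codes = encode("the cat sat quietly")
print(codes)
```

In this sketch, "quietly" is not in the three-word vocabulary, so it receives the code one greater than the vocabulary size.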

Scope  (6)

Use the default token encoder to encode a sentence:

Give a specific list of tokens:

Give a specific list of tokens, including a split pattern:

Specify that the sequence should be padded or trimmed to be 4 elements long:
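
Padding or trimming to a fixed length can be pictured with the following Python sketch. The helper `fit_length` and the padding value are hypothetical; the source does not say which code is used for padding, so the pad code is left as an explicit argument.

```python
def fit_length(codes, target, pad_code):
    """Trim `codes` to `target` items, or right-pad it with `pad_code`.

    `pad_code` is a placeholder: the padding value used by the real
    encoder is not stated in the source text.
    """
    trimmed = codes[:target]
    return trimmed + [pad_code] * (target - len(trimmed))

short = fit_length([1, 2], 4, pad_code=0)           # padded up to 4 elements
long_ = fit_length([1, 2, 3, 4, 5], 4, pad_code=0)  # trimmed down to 4 elements
print(short, long_)
```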

Use a built-in dictionary for a specific language:

Use a custom tokenization with TextElement:

Use the output of TextStructure to compute a list of token indices:

A tree structure gets flattened:

Parameters  (3)

"IgnoreCase"  (1)

An encoder with "IgnoreCase"->True treats tokens that differ only by the case of their constituent characters as equivalent:

An encoder with "IgnoreCase"->False does not do this:
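
The effect of the two settings can be modeled in Python as below. This is only a sketch under assumed conventions (1-based codes, out-of-vocabulary tokens mapped to the maximum code); `make_encoder` and `ignore_case` are illustrative names, not part of any Wolfram API.

```python
def make_encoder(vocab, *, ignore_case):
    """Build an encoder over pre-split words, optionally folding case."""
    fold = str.lower if ignore_case else (lambda s: s)
    # Fold the vocabulary the same way as the input so lookups agree.
    index = {fold(tok): i for i, tok in enumerate(vocab, start=1)}
    oov_code = len(vocab) + 1

    def encode(words):
        return [index.get(fold(w), oov_code) for w in words]

    return encode

vocab = ["Cat", "dog"]
insensitive = make_encoder(vocab, ignore_case=True)(["cat", "DOG"])
sensitive = make_encoder(vocab, ignore_case=False)(["cat", "DOG"])
print(insensitive, sensitive)
```

With case folding, both inputs match vocabulary entries; without it, neither does, so both fall through to the out-of-vocabulary code.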

"SplitPattern"  (2)

Create an encoder that isolates digit characters, using "SplitPattern":

The encoder outputs one token for each digit character:

This differs from the default behavior, which groups all consecutive digit characters into a single token:
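
The difference between the two splitting strategies can be sketched with Python regular expressions. Both patterns here are stand-ins chosen for the example; they are not the patterns used by the Wolfram encoder.

```python
import re

def split_digits_individually(text):
    """Emit each digit as its own token; keep runs of letters together."""
    return re.findall(r"\d|[^\W\d]+", text)

def split_grouping_digits(text):
    """Keep consecutive word characters, including digit runs, together."""
    return re.findall(r"\w+", text)

isolated = split_digits_individually("room 101")
grouped = split_grouping_digits("room 101")
print(isolated)
print(grouped)
```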

Create an encoder with "SplitPattern"->None and two tokens:

The encoder now expects a list of tokens as input:

The encoder still maps across a batch of examples:
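
When no splitting is performed, the input is already a list of tokens and the encoder only has to look each one up. A minimal Python sketch, assuming 1-based codes and a shared out-of-vocabulary code (the names `encode_tokens` and `encode_batch` are hypothetical):

```python
VOCAB = ["a", "b"]  # a two-token vocabulary, as in the example above
INDEX = {tok: i for i, tok in enumerate(VOCAB, start=1)}
OOV_CODE = len(VOCAB) + 1

def encode_tokens(tokens):
    """Encode one pre-tokenized example (a list of strings)."""
    if isinstance(tokens, str):
        # Without a split pattern there is no way to tokenize a raw string.
        raise TypeError("expected a list of tokens, not a string")
    return [INDEX.get(tok, OOV_CODE) for tok in tokens]

def encode_batch(batch):
    """Encode a batch by mapping the single-example encoder over it."""
    return [encode_tokens(tokens) for tokens in batch]

single = encode_tokens(["a", "b", "a"])
batch = encode_batch([["a"], ["b", "c"]])
print(single, batch)
```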
