The Wolfram Language provides powerful knowledge-based tools for normalizing text in preparation for text analysis, visualization, etc.

Character-Level Normalization
ToLowerCase, ToUpperCase — convert all characters to lowercase or uppercase
IgnoreCase — option to ignore case of letters
RemoveDiacritics — remove diacritics such as accents, umlauts, etc.
CharacterNormalize — reduce or decompose characters to normal forms (e.g. ¼ → 1⁄4, ï → i + combining diaeresis)
Transliterate — transliterate to ASCII or other writing scripts
PrintableASCIIQ — test if a string contains only printable ASCII characters
CharacterEncoding — specify the character encoding to assume
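
As a minimal sketch of how the character-level functions above can be chained, the following folds case, strips diacritics, and then checks that the result is plain ASCII. The sample string and the variable names `raw` and `normalized` are illustrative assumptions, not part of the original guide.

```wolfram
(* Hypothetical sample string; any text could be used here. *)
raw = "Crème Brûlée at the CAFÉ";

(* Fold case first, then strip accents and other diacritics. *)
normalized = RemoveDiacritics[ToLowerCase[raw]]
(* "creme brulee at the cafe" *)

(* Confirm that only printable ASCII characters remain. *)
PrintableASCIIQ[normalized]
(* True *)

(* Alternatively, leave the text alone and ignore case when comparing. *)
StringMatchQ["CAFE", "cafe", IgnoreCase -> True]
(* True *)
```

Rewriting the text is usually the simpler choice when it will be compared many times; IgnoreCase is handy for one-off matches.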

Structural String Normalization
StringSplit — split a string at newlines or other delimiters
StringDelete — delete substrings or patterns
StringReplace — replace substrings or patterns
StringDrop ▪ StringTake ▪ StringCases
StringTrim — trim whitespace or other patterns from strings
StringPadLeft, StringPadRight — pad to fixed width
StringExtract — extract specified parts of strings
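
The structural functions above might be combined roughly as follows to clean up a delimited record. This is a sketch under assumed inputs: the sample strings and the names `raw`, `lines`, and `fields` are hypothetical.

```wolfram
(* Hypothetical messy input with stray whitespace around fields. *)
raw = "  alpha, beta ,gamma  \n delta ";

(* Split into lines, then split the first line into comma-separated fields. *)
lines = StringSplit[raw, "\n"];
fields = StringTrim /@ StringSplit[First[lines], ","]
(* {"alpha", "beta", "gamma"} *)

(* Collapse each run of whitespace into a single space. *)
StringReplace["a   b    c", Whitespace -> " "]
(* "a b c" *)

(* Remove punctuation characters entirely. *)
StringDelete["Hello, world!", PunctuationCharacter]
(* "Hello world" *)

(* Pad an identifier on the left to a fixed width of 5. *)
StringPadLeft["42", 5, "0"]
(* "00042" *)
```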

Text-Level Normalization
TextSentences — extract a list of sentences
TextWords — extract a list of words
DeleteStopwords — delete standard stopwords ("the", "and", etc.)
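
A short sketch of a typical text-level pipeline built from the functions above: segment into sentences, tokenize into words, then drop stopwords. The sample text and variable names are assumptions made for illustration.

```wolfram
(* Hypothetical sample text. *)
text = "The cat sat. The dog barked.";

(* Break the text into sentences. *)
sentences = TextSentences[text]
(* {"The cat sat.", "The dog barked."} *)

(* Tokenize into words; lowercasing first keeps later comparisons uniform. *)
words = TextWords[ToLowerCase[text]];

(* Drop standard stopwords such as "the" from the word list. *)
content = DeleteStopwords[words]
```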

Content Extraction
TextCases — extract symbolically specified elements
Containing ▪ Alternatives ▪ Entity
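
For illustration, content extraction with TextCases might look like the sketch below. The sample sentence is hypothetical, results depend on the underlying language models (which may need to be downloaded on first use), and no particular output is guaranteed.

```wolfram
(* Hypothetical sample sentence. *)
text = "Paris is the capital of France, and Berlin is the capital of Germany.";

(* Extract substrings recognized as countries. *)
TextCases[text, "Country"]

(* Alternatives lets several content types be matched in one pass. *)
TextCases[text, "City" | "Country"]
```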

Morphological & Linguistic Normalization
WordStem — reduce a word to its stem
DictionaryLookup — look up a word in dictionaries
Interpreter — convert to many forms from natural language
SpellingCorrectionList — list of spelling suggestions for misspelled words
DictionaryWordQ — test if a word is a correctly spelled dictionary word
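
The sketch below shows one way the word-level functions above could be used together: stemming inflected forms and repairing a misspelling. The sample words and the helper name `fixWord` are hypothetical, and the exact list of spelling suggestions may vary by version.

```wolfram
(* Reduce inflected forms to a common stem. *)
WordStem["running"]
(* "run" *)

(* Check whether words are correctly spelled dictionary words. *)
DictionaryWordQ["receive"]
DictionaryWordQ["recieve"]

(* Hypothetical helper: keep dictionary words, otherwise take the first
   spelling suggestion if there is one. *)
fixWord[w_String] := If[DictionaryWordQ[w], w,
  With[{s = SpellingCorrectionList[w]}, If[s === {}, w, First[s]]]];

fixWord["recieve"]
```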

Language Translation
LanguageIdentify — identify what language a text is in
WordTranslation — give translations for a word
TextTranslation — translate text using an integrated external service
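
A rough sketch of a language-normalization step: detect the language of a snippet, then translate a word or a passage. The sample strings are assumptions, and TextTranslation calls an external service, so it needs network access and may require service credentials.

```wolfram
(* Hypothetical snippet; this is expected to be identified as French. *)
LanguageIdentify["Bonjour tout le monde"]

(* Candidate translations for a single English word. *)
WordTranslation["hello", "French"]

(* Translate a whole passage through the external translation service. *)
TextTranslation["Hello, world", "French"]
```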

Word List Normalization
AlphabeticSort — sort strings into alphabetic order
WordCounts ▪ LetterCounts ▪ CharacterCounts
WordFrequency — frequency of words or n-grams in text
WordFrequencyData — data on overall word frequencies in typical text
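
As a small illustration of the counting and sorting functions above, the following sketch tallies words and letters in short made-up inputs. The sample strings and the variable name `text` are assumptions.

```wolfram
(* Hypothetical sample text. *)
text = "the cat and the hat";

(* Association of word -> count; look up a single word. *)
WordCounts[text]["the"]
(* 2 *)

(* Letter counts, ordered from most to least frequent. *)
LetterCounts["banana"]
(* <|"a" -> 3, "n" -> 2, "b" -> 1|> *)

(* Fraction of the words in the text equal to "the" (2 of the 5 words here). *)
WordFrequency[text, "the"]

(* Alphabetic order, independent of letter case. *)
AlphabeticSort[{"banana", "Apple", "cherry"}]
(* {"Apple", "banana", "cherry"} *)
```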

LLM-Based Normalization
LLMResourceFunction — apply operations from the Wolfram Prompt Repository
LLMExampleFunction ▪ LLMFunction ▪ LLMTool ▪ ...
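
A minimal sketch of LLM-based cleanup using LLMFunction with a template prompt. The prompt wording and the name `cleanUp` are hypothetical, the call requires a configured LLM service, and the output is not deterministic.

```wolfram
(* Hypothetical prompt; `` marks the slot filled by the argument. *)
cleanUp = LLMFunction[
  "Correct the spelling and capitalization of the following text. Return only the corrected text: ``"];

(* Apply the function to a messy string. *)
cleanUp["teh quick brwon fox jumps over teh lazy dog"]
```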

Normalization of External Data
Import — import data from files or the web
"Text", "PDF", "TeX", "HTML" — pick out plain text, table data, etc.
ImportString — convert a string with a particular external format
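
The following sketch shows how external formats might be reduced to plain text or table data with ImportString. The inline HTML and CSV strings are made-up samples; the same format and element names can be passed to Import with a file path or URL.

```wolfram
(* Hypothetical HTML fragment; extract just its plain text. *)
html = "<html><body><p>Hello <b>world</b></p></body></html>";
ImportString[html, {"HTML", "Plaintext"}]

(* Hypothetical CSV data; numeric fields are converted to numbers. *)
ImportString["a,b\n1,2", "CSV"]
(* {{"a", "b"}, {1, 2}} *)
```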