Lemmatization is the process of reducing words to their base or dictionary form (lemma). Unlike stemming, which simply cuts off word endings, it uses a full vocabulary and linguistic rules to ensure accurate word reduction. For example, a stemmer reduces "studies" to the non-word "studi", while a lemmatizer correctly produces "study".
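A minimal sketch of this difference, assuming NLTK is installed, comparing NLTK's PorterStemmer with the WordNetLemmatizer on the same words:
Python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('wordnet')

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming chops suffixes blindly; lemmatization looks the word up
for word in ["studies", "running", "ate"]:
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos='v'))
Here stemming turns "studies" into "studi" and leaves "ate" untouched, while lemmatization yields "study" and "eat".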
Let's explore several popular Python libraries for performing lemmatization:
1. WordNet
WordNet is a large lexical database of the English language and the basis of one of the earliest lemmatization methods in Python. It groups words into sets of synonyms (synsets) which are related to each other. WordNet is part of the NLTK (Natural Language Toolkit) library and is widely used for text preprocessing tasks.
For installation run the following command:
!pip install nltk
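As a quick look at the synsets mentioned above, here is a small sketch (the word "meeting" is just an illustrative choice):
Python
import nltk
from nltk.corpus import wordnet

nltk.download('wordnet')

# Each synset groups together word senses that are synonymous
for syn in wordnet.synsets("meeting")[:3]:
    print(syn.name(), "-", syn.definition())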
Let's see an example:
Python
import nltk
from nltk.stem import WordNetLemmatizer

# Download the WordNet data needed by the lemmatizer
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

word = "meeting"
# pos='v' tells the lemmatizer to treat the word as a verb
lemma = lemmatizer.lemmatize(word, pos='v')
print(f"Lemmatized Word: {lemma}")
Output:
meet
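The pos argument drives this result. As a small sketch, the same word maps to different lemmas depending on the part of speech you pass:
Python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# "meeting" is a valid noun, so as a noun it stays unchanged
print(lemmatizer.lemmatize("meeting", pos='n'))  # meeting
# As a verb it reduces to its base form
print(lemmatizer.lemmatize("meeting", pos='v'))  # meet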
2. WordNet with POS Tagging
By default, the WordNet Lemmatizer assumes words are nouns. For more accurate lemmatization, especially of verbs and adjectives, Part-of-Speech (POS) tagging is required. POS tagging tells the lemmatizer whether the word is a noun, verb or adjective. Let's see an example to understand this better:
Python
import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download the tokenizer and tagger models along with WordNet
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

sentence = "The dogs are running"
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)  # e.g. [('The', 'DT'), ('dogs', 'NNS'), ...]

# Treat words tagged as verbs (V*) as verbs, everything else as nouns
lemmatized_words = [lemmatizer.lemmatize(word, pos='v' if tag.startswith('V') else 'n')
                    for word, tag in tagged]
print(lemmatized_words)
Output:
['The', 'dog', 'be', 'run']
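The example above only distinguishes verbs from nouns. A common refinement, sketched below, maps all four Penn Treebank tag families to WordNet POS codes so adjectives and adverbs are handled too (penn_to_wordnet is a hypothetical helper name, not an NLTK function):
Python
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

def penn_to_wordnet(tag):
    # Map Penn Treebank tag prefixes to WordNet POS codes,
    # defaulting to noun just like the lemmatizer itself
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN

lemmatizer = WordNetLemmatizer()
tagged = pos_tag(word_tokenize("The striped bats were hanging upside down"))
print([lemmatizer.lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged])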
3. TextBlob
TextBlob is a simpler library built on top of NLTK and Pattern. It provides a convenient API to perform common NLP tasks like lemmatization. TextBlob's lemmatization is easy to use and requires minimal setup.
For installation run the following command:
!pip install textblob
Let's see an example:
Python
from textblob import Word

word = Word("running")
# "v" tells TextBlob to lemmatize the word as a verb
print(word.lemmatize("v"))
Output:
run
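A sketch of extending this to a whole sentence: blob.words is a list of Word objects, each of which supports lemmatize() (this assumes the TextBlob corpora have been downloaded, e.g. via python -m textblob.download_corpora):
Python
from textblob import TextBlob

blob = TextBlob("The cats are running")
# Each item in blob.words is a Word, so it can be lemmatized directly
print([word.lemmatize('v') for word in blob.words])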
4. TextBlob with POS Tagging
Using POS tagging with TextBlob ensures that words are lemmatized accurately. By default, TextBlob treats every word as a noun, so for verbs and adjectives POS tagging can significantly improve lemmatization accuracy. Let's see an example:
Python
from textblob import TextBlob

sentence = "The dogs barking"
blob = TextBlob(sentence)

# Lemmatize only the words tagged as verbs (VB*); keep the rest as-is
lemmatized_words = [word.lemmatize('v') if tag.startswith('VB') else word
                    for word, tag in blob.tags]
print(f"Lemmatized Sentence: {' '.join(lemmatized_words)}")
Output:
Lemmatized Sentence: The dogs bark
5. spaCy
spaCy is one of the most powerful NLP libraries in Python, known for its speed and ease of use. It provides pre-trained models for tokenization, lemmatization, POS tagging and more. spaCy's lemmatization is highly accurate and works well with complex sentence structures.
For installation run the following command:
pip install spacy
python -m spacy download en_core_web_sm
Let's see an example:
Python
import spacy

# Load the small English pipeline (tokenizer, tagger, lemmatizer, ...)
nlp = spacy.load('en_core_web_sm')

doc = nlp("The cats are sitting")
for token in doc:
    print(token.text, token.lemma_)
Output:
The the
cats cat
are be
sitting sit
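Because spaCy's pipeline already performs POS tagging, no manual tag mapping is needed. A small sketch printing each token's inferred POS next to its lemma:
Python
import spacy

nlp = spacy.load('en_core_web_sm')

# token.pos_ is the POS the model inferred; the lemmatizer uses it internally
for token in nlp("The dogs were barking at the striped cats"):
    print(token.text, token.pos_, token.lemma_)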
6. Gensim
Gensim is widely used for topic modeling and document similarity in large text corpora. Its original lemmatize utility relied on the Pattern library and focused on tokens like nouns, verbs, adjectives and adverbs, but it was removed in Gensim 4.x. A common approach today, used in the example below, is to tokenize with Gensim's simple_preprocess and lemmatize the tokens with NLTK's WordNetLemmatizer; this combination is suitable for large-scale text processing.
Installation:
!pip install gensim nltk
Let's see an example:
Python
import nltk
from nltk.stem import WordNetLemmatizer
from gensim.utils import simple_preprocess

nltk.download('wordnet')
nltk.download('omw-1.4')

lemmatizer = WordNetLemmatizer()

text = "The cats are running and the dogs were barking."
# simple_preprocess lowercases the text and splits it into tokens
tokens = simple_preprocess(text)

# Without a POS argument, the lemmatizer treats every token as a noun
lemmatized_tokens = [lemmatizer.lemmatize(word) for word in tokens]
print("Original Tokens:", tokens)
print("Lemmatized Tokens:", lemmatized_tokens)
Output:
Original Tokens: ['the', 'cats', 'are', 'running', 'and', 'the', 'dogs', 'were', 'barking']
Lemmatized Tokens: ['the', 'cat', 'are', 'running', 'and', 'the', 'dog', 'were', 'barking']
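Note that the verbs ('are', 'running', 'were', 'barking') are left untouched because the lemmatizer defaulted to treating every token as a noun. One way to fix this, sketched below, is to POS-tag the tokens from simple_preprocess before lemmatizing (wn_pos is just an illustrative mapping):
Python
import nltk
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from gensim.utils import simple_preprocess

nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

lemmatizer = WordNetLemmatizer()
tokens = simple_preprocess("The cats are running and the dogs were barking.")

# Tag each token, then map the tag's first letter to a WordNet POS code
wn_pos = {'V': wordnet.VERB, 'J': wordnet.ADJ, 'R': wordnet.ADV}
lemmas = [lemmatizer.lemmatize(word, wn_pos.get(tag[0], wordnet.NOUN))
          for word, tag in pos_tag(tokens)]
print(lemmas)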
With these techniques, we can easily perform lemmatization in Python and use it in real-world projects.