TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob
objects as if they were Python strings that learned how to do Natural Language Processing.
First, the import.
>>> from textblob import TextBlob
Let’s create our first TextBlob.
>>> wiki = TextBlob("Python is a high-level, general-purpose programming language.")

Part-of-speech Tagging
Part-of-speech tags can be accessed through the tags property.
>>> wiki.tags
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]

Sentiment Analysis
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where -1.0 is most negative and 1.0 is most positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
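The shape of that return value can be pictured with a plain namedtuple from the standard library. This is only a sketch: the Sentiment type below is a local stand-in rather than the one TextBlob defines, and the scores are made-up illustrative values.

```python
from collections import namedtuple

# Local stand-in for the tuple type described above (not imported from TextBlob).
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Illustrative values only; they are not computed from any text.
result = Sentiment(polarity=0.39, subjectivity=0.44)

# Fields can be read by name or unpacked like any other tuple.
polarity, subjectivity = result

# Check the documented ranges: polarity in [-1.0, 1.0], subjectivity in [0.0, 1.0].
in_range = -1.0 <= result.polarity <= 1.0 and 0.0 <= result.subjectivity <= 1.0
print(result.polarity, subjectivity, in_range)
```

Because the result is a tuple, it can be passed to code that expects a plain (polarity, subjectivity) pair without conversion.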
>>> testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
>>> testimonial.sentiment
Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
>>> testimonial.sentiment.polarity
0.39166666666666666

Tokenization
You can break TextBlobs into words or sentences.
>>> zen = TextBlob(
...     "Beautiful is better than ugly. "
...     "Explicit is better than implicit. "
...     "Simple is better than complex."
... )
>>> zen.words
WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])
>>> zen.sentences
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
Sentence objects have the same properties and methods as TextBlobs.
>>> for sentence in zen.sentences:
...     print(sentence.sentiment)
For more advanced tokenization, see the Advanced Usage guide.
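To get a rough feel for what splitting text into words and sentences involves, here is a naive regex-based sketch using only the standard library. It is not the tokenizer TextBlob uses; naive_words and naive_sentences are hypothetical helpers written for this illustration.

```python
import re

text = (
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)


def naive_words(s):
    """Return runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", s)


def naive_sentences(s):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", s.strip())


words = naive_words(text)
sentences = naive_sentences(text)
print(words[:5])
print(sentences)
```

A sketch like this breaks on abbreviations such as "Dr. Smith" and splits hyphenated words such as "high-level", which is one reason to rely on a real tokenizer.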
Words Inflection and Lemmatization
Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str; unicode on Python 2) with useful methods, e.g. for word inflection.
>>> sentence = TextBlob("Use 4 spaces per indentation level.")
>>> sentence.words
WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
>>> sentence.words[2].singularize()
'space'
>>> sentence.words[-1].pluralize()
'levels'
Words can be lemmatized by calling the lemmatize method.
>>> from textblob import Word
>>> w = Word("octopi")
>>> w.lemmatize()
'octopus'
>>> w = Word("went")
>>> w.lemmatize("v")  # Pass in WordNet part of speech (verb)
'go'

WordNet Integration
You can access the synsets for a Word via the synsets property or the get_synsets method, optionally passing in a part of speech.
>>> from textblob import Word
>>> from textblob.wordnet import VERB
>>> word = Word("octopus")
>>> word.synsets
[Synset('octopus.n.01'), Synset('octopus.n.02')]
>>> Word("hack").get_synsets(pos=VERB)
[Synset('chop.v.05'), Synset('hack.v.02'), Synset('hack.v.03'), Synset('hack.v.04'), Synset('hack.v.05'), Synset('hack.v.06'), Synset('hack.v.07'), Synset('hack.v.08')]
You can access the definitions for each synset via the definitions property or the define() method, which can also take an optional part-of-speech argument.
>>> Word("octopus").definitions
['tentacles of octopus prepared as food', 'bottom-living cephalopod having a soft oval body with eight long tentacles']
You can also create synsets directly.
>>> from textblob.wordnet import Synset
>>> octopus = Synset("octopus.n.02")
>>> shrimp = Synset("shrimp.n.03")
>>> octopus.path_similarity(shrimp)
0.1111111111111111
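As the NLTK documentation describes it, path similarity is based on the shortest path connecting two senses in the is-a (hypernym) taxonomy, scored as 1 / (distance + 1). The sketch below applies that formula to a toy taxonomy. The graph is made up for illustration and is not WordNet data, and toy_path_similarity is a hypothetical helper, so its score will not match the value above.

```python
from collections import deque

# Made-up is-a links (child -> parent) for illustration only; not WordNet data.
TOY_HYPERNYMS = {
    "octopus": "cephalopod",
    "cephalopod": "mollusk",
    "mollusk": "invertebrate",
    "shrimp": "crustacean",
    "crustacean": "arthropod",
    "arthropod": "invertebrate",
    "invertebrate": "animal",
}


def toy_path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """Return 1 / (shortest path length + 1), or None if no path exists."""
    # Treat every is-a link as an undirected edge.
    neighbors = {}
    for child, parent in hypernyms.items():
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)

    # Breadth-first search finds the shortest path in an unweighted graph.
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (dist + 1)
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


score = toy_path_similarity("octopus", "shrimp")
print(score)
```

In this toy graph the two nodes are six edges apart, so the score is 1/7. By the same formula, the 0.111 (that is, 1/9) reported above would correspond to a path of eight edges in WordNet.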
For more information on the WordNet API, see the NLTK documentation on the WordNet Interface.
WordLists
A WordList is just a Python list with additional methods.
>>> animals = TextBlob("cat dog octopus")
>>> animals.words
WordList(['cat', 'dog', 'octopus'])
>>> animals.words.pluralize()
WordList(['cats', 'dogs', 'octopodes'])

Spelling Correction
Use the correct() method to attempt spelling correction.
>>> b = TextBlob("I havv goood speling!")
>>> print(b.correct())
I have good spelling!
Word objects have a Word.spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.
>>> from textblob import Word
>>> w = Word("falibility")
>>> w.spellcheck()
[('fallibility', 1.0)]
Spelling correction is based on Peter Norvig’s “How to Write a Spelling Corrector”[1] as implemented in the pattern library. It is about 70% accurate [2].
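The core idea of Norvig's approach can be sketched in a few lines: generate every string one edit away from the input, keep the ones that are known words, and pick the most frequent. The version below is a simplified illustration, not the code TextBlob or pattern ships. WORD_COUNTS is a tiny made-up vocabulary, correct_word is a hypothetical helper, and only edit distance one is considered.

```python
from collections import Counter

# Tiny made-up word counts; a real corrector is trained on a large corpus.
WORD_COUNTS = Counter({"i": 50, "have": 40, "good": 30, "spelling": 5})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def correct_word(word):
    """Return the most frequent known word within one edit, else the input."""
    word = word.lower()
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print([correct_word(w) for w in ["havv", "goood", "speling"]])
```

Words that are already in the vocabulary are returned unchanged, and words with no known neighbor fall through untouched, which is how a corrector of this kind avoids "fixing" names and rare terms it has never seen.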
Get Word and Noun Phrase Frequencies
There are two ways to get the frequency of a word or noun phrase in a TextBlob.
The first is through the word_counts dictionary.
>>> monty = TextBlob("We are no longer the Knights who say Ni. "
...                  "We are now the Knights who say Ekki ekki ekki PTANG.")
>>> monty.word_counts['ekki']
3
If you access the frequencies this way, the search will not be case sensitive, and words that are not found will have a frequency of 0.
The second way is to use the count() method.
>>> monty.words.count('ekki')
3
You can specify whether or not the search should be case-sensitive (default is False).
>>> monty.words.count('ekki', case_sensitive=True)
2
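Both behaviors can be approximated with the standard library, which may help clarify what each lookup does. The sketch below is an assumption-laden stand-in rather than TextBlob's implementation: it tokenizes with a simple regex, and count_word is a hypothetical helper that mirrors the case_sensitive flag described above.

```python
import re
from collections import Counter

text = (
    "We are no longer the Knights who say Ni. "
    "We are now the Knights who say Ekki ekki ekki PTANG."
)

# Naive tokenization: runs of word characters, original case preserved.
words = re.findall(r"\w+", text)

# Case-insensitive lookup table: lowercase every word before counting.
# A Counter returns 0 for missing keys instead of raising KeyError.
counts = Counter(w.lower() for w in words)


def count_word(word_list, target, case_sensitive=False):
    """Count occurrences of target, ignoring case unless asked not to."""
    if case_sensitive:
        return sum(1 for w in word_list if w == target)
    return sum(1 for w in word_list if w.lower() == target.lower())


print(counts["ekki"], counts["shrubbery"])
print(count_word(words, "ekki"), count_word(words, "ekki", case_sensitive=True))
```

The dictionary-style lookup pays the counting cost once up front, while the count-style call scans the word list on every query; for repeated lookups on long texts the first is usually the cheaper pattern.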
Each of these methods can also be used with noun phrases.
>>> wiki.noun_phrases.count('python')
1

Parsing
Use the parse() method to parse the text.
>>> b = TextBlob("And now for something completely different.")
>>> print(b.parse())
And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
By default, TextBlob uses pattern’s parser [3].
TextBlobs Are Like Python Strings!
You can use Python’s substring syntax.
>>> zen[0:19]
TextBlob("Beautiful is better")
You can use common string methods.
>>> zen.upper()
TextBlob("BEAUTIFUL IS BETTER THAN UGLY. EXPLICIT IS BETTER THAN IMPLICIT. SIMPLE IS BETTER THAN COMPLEX.")
>>> zen.find("Simple")
65
You can make comparisons between TextBlobs and strings.
>>> apple_blob = TextBlob("apples")
>>> banana_blob = TextBlob("bananas")
>>> apple_blob < banana_blob
True
>>> apple_blob == "apples"
True
You can concatenate and interpolate TextBlobs and strings.
>>> apple_blob + " and " + banana_blob
TextBlob("apples and bananas")
>>> "{0} and {1}".format(apple_blob, banana_blob)
'apples and bananas'
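String-like behavior of this kind comes from Python's special methods. The sketch below shows one way such a wrapper could be written; MiniBlob is a hypothetical class invented for this illustration and says nothing about how TextBlob itself is implemented.

```python
class MiniBlob:
    """Hypothetical string-like wrapper; not TextBlob's actual class."""

    def __init__(self, text):
        self.raw = text

    def __repr__(self):
        return 'MiniBlob("{0}")'.format(self.raw)

    def __str__(self):
        # Used by str.format() and print().
        return self.raw

    @staticmethod
    def _text_of(other):
        """Return the underlying string of a MiniBlob or str, else None."""
        if isinstance(other, MiniBlob):
            return other.raw
        if isinstance(other, str):
            return other
        return None

    def __eq__(self, other):
        text = self._text_of(other)
        return NotImplemented if text is None else self.raw == text

    def __lt__(self, other):
        text = self._text_of(other)
        return NotImplemented if text is None else self.raw < text

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore one.
        return hash(self.raw)

    def __getitem__(self, index):
        # Slicing returns the wrapper type, not a bare string.
        return MiniBlob(self.raw[index])

    def __add__(self, other):
        text = self._text_of(other)
        return NotImplemented if text is None else MiniBlob(self.raw + text)


apple = MiniBlob("apples")
banana = MiniBlob("bananas")
joined = apple + " and " + banana
print(repr(joined), apple < banana, apple == "apples")
print("{0} and {1}".format(apple, banana))
```

Returning the wrapper type from slicing and concatenation is the design choice that keeps the extra methods available after each string operation.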
n-grams
The TextBlob.ngrams() method returns a list of WordList objects, each holding n successive words.
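The sliding-window idea behind n-grams is small enough to sketch with plain lists. The ngrams function below is a hypothetical standalone helper, not TextBlob's method, and it returns ordinary lists.

```python
def ngrams(words, n=3):
    """Return each run of n consecutive words as a list."""
    if n <= 0:
        return []
    # A window of size n fits at len(words) - n + 1 starting positions.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ["Now", "is", "better", "than", "never"]
trigrams = ngrams(words, n=3)
print(trigrams)
```

When n exceeds the number of words, the range is empty and the function returns an empty list.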
>>> blob = TextBlob("Now is better than never.")
>>> blob.ngrams(n=3)
[WordList(['Now', 'is', 'better']), WordList(['is', 'better', 'than']), WordList(['better', 'than', 'never'])]

Get Start and End Indices of Sentences
Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a TextBlob.
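Character offsets like these can be recovered from any matcher that reports where each match begins and ends. The sketch below uses a simple regex from the standard library on the same three-sentence text. sentence_spans is a hypothetical helper and is far cruder than a real sentence tokenizer.

```python
import re

text = (
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)


def sentence_spans(s):
    """Yield (start, end, sentence) for each sentence-like run in s."""
    # \S skips the whitespace between sentences; the run ends at the
    # first '.', '!' or '?' that follows.
    for match in re.finditer(r"\S[^.!?]*[.!?]", s):
        yield match.start(), match.end(), match.group()


spans = list(sentence_spans(text))
for start, end, sentence in spans:
    print(sentence, "->", start, end)
```

The end index is exclusive, so text[start:end] gives back exactly the sentence.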
>>> for s in zen.sentences:
...     print(s)
...     print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))
...
Beautiful is better than ugly.
---- Starts at index 0, Ends at index 30
Explicit is better than implicit.
---- Starts at index 31, Ends at index 64
Simple is better than complex.
---- Starts at index 65, Ends at index 95

Next Steps
Want to build your own text classification system? Check out the Classifiers Tutorial.
Want to use a different POS tagger or noun phrase chunker implementation? Check out the Advanced Usage guide.