The pattern.it module contains a fast part-of-speech tagger for Italian (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for Italian verb conjugation and noun singularization & pluralization.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.
Italian nouns and adjectives inflect according to gender. The gender() function predicts the gender (MALE, FEMALE, PLURAL) of a given noun with about 92% accuracy:
>>> from pattern.it import gender, MALE, FEMALE, PLURAL
>>> print gender('gatti')
(MALE, PLURAL)
The article() function returns the article (INDEFINITE or DEFINITE) inflected by gender (e.g., il gatto → i gatti).
>>> from pattern.it import article, DEFINITE, MALE, PLURAL
>>> print article('gatti', DEFINITE, gender=(MALE, PLURAL))
i
Noun singularization & pluralization
For Italian nouns there is singularize() and pluralize(). The implementation is slightly less robust than the English version (accuracy 84% for singularization and 93% for pluralization).
>>> from pattern.it import singularize, pluralize
>>>
>>> print singularize('gatti')
>>> print pluralize('gatto')
gatto
gatti
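To give a feel for what rule-based pluralization involves, here is a minimal sketch that swaps the final vowel of regular nouns. It is not the algorithm used by pattern.it; the rule table and the pluralize_sketch() helper are hypothetical and cover only the most regular endings.

```python
# Toy suffix rules for regular Italian nouns (NOT pattern.it's algorithm).
# Each pair is (singular ending, plural ending).
PLURAL_RULES = [
    ("o", "i"),  # gatto -> gatti
    ("a", "e"),  # casa  -> case
    ("e", "i"),  # cane  -> cani
]

def pluralize_sketch(noun):
    """Guess a plural by swapping the final vowel (hypothetical helper)."""
    for singular_end, plural_end in PLURAL_RULES:
        if noun.endswith(singular_end):
            return noun[:-len(singular_end)] + plural_end
    # No rule matched: treat the noun as invariable (e.g., loanwords like "film").
    return noun

print(pluralize_sketch("gatto"))  # gatti
```

Irregular nouns show why such rules fall short of 100% accuracy: the sketch turns uomo into "uomi", whereas the correct plural is uomini.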
For Italian verbs there is conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 1,250 common Italian verbs, mined from Wiktionary. For unknown verbs it will fall back to a rule-based approach with an accuracy of about 86%.
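The lexicon-first, rules-second strategy can be illustrated with a small sketch. This is only an assumption about the general shape of such a lookup, not pattern.it's code: the three-entry lexicon, the fallback rules and lemma_sketch() are all hypothetical.

```python
# Tiny hand-written lexicon (hypothetical; the real lexicon is far larger).
LEXICON = {
    "sono": "essere",
    "era": "essere",
    "fu": "essere",
}

# Hypothetical fallback rules: (inflected ending, infinitive ending).
# Longer endings come first so that "iamo" is tried before "o".
FALLBACK_RULES = [
    ("iamo", "are"),
    ("ano", "are"),
    ("o", "are"),
]

def lemma_sketch(verb):
    """Look the verb up in the lexicon, else guess the infinitive by suffix."""
    if verb in LEXICON:
        return LEXICON[verb]
    for ending, infinitive_ending in FALLBACK_RULES:
        if verb.endswith(ending):
            return verb[:-len(ending)] + infinitive_ending
    return verb  # no rule matched: return the input unchanged

print(lemma_sketch("sono"))     # essere (lexicon hit)
print(lemma_sketch("parlano"))  # parlare (rule-based guess)
```

Checking the lexicon first matters for irregular verbs: without the entry for sono, the "o" rule would guess "sonare". The rules above also assume every unknown verb ends in -are, so a verb such as dormire would be guessed wrongly, which is the kind of error a rule-based fallback has to accept.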
Italian verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the FUTURE tense, the IMPERATIVE, CONDITIONAL and SUBJUNCTIVE mood and the PERFECTIVE aspect:
>>> from pattern.it import conjugate
>>> from pattern.it import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
>>>
>>> print conjugate('sono', INFINITIVE)
>>> print conjugate('sono', PRESENT, 1, SG, mood=SUBJUNCTIVE)
>>> print conjugate('sono', PAST, 3, SG)
>>> print conjugate('sono', PAST, 3, SG, aspect=PERFECTIVE)
essere
sia
era
fu
For PAST tense + PERFECTIVE aspect we can also use PRETERITE (passato remoto). For PAST tense + IMPERFECTIVE aspect we can also use IMPERFECT (imperfetto).
>>> from pattern.it import conjugate
>>> from pattern.it import IMPERFECT, PRETERITE
>>>
>>> print conjugate('sono', IMPERFECT, 3, SG)
>>> print conjugate('sono', PRETERITE, 3, SG)
era
fu
The conjugate() function takes optional parameters for tense, person, number, mood and aspect, as in the examples above. Instead of optional parameters, a single short alias, or PARTICIPLE or PAST+PARTICIPLE can also be given. With no parameters, the infinitive form of the verb is returned.
Italian adjectives inflect with suffixes -o → -i (masculine) and -a → -e (feminine), with some exceptions (e.g., grande → i grandi felini). You can get the base form with the predicative() function. A statistical approach is used with an accuracy of 88%.
>>> from pattern.it import predicative
>>> print predicative('grandi')
grande
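The suffix pattern described above can be sketched in the forward direction, from base form to inflected form. This is an illustration of the regular pattern only, not pattern.it's implementation; inflect_sketch() and the MALE and FEMALE stand-in values are hypothetical.

```python
# Hypothetical stand-ins for the gender constants exported by pattern.it.
MALE, FEMALE = "m", "f"

def inflect_sketch(adjective, gender=MALE, plural=False):
    """Inflect a base-form adjective using the regular suffix pattern."""
    if adjective.endswith("o"):
        stem = adjective[:-1]
        if gender == FEMALE:
            return stem + ("e" if plural else "a")  # nera, nere
        return stem + ("i" if plural else "o")      # nero, neri
    if adjective.endswith("e"):
        # Adjectives like "grande" share one form per number for both genders.
        return (adjective[:-1] + "i") if plural else adjective
    return adjective  # other endings are left unchanged in this sketch

print(inflect_sketch("nero", FEMALE, plural=True))   # nere
print(inflect_sketch("grande", MALE, plural=True))   # grandi
```

Going in the other direction is ambiguous: nere and grande both end in -e, but only the first is an inflected form. That ambiguity is presumably one reason a statistical approach is used to recover the base form.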
For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to manipulate Text objects.
>>> from pattern.it import parse, split
>>>
>>> s = parse('Il gatto nero faceva le fusa.')
>>> for sentence in split(s):
>>>     print sentence
Sentence('Il/DT/B-NP/O gatto/NN/I-NP/O nero/JJ/I-NP/O'
'faceva/VB/B-VP/O'
'le/DT/B-NP/O fusa/NN/I-NP/O ././O/O')
The parser is mined from Wiktionary. The accuracy is around 92%.
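The slash-separated tokens shown in the Sentence output above can also be picked apart with plain string handling, without pattern installed. The sketch below assumes the four fields are word, part-of-speech tag, chunk tag and preposition tag, following the pattern.en convention; read_tagged() is a hypothetical helper and not part of the library.

```python
def read_tagged(tagged):
    """Split a tagged string into (word, pos, chunk, pnp) tuples."""
    tokens = []
    for token in tagged.split():
        # Split from the right so a slash inside the word itself is preserved.
        word, pos, chunk, pnp = token.rsplit("/", 3)
        tokens.append((word, pos, chunk, pnp))
    return tokens

tagged = ("Il/DT/B-NP/O gatto/NN/I-NP/O nero/JJ/I-NP/O "
          "faceva/VB/B-VP/O le/DT/B-NP/O fusa/NN/I-NP/O ././O/O")

# Keep only the words whose part-of-speech tag marks them as nouns.
nouns = [word for word, pos, chunk, pnp in read_tagged(tagged)
         if pos.startswith("NN")]
print(nouns)  # ['gatto', 'fusa']
```

For anything beyond a quick filter like this, the Text and Sentence objects returned by split() and parsetree() are the better tool.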
There's no sentiment() function for Italian yet.