Showing content from https://github.com/clips/pattern/wiki/pattern-es below:

pattern es · clips/pattern Wiki · GitHub

The pattern.es module contains a fast part-of-speech tagger for Spanish (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for Spanish verb conjugation and noun singularization & pluralization.

It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.

The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.  

Noun singularization & pluralization

For Spanish nouns there is singularize() and pluralize(). The implementation is slightly less robust than the English version (accuracy 94% for singularization and 78% for pluralization).

>>> from pattern.es import singularize, pluralize
>>>
>>> print(singularize('gatos'))
>>> print(pluralize('gato'))

gato
gatos 
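To give a feel for what a rule-based pluralizer does, here is a minimal sketch covering three regular Spanish patterns. It is not the pattern.es implementation; the name pluralize_sketch is hypothetical, and the sketch ignores exceptions such as accent shifts and invariant nouns (e.g., lunes).

```python
def pluralize_sketch(noun):
    """Pluralize a regular Spanish noun with three textbook rules.

    Hypothetical helper for illustration only; pattern.es uses its own
    (more complete) rule set.
    """
    if noun.endswith("z"):
        return noun[:-1] + "ces"   # luz -> luces
    if noun[-1] in "aeiou":
        return noun + "s"          # gato -> gatos
    return noun + "es"             # ciudad -> ciudades


for word in ("gato", "luz", "ciudad"):
    print(word, "->", pluralize_sketch(word))
```

The gap between rules this simple and real usage is one plausible reason a rule-based approach stays well below 100% accuracy.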

Verb conjugation

For Spanish verbs there is conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 600 common Spanish verbs, compiled by Fred Jehle. For verbs not in the lexicon, it falls back to a rule-based approach with an accuracy of about 84%.

Spanish verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the FUTURE and CONDITIONAL tenses, the IMPERATIVE and SUBJUNCTIVE moods, and the PERFECTIVE aspect:

>>> from pattern.es import conjugate
>>> from pattern.es import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
>>>
>>> print(conjugate('soy', INFINITIVE))
>>> print(conjugate('soy', PRESENT, 1, SG, mood=SUBJUNCTIVE))
>>> print(conjugate('soy', PAST, 3, SG))
>>> print(conjugate('soy', PAST, 3, SG, aspect=PERFECTIVE))

ser
sea
era 
fue   

For PAST tense + PERFECTIVE aspect we can also use PRETERITE. For PAST tense + IMPERFECTIVE aspect we can also use IMPERFECT:

>>> from pattern.es import conjugate
>>> from pattern.es import IMPERFECT, PRETERITE
>>>
>>> print(conjugate('soy', IMPERFECT, 3, SG))
>>> print(conjugate('soy', PRETERITE, 3, SG))

era
fue   

The conjugate() function takes the following optional parameters:

| Tense | Person | Number | Mood | Aspect | Alias | Example |
|---|---|---|---|---|---|---|
| INFINITIVE | None | None | None | None | "inf" | ser |
| PRESENT | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg" | yo __soy__ |
| PRESENT | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg" | tú __eres__ |
| PRESENT | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg" | él __es__ |
| PRESENT | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl" | nosotros __somos__ |
| PRESENT | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl" | vosotros __sois__ |
| PRESENT | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl" | ellos __son__ |
| PRESENT | None | None | INDICATIVE | PROGRESSIVE | "part" | siendo |
| PRESENT | 2 | SG | IMPERATIVE | IMPERFECTIVE | "2sg!" | |
| PRESENT | 2 | PL | IMPERATIVE | IMPERFECTIVE | "2pl!" | sed |
| PRESENT | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sg?" | yo __sea__ |
| PRESENT | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sg?" | tú __seas__ |
| PRESENT | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sg?" | él __sea__ |
| PRESENT | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1pl?" | nosotros __seamos__ |
| PRESENT | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2pl?" | vosotros __seáis__ |
| PRESENT | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3pl?" | ellos __sean__ |
| PAST | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgp" | yo __era__ |
| PAST | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgp" | tú __eras__ |
| PAST | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgp" | él __era__ |
| PAST | 1 | PL | INDICATIVE | IMPERFECTIVE | "1ppl" | nosotros __éramos__ |
| PAST | 2 | PL | INDICATIVE | IMPERFECTIVE | "2ppl" | vosotros __erais__ |
| PAST | 3 | PL | INDICATIVE | IMPERFECTIVE | "3ppl" | ellos __eran__ |
| PAST | None | None | INDICATIVE | PROGRESSIVE | "ppart" | sido |
| PAST | 1 | SG | INDICATIVE | PERFECTIVE | "1sgp+" | yo __fui__ |
| PAST | 2 | SG | INDICATIVE | PERFECTIVE | "2sgp+" | tú __fuiste__ |
| PAST | 3 | SG | INDICATIVE | PERFECTIVE | "3sgp+" | él __fue__ |
| PAST | 1 | PL | INDICATIVE | PERFECTIVE | "1ppl+" | nosotros __fuimos__ |
| PAST | 2 | PL | INDICATIVE | PERFECTIVE | "2ppl+" | vosotros __fuisteis__ |
| PAST | 3 | PL | INDICATIVE | PERFECTIVE | "3ppl+" | ellos __fueron__ |
| PAST | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sgp?" | yo __fuera__ |
| PAST | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sgp?" | tú __fueras__ |
| PAST | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sgp?" | él __fuera__ |
| PAST | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1ppl?" | nosotros __fuéramos__ |
| PAST | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2ppl?" | vosotros __fuerais__ |
| PAST | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3ppl?" | ellos __fueran__ |
| FUTURE | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgf" | yo __seré__ |
| FUTURE | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgf" | tú __serás__ |
| FUTURE | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgf" | él __será__ |
| FUTURE | 1 | PL | INDICATIVE | IMPERFECTIVE | "1plf" | nosotros __seremos__ |
| FUTURE | 2 | PL | INDICATIVE | IMPERFECTIVE | "2plf" | vosotros __seréis__ |
| FUTURE | 3 | PL | INDICATIVE | IMPERFECTIVE | "3plf" | ellos __serán__ |
| CONDITIONAL | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg->" | yo __sería__ |
| CONDITIONAL | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg->" | tú __serías__ |
| CONDITIONAL | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg->" | él __sería__ |
| CONDITIONAL | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl->" | nosotros __seríamos__ |
| CONDITIONAL | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl->" | vosotros __seríais__ |
| CONDITIONAL | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl->" | ellos __serían__ |

Instead of optional parameters, a single short alias, or PARTICIPLE or PAST+PARTICIPLE can also be given. With no parameters, the infinitive form of the verb is returned.
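The relationship between a short alias and the five optional parameters can be sketched as a plain lookup table. The snippet below is an illustration only: the string constants are stand-ins for the names exported by pattern.es (their real values are an assumption), expand_alias is a hypothetical helper, and only a handful of rows from the conjugation table are included.

```python
# Stand-in constants (assumption: pattern.es exports names like these,
# but the values used here are made up for the sketch).
INFINITIVE, PRESENT, PAST, FUTURE = "infinitive", "present", "past", "future"
SG, PL = "singular", "plural"
INDICATIVE, SUBJUNCTIVE = "indicative", "subjunctive"
IMPERFECTIVE, PERFECTIVE = "imperfective", "perfective"

# alias -> (tense, person, number, mood, aspect), copied from a few table rows.
ALIASES = {
    "inf":   (INFINITIVE, None, None, None, None),
    "1sg":   (PRESENT, 1, SG, INDICATIVE, IMPERFECTIVE),
    "3sg?":  (PRESENT, 3, SG, SUBJUNCTIVE, IMPERFECTIVE),
    "3sgp":  (PAST, 3, SG, INDICATIVE, IMPERFECTIVE),
    "3sgp+": (PAST, 3, SG, INDICATIVE, PERFECTIVE),
    "1plf":  (FUTURE, 1, PL, INDICATIVE, IMPERFECTIVE),
}


def expand_alias(alias):
    """Return the (tense, person, number, mood, aspect) tuple for an alias."""
    try:
        return ALIASES[alias]
    except KeyError:
        raise ValueError("unknown alias: %r" % alias)


print(expand_alias("3sgp+"))
```

Read against the table, the alias "3sgp+" stands for the same form as PAST, 3, SG with aspect=PERFECTIVE, which for ser is fue.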

Reference: Jehle, F. (2012). Spanish Verb Forms. Retrieved from: http://users.ipfw.edu/jehle/verblist.htm.

Attributive & predicative adjectives 

Spanish adjectives inflect with an -o, -a, -os, -as, or -es suffix (e.g., curioso → los gatos curiosos) depending on gender and number. You can get the base form with the predicative() function, or vice versa with attributive(). For predicative(), a statistical approach is used with an accuracy of 93%. For attributive(), you need to supply the gender (MALE, FEMALE, NEUTRAL and/or PLURAL).

>>> from pattern.es import attributive, predicative
>>> from pattern.es import FEMALE, PLURAL 
>>>
>>> print(predicative('curiosos'))
>>> print(attributive('curioso', gender=FEMALE))
>>> print(attributive('curioso', gender=FEMALE+PLURAL))

curioso
curiosa 
curiosas  
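The way gender and number flags combine can be illustrated with a small sketch for regular adjectives ending in -o. This is not how pattern.es implements attributive(): the flag values below are hypothetical stand-ins, attributive_sketch is an invented name, and adjectives outside the regular -o class are returned unchanged.

```python
# Hypothetical stand-in flags; combining them with + yields e.g. "fp".
MALE, FEMALE, PLURAL = "m", "f", "p"


def attributive_sketch(adjective, gender=MALE):
    """Inflect a regular Spanish adjective ending in -o (illustration only)."""
    if not adjective.endswith("o"):
        return adjective  # irregular or invariant forms are out of scope here
    stem = adjective[:-1]
    ending = "a" if FEMALE in gender else "o"
    if PLURAL in gender:
        ending += "s"
    return stem + ending


print(attributive_sketch("curioso", gender=FEMALE))
print(attributive_sketch("curioso", gender=FEMALE + PLURAL))
```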

Parser

For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to manipulate Text objects.

>>> from pattern.es import parse, split
>>>
>>> s = parse('El gato negro se sienta en la estera.')
>>> for sentence in split(s):
...     print(sentence)

Sentence('El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O'
         'se/PRP/B-NP/O sienta/VB/B-VP/O'
         'en/IN/B-PP/B-PNP la/DT/B-NP/I-PNP estera/NN/I-NP/I-PNP ././O/O')
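The slash-separated tokens in that output can also be taken apart without any library. The sketch below assumes each token has the four fields visible in the example (word, part-of-speech tag, chunk tag, prepositional-phrase tag); split_tagged is a hypothetical helper, not part of pattern.es.

```python
def split_tagged(tagged):
    """Split 'word/POS/chunk/PNP' tokens into 4-tuples (illustration only)."""
    rows = []
    for token in tagged.split():
        # Split from the right so a token such as '././O/O' keeps '.' as the word.
        word, pos, chunk, pnp = token.rsplit("/", 3)
        rows.append((word, pos, chunk, pnp))
    return rows


tagged = ("El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O "
          "se/PRP/B-NP/O sienta/VB/B-VP/O "
          "en/IN/B-PP/B-PNP la/DT/B-NP/I-PNP estera/NN/I-NP/I-PNP ././O/O")

# Collect the words tagged as nouns.
nouns = [word for word, pos, _, _ in split_tagged(tagged) if pos.startswith("NN")]
print(nouns)
```

For anything beyond quick inspection, the Text and Sentence objects returned by split() and parsetree() are the more convenient interface.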

The parser is trained on the Spanish portion of Wikicorpus, using 1.5M words from the tagged sections 10,000–15,000. The accuracy is around 92%. The original Parole tagset is mapped to the Penn Treebank tagset. If you need to work with the original tags, you can call parse() with the optional parameter tagset="parole".

Reference: Reese, S., Boleda, G., Cuadros, M., Padró, L., Rigau, G. (2010). Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus. Proceedings of LREC'10.

There's no sentiment() function for Spanish yet.
