Wikidata:Lexicographical data/Notability - Wikidata
Lexicographical data on Wikidata, like conceptual data in Wikidata, is subject to some standards of notability. These standards are intended to align with Wikidata's notability criteria.
Wikidata's first notability criterion depends on sitelinks, and lexemes do not carry sitelinks in the way that items do (links between Wiktionary pages are handled by the Cognate extension, not by Wikidata itself). This page therefore outlines how Wikidata's second and third notability criteria apply to lexemes and to their forms and senses.
A lexeme is generally clearly identified when a serious and publicly available source treats it separately from any other lexemes mentioned in that source, and that source is indicated somewhere on the lexeme: through an external identifier statement, a 'described by' statement, or a reference on another statement on that lexeme. For a dictionary or lexical database, this usually means that the lexeme has its own entry in that dictionary or database.
- The English lexeme brachiopod (L908300) has dedicated entries in six online dictionaries published by Oxford University Press, along with entries in four other English dictionaries and even an entry in the Estonian Language Institute's Sõnaveeb.
- The Bengali lexeme বাঘের ঘরে ঘোগের বাসা (L625095) is described in three dictionaries (one providing a gloss quote and the other two having descriptions that cannot be quoted for copyright reasons) and mentioned in three other proverb lists (one establishing a location in which the proverb is used).
- The French lexeme embrocher (L715177) has dedicated entries in every edition of the Dictionnaire de l'Académie française (Q2428961), as well as in other French dictionaries. The specific sense embrocher (L715177-S3) is sourced from one of the entries for this term in Bob (Q115774277), which is cited as the source for the earliest attested date of that sense.
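As a rough illustration of the three ways of indicating a source described above, the following Python sketch reports which of them a lexeme uses. It assumes a simplified dictionary loosely modelled on Wikibase entity JSON (a "claims" map from property ID to a list of statements, each with a "mainsnak" carrying a "datatype" and an optional "references" list); the function name and the example record are hypothetical. It only checks that a source is indicated at all: whether that source is serious and treats the lexeme distinctly remains an editorial judgment.

```python
from typing import Any

DESCRIBED_BY = "P1343"  # described by source


def source_indications(lexeme: dict[str, Any]) -> set[str]:
    """Report which ways of indicating a source appear on a simplified lexeme record.

    Hypothetical helper: the record is assumed to be a plain dict loosely shaped
    like Wikibase entity JSON, not data fetched from any real API.
    """
    found: set[str] = set()
    for prop, statements in lexeme.get("claims", {}).items():
        for statement in statements:
            # External identifier statements carry the "external-id" datatype.
            if statement["mainsnak"].get("datatype") == "external-id":
                found.add("external identifier")
            # A 'described by source' (P1343) statement names the source directly.
            if prop == DESCRIBED_BY:
                found.add("described by source")
            # Any non-empty reference list on a statement also indicates a source.
            if statement.get("references"):
                found.add("reference")
    return found


# Made-up example record: a single external identifier statement, nothing else.
example = {
    "claims": {
        "P_HYPOTHETICAL_ID": [
            {"mainsnak": {"datatype": "external-id", "datavalue": {"value": "abc123"}}}
        ]
    }
}
print(sorted(source_indications(example)))  # ['external identifier']
```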
Unlike on Wiktionary, lexemes are not automatically excluded from creation if their meanings are merely the sum of their parts, as long as there are adequate sources on them to support their existence.
- The English lexeme apple juice (L1196257) (whose page on the English Wiktionary is currently a 'translation hub') has entries in the Oxford English Dictionary, Sõnaveeb, and the Greenland Language Secretariat's online dictionary.
- Languages that frequently form noun compounds without introducing spaces between their parts may have, in dictionaries and databases describing them, entries for sums-of-parts: consider the Danish jernbanestation (L643460), Estonian raudteejaam (L383129), German Eisenbahnstation (L834621), and Swedish järnvägsstation (L410972), all of which mean 'railway station' ('railway' + 'station', combining morphemes for 'iron', 'way/route', and 'station' in the same fashion), and all of which have multiple sources describing them.
The senses of a lexeme may be clearly identified in a similar manner to lexemes themselves.
The forms of a lexeme cannot always be clearly identified as readily as the lexeme itself. (Sanskrit verbs have their Dhatu Ratnakar (Q111095523), but not all languages have enumerated, non-machine-generated paradigms; instead, references typically supply only a few forms for a word, or refer the reader to a pattern in an index that can be applied to the word, in either case relying on the reader to fill in the remaining forms.) Notwithstanding the section 'Number of forms on a lexeme' at Wikidata:Lexicographical data/Documentation/Forms, such forms may generally be added without sourcing each one separately, although sources for individual forms are certainly welcome where they exist.
Serious and publicly available sources
In addition to adding references to other statements on a lexeme, serious and publicly available sources may be indicated in at least five different ways:
More on these properties may be found at Wikidata:Lexicographical data/Documentation/Lexeme statements (under 'Properties about lexeme provenance') and Wikidata:Lexicographical data/Documentation/Senses (under 'Properties about sense provenance').
Where possible, it is generally preferable to use a source as a reference on other statements, rather than leaving it only as a described by source (P1343) or described at URL (P973) statement on a lexeme, form, or sense.
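The preference for references over bare described by source (P1343) statements could be checked roughly as follows. This is a sketch under stated assumptions: the input is a hypothetical, simplified dictionary loosely shaped like Wikibase entity JSON; references are assumed to name their source through stated in (P248); and described at URL (P973) is ignored entirely. The function lists P1343 sources that no reference on another statement cites, i.e. candidates an editor might consider turning into references. All helper names and item IDs in it are made up for illustration.

```python
from typing import Any

DESCRIBED_BY = "P1343"  # described by source
STATED_IN = "P248"  # stated in; assumed here to be how a reference names its source


def item_ids(snaks: list[dict[str, Any]]) -> set[str]:
    """Collect entity IDs from snaks whose value is an entity reference."""
    ids: set[str] = set()
    for snak in snaks:
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def uncited_described_by(lexeme: dict[str, Any]) -> set[str]:
    """Return P1343 sources that no reference on another statement cites via P248."""
    claims = lexeme.get("claims", {})
    described = item_ids([s["mainsnak"] for s in claims.get(DESCRIBED_BY, [])])
    cited: set[str] = set()
    for prop, statements in claims.items():
        if prop == DESCRIBED_BY:
            continue  # only references on *other* statements count
        for statement in statements:
            for reference in statement.get("references", []):
                cited |= item_ids(reference.get("snaks", {}).get(STATED_IN, []))
    return described - cited


def entity(qid: str) -> dict[str, Any]:
    """Build a minimal snak pointing at a (hypothetical) item ID."""
    return {"datavalue": {"value": {"id": qid}}}


# Made-up record: two P1343 sources, only one of which backs another statement.
example = {
    "claims": {
        DESCRIBED_BY: [{"mainsnak": entity("Q_DICT_A")}, {"mainsnak": entity("Q_DICT_B")}],
        "P_HYPOTHETICAL": [
            {"mainsnak": {}, "references": [{"snaks": {STATED_IN: [entity("Q_DICT_A")]}}]}
        ],
    }
}
print(uncited_described_by(example))  # {'Q_DICT_B'}
```

A flagged source is not necessarily wrong to keep as a P1343 statement; it simply may not yet support any specific claim on the lexeme.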
A list of resources that might be consulted on lexemes in different languages may be found at Wikidata:Lexicographical data/Documentation/Resources.
In some circumstances, a lexeme may be retained simply because it improves the completeness of another lexeme, form, or sense; this corresponds to Wikidata's third notability criterion, structural need.
The most common case in which lexemes are introduced to fill a structural need is the faithful completion of chains of derivation, whether through derived from lexeme (P5191) or combines lexemes (P5238).
- The derivation link between the Bengali demonstrative এ (L476061) and its Sanskrit etymon एतद् (L1134198) is filled by the Magadhi Prakrit এদং (L1134399). Although Bengali and Sanskrit each have multiple sources that should be consulted for their lexemes, this Magadhi Prakrit lexeme is described only in one section of The Origin and Development of the Bengali Language (Q97256719). Some might argue that this single passing mention is not enough to justify its existence under criterion 2, but its role as a link between related lexemes allows it to qualify under criterion 3.
- The term for a transom light (Q17342) in several languages is ultimately derived from the German phrase was ist das? (L1332007), which was borrowed into French as vasistas (L1332008) (that is, as it was heard and not as it might be processed according to German grammar), and from there into other languages. In other circumstances the phrase itself might not be notable under criterion 2 (German dictionaries may not have separate entries for "was ist das?", after all), but the need to represent this link in the derivation chain makes it notable.
- The adverb meaning 'completely/entirely' in several languages is ultimately derived from the Arabic phrase بِالكُلّ (L1319752-F3) (a preposition followed by a definite noun) through Persian (which does not appear to have borrowed the preposition or the noun separately). It could be argued that this form is notable on its own (the Arabic Ontology (Q63107058) has a separate entry for it), but the etymological need gives a clearer justification for keeping it.
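Why a linking lexeme fills a structural need can be seen by following a chain of derived from lexeme (P5191) values. The Python sketch below assumes those values have already been extracted into a plain mapping from a lexeme ID to the IDs it is derived from; the mapping only encodes the Bengali chain described above (এ from এদং from एतद्) and was not fetched from Wikidata, and the function names are hypothetical. Only the first P5191 value is followed at each step, which is a simplification, since a lexeme can have several.

```python
def derivation_chain(start: str, derived_from: dict[str, list[str]]) -> list[str]:
    """Follow the first 'derived from lexeme' link from `start` until none remains."""
    chain = [start]
    seen = {start}
    current = start
    while derived_from.get(current):
        current = derived_from[current][0]
        if current in seen:  # stop on a cycle rather than looping forever
            break
        chain.append(current)
        seen.add(current)
    return chain


def linking_lexemes(chain: list[str]) -> list[str]:
    """Intermediate lexemes: the ones that connect the start to the final etymon."""
    return chain[1:-1]


# Bengali এ (L476061) <- Magadhi Prakrit এদং (L1134399) <- Sanskrit एतद् (L1134198)
derived_from = {
    "L476061": ["L1134399"],
    "L1134399": ["L1134198"],
}
chain = derivation_chain("L476061", derived_from)
print(chain)                   # ['L476061', 'L1134399', 'L1134198']
print(linking_lexemes(chain))  # ['L1134399']
```

Whether an intermediate lexeme is actually worth keeping remains an editorial judgment; the sketch only shows where such a lexeme sits in the chain.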
Some lexemes in a particular language can properly be split into parts using combines lexemes (P5238), even though one of those parts is poorly described on its own. A source might also describe a particular part as being used only within other lexemes, without indicating any specific meaning for that part.
- For a lexeme that contains a person's name, creating a separate lexeme for that name in the same language may be justified if it is promptly linked to the larger lexeme. (The French term for Bonelli's Eagle (Q234722), described in Larousse and the OQLF's GDT, could for example have three 'combines' statements, for 'aigle', 'de', and 'Bonelli'.)
- A number of Korean lexemes ending in 하다 (L741231), such as 가세하다 (L994867), have bases (in this case, 가세/苛細 (L994866)) that resources from the National Institute of Korean Language describe as simply roots of the larger lexeme, without any indication of what these lexemes might mean on their own.
- Some word-forming suffixes in some languages (like Turkish ان/an (L1221196) or Finnish -nne (L1211962)) can have many effects that are hard to enumerate, compared with the generally more enumerable meanings of the bases to which they attach.
- Phonemes are not accepted as lexemes; they must be stored as items instead.
- Letters are accepted as lexemes, although they should generally be added as nouns and treated as such, rather than using letter (Q9788) as the lexical category.