Tasks
-----

We propose several tasks for Arabic Semantic Labeling. The tasks span both word sense disambiguation (WSD) and Semantic Role Labeling (SRL). Both sets of tasks will be evaluated on data derived from the same data set, the test set.

We propose 3 subtasks for WSD, all of which will have only test data for evaluation and trial data for formatting purposes. The data will be taken from the Arabic Treebank 3 v.2 text and will be roughly 3000 words long:

1. The first task is to discover the different senses of nouns and verbs in the data without associating labels with those senses; it is therefore a sense discrimination task. Participants will be required to identify the number of distinct senses for nouns and verbs without labeling the senses they identify. The assumption is that each word occurrence belongs to one of the identified senses. The reference senses will be derived from the Arabic WordNet, which corresponds to English WordNet 2.0. There will be two levels of granularity, coarse-grained and fine-grained.

2. The second task is to annotate all nouns and verbs in the data with Arabic WordNet senses (provided with the test data, and also accessible via the web at http://www.globalwordnet.org/AWN). Each verb and noun must be annotated with its sense index and/or offset from the Arabic WordNet.

3. The third task is to annotate all nouns and verbs in the data with English WordNet senses.
   a. Participants will be required to link the Arabic nouns and verbs to their corresponding sense(s) in the English WordNet 2.0.
   b. An English translation corpus will be provided along with the trial/test data.
   c. A bilingual word list will also be provided.

We propose 2 subtasks for Semantic Role Labeling (SRL). These subtasks will have trial, training and test data available:

4. Identifying arguments in a sentence. Participants are required to identify all the constituents in a constituency tree that should be annotated with argument roles related to some predetermined verbs.

5. Automatic annotation of all arguments. Participants are required to identify and label all the constituents in a constituency tree that should be annotated with both numbered argument roles and ARGM roles related to some predetermined verbs.

Data
----

The data will be Arabic Treebank 3 v.2 data, which is newswire in Modern Standard Arabic. The data will be presented in ASCII encoding, using the Buckwalter transliteration scheme. The data will be unvowelised and tokenized according to the Arabic Treebank clitic tokenization scheme.

We will provide code for converting UTF-8 and CP1256 encodings to the Buckwalter transliteration scheme. We will also provide code for tokenization, POS tagging and Base Phrase chunking of the Arabic text; the package can be downloaded from http://www.cs.columbia.edu/~mdiab/ASVMTools.tar.gz.

We will use only the 100 most frequent verbs in this set to draw training, trial (for the semantic role labeling tasks) and test data (for the semantic role labeling and WSD tasks).

The data is manually annotated, both syntactically and morphologically. The syntactic trees are constituency trees.

A preliminary version of the Arabic WordNet will be available.

Evaluation metric
-----------------

SRL: Conlleval metrics of precision, recall and F-measure.

WSD: Scorer 2.0 metrics of precision, recall and F-measure on both coarse-grained and fine-grained sense distinctions.

Dates
-----

Nov 20th     Selecting the data
Feb 5th      Delivering trial data
March 1st    Delivering the training and test data
April 10th   Competition deadline

People
------

Mona Diab: Columbia University
Christiane Fellbaum: Princeton University
Mohamed Maamouri: LDC, University of Pennsylvania
Martha Palmer: University of Colorado, Boulder
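
Note on the evaluation metrics: both the SRL and WSD evaluations above report precision, recall and F-measure. The sketch below only illustrates how these three figures relate to each other; it is not the Conlleval or Scorer 2.0 implementation. It assumes each annotation can be represented as a hashable tuple, and the function name, the tuple layout and the sample items are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    gold: Iterable[Hashable], predicted: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Compare predicted annotations with gold annotations by exact match."""
    gold_set = set(gold)
    pred_set = set(predicted)
    correct = len(gold_set & pred_set)

    # Precision: share of predicted items that are correct.
    precision = correct / len(pred_set) if pred_set else 0.0
    # Recall: share of gold items that were found.
    recall = correct / len(gold_set) if gold_set else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical SRL-style items: (sentence id, verb id, constituent span, role).
gold_items = [
    ("s1", "v1", (0, 2), "ARG0"),
    ("s1", "v1", (3, 5), "ARG1"),
    ("s1", "v1", (6, 8), "ARGM-TMP"),
    ("s2", "v2", (0, 1), "ARG0"),
]
predicted_items = [
    ("s1", "v1", (0, 2), "ARG0"),      # correct
    ("s1", "v1", (3, 5), "ARG2"),      # right span, wrong role
    ("s2", "v2", (0, 1), "ARG0"),      # correct
]

p, r, f = precision_recall_f1(gold_items, predicted_items)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

For the argument identification task (task 4) the role field could simply be dropped from each tuple, so that only the spans are compared; for the WSD tasks a tuple of word position and sense identifier would play the same part.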