A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
Introduction
CLIcK (Cultural and Linguistic Intelligence in Korean) is a comprehensive dataset designed to evaluate cultural and linguistic intelligence in the context of Korean language models. In an era where diverse language models are continually emerging, there is a pressing need for robust evaluation datasets, especially for non-English languages like Korean. CLIcK fills this gap by providing a rich, well-categorized dataset focusing on both cultural and linguistic aspects, enabling a nuanced assessment of Korean language models.
News
The CLIcK benchmark comprises two broad categories, Culture and Language, which are further divided into 11 fine-grained subcategories.
Categories
Language
Culture
CLIcK was developed using two human-centric approaches:
The dataset is organized as follows, with each subcategory containing relevant JSON files:
CLIcK
└── Dataset
    ├── Culture
    │   └── [Each cultural subcategory with associated JSON files]
    └── Language
        └── [Each language subcategory with associated JSON files]
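The layout above can be walked with a few lines of Python. The following is a minimal sketch, not part of the CLIcK repository: the function name `load_click_items` is hypothetical, and it assumes a local copy of the repository in which each subcategory is a directory directly under `Dataset/Culture` or `Dataset/Language` and every JSON file in it is valid UTF-8 JSON. It makes no assumption about the fields inside each file.

```python
import json
from pathlib import Path


def load_click_items(root):
    """Group parsed JSON files by (category, subcategory).

    Hypothetical helper: assumes the layout
    <root>/Dataset/<category>/<subcategory>/<file>.json
    """
    items = {}
    dataset_dir = Path(root) / "Dataset"
    # Sort so the result order is stable across platforms.
    for path in sorted(dataset_dir.glob("*/*/*.json")):
        category, subcategory = path.parts[-3], path.parts[-2]
        with path.open(encoding="utf-8") as f:
            # One list entry per JSON file, holding that file's parsed content.
            items.setdefault((category, subcategory), []).append(json.load(f))
    return items
```

Called as `load_click_items(".")` from the repository root, this returns a dictionary keyed by pairs such as `("Culture", <subcategory>)`, which makes it straightforward to report results per subcategory. If the actual repository nests files differently, the glob pattern would need adjusting.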
Exam Code Descriptions
The CLIcK dataset is available on the Hugging Face Hub: CLIcK Dataset
Citation
If you use CLIcK in your research, please cite our paper:
@misc{kim2024click,
title={CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean},
author={Eunsu Kim and Juyoung Suk and Philhoon Oh and Haneul Yoo and James Thorne and Alice Oh},
year={2024},
eprint={2403.06412},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contact
For any questions or inquiries, please contact kes0317@kaist.ac.kr.