🤗 [LongWriter Dataset] • 💻 [Github Repo] • 📃 [LongWriter Paper]
The LongWriter-6k dataset contains 6,000 SFT samples with ultra-long outputs ranging from 2k to 32k words in length (in both English and Chinese). The data can support training LLMs to extend their maximum output window to 10,000+ words.
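For reference, here is a minimal sketch of loading the dataset with the Hugging Face `datasets` library. The repo id `THUDM/LongWriter-6k` and the field layout are assumptions; check the dataset card for the exact id and schema.

```python
# A minimal sketch, assuming the dataset is hosted as "THUDM/LongWriter-6k"
# on the Hugging Face Hub; see the dataset card for the exact id and fields.
from datasets import load_dataset

ds = load_dataset("THUDM/LongWriter-6k", split="train")
print(len(ds))        # expected ~6,000 SFT examples
print(ds[0].keys())   # inspect the instruction/response schema
```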
All Models

We open-sourced the following list of models trained on LongWriter-6k:
| Model | Huggingface Repo | Description |
| --- | --- | --- |
| LongWriter-glm4-9b | 🤗 [Huggingface Repo] | GLM-4-9B with an extended 10k+ word output context window |
| LongWriter-llama3.1-8b | 🤗 [Huggingface Repo] | Llama-3.1-8B with an extended 10k+ word output context window |
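As a usage sketch, the snippet below shows how one of these models might be prompted for a long-form output with `transformers`. The repo id `THUDM/LongWriter-llama3.1-8b` and the generation settings are assumptions; see the model card for the recommended chat template and parameters.

```python
# A minimal sketch, assuming the model is hosted as
# "THUDM/LongWriter-llama3.1-8b"; see the model card for the recommended
# chat template and generation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/LongWriter-llama3.1-8b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a 10000-word article on the history of the printing press."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# A large max_new_tokens budget is needed to allow 10k+ word outputs.
output = model.generate(**inputs, max_new_tokens=32768, do_sample=True, temperature=0.5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```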
Citation

If you find our work useful, please consider citing LongWriter:

@article{bai2024longwriter,
title={LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs},
author={Yushi Bai and Jiajie Zhang and Xin Lv and Linzhi Zheng and Siqi Zhu and Lei Hou and Yuxiao Dong and Jie Tang and Juanzi Li},
journal={arXiv preprint arXiv:2408.07055},
year={2024}
}