
PKU-DAIR/Hetu: A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Hetu is a high-performance distributed deep learning system targeting the training of DL models with trillions of parameters, developed by the DAIR Lab at Peking University. It takes into account both the high availability demanded in industry and the innovation pursued in academia.

This is a preview of Hetu 2.0, which is still under rapid development. Please raise an issue if you need any help.

We welcome everyone interested in machine learning or graph computing to contribute code, create issues, or open pull requests. Please refer to the Contribution Guide for more details.

If you are an enterprise user and find Hetu useful in your work, please let us know, and we will be glad to add your company logo here.

We have proposed numerous innovative optimization techniques around the Hetu system and published several papers, covering a variety of model workloads and hardware environments.

If you use Hetu in a scientific publication, we would appreciate citations to the following papers:

@article{DBLP:journals/chinaf/MiaoXP22,
  author = {Miao, Xupeng and Nie, Xiaonan and Zhang, Hailin and Zhao, Tong and Cui, Bin},
  title = {Hetu: A highly efficient automatic parallel distributed deep learning system},
  journal = {Sci. China Inf. Sci.},
  url = {http://engine.scichina.com/doi/10.1007/s11432-022-3581-9},
  doi = {10.1007/s11432-022-3581-9},
  year = {2022},
}
 
@article{miao2021het,
  title = {HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework},
  author = {Miao, Xupeng and Zhang, Hailin and Shi, Yining and Nie, Xiaonan and Yang, Zhi and Tao, Yangyu and Cui, Bin},
  journal = {Proc. {VLDB} Endow.},
  volume = {15},
  number = {2},
  pages = {312--320},
  year = {2022},
  publisher = {VLDB Endowment}
}

@inproceedings{ge2024enabling,
  title = {Enabling Parallelism Hot Switching for Efficient Training of Large Language Models},
  author = {Ge, Hao and Fu, Fangcheng and Li, Haoyang and Wang, Xuanyu and Lin, Sheng and Wang, Yujie and Nie, Xiaonan and Zhang, Hailin and Miao, Xupeng and Cui, Bin},
  booktitle = {Proceedings of the 30th {ACM} Symposium on Operating Systems Principles},
  year = {2024},
  publisher = {{ACM}}
}
Hetu requires the following dependencies:

- OpenMP (*)
- CMake >= 3.24 (*)
- gRPC 1.6.3 (*)
- CUDA >= 11.8 (*)
- CUDNN >= 8.2 (*)
- MPI >= 4.1 (*)
- NCCL >= 2.19 (*)
- Pybind11 >= 2.6.2 (*)
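
Before building, it can help to confirm that the toolchain versions on the machine meet the minimums listed above. The following is only a sketch, not part of Hetu: it assumes cmake, nvcc, and mpirun are the relevant command-line entry points, parses only the first version number each tool prints, and does not cover OpenMP, gRPC, CUDNN, NCCL, or Pybind11.

```python
# Hypothetical prerequisite check (not part of Hetu): compares the versions
# reported by a few command-line tools against the minimums listed above.
import re
import shutil
import subprocess

# Minimum versions taken from the dependency list above.
MINIMUMS = {
    "CMake": ("3.24", ["cmake", "--version"]),
    "CUDA (nvcc)": ("11.8", ["nvcc", "--version"]),
    "MPI (mpirun)": ("4.1", ["mpirun", "--version"]),
}

def parse_version(text):
    """Extract the first x.y[.z] token from a tool's version output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)

def check(name, minimum, command):
    if shutil.which(command[0]) is None:
        print(f"[MISSING] {name}: '{command[0]}' not found on PATH")
        return
    output = subprocess.run(command, capture_output=True, text=True).stdout
    found = parse_version(output)
    needed = parse_version(minimum)
    status = "OK" if found is not None and found >= needed else "TOO OLD"
    print(f"[{status}] {name}: found {found}, need >= {needed}")

if __name__ == "__main__":
    for name, (minimum, command) in MINIMUMS.items():
        check(name, minimum, command)
```

Tuple comparison makes a longer version such as (11, 8, 89) compare correctly against a shorter minimum such as (11, 8); libraries without a standard command-line probe (e.g. NCCL or CUDNN) are better verified through their headers or the build system itself.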
