Our current tests show that Kimi K2 performs very poorly under TP16. In the 3500-token input / 1500-token output scenario, with the SLO of TTFT < 5s and TPOT < 50ms satisfied, the total throughput per card reaches only 36 token/s. This plan therefore aims to quickly improve the performance of Kimi K2 on H20 hardware, fix the bugs found in the process, and document best practices.
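To make the target concrete, here is a minimal sketch of how a benchmark run could be checked against the SLO above and reduced to a per-card throughput figure. The names (`RunStats`, `meets_slo`, `per_card_throughput`, `worst_case_request_latency_s`) are hypothetical and not part of any benchmark tool, the sample numbers are made up for illustration and are not measurements, and which tokens count toward "total" throughput is left to the caller as an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Summary of one benchmark run (hypothetical structure)."""
    ttft_s: float       # time to first token, in seconds
    tpot_ms: float      # time per output token, in milliseconds
    total_tokens: int   # tokens counted toward throughput (caller decides which)
    duration_s: float   # wall-clock duration of the run, in seconds
    num_cards: int      # number of GPUs serving the model, e.g. 16 for TP16


def meets_slo(stats: RunStats, max_ttft_s: float = 5.0, max_tpot_ms: float = 50.0) -> bool:
    """Return True if the run satisfies TTFT < 5s and TPOT < 50ms."""
    return stats.ttft_s < max_ttft_s and stats.tpot_ms < max_tpot_ms


def per_card_throughput(stats: RunStats) -> float:
    """Tokens per second, divided evenly across the cards."""
    return stats.total_tokens / stats.duration_s / stats.num_cards


def worst_case_request_latency_s(output_len: int, max_ttft_s: float = 5.0,
                                 max_tpot_ms: float = 50.0) -> float:
    """Upper bound on end-to-end latency implied by the SLO:
    the first token within TTFT, every later token within TPOT."""
    return max_ttft_s + (output_len - 1) * max_tpot_ms / 1000.0


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    run = RunStats(ttft_s=4.0, tpot_ms=45.0, total_tokens=32000,
                   duration_s=100.0, num_cards=16)
    print("meets SLO:", meets_slo(run))
    print("per-card throughput (token/s):", per_card_throughput(run))
    print("SLO latency bound for 1500 output tokens (s):",
          worst_case_request_latency_s(1500))
```

Under these assumptions, a request with 1500 output tokens may take up to roughly 80 seconds while still meeting the SLO, so any per-card throughput gain has to come from serving more requests concurrently within that latency budget.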
Roadmap