Showing content from https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct below:

Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct · Hugging Face

ArcticSpeculator

Build the fastest OSS vLLM-based speculative decoding system for your own model using ArcticTraining and ArcticInference!

The table below compares the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100:

Method                          ShareGPT   HumanEval
VLLM V1 Baseline                    84.1        84.1
VLLM V1 Eagle                      102.2       112.0
VLLM V1 Eagle3                      77.7        85.3
VLLM V0 MLP-Speculator (IBM)        77.9        66.7
ArcticSpeculator                   172.4       203.7
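To make the comparison easier to read, the throughput figures above can be converted into speedups relative to the VLLM V1 Baseline row. The following is a minimal sketch of that arithmetic only. The numbers are copied from the table, while the names `THROUGHPUT` and `speedups` are hypothetical and not part of ArcticTraining, ArcticInference, or vLLM.

```python
# Throughput in tokens/s, copied from the comparison table above.
# The dictionary layout and names are illustrative assumptions.
THROUGHPUT = {
    "VLLM V1 Baseline": {"ShareGPT": 84.1, "HumanEval": 84.1},
    "VLLM V1 Eagle": {"ShareGPT": 102.2, "HumanEval": 112.0},
    "VLLM V1 Eagle3": {"ShareGPT": 77.7, "HumanEval": 85.3},
    "VLLM V0 MLP-Speculator (IBM)": {"ShareGPT": 77.9, "HumanEval": 66.7},
    "ArcticSpeculator": {"ShareGPT": 172.4, "HumanEval": 203.7},
}


def speedups(table, baseline="VLLM V1 Baseline"):
    """Return each method's throughput divided by the baseline's, per dataset."""
    base = table[baseline]
    return {
        method: {dataset: value / base[dataset] for dataset, value in row.items()}
        for method, row in table.items()
    }


if __name__ == "__main__":
    for method, row in speedups(THROUGHPUT).items():
        cells = ", ".join(f"{dataset}: {ratio:.2f}x" for dataset, ratio in row.items())
        print(f"{method}: {cells}")
```

On these figures, ArcticSpeculator comes out at roughly 2.05x the baseline on ShareGPT and 2.42x on HumanEval, while a ratio below 1.0 marks a configuration that is slower than running without speculation.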

For more details about ArcticSpeculator and how to use it:

We also release ArcticSpeculator checkpoints we trained with ArcticTraining to run with ArcticInference:

