Source: https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference
Inference Performance for Data Center Deep Learning
Throughput is measured at the stated accuracy target and under the stated latency constraint (TTFT = time to first token; TPOT = time per output token).

| Model | Throughput | GPU Config | Server | GPU Version | Target Accuracy | Latency Constraint | Dataset |
|---|---|---|---|---|---|---|---|
| Llama3.1 405B | 8,850 tokens/sec | 72x GB200 | NVIDIA GB200 NVL72 | NVIDIA GB200 | 99% of FP16 (rougeL=21.6666 on GovReport + LongDataCollections + 65 samples from LongBench; exact_match=90.1335 on remaining samples) | TTFT/TPOT: 6000 ms/175 ms | Subset of LongBench, LongDataCollections, Ruler, GovReport |
| Llama3.1 405B | 1,080 tokens/sec | 8x B200 | SYS-A21GE-NBRT | NVIDIA B200-SXM-180GB | 99% of FP16 (rougeL=21.6666 on GovReport + LongDataCollections + 65 samples from LongBench; exact_match=90.1335 on remaining samples) | TTFT/TPOT: 6000 ms/175 ms | Subset of LongBench, LongDataCollections, Ruler, GovReport |
| Llama3.1 405B | 294 tokens/sec | 8x H200 | Cisco UCS C885A M8 | NVIDIA H200-SXM-141GB | 99% of FP16 (rougeL=21.6666 on GovReport + LongDataCollections + 65 samples from LongBench; exact_match=90.1335 on remaining samples) | TTFT/TPOT: 6000 ms/175 ms | Subset of LongBench, LongDataCollections, Ruler, GovReport |
| Llama2 70B Interactive | 62,266 tokens/sec | 8x B200 | SYS-A21GE-NBRT | NVIDIA B200-SXM-180GB | 99.9% of FP32 (rouge1=44.4312, rouge2=22.0352, rougeL=28.6162) | TTFT/TPOT: 450 ms/40 ms | OpenOrca (max_seq_len=1024) |
| Llama2 70B Interactive | 20,235 tokens/sec | 8x H200 | G893-SD1 | NVIDIA H200-SXM-141GB | 99.9% of FP32 (rouge1=44.4312, rouge2=22.0352, rougeL=28.6162) | TTFT/TPOT: 450 ms/40 ms | OpenOrca (max_seq_len=1024) |
| Llama2 70B | 98,443 tokens/sec | 8x B200 | NVIDIA DGX B200 | NVIDIA B200-SXM-180GB | 99.9% of FP32 (rouge1=44.4312, rouge2=22.0352, rougeL=28.6162) | TTFT/TPOT: 2000 ms/200 ms | OpenOrca (max_seq_len=1024) |
| Llama2 70B | 33,072 tokens/sec | 8x H200 | NVIDIA H200 | NVIDIA H200-SXM-141GB-CTS | 99.9% of FP32 (rouge1=44.4312, rouge2=22.0352, rougeL=28.6162) | TTFT/TPOT: 2000 ms/200 ms | OpenOrca (max_seq_len=1024) |
| Mixtral 8x7B | 129,047 tokens/sec | 8x B200 | SYS-421GE-NBRT-LCC | NVIDIA B200-SXM-180GB | 99% of FP16 (OpenOrca rouge1=45.5989, rouge2=23.3526, rougeL=30.4608; GSM8K accuracy=73.66; MBXP accuracy=60.16) | TTFT/TPOT: 2000 ms/200 ms | OpenOrca (5k samples, max_seq_len=2048), GSM8K (5k samples of the train split, max_seq_len=2048), MBXP (5k samples, max_seq_len=2048) |
| Mixtral 8x7B | 61,802 tokens/sec | 8x H200 | NVIDIA H200 | NVIDIA H200-SXM-141GB-CTS | 99% of FP16 (OpenOrca rouge1=45.5989, rouge2=23.3526, rougeL=30.4608; GSM8K accuracy=73.66; MBXP accuracy=60.16) | TTFT/TPOT: 2000 ms/200 ms | OpenOrca (5k samples, max_seq_len=2048), GSM8K (5k samples of the train split, max_seq_len=2048), MBXP (5k samples, max_seq_len=2048) |
| Stable Diffusion XL | 29 samples/sec | 8x B200 | SYS-A21GE-NBRT | NVIDIA B200-SXM-180GB | FID range: [23.01085758, 23.95007626], CLIP range: [31.68631873, 31.81331801] | 20 s | Subset of COCO-2014 val |
| Stable Diffusion XL | 18 samples/sec | 8x H200 | NVIDIA H200 | NVIDIA H200-SXM-141GB-CTS | FID range: [23.01085758, 23.95007626], CLIP range: [31.68631873, 31.81331801] | 20 s | Subset of COCO-2014 val |
| GPT-J | 21,813 queries/sec | 8x H200 | Cisco UCS C885A M8 | NVIDIA H200-SXM-141GB | 99% of FP32 (72.86%) | 20 s | CNN DailyMail |
| ResNet-50 | 676,219 queries/sec | 8x H200 | G893-SD1 | NVIDIA H200-SXM-141GB | 76.46% Top-1 | 15 ms | ImageNet (224x224) |
| RetinaNet | 14,589 queries/sec | 8x H200 | AS-4125GS-TNHR2-LCC | NVIDIA H200-SXM-141GB | 0.3755 mAP | 100 ms | OpenImages (800x800) |
| DLRMv2 | 590,167 queries/sec | 8x H200 | HPE Cray XD670 with Cray ClusterStor | NVIDIA H200-SXM-141GB | 99% of FP32 (AUC=80.31%) | 60 ms | Synthetic Multihot Criteo Dataset |
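Because the systems above use different GPU counts (72x GB200 vs. 8x B200 vs. 8x H200), raw system throughput is not directly comparable. A minimal sketch of the normalization, using the Llama3.1 405B numbers from the table — the function and dictionary names are illustrative, not part of NVIDIA's published methodology, and simple division ignores interconnect and scaling effects:

```python
# Normalize total system throughput to per-GPU throughput.
# Figures are the Llama3.1 405B rows from the table above;
# per-GPU division is a rough comparison aid only (assumption:
# it ignores NVLink/scale-out efficiency differences).

def per_gpu_throughput(total_tokens_per_sec: float, num_gpus: int) -> float:
    """System throughput divided evenly across its GPUs."""
    return total_tokens_per_sec / num_gpus

llama31_405b_results = {
    "72x GB200 (GB200 NVL72)": (8850, 72),
    "8x B200 (SYS-A21GE-NBRT)": (1080, 8),
    "8x H200 (Cisco UCS C885A M8)": (294, 8),
}

for system, (tps, gpus) in llama31_405b_results.items():
    print(f"{system}: {per_gpu_throughput(tps, gpus):.1f} tokens/sec/GPU")
```

On these figures, a single GB200 in the NVL72 rack delivers roughly 122.9 tokens/sec versus 135.0 for a B200 and 36.75 for an H200, so the NVL72's system-level advantage comes primarily from scale.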