Showing content from https://github.com/intel-analytics/ipex-llm below:

intel/ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

💫 Intel® LLM Library for PyTorch*

< English | 中文 >

IPEX-LLM is an LLM acceleration library for Intel GPU (e.g., a local PC with iGPU, or a discrete GPU such as Arc, Flex and Max), NPU and CPU [1].
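As a minimal illustration of the library's HuggingFace-`transformers`-style API (following the pattern in the ipex-llm GPU examples; the model id below is a placeholder, and running it requires an Intel GPU plus the `ipex-llm[xpu]` package):

```python
# Sketch of ipex-llm usage, assuming an Intel GPU ("xpu" device) is available.
import torch
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# load_in_4bit=True applies ipex-llm's low-bit (sym_int4) optimization on load
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
model = model.to("xpu")  # move the optimized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
with torch.inference_mode():
    input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The only change from plain `transformers` code is the import source and the `load_in_4bit` flag; the rest of the pipeline (tokenizer, `generate`) is standard.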


See demos of running local LLMs on Intel Core Ultra iGPU, Intel Core Ultra NPU, single-card Arc GPU, or multi-card Arc GPUs using ipex-llm below.

See the Token Generation Speed on Intel Core Ultra and Intel Arc GPU below [1] (and refer to [2][3][4] for more details).

You may follow the Benchmarking Guide to run ipex-llm performance benchmark yourself.
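Benchmarks of this kind typically report first-token latency (dominated by prefill) separately from the steady-state decode rate. A small, library-independent sketch of that measurement, using a stand-in generator function in place of a real model step:

```python
import time

def measure_generation(generate_token, n_tokens):
    """Time first-token latency and steady-state decode throughput.

    `generate_token` is a stand-in for one decoding step of a real model.
    """
    t0 = time.perf_counter()
    generate_token()                      # prefill + first token
    first_token_latency = time.perf_counter() - t0

    t1 = time.perf_counter()
    for _ in range(n_tokens - 1):         # remaining decode steps
        generate_token()
    decode_time = time.perf_counter() - t1
    tokens_per_s = (n_tokens - 1) / decode_time
    return first_token_latency, tokens_per_s

# Example with a dummy 1 ms "model" step
first, tps = measure_generation(lambda: time.sleep(0.001), 32)
print(f"1st token: {first * 1000:.1f} ms, decode: {tps:.0f} tok/s")
```

Reporting the two numbers separately matters because a long prompt inflates the first step without affecting per-token decode speed.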

Please see the Perplexity result below (tested on Wikitext dataset using the script here).

| Perplexity | sym_int4 | q4_k | fp6 | fp8_e5m2 | fp8_e4m3 | fp16 |
|---|---|---|---|---|---|---|
| Llama-2-7B-chat-hf | 6.364 | 6.218 | 6.092 | 6.180 | 6.098 | 6.096 |
| Mistral-7B-Instruct-v0.2 | 5.365 | 5.320 | 5.270 | 5.273 | 5.246 | 5.244 |
| Baichuan2-7B-chat | 6.734 | 6.727 | 6.527 | 6.539 | 6.488 | 6.508 |
| Qwen1.5-7B-chat | 8.865 | 8.816 | 8.557 | 8.846 | 8.530 | 8.607 |
| Llama-3.1-8B-Instruct | 6.705 | 6.566 | 6.338 | 6.383 | 6.325 | 6.267 |
| gemma-2-9b-it | 7.541 | 7.412 | 7.269 | 7.380 | 7.268 | 7.270 |
| Baichuan2-13B-Chat | 6.313 | 6.160 | 6.070 | 6.145 | 6.086 | 6.031 |
| Llama-2-13b-chat-hf | 5.449 | 5.422 | 5.341 | 5.384 | 5.332 | 5.329 |
| Qwen1.5-14B-Chat | 7.529 | 7.520 | 7.367 | 7.504 | 7.297 | 7.334 |
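The low-bit formats in the table trade a small perplexity increase for large memory and bandwidth savings; `sym_int4` denotes symmetric 4-bit quantization. A self-contained toy sketch of the idea, assuming a single per-tensor scale (real low-bit kernels quantize per small block and keep activations in higher precision):

```python
# Toy symmetric int4 quantization: map floats to integers in [-7, 7]
# with one scale factor (real implementations use per-group scales).

def quantize_sym_int4(weights):
    scale = max(abs(w) for w in weights) / 7.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -1.4, 0.0, 0.77]
q, scale = quantize_sym_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                      # [1, -3, 5, -7, 0, 4]
print(max_err <= scale / 2)   # True: rounding error bounded by half a step
```

Each weight costs 4 bits plus a shared scale, roughly a 4x reduction versus fp16, which is why the perplexity gap to fp16 in the table stays small but nonzero.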

Over 70 models have been optimized/verified on ipex-llm, including LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM and more; see the list below.

| Model | CPU Example | GPU Example | NPU Example |
|---|---|---|---|
| LLaMA | link1, link2 | link | |
| LLaMA 2 | link1, link2 | link | Python link, C++ link |
| LLaMA 3 | link | link | Python link, C++ link |
| LLaMA 3.1 | link | link | |
| LLaMA 3.2 | | link | Python link, C++ link |
| LLaMA 3.2-Vision | | link | |
| ChatGLM | link | | |
| ChatGLM2 | link | link | |
| ChatGLM3 | link | link | |
| GLM-4 | link | link | |
| GLM-4V | link | link | |
| GLM-Edge | | link | Python link |
| GLM-Edge-V | | link | |
| Mistral | link | link | |
| Mixtral | link | link | |
| Falcon | link | link | |
| MPT | link | link | |
| Dolly-v1 | link | link | |
| Dolly-v2 | link | link | |
| Replit Code | link | link | |
| RedPajama | link1, link2 | | |
| Phoenix | link1, link2 | | |
| StarCoder | link1, link2 | link | |
| Baichuan | link | link | |
| Baichuan2 | link | link | Python link |
| InternLM | link | link | |
| InternVL2 | | link | |
| Qwen | link | link | |
| Qwen1.5 | link | link | |
| Qwen2 | link | link | Python link, C++ link |
| Qwen2.5 | | link | Python link, C++ link |
| Qwen-VL | link | link | |
| Qwen2-VL | | link | |
| Qwen2-Audio | | link | |
| Aquila | link | link | |
| Aquila2 | link | link | |
| MOSS | link | | |
| Whisper | link | link | |
| Phi-1_5 | link | link | |
| Flan-t5 | link | link | |
| LLaVA | link | link | |
| CodeLlama | link | link | |
| Skywork | link | | |
| InternLM-XComposer | link | | |
| WizardCoder-Python | link | | |
| CodeShell | link | | |
| Fuyu | link | | |
| Distil-Whisper | link | link | |
| Yi | link | link | |
| BlueLM | link | link | |
| Mamba | link | link | |
| SOLAR | link | link | |
| Phixtral | link | link | |
| InternLM2 | link | link | |
| RWKV4 | | link | |
| RWKV5 | | link | |
| Bark | link | link | |
| SpeechT5 | | link | |
| DeepSeek-MoE | link | | |
| Ziya-Coding-34B-v1.0 | link | | |
| Phi-2 | link | link | |
| Phi-3 | link | link | |
| Phi-3-vision | link | link | |
| Yuan2 | link | link | |
| Gemma | link | link | |
| Gemma2 | | link | |
| DeciLM-7B | link | link | |
| Deepseek | link | link | |
| StableLM | link | link | |
| CodeGemma | link | link | |
| Command-R/cohere | link | link | |
| CodeGeeX2 | link | link | |
| MiniCPM | link | link | Python link, C++ link |
| MiniCPM3 | | link | |
| MiniCPM-V | | link | |
| MiniCPM-V-2 | link | link | |
| MiniCPM-Llama3-V-2_5 | | link | Python link |
| MiniCPM-V-2_6 | link | link | Python link |
| MiniCPM-o-2_6 | | link | |
| Janus-Pro | | link | |
| Moonlight | | link | |
| StableDiffusion | | link | |
| Bce-Embedding-Base-V1 | | | Python link |
| Speech_Paraformer-Large | | | Python link |
[1] Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

