LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.
### Installation

```bash
git clone -b support-dsv3 https://github.com/InternLM/lmdeploy.git
cd lmdeploy
pip install -e .
```

### Offline Inference Pipeline
```python
from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == "__main__":
    pipe = pipeline(
        "deepseek-ai/DeepSeek-V3-FP8",
        backend_config=PytorchEngineConfig(tp=8),
    )
    messages_list = [
        [{"role": "user", "content": "Who are you?"}],
        [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V3 adopts innovative architectures to guarantee economical training and efficient inference."}],
        [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    ]
    output = pipe(messages_list)
    print(output)
```
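If you want to control sampling in the offline pipeline, LMDeploy's `GenerationConfig` can be passed to the pipeline call via the `gen_config` keyword. The sketch below is a minimal example based on LMDeploy's documented API; the parameter values shown are illustrative defaults, not values required by DeepSeek-V3.

```python
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline

if __name__ == "__main__":
    pipe = pipeline(
        "deepseek-ai/DeepSeek-V3-FP8",
        backend_config=PytorchEngineConfig(tp=8),
    )
    # Illustrative sampling parameters; tune them for your workload.
    gen_config = GenerationConfig(
        max_new_tokens=1024,
        temperature=0.8,
        top_p=0.8,
    )
    output = pipe(
        [[{"role": "user", "content": "Who are you?"}]],
        gen_config=gen_config,
    )
    print(output)
```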
### Online Serving

```bash
# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch
```
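Once the server is up, a quick way to confirm it is reachable is to query the OpenAI-compatible model-list endpoint, which is the same endpoint the Python example below relies on. This assumes the server is listening on LMDeploy's default port, 23333:

```bash
# Assumes the default api_server port (23333); adjust if you changed it.
curl http://0.0.0.0:23333/v1/models
```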
To access the service programmatically, you can use the official OpenAI Python package (`pip install openai`). Below is an example demonstrating how to call the `v1/chat/completions` endpoint:
```python
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url="http://0.0.0.0:23333/v1"
)

model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "Write a piece of quicksort code in C++."}
    ],
    temperature=0.8,
    top_p=0.8
)
print(response)
```
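Because the server speaks the OpenAI protocol, streaming works the same way as with the official API. The sketch below reuses the client from the previous example; the `stream=True` flag and chunk layout come from the OpenAI Python package, not from LMDeploy itself.

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    temperature=0.8,
    top_p=0.8,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
```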
For more information, please refer to the following link: https://github.com/InternLM/lmdeploy/tree/support-dsv3