TorchServe now enables token authorization and disables model API control by default. These security features are intended to address the concern of unauthorized API calls and to prevent potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control.
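As a rough illustration of what token authorization means for a client, the sketch below reads an inference token from a key file and builds the matching request header. The file name `key_file.json`, its JSON layout, and the `Authorization: Bearer` scheme are assumptions made for this example, and `load_inference_headers` is a hypothetical helper, not part of TorchServe. Check the Token Authorization documentation for the actual details.

```python
import json
from pathlib import Path


def load_inference_headers(key_file="key_file.json"):
    """Build HTTP headers that carry the inference token.

    Assumption: the key file is JSON shaped like
    {"inference": {"key": "<token>", ...}, ...}.
    """
    keys = json.loads(Path(key_file).read_text())
    token = keys["inference"]["key"]
    # Assumption: the server expects a bearer token in the Authorization header.
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client sends the prediction request.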
TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.
Requires python >= 3.8
curl http://127.0.0.1:8080/predictions/bert -T input.txt

🚀 Quick start with TorchServe
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

🚀 Quick start with TorchServe (conda)
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver

🐳 Quick Start with Docker
# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly
Refer to torchserve docker for details.
🤖 Quick Start LLM Deployment

# Make sure to install torchserve with pip or conda as described above and login with `huggingface-cli login`
python -m ts.llm_launcher --model_id meta-llama/Llama-3.2-3B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Llama-3.2-3B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
# Make sure to install torchserve with python venv as described above and login with `huggingface-cli login`
# pip install -U --use-deprecated=legacy-resolver -r requirements/trt_llm.txt
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct --engine trt_llm --disable_token_auth

# Try it out
curl -X POST -d '{"prompt":"count from 1 to 9 in french ", "max_tokens": 100}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"

🚢 Quick Start LLM Deployment with Docker
#export token=<HUGGINGFACE_HUB_TOKEN>
docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm
docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
Refer to LLM deployment for details and other methods.
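The completions request from the LLM quick start can also be issued from Python. The following is a minimal sketch using only the standard library. The endpoint path and JSON fields are copied from the curl example, while the helper names `build_completion_request` and `send` are hypothetical and introduced only for illustration. It assumes a server launched with `--disable_token_auth` is listening on `localhost:8080`.

```python
import json
import urllib.request


def build_completion_request(model_id, prompt, max_tokens=200,
                             base_url="http://localhost:8080"):
    """Prepare (but do not send) a POST to the completions endpoint."""
    payload = {"model": model_id, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/predictions/model/1.0/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_completion_request("meta-llama/Llama-3.2-3B-Instruct", "Hello, my name is"))` would return the response body. Building the request separately from sending it makes the payload easy to inspect before any network call is made.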
torch.compile
For more examples
🛡️ TorchServe Security Policy

We welcome all contributions!
To learn more about how to contribute, see the contributor guide here.
This repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository here.
TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.