Triton is a machine learning inference server for easy and highly optimized deployment of models trained in almost any major framework. This backend specifically facilitates use of tree models in Triton (including models trained with XGBoost, LightGBM, Scikit-Learn, and cuML).
If you want to deploy a tree-based model for optimized real-time or batched inference in production, the FIL backend for Triton will allow you to do just that.
If you aren't sure where to start with this documentation, consider one of the following paths:
- I currently use XGBoost/LightGBM or other tree models and am trying to assess whether Triton is the right solution for production deployment of my models.
- I am familiar with Triton, but I am using it to deploy an XGBoost/LightGBM model for the first time.
- I am familiar with Triton and the FIL backend, but I am using it to deploy a Scikit-Learn or cuML tree model for the first time.
- I am a data scientist familiar with tree model training, and I am trying to understand how Triton might be used with my models.
- I have never worked with tree models before.
- I don't like reading docs. (Ctrl-F for keywords on the FAQ page.)

To deploy a model, copy the serialized model file into a Triton model repository with a layout like the following (here, an XGBoost model saved as model.json):

model_repository/
├─ example/
│ ├─ 1/
│ │ ├─ model.json
│ ├─ config.pbtxt
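For example, a model trained with the XGBoost Python API can be serialized directly into this layout. The following is a minimal sketch; the synthetic training data, the 32-feature input size, the hyperparameters, and the model name "example" are all illustrative:

import os

import numpy as np
import xgboost as xgb

# Illustrative training data: 500 rows with 32 features and binary labels
X = np.random.rand(500, 32).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int32)

model = xgb.XGBClassifier(n_estimators=100, max_depth=6)
model.fit(X, y)

# Save in XGBoost's JSON format under model_repository/example/1/
version_dir = os.path.join("model_repository", "example", "1")
os.makedirs(version_dir, exist_ok=True)
model.save_model(os.path.join(version_dir, "model.json"))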
Next, create a config.pbtxt file in the model directory (model_repository/example/config.pbtxt in the layout above), replacing $NUM_FEATURES with the number of input features, $MODEL_TYPE with xgboost, xgboost_json, lightgbm, or treelite_checkpoint (matching the serialization format of your model), and $IS_A_CLASSIFIER with true or false depending on whether this is a classifier or a regressor:

backend: "fil"
max_batch_size: 32768
input [
{
name: "input__0"
data_type: TYPE_FP32
dims: [ $NUM_FEATURES ]
}
]
output [
{
name: "output__0"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
instance_group [{ kind: KIND_AUTO }]
parameters [
{
key: "model_type"
value: { string_value: "$MODEL_TYPE" }
},
{
key: "output_class"
value: { string_value: "$IS_A_CLASSIFIER" }
}
]
dynamic_batching {}
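If you prefer to script this step, the placeholders can also be filled in programmatically. The following is a minimal sketch, assuming the 32-feature XGBoost JSON classifier from the earlier example; the model name "example" and all substituted values are illustrative:

import os
from string import Template

# The config.pbtxt template from above, condensed, with the $-placeholders intact
TEMPLATE = Template("""backend: "fil"
max_batch_size: 32768
input [
  { name: "input__0" data_type: TYPE_FP32 dims: [ $NUM_FEATURES ] }
]
output [
  { name: "output__0" data_type: TYPE_FP32 dims: [ 1 ] }
]
instance_group [{ kind: KIND_AUTO }]
parameters [
  { key: "model_type" value: { string_value: "$MODEL_TYPE" } },
  { key: "output_class" value: { string_value: "$IS_A_CLASSIFIER" } }
]
dynamic_batching {}
""")

config = TEMPLATE.substitute(
    NUM_FEATURES=32,            # number of input features
    MODEL_TYPE="xgboost_json",  # model was saved with save_model("model.json")
    IS_A_CLASSIFIER="true",     # classifier rather than regressor
)

with open(os.path.join("model_repository", "example", "config.pbtxt"), "w") as f:
    f.write(config)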
With the model repository in place, launch the Triton server, mounting the repository into the container:

docker run -p 8000:8000 -p 8001:8001 --gpus all \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.09-py3 \
  tritonserver --model-repository=/models
The Triton server will now be serving your model over both HTTP (port 8000) and GRPC (port 8001) using NVIDIA GPUs if they are available or the CPU if they are not. For information on how to submit inference requests, how to deploy other tree model types, or advanced configuration options, check out the FAQ notebook.
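Once the server is up, inference requests can also be sent from Python with the tritonclient package (installable via pip install tritonclient[http]). The sketch below assumes the 32-feature model named "example" from the earlier steps; the input values are random and purely illustrative:

import numpy as np
import tritonclient.http as triton_http

# Connect to the HTTP endpoint exposed on port 8000
client = triton_http.InferenceServerClient(url="localhost:8000")

# One batch of two rows, each with 32 features (must match dims in config.pbtxt)
batch = np.random.rand(2, 32).astype(np.float32)

infer_input = triton_http.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(
    model_name="example",
    inputs=[infer_input],
    outputs=[triton_http.InferRequestedOutput("output__0")],
)

print(response.as_numpy("output__0"))  # one prediction per input row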