
Custom Models#

Xinference provides a flexible and comprehensive way to integrate, manage, and utilize custom models.

Directly launch an existing model#

Since v0.14.0, you can directly launch an existing model by passing model_path to the launch interface, without downloading it first. This approach requires that the model's model_family be one of the built-in supported families, and it eliminates the hassle of registering the model.

For example:

Via the command line:

xinference launch --model_path <model_file_path> --model-engine <engine> -n qwen1.5-chat

Via cURL:

curl -X 'POST' \
  'http://127.0.0.1:9997/v1/models' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model_engine": "<engine>",
  "model_name": "qwen1.5-chat",
  "model_path": "<model_file_path>"
}'

Via the Python client:

from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
model_uid = client.launch_model(
  model_engine="<engine>",
  model_name="qwen1.5-chat",
  model_path="<model_file_path>"
)
print('Model uid: ' + model_uid)

The examples above demonstrate how to launch a local qwen1.5-chat model file directly, without registering it.

For distributed scenarios, if your model file is on a specific worker, you can directly launch it using the worker_ip and model_path parameters with the launch interface.
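For example, a minimal sketch with the Python client (the worker address and path below are hypothetical):

from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
# worker_ip pins the launch to the worker that actually holds the model file
model_uid = client.launch_model(
  model_engine="<engine>",
  model_name="qwen1.5-chat",
  model_path="/path/on/that/worker/qwen1.5-chat",
  worker_ip="192.168.0.2"
)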

Define a custom LLM model#

Define a custom LLM model based on the following template:

{
  "version": 1,
  "context_length": 2048,
  "model_name": "custom-llama-2-chat",
  "model_lang": [
    "en"
  ],
  "model_ability": [
    "chat"
  ],
  "model_family": "my-llama-2-chat",
  "model_specs": [
    {
      "model_format": "pytorch",
      "model_size_in_billions": 7,
      "quantizations": [
        "none"
      ],
      "model_uri": "file:///path/to/llama-2-chat"
    },
    {
      "model_format": "ggufv2",
      "model_size_in_billions": 7,
      "quantizations": [
        "q4_0",
        "q8_0"
      ],
      "model_file_name_template": "llama-2-chat-7b.{quantization}.gguf"
      "model_uri": "file:///path/to/gguf-file"
    }
  ],
  "chat_template": "{% if messages[0]['role'] == 'system' %}{% set system_message = '<<SYS>>\n' + messages[0]['content'] | trim + '\n<</SYS>>\n\n' %}{% set messages = messages[1:] %}{% else %}{% set system_message = '' %}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 %}{% set content = system_message + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content | trim + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content | trim + ' ' + '</s>' }}{% endif %}{% endfor %}",
  "stop_token_ids": [2],
  "stop": []
}
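The chat_template field is a Jinja2 template that turns a message list into the prompt string the model expects. A minimal sketch of rendering it outside Xinference (assuming the jinja2 package; "custom-llama-2-chat.json" is a hypothetical file holding the spec above):

import json
from jinja2 import Environment

def raise_exception(message):
    # the template calls raise_exception when roles do not alternate
    raise ValueError(message)

with open('custom-llama-2-chat.json') as fd:
    spec = json.load(fd)

env = Environment()
env.globals['raise_exception'] = raise_exception
template = env.from_string(spec['chat_template'])

print(template.render(messages=[
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Hello!'},
]))
# -> <s>[INST] <<SYS>>
#    You are a helpful assistant.
#    <</SYS>>
#
#    Hello! [/INST]
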
Define a custom embedding model#

Define a custom embedding model based on the following template:

{
    "model_name": "custom-bge-base-en",
    "dimensions": 768,
    "max_tokens": 512,
    "language": ["en"],
    "model_id": "BAAI/bge-base-en",
    "model_uri": "file:///path/to/bge-base-en"
}
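After registering the spec (see below), the model can be launched and queried like a built-in one. A minimal sketch with the Python client:

from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
# model_type tells Xinference to launch an embedding model rather than an LLM
uid = client.launch_model(model_name="custom-bge-base-en", model_type="embedding")
model = client.get_model(uid)
# returns an OpenAI-style payload containing a 768-dimensional vector
print(model.create_embedding("What is the capital of France?"))
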
Define a custom Rerank model#

Define a custom rerank model based on the following template:

{
    "model_name": "custom-bge-reranker-v2-m3",
    "type": "normal",
    "language": ["en", "zh", "multilingual"],
    "model_id": "BAAI/bge-reranker-v2-m3",
    "model_uri": "file:///path/to/bge-reranker-v2-m3"
}
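Likewise, once registered, the rerank model scores documents against a query. A minimal sketch with the Python client:

from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
uid = client.launch_model(model_name="custom-bge-reranker-v2-m3", model_type="rerank")
model = client.get_model(uid)
query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
]
# returns the documents ranked by relevance score against the query
print(model.rerank(corpus, query))
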
Register a Custom Model#

Register a custom model programmatically:

import json
from xinference.client import Client

with open('model.json') as fd:
    model = fd.read()

# replace with real xinference endpoint
endpoint = 'http://localhost:9997'
client = Client(endpoint)
# pass persist=True to keep the registration after Xinference restarts
client.register_model(model_type="<model_type>", model=model, persist=False)

Or via CLI:

xinference register --model-type <model_type> --file model.json --persist

Note: replace <model_type> above with LLM, embedding, or rerank. The same applies to the commands below.

List the Built-in and Custom Models#

List built-in and custom models programmatically:

registrations = client.list_model_registrations(model_type="<model_type>")

Or via CLI:

xinference registrations --model-type <model_type>
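Each registration record marks whether the model is built-in. A minimal sketch that keeps only the custom entries (assuming the is_builtin field carried by registration records):

registrations = client.list_model_registrations(model_type="<model_type>")
# drop built-in models, keeping only custom registrations
custom = [r for r in registrations if not r["is_builtin"]]
print([r["model_name"] for r in custom])
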
Launch the Custom Model#

Launch the custom model programmatically:

uid = client.launch_model(model_name='custom-llama-2-chat', model_format='pytorch')

Or via CLI:

xinference launch --model-name custom-llama-2-chat --model-format pytorch

Interact with the Custom Model#

Invoke the model programmatically:

model = client.get_model(model_uid=uid)
model.generate('What is the largest animal in the world?')

Result:

{
   "id":"cmpl-a4a9d9fc-7703-4a44-82af-fce9e3c0e52a",
   "object":"text_completion",
   "created":1692024624,
   "model":"43e1f69a-3ab0-11ee-8f69-fa163e74fa2d",
   "choices":[
      {
         "text":"\nWhat does an octopus look like?\nHow many human hours has an octopus been watching you for?",
         "index":0,
         "logprobs":"None",
         "finish_reason":"stop"
      }
   ],
   "usage":{
      "prompt_tokens":10,
      "completion_tokens":23,
      "total_tokens":33
   }
}

Or via CLI, replacing ${UID} with the real model UID:

xinference generate --model-uid ${UID}
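Since the template above declares the chat ability, the model also supports the chat interface. A minimal sketch (the messages-style signature below assumes a recent Xinference version; older releases take a prompt plus chat_history instead):

model = client.get_model(model_uid=uid)
# send an OpenAI-style message list and print the assistant reply payload
print(model.chat(messages=[
    {"role": "user", "content": "What is the largest animal in the world?"}
]))
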
Unregister the Custom Model#

Unregister the custom model programmatically:

client.unregister_model(model_type="<model_type>", model_name='custom-llama-2-chat')

Or via CLI:

xinference unregister --model-type <model_type> --model-name custom-llama-2-chat
