# llmaz

☸️ Effortlessly operating LLMs on Kubernetes, e.g. Serving.

llmaz (pronounced /lima:z/) aims to provide a production-ready inference platform for large language models on Kubernetes. It integrates closely with state-of-the-art inference backends such as vLLM to bring cutting-edge research to the cloud.

Read the Installation for guidance.

Once Models (e.g. facebook/opt-125m) are published, you can quickly deploy a Playground to serve the model.

```yaml
apiVersion: llmaz.io/v1alpha1
kind: Model
metadata:
  name: opt-125m
spec:
  familyName: opt
  dataSource:
    modelID: facebook/opt-125m
  inferenceFlavors:
  - name: t4 # GPU type
    requests:
      nvidia.com/gpu: 1
```
```yaml
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m
```
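The Playground binds to its backing Model by matching `modelClaim.modelName` against the Model's `metadata.name`. A minimal stdlib-only Python sketch of that linkage, with the two manifests above inlined as plain dicts (a hypothetical illustration; normally the API server performs this resolution):

```python
# The two manifests from above, inlined as dicts for illustration.
model_manifest = {
    "apiVersion": "llmaz.io/v1alpha1",
    "kind": "Model",
    "metadata": {"name": "opt-125m"},
    "spec": {
        "familyName": "opt",
        "dataSource": {"modelID": "facebook/opt-125m"},
    },
}

playground_manifest = {
    "apiVersion": "inference.llmaz.io/v1alpha1",
    "kind": "Playground",
    "metadata": {"name": "opt-125m"},
    "spec": {"replicas": 1, "modelClaim": {"modelName": "opt-125m"}},
}

def claim_matches(playground: dict, model: dict) -> bool:
    """True if the Playground's modelClaim points at this Model."""
    claimed = playground["spec"]["modelClaim"]["modelName"]
    return claimed == model["metadata"]["name"]
```

Here `claim_matches(playground_manifest, model_manifest)` holds, since both name `opt-125m`; renaming either side without updating the other breaks the claim.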
```bash
kubectl port-forward pod/opt-125m-0 8080:8080

curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 10,
    "temperature": 0
  }'
```
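The same completion call can be scripted. A stdlib-only Python sketch mirroring the curl example above (endpoint and model name are taken from that example, and the `kubectl port-forward` is assumed to be running):

```python
import json
import urllib.request

# Assumes `kubectl port-forward pod/opt-125m-0 8080:8080` is active.
ENDPOINT = "http://localhost:8080/v1/completions"

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 10,
                             temperature: float = 0.0) -> dict:
    """Build an OpenAI-compatible /v1/completions payload."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt: str, model: str = "facebook/opt-125m") -> dict:
    """POST the payload to the served model and return the parsed JSON response."""
    payload = build_completion_request(model, prompt)
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires the running Playground):
#   complete("San Francisco is a")
```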

Refer to examples to learn more.

πŸš€ All kinds of contributions are welcomed ! Please follow Contributing.

πŸŽ‰ Thanks to all these contributors.

