OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource utilization, and sophisticated deployment patterns.
Read the documentation to learn more about OME capabilities and features.
Model Management: Models are first-class custom resources in OME. Sophisticated model parsing extracts architecture, parameter count, and capabilities directly from model files. Supports distributed storage with automated repair, double encryption, namespace scoping, and multiple formats (SafeTensors, PyTorch, TensorRT, ONNX).
Intelligent Runtime Selection: Automatic matching of models to optimal runtime configurations through weighted scoring based on architecture, format, quantization, parameter size, and framework compatibility.
Optimized Deployments: Supports multiple deployment patterns including prefill-decode disaggregation, multi-node inference, and traditional Kubernetes deployments with advanced scaling controls.
Resource Optimization: Specialized GPU bin-packing scheduling with dynamic re-optimization to maximize cluster efficiency while ensuring high availability.
Runtime Integrations: First-class support for SGLang, the most advanced inference engine, featuring cache-aware load balancing, multi-node deployment, prefill-decode disaggregated serving, multi-LoRA adapter serving, and much more. Also supports Triton for general model inference.
Kubernetes Ecosystem Integration: Deep integration with modern Kubernetes components including Kueue for gang scheduling of multi-pod workloads, LeaderWorkerSet for resilient multi-node deployments, KEDA for advanced custom metrics-based autoscaling, K8s Gateway API for sophisticated traffic routing, and Gateway API Inference Extension for standardized inference endpoints.
Automated Benchmarking: Built-in performance evaluation through the BenchmarkJob custom resource, supporting configurable traffic patterns, concurrent load testing, and comprehensive result storage. Enables systematic performance comparison across models and service configurations.
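To illustrate the BenchmarkJob workflow, here is a minimal sketch of submitting a benchmark against a running inference service. The API group/version, field names, and resource names below are assumptions for illustration only, not the verified schema; consult the OME reference documentation for the real spec.

```bash
# Illustrative sketch only: apiVersion, kind, and spec fields are assumptions,
# not the verified BenchmarkJob schema.
kubectl apply -f - <<'EOF'
apiVersion: ome.io/v1beta1          # assumed API group/version
kind: BenchmarkJob
metadata:
  name: llama-bench                 # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  endpoint:
    inferenceService:
      name: llama-3-8b              # hypothetical target inference service
  task: text-to-text                # hypothetical traffic-pattern setting
  numConcurrency: [1, 8, 32]        # hypothetical concurrent load levels
EOF

# Check benchmark progress (resource plural name assumed)
kubectl get benchmarkjobs -n demo
```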
Requires Kubernetes 1.28 or newer
Option 1: OCI Registry (Recommended)

Install OME directly from the OCI registry:
```bash
# Install OME CRDs
helm upgrade --install ome-crd oci://ghcr.io/moirai-internal/charts/ome-crd --namespace ome --create-namespace

# Install OME resources
helm upgrade --install ome oci://ghcr.io/moirai-internal/charts/ome-resources --namespace ome
```

Option 2: Helm Repository
Install using the traditional Helm repository:
```bash
# Add the OME Helm repository
helm repo add ome https://sgl-project.github.io/ome
helm repo update

# Install OME CRDs first
helm upgrade --install ome-crd ome/ome-crd --namespace ome --create-namespace

# Install OME resources
helm upgrade --install ome ome/ome-resources --namespace ome
```

Option 3: Install from Source
For development or customization:
```bash
# Clone the repository
git clone https://github.com/sgl-project/ome.git
cd ome

# Install from local charts
helm install ome-crd charts/ome-crd --namespace ome --create-namespace
helm install ome charts/ome-resources --namespace ome
```
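Whichever option you choose, a quick sanity check (assuming the ome namespace and release names used above) is to confirm that the controller pods are running and the OME custom resource definitions are registered:

```bash
# Verify the OME controller pods are up
kubectl get pods -n ome

# Confirm the OME CRDs are installed
kubectl get crd | grep -i ome
```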
Read the installation guide for more options and advanced configurations.
Learn more about:
OME uses a component-based architecture built on Kubernetes custom resources:
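To make that concrete, here is a minimal sketch of how a model deployment might be expressed as a custom resource, assuming OME exposes an InferenceService-style serving resource; the API group/version and field names are assumptions for illustration, not the verified schema.

```bash
# Illustrative sketch only: apiVersion, kind, and spec fields are assumptions,
# not the verified OME schema.
kubectl apply -f - <<'EOF'
apiVersion: ome.io/v1beta1          # assumed API group/version
kind: InferenceService              # assumed serving resource kind
metadata:
  name: llama-3-8b-demo             # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  model:
    name: llama-3-8b-instruct       # hypothetical reference to a registered model resource
  # No runtime is pinned here: as described above, OME scores compatible
  # runtimes and selects one automatically.
EOF
```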
OME's controller automatically:
High-level overview of the main priorities:
OME is licensed under the MIT License.