Red Hat® AI Inference Server optimizes model inference across the hybrid cloud for faster, cost-effective model deployments.
What is an inference server?
An inference server is the piece of software that allows artificial intelligence (AI) applications to communicate with large language models (LLMs) and generate a response based on data. This process is called inference. It’s where the business value happens and the end result is delivered.
To perform effectively, LLMs need extensive storage, memory, and infrastructure to run inference at scale—which is why inference can consume the majority of your AI budget.
As part of the Red Hat AI platform, Red Hat AI Inference Server optimizes inference capabilities to drive down traditionally high costs and extensive infrastructure.
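To make the idea concrete, here is a minimal sketch of an application sending a prompt to an inference server over an OpenAI-compatible API, the interface that vLLM-based servers commonly expose. The endpoint URL and model name are placeholders for illustration, not specifics of Red Hat AI Inference Server.

```python
# Minimal sketch: an application sending a prompt to an inference server.
# Assumes an OpenAI-compatible endpoint (common for vLLM-based servers).
# The URL and model name below are illustrative placeholders only.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical inference server endpoint
    api_key="EMPTY",                      # many self-hosted servers ignore the key
)

response = client.chat.completions.create(
    model="my-deployed-model",            # placeholder model name
    messages=[{"role": "user", "content": "Summarize our Q3 support tickets."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

The application only sees a simple request-and-response interface; the inference server handles batching, scheduling, and accelerator execution behind it.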
Introduction to Red Hat AI Inference Server
How does Red Hat AI Inference Server work?
Red Hat AI Inference Server provides fast and cost-effective inference at scale. Its open source nature allows it to support any generative AI (gen AI) model, on any AI accelerator, in any cloud environment.
Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times. Combined with LLM Compressor capabilities, it increases inference efficiency without sacrificing performance. With cross-platform adaptability and a growing community of contributors, vLLM is emerging as the Linux® of gen AI inference.
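As a rough illustration of what vLLM does under the hood, the sketch below loads a model and generates text with the open source vLLM Python library. The model name is a placeholder, and Red Hat AI Inference Server packages vLLM as a supported runtime, so exact usage in the product may differ.

```python
# Illustrative sketch of the open source vLLM library that powers the server.
# The model name is a placeholder; any compatible gen AI model could be used.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")      # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What does an inference server do?"], params)
for output in outputs:
    print(output.outputs[0].text)
```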
50%: Some customers who used LLM Compressor experienced 50% cost savings without sacrificing performance.*
Your models are your choice
Red Hat AI Inference Server supports all leading open source models and maintains flexible GPU portability. You can use any gen AI model or choose from our optimized collection of validated, open source, third-party models.
Plus, as part of Red Hat AI, Red Hat AI Inference Server is certified for all Red Hat products. It can also be deployed across other Linux and Kubernetes platforms with support under Red Hat’s third-party support policy.
Increased efficiency with vLLM
Optimize the deployment of any gen AI model, on any AI accelerator, with vLLM.
LLM Compressor
Compress models of any size to reduce compute utilization and its related costs while maintaining high model response accuracy (a hedged sketch of this workflow follows these highlights).
Hybrid cloud flexibility
Maintain portability across different GPUs and run models on premises, in the cloud, or at the edge.
Red Hat AI repository
Third-party validated and optimized models are ready for inference deployment, to help achieve faster time to value and to keep costs low.
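As a hedged illustration of the LLM Compressor workflow highlighted above, the sketch below applies a one-shot quantization recipe with the open source llm-compressor library. The import paths, quantization scheme, model name, and save options are assumptions that may vary by library version; consult the upstream documentation for your release.

```python
# Hedged sketch of one-shot quantization with the open source llm-compressor
# library. Import paths and scheme names may differ by version; the model ID
# and output directory are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"          # placeholder model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize Linear layers to FP8 (dynamic activations), leaving the output head untouched.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save the compressed checkpoint so an inference server such as vLLM can serve
# it with a smaller memory footprint.
model.save_pretrained("Llama-3.1-8B-Instruct-FP8-Dynamic", save_compressed=True)
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-FP8-Dynamic")
```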
Red Hat AI Support
As one of the largest commercial contributors to vLLM, we have a deep understanding of the technology. Our AI consultants have the vLLM expertise to help you achieve your enterprise AI goals.
How to buy
Red Hat AI Inference Server is available as a standalone product, or as part of Red Hat AI. It is included in both Red Hat Enterprise Linux® AI and Red Hat OpenShift® AI.
Deploy with partners
Experts and technologies are coming together so our customers can do more with AI. Explore all of the partners working with Red Hat to certify that their technologies operate with our solutions.
Frequently asked questions
Do I have to purchase Red Hat AI to use Red Hat AI Inference Server?
No. You can purchase Red Hat AI Inference Server as a standalone Red Hat product.
Do I need to purchase Red Hat AI Inference Server separately if I already have Red Hat Enterprise Linux AI or Red Hat OpenShift AI?
No. Red Hat AI Inference Server is included when you purchase Red Hat Enterprise Linux AI as well as Red Hat OpenShift AI.
Can Red Hat AI Inference Server run on Red Hat Enterprise Linux and Red Hat OpenShift?
Yes, it can. It can also run on third-party Linux environments under our third-party support policy.
How is Red Hat AI Inference Server priced?
It is priced per accelerator.
Explore more AI resources
How to get started with AI at the enterprise
Get Red Hat Consulting for AI
Maximize AI innovation with open source models
Red Hat Consulting: AI Platform Foundation
Contact Sales
Talk to a Red Hatter about Red Hat AI