Published January 7, 2025 • 6-minute read
What is AI inference?

AI inference is when an AI model provides an answer based on data. What some generally call “AI” is really the success of AI inference: the final step, the “aha” moment, in a long and complex process of machine learning technology.
Training artificial intelligence (AI) models with sufficient data can help improve AI inference accuracy and speed.
For example, when an AI model is trained on data about animals—from their differences and similarities to typical health and behavior—it needs a large data set to make connections and identify patterns.
After successful training, the model can make inferences such as identifying a breed of dog, recognizing a cat’s meow, or even delivering a warning around a spooked horse. Even though it has never seen these animals outside of an abstract data set before, the extensive data it was trained on allows it to make inferences in a new environment in real time.
Our own human brain makes connections like this too. We can read about different animals from books, movies, and online resources. We can see pictures, watch videos, and listen to what these animals sound like. When we go to the zoo, we are able to make an inference (“That’s a buffalo!”). Even if we have never been to the zoo, we can identify the animal because of the research we have done. The same goes for AI models during AI inference.
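To make the training-then-inference split concrete, here is a minimal, hedged sketch in Python using scikit-learn; the animal features and labels are invented purely for illustration.

```python
# A minimal sketch of training vs. inference, using scikit-learn.
# The feature values and labels below are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Training data: [weight_kg, height_cm, avg_sound_freq_hz] per animal.
features = [
    [30, 60, 450],    # dog
    [4, 25, 700],     # cat
    [500, 160, 120],  # horse
    [25, 55, 430],    # dog
    [5, 28, 760],     # cat
]
labels = ["dog", "cat", "horse", "dog", "cat"]

# Training: the model learns patterns that connect features to labels.
model = DecisionTreeClassifier(random_state=0)
model.fit(features, labels)

# Inference: the trained model labels an animal it has never seen before.
unseen_animal = [[28, 58, 440]]
print(model.predict(unseen_animal))  # -> ['dog']
```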
Why is AI inference important?

AI inference is the operational phase of AI, where the model is able to apply what it’s learned from training to real-world situations. AI’s ability to identify patterns and reach conclusions sets it apart from other technologies. Its ability to infer can help with practical day-to-day tasks or extremely complicated computer programming.
AI inference use cases

Today, businesses can use AI inference in a variety of everyday use cases. These are a few examples:
Healthcare: AI inference can help healthcare professionals compare patient history to current data and trace patterns and anomalies faster than humans can. This could be an outlier on a brain scan or an extra “thump” in a heartbeat. This can help catch signs of threats to patient health much earlier and much faster.
Finance: After being trained on large data sets of banking and credit information, AI inference can identify errors or unusual data in real time to catch fraud early and quickly (see the sketch after this list). This can optimize customer service resources, protect customer privacy, and improve brand reputation.
Automotive: As AI enters the world of cars, autonomous vehicles are changing the way we drive. AI inference can help vehicles navigate the most efficient route from point A to point B or brake when they approach a stop sign, improving both convenience and passenger safety.
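As an illustration of the finance example above, here is a hedged sketch of anomaly-style fraud detection using scikit-learn’s IsolationForest; the transaction data is fabricated, and a real system would use far richer features.

```python
# A hedged sketch of fraud-style anomaly detection with scikit-learn.
# Transaction data is fabricated; real systems use far richer features.
from sklearn.ensemble import IsolationForest

# Training: learn what "normal" transactions look like
# ([amount_usd, hour_of_day] per transaction).
normal_transactions = [
    [25.0, 9], [12.5, 12], [40.0, 18], [8.0, 13],
    [33.0, 10], [19.99, 20], [27.5, 11], [15.0, 17],
]
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_transactions)

# Inference: score new transactions in real time; -1 flags an anomaly.
new_transactions = [[22.0, 14], [9500.0, 3]]
print(detector.predict(new_transactions))  # -> [ 1 -1 ]
```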
Many other industries are applying AI inference in creative ways, too. It can be applied to a fast food drive-through, a veterinary clinic, or a hotel concierge. Businesses are finding ways to make this technology work to their advantage to improve their accuracy, save time and money, and maintain their edge over competitors.
What is AI training?

AI training is the process of using data to teach the model how to make connections and identify patterns. Training is the process of teaching a model, whereas inference is the AI model in action.
Most AI training occurs in the beginning stages of model building. Once trained, the model can make connections with data it has never encountered before. Training an AI model with a larger data set means it can learn more connections and make more accurate inferences. If the model is struggling to make accurate inferences after training, fine-tuning can add knowledge and improve accuracy.
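The training-versus-inference split also shows up directly at the framework level. Here is a minimal, hedged PyTorch sketch with a toy model and made-up data: training repeatedly adjusts the weights, while inference simply runs the frozen model forward.

```python
# A minimal PyTorch sketch contrasting training (weights change)
# with inference (weights frozen, no gradients). Toy data throughout.
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                # a tiny one-layer "model"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(32, 3)                 # made-up training inputs
y = x.sum(dim=1, keepdim=True)         # made-up target pattern

# Training: repeatedly adjust weights to reduce prediction error.
model.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                    # compute gradients
    optimizer.step()                   # update weights

# Inference: apply the trained model to data it has never seen.
model.eval()
with torch.no_grad():                  # no gradients, no weight updates
    new_input = torch.tensor([[1.0, 2.0, 3.0]])
    print(model(new_input))            # close to 6.0 after training
```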
Training and AI inference are how AI is able to mimic human capabilities such as drawing conclusions based on evidence and reasoning.
Factors like model size can change the amount of resources you need to manipulate your model.
Learn how smaller models can make GPU inference easier.
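One reason smaller models make GPU inference easier is plain arithmetic: a model’s weights alone have to fit in GPU memory before anything else runs. A back-of-the-envelope sketch (weights only; activations, KV cache, and runtime overhead add more on top):

```python
# Back-of-the-envelope GPU memory estimate for model weights alone
# (activations, KV cache, and runtime overhead add more on top).
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

for name, params in [("7B model", 7e9), ("70B model", 70e9)]:
    fp16 = weight_memory_gb(params, 2)  # 16-bit weights: 2 bytes each
    int8 = weight_memory_gb(params, 1)  # 8-bit quantized: 1 byte each
    print(f"{name}: ~{fp16:.0f} GB at FP16, ~{int8:.0f} GB at INT8")
```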
What are different types of AI inference?

Different kinds of AI inference can support different use cases.
Online inference: Also called “dynamic” inference, online inference can deliver a response in real time. These inferences require hardware and software that reduce latency and support high-speed predictions. Online inference is helpful at the edge, meaning the AI does its work where the data is located. This could be on a phone, in a car, or at a remote office with limited connectivity.
OpenAI’s ChatGPT is a good example of online inference—it requires a lot of upfront operational support in order to deliver a quick and accurate response.
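As a hedged sketch of what an online inference request can look like, here is a client call against an OpenAI-compatible endpoint such as one served by vLLM; the URL, API key, and model name below are placeholders.

```python
# A hedged sketch of an online (real-time) inference request against an
# OpenAI-compatible endpoint, such as one exposed by a vLLM server.
# The base_url, api_key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference server
    api_key="EMPTY",                      # many local servers ignore keys
)

response = client.chat.completions.create(
    model="my-served-model",              # placeholder model name
    messages=[{"role": "user", "content": "What animal says 'moo'?"}],
)
print(response.choices[0].message.content)
```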
Learn how distributed inference with vLLM can alleviate bottlenecks.
What is an AI inference server?

An AI inference server is the software that helps an AI model make the jump from training to operating. It runs the trained model, helping it apply what it’s learned and put it into practice to generate inferences.
For efficient results, your AI inference server and AI model need to be compatible.
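As one concrete example, vLLM (the open source inference engine that also powers Red Hat AI Inference Server, described below) exposes a Python API for loading a model and generating responses. A minimal, hedged sketch, with the model name standing in for any compatible checkpoint:

```python
# A hedged sketch of generation with vLLM's Python API;
# the model name is a placeholder for any compatible checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")   # small demo model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for output in outputs:
    print(output.outputs[0].text)
```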
AI inference challenges

The biggest challenges when running AI inference are scaling, resources, and cost.
Tools like LLM Compressor can help reduce these challenges and make AI inference faster.
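As a hedged sketch of what model compression can look like in practice, here is a one-shot quantization flow modeled on the open source LLM Compressor project’s documented API; the exact module paths, dataset name, and arguments are assumptions that vary by version.

```python
# A hedged sketch of one-shot weight quantization, modeled on the open
# source LLM Compressor project; module paths and arguments are
# assumptions and may differ between versions.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

# Quantize linear layers to 4-bit weights, keeping the output head intact.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example open model
    dataset="open_platypus",                     # calibration data (assumed name)
    recipe=recipe,
    output_dir="TinyLlama-1.1B-W4A16",           # where compressed weights land
    max_seq_length=2048,
    num_calibration_samples=512,
)
```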
How Red Hat can help

Red Hat AI is a platform of products and services that can help your enterprise at any stage of the AI journey, whether you’re at the very beginning or ready to scale. It can support both generative and predictive AI efforts for your unique enterprise use cases.
With Red Hat AI, you have access to Red Hat® AI Inference Server to optimize model inference across the hybrid cloud for faster, cost-effective deployments. Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times.
Learn more about Red Hat AI Inference Server
Red Hat AI Inference Server includes the Red Hat AI repository, a collection of third-party validated and optimized models that allows model flexibility and encourages cross-team consistency. With access to the third-party model repository, enterprises can accelerate time to market and decrease financial barriers to AI success.
Explore the repository on Hugging Face
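To pull one of these models locally, here is a short, hedged sketch using the huggingface_hub client; the repository name below is a placeholder, so check the organization page for actual model names.

```python
# A hedged sketch of downloading a validated model from the Red Hat AI
# organization on Hugging Face; the repo_id below is a placeholder.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="RedHatAI/some-validated-model")
print(f"Model files downloaded to: {local_path}")
```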
Learn more about validated models by Red Hat AI
Red Hat AI is powered by open source technologies and a partner ecosystem that focuses on performance, stability, and GPU support across various infrastructures.