Inference Providers

Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models on your favorite providers.

Partners

Our platform integrates with leading AI infrastructure providers, giving you access to their specialized capabilities through a single, consistent API. Here’s what each partner supports:

Why Choose Inference Providers?

When you build AI applications, it’s tough to manage multiple provider APIs, compare model performance, and deal with varying reliability. Inference Providers solves these challenges by offering:

Instant Access to Cutting-Edge Models: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you’ll find them here.

Zero Vendor Lock-in: Unlike being tied to a single provider’s model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more — all through one consistent interface.

Production-Ready Performance: Built for enterprise workloads with automatic failover for near-zero downtime, intelligent routing, and the reliability your applications demand.

Here’s what you can build:

Get Started for Free: Inference Providers includes a generous free tier, with additional credits for PRO users and Enterprise Hub organizations.

Key Features

Getting Started

Inference Providers works with your existing development workflow. Whether you prefer Python, JavaScript, or direct HTTP calls, we provide native SDKs and OpenAI-compatible APIs to get you up and running quickly.

We’ll walk through a practical example using deepseek-ai/DeepSeek-V3-0324, a state-of-the-art open-weights conversational model.

Inference Playground

Before diving into integration, explore models interactively with our Inference Playground. Test different chat completion models with your prompts and compare responses to find the perfect fit for your use case.

Authentication

You’ll need a Hugging Face token to authenticate your requests. Create one by visiting your token settings and generating a fine-grained token with the “Make calls to Inference Providers” permission.

For complete token management details, see our security tokens guide.
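
The examples throughout this guide read the token from an HF_TOKEN environment variable. A minimal sketch of that convention (the token value referenced in the comment is a placeholder):

import os

# Expect the fine-grained token in the HF_TOKEN environment variable,
# e.g. set beforehand in your shell with: export HF_TOKEN=hf_xxx (placeholder)
hf_token = os.environ["HF_TOKEN"]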

Quick Start - LLM

Let’s start with the most common use case: conversational AI using large language models. This section demonstrates how to perform chat completions using DeepSeek V3, showcasing the different ways you can integrate Inference Providers into your applications.

Whether you prefer our native clients, want OpenAI compatibility, or need direct HTTP access, we’ll show you how to get up and running with just a few lines of code.

Python

Here are three ways to integrate Inference Providers into your Python applications, from high-level convenience to low-level control:

huggingface_hub

openai

requests

For convenience, the huggingface_hub library provides an InferenceClient that automatically handles provider selection and request routing.

In your terminal, install the Hugging Face Hub Python client and log in:

pip install huggingface_hub
huggingface-cli login # get a read token from hf.co/settings/tokens

You can now use the client from a Python interpreter:

from huggingface_hub import InferenceClient

# Uses the token stored by `huggingface-cli login`
client = InferenceClient()

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {
            "role": "user",
            "content": "How many 'G's in 'huggingface'?"
        }
    ],
)

print(completion.choices[0].message)
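
The openai and requests options listed above follow the same pattern via the OpenAI-compatible router endpoint (shown in full in the HTTP / cURL section below). Here are minimal sketches, assuming your token is exported as HF_TOKEN:

import os
from openai import OpenAI

# Point the official OpenAI client at Hugging Face's router
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "How many 'G's in 'huggingface'?"}],
)

print(completion.choices[0].message)

And with plain requests against the same endpoint:

import os
import requests

response = requests.post(
    "https://router.huggingface.co/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
    json={
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "messages": [{"role": "user", "content": "How many 'G's in 'huggingface'?"}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"])
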
JavaScript

Integrate Inference Providers into your JavaScript applications with these flexible approaches:

huggingface.js

openai

fetch

Our JavaScript SDK provides a convenient interface with automatic provider selection and TypeScript support.

Install with NPM:

npm install @huggingface/inference

Then use the client in JavaScript:

import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
  model: "deepseek-ai/DeepSeek-V3-0324",
  messages: [
    {
      role: "user",
      content: "How many 'G's in 'huggingface'?",
    },
  ],
});

console.log(chatCompletion.choices[0].message);
HTTP / cURL

For testing, debugging, or integrating with any HTTP client, here’s the raw REST API format. Our intelligent routing automatically selects the most popular provider for your requested model, or routes to your preferred provider if you’ve ordered providers in your user settings.

curl https://router.huggingface.co/v1/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "How many G in huggingface?"
            }
        ],
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "stream": false
    }'
Quick Start - Text-to-Image Generation

Let’s explore how to generate images from text prompts using Inference Providers. We’ll use black-forest-labs/FLUX.1-dev, a state-of-the-art diffusion model that produces highly detailed, photorealistic images.

Python

Use the huggingface_hub library for the simplest image generation experience with automatic provider selection:

import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

image = client.text_to_image(
    prompt="A serene lake surrounded by mountains at sunset, photorealistic style",
    model="black-forest-labs/FLUX.1-dev"
)

# Save the generated image to disk
image.save("generated_image.png")
JavaScript

Use our JavaScript SDK for streamlined image generation with TypeScript support:

import { InferenceClient } from "@huggingface/inference";
import fs from "fs";

const client = new InferenceClient(process.env.HF_TOKEN);

const imageBlob = await client.textToImage({
  model: "black-forest-labs/FLUX.1-dev",
  inputs:
    "A serene lake surrounded by mountains at sunset, photorealistic style",
});

// Save the generated image to disk
const buffer = Buffer.from(await imageBlob.arrayBuffer());
fs.writeFileSync("generated_image.png", buffer);
Provider Selection

The Inference Providers API acts as a unified proxy layer that sits between your application and multiple AI providers. Understanding how provider selection works is crucial for optimizing performance, cost, and reliability in your applications.

API as a Proxy Service

When using Inference Providers, your requests go through Hugging Face’s proxy infrastructure, which provides several key benefits:

Because the API acts as a proxy, the exact HTTP request may vary between providers, as each provider has its own API requirements and response formats. When using our official client libraries (JavaScript or Python), these provider-specific differences are handled automatically, whether you use provider="auto" or specify a particular provider.

Client-Side Provider Selection (Inference Clients)

When using the Hugging Face inference clients (JavaScript or Python), you can explicitly specify a provider or let the system choose automatically. The client then formats the HTTP request to match the selected provider’s API requirements.

import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

// Explicit provider selection
await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  provider: "sambanova",
  messages: [{ role: "user", content: "Hello!" }],
});

// Automatic provider selection (equivalent to provider: "auto")
await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello!" }],
});
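
The Python client exposes the same choice. A minimal sketch, assuming the provider argument accepted by InferenceClient in recent huggingface_hub releases:

import os
from huggingface_hub import InferenceClient

# Explicit provider selection ("sambanova" is just an example)
client = InferenceClient(provider="sambanova", api_key=os.environ["HF_TOKEN"])
client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Automatic provider selection (the default, equivalent to provider="auto")
auto_client = InferenceClient(api_key=os.environ["HF_TOKEN"])
auto_client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)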

Provider Selection Policy:

Alternative: OpenAI-Compatible Chat Completions Endpoint (Chat Only)

If you prefer to work with familiar OpenAI APIs or want to migrate existing chat completion code with minimal changes, we offer a drop-in compatible endpoint that handles all provider selection automatically on the server side.

Note: This OpenAI-compatible endpoint is currently available for chat completion tasks only. For other tasks like text-to-image, embeddings, or speech processing, use the Hugging Face inference clients shown above.

import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello!" }],
});

You can also call this endpoint directly over HTTP, making it suitable for integration with any HTTP client or application that needs to interact with the chat completion service directly.

curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Key Features:

Choosing the Right Approach

Use Inference Clients when:

Use OpenAI-Compatible Endpoint when:

Use Direct HTTP when:

Next Steps

Now that you understand the basics, explore these resources to make the most of Inference Providers:
