DeepInfra Is Now a Hugging Face Inference Provider — Here's What That Means

DeepInfra just became an official Inference Provider on the Hugging Face Hub. If you’ve been using Hugging Face for model inference, this is actually a bigger deal than it sounds.

For context, Hugging Face has been slowly building out this “Inference Providers” ecosystem — letting you route model calls through third-party services directly from the Hub, without cobbling together separate API clients. DeepInfra is the latest to join, and honestly, it’s a smart fit.

What DeepInfra brings to the table

DeepInfra is a serverless inference platform that’s been quietly gaining traction for one simple reason: it’s cheap. Like, noticeably cheaper than some of the bigger names. They claim one of the most cost-effective per-token pricing models out there, and with a catalog of over 100 models, they cover a lot of ground.

Right now, the initial integration supports conversational and text-generation tasks. That means you get access to popular open-weight LLMs like DeepSeek V4, Kimi-K2.6, GLM-5.1, and a bunch more. Text-to-image, text-to-video, embeddings — those are coming soon, but if you’re building with LLMs today, you’re covered.

How it actually works

There are two ways to use DeepInfra through Hugging Face:

Custom key mode — You bring your own DeepInfra API key. Requests go directly from the Hub to DeepInfra’s servers, and you get billed by DeepInfra as usual. No middleman.

Routed by Hugging Face — You authenticate with your Hugging Face token, and the request gets routed through HF to DeepInfra. You don’t need a DeepInfra key at all. Charges hit your HF account, but at the same rates DeepInfra would charge you directly. No markup. That’s refreshingly honest.

You can set your preferred provider order in your account settings, and the model pages will automatically show the available providers sorted by your preference. The widget, code snippets, everything adjusts.
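
To make the two modes above concrete, here is a minimal sketch of how a script could tell them apart from the key it is handed. The helper names `billing_mode` and `who_bills` are hypothetical, and the `hf_` prefix check is an assumption for illustration rather than a documented contract, so check the Inference Providers docs before relying on it.

```python
def billing_mode(api_key: str) -> str:
    """Guess which mode a key implies (hypothetical helper).

    Assumption: Hugging Face user tokens start with "hf_"; any other
    key is treated as a provider key, i.e. custom key mode.
    """
    if not api_key:
        raise ValueError("an API key or token is required")
    return "routed" if api_key.startswith("hf_") else "custom-key"


def who_bills(api_key: str) -> str:
    """Return which account is charged for a request made with this key."""
    payer = {"routed": "Hugging Face", "custom-key": "DeepInfra"}
    return payer[billing_mode(api_key)]


# Placeholder keys, not real credentials.
print(who_bills("hf_example_token"))
print(who_bills("example-provider-key"))
```

A check like this is handy in a startup log line, so you know which account a run will be billed to before any requests go out.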

SDK support and agent harnesses

If you’re using the Hugging Face SDKs, this is seamless. The Python SDK (huggingface_hub >= 1.11.2) and the JavaScript SDK (@huggingface/inference) both support DeepInfra out of the box. Just use the model ID with :deepinfra appended, and the router handles the rest.

Here’s the Python version, using the OpenAI client pointed at the Hugging Face router:

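# Requires the openai package and an HF_TOKEN environment variable.
import os
from openai import OpenAI

# Point the OpenAI client at the Hugging Face router.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":deepinfra" suffix selects DeepInfra as the provider for this model.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization."
        }
    ],
)

# Print the assistant's reply message.
print(completion.choices[0].message)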

And the JS equivalent:

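// Requires the openai package and an HF_TOKEN environment variable.
// Top-level await needs an ES module context.
import { OpenAI } from "openai";

// Point the OpenAI client at the Hugging Face router.
const client = new OpenAI({
    baseURL: "https://router.huggingface.co/v1",
    apiKey: process.env.HF_TOKEN,
});

// The ":deepinfra" suffix selects DeepInfra as the provider for this model.
const chatCompletion = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages: [
        {
            role: "user",
            content: "Write a Python function that returns the nth Fibonacci number using memoization.",
        },
    ],
});

// Print the assistant's reply message.
console.log(chatCompletion.choices[0].message);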
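
Both snippets rely on the same convention: the provider is picked by a suffix on the model ID. As a rough sketch, the request body sent to the router could be assembled like this. `with_provider` and `build_chat_payload` are hypothetical helper names, and the payload shape assumes the OpenAI-style chat completions format the snippets above already use.

```python
import json


def with_provider(model_id: str, provider: str = "deepinfra") -> str:
    """Append ':<provider>' unless the ID already carries a suffix."""
    # Hub model IDs look like 'org/name', so a ':' is treated here as
    # an existing provider suffix and the ID is returned unchanged.
    if ":" in model_id:
        return model_id
    return f"{model_id}:{provider}"


def build_chat_payload(model_id: str, prompt: str, provider: str = "deepinfra") -> dict:
    """Build an OpenAI-style chat completions request body."""
    return {
        "model": with_provider(model_id, provider),
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_payload("deepseek-ai/DeepSeek-V4-Pro", "Say hello.")
print(json.dumps(payload, indent=2))
```

Keeping the suffix logic in one place makes it easy to switch providers later: change the default and every call site follows.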

They’ve also integrated with most agent harnesses — Pi, OpenCode, Hermes Agents, OpenClaw, and others. So if you’re using those tools, you can plug in DeepInfra-hosted models without writing any glue code.

Pricing and the PRO perk

For routed requests, you pay standard DeepInfra rates through your Hugging Face account. No markup. For direct requests, you pay DeepInfra directly. Simple.

One thing worth pointing out: PRO users get $2 worth of Inference credits every month, usable across all providers. That’s not huge, but for light experimentation or prototyping, it’s genuinely useful. Free users get a small quota too, but if you’re doing anything serious, PRO is worth it.
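
For a sense of how far $2 goes, here is a back-of-the-envelope calculation. The per-million-token price below is a made-up placeholder, not DeepInfra's actual rate, and the sketch ignores the usual split between input and output token prices. Substitute the numbers from the pricing page of the model you actually use.

```python
def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """Return how many tokens a budget covers at a flat per-token price."""
    if usd_per_million_tokens <= 0:
        raise ValueError("price must be positive")
    return int(budget_usd / usd_per_million_tokens * 1_000_000)


MONTHLY_CREDIT_USD = 2.00          # PRO credit mentioned above
HYPOTHETICAL_PRICE_PER_M = 0.50    # placeholder rate, NOT a real DeepInfra price

print(tokens_for_budget(MONTHLY_CREDIT_USD, HYPOTHETICAL_PRICE_PER_M))  # 4000000
```

At that placeholder rate the monthly credit covers about four million tokens, which is plenty for prototyping but disappears quickly under production traffic.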

My take

I’ve been watching the Inference Providers ecosystem grow, and DeepInfra joining feels like a natural step. They’re not trying to be everything to everyone — they focus on being cheap and fast for a broad set of models. The Hugging Face integration removes a lot of the friction of managing separate accounts and endpoints.

That said, I wish they’d launched with more than just text generation. The promise of text-to-image and embeddings support is nice, but “coming soon” is always a bit of a letdown. Also, the model selection is good but not exhaustive — if you need something niche, you might still need to go directly to DeepInfra’s own platform.

Still, for most developers building LLM-powered apps, this is a solid addition. Less boilerplate, fewer API keys to manage, and competitive pricing. Hard to complain about that.

If you want to try it, head over to your Hugging Face account settings, set up DeepInfra as a provider, and start playing. The full list of supported models is here.
