Inference Endpoints - Hugging Face

Machine Learning At Your Service

Easily deploy Transformers, Diffusers or any model on dedicated, fully managed infrastructure. Keep your costs low with our secure, compliant and flexible production solution.

One-click inference deployment

Import your favorite model from the Hugging Face Hub or browse our catalog of hand-picked, ready-to-deploy models!

google/gemma-2-27b-it
Text Generation | TGI (Accelerated Text Generation Inference)
GPU 4x Nvidia L4 | $3.8 / hour

meta-llama/Llama-3.1-70B-Instruct
Text Generation | TGI (Accelerated Text Generation Inference)
GPU 4x Nvidia L40S | $8.3 / hour

Qwen/Qwen2.5-Coder-7B-Instruct
Text Generation | TGI (Accelerated Text Generation Inference)
GPU 1x Nvidia L4 | $0.8 / hour

black-forest-labs/FLUX.1-schnell
Text-to-Image
GPU 1x Nvidia L40S | $1.8 / hour

mixedbread-ai/mxbai-embed-large-v1
Sentence Embeddings | TEI (Accelerated Text Embeddings Inference)
GPU 1x Nvidia L4 | $0.8 / hour

openai/whisper-large-v3-turbo
Automatic Speech Recognition
GPU 1x Nvidia T4 | $0.5 / hour
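
To compare the hourly prices listed above, a rough budget can be worked out with simple arithmetic. The sketch below is only an illustration: it assumes cost scales linearly with uptime at the flat hourly rate shown in the catalog, and the function name `estimate_cost`, the 30-day default, and the `replicas` parameter are hypothetical, not part of any Hugging Face API. Actual billing rules may differ.

```python
# Hourly rates in USD, copied from the catalog listing above.
HOURLY_RATE_USD = {
    "google/gemma-2-27b-it": 3.8,
    "meta-llama/Llama-3.1-70B-Instruct": 8.3,
    "Qwen/Qwen2.5-Coder-7B-Instruct": 0.8,
    "black-forest-labs/FLUX.1-schnell": 1.8,
    "mixedbread-ai/mxbai-embed-large-v1": 0.8,
    "openai/whisper-large-v3-turbo": 0.5,
}


def estimate_cost(model_id: str, hours_per_day: float,
                  days: int = 30, replicas: int = 1) -> float:
    """Estimate spend in USD, assuming flat, linear per-hour pricing.

    `days` and `replicas` are illustrative assumptions, not billing terms
    taken from the page.
    """
    if model_id not in HOURLY_RATE_USD:
        raise ValueError(f"no listed rate for {model_id!r}")
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    rate = HOURLY_RATE_USD[model_id]
    return round(rate * hours_per_day * days * replicas, 2)


if __name__ == "__main__":
    # Print a 30-day, always-on estimate for every listed model.
    for model_id in HOURLY_RATE_USD:
        print(f"{model_id}: ${estimate_cost(model_id, 24):.2f}")
```

Under these assumptions, running the smallest listed GPU configuration around the clock costs a few hundred dollars per 30 days, while the 4x L40S configuration costs roughly ten times as much, so limiting uptime matters most for the larger models.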

Customer Stories

Learn how leading AI teams use Inference Endpoints to deploy their models

Endpoints for Music

Musixmatch is the world’s leading music data company

Use Case

Custom text embeddings generation pipeline

Models Deployed

distilbert-base-uncased-finetuned-sst-2-english
facebook/wav2vec2-base-960h
Custom model based on sentence transformers

The coolest thing was how easy it was to define a complete custom interface from the model to the inference process. It just took us a couple of hours to adapt our code, and have a functioning and totally custom endpoint.