Machine Learning At Your Service

by Hugging Face

Easily deploy Transformers, Diffusers or any model on dedicated, fully managed infrastructure. Keep your costs low with our secure, compliant and flexible production solution.

One-click inference deployment

Import your favorite model from the Hugging Face Hub or browse our catalog of hand-picked, ready-to-deploy models!

  • Text Generation, TGI (Accelerated Text Generation Inference): GPU 4x Nvidia L4, $3.80/hour
  • Text Generation, TGI (Accelerated Text Generation Inference): GPU 4x Nvidia L40S, $8.30/hour
  • Text Generation, TGI (Accelerated Text Generation Inference): GPU 1x Nvidia L4, $0.80/hour
  • Text-to-Image: GPU 1x Nvidia L40S, $1.80/hour
  • Sentence Embeddings, TEI (Accelerated Text Embeddings Inference): GPU 1x Nvidia L4, $0.80/hour
  • Automatic Speech Recognition: GPU 1x Nvidia T4, $0.50/hour

Customer Stories

Learn how leading AI teams use Inference Endpoints to deploy their models

Endpoints for Music

Musixmatch is the world’s leading music data company

Use Case

Custom text embeddings generation pipeline

Models Deployed
  • distilbert-base-uncased-finetuned-sst-2-english
  • facebook/wav2vec2-base-960h
  • Custom model based on sentence transformers
The coolest thing was how easy it was to define a complete custom interface from the model to the inference process. It just took us a couple of hours to adapt our code, and have a functioning and totally custom endpoint.
Andrea Boscarino
Data Scientist at Musixmatch

Pricing

Choose a plan that fits your needs

Self-Serve

Pay as you go when using Inference Endpoints

  • Pay for what you use, per minute
  • Starting as low as $0.06/hour
  • Billed monthly
  • Email support
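As a rough illustration of per-minute billing, the sketch below estimates the charge for a single endpoint from one of the hourly rates listed in the catalog above. It assumes that usage is rounded up to the next whole minute, charged at one sixtieth of the hourly rate, and rounded to the cent. This page does not state those rounding rules, and the function name `estimate_cost` is hypothetical.

```python
import math


def estimate_cost(hourly_rate_usd: float, minutes_used: float) -> float:
    """Estimate the charge for one endpoint under per-minute billing.

    Assumptions (not confirmed by the pricing page):
    - partial minutes are rounded up to the next whole minute;
    - each minute costs hourly_rate_usd / 60;
    - the total is rounded to the nearest cent.
    """
    if hourly_rate_usd < 0 or minutes_used < 0:
        raise ValueError("rate and minutes must be non-negative")
    billable_minutes = math.ceil(minutes_used)  # assumed round-up rule
    return round(hourly_rate_usd / 60 * billable_minutes, 2)


# A 1x Nvidia L4 endpoint listed at $0.80/hour, running for 90 minutes.
print(estimate_cost(0.8, 90))  # → 1.2
```

Under these assumptions, 90 minutes on a $0.80/hour instance costs $1.20, which is 1.5 times the hourly rate rather than a charge for two full hours.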

Enterprise

Get a custom quote and premium support

  • Lower marginal costs based on volume
  • Uptime guarantees
  • Custom annual contracts
  • Dedicated support, SLAs