Machine Learning At Your Service
by Hugging Face
Easily deploy Transformers, Diffusers, or any model on dedicated, fully managed infrastructure. Keep your costs low with our secure, compliant, and flexible production solution.
No Hugging Face account? Sign up!
One-click inference deployment
Import your favorite model from the Hugging Face Hub or browse our catalog of hand-picked, ready-to-deploy models! Prefer to script it? A minimal deployment sketch follows the catalog below.
Llama-3.1-70B-Instruct
A 70-billion-parameter model from Meta, optimized for dialogue. It generates helpful, safe responses and outperforms many open-source chat LLMs on common benchmarks.
gemma-2-27b-it
An instruction-tuned variant of Gemma 2, Google's open LLM, with 27 billion parameters.
Qwen2.5-Coder-7B-Instruct
An instruction-tuned 7B model from Qwen for code generation, code reasoning, and code fixing, with support for context lengths of up to 128K tokens.
FLUX.1-schnell
A 12-billion-parameter rectified flow transformer that delivers cutting-edge output quality and competitive prompt following.
mxbai-embed-large-v1
This model produces 1024-dimensional embeddings that perform extremely well on a variety of tasks. See the repository for the prefix required for queries.
whisper-large-v3-turbo
A fine-tuned version of a pruned Whisper large-v3, up to 8x faster than the original at the expense of a minor quality degradation.
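If you'd rather script a deployment than click through the UI, the huggingface_hub Python client exposes a create_inference_endpoint helper. The sketch below is illustrative only: the vendor, region, and instance values are placeholders, and the valid options depend on your account and the current hardware catalog.

```python
# Minimal sketch: deploying a catalog model programmatically with huggingface_hub.
# Hardware values (vendor, region, instance_type, instance_size) are placeholders;
# check the Inference Endpoints hardware catalog for the options available to you.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "whisper-turbo-demo",
    repository="openai/whisper-large-v3-turbo",
    framework="pytorch",
    task="automatic-speech-recognition",
    accelerator="gpu",
    vendor="aws",                # placeholder
    region="us-east-1",          # placeholder
    instance_type="nvidia-a10g", # placeholder, pick from the hardware catalog
    instance_size="x1",          # placeholder
    min_replica=0,               # scale to zero when idle to keep costs low
    max_replica=1,
)

endpoint.wait()   # block until the endpoint reports it is running
print(endpoint.url)
```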
Customer Stories
Learn how leading AI teams use Inference Endpoints to deploy their models
Endpoints for Music
Musixmatch is the world’s leading music data company
Custom text embeddings generation pipeline
- distilbert-base-uncased-finetuned-sst-2-english
- facebook/wav2vec2-base-960h
- Custom model based on sentence transformers
The coolest thing was how easy it was to define a complete custom interface from the model to the inference process. It just took us a couple of hours to adapt our code, and have a functioning and totally custom endpoint.
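A custom pipeline like Musixmatch's is typically packaged as a handler.py committed alongside the model: Inference Endpoints looks for an EndpointHandler class and routes each request payload to it. The sketch below is a minimal illustration assuming a sentence-transformers model; Musixmatch's actual handler is not public.

```python
# handler.py -- minimal sketch of a custom embedding handler, assuming the
# repository contains a sentence-transformers model. Illustrates only the
# EndpointHandler contract, not Musixmatch's actual pipeline.
from typing import Any, Dict, List

from sentence_transformers import SentenceTransformer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the local copy of the model repository.
        self.model = SentenceTransformer(path)

    def __call__(self, data: Dict[str, Any]) -> List[List[float]]:
        # Inference Endpoints delivers the JSON request body as `data`,
        # with the user payload under the "inputs" key.
        sentences = data["inputs"]
        if isinstance(sentences, str):
            sentences = [sentences]
        embeddings = self.model.encode(sentences, normalize_embeddings=True)
        return embeddings.tolist()
```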
Endpoints for Health
Phamily improves patient health with intelligent care management
HIPAA-compliant secure endpoints for text classification
- Custom model based on text-classification (MPNET)
- Custom model based on text-classification (BERT)
It took off a week's worth of developer time. Thanks to Inference Endpoints, we now basically spend all of our time on R&D, not fiddling with AWS. If you haven't already built a robust, performant, fault tolerant system for inference, then it's pretty much a no brainer.
Endpoints for Search
Pinecone is the vector database for intelligent search
Autoscaling endpoints for fast embeddings generation
- Different sentence transformers and embedding models
We were able to choose an off the shelf model that's very common for our customers to get started with and set it so that it can be configured to handle over 100 requests per second just with a few button clicks. With the release of the Hugging Face Inference Endpoints, we believe there's a new standard for how easy it can be to go build your first vector embedding based solution, whether it be semantic search or question answering system.
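Once an embedding endpoint like the one Pinecone describes is running, it is a plain HTTPS service you can call from any client. A minimal sketch, with the endpoint URL and token as placeholders:

```python
# Minimal sketch: querying a deployed embedding endpoint over HTTPS.
# ENDPOINT_URL and HF_TOKEN are placeholders for your own endpoint and token.
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"inputs": ["How do I deploy a model?", "What does autoscaling cost?"]},
    timeout=30,
)
response.raise_for_status()
embeddings = response.json()  # typically one vector per input sentence
print(len(embeddings), len(embeddings[0]))
```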
Endpoints for Videos
Waymark is an AI-powered video creator
Multi-modal endpoints for embeddings, audio and image generation
- sentence-transformers/all-mpnet-base-v2
- google/vit-base-patch16-224-in21k
- Custom model based on florentgbelidji/blip_captioning
You're bringing the potential time delta between "I've never seen anything that could do this before" and "I could have it on infrastructure ready to support an existing product" down to potentially less than a day.
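A multi-modal setup like Waymark's mixes text, image, and audio endpoints. As one hedged example, an image-captioning endpoint (for instance one built on a BLIP handler) can usually be called by posting raw image bytes with the right content type; the exact request and response formats depend on the deployed handler.

```python
# Minimal sketch: sending an image to a captioning endpoint. The URL and token
# are placeholders, and the payload convention depends on the deployed handler.
import requests

ENDPOINT_URL = "https://<your-captioning-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

with open("frame.jpg", "rb") as f:
    image_bytes = f.read()

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "image/jpeg"},
    data=image_bytes,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # e.g. a generated caption for the frame
```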
Pricing
Choose a plan that fits your needs
Self-Serve
Pay as you go when using Inference Endpoints
- Pay for what you use, per minute
- Starting as low as $0.06/hour
- Billed monthly
- Email support
Enterprise
Get a custom quote and premium support
- Lower marginal costs based on volume
- Uptime guarantees
- Custom annual contracts
- Dedicated support, SLAs