Model Catalog

Ready-to-use configurations for 1-click deployments!

Llama-3.1-8B-Instruct

An 8-billion-parameter model from Meta, optimized for dialogue and tuned to generate helpful, safe responses. Meta reports that it outperforms many open-source chat LLMs on common industry benchmarks.

GPU 1x Nvidia L4
$0.80 / hour

Llama-3.1-70B-Instruct

A 70-billion-parameter model from Meta, optimized for dialogue and tuned to generate helpful, safe responses. Meta reports that it outperforms many open-source chat LLMs on common industry benchmarks.

GPU 4x Nvidia L40S
$8.30 / hour

gemma-2-9b-it

An instruction-tuned version of Gemma 2, Google's open LLM. This version has 9 billion parameters.

GPU 1x Nvidia L4
$0.80 / hour

gemma-2-27b-it

An instruction-tuned version of Gemma 2, Google's open LLM. This version has 27 billion parameters.

GPU 4x Nvidia L4
$3.80 / hour

Qwen2-7B-Instruct

A multilingual, instruction-tuned 7B model from Qwen. It supports a context length of up to 131,072 tokens, enough to process very long inputs.

GPU 1x Nvidia L4
$0.80 / hour

Qwen2-72B-Instruct

A multilingual, instruction-tuned 72B model from Qwen. It supports a context length of up to 131,072 tokens, enough to process very long inputs.

GPU 4x Nvidia L40S
$8.30 / hour

Qwen2.5-Coder-7B-Instruct

An instruction-tuned 7B model from Qwen for code generation, code reasoning, and code fixing. It supports a context length of up to 128K tokens.

GPU 1x Nvidia L4
$0.80 / hour
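
The Qwen2 listings quote 131,072 tokens and the Qwen2.5-Coder listing quotes 128K; these are the same limit (128 × 1024). The sketch below, assuming you already have token counts from the model's own tokenizer, checks whether a prompt plus the requested completion fits in that window. `CONTEXT_LIMIT_TOKENS` and `fits_in_context` are hypothetical names for illustration, not part of any platform API, and a deployment's serving configuration may cap the window below the model's maximum.

```python
# Hypothetical helper: the 131,072 figure comes from the listings above;
# "128K" in the Qwen2.5-Coder card is the same number (128 * 1024).
CONTEXT_LIMIT_TOKENS = 128 * 1024  # = 131,072


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Return True if prompt + requested completion fits in the context window.

    Token counts must come from the model's own tokenizer; this helper does
    not tokenize text itself.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # For decoder-only models, input and generated tokens share one window.
    return prompt_tokens + max_new_tokens <= context_limit


print(fits_in_context(120_000, 11_072))  # exactly at the limit
print(fits_in_context(120_000, 11_073))  # one token over
```

Both the prompt and the generated tokens count against the same window, so a long prompt leaves correspondingly less room for the answer.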

OpenHermes-2.5-Mistral-7B-GPTQ

A chat model fine-tuned from Mistral 7B on a large corpus of synthetic data, with strong coding capabilities and support for function calling. This build is GPTQ-quantized.

GPU 1x Nvidia L4
$0.80 / hour

NeuralHermes-2.5-Mistral-7B-GPTQ

A fine-tuned version of OpenHermes 2.5, aligned with Direct Preference Optimization (DPO) on AI-generated preference examples from the SlimOrca dataset. This build is GPTQ-quantized.

GPU 1x Nvidia L4
$0.80 / hour

Starling-LM-7B-alpha

An open chat language model from UC Berkeley. At its release, it outperformed all other 7B models on MT-Bench. Licensed for non-commercial use only.

GPU 1x Nvidia L4
$0.80 / hour

openchat-3.5-0106

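An open-source LLM that targets both high performance and commercial viability. It is fine-tuned with C-RLFT (Conditioned Reinforcement Learning Fine-Tuning), and its authors report results on par with ChatGPT on several benchmarks.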

GPU 1x Nvidia L4
$0.80 / hour

Mistral-Nemo-Instruct-2407

An instruction-tuned version of Mistral-Nemo-Base-2407, a 12B model trained jointly by Mistral AI and NVIDIA. Mistral AI reports that it significantly outperforms existing models of smaller or similar size.

GPU 1x Nvidia L40S
$1.80 / hour

zephyr-7b-beta

A chat model fine-tuned from Mistral 7B with synthetic data and Direct Preference Optimization.

GPU 1x Nvidia L4
$0.80 / hour

neural-chat-7b-v3-1

A chat model from Intel, fine-tuned from Mistral 7B on the SlimOrca dataset with Direct Preference Optimization.

GPU 1x Nvidia L4
$0.80 / hour
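
Three listings here (NeuralHermes, zephyr, and neural-chat) mention Direct Preference Optimization. For reference, the general DPO objective is shown below: given a prompt x with a preferred response y_w and a rejected response y_l drawn from a preference dataset D, it raises the policy's likelihood of y_w relative to y_l, measured against a frozen reference model π_ref (typically the supervised fine-tuned starting point). σ is the logistic sigmoid and β controls how far the policy may drift from the reference. This is the standard formulation only, not a description of the exact recipe or hyperparameters used for any model in this catalog.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```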

Falcon-180B-Chat-GPTQ

A 180-billion-parameter chat model from TII that uses multi-query attention to make inference more efficient. This build is GPTQ-quantized. Available under the Falcon-180B TII License.

GPU 2x Nvidia A100
$8.00 / hour

Mixtral-8x7B-Instruct-v0.1

Mixtral 8x7B is a sparse mixture-of-experts, decoder-only model. This version is fine-tuned for instruction following and released under a permissive license (Apache 2.0).

GPU 2x Nvidia A100
$8.00 / hour
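
To compare configurations, the hourly rates above can be turned into a rough cost estimate. The sketch below copies the listed rates and assumes simple linear billing per hour of uptime, with no minimum charge, partial-hour rounding, or discounts; those billing terms are assumptions, not documented behavior. `HOURLY_RATE_USD` and `estimate_cost` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from decimal import Decimal, ROUND_HALF_UP

# Hourly rates (USD) copied from the catalog above.
HOURLY_RATE_USD = {
    "Llama-3.1-8B-Instruct": Decimal("0.80"),
    "Llama-3.1-70B-Instruct": Decimal("8.30"),
    "gemma-2-9b-it": Decimal("0.80"),
    "gemma-2-27b-it": Decimal("3.80"),
    "Qwen2-7B-Instruct": Decimal("0.80"),
    "Qwen2-72B-Instruct": Decimal("8.30"),
    "Qwen2.5-Coder-7B-Instruct": Decimal("0.80"),
    "OpenHermes-2.5-Mistral-7B-GPTQ": Decimal("0.80"),
    "NeuralHermes-2.5-Mistral-7B-GPTQ": Decimal("0.80"),
    "Starling-LM-7B-alpha": Decimal("0.80"),
    "openchat-3.5-0106": Decimal("0.80"),
    "Mistral-Nemo-Instruct-2407": Decimal("1.80"),
    "zephyr-7b-beta": Decimal("0.80"),
    "neural-chat-7b-v3-1": Decimal("0.80"),
    "Falcon-180B-Chat-GPTQ": Decimal("8.00"),
    "Mixtral-8x7B-Instruct-v0.1": Decimal("8.00"),
}


def estimate_cost(model: str, hours: int | float | Decimal) -> Decimal:
    """Estimate the cost of one deployment running for `hours` hours.

    Assumes linear per-hour billing (an assumption, not a documented term).
    Decimal avoids binary floating-point rounding in currency arithmetic.
    """
    rate = HOURLY_RATE_USD[model]  # raises KeyError for unknown models
    cost = rate * Decimal(str(hours))
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost("Llama-3.1-8B-Instruct", 24))    # one day
print(estimate_cost("Llama-3.1-70B-Instruct", 730))  # about one month
```

Here 730 hours approximates one month (8,760 hours per year divided by 12). Under these assumptions, a day on a single L4 comes to $19.20, while a month of Llama-3.1-70B-Instruct on 4x L40S comes to $6,059.00.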