Nousresearch Models

Explore the Nousresearch language and embedding models available through our OpenAI Assistants API-compatible service.

Nousresearch logo

Nous: Hermes 4 70B

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
131,072 tokens

Pricing:

Prompt: $0.00000011
Completion: $0.00000038

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either respond directly or generate explicit ... reasoning traces before answering. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

This 70B variant is trained with the expanded post-training corpus (~60B tokens) emphasizing verified reasoning data, leading to improvements in mathematics, coding, STEM, logic, and structured outputs while maintaining general assistant performance. It supports JSON mode, schema adherence, function calling, and tool use, and is designed for greater steerability with reduced refusal rates.

Nous: Hermes 4 405B

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
131,072 tokens

Pricing:

Prompt: $0.0000003
Completion: $0.0000012

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with ... traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior.

Nous: DeepHermes 3 Mistral 24B Preview

Context Length:
32,768 tokens
Architecture:
text->text
Max Output:
32,768 tokens

Pricing:

Prompt: $0.00000015
Completion: $0.00000059

DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications.

DeepHermes 3 supports a reasoning toggle via system prompt, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a "deep thinking" mode—generating extended chains of thought wrapped in <think></think> tags before delivering a final answer.

System Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside tags, and then provide your solution or response to the problem.

Nous: DeepHermes 3 Llama 3 8B Preview (free)

Context Length:
131,072 tokens
Architecture:
text->text

Pricing:

DeepHermes 3 Preview is the latest version of our flagship Hermes series of LLMs by Nous Research, and one of the first models in the world to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model. We have also improved LLM annotation, judgement, and function calling.

DeepHermes 3 Preview is one of the first LLM models to unify both "intuitive", traditional mode responses and long chain of thought reasoning responses into a single model, toggled by a system prompt.

Nous: DeepHermes 3 Llama 3 8B Preview

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
131,072 tokens

Pricing:

Prompt: $0.00000003
Completion: $0.00000011

DeepHermes 3 Preview is the latest version of our flagship Hermes series of LLMs by Nous Research, and one of the first models in the world to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model. We have also improved LLM annotation, judgement, and function calling.

DeepHermes 3 Preview is one of the first LLM models to unify both "intuitive", traditional mode responses and long chain of thought reasoning responses into a single model, toggled by a system prompt.

Nous: Hermes 3 70B Instruct

Context Length:
65,536 tokens
Architecture:
text->text

Pricing:

Prompt: $0.0000003
Completion: $0.0000003

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

Hermes 3 70B is a competitive, if not superior finetune of the Llama-3.1 70B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

Nous: Hermes 3 405B Instruct (free)

Context Length:
131,072 tokens
Architecture:
text->text

Pricing:

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.

Nous: Hermes 3 405B Instruct

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
16,384 tokens

Pricing:

Prompt: $0.000001
Completion: $0.000001

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.

NousResearch: Hermes 2 Pro - Llama-3 8B

Context Length:
32,768 tokens
Architecture:
text->text

Pricing:

Prompt: $0.000000025
Completion: $0.00000008

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Ready to build with Nousresearch?

Start using these powerful models in your applications with our flexible pricing plans.