Inception Models

Explore the Inception language and embedding models available through our OpenAI Assistants API-compatible service.

Inception: Mercury

Context Length: 128,000 tokens
Architecture: text->text
Max Output: 16,384 tokens
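
The two limits above interact when sizing a request. The following is a minimal sketch, assuming the prompt and the completion share the 128,000-token context window (the listing does not state this explicitly). The function name `max_completion_tokens` is hypothetical and used only for illustration.

```python
# Limits copied from the listing above.
CONTEXT_LENGTH = 128_000  # total context window, in tokens
MAX_OUTPUT = 16_384       # maximum completion length, in tokens


def max_completion_tokens(prompt_tokens: int) -> int:
    """Return the largest completion that still fits.

    Assumption: prompt and completion tokens both count against the
    same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    # The completion is capped by both the output limit and the space
    # left in the context window; it can never be negative.
    return max(0, min(MAX_OUTPUT, remaining))
```

Under this assumption, short prompts are bounded by the 16,384-token output limit, while prompts longer than 111,616 tokens leave less room than that limit allows.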

Pricing:

Prompt: $0.00000025 per token

Completion: $0.000001 per token
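
To see what these prices mean for a real request, the sketch below estimates cost from token counts. It assumes the listed figures are US dollars per token, which the listing does not state outright. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Prices copied from the listing above.
# Assumption: both are in US dollars per token.
PROMPT_PRICE = Decimal("0.00000025")
COMPLETION_PRICE = Decimal("0.000001")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the cost of one request in US dollars."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Decimal avoids the rounding noise floats show at this scale.
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE
```

Under that assumption, one million prompt tokens cost $0.25 and one million completion tokens cost $1.00, so a request with a 10,000-token prompt and a 2,000-token completion comes to $0.0045.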

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed lets developers build responsive user experiences, including voice agents, search interfaces, and chatbots. Read more in the blog post here.

Inception: Mercury Coder

Context Length: 128,000 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000025 per token

Completion: $0.000001 per token

Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder's speed lets developers stay in the flow while coding, with rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the blog post here.

Ready to build with Inception?

Start using these powerful models in your applications with our flexible pricing plans.