Mistral Models
Explore the Mistral language and embedding models available through our OpenAI Assistants API-compatible service.
Mistral: Mistral Medium 3.1
- Context Length:
- 131,072 tokens
- Architecture:
- text+image->text
Pricing:
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases.
The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
Mistral: Codestral 2508
- Context Length:
- 256,000 tokens
- Architecture:
- text->text
Pricing:
Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.
Mistral: Devstral Medium
- Context Length:
- 131,072 tokens
- Architecture:
- text->text
Pricing:
Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves 61.6% on SWE-Bench Verified, placing it ahead of Gemini 2.5 Pro and GPT-4.1 in code-related tasks, at a fraction of the cost. It is designed for generalization across prompt styles and tool use in code agents and frameworks.
Devstral Medium is available via API only (not open-weight), and supports enterprise deployment on private infrastructure, with optional fine-tuning capabilities.
Mistral: Devstral Small 1.1
- Context Length:
- 128,000 tokens
- Architecture:
- text->text
Pricing:
Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats.
Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single 4090 GPU or Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes.
Mistral: Mistral Small 3.2 24B (free)
- Context Length:
- 131,072 tokens
- Architecture:
- text+image->text
Pricing:
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks.
It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA).
Mistral: Mistral Small 3.2 24B
- Context Length:
- 131,072 tokens
- Architecture:
- text+image->text
- Max Output:
- 131,072 tokens
Pricing:
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks.
It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA).
Mistral: Magistral Small 2506
- Context Length:
- 40,000 tokens
- Architecture:
- text->text
- Max Output:
- 40,000 tokens
Pricing:
Magistral Small is a 24B parameter instruction-tuned model based on Mistral-Small-3.1 (2503), enhanced through supervised fine-tuning on traces from Magistral Medium and further refined via reinforcement learning. It is optimized for reasoning and supports a wide multilingual range, including over 20 languages.
Mistral: Magistral Medium 2506
- Context Length:
- 40,960 tokens
- Architecture:
- text->text
- Max Output:
- 40,000 tokens
Pricing:
Magistral is Mistral's first reasoning model. It is ideal for general purpose use requiring longer thought processing and better accuracy than with non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling — this model solves multi-step challenges where transparency and precision are critical.
Mistral: Magistral Medium 2506 (thinking)
- Context Length:
- 40,960 tokens
- Architecture:
- text->text
- Max Output:
- 40,000 tokens
Pricing:
Magistral is Mistral's first reasoning model. It is ideal for general purpose use requiring longer thought processing and better accuracy than with non-reasoning LLMs. From legal research and financial forecasting to software development and creative storytelling — this model solves multi-step challenges where transparency and precision are critical.
Mistral: Devstral Small 2505 (free)
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. It is optimized for codebase exploration, multi-file editing, and integration into coding agents, achieving state-of-the-art results on SWE-Bench Verified (46.8%).
Devstral supports a 128k context window and uses a custom Tekken tokenizer. It is text-only, with the vision encoder removed, and is suitable for local deployment on high-end consumer hardware (e.g., RTX 4090, 32GB RAM Macs). Devstral is best used in agentic workflows via the OpenHands scaffold and is compatible with inference frameworks like vLLM, Transformers, and Ollama. It is released under the Apache 2.0 license.
Mistral: Devstral Small 2505
- Context Length:
- 131,072 tokens
- Architecture:
- text->text
- Max Output:
- 131,072 tokens
Pricing:
Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. It is optimized for codebase exploration, multi-file editing, and integration into coding agents, achieving state-of-the-art results on SWE-Bench Verified (46.8%).
Devstral supports a 128k context window and uses a custom Tekken tokenizer. It is text-only, with the vision encoder removed, and is suitable for local deployment on high-end consumer hardware (e.g., RTX 4090, 32GB RAM Macs). Devstral is best used in agentic workflows via the OpenHands scaffold and is compatible with inference frameworks like vLLM, Transformers, and Ollama. It is released under the Apache 2.0 license.
Mistral: Mistral Medium 3
- Context Length:
- 131,072 tokens
- Architecture:
- text+image->text
Pricing:
Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases.
The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
Mistral: Mistral Small 3.1 24B (free)
- Context Length:
- 128,000 tokens
- Architecture:
- text+image->text
Pricing:
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is Mistral Small 3.2
Mistral: Mistral Small 3.1 24B
- Context Length:
- 128,000 tokens
- Architecture:
- text+image->text
Pricing:
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is Mistral Small 3.2
Mistral: Saba
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications. Read more at the blog post here
Mistral: Mistral Small 3 (free)
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Read the blog post about the model here.
Mistral: Mistral Small 3
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
- Max Output:
- 16,384 tokens
Pricing:
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Read the blog post about the model here.
Mistral: Codestral 2501
- Context Length:
- 262,144 tokens
- Architecture:
- text->text
Pricing:
Mistral's cutting-edge language model for coding. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.
Learn more on their blog post: https://mistral.ai/news/codestral-2501/
Mistral Large 2411
- Context Length:
- 131,072 tokens
- Architecture:
- text->text
Pricing:
Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411
It provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding, a new system prompt, and more accurate function calling.
Mistral Large 2407
- Context Length:
- 131,072 tokens
- Architecture:
- text->text
Pricing:
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here.
It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.
Mistral: Pixtral Large 2411
- Context Length:
- 131,072 tokens
- Architecture:
- text+image->text
Pricing:
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images.
The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.
Mistral: Ministral 8B
- Context Length:
- 128,000 tokens
- Architecture:
- text->text
Pricing:
Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.
Mistral: Ministral 3B
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.
Mistral: Pixtral 12B
- Context Length:
- 32,768 tokens
- Architecture:
- text+image->text
Pricing:
The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836.
Mistral: Mistral Nemo (free)
- Context Length:
- 131,072 tokens
- Architecture:
- text->text
- Max Output:
- 128,000 tokens
Pricing:
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
It supports function calling and is released under the Apache 2.0 license.
Mistral: Mistral Nemo
- Context Length:
- 131,072 tokens
- Architecture:
- text->text
- Max Output:
- 16,384 tokens
Pricing:
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
It supports function calling and is released under the Apache 2.0 license.
Mistral: Mistral 7B Instruct (free)
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
- Max Output:
- 16,384 tokens
Pricing:
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.
Mistral: Mistral 7B Instruct
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
- Max Output:
- 16,384 tokens
Pricing:
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.
Mistral: Mistral 7B Instruct v0.3
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
- Max Output:
- 16,384 tokens
Pricing:
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
An improved version of Mistral 7B Instruct v0.2, with the following changes:
- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling
NOTE: Support for function calling depends on the provider.
Mistral: Mixtral 8x22B Instruct
- Context Length:
- 65,536 tokens
- Architecture:
- text->text
Pricing:
Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish
See benchmarks on the launch announcement here.
#moe
Mistral Large
- Context Length:
- 128,000 tokens
- Architecture:
- text->text
Pricing:
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here.
It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.
Mistral Small
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between (Mistral NeMo 12B)[/mistralai/mistral-nemo] and (Mistral Large 2)[/mistralai/mistral-large], providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish.
Mistral Tiny
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
Note: This model is being deprecated. Recommended replacement is the newer Ministral 8B
This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
Mistral: Mistral 7B Instruct v0.2
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
Pricing:
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
An improved version of Mistral 7B Instruct, with the following changes:
- 32k context window (vs 8k context in v0.1)
- Rope-theta = 1e6
- No Sliding-Window Attention
Mistral: Mixtral 8x7B Instruct
- Context Length:
- 32,768 tokens
- Architecture:
- text->text
- Max Output:
- 16,384 tokens
Pricing:
Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.
Instruct model fine-tuned by Mistral. #moe
Mistral: Mistral 7B Instruct v0.1
- Context Length:
- 2,824 tokens
- Architecture:
- text->text
Pricing:
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Ready to build with Mistral?
Start using these powerful models in your applications with our flexible pricing plans.