DeepSeek Models
Explore the DeepSeek language and embedding models available through our OpenAI Assistants API-compatible service.
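As a quick orientation, here is a minimal sketch of calling one of these models through the OpenAI Python SDK pointed at the compatible endpoint. The base URL, API key placeholder, and model identifier are assumptions for illustration; substitute the values from your dashboard and the model pages below.

```python
from openai import OpenAI

# Hypothetical base URL and model slug; replace with the values from your account.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Create an assistant backed by a DeepSeek model, then run a short thread.
assistant = client.beta.assistants.create(
    model="deepseek/deepseek-chat-v3.1",  # assumed model identifier
    instructions="You are a concise technical assistant.",
)
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize the difference between thinking and non-thinking modes.",
)
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant.id)

if run.status == "completed":
    for message in client.beta.threads.messages.list(thread_id=thread.id):
        print(message.role, message.content[0].text.value)
```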
DeepSeek: DeepSeek V3.2 Exp
- Context Length: 163,840 tokens
- Architecture: text->text
DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.
The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.
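The snippet below is a minimal sketch of toggling reasoning on a request to this model. The base URL, the model slug, and the exact shape of the reasoning field are assumptions; check the docs for the parameter your plan exposes.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2-exp",  # assumed model identifier
    messages=[{"role": "user", "content": "Walk through 17 * 24 step by step."}],
    # Assumed provider-specific field for the reasoning enabled boolean;
    # extra_body passes it through to the API unchanged by the SDK.
    extra_body={"reasoning": {"enabled": True}},
)
print(response.choices[0].message.content)
```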
DeepSeek: DeepSeek V3.1 Terminus
- Context Length: 163,840 tokens
- Architecture: text->text
- Max Output: 163,840 tokens
DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, and further optimizing performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
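To illustrate the structured tool calling mentioned above, here is a sketch using the standard OpenAI function-calling format. The search_docs tool, base URL, and model slug are hypothetical.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Hypothetical tool definition in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.1-terminus",  # assumed model identifier
    messages=[{"role": "user", "content": "Find the section on FP8 microscaling."}],
    tools=tools,
)

# If the model chose to call the tool, its arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```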
DeepSeek: DeepSeek V3.1 (free)
- Context Length: 163,800 tokens
- Architecture: text->text
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.
DeepSeek: DeepSeek V3.1
- Context Length: 131,072 tokens
- Architecture: text->text
- Max Output: 32,768 tokens
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.
The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.
It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.
DeepSeek: DeepSeek R1 0528 Qwen3 8B (free)
- Context Length: 131,072 tokens
- Architecture: text->text
DeepSeek-R1-0528 is an incremental upgrade to DeepSeek R1 that uses additional compute and improved post-training techniques, bringing its reasoning and inference close to flagship models such as OpenAI o3 and Gemini 2.5 Pro.
It now tops math, programming, and logic leaderboards, showing a clear step up in depth of reasoning.
The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought capability into an 8B-parameter model, outperforming the standard Qwen3 8B by roughly 10 percentage points and matching the 235B "thinking" model on AIME 2024.
DeepSeek: DeepSeek R1 0528 Qwen3 8B
- Context Length: 32,768 tokens
- Architecture: text->text
- Max Output: 32,768 tokens
DeepSeek-R1-0528 is an incremental upgrade to DeepSeek R1 that uses additional compute and improved post-training techniques, bringing its reasoning and inference close to flagship models such as OpenAI o3 and Gemini 2.5 Pro.
It now tops math, programming, and logic leaderboards, showing a clear step up in depth of reasoning.
The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought capability into an 8B-parameter model, outperforming the standard Qwen3 8B by roughly 10 percentage points and matching the 235B "thinking" model on AIME 2024.
DeepSeek: R1 0528 (free)
- Context Length: 163,840 tokens
- Architecture: text->text
The May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and ships with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
Fully open-source model.
DeepSeek: R1 0528
- Context Length: 163,840 tokens
- Architecture: text->text
- Max Output: 163,840 tokens
The May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and ships with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
Fully open-source model.
DeepSeek: DeepSeek Prover V2
- Context Length: 163,840 tokens
- Architecture: text->text
DeepSeek Prover V2 is a 671B-parameter model, speculated to be geared towards logic and mathematics, and likely an upgrade from DeepSeek-Prover-V1.5. Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description.
DeepSeek: DeepSeek V3 0324 (free)
- Context Length: 163,840 tokens
- Architecture: text->text
DeepSeek V3 0324 is a 685B-parameter mixture-of-experts model and the latest iteration of the flagship chat model family from the DeepSeek team.
It succeeds the original DeepSeek V3 model and performs well on a variety of tasks.
DeepSeek: DeepSeek V3 0324
- Context Length: 163,840 tokens
- Architecture: text->text
- Max Output: 163,840 tokens
DeepSeek V3 0324 is a 685B-parameter mixture-of-experts model and the latest iteration of the flagship chat model family from the DeepSeek team.
It succeeds the original DeepSeek V3 model and performs well on a variety of tasks.
DeepSeek: R1 Distill Qwen 32B
- Context Length: 131,072 tokens
- Architecture: text->text
- Max Output: 16,384 tokens
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 72.6
- MATH-500 pass@1: 94.3
- CodeForces Rating: 1691
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
DeepSeek: R1 Distill Qwen 14B
- Context Length: 32,768 tokens
- Architecture: text->text
- Max Output: 16,384 tokens
DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Other benchmark results include:
- AIME 2024 pass@1: 69.7
- MATH-500 pass@1: 93.9
- CodeForces Rating: 1481
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
DeepSeek: R1 Distill Llama 70B (free)
- Context Length: 8,192 tokens
- Architecture: text->text
- Max Output: 4,096 tokens
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
DeepSeek: R1 Distill Llama 70B
- Context Length: 131,072 tokens
- Architecture: text->text
- Max Output: 131,072 tokens
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
DeepSeek: R1 (free)
- Context Length: 163,840 tokens
- Architecture: text->text
DeepSeek R1 delivers performance on par with OpenAI o1, but is open-sourced and ships with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
Fully open-source model & technical report.
MIT licensed: Distill & commercialize freely!
DeepSeek: R1
- Context Length: 163,840 tokens
- Architecture: text->text
- Max Output: 163,840 tokens
DeepSeek R1 delivers performance on par with OpenAI o1, but is open-sourced and ships with fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
Fully open-source model & technical report.
MIT licensed: Distill & commercialize freely!
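Because R1 returns its reasoning tokens, a response can be split into the visible answer and the reasoning trace. The sketch below assumes the trace is exposed in a reasoning_content field, as in DeepSeek's own API; the field name on this service may differ, so treat it as an assumption.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # assumed model identifier
    messages=[{"role": "user", "content": "How many primes lie between 10 and 30?"}],
)

message = response.choices[0].message
# Field name is an assumption; some providers expose the trace under a different key.
print("Reasoning:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)
```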
DeepSeek: DeepSeek V3
- Context Length: 163,840 tokens
- Architecture: text->text
- Max Output: 163,840 tokens
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
For more details, visit the DeepSeek-V3 repo or see the launch announcement.
Ready to build with DeepSeek?
Start using these powerful models in your applications with our flexible pricing plans.