Baidu Models

Explore the Baidu language and embedding models available through our OpenAI Assistants API-compatible service.
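Because the service is OpenAI-compatible, requests use the standard chat-completions shape. A minimal sketch of a request body follows; the base URL and model slug are placeholders (check your dashboard for the real values), not confirmed identifiers:

```python
import json

# Placeholder endpoint -- substitute the real base URL from your dashboard.
BASE_URL = "https://api.example.com/v1"

payload = {
    "model": "baidu/ernie-4.5-21b-a3b-thinking",  # hypothetical model slug
    "messages": [
        {"role": "user", "content": "Summarize the ERNIE 4.5 model family."}
    ],
    "max_tokens": 1024,
}

# Serialized request body, ready to POST to {BASE_URL}/chat/completions.
body = json.dumps(payload)
```

As with any OpenAI-compatible service, POST this body to the `/chat/completions` path with your API key in an `Authorization: Bearer` header.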


Baidu: ERNIE 4.5 21B A3B Thinking

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
65,536 tokens

Pricing:

Prompt: $0.00000007 / token
Completion: $0.00000028 / token

ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles, math, science, coding, text generation, and expert-level academic benchmarks.
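With per-token prices this small, the cost of a single request is simple arithmetic. A quick sketch using the rates listed above for this model (the example token counts are illustrative):

```python
# Per-token USD prices for ERNIE 4.5 21B A3B Thinking, from the listing above.
PROMPT_PRICE = 0.00000007      # USD per prompt token
COMPLETION_PRICE = 0.00000028  # USD per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# A 10,000-token prompt with a 2,000-token completion:
# 10,000 * $0.00000007 + 2,000 * $0.00000028 = $0.0007 + $0.00056 = $0.00126
cost = request_cost(10_000, 2_000)
print(f"${cost:.6f}")
```

The same function works for any model on this page by swapping in its listed prompt and completion rates.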

Baidu: ERNIE 4.5 21B A3B

Context Length:
120,000 tokens
Architecture:
text->text
Max Output:
8,000 tokens

Pricing:

Prompt: $0.00000007 / token
Completion: $0.00000028 / token

A text-only Mixture-of-Experts (MoE) model with 21B total parameters, 3B of which are activated per token. Its heterogeneous MoE structure with modality-isolated routing, together with specialized routing and balancing losses, delivers strong text understanding and generation. The underlying model supports a 131K token context (served here with a 120K window) and achieves efficient inference through multi-expert parallel collaboration and quantization, while post-training techniques including SFT, DPO, and UPO optimize performance across diverse applications.

Baidu: ERNIE 4.5 VL 28B A3B

Context Length:
30,000 tokens
Architecture:
text+image->text
Max Output:
8,000 tokens

Pricing:

Prompt: $0.00000014 / token
Completion: $0.00000056 / token

A multimodal Mixture-of-Experts chat model with 28B total parameters, 3B of which are activated per token. Its heterogeneous MoE structure with modality-isolated routing delivers strong text and vision understanding. Built on scaling-efficient infrastructure for high-throughput training and inference, the model applies post-training techniques including SFT, DPO, and UPO, along with RLVR alignment, for cross-modal reasoning and generation. The underlying model supports a 131K context length (served here with a 30K window).

Baidu: ERNIE 4.5 VL 424B A47B

Context Length:
123,000 tokens
Architecture:
text+image->text
Max Output:
16,000 tokens

Pricing:

Prompt: $0.00000042 / token
Completion: $0.00000125 / token

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131K tokens natively; served here with a 123K window). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization.

Baidu: ERNIE 4.5 300B A47B

Context Length:
123,000 tokens
Architecture:
text->text
Max Output:
12,000 tokens

Pricing:

Prompt: $0.00000028 / token
Completion: $0.0000011 / token

ERNIE-4.5-300B-A47B is a 300B-parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks; it supports reasoning, tool calling, and extended context lengths up to 131K tokens (served here with a 123K window), making it suitable for general-purpose LLM applications with high reasoning and throughput demands.
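The per-token prices in the listings above are easier to compare when scaled to per-million-token rates. A small conversion sketch, with the prices copied directly from this page:

```python
# Per-token USD prices from the listings above: (prompt, completion).
PRICES = {
    "ERNIE 4.5 21B A3B Thinking": (0.00000007, 0.00000028),
    "ERNIE 4.5 21B A3B":          (0.00000007, 0.00000028),
    "ERNIE 4.5 VL 28B A3B":       (0.00000014, 0.00000056),
    "ERNIE 4.5 VL 424B A47B":     (0.00000042, 0.00000125),
    "ERNIE 4.5 300B A47B":        (0.00000028, 0.0000011),
}

for name, (prompt, completion) in PRICES.items():
    # Scale to USD per million tokens for readability.
    print(f"{name}: ${prompt * 1e6:.2f} prompt / ${completion * 1e6:.2f} completion per 1M tokens")
```

At these rates the two 21B models cost $0.07/$0.28 per million tokens, the 28B VL model exactly double that, and the two largest models $0.42/$1.25 and $0.28/$1.10 respectively.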

Ready to build with Baidu?

Start using these powerful models in your applications with our flexible pricing plans.