Z.AI Models
Explore the Z.AI language and embedding models available through our OpenAI Assistants API-compatible service.
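Since the service follows the OpenAI request shape, a chat completion payload can be sketched with nothing but the standard library. The base URL and the `glm-4.6` model identifier below are illustrative placeholders, not confirmed endpoint values:

```python
import json

# Illustrative only: substitute the service's actual base URL and model ID.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble a chat-completion payload in the OpenAI request shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("glm-4.6", "Summarize MoE architectures in one sentence.")
print(json.dumps(payload, indent=2))
```

The same payload can then be POSTed to `{BASE_URL}/chat/completions` with any HTTP client or the official OpenAI SDK pointed at a custom base URL.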
Z.AI: GLM 4.6
- Context Length: 202,752 tokens
- Architecture: text->text
- Max Output: 202,752 tokens
Compared with GLM-4.5, this generation brings several key improvements:
Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
Superior coding performance: The model achieves higher scores on code benchmarks and performs better in real-world applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.
More capable agents: GLM-4.6 exhibits stronger performance in tool use and in search-based agents, and integrates more effectively within agent frameworks.
Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.
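The tool use during inference described above maps onto the standard OpenAI function-calling request shape. A minimal sketch follows; the `web_search` tool name and its schema are purely hypothetical, not a tool the service is known to expose:

```python
def build_tool_request(model: str, prompt: str, tools: list) -> dict:
    """Chat payload with tool definitions attached (OpenAI function-calling schema)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to invoke a tool
    }

# Hypothetical search tool, for illustration only.
search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

request = build_tool_request("glm-4.6", "Find recent MoE scaling results.", [search_tool])
```

If the model decides to call the tool, the response carries a `tool_calls` entry that your code executes before sending the result back in a follow-up message.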
Z.AI: GLM 4.5V
- Context Length: 65,536 tokens
- Architecture: text+image->text
- Max Output: 16,384 tokens
GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the reasoning enabled boolean. Learn more in our docs.
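For the image Q&A and OCR use cases above, a multimodal request interleaves text and image parts in one user message, following the OpenAI vision content-part shape. The model ID and image URL are illustrative assumptions:

```python
def build_vision_request(model: str, question: str, image_url: str) -> dict:
    """Chat payload whose user message mixes a text part and an image part
    (OpenAI-style content parts)."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

req = build_vision_request(
    "glm-4.5v",                          # illustrative model identifier
    "What text appears in this image?",  # an OCR-style question
    "https://example.com/receipt.png",
)
```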
Z.AI: GLM 4.5
- Context Length: 131,072 tokens
- Architecture: text->text
- Max Output: 131,072 tokens
GLM-4.5 is our flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.
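The hybrid inference mode can be driven per request by setting the reasoning enabled boolean. The exact wire format is not specified here, so the `{"reasoning": {"enabled": ...}}` field layout below is an assumed shape for illustration:

```python
def with_reasoning(payload: dict, enabled: bool) -> dict:
    """Return a copy of a chat payload with the reasoning toggle set.
    NOTE: the field layout is an assumption, not a confirmed wire format."""
    out = dict(payload)  # shallow copy; the original payload is untouched
    out["reasoning"] = {"enabled": enabled}
    return out

base = {"model": "glm-4.5",
        "messages": [{"role": "user", "content": "Plan a 3-step refactor."}]}
thinking = with_reasoning(base, True)   # "thinking mode": complex reasoning, tool use
fast = with_reasoning(base, False)      # "non-thinking mode": instant responses
```

Keeping the toggle outside the prompt lets the same conversation switch modes between turns without rewriting messages.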
Z.AI: GLM 4.5 Air (free)
- Context Length: 131,072 tokens
- Architecture: text->text
- Max Output: 131,072 tokens
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.
Z.AI: GLM 4.5 Air
- Context Length: 131,072 tokens
- Architecture: text->text
- Max Output: 98,304 tokens
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.
Z.AI: GLM 4 32B
- Context Length: 128,000 tokens
- Architecture: text->text
GLM 4 32B is a cost-effective foundation language model.
It efficiently performs complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks.
It is made by the same lab behind the THUDM models.
Ready to build with Z.AI?
Start using these powerful models in your applications with our flexible pricing plans.