Z-ai Models

Explore the Z-ai language and embedding models available through our OpenAI Assistants API-compatible service.


Z.AI: GLM 4.6

Context Length:
202,752 tokens
Architecture:
text->text
Max Output:
202,752 tokens

Pricing:

Prompt: $0.0000005 per token ($0.50 per 1M tokens)
Completion: $0.00000175 per token ($1.75 per 1M tokens)
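At these per-token rates, request cost is a simple linear function of token counts. A minimal sketch in Python, using GLM 4.6's listed rates (actual billing may round or batch differently):

```python
# Estimate the USD cost of a GLM 4.6 request from the listed per-token rates.
PROMPT_PRICE = 0.0000005       # USD per prompt token ($0.50 per 1M)
COMPLETION_PRICE = 0.00000175  # USD per completion token ($1.75 per 1M)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Linear cost model: tokens in each direction times their rate."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Example: 10,000 prompt tokens and 2,000 completion tokens
print(f"${estimate_cost(10_000, 2_000):.4f}")  # → $0.0085
```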

Compared with GLM-4.5, this generation brings several key improvements:

Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.
More capable agents: GLM-4.6 performs more strongly in tool use and search-based agents, and integrates more effectively within agent frameworks.
Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.
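As a concrete sketch of what a request to this model might look like through an OpenAI-compatible chat endpoint (the model identifier `glm-4.6` and the payload shape are assumptions; check your account's model list and the API reference for exact values):

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions endpoint.
payload = {
    "model": "glm-4.6",  # assumed model identifier
    "messages": [
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Refactor this function to be tail-recursive."},
    ],
    "max_tokens": 4096,  # well under the 202,752-token output ceiling
}

body = json.dumps(payload)  # serialized request body, ready to POST
```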

Z.AI: GLM 4.5V

Context Length:
65,536 tokens
Architecture:
text+image->text
Max Output:
16,384 tokens

Pricing:

Prompt: $0.0000006 per token ($0.60 per 1M tokens)
Completion: $0.0000018 per token ($1.80 per 1M tokens)
Input cache read: $0.00000011 per token ($0.11 per 1M tokens)

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B total parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the reasoning enabled boolean. Learn more in our docs.
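A multimodal request to this model might combine text and an image in a single user message, with the reasoning toggle alongside. A minimal sketch (the model id `glm-4.5v`, the image URL, and the exact shape of the reasoning flag are illustrative assumptions):

```python
# Hypothetical multimodal payload: one user turn mixing a text part and an image part.
payload = {
    "model": "glm-4.5v",  # assumed model identifier
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend in this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
    "reasoning": {"enabled": True},  # assumed shape of the thinking-mode toggle
}

# The user turn carries two content parts: text first, then the image.
parts = payload["messages"][0]["content"]
```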

Z.AI: GLM 4.5

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
131,072 tokens

Pricing:

Prompt: $0.00000035 per token ($0.35 per 1M tokens)
Completion: $0.0000015 per token ($1.50 per 1M tokens)

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128K tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.
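The thinking/non-thinking split can be sketched as a per-request switch (a minimal illustration; the `reasoning` field's exact name and nesting are assumptions based on the boolean described above):

```python
def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat request, toggling GLM-4.5's hybrid inference mode."""
    return {
        "model": "glm-4.5",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        # True → deep reasoning and tool use; False → instant responses.
        "reasoning": {"enabled": thinking},  # assumed field shape
    }

slow = build_request("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_request("What is the capital of France?", thinking=False)
```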

Z.AI: GLM 4.5 Air (free)

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
131,072 tokens

Pricing:

Free

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter count. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use and a "non-thinking mode" for real-time interaction. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.

Z.AI: GLM 4.5 Air

Context Length:
131,072 tokens
Architecture:
text->text
Max Output:
98,304 tokens

Pricing:

Prompt: $0.00000013 per token ($0.13 per 1M tokens)
Completion: $0.00000085 per token ($0.85 per 1M tokens)

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter count. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use and a "non-thinking mode" for real-time interaction. Users can control the reasoning behavior with the reasoning enabled boolean. Learn more in our docs.

Z.AI: GLM 4 32B

Context Length:
128,000 tokens
Architecture:
text->text

Pricing:

Prompt: $0.0000001 per token ($0.10 per 1M tokens)
Completion: $0.0000001 per token ($0.10 per 1M tokens)

GLM 4 32B is a cost-effective foundation language model. It handles complex tasks efficiently and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models.
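Since tool use is one of this model's strengths, a request might declare tools in the standard OpenAI function-calling shape. A sketch under stated assumptions (the `web_search` tool, the model id, and the schema are illustrative, not part of the service's documented surface):

```python
# Hypothetical tool declaration in OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",  # hypothetical tool name
            "description": "Search the web and return top result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search terms."}
                },
                "required": ["query"],
            },
        },
    }
]

payload = {
    "model": "glm-4-32b",  # assumed model identifier
    "messages": [{"role": "user", "content": "Who won the 2024 Tour de France?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide when to call the tool
}
```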

Ready to build with Z-ai?

Start using these powerful models in your applications with our flexible pricing plans.