Ragwalla is now generally available! Get started today

Google Gemini Models

Explore the Google Gemini language and embedding models available through our OpenAI Assistants API-compatible service.

Google: Gemini 2.5 Flash Image (Nano Banana)

Context Length:: 32,768 tokens
Architecture:: text+image->text+image
Max Output:: 8,192 tokens

Pricing:

Prompt: $0.0000003

Completion: $0.0000025

Image: $0.001238

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the image_config API Parameter

Google: Gemini 2.5 Flash Preview 09-2025

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,536 tokens

Pricing:

Prompt: $0.0000003

Completion: $0.0000025

Image: $0.001238

Input cache read: $0.000000075

Input cache write: $0.0000003833

Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling.

Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

Google: Gemini 2.5 Flash Lite Preview 09-2025

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,536 tokens

Pricing:

Prompt: $0.0000001

Completion: $0.0000004

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

Google: Gemini 2.5 Flash Image Preview (Nano Banana)

Context Length:: 32,768 tokens
Architecture:: text+image->text+image
Max Output:: 8,192 tokens

Pricing:

Prompt: $0.0000003

Completion: $0.0000025

Image: $0.001238

Gemini 2.5 Flash Image Preview, a.k.a. "Nano Banana," is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations.

Google: Gemini 2.5 Flash Lite

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,535 tokens

Pricing:

Prompt: $0.0000001

Completion: $0.0000004

Input cache read: $0.00000001

Input cache write: $0.0000001833

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

Google: Gemma 3n 2B (free)

Context Length:: 8,192 tokens
Architecture:: text->text
Max Output:: 2,048 tokens

Pricing:

Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based on the MatFormer architecture, it supports nested submodels and modular composition via the Mix-and-Match framework. Gemma 3n models are optimized for low-resource deployment, offering 32K context length and strong multilingual and reasoning performance across common benchmarks. This variant is trained on a diverse corpus including code, math, web, and multimodal data.

Google: Gemini 2.5 Flash Lite Preview 06-17

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,535 tokens

Pricing:

Prompt: $0.0000001

Completion: $0.0000004

Audio: $0.0000003

Input cache read: $0.000000025

Input cache write: $0.0000001833

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

Google: Gemini 2.5 Flash

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,535 tokens

Pricing:

Prompt: $0.0000003

Completion: $0.0000025

Image: $0.001238

Input cache read: $0.00000003

Input cache write: $0.0000003833

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling.

Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

Google: Gemini 2.5 Pro

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,536 tokens

Pricing:

Prompt: $0.00000125

Completion: $0.00001

Image: $0.00516

Input cache read: $0.000000125

Input cache write: $0.000001625

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

Google: Gemini 2.5 Pro Preview 06-05

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,536 tokens

Pricing:

Prompt: $0.00000125

Completion: $0.00001

Image: $0.00516

Input cache read: $0.00000031

Input cache write: $0.000001625

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

Google: Gemma 3n 4B (free)

Context Length:: 8,192 tokens
Architecture:: text->text
Max Output:: 2,048 tokens

Pricing:

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements.

This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. Read more in the blog post

Google: Gemma 3n 4B

Context Length:: 32,768 tokens
Architecture:: text->text

Pricing:

Prompt: $0.00000002

Completion: $0.00000004

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Leveraging innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n dynamically manages memory usage and computational load by selectively activating model parameters, significantly reducing runtime resource requirements.

This model supports a wide linguistic range (trained in over 140 languages) and features a flexible 32K token context window. Gemma 3n can selectively load parameters, optimizing memory and computational efficiency based on the task or device capabilities, making it well-suited for privacy-focused, offline-capable applications and on-device AI solutions. Read more in the blog post

Google: Gemini 2.5 Pro Preview 05-06

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 65,535 tokens

Pricing:

Prompt: $0.00000125

Completion: $0.00001

Image: $0.00516

Input cache read: $0.00000031

Input cache write: $0.000001625

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

Google: Gemma 3 4B (free)

Context Length:: 32,768 tokens
Architecture:: text+image->text
Max Output:: 8,192 tokens

Pricing:

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

Google: Gemma 3 4B

Context Length:: 96,000 tokens
Architecture:: text+image->text

Pricing:

Prompt: $0.00000001703012

Completion: $0.0000000681536

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

Google: Gemma 3 12B (free)

Context Length:: 32,768 tokens
Architecture:: text+image->text
Max Output:: 8,192 tokens

Pricing:

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after Gemma 3 27B

Google: Gemma 3 12B

Context Length:: 131,072 tokens
Architecture:: text+image->text
Max Output:: 131,072 tokens

Pricing:

Prompt: $0.00000003

Completion: $0.0000001

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is the second largest in the family of Gemma 3 models after Gemma 3 27B

Google: Gemma 3 27B (free)

Context Length:: 96,000 tokens
Architecture:: text+image->text
Max Output:: 8,192 tokens

Pricing:

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2

Google: Gemma 3 27B

Context Length:: 131,072 tokens
Architecture:: text+image->text
Max Output:: 16,384 tokens

Pricing:

Prompt: $0.00000009

Completion: $0.00000016

Image: $0.0000256

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2

Google: Gemini 2.0 Flash Lite

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 8,192 tokens

Pricing:

Prompt: $0.000000075

Completion: $0.0000003

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5, all at extremely economical token prices.

Google: Gemini 2.0 Flash

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 8,192 tokens

Pricing:

Prompt: $0.0000001

Completion: $0.0000004

Image: $0.0000258

Audio: $0.0000007

Input cache read: $0.000000025

Input cache write: $0.0000001833

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Google: Gemini 2.0 Flash Experimental (free)

Context Length:: 1,048,576 tokens
Architecture:: text+image->text
Max Output:: 8,192 tokens

Pricing:

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Google: Gemma 2 27B

Context Length:: 8,192 tokens
Architecture:: text->text

Pricing:

Prompt: $0.00000065

Completion: $0.00000065

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models.

Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

Google: Gemma 2 9B (free)

Context Length:: 8,192 tokens
Architecture:: text->text
Max Output:: 8,192 tokens

Pricing:

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.

Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.

See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

Google: Gemma 2 9B

Context Length:: 8,192 tokens
Architecture:: text->text
Max Output:: 8,192 tokens

Pricing:

Prompt: $0.00000001

Completion: $0.00000003

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.

Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.

See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

Ready to build with Google Gemini?

Start using these powerful models in your applications with our flexible pricing plans.