Models by meta-llama for RAG Use Cases

Meta: Llama 3.3 8B Instruct (free)

Context Length: 128,000 tokens
Architecture: text->text
Max Output: 4,028 tokens

Pricing: Free

A lightweight and ultra-fast variant of Llama 3.3 70B, for use when quick response times are needed most.

Meta: Llama Guard 4 12B

Context Length: 163,840 tokens
Architecture: text+image->text

Pricing:

Prompt: $0.00000018

Completion: $0.00000018

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM, generating text in its output that indicates whether a given prompt or response is safe or unsafe and, if unsafe, listing the content categories violated.

Llama Guard 4 was aligned to safeguard against the standardized MLCommons hazards taxonomy and designed to support multimodal Llama 4 capabilities. Specifically, it combines features from previous Llama Guard models, providing content moderation for English and multiple supported languages, along with enhanced capabilities to handle mixed text-and-image prompts, including multiple images. Additionally, Llama Guard 4 is integrated into the Llama Moderations API, extending robust safety classification to text and images.
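Because the classifier returns its verdict as generated text, a caller typically has to parse that text before acting on it. The sketch below shows one way to do so. The output layout it expects (a first line reading "safe" or "unsafe", followed by a comma-separated line of category codes such as "S1,S10") is an assumption not stated in this listing, and `GuardVerdict` and `parse_guard_output` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GuardVerdict:
    """Structured form of a guard-model completion (hypothetical helper type)."""
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse a guard-model completion into a verdict.

    Assumed format (not confirmed by this listing): the first non-empty
    line is "safe" or "unsafe"; if unsafe, the next non-empty line holds
    a comma-separated list of violated category codes.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty guard output")

    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(safe=True)
    if label == "unsafe":
        categories: list[str] = []
        if len(lines) > 1:
            categories = [c.strip() for c in lines[1].split(",") if c.strip()]
        return GuardVerdict(safe=False, categories=categories)

    # Fail loudly rather than guessing: an unknown label should not pass as safe.
    raise ValueError(f"unrecognized guard label: {lines[0]!r}")
```

Raising on an unrecognized label, instead of defaulting to "safe", keeps a malformed completion from silently bypassing moderation.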

Meta: Llama 4 Maverick (free)

Context Length: 128,000 tokens
Architecture: text+image->text
Max Output: 4,028 tokens

Pricing: Free

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction.

Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

Meta: Llama 4 Maverick

Context Length: 1,048,576 tokens
Architecture: text+image->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000015

Completion: $0.0000006

Image: $0.0006684

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction.

Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.
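The prompt, completion, and image prices listed for this entry can be combined into a rough per-request cost estimate. The sketch below assumes the prices are in USD per token and USD per image, which the listing does not state explicitly. `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding on such small values.

```python
from decimal import Decimal

# Prices copied from the Llama 4 Maverick (paid) entry above.
# Assumption: prompt/completion prices are USD per token, image price is USD per image.
PROMPT_PRICE = Decimal("0.00000015")
COMPLETION_PRICE = Decimal("0.0000006")
IMAGE_PRICE = Decimal("0.0006684")


def estimate_cost(prompt_tokens: int, completion_tokens: int, images: int = 0) -> Decimal:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if min(prompt_tokens, completion_tokens, images) < 0:
        raise ValueError("counts must be non-negative")
    return (
        PROMPT_PRICE * prompt_tokens
        + COMPLETION_PRICE * completion_tokens
        + IMAGE_PRICE * images
    )


# A RAG-style request: 20,000 prompt tokens of retrieved context,
# a 500-token answer, and one attached image.
cost = estimate_cost(prompt_tokens=20_000, completion_tokens=500, images=1)
print(f"${cost}")
```

Under these assumptions the prompt side dominates the cost of a retrieval-heavy request, so trimming retrieved context usually saves more than shortening the answer.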

Meta: Llama 4 Scout (free)

Context Length: 128,000 tokens
Architecture: text+image->text
Max Output: 4,028 tokens

Pricing: Free

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout routes tokens across 16 experts and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens.

Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it has a training data cutoff of August 2024 and launched publicly on April 5, 2025.

Meta: Llama 4 Scout

Context Length: 327,680 tokens
Architecture: text+image->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000008

Completion: $0.0000003

Image: $0.0003342

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout routes tokens across 16 experts and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens.

Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it has a training data cutoff of August 2024 and launched publicly on April 5, 2025.

Llama Guard 3 8B

Context Length: 131,072 tokens
Architecture: text->text

Pricing:

Prompt: $0.00000002

Completion: $0.00000006

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.

Meta: Llama 3.3 70B Instruct (free)

Context Length: 131,072 tokens
Architecture: text->text
Max Output: 2,048 tokens

Pricing: Free

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in, text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Meta: Llama 3.3 70B Instruct

Context Length: 131,072 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000013

Completion: $0.00000038

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in, text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Meta: Llama 3.2 3B Instruct (free)

Context Length: 131,072 tokens
Architecture: text->text

Pricing: Free

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.

Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.2 3B Instruct

Context Length: 16,384 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000002

Completion: $0.00000002

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.

Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.

Usage of this model is subject to Meta's Acceptable Use Policy.
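For RAG, the context length listed for this entry (16,384 tokens) limits how many retrieved chunks fit into a single request. The sketch below is one simple way to budget for that. It assumes that prompt and completion share the same context window, and that token counts per chunk are already known. The function name, the reserved-output size, and the prompt-overhead figure are illustrative choices rather than values from the listing.

```python
def select_chunks(
    chunk_token_counts: list[int],
    context_length: int = 16_384,   # from the listing above
    reserved_output: int = 1_024,   # assumption: room kept for the answer
    prompt_overhead: int = 256,     # assumption: system prompt + question
) -> tuple[list[int], int]:
    """Greedily pick retrieved chunks, in rank order, that fit the window.

    Returns the indices of the selected chunks and the tokens they use.
    Selection stops at the first chunk that does not fit, so the ranking
    produced by the retriever is preserved.
    """
    budget = context_length - reserved_output - prompt_overhead
    if budget <= 0:
        raise ValueError("no room left for retrieved context")

    selected: list[int] = []
    used = 0
    for index, tokens in enumerate(chunk_token_counts):
        if used + tokens > budget:
            break
        selected.append(index)
        used += tokens
    return selected, used


# Three retrieved chunks of 6,000 tokens each: only the first two fit
# in the 15,104-token budget left after the reservations above.
indices, used = select_chunks([6_000, 6_000, 6_000])
print(indices, used)
```

For the entries listed with 131,072-token windows, the same function applies with a larger `context_length`; only the budget changes.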

Meta: Llama 3.2 1B Instruct

Context Length: 131,072 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.000000005

Completion: $0.00000001

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.

Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.2 90B Vision Instruct

Context Length: 32,768 tokens
Architecture: text+image->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000035

Completion: $0.0000004

Image: $0.0005058

The Llama 3.2 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It is built for high accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, Llama 3.2 90B Vision is engineered to handle the most demanding image-based AI tasks.

This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.2 11B Vision Instruct

Context Length: 131,072 tokens
Architecture: text+image->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.000000049

Completion: $0.000000049

Image: $0.00007948

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.

Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.1 405B (base)

Context Length: 32,768 tokens
Architecture: text->text
Max Output: 32,768 tokens

Pricing:

Prompt: $0.000004

Completion: $0.000004

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.1 8B Instruct

Context Length: 16,384 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000002

Completion: $0.00000003

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.1 405B Instruct

Context Length: 32,768 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.0000008

Completion: $0.0000008

The highly anticipated 400B class of Llama 3 is here! With a 128k context and impressive eval scores, this release continues the Meta AI team's push at the frontier of open-source LLMs.

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high-quality dialogue use cases.

It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3.1 70B Instruct

Context Length: 131,072 tokens
Architecture: text->text

Pricing:

Prompt: $0.0000004

Completion: $0.0000004

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: LlamaGuard 2 8B

Context Length: 8,192 tokens
Architecture: text->text

Pricing:

Prompt: $0.0000002

Completion: $0.0000002

This safeguard model has 8B parameters and is based on the Llama 3 family. Just like its predecessor, LlamaGuard 1, it can do both prompt and response classification.

LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated.

For best results, please use raw prompt input or the /completions endpoint, instead of the chat API.
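The difference between a raw-prompt request and a chat request shows up in the request body. The sketch below builds one for a completions-style endpoint under stated assumptions: the `model`, `prompt`, and `max_tokens` fields follow the common OpenAI-style completions convention, which this listing does not confirm for this provider. The model slug and prompt text are placeholders, not the real identifier or the official LlamaGuard prompt template.

```python
import json


def build_completions_payload(model: str, raw_prompt: str, max_tokens: int = 32) -> str:
    """Serialize a raw-prompt request body (hypothetical helper).

    Assumption: the endpoint accepts an OpenAI-style completions body.
    Note the single "prompt" string; a chat request would instead carry
    a "messages" list, which is what the guidance above says to avoid.
    """
    payload = {
        "model": model,
        "prompt": raw_prompt,
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# Placeholder values for illustration only.
body = build_completions_payload(
    model="example/llamaguard-2-8b",  # hypothetical slug
    raw_prompt="<classification prompt built from the model card's template>",
)
print(body)
```

A small `max_tokens` is usually enough here, since the classifier's output is a short verdict rather than free-form text.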

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3 8B Instruct

Context Length: 8,192 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.00000003

Completion: $0.00000006

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.

Meta: Llama 3 70B Instruct

Context Length: 8,192 tokens
Architecture: text->text
Max Output: 16,384 tokens

Pricing:

Prompt: $0.0000003

Completion: $0.0000004

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases.

It has demonstrated strong performance compared to leading closed-source models in human evaluations.

Usage of this model is subject to Meta's Acceptable Use Policy.
