Microsoft Models

Explore the Microsoft language and embedding models available through our OpenAI Assistants API-compatible service.

Microsoft: Phi 4 Reasoning Plus

Context Length:: 32,768 tokens
Architecture:: text->text

Pricing:

Prompt: $0.00000007

Completion: $0.00000035

Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer.

While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed.

Microsoft: MAI DS R1 (free)

Context Length:: 163,840 tokens
Architecture:: text->text

Pricing:

MAI-DS-R1 is a post-trained variant of DeepSeek-R1 developed by the Microsoft AI team to improve the model’s responsiveness on previously blocked topics while enhancing its safety profile. Built on top of DeepSeek-R1’s reasoning foundation, it integrates 110k examples from the Tulu-3 SFT dataset and 350k internally curated multilingual safety-alignment samples. The model retains strong reasoning, coding, and problem-solving capabilities, while unblocking a wide range of prompts previously restricted in R1.

MAI-DS-R1 demonstrates improved performance on harm mitigation benchmarks and maintains competitive results across general reasoning tasks. It surpasses R1-1776 in satisfaction metrics for blocked queries and reduces leakage in harmful content categories. The model is based on a transformer MoE architecture and is suitable for general-purpose use cases, excluding high-stakes domains such as legal, medical, or autonomous systems.

Microsoft: MAI DS R1

Context Length:: 163,840 tokens
Architecture:: text->text
Max Output:: 163,840 tokens

Pricing:

Prompt: $0.0000003

Completion: $0.0000012

Microsoft: Phi 4 Multimodal Instruct

Context Length:: 131,072 tokens
Architecture:: text+image->text

Pricing:

Prompt: $0.00000005

Completion: $0.0000001

Image: $0.00017685

Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the Phi-4 Multimodal blog post.

Microsoft: Phi 4

Context Length:: 16,384 tokens
Architecture:: text->text

Pricing:

Prompt: $0.00000006

Completion: $0.00000014

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed.

At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.

For more information, please see Phi-4 Technical Report

Microsoft: Phi-3.5 Mini 128K Instruct

Context Length:: 128,000 tokens
Architecture:: text->text

Pricing:

Prompt: $0.0000001

Completion: $0.0000001

Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as Phi-3 Mini.

The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.

Microsoft: Phi-3 Mini 128K Instruct

Context Length:: 128,000 tokens
Architecture:: text->text

Pricing:

Prompt: $0.0000001

Completion: $0.0000001

Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date.

Microsoft: Phi-3 Medium 128K Instruct

Context Length:: 128,000 tokens
Architecture:: text->text

Pricing:

Prompt: $0.000001

Completion: $0.000001

Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance.

For 4k context length, try Phi-3 Medium 4K.

WizardLM-2 8x22B

Context Length:: 65,536 tokens
Architecture:: text->text
Max Output:: 16,384 tokens

Pricing:

Prompt: $0.00000048

Completion: $0.00000048

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models.

It is an instruct finetune of Mixtral 8x22B.

To read more about the model release, click here.

#moe

Ready to build with Microsoft?

Start using these powerful models in your applications with our flexible pricing plans.

View Pricing