OpenAI Assistants API vs. OpenAI Responses API (Updated for Jan 2026)
Last updated: January 28, 2026
OpenAI still exposes two “assistant/agent-building” API surfaces, but they are no longer peers:
- Assistants API (Assistants / Threads / Runs) is deprecated and scheduled to be removed on 2026-08-26.
- Responses API is the recommended foundation for new agents and assistants.
- For persistent multi-turn state, OpenAI now recommends pairing Responses with the Conversations API (instead of Threads).
This document updates a July 2025 report with major platform changes through Jan 2026: Assistants deprecation, the Conversations state primitive, Background mode for async jobs, Structured Outputs (JSON Schema guarantees), Compaction, and updated tooling + pricing.
Executive Summary
Use Responses API when…
- You’re building anything new.
- You want tool-using agents (web search, file search, code interpreter, MCP, etc.).
- You want async without building your own job system (Background mode).
- You want durable, thread-like persistence (Conversations API).
Only stay on Assistants API if…
- You have a legacy integration that you’re actively migrating before 2026-08-26.
1) Status and Roadmap
Assistants API
- Deprecated.
- Shutdown scheduled for August 26, 2026.
Responses API
- Released March 2025.
- Receives new features and ecosystem investment (tools, deep research, MCP, computer use, etc.).
2) Core Architecture and Primitives
Assistants API (legacy mental model)
- Assistant: persistent configuration (instructions, tools, model selection).
- Thread: persistent conversation container.
- Run: execution step (often async).
This architecture is framework-like: multiple endpoints and objects.
Responses API (modern mental model)
The new stack is split into execution + state:
- Responses: the execution primitive (you send input items and get output items).
- Conversations: the state primitive (durable “thread-like” object that stores items: messages, tool calls, tool outputs, etc.).
- Prompts: a dashboard-managed, versioned way to capture “assistant-like” reusable configurations (migration path from Assistant objects).
3) State and Memory
Responses supports three practical approaches, from “bring your own” to “fully managed.”
A) Stateless: you provide context each turn
Classic pattern: you pass message history in input every time. Easy to reason about, but payload grows.
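The growing-payload pattern can be sketched as a raw /v1/responses request body (the model name and messages are illustrative; with the Python SDK you would pass the same fields to client.responses.create):

```python
import json

# Each turn resends the entire history as the "input" array, so the
# payload (and billed input tokens) grows with the conversation.
history = [
    {"role": "user", "content": "What is a stateless API?"},
    {"role": "assistant", "content": "One where every request carries all the context it needs."},
    {"role": "user", "content": "Give a concrete example."},
]

request_body = {"model": "gpt-4.1", "input": history}
print(json.dumps(request_body, indent=2))
```

Your application owns the history list, so trimming or summarizing old turns is entirely up to you.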
B) Chain state with previous_response_id
You can continue a conversation by referencing the prior response id.
Important behaviors:
- To store response objects for retrieval and chaining, set store: true.
- Even with previous_response_id, prior inputs in the chain still contribute to billed input tokens (i.e., "server-managed history" is still part of your effective prompt cost).
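The chaining pattern, sketched as raw /v1/responses request bodies (field names follow the conversation state docs; the response id is a hypothetical placeholder, and with the Python SDK you would pass the same fields to client.responses.create):

```python
import json

# First turn: store the response so it can be referenced later.
first_request = {
    "model": "gpt-4.1",
    "input": "Explain idempotency in one paragraph.",
    "store": True,
}

# Follow-up turn: reference the prior response id instead of resending
# the full history. The server reconstructs the context, but prior turns
# still count toward billed input tokens.
def follow_up(previous_response_id: str) -> dict:
    return {
        "model": "gpt-4.1",
        "previous_response_id": previous_response_id,
        "input": "Now give a concrete HTTP example.",
        "store": True,
    }

body = follow_up("resp_abc123")  # "resp_abc123" is a made-up id
print(json.dumps(body, indent=2))
```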
C) Durable objects via Conversations API (recommended for “threads”)
The Conversations API persists a durable conversation id you can reuse across sessions/devices/jobs. It stores items (messages, tool calls, tool outputs, other items). You then attach new responses to that conversation.
4) Context Window Management: Compaction (New)
For long-running conversations, OpenAI documents a Compaction workflow:
- Call /responses/compact with the full window (which must still fit the model's context limit).
- It returns a compacted window you pass into your next /responses call.
Compaction properties (as documented):
- Stateless: you send full window, receive compacted window.
- All prior user messages are kept verbatim.
- Prior assistant messages, tool calls/results, and encrypted reasoning are replaced by a single encrypted compaction item that preserves latent understanding while remaining opaque (and described as ZDR-compatible).
5) Tools and Extensions
Assistants API
Historically supported tools (configured per assistant): code interpreter, retrieval/file search, function calling, etc.
In 2026 the practical point is that new tool investment is centered on Responses + Conversations.
Responses API
Tools are configured per request using tools, and you can influence behavior with tool_choice.
Examples of documented built-in tool types include:
- web_search
- file_search
- code_interpreter
- remote MCP tools (Model Context Protocol servers)
- plus other specialized tools referenced in the agent ecosystem (e.g., computer use)
Tool invocation is integrated into the Responses event/item model: output can include messages, tool calls, tool outputs, etc.
6) Structured Outputs (Major Update vs July 2025)
Structured Outputs is now a first-class feature:
- Ensures text responses adhere to your supplied JSON Schema.
- Prevents common issues like missing required keys or invalid enums.
- Works with modern models, and OpenAI provides SDK-side helpers (e.g., schema parsing via Pydantic/Zod-style patterns).
This substantially changes the “structured output” landscape compared to mid-2025.
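A minimal sketch of a Structured Outputs request body, assuming the documented text.format JSON Schema shape (strict mode with every property listed in required and additionalProperties disabled); the schema itself is illustrative:

```python
import json

# JSON Schema the model's text output must conform to.
summary_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "bullets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "bullets"],
    "additionalProperties": False,
}

request_body = {
    "model": "gpt-4.1",
    "input": "Summarize the Responses API as a title plus bullets.",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "summary",
            "schema": summary_schema,
            "strict": True,  # enforce the schema exactly
        }
    },
}

print(json.dumps(request_body["text"], indent=2))
```

With strict mode, the returned text is guaranteed to parse as an object with exactly the keys declared above, which removes the defensive re-prompting loops that prompt-only JSON required.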
7) Async and Long-Running Workloads: Background Mode (New)
Assistants API’s Runs made async natural. Responses now offers a direct replacement:
Background mode
Set background: true to run long tasks asynchronously and poll by response id.
Operational notes (documented):
- Background mode stores response data for ~10 minutes to allow polling.
- It is not compatible with Zero Data Retention (ZDR) guarantees (per docs).
This is now the preferred “async run” pattern in the Responses world.
8) Performance and Latency
Responses API
- Designed for interactive use: single endpoint for text + tools + streaming/event updates.
- Fewer client round-trips for tool-heavy interactions.
- Better support for modern agent workflows (tools + state + orchestration).
Assistants API
- More moving parts (Assistant/Thread/Run objects, polling patterns).
- Still workable, but no longer the forward path and has a shutdown date.
9) Pricing and Cost Predictability (Updated)
Pricing now clearly emphasizes “agent cost drivers” beyond tokens:
Built-in tool costs (high-level)
- Code Interpreter
- $0.03 per session (OpenAI pricing page)
- Platform pricing also lists container tiers: 1GB default ($0.03), 4GB ($0.12), 16GB ($0.48), 64GB ($1.92)
- File Search
- Storage: $0.10 / GB / day (first GB free)
- Tool calls: $2.50 / 1k tool calls (Responses API only)
- Web Search
- Tool calls are charged per 1k calls, and search content tokens are billed at model token rates (see pricing pages for the current breakdown and preview variants)
Cost control knobs (practical)
- Enable only the tools you actually want the model to use.
- Use tool_choice to constrain tool usage when necessary.
- Consider compaction when conversations grow large.
- Remember chained state (previous_response_id) still bills prior input tokens.
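For example, a request body that enables only the one tool a feature needs and pins tool_choice to it (the forced-tool shape follows the tools guide; the vector store id is a placeholder):

```python
import json

request_body = {
    "model": "gpt-4.1",
    "input": "Answer strictly from the uploaded handbook.",
    # Enable only the single tool this feature actually needs...
    "tools": [{"type": "file_search", "vector_store_ids": ["vs_example"]}],
    # ...and require the model to call it, so every request behaves
    # (and bills) predictably.
    "tool_choice": {"type": "file_search"},
}

print(json.dumps(request_body, indent=2))
```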
10) Updated Comparison Table (2026)
| Category | Responses API (Recommended) | Assistants API (Deprecated) |
|---|---|---|
| Platform status | Active, expanding | Deprecated; removal on 2026-08-26 |
| Primary primitives | Responses (execution) + Conversations (state) | Assistants + Threads + Runs |
| Persistent “assistant config” | Prompts (dashboard-managed) or app-managed templates | Assistant objects |
| Persistent conversation | Conversations API (durable id) | Threads |
| Lightweight state | previous_response_id chaining | Thread ids |
| Async jobs | Background mode + polling | Runs |
| Long chat scaling | /responses/compact workflow | No modern equivalent documented |
| Tools | tools per request: web_search, file_search, code_interpreter, MCP, etc. | Legacy tool surface; migration encouraged |
| Structured output | Structured Outputs (JSON Schema) | Possible via prompting/functions; no longer the focus |
11) When to Use Which (2026 Guidance)
Use Responses API for:
- New builds, agent workflows, tool usage, RAG, web-informed answers.
- Long-running tasks (Background mode).
- Long-lived chats (Conversations API) with compaction strategy.
Use Assistants API only for:
- Temporary legacy operation while migrating before 2026-08-26.
12) Example Requests (Responses API)
A) Stateless, simple response
```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1-mini",
    input="Summarize stateful vs stateless APIs in 5 bullets."
)
print(resp.output_text)
```
B) Enable web search
```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search"}],
    input="What was a positive news story from today?"
)
print(resp.output_text)
```
C) File search (RAG via vector store)
```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [""]  # your vector store id
    }]
)
print(resp.output_text)
```
D) Code interpreter with container memory tier
```bash
curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "tools": [{
      "type": "code_interpreter",
      "container": { "type": "auto", "memory_limit": "4g" }
    }],
    "instructions": "Use the python tool for math.",
    "input": "Compute the standard deviation of [1,2,3,4,5]."
  }'
```
E) Durable conversation with Conversations API
```python
from openai import OpenAI

client = OpenAI()

conversation = client.conversations.create()

r1 = client.responses.create(
    model="gpt-4.1",
    conversation=conversation.id,
    input=[{"role": "user", "content": "Remember: my project is called Atlas."}]
)

r2 = client.responses.create(
    model="gpt-4.1",
    conversation=conversation.id,
    input=[{"role": "user", "content": "What did I name the project?"}]
)
print(r2.output_text)
```
F) Background mode (async)
```python
import time

from openai import OpenAI

client = OpenAI()

job = client.responses.create(
    model="gpt-5.2",
    input="Draft a 10-section technical design doc for a rate-limited web crawler.",
    background=True,
    store=True
)

# Poll until the background response reaches a terminal state.
while job.status in ("queued", "in_progress"):
    time.sleep(2)
    job = client.responses.retrieve(job.id)

print(job.status)
print(job.output_text)
```
Sources
- OpenAI API Deprecations (Assistants API sunset date + replacement guidance)
- Assistants migration guide (Assistants → Prompts/Conversations/Responses mapping)
- Conversation state guide (previous_response_id, Conversations API, 30-day retention, compaction)
- Background mode guide (async execution + polling behavior)
- Structured model outputs guide (Structured Outputs + JSON Schema)
- Using tools guide (web_search, file_search, function calling, MCP)
- Pricing (OpenAI pricing page)