OpenAI Assistants API vs. OpenAI Responses API (Updated for Jan 2026)

OpenAI offers two distinct APIs for building AI-driven assistants and agents: the Assistants API and the Responses API.

Last updated: January 28, 2026

OpenAI still exposes two “assistant/agent-building” API surfaces, but they are no longer peers:

  • Assistants API (Assistants / Threads / Runs) is deprecated and scheduled to be removed on 2026-08-26.
  • Responses API is the recommended foundation for new agents and assistants.
  • For persistent multi-turn state, OpenAI now recommends pairing Responses with the Conversations API (instead of Threads).

This document updates a July 2025 report with major platform changes through Jan 2026: Assistants deprecation, the Conversations state primitive, Background mode for async jobs, Structured Outputs (JSON Schema guarantees), Compaction, and updated tooling + pricing.


Executive Summary

Use Responses API when…

  • You’re building anything new.
  • You want tool-using agents (web search, file search, code interpreter, MCP, etc.).
  • You want async without building your own job system (Background mode).
  • You want durable, thread-like persistence (Conversations API).

Only stay on Assistants API if…

  • You have a legacy integration that you’re actively migrating before 2026-08-26.

1) Status and Roadmap

Assistants API

  • Deprecated.
  • Shutdown scheduled for August 26, 2026.

Responses API

  • Released March 2025.
  • Receives new features and ecosystem investment (tools, deep research, MCP, computer use, etc.).

2) Core Architecture and Primitives

Assistants API (legacy mental model)

  • Assistant: persistent configuration (instructions, tools, model selection).
  • Thread: persistent conversation container.
  • Run: execution step (often async).

This architecture is framework-like: multiple endpoints and objects.

Responses API (modern mental model)

The new stack is split into execution + state:

  • Responses: the execution primitive (you send input items and get output items).
  • Conversations: the state primitive (durable “thread-like” object that stores items: messages, tool calls, tool outputs, etc.).
  • Prompts: a dashboard-managed, versioned way to capture “assistant-like” reusable configurations (migration path from Assistant objects).

3) State and Memory

Responses supports three practical approaches, from “bring your own” to “fully managed.”

A) Stateless: you provide context each turn

Classic pattern: you pass message history in input every time. Easy to reason about, but payload grows.

B) Chain state with previous_response_id

You can continue a conversation by referencing the prior response id.

Important behaviors:

  • To store response objects for retrieval and chaining, use store: true.
  • Even with previous_response_id, prior inputs in the chain still contribute to billed input tokens (i.e., “server-managed history” is still part of your effective prompt cost).
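As a sketch, the two request bodies in a `previous_response_id` chain look like this (the response id `resp_abc123` is a hypothetical placeholder, not a real id):

```python
# Sketch of the request bodies for previous_response_id chaining.

first_request = {
    "model": "gpt-4.1",
    "input": "What is the capital of France?",
    "store": True,  # store the response so it can be retrieved and chained
}

# Suppose the first call returned a response object whose id is "resp_abc123".
# The follow-up references that id instead of resending the whole history:
second_request = {
    "model": "gpt-4.1",
    "previous_response_id": "resp_abc123",
    "input": "What is its population?",
    "store": True,
}
```

Note that even though the server replays the chained history for you, those prior tokens still count toward billed input, per the behavior described above.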

C) Durable objects via Conversations API (recommended for “threads”)

The Conversations API persists a durable conversation id you can reuse across sessions/devices/jobs. It stores items (messages, tool calls, tool outputs, other items). You then attach new responses to that conversation.


4) Context Window Management: Compaction (New)

For long-running conversations, OpenAI documents a Compaction workflow:

  • Call /responses/compact with the full window (must still fit the model’s context limit).
  • It returns a compacted window you pass into your next /responses call.

Compaction properties (as documented):

  • Stateless: you send full window, receive compacted window.
  • All prior user messages are kept verbatim.
  • Prior assistant messages, tool calls/results, and encrypted reasoning are replaced by a single encrypted compaction item that preserves latent understanding while remaining opaque (and described as ZDR-compatible).
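Because compaction is stateless, the client owns the window and compaction is just a transform applied when the window grows too large. The loop below is a minimal sketch of that flow; `respond` and `compact` stand in for the `/responses` and `/responses/compact` calls, and the item-count threshold is purely illustrative:

```python
from typing import Callable

def run_turn(window: list, user_msg: dict,
             respond: Callable[[list], list],
             compact: Callable[[list], list],
             max_items: int = 50) -> list:
    """One conversation turn with client-side compaction.

    `respond` stands in for POST /responses and `compact` for
    POST /responses/compact; both take and return a list of items.
    """
    window = window + [user_msg]
    if len(window) > max_items:
        # The compacted window keeps user messages verbatim and replaces
        # assistant/tool items with a single opaque compaction item.
        window = compact(window)
    output_items = respond(window)
    return window + output_items
```

The key property is that nothing is stored server-side between turns: you always send the full (possibly compacted) window and carry the returned items forward yourself.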

5) Tools and Extensions

Assistants API

Historically supported tools (configured per assistant): code interpreter, retrieval/file search, function calling, etc.
In 2026 the practical point is that new tool investment is centered on Responses + Conversations.

Responses API

Tools are configured per request using tools, and you can influence behavior with tool_choice.

Examples of documented built-in tool types include:

  • web_search
  • file_search
  • code_interpreter
  • remote MCP tools (Model Context Protocol servers)
  • plus other specialized tools referenced in the agent ecosystem (e.g., computer use)

Tool invocation is integrated into the Responses event/item model: output can include messages, tool calls, tool outputs, etc.
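For illustration, a request body that enables two tools but pins the model to web search via `tool_choice` could look like the sketch below (the vector store id `vs_123` is a placeholder; check the current API reference for the exact `tool_choice` forms your model supports):

```python
# Illustrative Responses API request body: two tools enabled, web_search
# forced via tool_choice. "vs_123" is a placeholder vector store id.
request = {
    "model": "gpt-4.1",
    "input": "Find recent coverage of the Responses API.",
    "tools": [
        {"type": "web_search"},
        {"type": "file_search", "vector_store_ids": ["vs_123"]},
    ],
    # tool_choice accepts "auto" / "none" / "required", or can pin a
    # specific hosted tool:
    "tool_choice": {"type": "web_search"},
}
```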


6) Structured Outputs (Major Update vs July 2025)

Structured Outputs is now a first-class feature:

  • Ensures text responses adhere to your supplied JSON Schema.
  • Prevents common issues like missing required keys or invalid enums.
  • Works with modern models, and OpenAI provides SDK-side helpers (e.g., schema parsing via Pydantic/Zod-style patterns).

This substantially changes the “structured output” landscape compared to mid-2025.
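As a minimal sketch of the request shape, the schema rides in `text.format` on `/v1/responses`; the `calendar_event` schema and its fields below are illustrative, not from the original article:

```python
# Structured Outputs on /v1/responses: supply a JSON Schema under
# text.format. The "calendar_event" schema here is illustrative.
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "attendees": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "attendees"],
    "additionalProperties": False,
}

request = {
    "model": "gpt-4.1",
    "input": "Alice and Bob are meeting on Friday about the launch.",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "calendar_event",
            "strict": True,
            "schema": event_schema,
        }
    },
}
```

With `strict: True`, the model's text output is guaranteed to parse as JSON matching this schema, which is what eliminates the missing-key and invalid-enum failure modes described above.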


7) Async and Long-Running Workloads: Background Mode (New)

Assistants API’s Runs made async natural. Responses now offers a direct replacement:

Background mode

Set background: true to run long tasks asynchronously and poll by response id.

Operational notes (documented):

  • Background mode stores response data for ~10 minutes to allow polling.
  • It is not compatible with Zero Data Retention (ZDR) guarantees (per docs).

This is now the preferred “async run” pattern in the Responses world.


8) Performance and Latency

Responses API

  • Designed for interactive use: single endpoint for text + tools + streaming/event updates.
  • Fewer client round-trips for tool-heavy interactions.
  • Better support for modern agent workflows (tools + state + orchestration).

Assistants API

  • More moving parts (Assistant/Thread/Run objects, polling patterns).
  • Still workable, but no longer the forward path and has a shutdown date.

9) Pricing and Cost Predictability (Updated)

Pricing now clearly emphasizes “agent cost drivers” beyond tokens:

Built-in tool costs (high-level)

  • Code Interpreter
    • $0.03 per session (OpenAI pricing page)
    • Platform pricing also lists container tiers: 1GB default ($0.03), 4GB ($0.12), 16GB ($0.48), 64GB ($1.92)
  • File Search
    • Storage: $0.10 / GB / day (first GB free)
    • Tool calls: $2.50 / 1k tool calls (Responses API only)
  • Web Search
    • Tool calls are charged per 1k calls, and search content tokens are billed at model token rates (see pricing pages for the current breakdown and preview variants)
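To make the storage line item concrete, here is a back-of-envelope helper using the quoted rate ($0.10 / GB / day, first GB free); the function and the worked numbers are illustrative, not part of OpenAI's pricing material:

```python
def file_search_storage_cost(gb: float, days: int,
                             rate_per_gb_day: float = 0.10,
                             free_gb: float = 1.0) -> float:
    """Estimated file_search vector-store storage cost at the quoted rate."""
    billable_gb = max(gb - free_gb, 0.0)
    return billable_gb * rate_per_gb_day * days

# e.g. 10 GB held for 30 days: (10 - 1) GB * $0.10 * 30 days ≈ $27
print(round(file_search_storage_cost(10, 30), 2))
```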

Cost control knobs (practical)

  • Enable only the tools you actually want the model to use.
  • Use tool_choice to constrain tool usage when necessary.
  • Consider compaction when conversations grow large.
  • Remember chained state (previous_response_id) still bills prior input tokens.

10) Updated Comparison Table (2026)

| Category | Responses API (Recommended) | Assistants API (Deprecated) |
| --- | --- | --- |
| Platform status | Active, expanding | Deprecated; removal on 2026-08-26 |
| Primary primitives | Responses (execution) + Conversations (state) | Assistants + Threads + Runs |
| Persistent "assistant config" | Prompts (dashboard-managed) or app-managed templates | Assistant objects |
| Persistent conversation | Conversations API (durable id) | Threads |
| Lightweight state | previous_response_id chaining | Thread ids |
| Async jobs | Background mode + polling | Runs |
| Long chat scaling | /responses/compact workflow | No modern equivalent documented |
| Tools | tools per request: web_search, file_search, code_interpreter, MCP, etc. | Legacy tool surface; migration encouraged |
| Structured output | Structured Outputs (JSON Schema) | Possible via prompting/functions; no longer the focus |

11) When to Use Which (2026 Guidance)

Use Responses API for:

  • New builds, agent workflows, tool usage, RAG, web-informed answers.
  • Long-running tasks (Background mode).
  • Long-lived chats (Conversations API) with compaction strategy.

Use Assistants API only for:

  • Temporary legacy operation while migrating before 2026-08-26.

12) Example Requests (Responses API)

A) Stateless, simple response

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1-mini",
    input="Summarize stateful vs stateless APIs in 5 bullets.",
)
print(resp.output_text)
```

B) Enable web search

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search"}],
    input="What was a positive news story from today?",
)
print(resp.output_text)
```

C) File search (RAG via vector store)

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [""],  # fill in your vector store id
    }],
)
print(resp.output_text)
```

D) Code interpreter with container memory tier

```bash
curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "tools": [{
      "type": "code_interpreter",
      "container": { "type": "auto", "memory_limit": "4g" }
    }],
    "instructions": "Use the python tool for math.",
    "input": "Compute the standard deviation of [1,2,3,4,5]."
  }'
```

E) Durable conversation with Conversations API

```python
from openai import OpenAI

client = OpenAI()

# Create a durable conversation; its id can be reused across sessions.
conversation = client.conversations.create()

r1 = client.responses.create(
    model="gpt-4.1",
    conversation=conversation.id,
    input=[{"role": "user", "content": "Remember: my project is called Atlas."}],
)

r2 = client.responses.create(
    model="gpt-4.1",
    conversation=conversation.id,
    input=[{"role": "user", "content": "What did I name the project?"}],
)

print(r2.output_text)
```

F) Background mode (async)

```python
import time

from openai import OpenAI

client = OpenAI()

job = client.responses.create(
    model="gpt-5.2",
    input="Draft a 10-section technical design doc for a rate-limited web crawler.",
    background=True,
    store=True,
)

# Poll by response id until the background job settles.
while job.status in ("queued", "in_progress"):
    time.sleep(2)
    job = client.responses.retrieve(job.id)

print(job.status)
print(job.output_text)
```
