OpenAI

InsightWorker supports most of the OpenAI catalog: GPT-3.5 through GPT-5.5, the o-series reasoning models, and the GPT-5 pro and codex variants that require the Responses API. Deep-research models are the exception; see the status table below.

Configuration

~/.insightworker/.env:

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-5
OPENAI_REASONING_EFFORT=medium     # optional: minimal | low | medium | high

What works (covered) and what doesn't

As of InsightWorker 2.1.11:

Status	Models
✅ chat.completions	gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4o, gpt-4.1, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat-latest, gpt-5.1 through gpt-5.5 (base and chat-latest), o1, o3, o3-mini, o4-mini
✅ Responses API	gpt-5-pro, gpt-5.2-pro, gpt-5.4-pro, gpt-5.5-pro, o1-pro, gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex
❌ Tool-gated (model side)	o4-mini-deep-research — requires `web_search_preview` / `mcp` / `file_search` tools we don't provide

InsightWorker auto-detects which API path to use based on the model name. You don't need to configure it.
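
To make the routing concrete, here is a minimal sketch of name-based detection. The function name `pickApiPath` and the matching rules are assumptions derived from the table above, not InsightWorker's actual implementation.

```typescript
// Hypothetical sketch: routes a model name to an API path using the
// naming patterns from the status table. Not the real InsightWorker code.
type ApiPath = "chat.completions" | "responses" | "unsupported";

function pickApiPath(model: string): ApiPath {
  const name = model.toLowerCase();

  // Deep-research models need hosted tools that InsightWorker doesn't provide.
  if (/deep-research/.test(name)) return "unsupported";

  // "-pro" and "-codex" variants, optionally followed by a suffix such as
  // "-max", "-mini", or a date pin, go through the Responses API.
  if (/-(pro|codex)(-|$)/.test(name)) return "responses";

  // Everything else uses the classic endpoint.
  return "chat.completions";
}
```

With these rules, `gpt-5.5-pro-2026-04-23` and `gpt-5.1-codex-max` resolve to the Responses path, while `gpt-5-chat-latest` stays on chat.completions.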

Recommended models per role

Role	Model	Notes
Default agent driver	`gpt-5` or `gpt-5-chat-latest`	Closest to Claude Sonnet / Opus class
Heavy reasoning	`gpt-5-pro` or `gpt-5.5-pro`	Slower (5-15s/call), best reasoning
Cheap classification	`gpt-5-mini`, `gpt-5-nano`, `gpt-4.1-nano`	Sub-second responses, very cheap
Code-heavy migrations	`gpt-5.3-codex`, `gpt-5.2-codex`	Code-tuned, fast
Latest	`gpt-5.5-pro-2026-04-23`	Newest with date pinning for reproducibility

reasoning_effort

GPT-5 family + o-series models accept reasoning_effort to control how much "thinking" they do before producing output:

minimal — fastest, least reasoning (good for chat-style)
low — quick reasoning
medium — balanced (default if you don't set it explicitly)
high — slow, thorough (use for planning, complex translations)

Set globally via OPENAI_REASONING_EFFORT=high in .env. The agent applies this to every call when the active model supports it; legacy gpt-4 / gpt-3.5 calls ignore it.

For most agent apps, medium is the right default. high only makes sense if you've hit the ceiling on the model's output quality at medium and are willing to pay 2-3x the latency.
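
A rough sketch of how the setting could be resolved per call. The helper names and the prefix check are hypothetical. Returning `undefined` for an unset or invalid value (so the API default applies) is also an assumption about behavior this page doesn't specify.

```typescript
type ReasoningEffort = "minimal" | "low" | "medium" | "high";
const EFFORTS: ReasoningEffort[] = ["minimal", "low", "medium", "high"];

// Parse the raw OPENAI_REASONING_EFFORT value. Unset or unrecognized values
// yield undefined, leaving the choice to the API default (assumption).
function parseEffort(raw: string | undefined): ReasoningEffort | undefined {
  const value = (raw ?? "").trim().toLowerCase();
  const index = (EFFORTS as string[]).indexOf(value);
  return index === -1 ? undefined : EFFORTS[index];
}

// Assumption: GPT-5 family and o-series names start with "gpt-5" or "o<digit>".
function supportsReasoningEffort(model: string): boolean {
  return /^(gpt-5|o\d)/.test(model.toLowerCase());
}

// Effort to send for this model, or undefined when the model ignores it.
function effortFor(model: string, raw: string | undefined): ReasoningEffort | undefined {
  return supportsReasoningEffort(model) ? parseEffort(raw) : undefined;
}
```

For example, `effortFor("gpt-5", "high")` yields `"high"`, while `effortFor("gpt-4", "high")` yields `undefined`, so the field is left off legacy calls.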

Two API paths in detail

chat.completions path

The classic OpenAI API. Used for: gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4o, gpt-4.1, gpt-5 (base, mini, nano, chat-latest), gpt-5.1 through gpt-5.5 (base and chat-latest), o1, o3, o3-mini, o4-mini.

Behavior:

Uses max_tokens for legacy models; switches to max_completion_tokens for gpt-5+ / o-series
tools field for tool definitions, tool_choice: "auto" for tool calling
Standard messages array with role/content pairs
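
The behavior above can be sketched as a request-body builder. The type and function names are illustrative, and the prefix test for "gpt-5+ / o-series" is an assumption; only the field names (`max_tokens`, `max_completion_tokens`, `tools`, `tool_choice`) come from the OpenAI API.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// chat.completions nests the definition under a "function" key.
interface NestedTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

interface ChatBody {
  model: string;
  messages: ChatMessage[];
  tools?: NestedTool[];
  tool_choice?: "auto";
  max_tokens?: number;
  max_completion_tokens?: number;
}

// Assumption: gpt-5+ and o-series names start with "gpt-5" or "o<digit>".
function usesMaxCompletionTokens(model: string): boolean {
  return /^(gpt-5|o\d)/.test(model.toLowerCase());
}

function buildChatBody(
  model: string,
  messages: ChatMessage[],
  tools: NestedTool[],
  maxTokens: number,
): ChatBody {
  const body: ChatBody = { model, messages };
  if (tools.length > 0) {
    body.tools = tools;
    body.tool_choice = "auto";
  }
  // Newer models reject max_tokens, so pick the parameter by model family.
  if (usesMaxCompletionTokens(model)) {
    body.max_completion_tokens = maxTokens;
  } else {
    body.max_tokens = maxTokens;
  }
  return body;
}
```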

Responses API path (new in 2.0.18)

The new endpoint required for: gpt-5-pro, gpt-5.x-pro, o1-pro, all *-codex variants.

Behavior:

Uses max_output_tokens
instructions field for the system prompt (not a system role message)
input array mixes message items with top-level function_call / function_call_output items
tools are flat (no nested function: {} wrapper)
reasoning: { effort: "..." } instead of top-level reasoning_effort

You don't see this difference: the agent keeps an Anthropic-shaped message format internally and translates it to the right OpenAI shape for each model.
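
For illustration, a minimal sketch of the Responses-shaped body described in the list above. The input types and function names are hypothetical, and the agent's real translation layer is not shown on this page. Only the wire field names (`instructions`, `input`, `max_output_tokens`, `reasoning.effort`, flat `tools`) follow the OpenAI Responses API.

```typescript
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface NestedTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

// Responses API tools are flat: no nested "function" wrapper.
interface FlatTool {
  type: "function";
  name: string;
  description: string;
  parameters: object;
}

// Messages and tool-call items sit side by side in one input array.
type InputItem =
  | { role: "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface ResponsesBody {
  model: string;
  instructions: string;
  input: InputItem[];
  max_output_tokens: number;
  tools?: FlatTool[];
  reasoning?: { effort: ReasoningEffort };
}

function flattenTool(tool: NestedTool): FlatTool {
  const { name, description, parameters } = tool.function;
  return { type: "function", name, description, parameters };
}

function buildResponsesBody(
  model: string,
  system: string,
  input: InputItem[],
  tools: NestedTool[],
  maxTokens: number,
  effort?: ReasoningEffort,
): ResponsesBody {
  const body: ResponsesBody = {
    model,
    instructions: system, // system prompt goes here, not in a system message
    input,
    max_output_tokens: maxTokens,
  };
  if (tools.length > 0) body.tools = tools.map(flattenTool);
  if (effort) body.reasoning = { effort };
  return body;
}
```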

Observed latency (measured against the live API)

gpt-5.4                    ~600ms  (chat-tuned, very fast)
gpt-5-mini                 ~850ms
gpt-5                     ~1200ms  (Opus 4.6-class latency)
gpt-5.5-pro               ~5700ms  (reasoning model, default effort)
gpt-5-pro                ~14700ms  (reasoning model, max effort by default)
o1-pro                    ~8900ms

For interactive use → gpt-5 or gpt-5.4. For batch / scheduled apps → any of them; pick by cost.

Context windows

Model family	Context
gpt-4o, gpt-4-turbo, gpt-4.1	128k
o1, o3	200k
gpt-5 family	128k–400k depending on variant
Older gpt-4	8k
Older gpt-3.5	16k

InsightWorker looks up the window via src/providers/limits.ts:modelContextTokens() and caps tool output at ~10% of it.
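
As a rough illustration of the 10% cap, assuming a 4-characters-per-token estimate. `contextTokens` here is a hypothetical stand-in for `modelContextTokens()`, using rounded values from the table above; the truncation marker and the fallback for unknown models are also assumptions.

```typescript
// Hypothetical stand-in for modelContextTokens(); values are rounded from
// the context table, taking the lower bound for the gpt-5 family.
function contextTokens(model: string): number {
  const name = model.toLowerCase();
  if (/^gpt-5/.test(name)) return 128000;
  if (/^o[13](-|$)/.test(name)) return 200000;
  if (/^gpt-4(o|-turbo|\.1)/.test(name)) return 128000;
  if (/^gpt-4/.test(name)) return 8000;
  if (/^gpt-3\.5/.test(name)) return 16000;
  return 8000; // unknown model: fall back to the smallest window
}

// Rough heuristic, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

// Truncate tool output so it uses at most ~10% of the model's window.
function capToolOutput(output: string, model: string): string {
  const maxTokens = Math.floor(contextTokens(model) / 10);
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (output.length <= maxChars) return output;
  return output.slice(0, maxChars) + "\n[truncated]";
}
```

Under these assumptions, older gpt-4 gets an 800-token (about 3,200-character) budget per tool result, while gpt-4o gets 12,800 tokens.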

Where data goes

With OpenAI's standard API, prompts go to OpenAI's cloud and are retained per their data policy (currently 30 days for abuse monitoring; API data is not used for training unless you opt in).

For regulated environments where prompts can't leave your tenant, use Azure OpenAI instead — same OpenAI models, hosted in your Azure tenant.

Common gotchas

Symptom	Cause	Fix
`404: model does not exist`	Wrong model name	Verify with `curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"`
`400: max_tokens not supported, use max_completion_tokens`	Old InsightWorker version	Upgrade to the latest CLI
`404: only supported in v1/responses`	Old InsightWorker version	Upgrade to the latest CLI for the Responses API path
Empty content returned by gpt-5 base	`maxTokens` too low, so reasoning ate the budget	The default `maxTokens: 16384` in agent.ts is plenty; if you've reduced it, raise it back
Slow gpt-5-pro calls	This is normal	Reasoning models burn tokens internally; 5-15s per call is expected. Switch to `gpt-5` for interactive