OpenAI
InsightWorker supports most of the OpenAI catalog: GPT-3.5 through GPT-5.5, the o-series reasoning models, and the GPT-5 pro / codex variants that require the Responses API. Deep-research models are not supported yet (see the status table below).
Configuration
~/.insightworker/.env:
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-5
OPENAI_REASONING_EFFORT=medium # optional: minimal | low | medium | high
What works (covered) and what doesn't
As of InsightWorker 2.1.11:
| Status | Models |
|---|---|
| ✅ chat.completions | gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4o, gpt-4.1, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat-latest, gpt-5.1-5.5 base + chat-latest, o1, o3, o3-mini, o4-mini |
| ✅ Responses API | gpt-5-pro, gpt-5.2-pro, gpt-5.4-pro, gpt-5.5-pro, o1-pro, gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex |
| ❌ Tool-gated (model side) | o4-mini-deep-research — requires web_search_preview / mcp / file_search tools we don't provide |
InsightWorker auto-detects which API path to use based on the model name. You don't need to configure it.
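As a rough illustration of name-based routing, the sketch below maps a model name to one of the three statuses in the table above. The function name `pickApiPath` and the matching rules are assumptions made for this example; they are not taken from InsightWorker's source, which may use a different lookup.

```typescript
// Hypothetical sketch, not InsightWorker's actual implementation.
type ApiPath = "chat.completions" | "responses" | "unsupported";

function pickApiPath(model: string): ApiPath {
  const name = model.toLowerCase();

  // Deep-research models need hosted tools that InsightWorker does not provide.
  if (name.includes("deep-research")) return "unsupported";

  // "-pro" and "-codex" variants, optionally followed by a suffix such as
  // "-max", "-mini", or a date pin, are routed to the Responses API.
  if (/-(pro|codex)(-|$)/.test(name)) return "responses";

  // Everything else in the table uses chat.completions.
  return "chat.completions";
}

console.log(pickApiPath("gpt-5"));                  // chat.completions
console.log(pickApiPath("gpt-5.5-pro-2026-04-23")); // responses
console.log(pickApiPath("o4-mini-deep-research"));  // unsupported
```

Matching on suffixes rather than an exact list keeps date-pinned names working, at the cost of possibly misrouting a future model whose name happens to contain `-pro` or `-codex`.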
Recommended models per role
| Role | Model | Notes |
|---|---|---|
| Default agent driver | gpt-5 or gpt-5-chat-latest | Closest to Claude Sonnet / Opus class |
| Heavy reasoning | gpt-5-pro or gpt-5.5-pro | Slower (5-15s/call), best reasoning |
| Cheap classification | gpt-5-mini, gpt-5-nano, gpt-4.1-nano | Sub-second responses, very cheap |
| Code-heavy migrations | gpt-5.3-codex, gpt-5.2-codex | Code-tuned, fast |
| Latest | gpt-5.5-pro-2026-04-23 | Newest with date pinning for reproducibility |
reasoning_effort
GPT-5 family + o-series models accept reasoning_effort to control how much "thinking" they do before producing output:
- minimal: fastest, least reasoning (good for chat-style)
- low: quick reasoning
- medium: balanced (default if you don't set it explicitly)
- high: slow, thorough (use for planning, complex translations)
Set globally via OPENAI_REASONING_EFFORT=high in .env. The agent applies this to every call when the active model supports it; legacy gpt-4 / gpt-3.5 calls ignore it.
For most agent apps, medium is the right default. high only makes sense if you have hit the ceiling of the model's output quality at medium and are willing to pay 2-3x the latency.
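The "apply it only when the model supports it" rule can be sketched as a small resolver. Everything here is illustrative: `supportsReasoningEffort` and `resolveReasoningEffort` are hypothetical names, the prefix test is an assumption about how GPT-5 and o-series models might be recognised, and falling back to medium for a missing or invalid value simply mirrors the default described above.

```typescript
// Hypothetical sketch, not InsightWorker's actual implementation.
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

const EFFORTS: readonly ReasoningEffort[] = ["minimal", "low", "medium", "high"];

// Assumption: GPT-5 family and o-series models are identified by name prefix.
function supportsReasoningEffort(model: string): boolean {
  return /^(gpt-5|o\d)/.test(model.toLowerCase());
}

// envValue stands in for the raw OPENAI_REASONING_EFFORT string from .env.
function resolveReasoningEffort(
  model: string,
  envValue: string | undefined,
): ReasoningEffort | undefined {
  // Legacy gpt-4 / gpt-3.5 calls omit the parameter entirely.
  if (!supportsReasoningEffort(model)) return undefined;

  const value = (envValue ?? "").trim().toLowerCase();
  const match = EFFORTS.find((effort) => effort === value);

  // Unset or unrecognised values fall back to the documented default.
  return match ?? "medium";
}
```

Returning `undefined` for legacy models lets the caller leave the field out of the request body instead of sending a value the API would reject.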
Two API paths in detail
chat.completions path
The classic OpenAI API. Used for: gpt-3.5/4/4-turbo/4o/4.1, gpt-5 base + mini + nano + chat-latest, gpt-5.1 through gpt-5.5 base + chat-latest, o1, o3, o3-mini, o4-mini.
Behavior:
- Uses max_tokens for legacy models; switches to max_completion_tokens for gpt-5+ / o-series
- tools field for tool definitions, tool_choice: "auto" for tool calling
- Standard messages array with role/content pairs
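The token-limit switch and tool fields described above could be assembled roughly as follows. This is a minimal sketch under stated assumptions: `buildChatCompletionsBody` is a hypothetical helper, and the prefix test for "gpt-5+ / o-series" is a guess at how the distinction might be made.

```typescript
// Hypothetical sketch, not InsightWorker's actual implementation.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// chat.completions nests the function definition under a "function" key.
interface ChatTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

function buildChatCompletionsBody(
  model: string,
  messages: ChatMessage[],
  tools: ChatTool[],
  maxTokens: number,
): Record<string, unknown> {
  // Assumption: gpt-5+ and o-series models are identified by name prefix.
  const usesCompletionTokens = /^(gpt-5|o\d)/.test(model.toLowerCase());

  const body: Record<string, unknown> = { model, messages };
  body[usesCompletionTokens ? "max_completion_tokens" : "max_tokens"] = maxTokens;

  // Only advertise tool calling when there is at least one tool to call.
  if (tools.length > 0) {
    body["tools"] = tools;
    body["tool_choice"] = "auto";
  }
  return body;
}
```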
Responses API path (new in 2.0.18)
The new endpoint required for: gpt-5-pro, gpt-5.x-pro, o1-pro, all *-codex variants.
Behavior:
- Uses max_output_tokens
- instructions field for the system prompt (not a system role message)
- input array mixes message items with top-level function_call / function_call_output items
- tools are flat (no nested function: {} wrapper)
- reasoning: { effort: "..." } instead of top-level reasoning_effort
You don't see this difference — the agent translates between Anthropic-shape internally and the right OpenAI shape per model.
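To make the shape differences concrete, here is a sketch that converts a chat.completions-style request into a Responses-style body. It starts from the chat.completions shape purely for illustration (the agent's internal Anthropic-shape translation is not shown), and `toResponsesBody` plus all type names are hypothetical.

```typescript
// Hypothetical sketch, not InsightWorker's actual translation layer.
interface ChatStyleMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface NestedTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}
interface FlatTool {
  type: "function";
  name: string;
  description: string;
  parameters: object;
}
// Tool calls and their results are top-level input items, not message fields.
type InputItem =
  | { role: "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface ResponsesBody {
  model: string;
  instructions?: string;
  input: InputItem[];
  tools: FlatTool[];
  max_output_tokens: number;
  reasoning?: { effort: string };
}

function toResponsesBody(
  model: string,
  messages: ChatStyleMessage[],
  tools: NestedTool[],
  maxOutputTokens: number,
  effort?: string,
): ResponsesBody {
  const systemParts: string[] = [];
  const input: InputItem[] = [];
  for (const m of messages) {
    // System prompts move into "instructions" instead of the input array.
    if (m.role === "system") systemParts.push(m.content);
    else input.push({ role: m.role, content: m.content });
  }

  const body: ResponsesBody = {
    model,
    input,
    // Flatten each tool: drop the nested "function" wrapper.
    tools: tools.map((t): FlatTool => ({ type: "function", ...t.function })),
    max_output_tokens: maxOutputTokens,
  };
  if (systemParts.length > 0) body.instructions = systemParts.join("\n\n");
  if (effort) body.reasoning = { effort };
  return body;
}
```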
Latency observed (real test on actual API)
gpt-5.4 ~600ms (chat-tuned, very fast)
gpt-5-mini ~850ms
gpt-5 ~1200ms (Opus 4.6-class latency)
gpt-5.5-pro ~5700ms (reasoning model, default effort)
gpt-5-pro ~14700ms (reasoning model, max effort by default)
o1-pro ~8900ms
For interactive use → gpt-5 or gpt-5.4. For batch / scheduled apps → any of them; pick by cost.
Context windows
| Model family | Context |
|---|---|
| gpt-4o, gpt-4-turbo, gpt-4.1 | 128k |
| o1, o3 | 200k |
| gpt-5 family | 128k–400k depending on variant |
| Older gpt-4 | 8k |
| Older gpt-3.5 | 16k |
InsightWorker auto-detects via src/providers/limits.ts:modelContextTokens() and caps tool output at ~10% of the window.
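The 10% cap can be illustrated with a small lookup. This is only a sketch: the real `modelContextTokens()` in src/providers/limits.ts is not reproduced here, the function names below are hypothetical, the window sizes are the rounded figures from the table, and two choices are assumptions (the lower bound of 128k for the gpt-5 family, and the smallest window for models the table does not list).

```typescript
// Hypothetical sketch; the real lookup lives in src/providers/limits.ts.
function modelContextTokensSketch(model: string): number {
  const name = model.toLowerCase();
  if (/^gpt-5/.test(name)) return 128_000;               // table: 128k-400k, lower bound assumed
  if (/^o[13](-|$)/.test(name)) return 200_000;          // o1, o3 and their variants
  if (/^gpt-4(o|-turbo|\.1)/.test(name)) return 128_000; // gpt-4o, gpt-4-turbo, gpt-4.1
  if (/^gpt-4/.test(name)) return 8_000;                 // older gpt-4
  if (/^gpt-3\.5/.test(name)) return 16_000;             // older gpt-3.5
  return 8_000;                                          // not in the table: assume the smallest
}

// Cap tool output at a fraction of the context window (about 10% by default).
function maxToolOutputTokens(model: string, fraction: number = 0.1): number {
  return Math.floor(modelContextTokensSketch(model) * fraction);
}

console.log(maxToolOutputTokens("gpt-4o")); // 12800
console.log(maxToolOutputTokens("o3"));     // 20000
```

Choosing the lower bound for unknown or variable windows errs on the side of truncating tool output too early rather than overflowing the context.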
Where data goes
With OpenAI's standard API, prompts go to OpenAI's cloud and are retained per their data policy (currently 30 days for abuse monitoring; API data is not used for training unless you opt in).
For regulated environments where prompts can't leave your tenant, use Azure OpenAI instead — same OpenAI models, hosted in your Azure tenant.
Common gotchas
| Symptom | Cause | Fix |
|---|---|---|
| 404: model does not exist | Wrong model name | Verify with curl https://api.openai.com/v1/models |
| 400: max_tokens not supported, use max_completion_tokens | Old InsightWorker version | Upgrade to the latest CLI |
| 404: only supported in v1/responses | Old InsightWorker version | Upgrade to the latest CLI for the Responses API path |
| Empty content returned by gpt-5 base | maxTokens too low, so reasoning consumed the whole budget | The default maxTokens: 16384 in agent.ts is plenty; if you've reduced it, raise it back |
| Slow gpt-5-pro calls | This is normal | Reasoning models burn tokens internally; 5-15s per call is expected. Switch to gpt-5 for interactive |
See also
- overview.md — provider matrix
- azure-openai.md — same models, Azure-tenant hosted
- authentication/perplexity-api.md — for live web search alongside OpenAI
Source: docs/providers/openai.md in the public repo. Open a PR with corrections.
