
OpenAI

InsightWorker supports most of the OpenAI catalog: GPT-3.5 through GPT-5.5, the o-series reasoning models, and the newer GPT-5 pro and codex variants that require the Responses API. The deep-research variant is the exception; it needs tools InsightWorker does not provide.

Configuration

~/.insightworker/.env:

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-5
OPENAI_REASONING_EFFORT=medium     # optional: minimal | low | medium | high

What works (covered) and what doesn't

As of InsightWorker 2.1.11:

| Status | Models |
|---|---|
| ✅ chat.completions | gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4o, gpt-4.1, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat-latest, gpt-5.1 to gpt-5.5 (base + chat-latest), o1, o3, o3-mini, o4-mini |
| ✅ Responses API | gpt-5-pro, gpt-5.2-pro, gpt-5.4-pro, gpt-5.5-pro, o1-pro, gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5.2-codex, gpt-5.3-codex |
| ❌ Tool-gated (model side) | o4-mini-deep-research: requires web_search_preview / mcp / file_search tools we don't provide |

InsightWorker auto-detects which API path to use based on the model name. You don't need to configure it.
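
As a rough illustration of name-based routing, the sketch below maps a model name to one of the three statuses in the table. The function name `pickApiPath` and the matching rules are assumptions made for this example; InsightWorker's real detection logic may differ.

```typescript
// Hypothetical sketch of name-based routing; not InsightWorker's actual code.
type ApiPath = "chat.completions" | "responses" | "unsupported";

function pickApiPath(model: string): ApiPath {
  const name = model.toLowerCase();
  // Deep-research models need tools that InsightWorker does not provide.
  if (name.includes("deep-research")) return "unsupported";
  // "-pro" variants (including date-pinned ones) and all codex variants
  // are only served by the Responses API.
  if (/-pro(-|$)/.test(name) || name.includes("-codex")) return "responses";
  // Everything else goes through the classic chat.completions endpoint.
  return "chat.completions";
}

console.log(pickApiPath("gpt-5"), pickApiPath("gpt-5.3-codex"));
```

Checking for deep-research first matters in this sketch: the order of the rules decides which status wins when a name could match more than one pattern.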

Recommended models per role

| Role | Model | Notes |
|---|---|---|
| Default agent driver | gpt-5 or gpt-5-chat-latest | Closest to Claude Sonnet / Opus class |
| Heavy reasoning | gpt-5-pro or gpt-5.5-pro | Slower (5-15s/call), best reasoning |
| Cheap classification | gpt-5-mini, gpt-5-nano, gpt-4.1-nano | Sub-second responses, very cheap |
| Code-heavy migrations | gpt-5.3-codex, gpt-5.2-codex | Code-tuned, fast |
| Latest | gpt-5.5-pro-2026-04-23 | Newest; date-pinned for reproducibility |

reasoning_effort

GPT-5 family + o-series models accept reasoning_effort to control how much "thinking" they do before producing output:

  • minimal — fastest, least reasoning (good for chat-style)
  • low — quick reasoning
  • medium — balanced (default if you don't set it explicitly)
  • high — slow, thorough (use for planning, complex translations)

Set globally via OPENAI_REASONING_EFFORT=high in .env. The agent applies this to every call when the active model supports it; legacy gpt-4 / gpt-3.5 calls ignore it.
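
The rule above can be sketched as a small resolver. The helper names, the prefix-based model check, and the fallback to medium for unrecognised values are all assumptions for illustration, not InsightWorker's actual implementation.

```typescript
// Hypothetical sketch; names and matching rules are assumptions.
type ReasoningEffort = "minimal" | "low" | "medium" | "high";
const EFFORTS: readonly ReasoningEffort[] = ["minimal", "low", "medium", "high"];

// Assumption: GPT-5 family and o-series models are recognised by name prefix.
function supportsReasoningEffort(model: string): boolean {
  return /^gpt-5/.test(model) || /^o\d/.test(model);
}

function resolveEffort(
  model: string,
  envValue: string | undefined,
): ReasoningEffort | undefined {
  // Legacy gpt-4 / gpt-3.5 models ignore the setting entirely.
  if (!supportsReasoningEffort(model)) return undefined;
  const value = (envValue ?? "").trim().toLowerCase();
  // Unset or unrecognised values fall back to the documented default.
  return EFFORTS.find((e) => e === value) ?? "medium";
}

console.log(resolveEffort("gpt-5", "high"), resolveEffort("gpt-4", "high"));
```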

For most agent apps, medium is the right default. high only makes sense if you have hit the ceiling of the model's output quality at medium and are willing to pay 2-3x the latency.

Two API paths in detail

chat.completions path

The classic OpenAI API. Used for: gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4o, gpt-4.1, gpt-5 (base, mini, nano, chat-latest), gpt-5.1 to gpt-5.5 (base + chat-latest), o1, o3, o3-mini, o4-mini.

Behavior:

  • Uses max_tokens for legacy models; switches to max_completion_tokens for gpt-5+ / o-series
  • tools field for tool definitions, tool_choice: "auto" for tool calling
  • Standard messages array with role/content pairs
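
A minimal sketch of a request body built along these lines is shown below. The field names (`messages`, `tools`, `tool_choice`, `max_tokens`, `max_completion_tokens`) are the ones listed above; the helper names and the prefix check used to decide between the two token-limit fields are assumptions for this example.

```typescript
// Hypothetical sketch of building a chat.completions request body.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ChatTool {
  type: "function";
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

// Assumption: gpt-5+ and o-series models are recognised by name prefix.
function usesMaxCompletionTokens(model: string): boolean {
  return /^gpt-5/.test(model) || /^o\d/.test(model);
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  tools: ChatTool[],
  maxTokens: number,
): Record<string, unknown> {
  const body: Record<string, unknown> = { model, messages };
  if (tools.length > 0) {
    body["tools"] = tools;
    body["tool_choice"] = "auto";
  }
  // Legacy models take max_tokens; newer ones require max_completion_tokens.
  if (usesMaxCompletionTokens(model)) body["max_completion_tokens"] = maxTokens;
  else body["max_tokens"] = maxTokens;
  return body;
}

console.log(JSON.stringify(buildChatRequest("gpt-4o", [{ role: "user", content: "hi" }], [], 256)));
```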

Responses API path (new in 2.0.18)

The new endpoint required for: gpt-5-pro, gpt-5.x-pro, o1-pro, all *-codex variants.

Behavior:

  • Uses max_output_tokens
  • instructions field for the system prompt (not a system role message)
  • input array mixes message items with top-level function_call / function_call_output items
  • tools are flat (no nested function: {} wrapper)
  • reasoning: { effort: "..." } instead of top-level reasoning_effort
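
The differences listed above can be made concrete with a sketch of a Responses API request body. The field names follow the list; the helper names (`flattenTool`, `buildResponsesRequest`) and the option names are hypothetical and only serve this example.

```typescript
// Hypothetical sketch of building a Responses API request body.
interface NestedTool {
  type: "function";
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

interface FlatTool {
  type: "function";
  name: string;
  description?: string;
  parameters: Record<string, unknown>;
}

// Input mixes plain messages with top-level function call items.
type InputItem =
  | { role: "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

// chat.completions nests the definition under "function"; Responses wants it flat.
function flattenTool(tool: NestedTool): FlatTool {
  return { type: "function", ...tool.function };
}

function buildResponsesRequest(opts: {
  model: string;
  systemPrompt: string;
  input: InputItem[];
  tools: NestedTool[];
  maxOutputTokens: number;
  effort?: "minimal" | "low" | "medium" | "high";
}): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: opts.model,
    instructions: opts.systemPrompt, // system prompt, not a system role message
    input: opts.input,
    max_output_tokens: opts.maxOutputTokens,
  };
  if (opts.tools.length > 0) body["tools"] = opts.tools.map(flattenTool);
  if (opts.effort !== undefined) body["reasoning"] = { effort: opts.effort };
  return body;
}
```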

You don't see this difference: internally the agent works with Anthropic-shaped messages and translates them to the right OpenAI shape for each model.
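
To give a feel for that translation, here is a simplified sketch that converts Anthropic-style content blocks (`text`, `tool_use`, `tool_result`) into Responses API input items. The type and function names are hypothetical, tool results are assumed to be plain strings, and the agent's real translation layer is likely more involved.

```typescript
// Hypothetical sketch of an Anthropic-shape to Responses-shape translation.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface AnthropicMessage {
  role: "user" | "assistant";
  content: AnthropicBlock[];
}

type ResponsesItem =
  | { role: "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

function toResponsesInput(messages: AnthropicMessage[]): ResponsesItem[] {
  const items: ResponsesItem[] = [];
  for (const msg of messages) {
    for (const block of msg.content) {
      if (block.type === "text") {
        items.push({ role: msg.role, content: block.text });
      } else if (block.type === "tool_use") {
        // Tool calls become top-level items; arguments are a JSON string.
        items.push({
          type: "function_call",
          call_id: block.id,
          name: block.name,
          arguments: JSON.stringify(block.input),
        });
      } else {
        // Tool results are matched to their call by call_id.
        items.push({
          type: "function_call_output",
          call_id: block.tool_use_id,
          output: block.content,
        });
      }
    }
  }
  return items;
}
```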

Latency observed (real test on actual API)

gpt-5.4                    ~600ms  (chat-tuned, very fast)
gpt-5-mini                 ~850ms
gpt-5                     ~1200ms  (Opus 4.6-class latency)
gpt-5.5-pro               ~5700ms  (reasoning model, default effort)
gpt-5-pro                ~14700ms  (reasoning model, max effort by default)
o1-pro                    ~8900ms

For interactive use, pick gpt-5 or gpt-5.4. For batch or scheduled apps, any of them works; pick by cost.

Context windows

| Model family | Context |
|---|---|
| gpt-4o, gpt-4-turbo, gpt-4.1 | 128k |
| o1, o3 | 200k |
| gpt-5 family | 128k–400k depending on variant |
| Older gpt-4 | 8k |
| Older gpt-3.5 | 16k |

InsightWorker auto-detects via src/providers/limits.ts:modelContextTokens() and caps tool output at ~10% of the window.
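
The lookup and the 10% cap can be sketched as follows. This is a hypothetical re-implementation driven only by the table above: the regex matching, the use of the 128k lower bound for the whole gpt-5 family, the fallback for unknown models, and the helper `maxToolOutputTokens` are assumptions, and the real `modelContextTokens()` in src/providers/limits.ts may differ.

```typescript
// Hypothetical sketch; not the actual contents of src/providers/limits.ts.
function modelContextTokens(model: string): number {
  // Assumption: use the lower bound of the 128k to 400k range for every gpt-5 variant.
  if (/^gpt-5/.test(model)) return 128_000;
  if (/^o[13](-|$)/.test(model)) return 200_000; // o1, o3
  if (/^gpt-4(o|-turbo|\.1)/.test(model)) return 128_000;
  if (/^gpt-4/.test(model)) return 8_000; // older gpt-4
  if (/^gpt-3\.5/.test(model)) return 16_000;
  return 8_000; // assumption: conservative fallback for unknown models
}

// Tool output is capped at roughly 10% of the context window.
function maxToolOutputTokens(model: string): number {
  return Math.floor(modelContextTokens(model) / 10);
}

console.log(maxToolOutputTokens("gpt-4o"), maxToolOutputTokens("o3"));
```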

Where data goes

With OpenAI's standard API, prompts go to OpenAI's cloud and are retained per OpenAI's data policy (currently 30 days for abuse monitoring; API data is not used for training unless you opt in).

For regulated environments where prompts can't leave your tenant, use Azure OpenAI instead — same OpenAI models, hosted in your Azure tenant.

Common gotchas

| Symptom | Cause | Fix |
|---|---|---|
| 404: model does not exist | Wrong model name | Verify with curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" |
| 400: max_tokens not supported, use max_completion_tokens | Old InsightWorker version | Upgrade to the latest CLI |
| 404: only supported in v1/responses | Old InsightWorker version | Upgrade to the latest CLI for the Responses API path |
| Empty content returned by gpt-5 base | maxTokens too low: reasoning ate the budget | The default maxTokens: 16384 in agent.ts is plenty; if you've reduced it, raise it back |
| Slow gpt-5-pro calls | This is normal | Reasoning models burn tokens internally; 5-15s per call is expected. Switch to gpt-5 for interactive use |

Source: docs/providers/openai.md in the public repo. Open a PR with corrections.