LLM Providers

InsightWorker is provider-agnostic. The agent's instructions and tool definitions are normalized into the Anthropic message shape, then translated to each provider's native format. You configure one (or auto-detect from whichever credentials are present).

Provider matrix

Provider	Selector	Best for	Notes
AWS Bedrock	`bedrock`	Enterprise — uses your AWS account, no separate key	Recommended default for regulated industries
Anthropic API	`anthropic`	Direct access to latest Claude	When you don't have AWS Bedrock
OpenAI	`openai`	gpt-4o, o1 reasoning models	Tool calling works well; 128k context cap
Azure OpenAI	`azure`	Enterprise — prompts stay in your Azure tenant	Same OpenAI shape, routed through your endpoint
Google Gemini (AI Studio)	`gemini`	Long context (up to 2M tokens), cheapest per-token	Stricter function-call schema
Google Vertex AI	`vertex`	Enterprise — same Gemini models on GCP, IAM-governed, region-pinned	ADC or service-account auth
Custom (OpenAI-compatible)	`custom`	vLLM, Ollama, LM Studio, in-house GPU	Whatever endpoint speaks OpenAI's chat-completions API

Auto-detection rules

When LLM_PROVIDER isn't set, InsightWorker picks the first available:

vertex — if GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are set
gemini — if GEMINI_API_KEY is set
azure — if AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set
custom — if CUSTOM_LLM_BASE_URL is set
openai — if OPENAI_API_KEY is set
anthropic — if ANTHROPIC_API_KEY is set
bedrock — fallback (assumes IAM role / ~/.aws/credentials)

Set LLM_PROVIDER explicitly when you have multiple credentials configured and want deterministic selection.

Per-model context windows

The agent caps tool output to ~10% of the active model's context window so multi-turn loops don't blow past the limit.

Model family	Context	Tool output cap
`gpt-4o`, `gpt-4-turbo`, `gpt-4.1`	128k	~50k chars
`o1`, `o3` (OpenAI reasoning)	200k	~80k chars
`claude-sonnet-4-5/6`, `claude-opus-4-1`, `claude-haiku-4-5`	200k	~80k chars
`claude-sonnet-4-5` (1M variant)	1M	~400k chars
`gemini-2.0-flash`	1M	~400k chars
`gemini-2.5-pro`, `gemini-1.5-pro`	2M	~800k chars
`gpt-4` (legacy)	8k	~3k chars
Unknown / custom	100k	~40k chars

Override with MAX_TOOL_OUTPUT_LENGTH if you want a hard cap regardless of model.

How the agent describes itself

When asked "what model are you using", the agent reports the configured backend, not the underlying model identity. Example:

User: which LLM are you using? Agent: I'm InsightWorker. Under the hood, I'm currently configured to use OpenAI with gpt-4o, but you can swap providers in your settings.

If the model leaks its pretrained identity ("I'm Claude"), update to the latest CLI — the system prompt explicitly overrides this.

Recommended models per provider

AWS Bedrock (default)

Sonnet 4.5: us.anthropic.claude-sonnet-4-5-20250929-v1:0 — general apps, tool use
Opus 4.1: us.anthropic.claude-opus-4-1-20250805-v1:0 — hard reasoning, complex translations
Haiku 4.5: us.anthropic.claude-haiku-4-5-20251001-v1:0 — bulk classification, cheap calls

Anthropic direct

claude-sonnet-4-5 — general
claude-sonnet-4-5 — complex reasoning, 1M context
claude-haiku-4-5 — cheap

OpenAI

gpt-4o — general, tool use
gpt-4o-mini — cheap
o1-preview / o1-mini — hard reasoning

Gemini

gemini-2.0-flash-001 — general
gemini-2.5-pro — long context, hard reasoning
gemini-2.0-flash-lite — cheap classification

Switching providers

You can switch providers any time without re-installing — just update ~/.insightworker/.env and restart the agent. Conversation history isn't portable across providers (different message shapes), so the next session starts fresh.

In the Desktop app: Settings → LLM Provider tab.

Privacy / data residency

Provider	Data path
Bedrock	Your AWS account, your region. Anthropic doesn't see your prompts.
Vertex	Your GCP project, your region. Google's enterprise data terms apply.
Azure	Your Azure tenant, your region. Microsoft's enterprise data terms apply.
Custom (self-hosted)	Whatever endpoint you point at.
Anthropic / OpenAI / Gemini direct APIs	Vendor's cloud. Read their data-use terms.

For regulated workloads (insurance, healthcare, finance) the recommended providers are Bedrock, Vertex, Azure, or custom self-hosted.

Source: docs/providers/overview.md in the public repo. Open a PR with corrections.