InsightWorker Logo
  • contact@verticalserve.com
Docs / LLM providers / Provider overview

LLM Providers

InsightWorker is provider-agnostic. The agent's instructions and tool definitions are normalized into the Anthropic message shape, then translated to each provider's native format. You configure one (or auto-detect from whichever credentials are present).

Provider matrix

ProviderSelectorBest forNotes
AWS BedrockbedrockEnterprise — uses your AWS account, no separate keyRecommended default for regulated industries
Anthropic APIanthropicDirect access to latest ClaudeWhen you don't have AWS Bedrock
OpenAIopenaigpt-4o, o1 reasoning modelsTool calling works well; 128k context cap
Azure OpenAIazureEnterprise — prompts stay in your Azure tenantSame OpenAI shape, routed through your endpoint
Google Gemini (AI Studio)geminiLong context (up to 2M tokens), cheapest per-tokenStricter function-call schema
Google Vertex AIvertexEnterprise — same Gemini models on GCP, IAM-governed, region-pinnedADC or service-account auth
Custom (OpenAI-compatible)customvLLM, Ollama, LM Studio, in-house GPUWhatever endpoint speaks OpenAI's chat-completions API

Auto-detection rules

When LLM_PROVIDER isn't set, InsightWorker picks the first available:

  1. vertex — if GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are set
  2. gemini — if GEMINI_API_KEY is set
  3. azure — if AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set
  4. custom — if CUSTOM_LLM_BASE_URL is set
  5. openai — if OPENAI_API_KEY is set
  6. anthropic — if ANTHROPIC_API_KEY is set
  7. bedrock — fallback (assumes IAM role / ~/.aws/credentials)

Set LLM_PROVIDER explicitly when you have multiple credentials configured and want deterministic selection.

Per-model context windows

The agent caps tool output to ~10% of the active model's context window so multi-turn loops don't blow past the limit.

Model familyContextTool output cap
gpt-4o, gpt-4-turbo, gpt-4.1128k~50k chars
o1, o3 (OpenAI reasoning)200k~80k chars
claude-sonnet-4-5/6, claude-opus-4-1, claude-haiku-4-5200k~80k chars
claude-sonnet-4-5 (1M variant)1M~400k chars
gemini-2.0-flash1M~400k chars
gemini-2.5-pro, gemini-1.5-pro2M~800k chars
gpt-4 (legacy)8k~3k chars
Unknown / custom100k~40k chars

Override with MAX_TOOL_OUTPUT_LENGTH if you want a hard cap regardless of model.

How the agent describes itself

When asked "what model are you using", the agent reports the configured backend, not the underlying model identity. Example:

User: which LLM are you using? Agent: I'm InsightWorker. Under the hood, I'm currently configured to use OpenAI with gpt-4o, but you can swap providers in your settings.

If the model leaks its pretrained identity ("I'm Claude"), update to the latest CLI — the system prompt explicitly overrides this.

Recommended models per provider

AWS Bedrock (default)

  • Sonnet 4.5: us.anthropic.claude-sonnet-4-5-20250929-v1:0 — general apps, tool use
  • Opus 4.1: us.anthropic.claude-opus-4-1-20250805-v1:0 — hard reasoning, complex translations
  • Haiku 4.5: us.anthropic.claude-haiku-4-5-20251001-v1:0 — bulk classification, cheap calls

Anthropic direct

  • claude-sonnet-4-5 — general
  • claude-sonnet-4-5 — complex reasoning, 1M context
  • claude-haiku-4-5 — cheap

OpenAI

  • gpt-4o — general, tool use
  • gpt-4o-mini — cheap
  • o1-preview / o1-mini — hard reasoning

Gemini

  • gemini-2.0-flash-001 — general
  • gemini-2.5-pro — long context, hard reasoning
  • gemini-2.0-flash-lite — cheap classification

Switching providers

You can switch providers any time without re-installing — just update ~/.insightworker/.env and restart the agent. Conversation history isn't portable across providers (different message shapes), so the next session starts fresh.

In the Desktop app: Settings → LLM Provider tab.

Privacy / data residency

ProviderData path
BedrockYour AWS account, your region. Anthropic doesn't see your prompts.
VertexYour GCP project, your region. Google's enterprise data terms apply.
AzureYour Azure tenant, your region. Microsoft's enterprise data terms apply.
Custom (self-hosted)Whatever endpoint you point at.
Anthropic / OpenAI / Gemini direct APIsVendor's cloud. Read their data-use terms.

For regulated workloads (insurance, healthcare, finance) the recommended providers are Bedrock, Vertex, Azure, or custom self-hosted.


Source: docs/providers/overview.md in the public repo. Open a PR with corrections.