Google Vertex AI

Gemini models hosted in your GCP project, with IAM-governed access, region-pinned data, and VPC Service Controls (VPC-SC) support. This is the recommended Google path for enterprise and regulated workloads.

For auth setup, see authentication/google-cloud.md.

Configuration

~/.insightworker/.env:

LLM_PROVIDER=vertex
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_CLOUD_LOCATION=us-central1
VERTEX_MODEL=gemini-2.0-flash-001

# Optional — falls back to ADC if unset:
# GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
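
To catch a missing setting before the first request, the file above can be checked with a few lines of Python. This is a minimal sketch, not InsightWorker's own loader: `parse_env` and `missing_vertex_settings` are hypothetical helper names, and treating the four uncommented keys as required is an assumption based on the block above, where only the credentials line is marked optional.

```python
from pathlib import Path

# Assumption: these four keys are required; only the credentials path is optional.
REQUIRED_KEYS = (
    "LLM_PROVIDER",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "VERTEX_MODEL",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_vertex_settings(settings: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    env_file = Path.home() / ".insightworker" / ".env"
    if env_file.is_file():
        missing = missing_vertex_settings(parse_env(env_file.read_text()))
        print("missing settings:", missing or "none")
    else:
        print(f"{env_file} not found")
```

Commented-out lines are ignored, so a `# GOOGLE_APPLICATION_CREDENTIALS=...` line is reported as unset rather than as a value.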

Recommended models

Same Gemini family as AI Studio. Vertex doesn't add or remove models — it changes where they're hosted.

| Use case | Model |
| --- | --- |
| General apps | gemini-2.0-flash-001 |
| Long context, hard reasoning | gemini-2.5-pro (1M-token context) |
| Lowest cost | gemini-2.0-flash-lite |

Region availability

Some Gemini models are only available in a subset of regions. Common choices:

  • us-central1 (Iowa): broadest model availability
  • us-east5 (Columbus): newer region, carries only some models
  • europe-west4 (Netherlands): Europe
  • asia-northeast1 (Tokyo): Asia-Pacific

Check the Vertex AI locations documentation to see which models are generally available (GA) in each region. If you request a model in a region where it is not available, the call fails with a 404.
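
The region matters because it appears twice in the Vertex AI REST endpoint: once in the hostname and once in the resource path. The sketch below builds the `generateContent` URL for a project, region, and model so that a 404 can be traced to the exact model and region pair being requested. `generate_content_url` is a hypothetical helper, and whether InsightWorker calls this REST endpoint directly or goes through an SDK is not stated here, so read it as an illustration only.

```python
def generate_content_url(project: str, location: str, model: str) -> str:
    """Build the Vertex AI generateContent URL for a Google-published model."""
    # Regional endpoints are prefixed with the region name; the "global"
    # location uses the unprefixed hostname.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )


if __name__ == "__main__":
    # Values taken from the example configuration on this page.
    print(generate_content_url(
        "your-gcp-project-id", "us-central1", "gemini-2.0-flash-001"
    ))
```

Changing `GOOGLE_CLOUD_LOCATION` therefore changes both the host that receives the request and the location segment of the model path.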

VPC Service Controls

If your GCP project is inside a VPC-SC perimeter, ensure aiplatform.googleapis.com is permitted and the InsightWorker service account is in the access policy. See GCP VPC-SC docs.

Where data goes

With Vertex AI, prompts and responses stay in your GCP project, in the region you configured. Google's enterprise data terms apply (cloud.google.com/terms), and under those terms customer data is not used for training.

This is the Google path that regulated customers can accept.

Common gotchas

| Symptom | Cause | Fix |
| --- | --- | --- |
| Could not load default credentials | ADC has not been set up | Run gcloud auth application-default login, or set GOOGLE_APPLICATION_CREDENTIALS to a service-account JSON file |
| Permission denied | The principal lacks roles/aiplatform.user | Grant roles/aiplatform.user to the principal |
| Model not found in location | Model / region mismatch | Check the locations doc and switch to a region where the model is GA |
| Slow first call | Cold start | Subsequent calls are faster. For latency-sensitive apps, pre-warm, for example by sending a small request at startup |
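
For the first row, it helps to know which credential source would be picked up before any request is made. The sketch below mirrors the first two steps of the Application Default Credentials lookup: the `GOOGLE_APPLICATION_CREDENTIALS` variable, then the file written by `gcloud auth application-default login`. `describe_credential_source` is a hypothetical helper, and the well-known file path is the Linux/macOS default (Windows, or a custom `CLOUDSDK_CONFIG`, uses a different location).

```python
import os
from pathlib import Path


def describe_credential_source(env=None, home=None) -> str:
    """Report which local credential source ADC would likely find first."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Step 1: an explicit key file always takes precedence.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        if Path(key_path).is_file():
            return f"service-account key file: {key_path}"
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {key_path}"

    # Step 2: the file created by `gcloud auth application-default login`
    # (Linux/macOS default location; an assumption on other platforms).
    adc_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc_file.is_file():
        return f"user ADC file: {adc_file}"

    # Step 3 (not checked here): the metadata server on GCE, GKE, or Cloud Run.
    return "no local credentials found; ADC only works if a metadata server is available"


if __name__ == "__main__":
    print(describe_credential_source())
```

A "missing file" result usually means the path in `.env` is stale; a "no local credentials" result on a laptop means the gcloud login step has not been run.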

Source: docs/providers/vertex-ai.md in the public repo. Open a PR with corrections.