Google Vertex AI

Gemini models hosted in your GCP project, with IAM-governed access, region-pinned data, and VPC-SC support. The recommended Google path for enterprise / regulated workloads.

For auth setup, see authentication/google-cloud.md.

Configuration

~/.insightworker/.env:

LLM_PROVIDER=vertex
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_CLOUD_LOCATION=us-central1
VERTEX_MODEL=gemini-2.0-flash-001

# Optional — falls back to ADC if unset:
# GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
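
As a rough illustration of how these settings fit together, the sketch below parses a .env-style file and checks that the Vertex keys are present. The names `parse_env_text`, `load_vertex_settings`, and `VertexSettings` are hypothetical and are not InsightWorker's actual loader. Treating all three Vertex keys as mandatory is also an assumption made here.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Assumption: these three keys are all required when LLM_PROVIDER=vertex.
REQUIRED_KEYS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "VERTEX_MODEL")


@dataclass(frozen=True)
class VertexSettings:
    project: str
    location: str
    model: str
    credentials_path: Optional[str]  # None means "fall back to ADC"


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_vertex_settings(env: Mapping[str, str]) -> VertexSettings:
    """Validate the Vertex-related settings and return them as one object."""
    if env.get("LLM_PROVIDER") != "vertex":
        raise ValueError("LLM_PROVIDER must be 'vertex' for this configuration")
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return VertexSettings(
        project=env["GOOGLE_CLOUD_PROJECT"],
        location=env["GOOGLE_CLOUD_LOCATION"],
        model=env["VERTEX_MODEL"],
        # An unset or empty value means no key file was given, so ADC is used.
        credentials_path=env.get("GOOGLE_APPLICATION_CREDENTIALS") or None,
    )
```

With the example file above, `load_vertex_settings(parse_env_text(text))` would return a `VertexSettings` whose `credentials_path` is `None`, because the credentials line is commented out.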

Recommended models

Same Gemini family as AI Studio. Vertex doesn't add or remove models — it changes where they're hosted.

Use case	Model
General apps	`gemini-2.0-flash-001`
Long context, hard reasoning	`gemini-2.5-pro` (1M-token context window)
Cheap	`gemini-2.0-flash-lite`

Region availability

Some Gemini models are only available in a subset of regions. Common choices:

us-central1 — broadest availability
us-east5 — newer, some models
europe-west4 — Europe
asia-northeast1 — Tokyo

Check the Vertex AI locations doc to see which models are GA in which regions. If you request a model in a region where it isn't available, the API returns a 404.
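
To see why a model and region mismatch surfaces as a 404, it helps to look at how the region enters the request. The sketch below builds the regional Vertex AI REST URL for a `generateContent` call. The region appears in both the hostname and the resource path, so the lookup is scoped to that location. `vertex_generate_url` is a hypothetical helper for illustration, and it covers regional endpoints only.

```python
from urllib.parse import quote


def vertex_generate_url(project: str, location: str, model: str) -> str:
    """Build the regional Vertex AI generateContent URL for a Google model."""
    # The region selects the regional API host ...
    host = f"{location}-aiplatform.googleapis.com"
    # ... and also scopes the model resource inside the project.
    path = (
        f"/v1/projects/{quote(project, safe='')}"
        f"/locations/{quote(location, safe='')}"
        f"/publishers/google/models/{quote(model, safe='')}:generateContent"
    )
    return f"https://{host}{path}"


if __name__ == "__main__":
    print(vertex_generate_url("your-gcp-project-id", "us-central1", "gemini-2.0-flash-001"))
```

Changing `GOOGLE_CLOUD_LOCATION` therefore changes the endpoint itself, which is why switching region is the fix when a model is not served where you asked for it.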

VPC Service Controls

If your GCP project is inside a VPC-SC perimeter, ensure that aiplatform.googleapis.com is permitted and that the InsightWorker service account is included in the access policy. See the GCP VPC-SC docs.

Where data goes

With Vertex AI, prompts and responses stay in your GCP project, in the region you configured. Google's enterprise data terms apply (cloud.google.com/terms), and customer data is not used for training.

This is the Google path that regulated customers can accept.

Common gotchas

Symptom	Cause	Fix
`Could not load default credentials`	ADC not configured	Run `gcloud auth application-default login`, or set `GOOGLE_APPLICATION_CREDENTIALS` to a service-account JSON
`Permission denied`	Principal lacks `roles/aiplatform.user`	Grant `roles/aiplatform.user` to the principal
`Model not found in location`	Model / region mismatch	Check the locations doc; switch to a region where the model is GA
Slow first call	Cold start	Subsequent calls are faster. For latency-sensitive apps, send a warm-up request at startup
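
The first three rows of the table can be turned into a small lookup that attaches a suggested fix to a raw error message. This is only a sketch: `HINTS` and `hint_for_error` are hypothetical names, the matching is plain case-insensitive substring search, and real error strings from the client library may be worded differently.

```python
from typing import Optional, Tuple

# (substring to look for, suggested fix), taken from the gotchas table.
HINTS: Tuple[Tuple[str, str], ...] = (
    (
        "could not load default credentials",
        "Run `gcloud auth application-default login`, or set "
        "GOOGLE_APPLICATION_CREDENTIALS to a service-account JSON.",
    ),
    (
        "permission denied",
        "Grant roles/aiplatform.user to the principal.",
    ),
    (
        "model not found",
        "Check the locations doc; switch to a region where the model is GA.",
    ),
)


def hint_for_error(message: str) -> Optional[str]:
    """Return a suggested fix for a known symptom, or None if unrecognized."""
    lowered = message.lower()
    for needle, hint in HINTS:
        if needle in lowered:
            return hint
    return None
```

A caller could log the hint alongside the original exception, and fall back to the raw message whenever `hint_for_error` returns `None`.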