
Ask My Data — Ask Databricks insurance questions in plain English

Data Warehouse Optimization · May 27, 2026 · by Hitesh Talesra · 11 min read

Teams that live in insurance data usually have the same problem: business users can’t write SQL, and analysts don’t have time for constant ad-hoc query requests.

Ask My Data bridges that gap. It connects to Databricks, discovers the available tables/columns, translates your question into valid Databricks SQL, executes the query, and returns results with a clear natural-language answer. When your question is about files, it can also retrieve content from Databricks Volumes (PDFs/CSVs) and extract what matters.

The problem

  • SQL is the bottleneck: business users need analytics answers, but most don’t have SQL expertise, so data teams get pulled into simple queries.
  • Tribal knowledge: writing correct SQL requires knowing table names, schemas, and how to join claims ↔ policies ↔ customers in your domain model.
  • Ad-hoc work has no queue: the “quick question” becomes a time sink, especially during busy release cycles.
  • Risk of incorrect results: manual SQL attempts can produce wrong numbers, which is worse than slow work because it can lead to wrong decisions.

The workflow: from question to answer

Ask My Data runs a tight, repeatable sequence for every question:

1. Discover schema & volumes: reads the available tables/columns in your configured catalog/schema and lists files in volumes. (Databricks API · schema introspection)
2. Generate SQL or a file retrieval plan: interprets the question, maps it to the right tables, and produces safe, read-only SQL (or a plan to fetch the right file). (natural language interpretation · SQL generation)
3. Execute / retrieve: runs the query against Databricks, or downloads the file via REST API and extracts its content. (databricks-sql-connector · query execution)
4. Format results: shows the generated SQL for transparency, presents data in a clean table view, and generates a short answer summary. (result formatting · summary generation)
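
The four steps above can be sketched as a small orchestration function. This is only an illustration of the sequence, not the product's actual code: the `Answer` type and the four callables (`discover`, `plan`, `execute`, `summarize`) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Answer:
    plan: str        # generated SQL or file retrieval plan, kept for auditing
    rows: List[Any]  # query rows or extracted file content
    summary: str     # short plain-English answer


def answer_question(
    question: str,
    discover: Callable[[], Dict],          # hypothetical: returns tables/columns and volume files
    plan: Callable[[str, Dict], str],      # hypothetical: question + schema -> SQL or retrieval plan
    execute: Callable[[str], List],        # hypothetical: runs the plan and returns rows/content
    summarize: Callable[[str, List], str], # hypothetical: writes the answer summary
) -> Answer:
    schema_info = discover()                 # step 1: discover schema & volumes
    generated = plan(question, schema_info)  # step 2: generate SQL or a file plan
    rows = execute(generated)                # step 3: execute / retrieve
    summary = summarize(question, rows)      # step 4: format results
    return Answer(plan=generated, rows=rows, summary=summary)
```

Passing the steps in as callables keeps the sequence fixed while letting each step be swapped or stubbed out, which is convenient for testing.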

Example questions (copy/paste)

"How many claims are filed for each policy type?"
"Show all policies where coverage limits exceed $5 million"
"Which broker has the most claims in the last 90 days?"
"List open claims with payout greater than $50,000"

What the agent returns

  • Generated SQL (or retrieval plan) so users can audit logic.
  • Query Results as a readable table (or extracted file content).
  • Answer Summary in plain English with key takeaways.
  • Row count & notable observations to avoid “looks right” ambiguity.
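
As a rough sketch of the "readable table plus row count" idea, the function below renders column names and rows as aligned plain text. The function name and the exact layout are assumptions made for illustration. The real output is a table view in the product UI.

```python
def format_table(columns, rows):
    """Render rows as an aligned plain-text table followed by a row count."""
    # Convert everything to strings so column widths can be measured.
    cells = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    lines = [
        " | ".join(value.ljust(width) for value, width in zip(r, widths)).rstrip()
        for r in cells
    ]
    # Separator line between the header and the data rows.
    lines.insert(1, "-+-".join("-" * w for w in widths))
    # Stating the row count explicitly helps catch truncated or empty results.
    lines.append(f"Row count: {len(rows)}")
    return "\n".join(lines)
```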

Transparency by design

Ask My Data doesn’t treat SQL as a black box. It displays the SQL it generated so users understand what was run (and analysts can quickly spot when a question mapped to the wrong table relationship).

SELECT policy_type, COUNT(*) AS claim_count
FROM <catalog>.<schema>.claims
GROUP BY policy_type
ORDER BY claim_count DESC
LIMIT 50;
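
For readers curious how a query like this might be run, here is a minimal sketch. It assumes only a PEP 249 (Python DB-API) cursor, which is the interface databricks-sql-connector exposes. To keep the example runnable without a workspace, an in-memory `sqlite3` database stands in for Databricks, so the table name is not fully qualified and the inserted rows are made-up sample data.

```python
import sqlite3  # stand-in for a Databricks connection, for illustration only


def run_query(cursor, sql):
    """Execute SQL on a DB-API cursor and return (column_names, rows)."""
    cursor.execute(sql)
    # cursor.description holds one entry per result column; item 0 is the name.
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


# Made-up sample data, only to exercise the function.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE claims (policy_type TEXT)")
cur.executemany("INSERT INTO claims VALUES (?)", [("auto",), ("auto",), ("home",)])

columns, rows = run_query(
    cur,
    "SELECT policy_type, COUNT(*) AS claim_count FROM claims "
    "GROUP BY policy_type ORDER BY claim_count DESC LIMIT 50",
)
```

Against Databricks, the same `run_query` would receive a cursor from a connection opened with databricks-sql-connector, and the query would use the fully qualified `<catalog>.<schema>.claims` name.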

Key capabilities (and limits)

  • Read-only access: only SELECT queries are generated (no writes/DDL).
  • Auto-discovery: tables and columns come from the live Databricks schema.
  • Smart JOIN mapping: relationships across claims, policies, customers, reinsurance, and submissions.
  • Error handling: connection and query failures are handled gracefully, with an optional retry.
  • Query limits: LIMIT clauses prevent runaway queries.
  • Volume file support: can retrieve and extract PDFs/CSVs from Databricks Volumes.

Out of scope (v1): multi-turn conversational context, chart/graph generation, cross-schema joins, and write operations (INSERT/UPDATE/DELETE or DDL).
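
One way to enforce the read-only rule is a conservative check on the generated SQL before it is executed. The sketch below is an assumption about how such a guard could look, not a description of the product's implementation. The `FORBIDDEN` keyword set and the `is_read_only` name are illustrative.

```python
import re

# Illustrative list of statement keywords that should never appear.
FORBIDDEN = {
    "insert", "update", "delete", "merge", "drop", "alter",
    "create", "truncate", "grant", "revoke",
}


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT) statement."""
    # Remove line comments and block comments before inspecting the text.
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.S)
    statements = [s.strip() for s in stripped.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # reject empty input and stacked statements
    statement = statements[0].lower()
    if not re.match(r"(select|with)\b", statement):
        return False
    # Whole-word match, so a column such as update_date is not flagged.
    words = set(re.findall(r"[a-z_]+", statement))
    return not (words & FORBIDDEN)
```

The check errs on the side of rejecting: a forbidden keyword or a semicolon inside a string literal also causes a refusal. In practice a guard like this would complement, not replace, read-only permissions on the Databricks credentials themselves.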

Tables vs volumes: two paths in one workflow

Before it generates an answer, the workflow discovers what’s available in your Databricks catalog/schema:

  • Path A (table data): translate your question into a read-only Databricks SQL query using the correct fully-qualified table names and joins.
  • Path B (volume files): identify the file you’re asking for, download it via Databricks REST APIs, and extract content (PDF text/sections, CSV rows/summary, or text content).

This matters because Ask My Data isn’t just “chat + SQL”. It decides whether you want rows from tables such as claims, customers, and policies, or content from files in volumes (PDFs/CSVs), and then uses the matching execution strategy.
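
To make the two paths concrete, here is a deliberately simple routing sketch. It is a toy heuristic, assumed for illustration: it picks the volume path only when the question mentions the name of a file found during discovery. The actual workflow interprets the question in natural language, and the `choose_path` name and `FILE_SUFFIXES` set are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# File types mentioned in this post: PDFs, CSVs, and text content.
FILE_SUFFIXES = {".pdf", ".csv", ".txt"}


def choose_path(question: str, volume_files: List[str]) -> Tuple[str, Optional[str]]:
    """Return ("volume", file_path) if the question names a known file, else ("table", None)."""
    text = question.lower()
    for path in volume_files:
        name = PurePosixPath(path).name.lower()
        if PurePosixPath(name).suffix in FILE_SUFFIXES and name in text:
            return "volume", path  # Path B: retrieve and extract the file
    return "table", None           # Path A: generate read-only SQL
```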

SQL safety and transparency

To keep results trustworthy, the workflow is designed to be auditable:

  • Generated SQL is shown: you can verify the logic before trusting the numbers.
  • Read-only constraint: only SELECT queries are allowed.
  • Row limiting: sensible LIMIT defaults help prevent huge scans.
  • Clear escalation: if the question is ambiguous or requires data outside the configured catalog/schema, the workflow escalates to a human for clarification.
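
Row limiting can be illustrated with a small helper that appends a default LIMIT when the generated SQL has none. This is a sketch under assumptions: the default of 50 is borrowed from the example query earlier in this post, and the `apply_default_limit` name is hypothetical.

```python
import re


def apply_default_limit(sql: str, default_limit: int = 50) -> str:
    """Append a LIMIT clause when the statement does not already end with one."""
    statement = sql.strip().rstrip(";").rstrip()
    # Only a trailing "LIMIT <n>" is recognized; anything else gets the default.
    if re.search(r"\blimit\s+\d+\s*$", statement, flags=re.I):
        return statement + ";"
    return f"{statement} LIMIT {default_limit};"
```

The pattern check is intentionally narrow. A query that ends with an OFFSET clause, or that uses LIMIT only inside a subquery, would need real SQL parsing to handle correctly.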

Where it fits

  • Self-serve analytics: business users ask questions without needing SQL skills.
  • Faster ad-hoc answers: reduce turnaround time for “can you query this for me?” requests.
  • Transparency for analysts: the workflow shows the generated SQL so logic can be audited.
  • File Q&A: retrieve and extract relevant content from PDFs/CSVs stored in Databricks Volumes.

Starting prompt

user_question: "How many claims are filed for each policy type?"
catalog: verticalserve
schema: insurance
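
The `catalog` and `schema` values in the prompt are what scope schema discovery. The post does not say how discovery is implemented. As one plausible sketch, assuming a Unity Catalog workspace where `<catalog>.information_schema.columns` is available, the helpers below build a discovery query and group its rows by table. Both function names are hypothetical.

```python
import re

# Accept only plain identifiers, since the values are interpolated into SQL.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def discovery_query(catalog: str, schema: str) -> str:
    """Build a query listing every column of every table in one schema."""
    for name in (catalog, schema):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        "SELECT table_name, column_name, data_type "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE table_schema = '{schema}' "
        "ORDER BY table_name, ordinal_position"
    )


def group_columns(rows):
    """Group (table_name, column_name, data_type) rows into {table: [(column, type), ...]}."""
    tables = {}
    for table_name, column_name, data_type in rows:
        tables.setdefault(table_name, []).append((column_name, data_type))
    return tables
```

With the prompt above, `discovery_query("verticalserve", "insurance")` produces a query scoped to that one catalog and schema, which matches the v1 limit of no cross-schema joins.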

For the step-by-step screenshots (question input → generated SQL → results → answer summary), see the full use-case walkthrough on the Ask My Data use case page.

Try Ask My Data

Run the workflow in InsightStudio and ask anything about your Databricks insurance data.
