Ask My Data — Ask Databricks insurance questions in plain English


Data Warehouse Optimization · May 27, 2026 · by Hitesh Talesra · 11 min read

Teams that live in insurance data usually have the same problem: business users can’t write SQL, and analysts don’t have time for constant ad-hoc query requests.

Ask My Data bridges that gap. It connects to Databricks, discovers the available tables/columns, translates your question into valid Databricks SQL, executes the query, and returns results with a clear natural-language answer. When your question is about files, it can also retrieve content from Databricks Volumes (PDFs/CSVs) and extract what matters.

The problem

SQL is the bottleneck: business users need analytics answers, but most don’t have SQL expertise, so data teams get pulled into simple queries.
Tribal knowledge: writing correct SQL requires knowing table names, schemas, and how to join claims ↔ policies ↔ customers in your domain model.
Ad-hoc work has no queue: the “quick question” becomes a time sink, especially during busy release cycles.
Risk of incorrect results: manual SQL attempts can produce wrong numbers, which is worse than slow work because it can lead to wrong decisions.

The workflow: from question to answer

Ask My Data runs a tight, repeatable sequence for every question:

Discover schema & volumes: reads available tables/columns in your configured catalog/schema and lists files in volumes. (Databricks API · schema introspection)

Generate SQL or a file retrieval plan: interprets the question, maps it to the right tables, and produces safe, read-only SQL (or a plan to fetch the right file). (natural language interpretation · SQL generation)

Execute / retrieve: runs the query against Databricks, or downloads the file via REST API and extracts its content. (databricks-sql-connector · query execution)

Format results: shows the generated SQL for transparency, presents data in a clean table view, and generates a short answer summary. (result formatting · summary generation)
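
The execute and format steps can be sketched in a few lines of Python. This is an illustration, not the product's implementation: `run_query` and `format_table` are hypothetical helper names, and the code assumes only a DB-API 2.0 style connection. The databricks-sql-connector exposes a connection of that style through `databricks.sql.connect(server_hostname=..., http_path=..., access_token=...)`; the demo at the bottom uses an in-memory SQLite database so it runs anywhere.

```python
import sqlite3
from typing import Any, Sequence


def run_query(connection: Any, sql: str) -> tuple[list[str], list[Sequence[Any]]]:
    """Execute `sql` on a DB-API 2.0 connection and return (column names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # cursor.description holds one tuple per column; item 0 is the name.
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


def format_table(columns: list[str], rows: list[Sequence[Any]]) -> str:
    """Render rows as a fixed-width text table followed by a row count."""
    cells = [[str(value) for value in row] for row in rows]
    widths = [
        max([len(col)] + [len(row[i]) for row in cells])
        for i, col in enumerate(columns)
    ]

    def line(values: Sequence[str]) -> str:
        return " | ".join(v.ljust(w) for v, w in zip(values, widths))

    separator = "-+-".join("-" * w for w in widths)
    body = [line(row) for row in cells]
    return "\n".join([line(columns), separator, *body, f"({len(rows)} rows)"])


if __name__ == "__main__":
    # Stand-in for a Databricks connection: a tiny in-memory claims table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (policy_type TEXT)")
    conn.executemany(
        "INSERT INTO claims VALUES (?)", [("auto",), ("auto",), ("home",)]
    )
    columns, rows = run_query(
        conn,
        "SELECT policy_type, COUNT(*) AS claim_count FROM claims "
        "GROUP BY policy_type ORDER BY claim_count DESC LIMIT 50",
    )
    print(format_table(columns, rows))
```

Returning the row count alongside the table is a small design choice that mirrors the "row count & notable observations" output the agent reports.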

Example questions (copy/paste)

"How many claims are filed for each policy type?"

"Show all policies where coverage limits exceed $5 million"

"Which broker has the most claims in the last 90 days?"

"List open claims with payout greater than $50,000"

What the agent returns

Generated SQL (or retrieval plan) so users can audit logic.
Query Results as a readable table (or extracted file content).
Answer Summary in plain English with key takeaways.
Row count & notable observations to avoid “looks right” ambiguity.

Transparency by design

Ask My Data doesn’t treat SQL as a black box. It displays the SQL it generated so users understand what was run (and analysts can quickly spot when a question mapped to the wrong table relationship).

SELECT policy_type, COUNT(*) AS claim_count
FROM <catalog>.<schema>.claims
GROUP BY policy_type
ORDER BY claim_count DESC
LIMIT 50;

Key capabilities (and limits)

Read-only access: only SELECT queries are generated (no writes/DDL).
Auto-discovery: tables and columns come from the live Databricks schema.
Smart JOIN mapping: maps relationships across the claims, policies, customers, reinsurance, and submissions tables.
Error handling: connection and query failures are handled gracefully, with optional retry.
Query limits: LIMIT clauses prevent runaway queries.
Volume file support: can retrieve and extract PDFs/CSVs from Databricks Volumes.

Out of scope (v1): multi-turn conversational context, chart/graph generation, cross-schema joins, and write operations (INSERT/UPDATE/DELETE or DDL).
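
As an illustration of the "optional retry" behavior listed above, here is a minimal sketch of retrying a query call with exponential backoff. The function name `with_retry`, the default attempt count, the delays, and the choice of which exceptions count as transient are all assumptions made for this example; the post does not describe the product's actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff.

    Only exceptions listed in `retry_on` are retried; anything else (for
    example an error caused by invalid SQL) propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

A caller would wrap the query step, for example `with_retry(lambda: execute(sql))`, where `execute` stands for whatever function runs the statement. Retrying only connection-type errors keeps a malformed query from being re-run pointlessly.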

Tables vs volumes: two paths in one workflow

Before it generates an answer, the workflow discovers what’s available in your Databricks catalog/schema:

Path A (table data): translate your question into a read-only Databricks SQL query using the correct fully-qualified table names and joins.
Path B (volume files): identify the file you’re asking for, download it via Databricks REST APIs, and extract content (PDF text/sections, CSV rows/summary, or text content).

This matters because “Ask My Data” isn’t just “chat + SQL”. It decides whether you want rows from the claims, customers, or policy tables or content from volume files (PDFs/CSVs), and then uses the right execution strategy.
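
To make the two-path decision concrete, here is a deliberately simple routing sketch. It is a heuristic written for illustration, not how Ask My Data actually decides: `Plan`, `choose_path`, and `files_api_url` are hypothetical names, the file-name matching rule is an assumption, and the `/api/2.0/fs/files` download path reflects our understanding of the Databricks Files API and should be checked against the current API reference.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Assumption: a question that names something ending in .pdf/.csv/.txt is about a file.
FILE_HINT = re.compile(r"\.(pdf|csv|txt)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Plan:
    path: str                          # "table" or "volume"
    volume_file: Optional[str] = None  # full volume path when one was matched


def choose_path(question: str, volume_files: list[str]) -> Plan:
    """Pick the volume path if the question names a known file, else the table path."""
    lowered = question.lower()
    for file_path in volume_files:
        file_name = file_path.rsplit("/", 1)[-1].lower()
        if file_name in lowered:
            return Plan("volume", file_path)
    if FILE_HINT.search(question):
        # Looks like a file question, but no discovered file matched:
        # a candidate for asking the user to clarify.
        return Plan("volume", None)
    return Plan("table")


def files_api_url(host: str, volume_file: str) -> str:
    """Build a download URL for a volume file (endpoint path is an assumption)."""
    return f"https://{host}/api/2.0/fs/files{quote(volume_file)}"
```

For example, with `/Volumes/verticalserve/insurance/docs/claims_2024.csv` in the discovered file list (the volume name `docs` and the file name are made up), the question "Summarize claims_2024.csv" routes to the volume path, while "How many claims are filed for each policy type?" routes to the table path.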

SQL safety and transparency

To keep results trustworthy, the workflow is designed to be auditable:

Generated SQL is shown: you can verify the logic before trusting the numbers.
Read-only constraint: only SELECT queries are allowed.
Row limiting: sensible LIMIT defaults help prevent huge scans.
Clear escalation: if the question is ambiguous or requires data outside the configured catalog/schema, the workflow escalates to a human for clarification.
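
The read-only and row-limiting rules above could be enforced with a pre-execution check along these lines. This is a rough sketch under stated assumptions: `is_read_only` and `ensure_limit` are hypothetical names, the keyword list is illustrative rather than exhaustive, and a regex check is not a SQL parser (it does not handle comments, for instance). In practice it would sit alongside read-only permissions on the Databricks side rather than replace them.

```python
import re

# Statement keywords that should never appear in generated SQL (illustrative list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|create|alter|drop|truncate|grant|revoke)\b",
    re.IGNORECASE,
)
# Single-quoted string literals, with '' as the escaped quote.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    # Blank out string literals so text like 'please update' is not flagged.
    stripped = STRING_LITERAL.sub("''", sql).strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # more than one statement
    if not re.match(r"(?i)(select|with)\b", stripped):
        return False
    return FORBIDDEN.search(stripped) is None


def ensure_limit(sql: str, default_limit: int = 50) -> str:
    """Append a LIMIT clause unless the statement already ends with one."""
    body = sql.strip().rstrip(";").rstrip()
    if re.search(r"(?i)\blimit\s+\d+\s*$", body):
        return body + ";"
    return f"{body} LIMIT {default_limit};"
```

The default of 50 simply matches the `LIMIT 50` in the example query earlier in this post. Erring on the side of rejection is intentional: a false "not read-only" costs a regenerated query, while a false "read-only" could cost data.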

Where it fits

Self-serve analytics: business users ask questions without needing SQL skills.
Faster ad-hoc answers: reduce turnaround time for “can you query this for me?” requests.
Transparency for analysts: the workflow shows the generated SQL so logic can be audited.
File Q&A: retrieve and extract relevant content from PDFs/CSVs stored in Databricks Volumes.

Starting prompt

user_question: "How many claims are filed for each policy type?"
catalog: verticalserve
schema: insurance
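
The catalog and schema values in the starting prompt are what scope schema discovery and table naming. As a hedged sketch of how they might be used, the helpers below build a discovery query against `information_schema.columns` and a fully-qualified table name. The helper names are hypothetical, the code assumes the catalog exposes an `information_schema` (as Unity Catalog does), and it is not taken from the product.

```python
import re

# Accept only plain identifiers so prompt values cannot inject SQL.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked(name: str) -> str:
    if not IDENTIFIER.match(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def schema_discovery_sql(catalog: str, schema: str) -> str:
    """Build a query listing every table and column in the configured schema."""
    catalog, schema = _checked(catalog), _checked(schema)
    return (
        "SELECT table_name, column_name, data_type "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE table_schema = '{schema}' "
        "ORDER BY table_name, ordinal_position"
    )


def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Return the three-part name used in generated SQL."""
    return ".".join(_checked(part) for part in (catalog, schema, table))


if __name__ == "__main__":
    print(schema_discovery_sql("verticalserve", "insurance"))
    print(qualified_table("verticalserve", "insurance", "claims"))
```

Validating the identifiers before interpolating them is a cautious choice: these values come from user-supplied configuration, and identifiers cannot be passed as ordinary bound parameters.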

For the step-by-step screenshots (question input → generated SQL → results → answer summary), see the full use-case walkthrough on the Ask My Data use case page.

Try Ask My Data

Run the workflow in InsightStudio and ask anything about your Databricks insurance data.
