SQL in Sheets

Use analyze_range and transform_range to run Postgres SQL against live sheet data. The definitive AX cookbook.

Note

Agent Experience first. This guide is written for AI agents using sheets-mcp, not just human developers. Every pattern here is optimised to minimise tool calls, protect the agent context window, and produce reliable, deterministic outcomes.

Why SQL instead of read → process → write?

The naive pattern for an AI agent operating on spreadsheet data is:

read_range(large range) → 50,000 cells enter context → LLM processes → write_range

This is the worst possible approach for agents:

Context bloat: 50k cells can consume the entire context window, leaving no room for reasoning
Token cost: Unnecessary token spend on raw data the LLM doesn't need to see
Hallucination risk: The LLM may produce incorrect aggregations on large numeric datasets

The correct pattern:

analyze_range(range, SQL) → only the aggregated result enters context → done

SQL runs in Postgres inside Vercel's infrastructure. The sheet data never touches the LLM context beyond the final summary.

The data model

analyze_range and transform_range both expose your sheet as a CTE named sheet_data with a single JSONB column data. Row 1 of your range is treated as the header row. Each subsequent row becomes one JSONB object with header keys.

Sheet:
| Name    | Revenue | Region |
|---------|---------|--------|
| Acme    | 12000   | South  |
| Globex  | 8500    | North  |

JSONB objects:
{ "Name": "Acme",   "Revenue": "12000", "Region": "South" }
{ "Name": "Globex", "Revenue": "8500",  "Region": "North" }

Warning: All values in data are strings — even numbers. Always cast before arithmetic: (data ->>'Revenue')::numeric

Step 0 — Always inspect headers first

Before writing any query, run this to see the exact header keys (case, spaces, special chars):

SELECT jsonb_object_keys(data) AS header
FROM sheet_data
LIMIT 50

This prevents null results from silent key mismatches like 'Revenue ' vs 'Revenue'.

Core patterns

Aggregate with GROUP BY

SELECT
  data ->>'Region' AS region,
  COUNT(*)::int AS deals,
  SUM((data ->>'Revenue')::numeric) AS total_revenue,
  ROUND(AVG((data ->>'Revenue')::numeric), 2) AS avg_deal
FROM sheet_data
WHERE data ->>'Status' = 'Closed Won'
GROUP BY region
ORDER BY total_revenue DESC

Filter rows

SELECT data ->>'Name', data ->>'Revenue', data ->>'Due Date'
FROM sheet_data
WHERE (data ->>'Revenue')::numeric > 10000
  AND data ->>'Status' != 'Cancelled'

Find duplicates

SELECT data ->>'Email' AS email, COUNT(*) AS occurrences
FROM sheet_data
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC

Running total (window function)

SELECT
  data ->>'Date' AS date,
  (data ->>'Amount')::numeric AS amount,
  SUM((data ->>'Amount')::numeric) OVER (ORDER BY data ->>'Date') AS running_total
FROM sheet_data

Conditional classification

SELECT
  data ->>'Name' AS name,
  (data ->>'Score')::numeric AS score,
  CASE
    WHEN (data ->>'Score')::numeric >= 90 THEN 'A'
    WHEN (data ->>'Score')::numeric >= 75 THEN 'B'
    WHEN (data ->>'Score')::numeric >= 60 THEN 'C'
    ELSE 'F'
  END AS grade
FROM sheet_data
ORDER BY score DESC

`analyze_range` — read-only

Use when you need insight without mutation. Results go to the agent only — the sheet is untouched.

{
  "tool": "analyze_range",
  "spreadsheetId": "1BxiMVs0XRA...",
  "range": "Sheet1!A:F",
  "query": "SELECT data ->>'Region' AS region, SUM((data ->>'Revenue')::numeric) AS total FROM sheet_data GROUP BY region ORDER BY total DESC"
}

Caching behaviour: analyze_range caches the raw sheet data in Supabase for 60 seconds. Repeated calls within this window skip the Sheets API fetch entirely, reducing latency and API quota usage.

`transform_range` — read → SQL → write back

Use when you need to mutate the sheet data in-place using SQL logic. The agent writes the transformation rule, not the transformed data.

{
  "tool": "transform_range",
  "spreadsheetId": "1BxiMVs0XRA...",
  "range": "Sheet1!A:D",
  "query": "SELECT data ->>'Date', INITCAP(TRIM(data ->>'Name')), (data ->>'Revenue')::numeric * 1.1, data ->>'Region' FROM sheet_data"
}

Critical: transform_range never uses cache. It always fetches fresh data from Google Sheets before executing SQL. This prevents stale data from being written back.

Cell limit: transform_range is capped at 50,000 cells per call. For larger datasets, narrow the range or use column subsets.

Pre-write snapshot: Before every write, transform_range saves the original range to write_history in Supabase. Use restore_snapshot to undo.

Safe workflow for transforms

1. analyze_range(range, query)          → Preview the SQL output
2. If output looks correct →
   transform_range(range, same_query)   → Write it back
3. If something went wrong →
   restore_snapshot(action: "list")     → Find the snapshot
   restore_snapshot(action: "restore", snapshotId: "...") → Undo

AX anti-patterns to avoid

✗ Anti-pattern	✓ Correct approach
`read_range` on 10k+ rows then summarise in LLM	`analyze_range` with SQL GROUP BY
Multiple sequential `analyze_range` calls for the same range	One query with multiple CTEs or subqueries
Using `transform_range` without previewing with `analyze_range` first	Always preview before mutating
Casting headers as numbers without inspecting them first	`SELECT jsonb_object_keys(data)` first
Passing raw LLM-generated strings directly into `values[][]`	Clean and validate in SQL, then write

Security constraints (automatic — agents cannot bypass)

Constraint	Effect
`BEGIN READ ONLY` transaction	Postgres blocks any mutation even if SQL contains `INSERT`/`UPDATE`
`;` stripped from query	Statement chaining blocked
`DROP`, `DELETE`, `UNION` rejected	Destructive keywords rejected before execution
`LIMIT 1000` auto-appended	Result sets capped to prevent context overflow
`SET LOCAL ROLE sheet_analyzer`	Query runs as a restricted Postgres role

SQL in Sheets

On this page