sheets-mcp docs
Guides

SQL in Sheets

Use analyze_range and transform_range to run Postgres SQL against live sheet data. The definitive AX cookbook.

Note

Agent Experience first. This guide is written for AI agents using sheets-mcp, not just human developers. Every pattern here is optimised to minimise tool calls, protect the agent context window, and produce reliable, deterministic outcomes.


Why SQL instead of read → process → write?

The naive pattern for an AI agent operating on spreadsheet data is:

read_range(large range) → 50,000 cells enter context → LLM processes → write_range

This is the worst possible approach for agents:

  • Context bloat: 50k cells can consume the entire context window, leaving no room for reasoning
  • Token cost: Unnecessary token spend on raw data the LLM doesn't need to see
  • Hallucination risk: The LLM may produce incorrect aggregations on large numeric datasets

The correct pattern:

analyze_range(range, SQL) → only the aggregated result enters context → done

SQL runs in Postgres inside Vercel's infrastructure. The sheet data never touches the LLM context beyond the final summary.


The data model

analyze_range and transform_range both expose your sheet as a CTE named sheet_data with a single JSONB column data. Row 1 of your range is treated as the header row. Each subsequent row becomes one JSONB object with header keys.

Sheet:
| Name    | Revenue | Region |
|---------|---------|--------|
| Acme    | 12000   | South  |
| Globex  | 8500    | North  |

JSONB objects:
{ "Name": "Acme",   "Revenue": "12000", "Region": "South" }
{ "Name": "Globex", "Revenue": "8500",  "Region": "North" }

Warning: All values in data are strings — even numbers. Always cast before arithmetic: (data ->>'Revenue')::numeric


Step 0 — Always inspect headers first

Before writing any query, run this to see the exact header keys (case, spaces, special chars):

SELECT jsonb_object_keys(data) AS header
FROM sheet_data
LIMIT 50

This prevents null results from silent key mismatches like 'Revenue ' vs 'Revenue'.


Core patterns

Aggregate with GROUP BY

SELECT
  data ->>'Region' AS region,
  COUNT(*)::int AS deals,
  SUM((data ->>'Revenue')::numeric) AS total_revenue,
  ROUND(AVG((data ->>'Revenue')::numeric), 2) AS avg_deal
FROM sheet_data
WHERE data ->>'Status' = 'Closed Won'
GROUP BY region
ORDER BY total_revenue DESC

Filter rows

SELECT data ->>'Name', data ->>'Revenue', data ->>'Due Date'
FROM sheet_data
WHERE (data ->>'Revenue')::numeric > 10000
  AND data ->>'Status' != 'Cancelled'

Find duplicates

SELECT data ->>'Email' AS email, COUNT(*) AS occurrences
FROM sheet_data
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC

Running total (window function)

SELECT
  data ->>'Date' AS date,
  (data ->>'Amount')::numeric AS amount,
  SUM((data ->>'Amount')::numeric) OVER (ORDER BY data ->>'Date') AS running_total
FROM sheet_data

Conditional classification

SELECT
  data ->>'Name' AS name,
  (data ->>'Score')::numeric AS score,
  CASE
    WHEN (data ->>'Score')::numeric >= 90 THEN 'A'
    WHEN (data ->>'Score')::numeric >= 75 THEN 'B'
    WHEN (data ->>'Score')::numeric >= 60 THEN 'C'
    ELSE 'F'
  END AS grade
FROM sheet_data
ORDER BY score DESC

analyze_range — read-only

Use when you need insight without mutation. Results go to the agent only — the sheet is untouched.

{
  "tool": "analyze_range",
  "spreadsheetId": "1BxiMVs0XRA...",
  "range": "Sheet1!A:F",
  "query": "SELECT data ->>'Region' AS region, SUM((data ->>'Revenue')::numeric) AS total FROM sheet_data GROUP BY region ORDER BY total DESC"
}

Caching behaviour: analyze_range caches the raw sheet data in Supabase for 60 seconds. Repeated calls within this window skip the Sheets API fetch entirely, reducing latency and API quota usage.


transform_range — read → SQL → write back

Use when you need to mutate the sheet data in-place using SQL logic. The agent writes the transformation rule, not the transformed data.

{
  "tool": "transform_range",
  "spreadsheetId": "1BxiMVs0XRA...",
  "range": "Sheet1!A:D",
  "query": "SELECT data ->>'Date', INITCAP(TRIM(data ->>'Name')), (data ->>'Revenue')::numeric * 1.1, data ->>'Region' FROM sheet_data"
}

Critical: transform_range never uses cache. It always fetches fresh data from Google Sheets before executing SQL. This prevents stale data from being written back.

Cell limit: transform_range is capped at 50,000 cells per call. For larger datasets, narrow the range or use column subsets.

Pre-write snapshot: Before every write, transform_range saves the original range to write_history in Supabase. Use restore_snapshot to undo.

Safe workflow for transforms

1. analyze_range(range, query)          → Preview the SQL output
2. If output looks correct →
   transform_range(range, same_query)   → Write it back
3. If something went wrong →
   restore_snapshot(action: "list")     → Find the snapshot
   restore_snapshot(action: "restore", snapshotId: "...") → Undo

AX anti-patterns to avoid

✗ Anti-pattern✓ Correct approach
read_range on 10k+ rows then summarise in LLManalyze_range with SQL GROUP BY
Multiple sequential analyze_range calls for the same rangeOne query with multiple CTEs or subqueries
Using transform_range without previewing with analyze_range firstAlways preview before mutating
Casting headers as numbers without inspecting them firstSELECT jsonb_object_keys(data) first
Passing raw LLM-generated strings directly into values[][]Clean and validate in SQL, then write

Security constraints (automatic — agents cannot bypass)

ConstraintEffect
BEGIN READ ONLY transactionPostgres blocks any mutation even if SQL contains INSERT/UPDATE
; stripped from queryStatement chaining blocked
DROP, DELETE, UNION rejectedDestructive keywords rejected before execution
LIMIT 1000 auto-appendedResult sets capped to prevent context overflow
SET LOCAL ROLE sheet_analyzerQuery runs as a restricted Postgres role

On this page