sql-sentinel

← Back to skills

A static-analysis skill that audits SQL for the cost & performance anti-patterns that dominate warehouse bills — `SELECT *`, full-table scans, non-sargable predicates, Cartesian joins, the `NOT IN` NULL trap, and 15 more. It scores warehouse query health 0-100 (A-F) and outputs a prioritized cost-reduction plan, each finding with a `why`, a concrete `fix`, and an estimated savings.

Category: Backend & Cloud
Repo: antigravity-awesome-skills
Path: skills/sql-sentinel/SKILL.md
Updated: 6/29/2026, 7:45:17 AM

AI Summary

A static-analysis skill that audits SQL for the cost & performance anti-patterns that dominate warehouse bills — `SELECT *`, full-table scans, non-sargable predicates, Cartesian joins, the `NOT IN` NULL trap, and 15 more. It scores warehouse query health 0-100 (A-F) and outputs a prioritized cost-reduction plan, each finding with a `why`, a concrete `fix`, and an estimated savings. It is useful for API design, databases, authentication, cloud deployment, and serverless. Source: antigravity-awesome-skills (skills/sql-sentinel/SKILL.md).

sql-sentinel

Overview

A static-analysis skill that audits SQL for the cost & performance anti-patterns that dominate warehouse bills — SELECT *, full-table scans, non-sargable predicates, Cartesian joins, the NOT IN NULL trap, and 15 more. It scores warehouse query health 0-100 (A-F) and outputs a prioritized cost-reduction plan, each finding with a why, a concrete fix, and an estimated savings.

Built for analytics engineers (dbt, Looker), data platform teams running FinOps / "reduce cloud spend" initiatives, and anyone reviewing a SQL pull request before it hits production. Works across BigQuery, Snowflake, Redshift, and Postgres. Zero dependencies, MIT licensed.

The executable engine and full rule set live in the source repository: https://github.com/takeaseatventure/sql-sentinel

When to Use This Skill

  • A user writes or reviews a query for BigQuery, Snowflake, Redshift, Postgres, or Spark SQL.
  • A user asks "why is this query so slow?" or "why is my warehouse bill so high?"
  • A user is about to promote a dashboard query or dbt model to production.
  • A data engineer wants a second pair of eyes before a code review or a cost-optimization sweep.
  • A team is running a "reduce cloud spend" or FinOps initiative.

How It Works

The engine splits a SQL script into statements (honoring quotes and comments), runs 20 rules over each statement, scores health 0-100 weighted by severity (critical 25, high 12, medium 5, low 1), and returns a prioritized cost-reduction plan.

Step 1: Run the audit

Install or clone the source repository, then run the zero-dependency engine:

git clone https://github.com/takeaseatventure/sql-sentinel.git
cd sql-sentinel
node scripts/sql-sentinel.js path/to/query.sql

Or programmatically:

const { auditSql } = require('./scripts/sql-sentinel');
const report = auditSql(yourSqlString, { dialect: 'bigquery' });
console.log(report.healthScore);      // 0-100
console.log(report.grade);            // 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
console.log(report.prioritizedPlan);  // array, worst findings first

Step 2: Read the prioritized plan

The output leads with critical findings (Cartesian joins, mass DELETE) and descends to low-severity style issues. Each finding explains why it costs money and how to fix it.

Examples

Example 1: A messy dashboard query

SELECT DISTINCT *
FROM user_events, raw_logs
WHERE LOWER(event_name) LIKE '%signup%'
  AND user_id NOT IN (SELECT id FROM deleted_users)
ORDER BY created_at;

The audit scores this 17/100 (grade F) and flags 7 findings:

  • CRITICAL: comma-join produces a Cartesian product (can turn a $0.02 query into a $200 query)
  • HIGH: SELECT * forces full column scan (30-90% wasted bytes on wide tables)
  • HIGH: leading-wildcard LIKE '%signup%' defeats indexes
  • HIGH: LOWER(event_name) defeats indexes (non-sargable)
  • HIGH: NOT IN (SELECT ...) — NULL semantics hazard
  • MEDIUM: SELECT DISTINCT dedup cost
  • MEDIUM: ORDER BY without LIMIT sorts the full result

Example 2: A clean, sargable query

-- This scores 90+/100 (grade A) — no findings
SELECT id, email, created_at
FROM users
WHERE created_at >= TIMESTAMP '2026-01-01'
  AND created_at <  TIMESTAMP '2026-02-01'
ORDER BY id
LIMIT 100;

The 20 rules (ruleset v1.0.0)

RuleSeverityCatches
SQL001highSELECT * full column scan
SQL002criticalNo WHERE → full table scan
SQL003highLIKE '%term' non-sargable
SQL004highFunction on column kills index
SQL005criticalCROSS JOIN / comma-join
SQL006mediumSELECT DISTINCT dedup cost
SQL007mediumORDER BY without LIMIT
SQL008highNOT IN (SELECT ...) NULL trap
SQL009mediumImplicit type cast
SQL010lowMany ORs (use IN/UNION)
SQL011mediumCOUNT(DISTINCT) at scale (use HLL)
SQL012lowLIMIT without ORDER BY
SQL013mediumScalar subquery in SELECT
SQL014medium5+ JOINs broadcast/spill risk
SQL015highFact table, no partition filter
SQL017lowString concat in SELECT
SQL018mediumWindow OVER () no PARTITION
SQL020criticalDELETE/UPDATE without WHERE
SQL021lowSELECT * in EXISTS/IN
SQL022mediumUNION vs UNION ALL

Run the test suite to verify each rule fires on real SQL:

cd scripts && node test.js   # 26 tests, zero dependencies

Limitations

  • This is a static analyzer. It finds anti-patterns in the text of SQL; it does not read query plans, row counts, or billing. A flagged query on a 100-row table is cheap; the same query on a billion-row table is the problem the rule exists to prevent.
  • The fact-table heuristic (SQL015) keys off table names (*_events, *_log) and is advisory, not definitive.
  • It does not execute SQL — safe to run on any .sql file.

Related skills