Verify done-claims against ground truth, not the agent's word


When an AI agent says "done", "shipped", or "fixed", that is a **claim**, not a fact — and a claim the agent checks by re-reading its own work is *consistency, not grounding*. This skill replaces that self-report with a verdict from a witness the agent did not author: it shells the **DOS kernel** (`dos verify`, `dos commit-audit`) to confirm the claimed effect from git ancestry and the commit's actual diff. DOS is deterministic — no API key, no LLM. The verdict is git-only and offline as used here; the one exception is `dos verify` in a workspace that wires a CI oracle, which `--no-ci` suppresses (see Security & Safety Notes).

Category: AI & Intelligent Agents
Repo: antigravity-awesome-skills
Path: skills/dos-verify-done-claims/SKILL.md
Updated: 6/15/2026, 9:11:44 AM

Overview

When an AI agent says "done", "shipped", or "fixed", that is a claim, not a fact — and a claim the agent checks by re-reading its own work is consistency, not grounding. This skill replaces that self-report with a verdict from a witness the agent did not author: it shells the DOS kernel (dos verify, dos commit-audit) to confirm the claimed effect from git ancestry and the commit's actual diff. DOS is deterministic — no API key, no LLM. The verdict is git-only and offline as used here; the one exception is dos verify in a workspace that wires a CI oracle, which --no-ci suppresses (see Security & Safety Notes).

This skill adapts the DOS reference "witness-claim" pattern (anthony-chaudhary/dos-kernel) into a host-agnostic screenplay.

When to Use This Skill

  • Use when an agent reports a task/phase/feature as complete and you want that "done" confirmed from evidence before building on it.
  • Use right after a commit, to confirm the commit's message matches its diff (catch a fix: that only touched a README, or a "tests pass" that deleted the assertions).
  • Use when folding many sub-agents' results — verify each claimed effect instead of trusting the return string.
  • Do not use it to judge whether code is correct — that is what the test suite is for. This skill checks did-the-claimed-thing-actually-ship.

How It Works

Step 1: Install the kernel (once)

pip install dos-kernel        # provides the `dos` CLI; deterministic, no key

Step 2: Audit the latest commit's claim vs its diff

A commit subject is forgeable (whoever wrote the message authored it); the files it touched are not (git did). dos commit-audit grades the subject against the actual diff:

dos commit-audit --workspace . HEAD --json

commit-audit --json prints a JSON array of audited commits (one element even for a single HEAD), so read verdict from the first element — e.g. dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'. (Without --json the same verdict prints as a one-line text row: · OK …, ⚑ UNWITNESSED …, or · abstain ….) The verdicts are: OK (the diff backs the claim's kind), CLAIM_UNWITNESSED (the subject's claim is not evidenced by the diff — treat the "done" as unproven), or ABSTAIN. This judges the kind of change, never correctness — run the tests for that.

Step 3: Verify a named phase actually shipped

If the agent claims a specific plan/phase landed, confirm it from git history rather than the transcript:

dos verify --workspace . PLAN PHASE --json --no-ci

--no-ci keeps the verdict git-only (see the Security note below). With --json you get the shipped and source fields. (The default text form prints SHIPPED PLAN PHASE (via grep) or NOT_SHIPPED PLAN PHASE (via none) — the same verdict, and the process exit code is non-zero when not shipped.)

Grade a shipped: true result by its source field. The git fallback labels its own evidence by forgeability, and forgeable evidence is exactly what this skill exists to distrust:

  • registry or grep-artifact: non-forgeable (a registry row, or an artefact/diff rung). This closes the claim.
  • grep-subject (or bare grep): forgeable. A commit subject or body carried the phase token, which an agent can write without doing the work (even on an empty commit). Treat this as shipped-per-the-subject, not confirmed; corroborate it by running dos commit-audit on that commit (Step 2) before you close.
  • none: no positive evidence. Accept this as "not shipped", not as a tool failure.
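
The grading rule above can be sketched as a small shell function. This is a minimal sketch, not part of the dos CLI: classify_source is a hypothetical helper name, and the labels close, corroborate, and keep-open are illustrative. It takes the shipped and source values as plain strings, so extracting them from the --json output is left to the caller.

```shell
# classify_source SHIPPED SOURCE
# Hypothetical helper (not part of dos): maps the two fields reported by
# `dos verify --json` to an illustrative decision label.
classify_source() {
  # Anything other than shipped=true is treated as "not shipped".
  if [ "$1" != "true" ]; then
    echo "keep-open"
    return 0
  fi
  case "$2" in
    registry|grep-artifact) echo "close" ;;        # non-forgeable evidence
    grep-subject|grep)      echo "corroborate" ;;  # forgeable subject/body match
    *)                      echo "keep-open" ;;    # none, or an unknown source
  esac
}

classify_source true grep-artifact   # prints: close
classify_source true grep-subject    # prints: corroborate
classify_source false none           # prints: keep-open
```

Unknown source values fall through to keep-open on purpose: when in doubt, the sketch leaves the claim unproven.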

Step 4: Fold only confirmed effects

Accept the agent's "done" only when Step 2/3 corroborate it. If CLAIM_UNWITNESSED or shipped: false, the work is not done regardless of how confidently the agent narrated it — send it back.
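
One way to combine both checks into a single accept-or-reject decision is sketched below. It assumes you have already extracted three strings: the commit-audit verdict, and the shipped and source fields from dos verify. The function name decide_done, its output labels, and its return codes are hypothetical. Treating ABSTAIN as "not corroborated" for a forgeable source is this sketch's reading of the rules above, not documented dos behaviour.

```shell
# decide_done VERDICT SHIPPED SOURCE
# Hypothetical helper (not part of dos). Prints one of:
#   accept       (return 0)  the claim is backed by evidence
#   send-back    (return 1)  the claim is unproven; return the work
#   corroborate  (return 2)  forgeable evidence only; needs an OK audit
# VERDICT should come from auditing the same commit that carried the phase.
decide_done() {
  # Either negative signal alone is enough to reject the "done".
  if [ "$1" = "CLAIM_UNWITNESSED" ] || [ "$2" != "true" ]; then
    echo "send-back"
    return 1
  fi
  case "$3" in
    registry|grep-artifact)
      # Non-forgeable source: this closes the claim on its own.
      echo "accept"
      return 0
      ;;
    grep-subject|grep)
      # Forgeable source: accept only when the audit verdict is OK.
      if [ "$1" = "OK" ]; then
        echo "accept"
        return 0
      fi
      echo "corroborate"
      return 2
      ;;
    *)
      # source "none" or anything unrecognised: no positive evidence.
      echo "send-back"
      return 1
      ;;
  esac
}
```

For example, decide_done OK true grep-subject prints accept, while decide_done ABSTAIN true grep-subject prints corroborate and decide_done OK false none prints send-back.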

Examples

Example 1: gate an agent's "I fixed the bug" claim

# The agent committed and said it's fixed. Check the diff backs the claim.
# commit-audit --json returns an array, so read the first element's verdict:
dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'
# OK                -> the change is of the claimed kind; now run the tests
# CLAIM_UNWITNESSED -> the commit doesn't do what it says; reject

Example 2: confirm a feature phase shipped before closing a ticket

dos verify --workspace . AUTH AUTH2 --json --no-ci
# shipped: true, source: registry|grep-artifact -> non-forgeable; safe to close
# shipped: true, source: grep-subject|grep       -> forgeable subject/body match;
#   shipped-per-the-subject only -> corroborate with commit-audit before closing
# shipped: false, source: none -> no evidence; keep the ticket open

Best Practices

  • ✅ Run dos commit-audit HEAD immediately after every agent commit.
  • ✅ Treat source: none / CLAIM_UNWITNESSED as "not done", not as a tool error.
  • ✅ Close a claim on a non-forgeable source (registry, grep-artifact). Treat grep-subject / bare grep as forgeable (an agent can write the subject text) — corroborate before closing.
  • ✅ Keep the test suite as the separate correctness gate — this skill checks shipping, not correctness.
  • ❌ Don't accept a "done" because the agent's prose was confident.
  • ❌ Don't use this to replace code review or testing.
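
The first practice can be automated with a git post-commit hook. The sketch below is one possible setup, not something the skill prescribes: install_audit_hook is a hypothetical helper that writes a hook running the same dos commit-audit command shown in Step 2. A post-commit hook runs after the commit exists and cannot block it, so the hook only prints the verdict row.

```shell
# install_audit_hook [REPO_DIR]
# Hypothetical helper: writes .git/hooks/post-commit in REPO_DIR (default ".").
# Refuses to overwrite an existing hook and returns 1 in that case.
install_audit_hook() {
  hook="${1:-.}/.git/hooks/post-commit"
  if [ -e "$hook" ]; then
    echo "refusing to overwrite existing hook: $hook" >&2
    return 1
  fi
  mkdir -p "${1:-.}/.git/hooks"
  # Quoted heredoc delimiter: the hook body is written literally.
  cat > "$hook" <<'HOOK'
#!/bin/sh
# Print the commit-audit text row for the commit that was just created.
# Git runs hooks from the top of the work tree, so "." is the repo root.
# The commit already exists at this point, so a failure here is ignored.
dos commit-audit --workspace . HEAD || true
HOOK
  chmod +x "$hook"
}
```

Call install_audit_hook . once from the repository root. Because hooks are local to a clone, each working copy that agents commit from needs its own install.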

Limitations

  • This skill does not replace environment-specific validation, testing, or expert review.
  • It checks whether a claimed change shipped / matches its diff — not whether the code is correct.
  • dos verify reads git history; in a repo with no commits there is nothing to witness (it will honestly report source: none).
  • Stop and ask for clarification if required inputs (a git repo, the dos CLI) are missing.
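
A preflight check for the required inputs might look like the sketch below. check_inputs is a hypothetical helper, and its return codes are illustrative. It uses only standard git and shell commands: it looks for the CLI on PATH, confirms the directory is inside a git work tree, and confirms there is at least one commit to witness.

```shell
# check_inputs [DIR] [CLI]
# Hypothetical helper: returns 0 when CLI (default "dos") is on PATH and DIR
# (default ".") is inside a git work tree that has at least one commit.
check_inputs() {
  dir="${1:-.}"
  cli="${2:-dos}"
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "missing: the $cli CLI is not on PATH" >&2
    return 1
  fi
  # rev-parse prints "true" only inside a work tree (not inside .git itself).
  if [ "$(git -C "$dir" rev-parse --is-inside-work-tree 2>/dev/null)" != "true" ]; then
    echo "missing: $dir is not inside a git work tree" >&2
    return 2
  fi
  # A repo with no commits has no HEAD to resolve, so nothing can be witnessed.
  if ! git -C "$dir" rev-parse --verify --quiet HEAD >/dev/null 2>&1; then
    echo "empty: $dir has no commits yet" >&2
    return 3
  fi
  return 0
}
```

Running check_inputs . before Step 2 or Step 3 gives an explicit reason to stop and ask, instead of a confusing failure from the first dos command.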

Security & Safety Notes

  • This skill runs shell commands: pip install dos-kernel and the read-only dos verbs (dos commit-audit, dos verify). These verbs never mutate the repo or push. dos commit-audit only reads git history and the working tree (no network). dos verify is also git-only unless the workspace has wired a CI oracle ([verify] non_git_oracle in its dos.toml), in which case it may shell a network check (e.g. gh api) for the verdict — pass --no-ci (as the examples above do) to force the git-only path and guarantee no network.
  • pip install dos-kernel installs from PyPI. The distribution name is dos-kernel (the bare dos on PyPI is an unrelated package — do not install it). Pin a version in locked environments.
  • Run in the repository you intend to adjudicate; the --workspace . argument scopes every verdict to that repo.

Common Pitfalls

  • Problem: dos verify returns source: none and it looks like a failure. Solution: That is the honest "no evidence" verdict — it means the phase has no ship commit, so the claim is unproven. Re-stamp the real commit or keep the task open.
  • Problem: Installing the wrong package. Solution: The PyPI name is dos-kernel, not dos.

Related Skills

  • The upstream DOS reference screenplays (dos-witness-claim, dos-goal-gate) in anthony-chaudhary/dos-kernel cover the multi-agent fan-out and self-stopping-agent variants of this same witness discipline.
