Testing distributed systems with AI agents
Two skills for AI coding agents that design and run claim-driven tests for distributed and stateful systems. Together they produce a structured Markdown test plan and a findings report with 9-state verdicts and an explicit SUT / harness / checker / environment blame classification. A reviewer reads the two artifacts and decides whether to ship; nothing else has to be re-run.

Works with Claude Code, Codex, Copilot CLI, Cursor, Gemini, or any agent that reads Markdown and runs shell. The skills are plain SKILL.md files. The agent executes them; the plan and findings report are the output. One skill designs the plan. The other runs it.

A plan starts from the product's claims, generates hypotheses tied to those claims, and writes scenarios named after the claim each tries to falsify. For consistency-critical scenarios, each scenario also binds an abstract model (register | queue | log | lock | lease | ledger | …) to an operation-history schema, a named checker, and a nemesis with observable landing evidence. The plan ends with a coverage adequacy argument and a conservative confidence statement.

The default approach to testing distributed and stateful systems (write a few integration tests and call it done) finds only a small fraction of the bugs that actually break these systems in production: partial network partitions, non-deterministic concurrency, crash-recovery, upgrade/rollback, idempotency under replay, timing-sensitive ordering. These skills enforce an opinionated workflow that draws on the field's hard-won knowledge.

End to end, the two skills produce the plan and the findings report. The plan is structured so that a reviewer can read it and decide whether to ship without re-running the tests. The full findings template carries Oracle, Oracle execution evidence, artifact links, an adequacy-vs-plan section, and a confidence delta; see skills/executing-distributed-system-tests/assets/findings-report-template.md.
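As a rough illustration of the blame classification, the sketch below models a single finding as a small Python record and refuses to file a failing finding that has no blame class. The four blame classes come from the text; everything else (the `Finding` fields, the `validate` helper, the spelling of the failing verdict as `"FAIL"`) is a hypothetical stand-in, and the nine verdict states are deliberately not enumerated here because they live in the project's verdict taxonomy, not in this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Blame(Enum):
    """The four blame classes named in the text."""
    SUT = "SUT"
    HARNESS = "harness"
    CHECKER = "checker"
    ENVIRONMENT = "environment"


@dataclass(frozen=True)
class Finding:
    # Hypothetical field names; the real report is a Markdown template.
    scenario: str                    # scenario name, i.e. the claim it tries to falsify
    verdict: str                     # one of the nine verdict states (not enumerated here)
    blame: Optional[Blame] = None    # must be set when the verdict is a failure
    evidence: tuple[str, ...] = ()   # artifact paths or links


def validate(finding: Finding) -> Finding:
    """Reject a failing finding that has not been assigned a blame class."""
    # Assumption: the failing verdict is spelled "FAIL".
    if finding.verdict == "FAIL" and finding.blame is None:
        raise ValueError(f"{finding.scenario}: FAIL must be classified before filing")
    return finding
```

Making the blame field mandatory for failures is one way to keep a harness or checker bug from being reported as a defect in the system under test.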
To install, paste the install one-liner into any AI coding agent (Claude Code, Codex, Copilot CLI, Cursor, Gemini, or anything else that reads Markdown and runs shell). The agent fetches INSTALL.md, clones the repo to ~/.local/share/distributed-testing-skills/, and wires the skills in (symlinks under ~/.claude/skills/ for Claude Code, a pointer block in ~/AGENTS.md for other agents). After that, ask any agent on the machine to "design a test plan for this system" or "execute the plan at X" and it will follow the SKILL.md workflow.

To update, paste the same one-liner again. INSTALL.md is idempotent: if the install path exists, it does git pull --ff-only; if not, it does git clone. Symlinks always point at the cloned content, so they pick up the new version automatically. The ~/AGENTS.md pointer block uses HTML markers and is replaced cleanly on each run, with no duplication. If you have local edits to the cloned skills, git pull --ff-only will fail; the agent will stop and ask before discarding them.

Once the skills are installed, you have two ways to drive them. The first is a casual ask (Claude Code with auto-trigger): the skill descriptions pick up natural phrasing like "design a test plan", "execute the plan", "run stability tests", "design a release validation plan", etc. The second, for a specific mode, an output path, or a non-auto-trigger agent, is USAGE.md, which has copy/paste prompts for every workflow (design and execute, in their respective modes) plus tips on scope, env probing, and long-run checkpointing.

The design skill walks the repo, extracts the claims the product makes, generates hypotheses tied to those claims, picks techniques from the catalog, and writes a structured Markdown plan with a coverage adequacy argument and a confidence statement. For consistency-critical scenarios, the plan fills a §7.M block per scenario: model under test, operation-history schema, named checker, nemesis + landing evidence, ambiguous-outcome handling, reduction plan. Details: history-discipline.md.
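To make the model / operation-history / checker binding more concrete, here is a minimal sketch for the simplest model in the list, a single register. It is not the schema or checker the skills prescribe (those are described in history-discipline.md, which is not reproduced here); the `Op` record, its field names, and `check_register` are hypothetical. The checker is a brute-force linearizability search suitable only for short histories, and it treats an operation with no observed response as ambiguous: it may have taken effect at any point after it was invoked, or never.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Hashable, Optional, Sequence


@dataclass(frozen=True)
class Op:
    # Hypothetical history schema for a register model.
    process: str
    kind: str                     # "write" or "read"
    value: Hashable               # value written, or value a completed read returned
    invoked_at: int
    completed_at: Optional[int]   # None: no response observed (timeout, crash)

    @property
    def ambiguous(self) -> bool:
        return self.completed_at is None


def check_register(history: Sequence[Op], initial: Hashable = None) -> bool:
    """Return True if some real-time-respecting order explains every completed op."""
    ops = tuple(history)

    @lru_cache(maxsize=None)
    def search(remaining: frozenset, state: Hashable) -> bool:
        # Leftover ambiguous ops are allowed to have never taken effect.
        if all(ops[i].ambiguous for i in remaining):
            return True
        for i in remaining:
            op = ops[i]
            # An op cannot go next if another pending op finished before it started.
            blocked = any(
                ops[j].completed_at is not None
                and ops[j].completed_at < op.invoked_at
                for j in remaining if j != i
            )
            if blocked:
                continue
            if op.kind == "write":
                if search(remaining - {i}, op.value):
                    return True
            elif not op.ambiguous and op.value == state:
                # A completed read must return the current register value.
                if search(remaining - {i}, state):
                    return True
        return False

    return search(frozenset(range(len(ops))), initial)
```

For example, a history where a write of 1 completes, a write of 2 times out, and later reads return 1, then 2, then 1 again is rejected: once the ambiguous write has been observed, no later read may return the older value. The search is exponential in the worst case, which is one reason a reduction plan for long histories matters.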
The design skill has two modes: change-scoped (a specific commit or PR) and project-wide (a holistic plan with existing-test inventory and gap analysis).

The execute skill reads the plan, discovers the SUT's toolbox, probes the environment, and runs scenarios with checkpoint discipline. Per scenario, it captures landing evidence for the fault, runs the green-but-broken and weak-oracle audits, assigns a verdict from the 9-state taxonomy in verdict-taxonomy.md, and classifies every FAIL into SUT / harness / checker / environment before filing. It produces a findings report with an adequacy-vs-plan assessment and a confidence delta. It has two modes: default (read-only on the SUT, ephemeral harnesses under the session dir) and author mode (writes scenario skeletons declared in the plan's §7 into the SUT for review).

The technique catalog consists of eight reference files distilled from the field's literature. Each follows the same shape: when to reach for it, what it detects well, what it misses, concrete tools, papers, cost signal, plan checklist. The catalog index pairs symptoms to references.

Status: early but exercised. Both skills have been driven against AgentDB (a distributed agent runtime in Rust) end-to-end multiple times, surfacing six findings (one P0-candidate now closed, two P1s shipped as a PR, two open). The skill bodies evolve as harness experience accumulates; expect minor updates to the SKILL.md files and templates over the next few iterations.

Real plan outputs, session directories, and findings reports from those runs live under verification/, one subdirectory per run, each with a README.md describing what passed, what failed, and what the skill surfaced about itself in the process. There is also an eval suite under evals/ (a separate evals.json for each of the design and execute skills), used to validate behavioural changes to the SKILL.md bodies between iterations.

The technique catalog is distilled from Andrey Satarin's comprehensive testing-distributed-systems catalog and is anchored by seminal papers from the field.
Sources: 1 · Platforms: 1 · Relations: 0
First seen: May 20, 2026, 10:40 PM
Last updated: May 21, 2026, 12:01 AM