Agentic safety rail

AgentRail must make agent work safer before it makes agent work faster.

Multi-agent coding workflows touch files, secrets, terminals, browsers, GitHub, and release assets. The safest personal Beta is a control rail with explicit scope, tool permissions, evidence, and human approval before publish.

Open SOP Open Stack Map Verify Install

Rule

No agent publishes directly.

AgentRail should treat agents as workers, not owners. They can propose changes, run checks, and attach evidence, but the operator decides whether the workitem is ready to merge and publish.

Prompt injection

Untrusted content stays untrusted

Web pages, issues, PR comments, email, and downloaded docs can contain malicious instructions. AgentRail should label external content as evidence, not as authority.

Tool scope

Tools need per-task boundaries

A workitem should state which tools are allowed: read files, edit repo, run tests, browse web, call GitHub, or deploy. Anything outside scope requires human approval.

Secrets

Secrets should not enter prompts

Tokens, API keys, cookies, and release credentials should remain outside agent-visible instructions. If a tool needs a secret, the approval and action should be logged.

GitHub

PRs are evidence containers

Branch protection, CI status, code review, and PR conversation are part of the control rail. AgentRail should link workitems to PRs and record what passed before merge.

MCP

MCP servers need allowlists

MCP expands the action surface. AgentRail should document which servers and tools a task can use, and separate read-only context from state-changing actions.

Minimum publish checklist

Scope matches the original workitem and no unrelated files were changed.
HTTP checks return 200 for changed pages and core routes.
Install smoke test still passes for the current release path.
Desktop and mobile screenshots show no navigation, text, or layout breakage.
The operator reviews the final diff and release evidence before push or merge.