Agent Supervision
"Run an agent" is easy. "Run an agent for 6 hours overnight without it dying or doing something dumb" is the actual job. For AI agents, supervision means five things:
- Lifecycle. Agent dies → respawn (with a ceiling). Agent hangs → kill button works. Agent finishes → record exit code.
- Containment. Sandbox per agent. Workspace boundary enforced. Capability grants explicit.
- Approvals. Agent needs human consent for a privileged action → buzz the phone, not the desk.
- Audit. Hash-chained log of every privileged action. Verifiable later.
- Observability. Last 64 KB of output retained per agent for late-joining clients.
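The lifecycle, audit, and observability items above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the daemon's actual implementation: the names `supervise`, `TailBuffer`, and `AuditLog`, the restart ceiling of 3, treating any non-zero exit as a crash, and the use of SHA-256 for the chain are all hypothetical choices made for the example.

```python
import hashlib
import json
import subprocess
import sys

MAX_RESTARTS = 3        # assumed ceiling; the text does not state a number
TAIL_BYTES = 64 * 1024  # "last 64 KB of output" from the list above


def supervise(cmd, max_restarts=MAX_RESTARTS):
    """Run cmd, respawning on non-zero exit until the ceiling is reached.

    Returns the exit codes observed, one per launch.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):  # first launch plus up to max_restarts respawns
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)  # record every exit code
        if code == 0:
            break  # clean finish, no respawn
    return exit_codes


class TailBuffer:
    """Keeps only the most recent `limit` bytes of an agent's output."""

    def __init__(self, limit=TAIL_BYTES):
        self.limit = limit
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        self.data += chunk
        overflow = len(self.data) - self.limit
        if overflow > 0:
            del self.data[:overflow]  # drop the oldest bytes

    def snapshot(self) -> bytes:
        """What a late-joining client would be sent."""
        return bytes(self.data)


class AuditLog:
    """Append-only log where each entry's hash covers the previous hash."""

    GENESIS = "0" * 64  # assumed placeholder hash for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash: str, action: dict) -> str:
        body = json.dumps(action, sort_keys=True)  # stable serialization
        return hashlib.sha256((prev_hash + body).encode()).hexdigest()

    def append(self, action: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"prev": prev, "action": action, "hash": self._digest(prev, action)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["action"]):
                return False
            prev = entry["hash"]
        return True
```

For example, `supervise([sys.executable, "agent.py"])` would launch a hypothetical `agent.py` at most four times (one launch plus three respawns). Because each audit entry hashes its predecessor, changing an old entry invalidates every entry after it, which is what makes the log verifiable later.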
Why this matters
Without supervision, AI agents are scripts you launch and hope for the best. With supervision, they're production workloads with a known shape. You can run more of them in parallel, leave them running longer, and recover from crashes without losing work.
Deep dives:
- Restart-on-crash for long-running AI agents
- Sandboxing Claude Code on macOS
- Haptic approvals from your phone
- The audit log every agent needs
FAQ
What kinds of agents does this cover?
Any process that runs in a PTY: Claude Code, Codex, Aider, Playwright runs, Python scripts, Go binaries, training jobs.
Do I need to instrument my agent?
No. The daemon supervises the process from the outside; the agent is an opaque process that reads stdin and writes stdout and stderr. Permission requests are signaled out of band from normal output: the agent prints a JSON line on stderr that the daemon recognizes.
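As a rough sketch of how a supervisor might pick such a line out of an agent's stderr, the Python below filters for JSON objects with a marker field. The schema is not given here, so the `"type": "permission_request"` marker, the `action` field, and the function name `parse_permission_request` are assumptions made purely for illustration.

```python
import json
from typing import Optional


def parse_permission_request(line: str) -> Optional[dict]:
    """Return the request dict if `line` is a permission request, else None.

    Assumes a hypothetical schema: a JSON object on a single line whose
    "type" field equals "permission_request".
    """
    line = line.strip()
    if not line.startswith("{"):
        return None  # ordinary stderr output, not JSON
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return None  # looked like JSON but was not valid
    if isinstance(message, dict) and message.get("type") == "permission_request":
        return message
    return None


stderr_lines = [
    "warning: retrying download",
    '{"type": "permission_request", "action": "network"}',
    '{"type": "log", "msg": "hello"}',
]
requests = [r for r in map(parse_permission_request, stderr_lines) if r is not None]
```

Anything that does not match passes through as normal stderr output, so an agent that never emits such a line needs no changes at all.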