
What Is Zero-Trust Process Orchestration?

Akshay Sarode Jul 26, 2025

Direct answer

Zero-trust process orchestration is a control plane for processes — spawn, watch, kill across multiple machines — where every API call is authenticated, every privileged action requires an explicit capability grant, and every decision is recorded in a tamper-evident audit log. It's the orchestration layer that doesn't trust the operator, the network, the spawning agent, or the running process.

"Zero-trust" gets thrown at every product. Most of the time it means "we use OAuth and MFA." That's not zero-trust; that's authenticated. Zero-trust as a posture is more specific: no implicit trust based on network location, identity, or prior consent. Every action is checked, every time, against an explicit policy.

Process orchestration is the layer where this matters because the actions are unbounded. A daemon that can spawn arbitrary processes is, by definition, a daemon that can do anything the user can do. Authenticated isn't enough; the question is "what specifically did you grant this caller to do at this moment, and can you prove it later."

Five properties

1. The network is not trusted

"It came from 127.0.0.1, so it's safe" is the original sin. Loopback only proves the caller is on the same machine — it does not prove the caller is the user, or even a process the user authorized. The daemon must authenticate every request the same way it would over the public internet: identity assertion (Firebase ID token, signed JWT) plus capability check.

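As a rough sketch of what "no loopback shortcut" can look like in a Go daemon, the middleware below authenticates every request and deliberately ignores the remote address. The names Verifier, authenticate, and requireAuth are illustrative assumptions, not Celistra's API, and the capability check that should follow is left out to keep the example short.

```go
package main

import (
	"errors"
	"net/http"
	"strings"
)

// Verifier checks an identity token and returns the caller's UID.
// Hypothetical type: in practice this would validate a signed JWT.
type Verifier func(token string) (uid string, err error)

var errUnauthenticated = errors.New("unauthenticated")

// authenticate extracts and verifies the bearer token. It receives the
// remote address only to make the point that it is never consulted.
func authenticate(remoteAddr, authHeader string, verify Verifier) (string, error) {
	_ = remoteAddr // network location is not an identity, loopback included
	if !strings.HasPrefix(authHeader, "Bearer ") {
		return "", errUnauthenticated
	}
	token := strings.TrimPrefix(authHeader, "Bearer ")
	if token == "" {
		return "", errUnauthenticated
	}
	return verify(token)
}

// requireAuth wraps a handler so every request is authenticated, whether
// it arrives from 127.0.0.1 or from the public internet.
func requireAuth(verify Verifier, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, err := authenticate(r.RemoteAddr, r.Header.Get("Authorization"), verify)
		if err != nil {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Wrapping the whole mux, for example requireAuth(verify, mux), means there is no code path where a request skips the check because of where it came from.
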
2. Identity is verified offline when possible

Don't depend on a cloud identity provider for every call. Cache JWKS, validate signatures locally, refresh on rotation. If the network goes down, the daemon should still be able to verify "is this a valid token from my IdP" without a round-trip. That avoids both the latency tax and the availability dependency.
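
Here is a minimal sketch of that idea using only the Go standard library, assuming RS256-signed tokens. The keyCache type, its fetch hook (standing in for a JWKS download), and the demoToken helper are all hypothetical. A production verifier would also check issuer, audience, and issued-at claims, which this sketch skips.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
)

var b64 = base64.RawURLEncoding

// keyCache keeps IdP public keys by key ID so signatures verify locally.
// fetch stands in for a JWKS download and runs only on a cache miss.
// Not goroutine-safe as written; a real daemon would guard it with a mutex.
type keyCache struct {
	keys    map[string]*rsa.PublicKey
	fetch   func() (map[string]*rsa.PublicKey, error)
	fetches int // how often the network would have been touched
}

func (c *keyCache) get(kid string) (*rsa.PublicKey, error) {
	if k, ok := c.keys[kid]; ok {
		return k, nil // cache hit: no round-trip
	}
	c.fetches++
	fresh, err := c.fetch() // unknown kid: the IdP may have rotated keys
	if err != nil {
		return nil, err
	}
	c.keys = fresh
	if k, ok := fresh[kid]; ok {
		return k, nil
	}
	return nil, errors.New("unknown key id")
}

// verifyRS256 checks signature and expiry offline and returns the subject.
func verifyRS256(token string, c *keyCache, nowUnix int64) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", errors.New("malformed token")
	}
	var hdr struct {
		Alg string `json:"alg"`
		Kid string `json:"kid"`
	}
	hb, err := b64.DecodeString(parts[0])
	if err != nil || json.Unmarshal(hb, &hdr) != nil || hdr.Alg != "RS256" {
		return "", errors.New("bad header")
	}
	key, err := c.get(hdr.Kid)
	if err != nil {
		return "", err
	}
	sig, err := b64.DecodeString(parts[2])
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	if err != nil || rsa.VerifyPKCS1v15(key, crypto.SHA256, digest[:], sig) != nil {
		return "", errors.New("bad signature")
	}
	var claims struct {
		Sub string `json:"sub"`
		Exp int64  `json:"exp"`
	}
	pb, err := b64.DecodeString(parts[1])
	if err != nil || json.Unmarshal(pb, &claims) != nil {
		return "", errors.New("bad claims")
	}
	if nowUnix >= claims.Exp {
		return "", errors.New("token expired")
	}
	return claims.Sub, nil
}

// demoToken plays the IdP: it makes a throwaway key, signs a token with it,
// and returns a cache whose fetch serves the matching public key.
func demoToken(kid, sub string, expUnix int64) (string, *keyCache) {
	priv, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	hdr, _ := json.Marshal(map[string]string{"alg": "RS256", "kid": kid})
	body, _ := json.Marshal(map[string]interface{}{"sub": sub, "exp": expUnix})
	signing := b64.EncodeToString(hdr) + "." + b64.EncodeToString(body)
	digest := sha256.Sum256([]byte(signing))
	sig, err := rsa.SignPKCS1v15(rand.Reader, priv, crypto.SHA256, digest[:])
	if err != nil {
		panic(err)
	}
	fetch := func() (map[string]*rsa.PublicKey, error) {
		return map[string]*rsa.PublicKey{kid: &priv.PublicKey}, nil
	}
	cache := &keyCache{keys: map[string]*rsa.PublicKey{}, fetch: fetch}
	return signing + "." + b64.EncodeToString(sig), cache
}
```

The first verification for a given key ID triggers one fetch; every later call with the same key ID is pure local crypto, which is what keeps the daemon working when the IdP is unreachable.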

3. Privileged actions need explicit capabilities

"This user is authenticated" gives them session access. It does not give them the ability to read screen, write outside the workspace, drive other apps, or read TCC-protected files. Each privileged action requires a capability grant — typically a token granted by some out-of-band consent (the user clicked yes in System Settings, or the user wrote it into a config file). Capabilities are composable, named, and limited.

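One way to model this in Go is a named capability set that starts from the default bundle and grows only through grants read from config. The JSON schema below (agent name mapped to extra capabilities) and the screen_recording name are assumptions for illustration; the source only says grants live in a versioned config file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Capability is a named permission. The four default names come from the
// bundle described in this post; any other name here is hypothetical.
type Capability string

var defaultBundle = []Capability{"workspace_rw", "read_anywhere", "network", "subprocess"}

// CapabilitySet is the set of capabilities one agent currently holds.
type CapabilitySet map[Capability]bool

// Require returns an error unless the capability was explicitly granted.
func (s CapabilitySet) Require(c Capability) error {
	if !s[c] {
		return fmt.Errorf("capability %q not granted", c)
	}
	return nil
}

// grantsFile is an assumed config schema: agent name -> extra capabilities.
type grantsFile struct {
	Agents map[string][]Capability `json:"agents"`
}

// resolve composes the default bundle with the agent's out-of-band grants.
func resolve(configJSON []byte, agent string) (CapabilitySet, error) {
	var cfg grantsFile
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return nil, err
	}
	set := CapabilitySet{}
	for _, c := range defaultBundle {
		set[c] = true
	}
	for _, c := range cfg.Agents[agent] {
		set[c] = true
	}
	return set, nil
}
```

A privileged code path then calls something like caps.Require("screen_recording") before acting, so being authenticated is never enough on its own.
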
4. Every spawn is sandboxed by default

The default policy is that an agent gets workspace-write, network, subprocess, read-anywhere — and that's it. Want screen recording? Explicit grant. Want write outside the workspace? Explicit grant. Want full disk access? Explicit grant. The point isn't to make the user fight the system; it's to make the dangerous things visible and consented-to.
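
To make the default-deny idea concrete, here is a hedged sketch that turns capability names into a Seatbelt profile. The SBPL rules shown reflect my understanding of a profile language Apple does not publicly document, and a profile this small would likely need more allowances before real programs run under it. renderProfile and defaultCaps are illustrative names, not Celistra's renderer.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultCaps is the bundle named in the text; anything else needs a grant.
var defaultCaps = []string{"workspace_rw", "read_anywhere", "network", "subprocess"}

// renderProfile turns capability names into Seatbelt (SBPL) profile text.
// Unknown capabilities are an error, never silently ignored.
func renderProfile(caps []string, workspace string) (string, error) {
	var b strings.Builder
	b.WriteString("(version 1)\n(deny default)\n") // start from deny-all
	for _, c := range caps {
		switch c {
		case "workspace_rw":
			// Writes are confined to the workspace subtree.
			fmt.Fprintf(&b, "(allow file-write* (subpath %q))\n", workspace)
		case "read_anywhere":
			b.WriteString("(allow file-read*)\n")
		case "network":
			b.WriteString("(allow network*)\n")
		case "subprocess":
			b.WriteString("(allow process-fork)\n(allow process-exec)\n")
		default:
			return "", fmt.Errorf("no sandbox rule for capability %q", c)
		}
	}
	return b.String(), nil
}
```

Because the profile opens with a deny-all rule, forgetting to grant something fails closed: the agent loses the ability instead of silently gaining it.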

5. The audit log is append-only and verifiable

Every grant, every spawn, every kernel block decision goes into a hash-chained log: row.hash = sha256(prev.hash || row.body). Hourly verification re-walks the chain. If a row's hash doesn't match, you know exactly which row was tampered with. The log is the answer to "what did the agent do": not console.log, not memory, not "I think it was…"
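
The chain formula above fits in a few lines of Go. This is a minimal in-memory sketch, not the @axy/audit-chain implementation: the Row type, the all-zero genesis hash, and the function names are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// genesis is the assumed "previous hash" of the first row: 64 hex zeros,
// so every hash input starts with a fixed-width prefix.
const genesis = "0000000000000000000000000000000000000000000000000000000000000000"

// Row is one audit entry. Hash covers the previous row's hash and this body.
type Row struct {
	Body string
	Hash string // hex(sha256(prev.Hash || Body))
}

func rowHash(prevHash, body string) string {
	sum := sha256.Sum256([]byte(prevHash + body))
	return hex.EncodeToString(sum[:])
}

// Append adds a row chained to the current tail of the log.
func Append(rows []Row, body string) []Row {
	prev := genesis
	if len(rows) > 0 {
		prev = rows[len(rows)-1].Hash
	}
	return append(rows, Row{Body: body, Hash: rowHash(prev, body)})
}

// Verify re-walks the chain and returns the index of the first row whose
// stored hash does not match its recomputed hash, or -1 if all rows match.
func Verify(rows []Row) int {
	prev := genesis
	for i, r := range rows {
		if r.Hash != rowHash(prev, r.Body) {
			return i
		}
		prev = r.Hash
	}
	return -1
}
```

Editing a row's body makes Verify return that row's index. Someone who can rewrite every later hash as well could still forge a consistent chain, so the latest hash needs to be anchored somewhere the writer cannot reach.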

What this looks like in code

// Pseudo-code for a zero-trust process spawn (helpers are not shown; assumes os/exec is imported)

func handleSpawn(req SpawnRequest) error {
  // 1. Verify identity
  uid, err := verifyFirebaseIDToken(req.Token, jwksCache)
  if err != nil { return ErrUnauthorized }
  if !cfg.AllowedUIDs.Has(uid) { return ErrUnauthorized }

  // 2. Resolve capabilities for this agent
  caps := resolveCapabilities(req.AgentName, req.Source, req.Command)
  // (caps come from ~/.celistra_capabilities.json, a versioned config)

  // 3. Render Seatbelt profile from caps
  profile := renderSeatbeltProfile(caps, cfg.Workspace)

  // 4. Spawn under the profile
  // Go cannot mix fixed arguments with a spread slice, so build args first.
  args := append([]string{"-f", profile}, req.Command...)
  cmd := exec.Command("sandbox-exec", args...)
  cmd.Dir = cfg.Workspace
  err = cmd.Start()

  // 5. Audit: append, hash-chain, return
  // cmd.Process is nil when Start fails, so failed spawns are logged with pid 0.
  pid := 0
  if err == nil { pid = cmd.Process.Pid }
  appendAuditEntry(uid, req.AgentName, caps, req.Command, profile, pid)
  return err
}

Five steps. None of them is "and trust that the caller is who they say they are." Each step is checking something.

What it doesn't mean

Zero-trust isn't paranoid. It's not "every keystroke triggers Face ID." It's "the things that could blow your foot off require an explicit gesture, and the things that don't are just frictionless." The capability list defaults to a sensible bundle (workspace_rw, read_anywhere, network, subprocess) — most agents never need more, and they get it without prompting. The friction is reserved for the dangerous capabilities.

The reason this matters

AI agents in particular are bad at "did the user really mean this." They follow instructions the user maybe didn't write — prompt injection from a webpage, a bad cell in a CSV, a malicious README. The agent doesn't know it's being lied to. The infrastructure has to be the layer that says "no, the workspace boundary still holds, even if the agent really wants to write outside it."

Zero-trust process orchestration is what catches that.

What we ship

Celistra implements all five properties. Identity via Firebase + offline JWKS. Capabilities in versioned JSON. Sandbox via macOS Seatbelt (with an equivalent mechanism on Linux). Audit chain via the open-source @axy/audit-chain library. Network: loopback by default, tunnel via Firebase-gated frp.

FAQ

Is zero-trust the same as 'no SSH'?

No. SSH can be zero-trust if you treat the SSH cert as one input among many: you still need a capability check and an audit trail. The crime is treating SSH access as 'and now I can do anything.'

Does this slow down spawns?

Negligibly. JWKS verify is offline (cache hit, ~50µs). Capability resolve is a config lookup (~10µs). Seatbelt profile render is ~1ms (template + write). Audit append is one Firestore document write, issued asynchronously so it doesn't block the spawn return. Total: 5–15ms overhead on a sub-second spawn.

Is this the same as HashiCorp Boundary?

Boundary is for human-to-machine sessions (replace SSH/RDP). Zero-trust process orchestration is for human-to-process and process-to-process. Overlap exists but the abstractions differ.

Where can I read more?

Forrester's Zero Trust eXtended (ZTX) framework covers the principles. NIST SP 800-207, "Zero Trust Architecture," is the canonical reference. Both predate the agent context, but the principles map cleanly.