The Audit Log Every AI Agent Needs

Akshay Sarode
Direct answer

Every privileged action — every spawn, every capability grant, every kernel block, every permission ask — gets a row in an append-only log. The log is hash-chained: row.hash = sha256(prev.hash || row.body). An hourly verifier re-walks the chain. The log is the answer to 'what did the agent do' — not console.log, not memory.

The first time you let an AI agent loose, you don't know what to log. The fifth time, you wish you'd been logging. The fiftieth time, an agent does something weird and you go looking for the log and the log is incomplete and you can't tell what happened.

This post is the structure I converged on. It's small. It's been useful.

What goes in the log

Every action that the system would care about if subpoenaed. For an AI agent supervisor, that's: agent spawns (with the capabilities resolved at spawn time), capability grants, kernel/sandbox blocks, sandbox toggles, and permission asks with their decisions.

Notably not in the log: every keystroke. Every model token. The 30-day SQLite history is for that. The audit log is for events with consequences.

The hash chain

Each row stores: payload + hash of (previous_hash || payload).

row[N].hash = sha256(row[N-1].hash || serialize(row[N].body))
row[0].hash = sha256("" || serialize(row[0].body))

Two properties:

Tamper evidence: change any row's body and its stored hash no longer matches, so verification fails at exactly that row. Completeness: insert or delete a row in the middle and every hash after it breaks, because each hash folds in the one before.

Each row stores: seq, ts, actor (uid or "system"), action, target, body (action-specific JSON), prevHash, hash.
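A minimal append sketch in Python, using the row fields above. This is an illustration against an in-memory list, not the actual @axy/audit-chain or axy-audit-chain API; only the body is hashed, per the formula above.

```python
import hashlib
import json
import time

def serialize(body: dict) -> str:
    # Serialization must be canonical (stable key order), or re-verifying
    # the same body later would produce a different hash.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))

def row_hash(prev_hash: str, body: dict) -> str:
    # row.hash = sha256(prev.hash || serialize(row.body))
    return hashlib.sha256((prev_hash + serialize(body)).encode()).hexdigest()

def append(log: list, actor: str, action: str, target: str, body: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else ""  # genesis row chains from ""
    row = {
        "seq": len(log),
        "ts": time.time(),
        "actor": actor,
        "action": action,
        "target": target,
        "body": body,
        "prevHash": prev_hash,
        "hash": row_hash(prev_hash, body),
    }
    log.append(row)
    return row
```

A production chain might fold seq, ts, and actor into the hashed payload too; the sketch hashes only body to stay faithful to the formula above.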

The verifier

A worker walks the chain hourly:

prev := ""
for seq := 0; seq <= lastSeq; seq++ {
  row := getRow(seq)
  expected := sha256(prev || serialize(row.body))
  if expected != row.hash {
    alertVerificationFailed(seq)
    return
  }
  prev = row.hash
}
markVerifiedThrough(lastSeq, time.Now())

If verification fails, page someone. In the 99.999% case everything is fine, and that continuous record of "fine" is exactly what makes the log trustworthy when tomorrow's question arrives.
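The same walk in runnable Python, over rows shaped like the sketch above (only body and hash are assumed); alerting is replaced by a plain return value.

```python
import hashlib
import json

def serialize(body: dict) -> str:
    return json.dumps(body, sort_keys=True, separators=(",", ":"))

def verify(rows: list) -> tuple:
    """Re-walk the chain; return (ok, first_bad_seq)."""
    prev = ""
    for seq, row in enumerate(rows):
        expected = hashlib.sha256((prev + serialize(row["body"])).encode()).hexdigest()
        if expected != row["hash"]:
            return False, seq  # would be alertVerificationFailed(seq)
        prev = row["hash"]
    return True, None
```

Tamper with any row's body and verification fails at that row's seq, because the stored hash no longer matches the recomputed one.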

Where the log lives

For Celistra: every row is a Firestore doc at audit/{seq}. The hourly verifier is a Cloud Function. The "view audit log" UI is an SSE stream from the daemon (live tail) plus a paginated query of older rows.

For self-hosted: the same code (@axy/audit-chain in TS, axy-audit-chain in Python) writes to whatever store you back it with — Firestore, Postgres, S3-versioned objects.

What this enables

"Did the agent really push to main?" Find rows where action=permission-asked + target=git-push + decided=approved. There's the row. There's who approved. There's when.

"When did the sandbox come down?" Find rows where action=sandbox-toggled. The most recent shows on/off and who flipped it.

"What capabilities did agent X have when it ran on Tuesday?" Find the spawn row for that agent on that date. The capabilities resolved at spawn time are in the body.

"How many block decisions has the sandbox made this week?" Count rows where action=sandbox-block. (Cheap on Firestore with a composite index on action + ts.)
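Against an exported list of rows, the four questions above reduce to simple filters. A sketch (the query helper and the field placement are illustrative, not the real API; action and target are top-level fields, decided lives in body):

```python
def query(rows: list, **where) -> list:
    # A row matches if every key/value pair is found either at the top
    # level of the row or inside its body.
    def matches(row):
        return all(row.get(k) == v or row.get("body", {}).get(k) == v
                   for k, v in where.items())
    return [r for r in rows if matches(r)]

# "Did the agent really push to main?"
#   query(rows, action="permission-asked", target="git-push", decided="approved")
# "How many block decisions this week?"
#   len(query(rows, action="sandbox-block"))
```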

What this doesn't enable

The log is a record of things the system did, not of things the agent's model thought. Internal model reasoning isn't logged. If you need that level, you're past audit and into observability — Langfuse, LangSmith, OpenLIT.

The log also doesn't replace privacy. PII in body stays there. If you can't afford to retain certain fields, redact at append time.

Why open-source the chain code

Because the value of an audit chain is the property "I can verify it." If only the vendor has the code, the vendor is also the only one who can verify. Open-source means anyone can verify the chain you exported. Both libs are under 200 LoC; both are Apache-2.0.

What we ship

The Celistra daemon writes to a local audit log + the Firebase project's audit collection. The hourly verifier runs as a Cloud Function. The dashboard's "Activity" panel is an SSE stream from the daemon. The @axy/audit-chain library is the same code Ujex uses for its agent-infra audit trail.

FAQ

Why hash chain instead of just signing each row?

Signing each row catches modification of that row; a hash chain catches modification AND insertion/deletion in the middle, given an anchored head hash to compare against. Both are useful, but the chain buys the stronger property for one extra field per row.

How big does the log get?

Each row is ~500 bytes. A heavy day might have 5,000 events. Yearly: ~1GB. Cheap on any store. Tier old rows to cold storage if needed.
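The back-of-envelope behind "~1GB", spelled out:

```python
row_bytes = 500
events_per_day = 5_000                      # a heavy day
bytes_per_day = row_bytes * events_per_day  # 2.5 MB/day
bytes_per_year = bytes_per_day * 365        # ~0.91 GB/year
```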

Is the chain GDPR-friendly?

The chain itself is — rows are append-only and verifiable. PII inside row bodies is your job. Redact at append, not after.
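Redact-at-append in sketch form: the hash is computed over the already-redacted body, so the chain verifies forever without the raw PII. The field names here are hypothetical, not part of any real schema.

```python
import hashlib
import json

PII_FIELDS = {"email", "home_dir"}  # hypothetical PII field names

def redact(body: dict) -> dict:
    # Mask PII before the body is serialized and hashed.
    return {k: ("[redacted]" if k in PII_FIELDS else v) for k, v in body.items()}

def append_redacted(log: list, body: dict) -> dict:
    clean = redact(body)
    prev = log[-1]["hash"] if log else ""
    payload = json.dumps(clean, sort_keys=True, separators=(",", ":"))
    row = {"body": clean,
           "hash": hashlib.sha256((prev + payload).encode()).hexdigest()}
    log.append(row)
    return row
```

Redacting after append would change the body under an already-computed hash and break the chain, which is exactly why the redaction has to happen at append time.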

Can I export the log to prove a property to a third party?

Yes. Export the rows plus the hash of the last verified row. The third party runs the same verifier (open-source), reaches the same hash, and knows the export wasn't tampered with.
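The third party's check, sketched: re-derive the chain over the exported rows and compare the final hash against the published head (published_head_hash is a placeholder for whatever hash you handed them).

```python
import hashlib
import json

def head_hash(rows: list) -> str:
    # Re-derive the whole chain; the final hash is the head.
    prev = ""
    for row in rows:
        payload = json.dumps(row["body"], sort_keys=True, separators=(",", ":"))
        prev = hashlib.sha256((prev + payload).encode()).hexdigest()
        if prev != row["hash"]:
            raise ValueError(f"chain breaks at seq {row.get('seq')}")
    return prev

# verified = head_hash(exported_rows) == published_head_hash
```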