The first description of Zed I ever wrote called it “a tool that compiles declarative deployment specifications into convergence operations against ZFS.” The sentence was true and useless. It told you what category Zed belonged to and nothing about what made it interesting. After nine days of iteration that landed five layers on top of the existing four, the right elevator pitch is shorter:
Zed is a deploy tool that lives inside the filesystem it deploys to.
The state of every running app is a ZFS user property. The secret
material is an encrypted dataset whose mount point produces files
the BEAM can read. The jails are managed by a 1048-star FreeBSD CLI
called Bastille, wrapped in an Elixir adapter that converts the
CLI’s soft contract into a hard one. The privilege model is two
Unix users connected by a Unix-domain socket whose authentication
mechanism is getpeereid(2). The whole thing is about
three thousand lines of Elixir, plus eighty lines of C for one NIF.
There is no etcd. There is no Vault. There is no Kubernetes. There
is no Docker. There are two mix release targets that
each produce a self-contained tarball. You install the tarball.
You run it. The system tells you what state it’s in by reading the
filesystem.
This is a reintroduction — not a retrospective. The previous post on this site, The Lie at Exit Zero, was the messy birth of the Bastille adapter. This is what the toddler looks like.
The DSL
A Zed deployment is an Elixir module. It does not generate YAML. It
does not consume YAML. The compile step IS the validation step:
unknown verbs, broken references, and forbidden storage modes all
fail at mix compile with a source location.
defmodule MyInfra.Trading do
  use Zed.DSL

  deploy :trading, pool: "jeff" do
    dataset "apps/exmc" do
      compression :lz4
      quota "50G"
    end

    app :exmc do
      dataset "apps/exmc"
      version "1.4.2"
      cookie {:secret, :beam_cookie, :value}
      env_file "/etc/exmc/env"
      health :beam_ping, timeout: 5_000
    end

    jail :trading_jail do
      dataset "jails/trading"
      ip4 "10.0.1.100/24"
      contains :exmc
    end

    snapshots do
      before_deploy true
      keep 5
    end
  end
end
This is not a template. This is not a configuration file. This is
the actual specification. MyInfra.Trading.converge/0
diffs current ZFS state against this IR, plans an ordered sequence
of operations, snapshots before mutations, and rolls back the entire
plan if any single step fails. The rollback is one ZFS command and
runs in constant time regardless of dataset size.
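To make that concrete, here is a hypothetical session; the step names and return shape below are illustrative assumptions, not Zed's actual API:

iex> MyInfra.Trading.converge()
{:ok, %{
  snapshot_pre: "jeff/apps/exmc@zed-1.4.2-20260425",
  applied: [
    {:zfs_set, "jeff/apps/exmc", quota: "50G"},
    {:app_release, :exmc, "1.4.2"},
    {:jail_ensure, :trading_jail}
  ]
}}

# a failed step unwinds the whole plan to the pre-snapshot
{:error, %{
  failed: {:jail_ensure, :trading_jail},
  rolled_back_to: "jeff/apps/exmc@zed-1.4.2-20260425"
}}

The constant-time claim is ZFS's doing, not Zed's: zfs rollback moves the dataset head back to the snapshot rather than copying data, so a 50G dataset rolls back as fast as an empty one.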
The cookie {:secret, :beam_cookie, :value} reference
is the part that wasn’t there a month ago. We’ll come back to it.
The state model: ZFS user properties
Every ZFS dataset can carry arbitrary string-keyed metadata. The
keys live in a namespace; ours is com.zed. The values
travel with the dataset through snapshots and through
zfs send | zfs receive over an SSH pipe. There is no
external state store because the state IS the dataset.
$ zfs get all jeff/apps/exmc | grep com.zed
jeff/apps/exmc  com.zed:managed       true
jeff/apps/exmc  com.zed:app           exmc
jeff/apps/exmc  com.zed:version       1.4.2
jeff/apps/exmc  com.zed:prev_version  1.4.1
jeff/apps/exmc  com.zed:deployed_at   2026-04-25T17:00:00Z
jeff/apps/exmc  com.zed:health        passing
jeff/apps/exmc  com.zed:snapshot_pre  jeff/apps/exmc@zed-1.4.2-20260425
That output is the entire deployment state for that app. To see
what’s deployed across a host: zfs get -r -s local
all jeff. To replicate a deployment to another host:
zfs send | ssh other-host zfs receive. The metadata
arrives with the data, and Zed on the other side knows what was
deployed because it reads the same property keys.
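Reading that state back from Elixir takes one shellout. A minimal sketch, with a hypothetical module name (Zed's own plumbing differs):

defmodule ZfsProps do
  # Hypothetical sketch: collect the locally-set com.zed:* properties
  # of a dataset into a map. -H gives tab-separated output with no
  # header; -s local filters to properties set on this dataset.
  def read(dataset) do
    {out, 0} =
      System.cmd("zfs", ["get", "-H", "-s", "local", "-o", "property,value", "all", dataset])

    lines = String.split(out, "\n", trim: true)

    for [prop, value] <- Enum.map(lines, &String.split(&1, "\t", parts: 2)),
        String.starts_with?(prop, "com.zed:"),
        into: %{} do
      {String.trim_leading(prop, "com.zed:"), value}
    end
  end
end

# ZfsProps.read("jeff/apps/exmc")
# => %{"app" => "exmc", "version" => "1.4.2", "health" => "passing", ...}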
The trick is mundane. ZFS user properties have existed since 2007. What’s strange is how rarely deployment tools use them. Most reach for an external state store first and discover only later that the filesystem already had the right primitive.
Secrets: an encrypted dataset, plus fingerprints in properties
The secrets pipeline is the part that took the most arguing to get right. The original design proposal had two forks: store secrets in ZFS user properties (Fork 1) or in age-encrypted files (Fork 2). The forum chose neither.
What shipped is a third option: an encrypted ZFS dataset for the secret values, plus user properties on the parent dataset for the fingerprints only.
jeff/zed           # carries com.zed:* metadata
jeff/zed/secrets   # encrypted (aes-256-gcm), canmount=noauto
                   # mounted at /var/db/zed/secrets when unlocked

# files inside the encrypted mountpoint
/var/db/zed/secrets/beam_cookie           (mode 0400)
/var/db/zed/secrets/admin_passwd          (mode 0400, PHC-formatted hash)
/var/db/zed/secrets/ssh_host_ed25519      (mode 0400, raw private key)
/var/db/zed/secrets/ssh_host_ed25519.pub  (mode 0444, raw public key)

# fingerprints stamped on the parent dataset
$ zfs get -s local all jeff/zed
NAME      PROPERTY                                     VALUE
jeff/zed  com.zed:secret.beam_cookie.fingerprint      sha256:e3b0c4...
jeff/zed  com.zed:secret.beam_cookie.algo             random_256_b64
jeff/zed  com.zed:secret.beam_cookie.consumers        beam
jeff/zed  com.zed:secret.beam_cookie.rotation_count   3
jeff/zed  com.zed:secret.beam_cookie.last_rotated_at  2026-04-25T17:00:00Z
The properties are readable by anyone with zfs get
rights. The values are not — they live behind the dataset's
encryption. zed bootstrap verify reads each file,
recomputes its fingerprint, and tells you which slots have drifted
from what was stamped. Drift is a fingerprint mismatch; the values
themselves never appear in the verify output.
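The verification core is a few lines. A sketch of the shape with hypothetical names; real Zed reads the stamped fingerprint from the dataset property rather than taking it as an argument:

defmodule SecretVerify do
  # Hypothetical sketch: recompute one slot's fingerprint from the
  # file behind the encrypted mountpoint and compare it to the value
  # stamped on the parent dataset (passed in here for simplicity).
  def check(slot, stamped, mountpoint \\ "/var/db/zed/secrets") do
    actual =
      mountpoint
      |> Path.join(to_string(slot))
      |> File.read!()
      |> then(&:crypto.hash(:sha256, &1))
      |> Base.encode16(case: :lower)

    if stamped == "sha256:" <> actual, do: :ok, else: {:drift, slot}
  end
end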
Rotation, with archive
zed bootstrap rotate beam_cookie generates fresh
material, archives the old file under
<mountpoint>/_archive/<slot>-<timestamp>/,
writes the new value, stamps the new fingerprint, increments
com.zed:secret.beam_cookie.rotation_count, records the
prior fingerprint as prev_fingerprint, and snapshots
the zed dataset both before and after the operation. The return
value carries a restart plan derived from the slot’s declared
consumers:
{:ok, %{
  slot: :beam_cookie,
  prev_fingerprint: "sha256:e3b0c4...",
  new_fingerprint: "sha256:9ac1b8...",
  archive_path: "/var/db/zed/secrets/_archive/beam_cookie-20260425T173000123456",
  rotation_count: 4,
  snapshot_pre: "jeff/zed@rotate-pre-beam_cookie-20260425T173000123456",
  snapshot_post: "jeff/zed@rotate-post-beam_cookie-20260425T173000123456",
  restart_plan: [:beam]
}}
The microsecond suffix on the timestamp is there because the first live test on a Mac Pro discovered that two consecutive rotates within the same wall-clock second produced identical snapshot names and ZFS rejected the second. This is the kind of thing the test fixture is for.
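The fix is mechanical: fold the microsecond field into the suffix. A sketch with hypothetical names:

defmodule SnapName do
  # Hypothetical sketch: a snapshot suffix that stays unique across
  # two rotates in the same wall-clock second by appending the
  # microsecond field to the usual second-resolution timestamp.
  def suffix(now \\ DateTime.utc_now()) do
    {us, _precision} = now.microsecond

    Calendar.strftime(now, "%Y%m%dT%H%M%S") <>
      String.pad_leading(Integer.to_string(us), 6, "0")
  end
end

# SnapName.suffix()
# => "20260425T173000123456"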
A multi-file slot — :ssh_host_ed25519 has both a
private and a public file — archives both into the same
directory. The archive is a complete prior state; if the rotation
turns out to have been a mistake, you cat the old file back into
place and stamp the old fingerprint. Or roll back the snapshot.
The two paths exist for different operator instincts.
Bastille: the jail backend
Bastille is a 1048-star pure-shell FreeBSD jail manager, BSD-licensed, maintained by Christer Edwards. It does what we’d otherwise have to write: cloned interfaces, jail.conf rendering, pf rdr rules, ZFS dataset snapshotting per-jail. We chose to depend on it rather than reinvent it.
What we wrote was an adapter. The adapter is 540 lines of Elixir
(Zed.Platform.Bastille) plus a Runner behaviour and a
Mock for unit tests. Each public function returns
:ok | {:error, reason} with a typed reason; each
shellout goes through doas; each destructive operation
verifies its own post-condition by re-reading the world after the
exit code says success.
The post-condition check is the line that earned the previous
blog post’s title. bastille destroy -f exits 0 when
asked to destroy a running jail without -a. It says
nothing in the output, leaves the jail running, and lies about the
exit code. The adapter calls destroy, then calls
exists?/1, and returns {:error,
{:destroy_did_nothing, name}} when the truth and the exit
code disagree.
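Reduced to a skeleton, the pattern looks like this; a sketch only, since the real adapter routes the shellout through its Runner behaviour so the Mock can stand in during unit tests:

defmodule BastilleSketch do
  # Skeleton of the post-condition check: destroy, then re-read the
  # world instead of trusting the exit code.
  def destroy(name) do
    with {:ok, _out} <- run(["destroy", name]),
         false <- exists?(name) do
      :ok
    else
      true -> {:error, {:destroy_did_nothing, name}}
      {:error, reason} -> {:error, reason}
    end
  end

  defp run(args) do
    case System.cmd("doas", ["bastille" | args], stderr_to_stdout: true) do
      {out, 0} -> {:ok, out}
      {out, code} -> {:error, {:exit, code, out}}
    end
  end

  defp exists?(name) do
    case run(["list"]) do
      {:ok, out} -> String.contains?(out, name)
      _ -> false
    end
  end
end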
This pattern — verify the truth, not the exit code — is not unique to Bastille. Every adapter we’ve written since has the same shape. The exit code is what the upstream tool says. The post-condition is what we know.
The privilege boundary
The most recent layer, A5a, splits the BEAM into two processes running as two Unix users.
zedweb (uid 8501)                   zedops (uid 8502)
─────────────────                   ─────────────────
Phoenix endpoint                    Zed.Ops.Socket ──── /var/run/zed/ops.sock
LiveView controllers                Zed.Ops.Bastille.Handler
OTT ledger                          Runner.System ──── doas bastille ...
Passkey + SSH-key auth                                 doas zfs ...
WebSocket                                              doas pfctl ...
Audit log writer (planned: A5b.3)

Has zero privileges                 Holds the per-verb doas rules
zedweb connects to a Unix socket at
/var/run/zed/ops.sock. zedops accepts the
connection and immediately calls a small NIF that wraps
getpeereid(2) on FreeBSD or
getsockopt(SO_PEERCRED) on Linux. Wrong UID → the
socket closes before a single byte of request body flows. Right UID
→ the connection enters a request loop carrying length-prefixed
Erlang term-binary frames.
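A sketch of the ops-side accept path; PeerCred.uid/1 stands in for the NIF, the module names are hypothetical, and one connection at a time is enough for a sketch since the only legitimate client is zedweb:

defmodule OpsSocketSketch do
  @sock_path "/var/run/zed/ops.sock"
  @web_uid 8501

  def listen do
    # port 0 + {:local, path} gives a Unix-domain socket; packet: 4
    # has the runtime do the four-byte length-prefix framing.
    {:ok, lsock} =
      :gen_tcp.listen(0, [:binary, ifaddr: {:local, @sock_path}, packet: 4, active: false])

    accept_loop(lsock)
  end

  defp accept_loop(lsock) do
    {:ok, sock} = :gen_tcp.accept(lsock)

    case PeerCred.uid(sock) do
      # right UID: enter the request loop
      {:ok, @web_uid} -> serve(sock)
      # wrong UID: close before reading a single byte
      _ -> :gen_tcp.close(sock)
    end

    accept_loop(lsock)
  end

  defp serve(sock) do
    with {:ok, frame} <- :gen_tcp.recv(sock, 0) do
      reply = handle(:erlang.binary_to_term(frame, [:safe]))
      :ok = :gen_tcp.send(sock, :erlang.term_to_binary(reply))
      serve(sock)
    end
  end

  # stand-in for the real Bastille handler dispatch
  defp handle(request), do: {:ok, request}
end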
An attacker who exploits a LiveView controller bug to call
System.cmd("bastille", ["destroy", "every-jail"]) in
the BEAM gets :eacces from the kernel, not root. The
attacker has zedweb’s view of the world, which is
read-only secret metadata and a connection to a process that
refuses to do anything except what its doas rules
permit.
The doas rules are per-verb, not catch-all. These
are the lines the rules now contain:
permit nopass zedops as root cmd bastille args create
permit nopass zedops as root cmd bastille args start
permit nopass zedops as root cmd bastille args stop
permit nopass zedops as root cmd bastille args list
...
# destructive ops require step-up auth (planned: A5c)
permit zedops as root cmd bastille args destroy
permit zedops as root cmd zfs args destroy
permit zedops as root cmd zfs args rollback
The boundary is a process boundary, not a code boundary. Every
module is loaded in every release. The role —
:web, :ops, or :full for
single-process dev mode — is decided at boot from the release
name. A release named zedweb that boots without
ZED_ROLE=web in its environment now refuses to start.
The supervisor crash takes the BEAM down before any socket binds.
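The guard itself is small; a hypothetical sketch of the shape, using mix release's RELEASE_NAME variable:

defmodule RoleGuard do
  # Hypothetical sketch of the boot-time guard: the release name and
  # ZED_ROLE must agree, or the app raises before any supervisor starts.
  def decide! do
    case {System.get_env("RELEASE_NAME"), System.get_env("ZED_ROLE")} do
      {"zedweb", "web"} -> :web
      {"zedops", "ops"} -> :ops
      {nil, _} -> :full                 # plain `iex -S mix` dev session
      pair -> raise "release/role mismatch: #{inspect(pair)}"
    end
  end
end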
What this gives you
Zed is for the kind of shop that runs ten BEAM apps across two or three FreeBSD hosts and finds Kubernetes ridiculous. Most companies fit this description. Most companies are running Kubernetes anyway. Zed is the alternative for the ones who notice.
What’s in the box, after A0 through A5a + A1.rotate:
- A declarative DSL with compile-time validation. No YAML, no Helm, no Kustomize, no Jsonnet.
- State storage that requires no external system. ZFS user properties, replicated by zfs send.
- Encrypted secrets with rotation, archive, fingerprint verification, restart-plan derivation.
- FreeBSD jail lifecycle through a typed adapter that catches the upstream CLI's lies.
- A process-boundary privilege model with getpeereid-authenticated IPC and capability-scoped doas.
- An idempotent host-bring-up script that lays down users, network, pf, doas rules, audit directories.
- An admin LiveView with password, passkey (WebAuthn), SSH-key challenge, and QR pairing.
- A self-contained release per role. tar, scp, ./bin/zedweb start.
What’s not yet in the box:
- Multi-host orchestration. Single-host today. The cluster shape is sketched in the iteration plan; no code yet.
- Mobile QR scanner (B0 in the plan). The desktop QR pairing flow exists; the phone-side fork of Probnik is next.
- Step-up auth for destructive ops (A5c in the plan). The doas rules already require it; the LiveView modal does not exist yet.
- illumos parity. SMF + zones, mirroring the FreeBSD path. Same DSL, different platform backend.
Three thousand lines doesn’t feel like much for what it does. That is partly because the verbs being assembled are all things FreeBSD already had: jails, ZFS user properties, doas, encrypted datasets. The work is in connecting them in a way that survives the small betrayals each component commits when nobody is watching.
How it’s tested
216 unit and integration tests on the developer laptop. The laptop is a Linux box; ZFS and Bastille aren’t available there. So we have two additional suites that run only on FreeBSD:
- :zfs_live — 24 tests. Stand up an encrypted dataset under a delegated parent (ZED_TEST_DATASET=zroot/zed-test), run the full bootstrap / status / verify / rotate / export-pubkey cycle, tear down. Caught the sub-second snapshot collision in the rotate loop.
- :bastille_live — 7 tests. Create a jail, run uname -s inside it, stop, destroy, verify the destroy actually destroyed. Two of the seven exercise the full A5a privilege boundary — the test runner connects through the Unix socket and back.
The pattern by now is: write the unit tests against a Mock, land the implementation, push the branch, ssh into one of two FreeBSD Mac Pros, run the live tag. When a real-world quirk surfaces, the live suite catches it in seconds rather than a production deploy surfacing it later. This is what the test fixture is for.
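The tag wiring is ordinary ExUnit, roughly this, assuming the live tags are excluded by default in test_helper.exs:

# test/test_helper.exs: live suites are opt-in on the FreeBSD boxes
ExUnit.configure(exclude: [:zfs_live, :bastille_live])
ExUnit.start()

# a live test module opts its whole file into the tag
defmodule Zed.ZfsLiveTest do
  use ExUnit.Case
  @moduletag :zfs_live

  setup_all do
    {:ok, dataset: System.fetch_env!("ZED_TEST_DATASET")}
  end
end

# on the FreeBSD box:
#   ZED_TEST_DATASET=zroot/zed-test mix test --only zfs_live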
How to look at it
The repo is at github.com/borodark/zed.
The iteration plan is in specs/iteration-plan.md; the
per-layer specs (A5 Bastille, A5a privilege boundary, B0 mobile
companion) are siblings of it. The ElixirForum thread that started
the secrets-design conversation is the prequel to all of this.
The license is Apache 2.0. The status is pre-1.0,
single-maintainer, design-iterating — so PRs are welcome but
the design surface is still being negotiated. If you’re running ten
BEAM apps on two FreeBSD hosts and you’re tired of explaining what
your helm upgrade does, this might be the project to
watch.
The next layer to land is B0 — the mobile companion app for QR pairing, forked from Probnik. After that, a privileged step-up flow for destructive operations (A5c) and then probably the multi-host shape. None of these change what Zed already is, only what it can do.