The first description of Zed I ever wrote called it “a tool that compiles declarative deployment specifications into convergence operations against ZFS.” The sentence was true and useless. It told you what category Zed belonged to and nothing about what made it interesting. After nine days of iteration that landed five layers on top of the existing four, the right elevator pitch is shorter:
Zed is a deploy tool that lives inside the filesystem it deploys to.
The state of every running app is a ZFS user property. The secret
material is an encrypted dataset whose mount point produces files
the BEAM can read. The jails are managed by a 1048-star FreeBSD CLI
called Bastille, wrapped in an Elixir adapter that converts the
CLI’s soft contract into a hard one. The privilege model is two
Unix users connected by a Unix-domain socket whose authentication
mechanism is getpeereid(2). The whole thing is about
three thousand lines of Elixir, plus eighty lines of C for one NIF.
There is no etcd. There is no Vault. There is no Kubernetes. There
is no Docker. There are two mix release targets that
each produce a self-contained tarball. You install the tarball.
You run it. The system tells you what state it’s in by reading the
filesystem.
This is a reintroduction — not a retrospective. The previous post on this site, The Lie at Exit Zero, was the messy birth of the Bastille adapter. This is what the toddler looks like.
The DSL
A Zed deployment is an Elixir module. It does not generate YAML. It
does not consume YAML. The compile step IS the validation step:
unknown verbs, broken references, and forbidden storage modes all
fail at mix compile with a source location.
defmodule MyInfra.Trading do
  use Zed.DSL

  deploy :trading, pool: "jeff" do
    dataset "apps/exmc" do
      compression :lz4
      quota "50G"
    end

    app :exmc do
      dataset "apps/exmc"
      version "1.4.2"
      cookie {:secret, :beam_cookie, :value}
      env_file "/etc/exmc/env"
      health :beam_ping, timeout: 5_000
    end

    jail :trading_jail do
      dataset "jails/trading"
      ip4 "10.0.1.100/24"
      contains :exmc
    end

    snapshots do
      before_deploy true
      keep 5
    end
  end
end
This is not a template. This is not a configuration file. This is
the actual specification. MyInfra.Trading.converge/0
diffs current ZFS state against this IR, plans an ordered sequence
of operations, snapshots before mutations, and rolls back the entire
plan if any single step fails. The rollback is one ZFS command and
runs in constant time regardless of dataset size.
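To make that concrete, here is a hypothetical session; the step names and return shape below are illustrative assumptions, not Zed's actual API:

iex> MyInfra.Trading.converge()
{:ok, %{
  snapshot_pre: "jeff/apps/exmc@zed-1.4.2-20260425",
  applied: [
    {:zfs_set, "jeff/apps/exmc", quota: "50G"},
    {:app_release, :exmc, "1.4.2"},
    {:jail_ensure, :trading_jail}
  ]
}}

# a failed step unwinds the whole plan to the pre-snapshot
{:error, %{
  failed: {:jail_ensure, :trading_jail},
  rolled_back_to: "jeff/apps/exmc@zed-1.4.2-20260425"
}}

The constant-time claim is ZFS's doing, not Zed's: zfs rollback moves the dataset head back to the snapshot rather than copying data, so a 50G dataset rolls back as fast as an empty one.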
The cookie {:secret, :beam_cookie, :value} reference
is the part that wasn’t there a month ago. We’ll come back to it.
The state model: ZFS user properties
Every ZFS dataset can carry arbitrary string-keyed metadata. The
keys live in a namespace; ours is com.zed. The values
travel with the dataset through snapshots and through
zfs send | zfs receive over an SSH pipe. There is no
external state store because the state IS the dataset.
$ zfs get all jeff/apps/exmc | grep com.zed
jeff/apps/exmc  com.zed:managed       true
jeff/apps/exmc  com.zed:app           exmc
jeff/apps/exmc  com.zed:version       1.4.2
jeff/apps/exmc  com.zed:prev_version  1.4.1
jeff/apps/exmc  com.zed:deployed_at   2026-04-25T17:00:00Z
jeff/apps/exmc  com.zed:health        passing
jeff/apps/exmc  com.zed:snapshot_pre  jeff/apps/exmc@zed-1.4.2-20260425
That output is the entire deployment state for that app. To see
what’s deployed across a host: zfs get -r -s local
all jeff. To replicate a deployment to another host:
zfs send | ssh other-host zfs receive. The metadata
arrives with the data, and Zed on the other side knows what was
deployed because it reads the same property keys.
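Reading that state back from Elixir takes one shellout. A minimal sketch, with a hypothetical module name (Zed's own plumbing differs):

defmodule ZfsProps do
  # Hypothetical sketch: collect the locally-set com.zed:* properties
  # of a dataset into a map. -H gives tab-separated output with no
  # header; -s local filters to properties set on this dataset.
  def read(dataset) do
    {out, 0} =
      System.cmd("zfs", ["get", "-H", "-s", "local", "-o", "property,value", "all", dataset])

    lines = String.split(out, "\n", trim: true)

    for [prop, value] <- Enum.map(lines, &String.split(&1, "\t", parts: 2)),
        String.starts_with?(prop, "com.zed:"),
        into: %{} do
      {String.trim_leading(prop, "com.zed:"), value}
    end
  end
end

# ZfsProps.read("jeff/apps/exmc")
# => %{"app" => "exmc", "version" => "1.4.2", "health" => "passing", ...}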
The trick is mundane. ZFS user properties have existed since 2007. What’s strange is how rarely deployment tools use them. Most reach for an external state store first and discover only later that the filesystem already had the right primitive.
Secrets: an encrypted dataset, plus fingerprints in properties
The secrets pipeline is the part that took the most arguing to get right. The original design proposal had two forks: store secrets in ZFS user properties (Fork 1) or in age-encrypted files (Fork 2). The forum chose neither.
What shipped is a third option: an encrypted ZFS dataset for the secret values, plus user properties on the parent dataset for the fingerprints only.
jeff/zed           # carries com.zed:* metadata
jeff/zed/secrets   # encrypted (aes-256-gcm), canmount=noauto
                   # mounted at /var/db/zed/secrets when unlocked

# files inside the encrypted mountpoint
/var/db/zed/secrets/beam_cookie           (mode 0400)
/var/db/zed/secrets/admin_passwd          (mode 0400, PHC-formatted hash)
/var/db/zed/secrets/ssh_host_ed25519      (mode 0400, raw private key)
/var/db/zed/secrets/ssh_host_ed25519.pub  (mode 0444, raw public key)

# fingerprints stamped on the parent dataset
$ zfs get -s local all jeff/zed
NAME      PROPERTY                                     VALUE
jeff/zed  com.zed:secret.beam_cookie.fingerprint      sha256:e3b0c4...
jeff/zed  com.zed:secret.beam_cookie.algo             random_256_b64
jeff/zed  com.zed:secret.beam_cookie.consumers        beam
jeff/zed  com.zed:secret.beam_cookie.rotation_count   3
jeff/zed  com.zed:secret.beam_cookie.last_rotated_at  2026-04-25T17:00:00Z
The properties are readable by anyone with zfs get
rights. The values are not — they live behind the dataset's
encryption. zed bootstrap verify reads each file,
recomputes its fingerprint, and tells you which slots have drifted
from what was stamped. Drift is a fingerprint mismatch; the values
themselves never appear in the verify output.
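The verification core is a few lines. A sketch of the shape with hypothetical names; real Zed reads the stamped fingerprint from the dataset property rather than taking it as an argument:

defmodule SecretVerify do
  # Hypothetical sketch: recompute one slot's fingerprint from the
  # file behind the encrypted mountpoint and compare it to the value
  # stamped on the parent dataset (passed in here for simplicity).
  def check(slot, stamped, mountpoint \\ "/var/db/zed/secrets") do
    actual =
      mountpoint
      |> Path.join(to_string(slot))
      |> File.read!()
      |> then(&:crypto.hash(:sha256, &1))
      |> Base.encode16(case: :lower)

    if stamped == "sha256:" <> actual, do: :ok, else: {:drift, slot}
  end
end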
Rotation, with archive
zed bootstrap rotate beam_cookie generates fresh
material, archives the old file under
<mountpoint>/_archive/<slot>-<timestamp>/,
writes the new value, stamps the new fingerprint, increments
com.zed:secret.beam_cookie.rotation_count, records the
prior fingerprint as prev_fingerprint, and snapshots
the zed dataset both before and after the operation. The return
value carries a restart plan derived from the slot’s declared
consumers:
{:ok, %{
  slot: :beam_cookie,
  prev_fingerprint: "sha256:e3b0c4...",
  new_fingerprint: "sha256:9ac1b8...",
  archive_path: "/var/db/zed/secrets/_archive/beam_cookie-20260425T173000123456",
  rotation_count: 4,
  snapshot_pre: "jeff/zed@rotate-pre-beam_cookie-20260425T173000123456",
  snapshot_post: "jeff/zed@rotate-post-beam_cookie-20260425T173000123456",
  restart_plan: [:beam]
}}
The microsecond suffix on the timestamp is there because the first live test on a Mac Pro discovered that two consecutive rotates within the same wall-clock second produced identical snapshot names and ZFS rejected the second. This is the kind of thing the test fixture is for.
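The fix is mechanical: fold the microsecond field into the suffix. A sketch with hypothetical names:

defmodule SnapName do
  # Hypothetical sketch: a snapshot suffix that stays unique across
  # two rotates in the same wall-clock second by appending the
  # microsecond field to the usual second-resolution timestamp.
  def suffix(now \\ DateTime.utc_now()) do
    {us, _precision} = now.microsecond

    Calendar.strftime(now, "%Y%m%dT%H%M%S") <>
      String.pad_leading(Integer.to_string(us), 6, "0")
  end
end

# SnapName.suffix()
# => "20260425T173000123456"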
A multi-file slot — :ssh_host_ed25519 has both a
private and a public file — archives both into the same
directory. The archive is a complete prior state; if the rotation
turns out to have been a mistake, you cat the old file back into
place and stamp the old fingerprint. Or roll back the snapshot.
The two paths exist for different operator instincts.
Bastille: the jail backend
Bastille is a 1048-star pure-shell FreeBSD jail manager, BSD-licensed, maintained by Christer Edwards. It does what we’d otherwise have to write: cloned interfaces, jail.conf rendering, pf rdr rules, ZFS dataset snapshotting per-jail. We chose to depend on it rather than reinvent it.
What we wrote was an adapter. The adapter is 540 lines of Elixir
(Zed.Platform.Bastille) plus a Runner behaviour and a
Mock for unit tests. Each public function returns
:ok | {:error, reason} with a typed reason; each
shellout goes through doas; each destructive operation
verifies its own post-condition by re-reading the world after the
exit code says success.
The post-condition check is the line that earned the previous
blog post’s title. bastille destroy -f exits 0 when
asked to destroy a running jail without -a. It says
nothing in the output, leaves the jail running, and lies about the
exit code. The adapter calls destroy, then calls
exists?/1, and returns {:error,
{:destroy_did_nothing, name}} when the truth and the exit
code disagree.
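Reduced to a skeleton, the pattern looks like this; a sketch only, since the real adapter routes the shellout through its Runner behaviour so the Mock can stand in during unit tests:

defmodule BastilleSketch do
  # Skeleton of the post-condition check: destroy, then re-read the
  # world instead of trusting the exit code.
  def destroy(name) do
    with {:ok, _out} <- run(["destroy", name]),
         false <- exists?(name) do
      :ok
    else
      true -> {:error, {:destroy_did_nothing, name}}
      {:error, reason} -> {:error, reason}
    end
  end

  defp run(args) do
    case System.cmd("doas", ["bastille" | args], stderr_to_stdout: true) do
      {out, 0} -> {:ok, out}
      {out, code} -> {:error, {:exit, code, out}}
    end
  end

  defp exists?(name) do
    case run(["list"]) do
      {:ok, out} -> String.contains?(out, name)
      _ -> false
    end
  end
end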
This pattern — verify the truth, not the exit code — is not unique to Bastille. Every adapter we’ve written since has the same shape. The exit code is what the upstream tool says. The post-condition is what we know.
The privilege boundary
The most recent layer, A5a, splits the BEAM into two processes running as two Unix users.
zedweb (uid 8501)                   zedops (uid 8502)
─────────────────                   ─────────────────
Phoenix endpoint                    Zed.Ops.Socket ──── /var/run/zed/ops.sock
LiveView controllers                Zed.Ops.Bastille.Handler
OTT ledger                          Runner.System ──── doas bastille ...
Passkey + SSH-key auth                                 doas zfs ...
WebSocket                                              doas pfctl ...
Audit log writer (planned: A5b.3)

Has zero privileges                 Holds the per-verb doas rules
zedweb connects to a Unix socket at
/var/run/zed/ops.sock. zedops accepts the
connection and immediately calls a small NIF that wraps
getpeereid(2) on FreeBSD or
getsockopt(SO_PEERCRED) on Linux. Wrong UID → the
socket closes before a single byte of request body flows. Right UID
→ the connection enters a request loop carrying length-prefixed
Erlang term-binary frames.
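A sketch of the ops-side accept path; PeerCred.uid/1 stands in for the NIF, the module names are hypothetical, and one connection at a time is enough for a sketch since the only legitimate client is zedweb:

defmodule OpsSocketSketch do
  @sock_path "/var/run/zed/ops.sock"
  @web_uid 8501

  def listen do
    # port 0 + {:local, path} gives a Unix-domain socket; packet: 4
    # has the runtime do the four-byte length-prefix framing.
    {:ok, lsock} =
      :gen_tcp.listen(0, [:binary, ifaddr: {:local, @sock_path}, packet: 4, active: false])

    accept_loop(lsock)
  end

  defp accept_loop(lsock) do
    {:ok, sock} = :gen_tcp.accept(lsock)

    case PeerCred.uid(sock) do
      # right UID: enter the request loop
      {:ok, @web_uid} -> serve(sock)
      # wrong UID: close before reading a single byte
      _ -> :gen_tcp.close(sock)
    end

    accept_loop(lsock)
  end

  defp serve(sock) do
    with {:ok, frame} <- :gen_tcp.recv(sock, 0) do
      reply = handle(:erlang.binary_to_term(frame, [:safe]))
      :ok = :gen_tcp.send(sock, :erlang.term_to_binary(reply))
      serve(sock)
    end
  end

  # stand-in for the real Bastille handler dispatch
  defp handle(request), do: {:ok, request}
end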
An attacker who exploits a LiveView controller bug to call
System.cmd("bastille", ["destroy", "every-jail"]) in
the BEAM gets :eacces from the kernel, not root. The
attacker has zedweb’s view of the world, which is
read-only secret metadata and a connection to a process that
refuses to do anything except what its doas rules
permit.
The doas rules are per-verb, not catch-all. These
are the lines the rules now contain:
permit nopass zedops as root cmd bastille args create
permit nopass zedops as root cmd bastille args start
permit nopass zedops as root cmd bastille args stop
permit nopass zedops as root cmd bastille args list
...
# destructive ops require step-up auth (planned: A5c)
permit zedops as root cmd bastille args destroy
permit zedops as root cmd zfs args destroy
permit zedops as root cmd zfs args rollback
The boundary is a process boundary, not a code boundary. Every
module is loaded in every release. The role —
:web, :ops, or :full for
single-process dev mode — is decided at boot from the release
name. A release named zedweb that boots without
ZED_ROLE=web in its environment now refuses to start.
The supervisor crash takes the BEAM down before any socket binds.
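The guard itself is small; a hypothetical sketch of the shape, using mix release's RELEASE_NAME variable:

defmodule RoleGuard do
  # Hypothetical sketch of the boot-time guard: the release name and
  # ZED_ROLE must agree, or the app raises before any supervisor starts.
  def decide! do
    case {System.get_env("RELEASE_NAME"), System.get_env("ZED_ROLE")} do
      {"zedweb", "web"} -> :web
      {"zedops", "ops"} -> :ops
      {nil, _} -> :full                 # plain `iex -S mix` dev session
      pair -> raise "release/role mismatch: #{inspect(pair)}"
    end
  end
end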
What this gives you
Zed is for the kind of shop that runs ten BEAM apps across two or three FreeBSD hosts and finds Kubernetes ridiculous. Most companies fit this description. Most companies are running Kubernetes anyway. Zed is the alternative for the ones who notice.
What’s in the box, after A0 through A5a + A1.rotate:
- A declarative DSL with compile-time validation. No YAML, no Helm, no Kustomize, no Jsonnet.
- State storage that requires no external system. ZFS user properties, replicated by zfs send.
- Encrypted secrets with rotation, archive, fingerprint verification, restart-plan derivation.
- FreeBSD jail lifecycle through a typed adapter that catches the upstream CLI's lies.
- A process-boundary privilege model with getpeereid-authenticated IPC and capability-scoped doas.
- An idempotent host-bring-up script that lays down users, network, pf, doas rules, audit directories.
- An admin LiveView with password, passkey (WebAuthn), SSH-key challenge, and QR pairing.
- A self-contained release per role. tar, scp, ./bin/zedweb start.
What’s not yet in the box:
- Multi-host orchestration. Single-host today. The cluster shape is sketched in the iteration plan; no code yet.
- Mobile QR scanner (B0 in the plan). The desktop QR pairing flow exists; the phone-side fork of Probnik is next.
- Step-up auth for destructive ops (A5c in the plan). The doas rules already require it; the LiveView modal does not exist yet.
- illumos parity. SMF + zones, mirroring the FreeBSD path. Same DSL, different platform backend.
Three thousand lines doesn’t feel like much for what it does. That is partly because the verbs being assembled are all things FreeBSD already had: jails, ZFS user properties, doas, encrypted datasets. The work is in connecting them in a way that survives the small betrayals each component commits when nobody is watching.
How it’s tested
216 unit and integration tests on the developer laptop. The laptop is a Linux box; ZFS and Bastille aren’t available there. So we have two additional suites that run only on FreeBSD:
- :zfs_live — 24 tests. Stand up an encrypted dataset under a delegated parent (ZED_TEST_DATASET=zroot/zed-test), run the full bootstrap / status / verify / rotate / export-pubkey cycle, tear down. Caught the sub-second snapshot collision in the rotate loop.
- :bastille_live — 7 tests. Create a jail, run uname -s inside it, stop, destroy, verify the destroy actually destroyed. Two of the seven exercise the full A5a privilege boundary — the test runner connects through the Unix socket and back.
The pattern by now is: write the unit tests against a Mock, land the implementation, push the branch, ssh into one of two FreeBSD Mac Pros, run the live tag. When a real-world quirk surfaces, the live suite catches it in seconds rather than a production deploy surfacing it later. This is what the test fixture is for.
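The tag wiring is ordinary ExUnit, roughly this, assuming the live tags are excluded by default in test_helper.exs:

# test/test_helper.exs: live suites are opt-in on the FreeBSD boxes
ExUnit.configure(exclude: [:zfs_live, :bastille_live])
ExUnit.start()

# a live test module opts its whole file into the tag
defmodule Zed.ZfsLiveTest do
  use ExUnit.Case
  @moduletag :zfs_live

  setup_all do
    {:ok, dataset: System.fetch_env!("ZED_TEST_DATASET")}
  end
end

# on the FreeBSD box:
#   ZED_TEST_DATASET=zroot/zed-test mix test --only zfs_live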
How to look at it
The repo is at github.com/borodark/zed.
The iteration plan is in specs/iteration-plan.md; the
per-layer specs (A5 Bastille, A5a privilege boundary, B0 mobile
companion) are siblings of it. The ElixirForum thread that started
the secrets-design conversation is the prequel to all of this.
The license is Apache 2.0. The status is pre-1.0,
single-maintainer, design-iterating — so PRs are welcome but
the design surface is still being negotiated. If you’re running ten
BEAM apps on two FreeBSD hosts and you’re tired of explaining what
your helm upgrade does, this might be the project to
watch.
The next layer to land is B0 — the mobile companion app for QR pairing, forked from Probnik. After that, a privileged step-up flow for destructive operations (A5c) and then probably the multi-host shape. None of these change what Zed already is, only what it can do.