The Lie at Exit Zero — dataalienist.com

The dev-host test suite finished in 5.4 seconds. 175 tests, 0 failures, 42 excluded. The 42 excluded were tagged :bastille_live — integration tests that need a real FreeBSD host with a real bastille binary and a real ZFS pool. Nothing on the laptop satisfies any of those. The laptop ran what it could, declared victory, and shut down its BEAM in a respectable five and a half seconds.

The adapter under test was 540 lines of Elixir (Zed.Platform.Bastille) plus a 79-line Runner behaviour that abstracted the actual shelling out, plus a 64-line Runner.Mock that recorded calls and returned canned responses, plus 245 lines of unit tests. Its job was to convert the bastille CLI, a 1048-star pure-shell FreeBSD jail manager, into a typed Elixir API suitable for the rest of zed, a declarative deploy tool. The adapter has a small public surface: create/2, start/1, stop/1, destroy/2, cmd/2, exists?/1. Apart from exists?/1, which returns a boolean, each function returns :ok | {:error, reason} or {:ok, output} | {:error, reason}. Each function passes its argv through the runner. Each function had a unit test asserting the runner saw the right argv and the adapter classified the runner’s exit code correctly.

The mock was faithful. It modelled the wire between the adapter and bastille. It did not model bastille.

Four hours, eight commits, and seven distinct production-grade bugs later, the live integration test would land at five passes in eleven and a half seconds against a real Mac Pro running FreeBSD 15.0 and bastille 1.4.1. None of the seven bugs were findable on the laptop. None of the seven were the adapter author’s fault, exactly. All of them were lies the mock could not have told because the mock did not know they were lies.

What a mock knows

A mock for a CLI adapter knows three things: argv, exit code, output binary. The unit test phrases its assertions in those terms. Mock.expect(:create, {"verify-sandbox: created\n", 0}) declares that when the next :create call happens, the mock will return that two-element tuple. The adapter under test calls the runner, gets back the canned tuple, and decides what to do with it. The unit test asserts :ok came out the other side; the test passes; the developer moves on.

This is, on its face, all you need. The runner contract is a function with a stable signature:

@callback run(subcommand(), argv(), opts()) ::
  {output :: binary(), exit_code :: non_neg_integer()}

Every implementation conforms to that, by definition. The mock conforms; the production Zed.Platform.Bastille.Runner.System conforms; they are interchangeable from the adapter’s perspective. The unit test exercises the adapter against the mock; the integration test exercises it against the system runner. If the contract is right, the same adapter works against both.

What the mock cannot know is whether the system runner’s exit code means what the adapter assumes it means. Or whether the user the BEAM runs as can actually invoke the binary. Or whether the binary’s textual output, which the mock can trivially produce in any shape, matches the shape the production binary actually emits on the host you happen to be deploying to. The mock’s contract is its handicap. It models the channel. It does not model the world on the other side of the channel.
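To make the limitation concrete, here is a minimal sketch of the three moving parts: a runner behaviour, a mock that replays canned tuples, and an adapter that classifies exit codes. The module names under Sketch are hypothetical stand-ins, not the real Zed modules, and the process-dictionary queue is an assumption chosen for brevity rather than a description of how Runner.Mock is built.

```elixir
defmodule Sketch.Runner do
  # Hypothetical stand-in for the article's Runner behaviour.
  @callback run(subcommand :: atom(), argv :: [String.t()], opts :: keyword()) ::
              {output :: binary(), exit_code :: non_neg_integer()}
end

defmodule Sketch.Runner.Mock do
  @behaviour Sketch.Runner

  # Queue one canned {output, exit_code} reply for a subcommand.
  def expect(subcommand, reply) do
    queue = Process.get({__MODULE__, subcommand}, [])
    Process.put({__MODULE__, subcommand}, queue ++ [reply])
    :ok
  end

  @impl true
  def run(subcommand, argv, _opts) do
    # Record the call so a test can assert on the argv afterwards.
    Process.put({__MODULE__, :calls}, calls() ++ [{subcommand, argv}])

    case Process.get({__MODULE__, subcommand}, []) do
      [reply | rest] ->
        Process.put({__MODULE__, subcommand}, rest)
        reply

      [] ->
        raise "no expectation queued for #{inspect(subcommand)}"
    end
  end

  # Every {subcommand, argv} pair seen so far, oldest first.
  def calls, do: Process.get({__MODULE__, :calls}, [])
end

defmodule Sketch.Adapter do
  # Exit-code classification: everything the adapter can learn via the channel.
  def classify({_output, 0}), do: :ok
  def classify({output, code}), do: {:error, {:bastille_exit, code, output}}

  def destroy(runner, name), do: classify(runner.run(:destroy, [name], []))
end
```

Queue {"Jail is running.\n", 0} for :destroy and Sketch.Adapter.destroy(Sketch.Runner.Mock, "ghost") returns :ok. The mock replays whatever tuple it was given; it has no way to notice that the text and the exit code disagree.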

Seven things the world had to say about it

1. Bastille refuses to talk to you

First commit on the feature branch: 0b78c70 — A5.1: Zed.Platform.Bastille adapter + Runner behaviour + tests. 540 lines of code, 245 lines of tests, clean compile, 173 / 0. git push -u origin feat/a5-bastille-adapter. git pull on the FreeBSD box. mix test --only bastille_live test/zed/platform/bastille_integration_test.exs. The very first assertion in the very first test:

match (=) failed
code:  assert :ok = Bastille.create(name, ip: ip)
left:  :ok
right: {:error, {:bastille_exit, 1, "Bastille: Permission Denied
       \nroot / sudo / doas required"}}

The user running the test was io, a regular operator account. Bastille runs as root, full stop, and is unapologetic about it. The mock, never having actually invoked anything, had no opinion on this. The adapter, having dispatched System.cmd("bastille", ["create", ...]) directly, landed on a binary that promptly told it to go away.

The fix was a configurable privilege prefix — nil by default (call bastille directly, suitable when the BEAM runs as root), or a string like "doas" that the system runner prepends to every invocation:

config :zed, Zed.Platform.Bastille, privilege_prefix: "doas"

The integration test’s setup_all sets the prefix to "doas". Commit 9f99b40. The unit tests were unaffected because the mock doesn’t care about prefixes. The contract the adapter exposes upward had not changed; only its compliance with the operating system’s privilege rules had.
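A minimal sketch of how a system runner might apply that setting, assuming the prefix is a single executable name with no arguments of its own. Sketch.Privilege and its functions are hypothetical; only the config key comes from the line above.

```elixir
defmodule Sketch.Privilege do
  # Read the prefix from application config; nil when nothing is configured.
  def prefix do
    :zed
    |> Application.get_env(Zed.Platform.Bastille, [])
    |> Keyword.get(:privilege_prefix)
  end

  # Returns the {executable, argv} pair to hand to System.cmd/3.
  # A nil prefix means "call bastille directly" (the BEAM already runs as root).
  def command(nil, bastille, argv), do: {bastille, argv}

  # With a prefix, the prefix becomes the executable and bastille moves into argv.
  def command(prefix, bastille, argv) when is_binary(prefix),
    do: {prefix, [bastille | argv]}
end
```

With the prefix set to "doas", command/3 turns a call to bastille list into {"doas", ["bastille", "list"]}. Keeping the prefix out of the adapter's public functions is what leaves the mock-based unit tests untouched.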

2. The destroy that always asks

With the prefix in place, the live test got further. Four of five tests passed. destroy hung. doas bastille destroy -f <name> — with the -f flag that the manpage sells as “force” — nonetheless prompts the operator with Are you sure you want to continue? [y|n]: and waits forever on stdin, which System.cmd/3 never provides. The flag exists. It does not do what its name claims.

Bastille has a -y in some versions, an -a (--auto) in others, neither in older builds. The version in front of us was 1.4.1. Hunting through the release notes for a version-stable invocation is a fool’s errand when the cure is sitting in /usr/bin/yes. The system runner’s :destroy branch became:

cmd = "yes | #{sudo_prefix}#{bastille} destroy -f #{shell_escape(name)}"
System.cmd("sh", ["-c", cmd], stderr_to_stdout: true)

Pipe yes into bastille, let the destroy command have all the “y”s it wants, exit when it stops asking. Version-independent. Boring. Effective.
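Handing a string to sh -c makes quoting the runner's problem. The shell_escape helper is not shown in this post, so the following is only a sketch of one common approach, POSIX single-quote escaping, under the hypothetical name Sketch.Shell.escape/1.

```elixir
defmodule Sketch.Shell do
  # Wrap a value in single quotes for POSIX sh. Inside single quotes nothing
  # is special except the single quote itself, so each embedded quote is
  # closed, emitted as an escaped quote, and reopened.
  def escape(value) when is_binary(value) do
    "'" <> String.replace(value, "'", "'\\''") <> "'"
  end
end
```

A jail name that has already passed validate_name is unlikely to contain anything hostile, but escaping at the point where the string meets the shell keeps the runner safe even if validation changes.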

3. The colon that took two hours

The privilege prefix worked, but only sometimes. On free-macpro-nvidia the test ran to its second-to-last assertion. On free-macpro-gpu the same code timed out at bastille create. Same Bastille version, same FreeBSD release, same adapter, different /usr/local/etc/doas.conf. One had:

permit nopass :wheel as :root cmd bastille

The other had:

permit nopass :wheel as root cmd bastille

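One character. The doas.conf(5) manual documents the colon prefix for the identity field: :wheel is the group wheel, wheel would be a user of that name. The target after as is documented as a user name, so as root means “as the user named root.” as :root is not a form the manual describes; read by analogy with identity it would mean “as the group root,” and whether a given build accepts it, rejects it, or quietly fails to match is nothing the grammar promises. The two are not synonymous. Only the bare form is guaranteed to match the implicit target of doas <cmd>, which runs as user root.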

On the FreeBSD port we were running, only the colon form worked. Or on some boxes, only the bare form. The empirical evidence kept shifting because the underlying file kept changing — a trailing-newline bug in one of our heredoc commits had been silently truncating the last token of the rule on certain edits. The grammar of doas had not changed; the bytes on disk had. For roughly two hours we argued with the manpage while the actual file flickered between three different parses depending on whether the last commit had ended with \n.

The fix was to write the file via printf "%s\n" "..." "..." with explicit newlines and verify with cat -e, the FreeBSD cat flag that marks each end of line with $ (GNU cat spells it -A or -E). Both lines must end with $. After that, permit nopass :wheel as root cmd bastille parsed cleanly and matched the implicit target. The adapter had nothing to do with this; we were debugging a config file in a language none of us read for a living. But the live test surfaced it because nothing else was going to.
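The same end-of-line discipline can be checked from Elixir before a config file is ever installed. This is a minimal sketch; Sketch.DoasConf and both function names are hypothetical and not part of zed.

```elixir
defmodule Sketch.DoasConf do
  # Build file content from rules, one per line, each explicitly terminated.
  def render(rules) when is_list(rules) do
    Enum.map_join(rules, &(&1 <> "\n"))
  end

  # True only when the content is non-empty and its final byte is a newline,
  # so the last rule cannot have been left without its line terminator.
  def newline_terminated?(content) when is_binary(content) do
    content != "" and String.ends_with?(content, "\n")
  end
end
```

Rendering the rules through one function and asserting newline_terminated?/1 on the result removes the heredoc from the picture, which is where the missing terminator came from.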

4. The path that didn’t match itself

A different operator had configured the rule slightly differently:

permit nopass :wheel as :root cmd /usr/local/bin/bastille

Full path. Defensible: the path is unambiguous, anyone could shadow bastille in $PATH, lock the rule to the canonical binary. Reasonable security posture. Unfortunately, doas matches the cmd field against argv[0], the literal string the caller invoked. doas bastille create ... sets argv[0] to bastille, bare name. The full-path rule does not match. With that rule out of play, the request falls to permit persist :wheel, which prompts for a password, which the BEAM has no tty to answer, which hangs.

The fix was to drop the path. The adapter could equally have specified the full path in its invocations, but a shell-PATH lookup is what every operator types from a terminal, and the adapter should not require its users to know which way doas resolves cmd matchers. The contract that the adapter wants from its host environment is “doas bastille works without prompting.” The host’s job is to honour that.

5. The directory that wasn’t a jail

With doas finally co-operating, the live tests got most of the way through. The last test failure was refute Bastille.exists?(name) after a successful destroy. The exists? function was, at the time:

def exists?(name) when is_binary(name) do
  case validate_name(name) do
    :ok -> File.dir?(Path.join(jails_dir(), name))
    _ -> false
  end
end

A perfectly reasonable check. The bastille jails directory at /usr/local/bastille/jails/ contains one subdirectory per managed jail. Created on bastille create, removed on bastille destroy. Or so we expected.

Bastille’s ZFS-backed jails live as datasets that mount at /usr/local/bastille/jails/<name>. bastille destroy destroys the dataset. Destroying a dataset unmounts it. Unmounting it leaves the empty mountpoint directory behind, because the mountpoint is an ordinary directory inside the parent dataset (zroot/bastille/jails), and an ordinary directory outlives whatever was mounted on it. ZFS does not garbage-collect directories it didn’t create. The dataset was gone. The directory remained as an empty stub. File.dir? reported true on the stub. The adapter said the jail still existed. The test failed.

6. The directory that was a jail but you couldn’t see it

The repair that followed was “directory exists AND has at least one entry.” Real jails have fstab, root/, configuration files. Empty stubs have nothing. The fix passed in the unit tests immediately and broke in the live tests immediately.

The bastille jail directories are owned by root, mode 0700. The BEAM runs as io. File.ls on a root-owned 0700 directory returns {:error, :eacces}: permission denied, no listing. My pattern-match for the “non-empty” case was {:ok, [_ | _]} -> true; the :eacces case fell through to _ -> false. Result: every jail looked empty to the adapter, regardless of whether it actually was. False negative across the board.

The contract that exists?/1 wanted from the filesystem was liveness: is this jail a thing or not? The filesystem could only tell us about presence: is there a directory here, can I read it? The two questions overlap most of the time and disagree exactly when the operator is running the BEAM with reduced privilege, which is the operator we want to encourage.
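The failure mode is easier to see when the probe refuses to fold "cannot look" into "nothing there." A minimal sketch, assuming only File.ls/1 from the standard library; Sketch.Presence and its return atoms are hypothetical.

```elixir
defmodule Sketch.Presence do
  # Classify a directory without collapsing permission errors into "empty".
  def probe(path) do
    case File.ls(path) do
      {:ok, [_ | _]} -> :non_empty
      {:ok, []} -> :empty
      {:error, :enoent} -> :missing
      # :eacces and anything else: the filesystem declined to answer.
      {:error, reason} -> {:unknown, reason}
    end
  end
end
```

A caller that receives {:unknown, :eacces} at least knows it has learned nothing, which is a better starting point than a confident false.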

7. The exit code that lied

The third exists?/1 implementation dispatched the question to bastille itself: parse bastille list output, look for the name in column 2. Authoritative across filesystems, permissions, ZFS-vs-UFS, version drift. The unit tests for it asserted column-2 matching, name-prefix collision rejection (ghost must not match ghost-test), empty list, error on non-zero exit. Six new unit tests, all green. The live test ran. Four passes. Final test failed at the final refute Bastille.exists?(name) — the one after a successful destroy.

But destroy had returned :ok. assert :ok = Bastille.destroy(name) on the line above had passed. The adapter said the destroy succeeded. The new authoritative existence check disagreed. One of them was lying.

A direct shell session settled it:

$ doas bastille create test-debug-1777133014 15.0-RELEASE 10.17.89.243/24
test-debug-1777133014: created

$ doas bastille list
 JID  Name                   State  IP Address    Release
 28   test-debug-1777133014  Up     10.17.89.243  15.0-RELEASE

$ yes | doas bastille destroy -f test-debug-1777133014
Jail is running.
Use [-a|--auto] to auto-stop the jail.
$ echo $?
0

$ doas bastille list
 JID  Name                   State  IP Address    Release
 28   test-debug-1777133014  Up     10.17.89.243  15.0-RELEASE

bastille destroy -f against a running jail will not destroy the jail. It will say so. It will exit zero. The exit code is a lie. The adapter, having received exit zero, classified the result as :ok and moved on. The jail continued to run. The list command continued to report it. The live test — the one that, unlike every other test of the destroy contract, actually checked reality afterward — caught the lie in the act.

Bastille has been around for seven years. destroy -f exiting zero on a no-op is a bug in any reasonable reading of the flag’s name. It is also clearly intentional, given the helpful hint pointing at -a. -a means auto-stop running jails before destroying. The combination -a -f does what most operators believe -f should have done all along. The system runner now uses -a -f. And, separately, Bastille.destroy/2 re-checks exists?/1 after the runner returns success — the post-condition that catches not just this lie but whatever future lie a future bastille version finds a new way to tell:

def destroy(name, opts \\ []) do
  with :ok <- validate_name(name),
       :ok <- classify(runner().run(:destroy, [name], opts)) do
    if exists?(name) do
      {:error, {:destroy_did_nothing, name}}
    else
      :ok
    end
  end
end

This is the line where the adapter stopped trusting bastille. Not because bastille is malicious. Because bastille is software, and software occasionally exits zero with the wrong number of fingers on the keyboard.
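The column-2 match behind the authoritative exists?/1 can be sketched as follows. The layout is assumed from the bastille list transcript above (a JID header row, whitespace-separated columns, the name second); rows for stopped jails or other bastille versions may be shaped differently. Sketch.ListParser is a hypothetical name.

```elixir
defmodule Sketch.ListParser do
  # True when some data row has exactly `name` in its second column.
  # Exact comparison means "ghost" does not match a row for "ghost-test".
  def listed?(output, name) when is_binary(output) and is_binary(name) do
    output
    |> String.split("\n", trim: true)
    |> Enum.any?(fn line ->
      case String.split(line) do
        # Skip the header row, whose first column is the literal "JID".
        ["JID" | _] -> false
        [_jid, ^name | _rest] -> true
        _ -> false
      end
    end)
  end
end
```

Splitting on whitespace rather than fixed offsets keeps the parse indifferent to how wide bastille pads each column.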

Side-quests

Three smaller things the live run surfaced that aren’t worth their own section but bear mentioning, because they round out the picture of what mocks miss.

Stale residue. Earlier failed tests left jails on disk. The setup’s on_exit tries to clean up, but on_exit doesn’t run if the test crashes hard enough — an Elixir “test timed out after 60000ms” qualifies. The next run inherits the residue. Defensive cleanup at the start of each setup destroys any jail with the chosen name. The unit tests, having no real disk, were never going to flag this.

Test name collisions. System.unique_integer/1 returns small integers, unique within a single VM’s lifetime. Two consecutive mix test invocations will both produce zed-test-1, zed-test-2, ... If the first invocation leaves any of those jails behind (see the previous item), the second collides. Switched to :crypto.strong_rand_bytes(4): 32 bits of entropy in the name, which makes a collision across runs improbable rather than guaranteed. The unit tests, never having had two VM lifetimes, were never going to flag this either.

The space-named file. Somewhere in the iteration, a stray scp deposited a copy of verify-bastille-host.sh at the repo root with a literal space as its filename. git add -A swept it into a commit. Removed in c73a2ec. Not a bastille problem. Not a test problem. A reminder that git add -A will swallow exactly what you put in front of it, including artefacts you don’t remember creating.
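For the test-name collisions, the random suffix amounts to a few lines. A minimal sketch, assuming the zed-test prefix seen in the colliding names and lowercase hex as the rendering; Sketch.JailName is hypothetical.

```elixir
defmodule Sketch.JailName do
  # Four random bytes rendered as eight lowercase hex characters.
  def random(prefix \\ "zed-test") do
    suffix = :crypto.strong_rand_bytes(4) |> Base.encode16(case: :lower)
    prefix <> "-" <> suffix
  end
end
```

Unlike a VM-local counter, the suffix carries no state between runs, so a jail left behind by one invocation does not predict the names the next invocation will pick.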

What the post-condition check is for

The unit tests covered the adapter’s contract with the runner. The integration tests covered the runner’s contract with bastille. Between them, every wire in the adapter-to-bastille pipe was exercised. Every wire was, at the beginning, fine. The wires were not the problem.

The problem was the contract between the adapter and the truth. The adapter believed destroy followed by exit 0 meant the jail was gone. The truth had no opinion on what the adapter believed. The truth was a directory on disk and a row in a list, observable, queryable, and persistently present despite bastille’s assurances to the contrary. The adapter had to be taught to ask the truth, not the tool.

This is what post-condition checks are for. They are not paranoia. They are not a redundant assert. They are the place where the adapter admits it does not, and cannot, fully trust the soft contract of a CLI. Bastille is a 525-kilobyte shell program with seven years of accumulated semantics, half of them undocumented, some of them version-dependent, all of them subject to whatever the next pull request decides to change. The exit_code integer that bastille emits is exactly as trustworthy as the bastille contributor who wrote the surrounding shell function. Sometimes that is very trustworthy. Sometimes it is “Jail is running. Use [-a|--auto] to auto-stop the jail.” followed by exit 0.

Adapters exist precisely to convert soft contracts into hard ones. The hardness comes from the post-condition: I will tell you this succeeded only after I have observed that it did. The runner cannot promise that. The mock cannot promise that. Only the adapter, with the authority to call exists?/1 after destroy/2 returns, can promise that.
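The pattern generalises beyond destroy. A minimal sketch of a reusable wrapper, assuming the action reports :ok on success and the check is a zero-arity function that observes the world; Sketch.Postcondition is a hypothetical helper, not something in the adapter.

```elixir
defmodule Sketch.Postcondition do
  # Run `action`; report :ok only if it returned :ok AND `check` then observes
  # the intended state. Anything else the action returns passes through as-is.
  def verify(action, check, failure)
      when is_function(action, 0) and is_function(check, 0) do
    case action.() do
      :ok -> if check.(), do: :ok, else: {:error, failure}
      other -> other
    end
  end
end
```

For destroy, the action would be the classified runner call, the check would be a function returning not exists?(name), and the failure term would be {:destroy_did_nothing, name}.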

The arc

| SHA | Lie | Repair |
| --- | --- | --- |
| `0b78c70` | None yet: the foundation | Adapter, Runner, Mock, 245 lines of unit tests |
| `9f99b40` | Bastille refuses non-root | `privilege_prefix` config; runner prepends doas |
| `5c8f826` | Test name collisions across VM restarts | `os_time + unique_integer` in setup name |
| `8cb46e5` | Same, harder: clock-second collisions still possible | 32-bit random hex |
| `6dc5b28` | `exists?` false-positive on empty stubs | (Wrong fix: check `jail.conf`; rolled back) |
| `1366b9f` | Stub vs real-jail still ambiguous | (Wrong fix: non-empty dir; broke for non-root callers) |
| `0bac0f1` | FS perms hide truth from non-root BEAM | Authoritative `bastille list` parse |
| `c510b50` | `destroy -f` on running jail exits 0 with no effect | `-a -f` + post-condition `exists?` check |
| `daea21a` | (merged to main) | 5/0 live; adapter ships |

Each commit message documents the specific lie the previous version was telling. The lie was usually mine, sometimes bastille’s, once or twice doas’s. None of them were findable until a real host pushed back.

Coda

The final live test ran in eleven and a half seconds and reported 5 tests, 0 failures. The adapter that produced that number was, by line count, twenty-six lines longer than the adapter that had produced 175 tests, 0 failures on the laptop earlier the same afternoon.

Twenty-six lines is not a lot of code for an afternoon. The adapter had not, in any meaningful sense, improved. The adapter had lost its illusions. Each lost illusion took the form of a function that now does slightly more than it used to: destroy/2 verifies, exists?/1 queries authoritatively, the system runner pipes yes and forces auto-stop, the configuration accepts a privilege_prefix. None of this is more elegant. All of it is more correct.

The unit-test count went from 173 to 175 to 175 to 173 to 175 to 175. It moved up and down by twos, like a tide; the count was never the point. The point was the integration test, which spent the afternoon failing in seven different ways and ended up green in a way that, if it ever fails again, will fail loudly and with a specific, actionable diagnosis. {:error, {:destroy_did_nothing, name}} is not a sentence I want to read in production. It is a sentence I am very glad my adapter is now capable of producing, because the alternative is silence followed by an angry phone call.

The adapter did not improve. The truth about Bastille got better.

The adapter at daea21a is the merged form of the seven-bug arc. Zed.Platform.Bastille is at lib/zed/platform/bastille.ex; the runner at lib/zed/platform/bastille/runner.ex; the integration tests, the ones that actually mattered, at test/zed/platform/bastille_integration_test.exs. The verify script that catches host-config drift before any of this runs lives at scripts/verify-bastille-host.sh. The seven-failure live session was on free-macpro-gpu, a 2013 Mac Pro running FreeBSD 15.0 with bastille 1.4.1.260315 and a ZFS pool named zroot_mac. None of these details would have mattered to a mock.