The dev-host test suite finished in 5.4 seconds. 175 tests, 0
failures, 42 excluded. The 42 excluded were tagged
:bastille_live — integration tests that need a real
FreeBSD host with a real bastille binary and a real ZFS pool.
Nothing on the laptop satisfies any of those. The laptop ran what it
could, declared victory, and shut down its BEAM in a respectable
five and a half seconds.
The adapter under test was 540 lines of Elixir
(Zed.Platform.Bastille) plus a 79-line
Runner behaviour that abstracted the actual shelling
out, plus a 64-line Runner.Mock that recorded calls and
returned canned responses, plus 245 lines of unit tests. Its job is
to convert the bastille CLI — a 1048-star
pure-shell FreeBSD jail manager — into a typed Elixir API
suitable for the rest of zed, a declarative deploy
tool. The adapter has a small public surface:
create/2, start/1, stop/1,
destroy/2, cmd/2, exists?/1.
Each function returns :ok | {:error, reason} or
{:ok, output} | {:error, reason}. Each function passes
its argv through the runner. Each function had a unit test asserting
the runner saw the right argv and the adapter classified the runner’s
exit code correctly.
The mock was faithful. It modelled the wire between the adapter and bastille. It did not model bastille.
Four hours, eight commits, and seven distinct production-grade bugs later, the live integration test would land at five passes in eleven and a half seconds against a real Mac Pro running FreeBSD 15.0 and bastille 1.4.1. None of the seven bugs were findable on the laptop. None of the seven were the adapter author’s fault, exactly. All of them were lies the mock could not have told because the mock did not know they were lies.
What a mock knows
A mock for a CLI adapter knows three things: argv, exit code, output
binary. The unit test phrases its assertions in those terms.
Mock.expect(:create, {"verify-sandbox: created\n", 0})
declares that when the next :create call happens, the
mock will return that two-element tuple. The adapter under test
calls the runner, gets back the canned tuple, and decides what to
do with it. The unit test asserts :ok came out the
other side; the test passes; the developer moves on.
This is, on its face, all you need. The runner contract is a function with a stable signature:
    @callback run(subcommand(), argv(), opts()) ::
                {output :: binary(), exit_code :: non_neg_integer()}
Every implementation conforms to that, by definition. The mock
conforms; the production
Zed.Platform.Bastille.Runner.System conforms;
they are interchangeable from the adapter’s perspective.
The unit test exercises the adapter against the mock; the
integration test exercises it against the system runner. If the
contract is right, the same adapter works against both.
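To make the shape of that contract concrete, here is a minimal sketch of a behaviour and a recording mock. The module names under Sketch are hypothetical stand-ins, not the real Zed modules, and the Agent-based bookkeeping is an assumption about how such a mock might be built rather than a description of the actual Runner.Mock.

```elixir
defmodule Sketch.Runner do
  # Hypothetical stand-in for the Runner behaviour described above.
  @type subcommand :: atom()
  @type argv :: [String.t()]
  @type opts :: keyword()

  @callback run(subcommand(), argv(), opts()) ::
              {output :: binary(), exit_code :: non_neg_integer()}
end

defmodule Sketch.Runner.Mock do
  @behaviour Sketch.Runner

  # Holds queued replies per subcommand and a log of every call seen.
  def start do
    Agent.start_link(fn -> %{replies: %{}, calls: []} end, name: __MODULE__)
  end

  # Queue a canned {output, exit_code} reply for the next `subcommand` call.
  def expect(subcommand, {output, code} = reply)
      when is_binary(output) and is_integer(code) do
    Agent.update(__MODULE__, fn state ->
      replies = Map.update(state.replies, subcommand, [reply], &(&1 ++ [reply]))
      %{state | replies: replies}
    end)
  end

  # Calls in the order they were made, as {subcommand, argv} tuples.
  def calls, do: Agent.get(__MODULE__, &Enum.reverse(&1.calls))

  @impl true
  def run(subcommand, argv, _opts) do
    Agent.get_and_update(__MODULE__, fn state ->
      {reply, remaining} =
        case Map.get(state.replies, subcommand, []) do
          [next | rest] -> {next, rest}
          # Sketch choice: an unexpected call looks like a failed command.
          [] -> {{"no expectation set\n", 1}, []}
        end

      new_state = %{
        state
        | replies: Map.put(state.replies, subcommand, remaining),
          calls: [{subcommand, argv} | state.calls]
      }

      {reply, new_state}
    end)
  end
end
```

Everything this mock can express is visible in its state: which subcommand, which argv, which canned tuple. Nothing in it can represent a host.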
What the mock cannot know is whether the system runner’s exit code means what the adapter assumes it means. Or whether the user the BEAM runs as can actually invoke the binary. Or whether the binary’s textual output, which the mock can trivially produce in any shape, matches the shape the production binary actually emits on the host you happen to be deploying to. The mock’s contract is its handicap. It models the channel. It does not model the world on the other side of the channel.
Seven things the world had to say about it
1. Bastille refuses to talk to you
First commit on the feature branch:
0b78c70 — A5.1: Zed.Platform.Bastille adapter + Runner
behaviour + tests. 540 lines of code, 245 lines of tests,
clean compile, 173 / 0. git push -u origin
feat/a5-bastille-adapter. git pull on the
FreeBSD box. mix test --only bastille_live
test/zed/platform/bastille_integration_test.exs. The very
first assertion in the very first test:
    match (=) failed
    code:  assert :ok = Bastille.create(name, ip: ip)
    left:  :ok
    right: {:error, {:bastille_exit, 1, "Bastille: Permission Denied\nroot / sudo / doas required"}}
The user running the test was io, a regular operator
account. Bastille runs as root, full stop, and is unapologetic about
it. The mock, never having actually invoked anything, had no
opinion on this. The adapter, having dispatched
System.cmd("bastille", ["create", ...]) directly,
landed on a binary that promptly told it to go away.
The fix was a configurable privilege prefix —
nil by default (call bastille directly, suitable when
the BEAM runs as root), or a string like "doas" that
the system runner prepends to every invocation:
config :zed, Zed.Platform.Bastille, privilege_prefix: "doas"
The integration test’s setup_all sets the prefix
to "doas". Commit 9f99b40. The unit tests
were unaffected because the mock doesn’t care about prefixes.
The contract the adapter exposes upward had not changed;
only its compliance with the operating system’s privilege
rules.
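A sketch of how a runner might apply such a prefix, assuming it arrives as either nil or a whitespace-separated string. The module and function names are hypothetical; the real system runner may assemble its argv differently.

```elixir
defmodule Sketch.Command do
  # Returns the {executable, argv} pair to hand to System.cmd/3.
  # `prefix` is nil (call bastille directly) or a string such as "doas".
  def build(nil, bastille, args) when is_list(args), do: {bastille, args}

  def build(prefix, bastille, args) when is_binary(prefix) and is_list(args) do
    case String.split(prefix) do
      # A blank prefix behaves like no prefix at all.
      [] ->
        {bastille, args}

      # The prefix becomes the executable; bastille moves into its argv.
      [exe | prefix_args] ->
        {exe, prefix_args ++ [bastille | args]}
    end
  end
end
```

With a prefix of "doas", a create call becomes {"doas", ["bastille", "create", ...]}. The adapter above the runner never sees the difference, which is why the unit tests did not move.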
2. The destroy that always asks
With the prefix in place, the live test got further. Four of five
tests passed. destroy hung. doas bastille destroy
-f <name> — with the -f flag that the
manpage sells as “force” — nonetheless prompts the
operator with Are you sure you want to continue? [y|n]:
and waits forever on stdin, which System.cmd/3 never
provides. The flag exists. It does not do what its name claims.
Bastille has a -y in some versions, an
-a (--auto) in others, neither in older builds.
The version in front of us was 1.4.1. Hunting through the
release notes for a version-stable invocation is a fool’s
errand when the cure is sitting in
/usr/bin/yes. The system runner’s
:destroy branch became:
    cmd = "yes | #{sudo_prefix}#{bastille} destroy -f #{shell_escape(name)}"
    System.cmd("sh", ["-c", cmd], stderr_to_stdout: true)
Pipe yes into bastille, let the destroy command have
all the “y”s it wants, exit when it stops asking.
Version-independent. Boring. Effective.
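Because the jail name is interpolated into a string that sh will parse, the escaping matters. The helper in the snippet above is not shown in this article, so the following is a sketch of one conventional way to write it, using POSIX single-quote escaping. The module name and destroy_command/3 are hypothetical.

```elixir
defmodule Sketch.Shell do
  # POSIX single-quote escaping: wrap the argument in '...' and rewrite
  # each embedded single quote as '\'' (close, escaped quote, reopen).
  def escape(arg) when is_binary(arg) do
    "'" <> String.replace(arg, "'", "'\\''") <> "'"
  end

  # Builds the string passed to `sh -c`. `prefix` is nil, "" or e.g. "doas".
  def destroy_command(prefix, bastille, name) do
    prefix_part = if prefix in [nil, ""], do: "", else: prefix <> " "
    "yes | " <> prefix_part <> bastille <> " destroy -f " <> escape(name)
  end
end
```

A validated jail name should never contain a quote in the first place, so the escape is a second line of defence, not the first.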
3. The colon that took two hours
The privilege prefix worked, but only sometimes. On
free-macpro-nvidia the test ran to its second-to-last
assertion. On free-macpro-gpu the same code timed out
at bastille create. Same Bastille version, same
FreeBSD release, same adapter, different
/usr/local/etc/doas.conf. One had:
permit nopass :wheel as :root cmd bastille
The other had:
permit nopass :wheel as root cmd bastille
One character. The doas.conf(5) manual documents the colon prefix
for the identity field: a username or, prefixed with a colon, a
group name. The target after as is documented as a user. We read
as root as “as the user named root” and as :root as “as a member
of the group root.” The two are not synonymous. We nonetheless
assumed that on a healthy doas implementation both should match
the implicit target of doas <cmd>, which runs as user root,
member of group root.
On the FreeBSD port we were running, only the colon form worked.
Or on some boxes, only the bare form. The empirical evidence kept
shifting because the underlying file kept changing — a
trailing-newline bug in one of our heredoc commits had been
silently truncating the last token of the rule on certain
edits. The grammar of doas had not changed; the bytes on disk had.
For roughly two hours we argued with the manpage while the actual
file flickered between three different parses depending on whether
the last commit had ended with \n.
The fix was to write the file via
printf "%s\n" "..." "..." with explicit newlines and
verify with cat -e (cat -A on GNU systems), which
displays end-of-line markers as
$. Both lines must end with $. After
that, permit nopass :wheel as root cmd bastille
parsed cleanly and matched the implicit target. The adapter had
nothing to do with this; we were debugging a config file in a
language none of us read for a living. But the live test surfaced
it because nothing else was going to.
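The same check can be made before the file ever reaches disk. This is a sketch with hypothetical names: render the rules with an explicit newline after each one, then refuse any content whose last line is unterminated. It is not what we ran that afternoon, which was printf and cat.

```elixir
defmodule Sketch.ConfCheck do
  # Joins rules so that every rule, including the last, ends with "\n".
  def render(rules) when is_list(rules) do
    Enum.map_join(rules, &(&1 <> "\n"))
  end

  # :ok when the content ends with a newline; otherwise names the
  # unterminated last line so the offending rule is obvious.
  def newline_terminated(contents) when is_binary(contents) do
    cond do
      contents == "" ->
        {:error, :empty}

      String.ends_with?(contents, "\n") ->
        :ok

      true ->
        last = contents |> String.split("\n") |> List.last()
        {:error, {:unterminated_last_line, last}}
    end
  end
end
```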
4. The path that didn’t match itself
A different operator had configured the rule slightly differently:
permit nopass :wheel as :root cmd /usr/local/bin/bastille
Full path. Defensible: the path is unambiguous, anyone could
shadow bastille in $PATH, lock the rule to
the canonical binary. Reasonable security posture.
Unfortunately, doas matches the cmd field against
argv[0] — the literal string the caller invoked.
doas bastille create ... sets argv[0] to
bastille, bare name. The full-path rule does not
match. With that rule out of play, doas falls back to the rule
that does match, permit persist :wheel, which prompts
for a password, which the BEAM doesn’t have a tty for, which
hangs.
The fix was to drop the path. The adapter could equally have
specified the full path in its invocations, but a shell-PATH lookup
is what every operator types from a terminal, and the adapter
should not require its users to know which way doas resolves
cmd matchers. The contract that the adapter wants from
its host environment is “doas bastille works
without prompting.” The host’s job is to honour that.
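That contract can be probed up front instead of discovered by a hang. A sketch, assuming doas's -n (non-interactive) flag, which makes doas fail rather than prompt when the matching rule lacks nopass. The module name is hypothetical, and the command function is injectable so the probe itself can be tested without a host.

```elixir
defmodule Sketch.Preflight do
  # `cmd_fun` has the same shape as System.cmd/3. Note that System.cmd/3
  # raises if the executable cannot be found at all.
  def passwordless?(cmd_fun \\ &System.cmd/3) do
    case cmd_fun.("doas", ["-n", "bastille", "list"], stderr_to_stdout: true) do
      # Exit 0: doas let the command through without asking anything.
      {_output, 0} -> true
      # Anything else: the host does not honour the contract.
      {_output, _nonzero} -> false
    end
  end
end
```

Note that the probe invokes bastille by bare name, exactly as the adapter does, so a full-path rule fails the probe the same way it fails the adapter.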
5. The directory that wasn’t a jail
With doas finally co-operating, the live tests got most of the way
through. The last test failure was refute Bastille.exists?(name)
after a successful destroy. The exists? function was, at
the time:
    def exists?(name) when is_binary(name) do
      case validate_name(name) do
        :ok -> File.dir?(Path.join(jails_dir(), name))
        _ -> false
      end
    end
A perfectly reasonable check. The bastille jails directory at
/usr/local/bastille/jails/ contains one subdirectory
per managed jail. Created on bastille create, removed on
bastille destroy. Or so we expected.
Bastille’s ZFS-backed jails live as datasets that mount at
/usr/local/bastille/jails/<name>. bastille
destroy destroys the dataset. Destroying a dataset unmounts
it. Unmounting it leaves the empty mountpoint directory behind,
because the mountpoint lives inside the parent dataset
(zroot/bastille/jails), and from the parent’s side a
child’s mountpoint is just a directory. ZFS does
not garbage-collect directories it didn’t create. The dataset
was gone. The directory remained as an empty stub.
File.dir? reported true on the stub. The
adapter said the jail still existed. The test failed.
6. The directory that was a jail but you couldn’t see it
The first repair was “directory exists AND has at least one
entry.” Real jails have fstab, root/,
configuration files. Empty stubs have nothing. The fix passed in
the unit tests immediately and broke in the live tests immediately.
The bastille jail directories are owned by root, mode 0700. The BEAM
runs as io. File.ls on a root-owned 0700
directory returns {:error, :eacces}: permission
denied, no listing. My pattern-match for the “non-empty”
case was {:ok, [_ | _]} -> true; the
:eacces case fell through to _ -> false.
Result: every jail looked empty to the adapter, regardless of
whether it actually was. False negative across the board.
The contract that exists?/1 wanted from the
filesystem was liveness: is this jail a thing or not?
The filesystem could only tell us about presence: is there
a directory here, can I read it? The two questions overlap most of
the time and disagree exactly when the operator is running the
BEAM with reduced privilege, which is the operator we want to
encourage.
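The failure is easier to see when every File.ls outcome gets its own name instead of sharing a catch-all. A sketch with hypothetical names, showing that "unreadable" is a distinct answer from "empty":

```elixir
defmodule Sketch.DirProbe do
  # Names each outcome of File.ls/1 instead of folding errors into `false`.
  def classify(path) do
    case File.ls(path) do
      {:ok, [_ | _]} -> :populated
      {:ok, []} -> :empty_stub
      {:error, :enoent} -> :absent
      # Permission denied tells us nothing about the contents.
      {:error, :eacces} -> :unreadable
      {:error, other} -> {:unknown, other}
    end
  end
end
```

Naming the case does not answer the liveness question. It only makes the adapter admit that the filesystem cannot.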
7. The exit code that lied
The third exists?/1 implementation dispatched the
question to bastille itself: parse bastille list
output, look for the name in column 2. Authoritative across
filesystems, permissions, ZFS-vs-UFS, version drift. The unit
tests for it asserted column-2 matching, name-prefix collision
rejection (ghost must not match ghost-test),
empty list, error on non-zero exit. Six new unit tests, all green.
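A sketch of that parse, assuming a single header row followed by whitespace-separated rows with the jail name in the second column. The module name is hypothetical, and other bastille versions may lay the columns out differently.

```elixir
defmodule Sketch.ListParse do
  # Extracts jail names (column 2) from `bastille list` style output.
  def names(output) when is_binary(output) do
    output
    |> String.split("\n", trim: true)
    # Skip the header row.
    |> Enum.drop(1)
    |> Enum.map(&String.split/1)
    |> Enum.flat_map(fn
      [_jid, name | _rest] -> [name]
      # Rows too short to carry a name are ignored.
      _short -> []
    end)
  end

  # Exact match on the whole column, so "ghost" does not match "ghost-test".
  def listed?(output, name) when is_binary(name), do: name in names(output)
end
```

Matching the whole column rather than searching the raw text is what rules out the prefix collision.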
The live test ran. Four passes. Final test failed at the final
refute Bastille.exists?(name) — the one
after a successful destroy.
But destroy had returned :ok. assert :ok =
Bastille.destroy(name) on the line above had passed. The
adapter said the destroy succeeded. The new authoritative existence
check disagreed. One of them was lying.
A direct shell session settled it:
    $ doas bastille create test-debug-1777133014 15.0-RELEASE 10.17.89.243/24
    test-debug-1777133014: created
    $ doas bastille list
     JID  Name                   State  IP Address    Release
     28   test-debug-1777133014  Up     10.17.89.243  15.0-RELEASE
    $ yes | doas bastille destroy -f test-debug-1777133014
    Jail is running.
    Use [-a|--auto] to auto-stop the jail.
    $ echo $?
    0
    $ doas bastille list
     JID  Name                   State  IP Address    Release
     28   test-debug-1777133014  Up     10.17.89.243  15.0-RELEASE
bastille destroy -f against a running jail will not
destroy the jail. It will say so. It will exit zero. The exit code
is a lie. The adapter, having received exit zero, classified the
result as :ok and moved on. The jail continued to run.
The list command continued to report it. The live test
— the one that, unlike every other test of the destroy
contract, actually checked reality afterward — caught the lie
in the act.
Bastille has been around for seven years. destroy -f
exiting zero on a no-op is a bug in any reasonable reading of the
flag’s name. It is also clearly intentional, given the helpful
hint pointing at -a. -a means
auto-stop running jails before destroying. The combination
-a -f does what most operators believe -f
should have done all along. The system runner now uses
-a -f. And, separately, Bastille.destroy/2
re-checks exists?/1 after the runner returns success
— the post-condition that catches not just this lie but
whatever future lie a future bastille version finds a new way to
tell:
    def destroy(name, opts \\ []) do
      with :ok <- validate_name(name),
           :ok <- classify(runner().run(:destroy, [name], opts)) do
        if exists?(name) do
          {:error, {:destroy_did_nothing, name}}
        else
          :ok
        end
      end
    end
This is the line where the adapter stopped trusting bastille. Not because bastille is malicious. Because bastille is software, and software occasionally exits zero with the wrong number of fingers on the keyboard.
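The same idea can be exercised without a host by handing the check a runner that lies on purpose. This is a sketch with hypothetical names, with the runner and the existence check passed in as plain functions so the post-condition logic stands alone:

```elixir
defmodule Sketch.Destroy do
  # `run` is fn name -> {output, exit_code}; `exists?` is fn name -> boolean.
  def destroy(name, run, exists?)
      when is_function(run, 1) and is_function(exists?, 1) do
    case run.(name) do
      {_output, 0} ->
        # Exit zero is a claim. The existence check is the evidence.
        if exists?.(name) do
          {:error, {:destroy_did_nothing, name}}
        else
          :ok
        end

      {output, code} ->
        {:error, {:bastille_exit, code, output}}
    end
  end
end
```

A runner that returns {"Jail is running.\n", 0} while the existence check still answers true yields {:error, {:destroy_did_nothing, name}}. That is a unit test the mock can finally express, but only because the live host taught us what to put in it.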
Side-quests
Three smaller things the live run surfaced that aren’t worth their own section but bear mentioning, because they round out the picture of what mocks miss.
Stale residue. Earlier failed tests left jails on
disk. The setup’s on_exit tries to clean up, but
on_exit doesn’t run if the test crashes hard
enough — an Elixir “test timed out after 60000ms”
qualifies. The next run inherits the residue. Defensive cleanup at
the start of each setup destroys any jail with the chosen name.
The unit tests, having no real disk, were never going to flag this.
Test name collisions.
System.unique_integer/1 returns small integers,
unique within a single VM’s lifetime. Two consecutive
mix test invocations will both produce
zed-test-1, zed-test-2, ... If the first
invocation leaves any of those jails behind (see previous bullet),
the second collides. We switched to
:crypto.strong_rand_bytes(4): 32 bits of
entropy in the name, which makes a collision across runs improbable rather than guaranteed. The unit
tests, never having had two VM lifetimes, were never going to flag
this either.
The space-named file. Somewhere in the iteration,
a stray scp deposited a copy of
verify-bastille-host.sh at the repo root with a
literal space as its filename. git add -A swept it
into a commit. Removed in c73a2ec. Not a bastille
problem. Not a test problem. A reminder that git add -A
will swallow exactly what you put in front of it, including
artefacts you don’t remember creating.
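For the name-collision item, a sketch of the kind of generator we ended up with. The module name and the zed-test prefix handling are illustrative; the real setup code may format the name differently.

```elixir
defmodule Sketch.JailName do
  # Four random bytes rendered as eight lowercase hex characters.
  # Unlike System.unique_integer/1, the value does not restart with the VM.
  def random(prefix \\ "zed-test") do
    suffix = 4 |> :crypto.strong_rand_bytes() |> Base.encode16(case: :lower)
    prefix <> "-" <> suffix
  end
end
```

Thirty-two bits does not make a collision impossible, only unlikely enough that the defensive cleanup in setup handles the remainder.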
What the post-condition check is for
The unit tests covered the adapter’s contract with the runner. The integration tests covered the runner’s contract with bastille. Between them, every wire in the adapter-to-bastille pipe was exercised. Every wire was, at the beginning, fine. The wires were not the problem.
The problem was the contract between the adapter and the
truth. The adapter believed destroy followed by
exit 0 meant the jail was gone. The truth had no
opinion on what the adapter believed. The truth was a directory on
disk and a row in a list, observable, queryable, and persistently
present despite bastille’s assurances to the contrary. The
adapter had to be taught to ask the truth, not the tool.
This is what post-condition checks are for. They are not paranoia.
They are not a redundant assert. They are the place where the
adapter admits it does not, and cannot, fully trust the soft
contract of a CLI. Bastille is a 525-kilobyte shell program with
seven years of accumulated semantics, half of them undocumented,
some of them version-dependent, all of them subject to whatever
the next pull request decides to change. The
exit_code integer that bastille emits is exactly as
trustworthy as the bastille contributor who wrote the surrounding
shell function. Sometimes that is very trustworthy. Sometimes it
is “Jail is running. Use [-a|--auto] to auto-stop the
jail.” followed by exit 0.
Adapters exist precisely to convert soft contracts into hard ones.
The hardness comes from the post-condition: I will tell you
this succeeded only after I have observed that it did. The
runner cannot promise that. The mock cannot promise that. Only the
adapter, with the authority to call exists?/1 after
destroy/2 returns, can promise that.
The arc
| SHA | Lie | Repair |
|---|---|---|
| 0b78c70 | None yet (the foundation) | Adapter, Runner, Mock, 245 lines of unit tests |
| 9f99b40 | Bastille refuses non-root | privilege_prefix config; runner prepends doas |
| 5c8f826 | Test name collisions across VM restarts | os_time + unique_integer in setup name |
| 8cb46e5 | Same, harder: clock-second collisions still possible | 32-bit random hex |
| 6dc5b28 | exists? false-positive on empty stubs | (Wrong fix: check jail.conf; rolled back) |
| 1366b9f | Stub vs real-jail still ambiguous | (Wrong fix: non-empty dir; broke for non-root callers) |
| 0bac0f1 | FS perms hide truth from non-root BEAM | Authoritative bastille list parse |
| c510b50 | destroy -f on running jail exits 0 with no effect | -a -f + post-condition exists? check |
| daea21a | (merged to main) | 5/0 live; adapter ships |
Each commit message documents the specific lie the previous version was telling. The lie was usually mine, sometimes bastille’s, once or twice doas’s. None of them were findable until a real host pushed back.
Coda
The final live test ran in eleven and a half seconds and reported
5 tests, 0 failures. The adapter that produced that
number was, by line count, twenty-six lines longer than the
adapter that had produced 175 tests, 0 failures on
the laptop earlier the same afternoon.
Twenty-six lines is not a lot of code for an afternoon. The
adapter had not, in any meaningful sense, improved. The adapter had
lost its illusions. Each lost illusion took the form of a function
that now does slightly more than it used to:
destroy/2 verifies, exists?/1 queries
authoritatively, the system runner pipes yes and
forces auto-stop, the configuration accepts a
privilege_prefix. None of this is more elegant. All of
it is more correct.
The unit-test count went from 173 to 175 to 175 to 173 to 175 to
175. It moved up and down by twos, like a tide; the count was
never the point. The point was the integration test, which spent
the afternoon failing in seven different ways and ended up green
in a way that, if it ever fails again, will fail loudly and with a
specific, actionable diagnosis. {:error,
{:destroy_did_nothing, name}} is not a sentence I want to
read in production. It is a sentence I am very glad my adapter is
now capable of producing, because the alternative is silence
followed by an angry phone call.
The adapter did not improve. The truth about Bastille got better.
The adapter at daea21a is the merged form of the
seven-bug arc. Zed.Platform.Bastille is at
lib/zed/platform/bastille.ex;
the runner at
lib/zed/platform/bastille/runner.ex;
the integration tests, the ones that actually mattered, at
test/zed/platform/bastille_integration_test.exs.
The verify script that catches host-config drift before any of this
runs lives at
scripts/verify-bastille-host.sh.
The seven-failure live session was on free-macpro-gpu,
a 2013 Mac Pro running FreeBSD 15.0 with bastille 1.4.1.260315 and a
ZFS pool named zroot_mac. None of these details would
have mattered to a mock.