The Release That Never Seized

April 2026 sim_ex proper_statem testing

The counter-example was four lines long:

init_resource(3, true)
seize(1, 3)
release(1)
release(3)

Initialize a resource with capacity three. Seize it with job 1 at priority 3. Release job 1. Release job 3.

Job 3 was never seized. The engine accepted the release anyway. Busy count: zero. Releases: two. Grants: one. A resource that has released more jobs than it ever granted is not a resource. It is a bookkeeping error with a seize/release protocol wrapped around it.

One hundred and fourteen tests missed this. Eleven property-based tests across 350 random parameter configurations missed this. The bug is not in the parameters. It is in the sequence.

The Tool

PropEr’s proper_statem is a state machine testing framework. You model the system under test as an abstract state machine: an initial state, a set of commands that can be generated from each state, preconditions that guard which commands are valid, and postconditions that verify the system’s response. PropEr generates random command sequences, executes them against the real system, and checks that every postcondition holds after every step. When one fails, it shrinks the sequence to the minimal reproduction.
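The generate-execute-check loop can be sketched in a few lines of plain Python, with a toy counter model standing in for the real callbacks (all names and shapes here are illustrative, not PropEr's actual API):

```python
import random

def run_state_machine_test(model, system_step, n_cmds, seed):
    """One random command sequence: generate commands from the model's
    abstract state, run them against the real system, and check the
    postcondition after every step."""
    rng = random.Random(seed)
    state, sys_state = model.initial_state(), model.initial_sys()
    trace = []
    for _ in range(n_cmds):
        cmd = model.command(state, rng)
        if not model.precondition(state, cmd):
            continue                     # skip commands invalid in this state
        result, sys_state = system_step(sys_state, cmd)
        trace.append(cmd)
        if not model.postcondition(state, cmd, result):
            return ("failed", trace)     # PropEr would now shrink this trace
        state = model.next_state(state, cmd, result)
    return ("ok", trace)

class CounterModel:
    """Toy model: the 'system' is an integer counter that must stay >= 0."""
    def initial_state(self): return 0
    def initial_sys(self): return 0
    def command(self, state, rng): return rng.choice(["inc", "dec"])
    def precondition(self, state, cmd): return cmd == "inc" or state > 0
    def next_state(self, state, cmd, result):
        return state + (1 if cmd == "inc" else -1)
    def postcondition(self, state, cmd, result): return result >= 0

def counter_step(sys_state, cmd):
    sys_state += 1 if cmd == "inc" else -1
    return sys_state, sys_state

status, trace = run_state_machine_test(CounterModel(), counter_step, 30, seed=7)
```

Because the counter system is correct and the precondition blocks dec at zero, this run reports "ok"; a buggy system_step would instead return the failing trace for shrinking.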

The difference from property-based testing is the dimension of attack. Property tests generate random inputs and check invariants on the output. State machine tests generate random sequences of operations and check invariants on the state transitions. The barbershop with seed 42 is one path through the state space. Seven hundred adversarial command sequences are seven hundred paths, each chosen to maximize the probability of finding a postcondition violation.

To our knowledge, no discrete-event simulation engine in the published literature has been tested this way.

Three Models

We built three proper_statem models, each targeting a different layer of the sim_ex engine.

Model        Tests  Commands                     What it hunts
Engine       200    step, run_n, check, init     Calendar corruption, clock regression, entity disappearance
Resource     300    seize, release, check, init  Protocol violations, queue corruption, statistics drift
Adversarial  200    step, run_n, check, init     Preemption bugs, generation counter failures, hold interruption

The Engine model wraps the full simulation loop. It initializes a barbershop (one barber, Poisson arrivals, exponential service), then generates sequences of step (execute one event), run_n (execute up to 50 events), and check (verify all invariants). After every command, the postcondition verifies: clock never decreases, events processed increments correctly, all entities still exist, calendar remains sorted, every pending event targets a registered entity, departures never exceed arrivals.
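That per-step check can be pictured as a pure predicate over two snapshots of engine state; the dict fields below are assumptions for illustration, not sim_ex's actual structs:

```python
def engine_postcondition(prev, curr):
    """Invariants the Engine model checks after every command. Snapshots
    are plain dicts; 'calendar' is a list of (time, target) pairs and
    'entities' is a set of registered entity ids."""
    calendar_sorted = curr["calendar"] == sorted(curr["calendar"])
    targets_registered = all(target in curr["entities"]
                             for _time, target in curr["calendar"])
    return (curr["clock"] >= prev["clock"]                           # clock never decreases
            and curr["events_processed"] >= prev["events_processed"]
            and prev["entities"] <= curr["entities"]                 # no entity disappears
            and calendar_sorted                                      # pending events stay ordered
            and targets_registered                                   # events target live entities
            and curr["departures"] <= curr["arrivals"])              # departures never exceed arrivals
```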

Two hundred sequences. Every postcondition held. The engine loop is clean.

The Adversarial model replaces the barbershop with a preemptive factory: one machine, normal orders at priority 5 arriving every 5 time units, rush orders at priority 1 arriving every 50. Rush orders eject normal orders mid-service. The postconditions add: busy never exceeds capacity, grants never fall below releases, preemption count is non-negative.

Two hundred sequences. Every postcondition held. The preemption protocol survives adversarial scheduling.

The Resource model strips away the engine entirely. It operates on Sim.DSL.Resource in isolation: initialize with random capacity (1–4) and random preemptive flag, then generate sequences of seize (with random job ID and priority) and release (of a held job). No simulation clock. No calendar. No event dispatch. Just the seize/release protocol, hammered with three hundred random sequences.

This is where the bug was.

The Discovery

The first run of the Resource model produced zero release commands. The generator offered releases only when the model predicted busy > 0, but the model’s prediction of busy was wrong during PropEr’s symbolic generation phase. During generation, PropEr doesn’t execute commands — it reasons about them symbolically. The model predicted busy = 0 because the symbolic result of seize hadn’t been evaluated yet. No predicted holders, no generated releases, no tested sequences.

The fix was to predict optimistically: if capacity is available, assume the seize succeeds and add the job to the holder list.
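In Python terms, the optimistic transition and the generator that depends on it might look like this (field names and command shapes are illustrative, not the model's actual code):

```python
import random

def next_state_seize(model_state, job_id, priority):
    # During PropEr's symbolic generation phase the real seize result is
    # not available, so predict optimistically: if capacity remains,
    # assume the grant succeeds and record the job as a holder.
    if len(model_state["holders"]) < model_state["capacity"]:
        return {**model_state,
                "holders": {**model_state["holders"], job_id: priority}}
    return model_state  # at capacity: the job queues, predicted holders unchanged

def candidate_commands(model_state, rng):
    cmds = [("seize", rng.randrange(1, 100), rng.choice([1, 3, 5]))]
    # Releases are only offered when the model predicts at least one holder;
    # without the optimistic prediction above, this branch never fires.
    if model_state["holders"]:
        cmds.append(("release", rng.choice(sorted(model_state["holders"]))))
    return cmds
```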

With releases enabled, PropEr immediately found the four-line sequence shown above. The first failing sequence was likely dozens of operations long; shrinking reduced it to the minimal case of four: init, seize, release, release-of-ungranted. PropEr’s shrinking algorithm — the feature that distinguishes it from hand-rolled property testing — delivered exactly the diagnosis a developer needs: not “something is wrong,” but “this is the shortest sequence that makes it wrong.”

The Bug

The release handler in Sim.DSL.Resource:

def handle_event({:release, job_id}, clock, %{preemptive: true} = state) do
  # Unconditional: counts a release and decrements busy without checking
  # that job_id is actually held; the max/2 clamp masks the underflow.
  state = %{state | releases: state.releases + 1, busy: max(state.busy - 1, 0)}
  # Map.delete/2 is a no-op when the key is absent, so nothing fails here.
  state = %{state | holders: Map.delete(state.holders, job_id)}
  ...

The handler unconditionally increments releases and decrements busy. It does not check whether job_id is actually held. Map.delete on a key that doesn’t exist returns the map unchanged — no error, no warning. The max(busy - 1, 0) clamp prevents busy from going negative, which hides the accounting error from downstream code.

The result: releases = 2, grants = 1. A resource that claims to have released more jobs than it ever granted. In a real simulation, this doesn’t crash — the clamp absorbs the inconsistency. But the statistics are wrong. Utilization computed from grants and releases will be negative. A validation report comparing simulated utilization to historical data will conclude the model is invalid when it is merely miscounting.
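The accounting failure is easy to replay outside Elixir. Here is the same logic transliterated to Python, with the queueing path elided (a sketch of the bug, not sim_ex's actual code):

```python
def seize(state, job_id, priority):
    # Grant immediately when capacity is available (queueing elided).
    if state["busy"] < state["capacity"]:
        return dict(state,
                    grants=state["grants"] + 1,
                    busy=state["busy"] + 1,
                    holders={**state["holders"], job_id: priority})
    return state

def buggy_release(state, job_id):
    # The bug: count the release and decrement busy unconditionally,
    # then delete a holder that may not exist.
    holders = dict(state["holders"])
    holders.pop(job_id, None)                  # no-op when absent, like Map.delete/2
    return dict(state,
                releases=state["releases"] + 1,
                busy=max(state["busy"] - 1, 0),  # the clamp hides the error
                holders=holders)

state = {"capacity": 3, "busy": 0, "grants": 0, "releases": 0, "holders": {}}
state = seize(state, 1, 3)           # seize(1, 3)
state = buggy_release(state, 1)      # release(1): legitimate
state = buggy_release(state, 3)      # release(3): job 3 was never seized
# Now grants == 1 but releases == 2: more released than ever granted.
```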

The bug has never manifested in normal simulation because the DSL compiler guarantees that every release follows a seize. The process flow is seize → hold → release, and a job cannot reach the release step without passing through seize. The protocol is correct when the caller obeys it. The resource does not enforce it.
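One way to close that gap is to make the handler reject unheld jobs instead of absorbing them; in Elixir this could be a Map.has_key? branch or a pattern match on the holders map. Sketched in the same Python transliteration (a possible fix, not the project's actual one):

```python
def guarded_release(state, job_id):
    # Refuse to touch the counters unless job_id actually holds the
    # resource, so a protocol violation surfaces instead of silently
    # corrupting the statistics.
    if job_id not in state["holders"]:
        return ("error", "not_held"), state
    holders = dict(state["holders"])
    del holders[job_id]
    new_state = dict(state,
                     releases=state["releases"] + 1,
                     busy=state["busy"] - 1,  # no clamp needed: a held job implies busy > 0
                     holders=holders)
    return ("ok", job_id), new_state
```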

Why 114 Tests Missed It

Point tests run the simulation forward from t = 0 with a fixed seed. Every entity follows its compiled flow: arrive, seize, hold, release, depart. No entity ever sends a spurious release because the DSL compiler doesn’t generate one.

Property tests generate random parameters — lambda, mu, capacity, seed — and verify output invariants: Little’s Law, flow conservation, utilization convergence. The parameters change; the sequence does not. Every test runs the same arrive-seize-hold-release-depart flow with different numbers. No test generates a release without a preceding seize.

State machine tests generate random sequences. The sequence init → seize(1) → release(1) → release(3) is not a valid simulation trace — no compiled DSL model would produce it. But it is a valid API call sequence. The resource is a module with public functions. Anyone can call handle_event({:release, 3}, clock, state). The question is not whether the DSL compiler produces this sequence. The question is whether the resource handles it correctly when someone — or something — does.

That something might be a preemption race condition, a hot code reload that resets entity state, or a distributed simulation where messages arrive out of order. The DSL guarantee is a compile-time property. The resource’s robustness is a runtime property. Testing only the compile-time path leaves the runtime path unverified.

The Command Distribution

Model        Command mix
Engine       step 52%, run_n 30%, check 9%, init 9%
Resource     seize 47%, release 17%, check 25%, init 12%
Adversarial  step 51%, run_n 30%, check 10%, init 9%

The Resource model’s 17% release rate is the critical number. Without releases, the model degenerates to “seize until capacity is full, then seize into the queue forever.” With releases, the state space opens: seize-release-seize cycles, partial capacity utilization, queue drain after release, preemption followed by re-grant from queue. The bug lives in the release path. Without generating releases, PropEr could run a million sequences and never find it.

What the Adversarial Model Tests

The Adversarial model targets the preemptive resource protocol specifically. Two sources feed one machine: normal orders (priority 5, frequent) and rush orders (priority 1, rare). When a rush order arrives and the machine is busy with a normal order, preemption occurs: the normal order is ejected, its hold is interrupted, a generation counter increments to invalidate the pending hold_complete event, and the rush order takes over the machine.
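The generation-counter part of that protocol can be sketched generically; the entity and event shapes below are assumptions for illustration, not sim_ex's actual structs:

```python
def schedule_hold_complete(entity):
    # The pending completion event carries the entity's current generation.
    return {"type": "hold_complete", "entity": entity["id"], "gen": entity["gen"]}

def preempt(entity):
    # Ejecting the entity mid-service bumps the generation, so the
    # already-scheduled hold_complete (carrying the old gen) goes stale.
    return {**entity, "gen": entity["gen"] + 1, "status": "preempted"}

def handle_hold_complete(entity, event):
    # A stale event is ignored rather than applied; without this check the
    # preempted order would "complete" a hold it no longer owns.
    if event["gen"] != entity["gen"]:
        return entity, "ignored"
    return {**entity, "status": "done"}, "applied"
```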

The postconditions verify that this protocol survives two hundred adversarial schedules: busy never exceeds capacity, grants never fall below releases, the calendar remains sorted, the clock never regresses. Two hundred sequences. All passed. The preemption protocol is correct — when exercised through the engine.

The Resource isolation model found what the Adversarial model could not: a bug in the module’s interface, not in the engine’s use of that interface. The engine always calls the resource correctly. The resource does not protect itself from incorrect calls. These are different failure modes, and they require different test strategies.

The Lesson

There are three dimensions of testing for a stochastic system:

Dimension            What varies                      What it finds         Tool
Point tests          Nothing (seed = 42)              Regressions           ExUnit
Property tests       Parameters (lambda, mu)          Invariant violations  PropCheck
State machine tests  Sequences (seize, release, ...)  Protocol violations   proper_statem

Each dimension catches bugs the others cannot. Point tests are fast and catch regressions. Property tests find parameter combinations that violate mathematical invariants (Little’s Law, flow conservation). State machine tests find operation sequences that violate protocol invariants (busy ≤ capacity, grants ≥ releases).

The bug we found — releasing an ungranted job — lives in the sequence dimension. No parameter change produces it. No single-seed regression test triggers it. Only a sequence generator that can produce release(3) without a preceding seize(3) will find it. PropEr generated that sequence, and shrunk it to four operations.

Seven hundred adversarial sequences. Three state machine models. One bug that 114 tests missed.

Jason Statham always delivers.


proper_statham is at github.com/borodark/sim_ex/test/statham_test.exs. Three models. Seven hundred sequences. The shrunk counter-example is the interesting part.