The Clipboard, the Auditor, and the Saboteur

April 2026 · sim_ex · testing · industrial engineering

A factory floor has three kinds of inspectors. The first two are familiar. The third is the one that finds the bug.

The Clipboard Inspector

She walks the floor at 9 AM every Tuesday. She counts the parts in the bin. She checks the number against last Tuesday. If it matches, the line is fine.

This is how most simulation software is tested. Run the model with a fixed random seed — seed 42 is traditional — and check that the output matches a known value. Did the barbershop produce 12,449 customers? Yes. Pass.

The clipboard inspector is fast, cheap, and catches one specific thing: did the output change since the last time we checked? She is valuable. She catches the bug you introduced at 2 AM when you refactored the queue logic. She catches the dependency update that silently changed the random number generator. She catches the merge conflict that deleted a line from the event dispatcher.

She does not catch anything else. If someone moved the bin on Wednesday, the Tuesday inspector will never know. If the model has always been wrong in a way that produces the same wrong number every time, the clipboard says “pass.”

In software: point tests. One seed, one expected output, one assertion. Seventy-seven of them in our test suite, all passing, all blind to the bugs that don’t change the output of seed 42.
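The pattern is easy to sketch. Here is a hypothetical Python miniature (the real model is far richer, and `barbershop_customers` is an invented stand-in): one seed, one golden value, one assertion.

```python
import random

def barbershop_customers(seed, minutes=480):
    """Toy stand-in for the simulation: count Poisson arrivals
    over one shift, with a mean of 16 minutes between arrivals."""
    rng = random.Random(seed)
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(1 / 16.0)
        if t >= minutes:
            return count
        count += 1

# The clipboard test: record the output once, then pin it forever.
GOLDEN = barbershop_customers(42)

def test_seed_42():
    assert barbershop_customers(42) == GOLDEN
```

Note what the assertion does and does not detect: any change to the output of seed 42 fails the test, but a model that was always wrong simply pins its wrong number as `GOLDEN`.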

The Auditor

She shows up with a calculator and a statistics textbook. She does not care what the bin count was last Tuesday. She cares whether the factory obeys the laws of physics.

Her first question: does the average number of parts in the system equal the arrival rate times the average time each part spends there? This is Little's Law (L = λW), proved in 1961, valid for every stable queueing system regardless of the distributions involved. It is the E = mc² of industrial engineering: if your simulation violates it, your simulation is wrong, and no amount of explaining will fix it.

Her method: generate random production configurations. Different arrival rates. Different machine speeds. Different numbers of operators. For each configuration, run the simulation and check the invariant. Not one check — thirty checks, fifty checks, three hundred checks. If Little’s Law holds across three hundred random configurations, the engine is probably correct. If it fails on configuration 217, there is a bug that seed 42 would never have found, because seed 42 never tried configuration 217.
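The auditor's method can be sketched in a few lines of Python (a toy M/M/1 model, not the project's Erlang suite; all names are hypothetical). The key design point: measure L from an event-swept counter and W from per-job records — two independent code paths, so a dead gauge on either side breaks the agreement.

```python
import random

def mm1_little_check(lam, mu, n_jobs, seed):
    """Run a toy FIFO single-server queue and measure Little's Law
    (L = lambda * W) two independent ways."""
    rng = random.Random(seed)
    arrive, depart, t, free_at = [], [], 0.0, 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)        # exponential interarrivals
        start = max(t, free_at)          # FIFO: wait for the server
        free_at = start + rng.expovariate(mu)
        arrive.append(t)
        depart.append(free_at)
    T = depart[-1]                       # horizon: the system drains empty
    # L from an event-swept occupancy counter -- a separate code path,
    # exactly the kind of gauge that can silently read zero.
    events = sorted([(a, +1) for a in arrive] + [(d, -1) for d in depart])
    area, n, prev = 0.0, 0, 0.0
    for when, delta in events:
        area += n * (when - prev)
        n, prev = n + delta, when
    L = area / T
    # lambda-hat and W from per-job records.
    lam_hat = n_jobs / T
    W = sum(d - a for d, a in zip(depart, arrive)) / n_jobs
    return L, lam_hat * W

# The auditor: many random configurations, one law.
rng = random.Random(0)
for trial in range(50):
    mu = rng.uniform(0.5, 2.0)
    lam = mu * rng.uniform(0.1, 0.9)     # keep the queue stable
    L, lw = mm1_little_check(lam, mu, 200, seed=trial)
    assert abs(L - lw) < 1e-6 * max(1.0, lw), f"Little's Law broken at trial {trial}"
```

Because the horizon ends when the last job departs, the two measurements agree on every sample path when the bookkeeping is correct; a counter that is never incremented makes L read zero and fails on the first trial.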

She found one. The utilization gauge on Resource #7 had been reading zero since the day it was installed. The gauge was wired correctly — the code compiled, the field existed, the statistics function returned it. But the counter behind the gauge was never incremented. Not once. In four months and seventy-seven passing tests. The auditor caught it on her first visit because she checked utilization against the theoretical value, and zero is not 0.5.

In software: property-based tests. Random parameters, mathematical invariants, generous tolerances for Monte Carlo noise. Eleven of them in our test suite, running 350 trials across random parameter spaces. The clipboard checks outputs. The auditor checks laws.

The Saboteur

He does not test the product. He tests the protocol.

He walks up to the machine operator and says: “I’m releasing Job 3.”

The operator checks the board. Job 3 was never checked out.

The operator shrugs, decrements the counter anyway, and writes “released” in the log. The display shows zero jobs in progress. This is correct — there are zero jobs in progress. But the log now shows two releases and one checkout. The monthly utilization report, which divides busy time by total time, will produce a number that is physically impossible. The plant manager will question the model. The consultant will question her data. Nobody will question the assumption that every release is preceded by a checkout, because nobody ever tested what happens when it isn’t.

The saboteur tested it. In four operations:

init_resource(capacity: 3, preemptive: true)
seize(job_id: 1, priority: 3)
release(job_id: 1)
release(job_id: 3)    ← never seized

The engine accepted all four. No error. No warning. Releases: 2. Grants: 1. A resource that has released more jobs than it ever granted.
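A toy Python miniature of the buggy bookkeeping makes the failure concrete (hypothetical names; the real engine is Erlang, and this is the bug's shape, not its code):

```python
class Resource:
    """Toy model of the bug: release never checks for a matching seize."""
    def __init__(self, capacity, preemptive=False):
        self.capacity = capacity
        self.preemptive = preemptive
        self.in_use = 0       # the counter behind the display
        self.grants = 0
        self.releases = 0

    def seize(self, job_id, priority=5):
        if self.in_use < self.capacity:
            self.in_use += 1
            self.grants += 1

    def release(self, job_id):
        self.in_use = max(0, self.in_use - 1)  # BUG: decrements anyway
        self.releases += 1

r = Resource(capacity=3, preemptive=True)
r.seize(job_id=1, priority=3)
r.release(job_id=1)
r.release(job_id=3)       # never seized -- accepted without error
assert r.in_use == 0      # the display is "correct": zero jobs in progress
assert (r.grants, r.releases) == (1, 2)   # but the log is impossible
```

The one-line fix is to reject a release with no matching grant instead of clamping the counter at zero.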

Why the Saboteur Exists

The barbershop simulation has a DSL — a domain-specific language — that guarantees the sequence. The code reads:

seize :barber
hold exponential(16.0)
release :barber
depart

This compiles to a state machine that cannot reach “release” without passing through “seize.” The compiler enforces the protocol. So why test what happens when the protocol is violated?

Because the compiler is not the only caller.
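For concreteness, the guarantee the compiler provides might look like this table of legal transitions (a hypothetical Python sketch; the real DSL compiles to an Erlang state machine). Every caller that goes through the table is safe; the danger is the caller that doesn't.

```python
from enum import Enum, auto

class S(Enum):
    START = auto()
    SEIZED = auto()
    HELD = auto()
    RELEASED = auto()
    DONE = auto()

# Linear protocol: "release" is unreachable before "seize".
TRANSITIONS = {
    (S.START, "seize"): S.SEIZED,
    (S.SEIZED, "hold"): S.HELD,
    (S.HELD, "release"): S.RELEASED,
    (S.RELEASED, "depart"): S.DONE,
}

def step(state, action):
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action} is not legal in {state}")

s = S.START
for action in ["seize", "hold", "release", "depart"]:
    s = step(s, action)
assert s is S.DONE
```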

Preemption breaks the sequence. When a rush order (priority 1) ejects a normal order (priority 5) mid-service, the ejected order receives a “preempted” message. Its hold is interrupted. A generation counter increments to invalidate the pending completion event. The ejected order re-enters the queue. When it gets the machine back, it resumes with the remaining service time.

This is four separate state transitions happening in response to one arrival. If any of them fails — if the generation counter doesn’t increment, if the remaining time is wrong, if the re-queued order has the wrong priority — the result is a sequence violation. A release without a matching seize. A grant to an order that already holds the resource. A hold completion for a service that was already interrupted.
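The generation-counter piece of that dance can be sketched as follows (hypothetical Python names; the engine's actual messages differ). The pending completion event carries the generation it was scheduled under, and preemption bumps the counter so the stale event is ignored when it fires:

```python
class Machine:
    """Sketch of the generation-counter pattern for invalidating
    completion events that were scheduled before a preemption."""
    def __init__(self):
        self.generation = 0

    def schedule_completion(self, job_id, at):
        # The event remembers the generation it was scheduled under.
        return {"job": job_id, "at": at, "gen": self.generation}

    def preempt(self):
        # Bumping the counter invalidates every pending completion.
        self.generation += 1

    def on_completion(self, event):
        if event["gen"] != self.generation:
            return "stale -- ignored"   # completion of an interrupted service
        return "service complete"

m = Machine()
ev = m.schedule_completion(job_id=5, at=16.0)
m.preempt()                             # rush order ejects the normal order
assert m.on_completion(ev) == "stale -- ignored"
```

If the counter is not bumped, the stale completion fires as if the interrupted service had finished — one of the sequence violations described below.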

The clipboard inspector doesn’t find these because she runs the same seed every Tuesday. The auditor doesn’t find these because she varies the parameters, not the sequence. Only the saboteur — who generates random sequences of operations, including operations that the compiler would never produce — finds the combination that makes the protocol lie.

The Three Dimensions

| Inspector | What varies | What she finds | Factory analog |
|-----------|-------------|----------------|----------------|
| Clipboard | Nothing | Something changed since last time | Shift-end checklist |
| Auditor | The production orders | The math doesn't hold | SPC / statistical audit |
| Saboteur | The sequence of actions | The protocol doesn't protect itself | FMEA with adversarial inputs |

If you are an industrial engineering student, you already know the first two. The shift-end checklist is regression testing. SPC — statistical process control — is property-based testing: check whether the measurements obey the distribution, not whether they match a specific number. Every IE curriculum teaches both.

The third is FMEA thinking applied to software. Failure Mode and Effects Analysis asks: what could go wrong, and what happens when it does? The saboteur asks the same question, but instead of enumerating failure modes manually, he generates them randomly and lets the computer find the shortest sequence that triggers each one.

The technical name is stateful property-based testing, and the tool is PropEr’s proper_statem. The principle is simple: model your system as a state machine, generate random command sequences, check postconditions after each step, and when something fails, shrink the sequence to the minimal reproduction.

The minimal reproduction for our bug was four operations. The original failing sequence was likely dozens. The shrinking algorithm — which has no equivalent in the clipboard or auditor’s toolkit — found the essence of the failure: you can release what you never held.
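The whole loop — random commands, a postcondition, greedy shrinking — fits in a page of self-contained Python (a stand-in for proper_statem, not its code; the `Resource` here is a toy with the article's bug built in):

```python
import random

class Resource:
    """Buggy toy resource: release never verifies a matching seize."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.in_use = self.grants = self.releases = 0
    def seize(self, job_id):
        if self.in_use < self.capacity:
            self.in_use += 1
            self.grants += 1
    def release(self, job_id):
        self.in_use = max(0, self.in_use - 1)   # BUG: decrements anyway
        self.releases += 1

def violates(cmds):
    """Postcondition: no resource may release more jobs than it granted."""
    r = Resource()
    for op, job in cmds:
        (r.seize if op == "seize" else r.release)(job)
    return r.releases > r.grants

def find_failure(rng, tries=700, length=20):
    """The saboteur: random command sequences until one breaks the law."""
    for _ in range(tries):
        cmds = [(rng.choice(["seize", "release"]), rng.randrange(1, 5))
                for _ in range(length)]
        if violates(cmds):
            return cmds
    return None

def shrink(cmds):
    """Greedy shrinking: drop one command at a time while it still fails."""
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(cmds)):
            shorter = cmds[:i] + cmds[i + 1:]
            if violates(shorter):
                cmds, shrunk = shorter, True
                break
    return cmds

failing = find_failure(random.Random(0))
assert failing is not None
minimal = shrink(failing)
# For this bug the essence survives shrinking: a lone unmatched release.
assert len(minimal) == 1 and minimal[0][0] == "release"
```

Production shrinkers are cleverer — PropEr also shrinks the arguments inside each command, not just the sequence length — but the principle is the same: keep deleting while the failure persists, and what remains is the essence.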

What This Means for Your Simulation

If your simulation has a seize/release protocol — and every manufacturing simulation does — the saboteur should visit. Not because your DSL allows spurious releases. It doesn’t. But because your DSL is not the only thing that touches the resource. Hot code reloads. Distributed message delivery. Preemption interrupts. Recovery after crash. Every one of these is a sequence that the compiler didn’t generate.

The clipboard catches Tuesday’s bugs. The auditor catches the math. The saboteur catches the assumptions.

You need all three.


The code behind the three inspectors: proper_statem (the saboteur, 700 adversarial sequences), property tests (the auditor, 350 random configurations), and 114 point tests (the clipboard). All in one test suite.