The first number was 44,000. Events per second, running a PHOLD benchmark through a simulation engine that had been alive for about three hours. Three GenServers — a clock, a calendar, an entity manager — passing messages to each other for every simulated event. It was correct. It was clean. It was, by discrete-event simulation standards, absurdly slow.
The second number was 539,000. Same benchmark, same machine, same entities. One
change: remove the GenServers from the hot path. Run the event loop as a single
tail-recursive function operating on plain data structures. No mailboxes, no term
copying, no scheduling overhead. Just a :gb_trees priority queue, a
Map of entity states, and a function that pops, dispatches, pushes,
and recurs.
That gap — 6.5× at 100 entities, 2.8× at ten thousand — is the entire thesis of this project, compressed into a benchmark.
The Isomorphism Nobody Talks About
Averill Law's Simulation Modeling and Analysis has 23,700 citations. It is the textbook. And if you read it as an Erlang developer, a strange recognition settles in. Law's simulation architecture — entities with state, an event calendar, a clock that advances discretely, resources that queue — isn't analogous to OTP. It is OTP.
| Law's DES | OTP |
|---|---|
| Entity with state | GenServer |
| Event calendar | Priority queue |
| Simulation clock | Monotonic GenServer state |
| Entity failure → recovery | Supervisor restart |
| Parallel replications | Task.async_stream |
| Common random numbers | Functional :rand state |
| Hot model fix | Hot code reload |
Sim-Diasca at Électricité de France proved this in 2010, running millions of Erlang actors for energy grid simulation. InterSCSimulator did the same for urban traffic. The precedent exists. What neither had was a statistical inference layer.
The Engine
sim_ex is a discrete-event simulation engine. Zero dependencies. The core is eight modules:
```elixir
Sim.Entity        # @behaviour: init/1, handle_event/3, statistics/1
Sim.Clock         # next-event time advance
Sim.Calendar      # :gb_trees priority queue, FIFO tie-breaking
Sim.EntityManager # registry + dispatch
Sim.Resource      # capacity-limited server (M/M/c)
Sim.Source        # arrival generator
Sim.Topology      # ETS shared state
Sim.Statistics    # Welford streaming + batch means CI
```
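A toy entity gives the contract some shape. This is a hedged sketch: the callback names come from the module listing above, but the exact signatures and return shapes are assumed, not taken from sim_ex's documented API.

```elixir
# Hypothetical entity: callback names from Sim.Entity, signatures assumed.
defmodule Counter do
  # init/1 builds the entity's initial state from options.
  def init(opts), do: %{count: 0, step: Keyword.get(opts, :step, 1)}

  # handle_event/3 takes the payload, the current simulated time, and
  # the state; it returns {new_state, new_events_to_schedule}.
  def handle_event(:bump, _time, state) do
    {%{state | count: state.count + state.step}, []}
  end

  # statistics/1 reports whatever the entity wants to expose.
  def statistics(state), do: %{count: state.count}
end

state = Counter.init(step: 2)
{state, []} = Counter.handle_event(:bump, 0.0, state)
Counter.statistics(state)  # => %{count: 2}
```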
An M/M/1 queue — Law's Chapter 1, the hello-world of simulation — is a single Sim.run call:
```elixir
{:ok, result} = Sim.run(
  entities: [
    {:arrivals, Sim.Source, interarrival: {:exponential, 1.0}},
    {:server, Sim.Resource, service: {:exponential, 0.5}}
  ],
  initial_events: [{0.0, :arrivals, :generate}],
  stop_time: 50_000.0
)
```
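The theoretical targets can be checked by hand. With mean interarrival 1.0 and mean service 0.5, the standard M/M/1 formulas give:

```elixir
# Mean interarrival 1.0 → arrival rate λ = 1.0;
# mean service 0.5 → service rate μ = 2.0.
lambda = 1.0
mu = 2.0

rho = lambda / mu          # utilization ρ = λ/μ
wq  = rho / (mu - lambda)  # mean wait in queue, Wq = ρ/(μ−λ)
w   = 1.0 / (mu - lambda)  # mean time in system, W = 1/(μ−λ)
l   = lambda * w           # mean number in system, L = λW (Little's law)

{rho, wq, w, l}  # => {0.5, 0.5, 1.0, 1.0}
```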
Utilization converges to 0.5. Mean wait converges to the theoretical value.
Same seed, same trajectory. The simulation is deterministic because :rand
state is functional — no global mutable PRNG, no thread-local surprises.
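What functional PRNG state means in practice can be sketched in a few lines. The module and helpers here are illustrative, not part of sim_ex: the point is that :rand state is a plain value, threaded explicitly, so two runs from the same seed replay identically.

```elixir
# Illustrative only: DetDraws is not part of sim_ex.
defmodule DetDraws do
  # Draw one Exponential(rate) variate; return {draw, new_rand_state}.
  # Inverse-CDF method; 1.0 - u avoids :math.log(0.0).
  def exponential(rate, rand_state) do
    {u, new_state} = :rand.uniform_s(rand_state)
    {-:math.log(1.0 - u) / rate, new_state}
  end

  # Thread the PRNG state through n draws with map_reduce.
  def take(n, rate, rand_state) do
    Enum.map_reduce(1..n, rand_state, fn _, st -> exponential(rate, st) end)
  end
end

seed = :rand.seed_s(:exsss, {42, 42, 42})
{draws_a, _} = DetDraws.take(3, 1.0, seed)
{draws_b, _} = DetDraws.take(3, 1.0, seed)
draws_a == draws_b  # => true: same seed, same trajectory
```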
The Two Modes
The first implementation was clean OTP. Three GenServers, three processes, proper Elixir. Pop an event from the Calendar process. Send it to the EntityManager process. EntityManager dispatches, gets new events back, sends them to Calendar. Three mailbox round-trips per simulated event.
At 10 microseconds per GenServer.call, that's 30 microseconds of
overhead on events that take 5 microseconds of actual work. The infrastructure
was six times more expensive than the simulation.
The Engine fixes this by running the entire loop in one process. No GenServer.
No message passing. A tail-recursive function that pops from :gb_trees,
calls module.handle_event/3 directly, inserts new events, and recurs.
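The shape of that loop can be sketched under stated assumptions: a calendar keyed on {time, seq} tuples, entities stored as a map of {module, state} pairs, and handle_event returning {new_state, events}. This is an illustration of the structure, not sim_ex's actual implementation.

```elixir
# Sketch of a single-process event loop — assumed shapes, not sim_ex's code.
defmodule EngineLoop do
  # Pop the earliest event, dispatch it, insert produced events, recur.
  # Produced events are {time, target, payload} triples.
  def run(calendar, entities, stop_time, handled \\ 0) do
    case :gb_trees.is_empty(calendar) do
      true ->
        {entities, handled}

      false ->
        {{time, _seq}, {target, payload}, rest} = :gb_trees.take_smallest(calendar)

        if time > stop_time do
          {entities, handled}
        else
          {mod, state} = Map.fetch!(entities, target)
          # Direct function call: no mailbox, no term copying.
          {new_state, new_events} = mod.handle_event(payload, time, state)

          calendar =
            Enum.reduce(new_events, rest, fn {t, tgt, pl}, cal ->
              # Monotonic seq keeps insertion order within a timestamp.
              :gb_trees.insert({t, System.unique_integer([:monotonic])}, {tgt, pl}, cal)
            end)

          run(calendar, Map.put(entities, target, {mod, new_state}), stop_time, handled + 1)
        end
    end
  end
end
```

Everything lives on one process's heap; the only per-event costs are a tree pop, a function call, and a tree insert.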
The result:
| LPs | Engine (events/s) | GenServer (events/s) | Speedup |
|---|---|---|---|
| 100 | 539,076 | 82,553 | 6.5× |
| 1,000 | 157,598 | 64,928 | 2.4× |
| 10,000 | 123,967 | 43,842 | 2.8× |
Both modes remain. Engine for throughput. GenServer for interactive stepping,
distributed simulation, live dashboards, fault tolerance. The choice is one
keyword: mode: :engine or mode: :genserver.
The Lesson
This is the same insight we found building the NUTS sampler. The JIT boundary
matters. EXLA.jit returns its outputs on EXLA.Backend; copying them to
BinaryBackend gave a 3× speedup. The leapfrog integrator belongs inside
the JIT; the tree builder belongs outside.
Simulation has the same structure. The event dispatch loop belongs in a tight function. The entity lifecycle — creation, failure recovery, distribution across nodes, hot code reload — belongs in OTP processes.
The actor model is the right abstraction for the system. It is the wrong abstraction for the inner loop. Use processes for the outer structure. Use functions for the hot path. The BEAM's value is architectural, not computational. This is true for MCMC. It is true for DES. It may be true for everything.
There Is No Now
In 1978, Leslie Lamport published a paper that changed how we think about time in distributed systems. The title was plain: “Time, Clocks, and the Ordering of Events in a Distributed System.” The insight was not. Physical clocks cannot be trusted across machines. What matters is not when something happened, but what caused what. His logical clocks gave distributed systems a way to reason about causality without pretending that “now” means the same thing on two different computers.
Justin Sheehy — then CTO of Basho, the company behind Riak, one of the most important distributed databases built on the BEAM — drove this point home in his 2015 ACM Queue article “There is No Now.” He opened with Rear Admiral Grace Hopper handing each student a piece of wire 11.8 inches long: the maximum distance electricity can travel in one nanosecond. A physical argument against a comforting abstraction. Sheehy’s thesis: even Google’s Spanner, with GPS satellites and atomic clocks, does not give you “now.” TrueTime returns a range of uncertainty. The best Google can do is one to seven milliseconds of clock drift at any moment. If that is the best, the rest of us should stop pretending.
Simulation has the same problem. When two events happen at the same simulated
time, which one goes first? In Arena and SimPy, the answer is insertion order
— FIFO within a timestamp. This is arbitrary: insertion order is an accident
of execution, not a statement of causality. Run the same model with a
different insertion sequence, or process same-time events in parallel, and an
effect can be handled before its cause. Most simulation textbooks wave this
away. Lamport would not.
Sim-Diasca, the Erlang simulation engine built at Électricité
de France in 2010, solved this with a two-level timestamp:
{tick, diasca}. When entity A handles an event at tick T,
diasca D, any events it produces for other entities land at
(T, D+1). Cause at diasca 0, effect at diasca 1, reaction at
diasca 2. The tick advances only when no more diascas are pending —
quiescence. No Lamport clocks needed. No vector clocks. The causal ordering
is built into the timestamp structure itself.
sim_ex implements this. Three event forms:
```elixir
# Causal reaction — same tick, next diasca
{:same_tick, target, payload}          # → (T, D+1)

# Schedule at a future tick
{:tick, future_tick, target, payload}  # → (future_tick, 0)

# Relative delay
{:delay, delta, target, payload}       # → (T + delta, 0)
```
The calendar key is {tick, diasca, seq} — a three-element
tuple that sorts naturally in Erlang’s :gb_trees.
{5, 2, _} always comes before {5, 3, _}, which
always comes before {6, 0, _}. Quiescence detection is free:
just pop the smallest key. If it’s a new tick, the old tick is done.
No barrier protocol, no global synchronization, no coordination service. The
data structure is the synchronization.
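The ordering claim is easy to check in a few lines: Erlang's term ordering compares tuples element by element, so {tick, diasca, seq} keys sort causally with no extra machinery.

```elixir
# Insert tick-diasca keys out of order; :gb_trees returns them sorted.
cal =
  [{5, 3, 1}, {6, 0, 0}, {5, 2, 0}, {5, 2, 1}]
  |> Enum.reduce(:gb_trees.empty(), fn key, tree ->
    :gb_trees.insert(key, :event, tree)
  end)

:gb_trees.keys(cal)
# => [{5, 2, 0}, {5, 2, 1}, {5, 3, 1}, {6, 0, 0}]

# Quiescence detection: pop the smallest key; when its tick jumps
# from 5 to 6, every diasca of tick 5 has drained.
{{5, 2, 0}, :event, _rest} = :gb_trees.take_smallest(cal)
```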
The lineage runs through the BEAM community like a wire. Lamport’s logical clocks (1978). Chandy and Misra’s distributed simulation (1979). Sim-Diasca on Erlang (2010). Sheehy at Basho, building Riak on the same runtime, writing about the same impossibility (2015). And now sim_ex, where the simulation engine and the distributed database share not just a theoretical heritage but a virtual machine. The BEAM has always understood that “now” is a distributed lie. It was built for telephone switches, where a dropped call is worse than a slow one, and where two switches must never disagree about who is talking to whom. Causal ordering is not a feature. It is the reason the runtime exists.
Le Quatrième
sim_ex is the fourth library. The quartet:
| Library | Inference | For when you say |
|---|---|---|
| eXMC | NUTS/HMC | “I know the model” |
| smc_ex | O-SMC² | “The data never stops” |
| StochTree-Ex | BART | “I don't know the function” |
| sim_ex | DES | “I need to simulate it” |
The first three answer questions about data. The fourth generates data by simulating systems. But here's what happens when you put them together: a simulation that fits posteriors over its own input parameters (eXMC), calibrates online from sensor data (smc_ex), discovers which inputs matter (StochTree-Ex), and runs the simulation itself (sim_ex). All in one runtime. All supervised. All hot-reloadable.
No commercial simulation engine offers this. Not AnyLogic. Not Simio. Not Arena. They have better GUIs. They have decades of domain libraries. But they cannot build a simulation that learns, because their runtimes weren't designed for it. Ours was designed for telephone switches, which turns out to be close enough.
sim_ex is at
github.com/borodark/sim_ex.
Zero dependencies. Twenty-six tests. 539,000 events per second. Tick-diasca
causal ordering. GPSS-style DSL. The PHOLD benchmark is in benchmark/.