We raced SimPy. The per-replication numbers were honest but unremarkable: 2–3x in Elixir, 8–10x in Rust. Then we ran one thousand replications in parallel and the margin became thirty to one.
This is the story of how we measured the wrong thing, measured it honestly, and then found the right thing to measure.
## The Per-Rep Race
Same models. Same parameters. Sequential execution — Python runs first, then Elixir, each getting the full machine. An 88-core Xeon. Python 3.12.3, SimPy 4.1.1. Elixir 1.18.3, OTP 27.
| Model (200K time units) | SimPy | sim_ex Elixir | sim_ex Rust |
|---|---|---|---|
| Barbershop | 127ms | 55ms (2.3x) | 15ms (8.5x) |
| Job Shop (5 stages) | 2,479ms | 1,200ms (2.1x) | — |
| Rework Loop (15%) | 601ms | 225ms (2.7x) | — |
Elixir is two to three times faster than SimPy on a single replication. The Rust NIF is eight times faster. These are real numbers, measured honestly and reported with the system load average, because run-to-run benchmark variance is 30–40%.
These are also the wrong numbers.
## The Wrong Numbers
A single simulation run tells you one trajectory through the state space. It does not tell you the distribution of outcomes. It does not give you confidence intervals. It does not propagate input uncertainty. Averill Law calls the analysis that does this — hundreds or thousands of replications with different parameter draws — “rarely done because it’s too expensive.”
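What those replications buy is a distribution instead of a point estimate. A minimal sketch of the downstream analysis, assuming we already have one output per replication — `RepStats` and the `waits` data are illustrative, not part of sim_ex:

```elixir
# Sketch: turn independent replication outputs into a 95% confidence
# interval. The module name and the data are made up for illustration.
defmodule RepStats do
  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Normal-approximation 95% CI from the sample mean and sample variance.
  def ci95(xs) do
    n = length(xs)
    m = mean(xs)
    var = Enum.reduce(xs, 0.0, fn x, acc -> acc + (x - m) * (x - m) end) / (n - 1)
    half = 1.96 * :math.sqrt(var / n)
    {m - half, m + half}
  end
end

# Hypothetical mean-wait outputs from ten replications.
waits = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.1, 4.0]
{lo, hi} = RepStats.ci95(waits)
```

With hundreds or thousands of replications the interval tightens, which is exactly the analysis Law says is usually skipped for cost.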
The expense is the per-replication wall time multiplied by the number of replications. On one core. Sequentially. SimPy: 6.3 milliseconds per replication at 200,000 time units. One thousand replications: 6.3 seconds. This is the number the plant manager waits for.
On one core, Elixir is slower than SimPy on this model: 18.2 milliseconds per replication. At this entity count, `Map.fetch!` lookups dominate; Python's generators beat Elixir's hash-trie lookups for large simulations. One thousand sequential replications in Elixir: 18.2 seconds. Worse than SimPy.
This is the fact we could have hidden. We didn’t.
## The Right Numbers
Replications are independent. Replication 1 with seed 1 has no dependency on replication 2 with seed 2. Each produces an independent trajectory. The only coordination is collecting the results.
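The independence claim is easy to check: seeding each replication makes it reproducible and decouples it from its neighbors. A sketch with a stand-in workload in place of a real model (`run` is illustrative, not sim_ex code):

```elixir
# Illustrative stand-in for a replication: seed the process-local RNG,
# then do some random-number work. Same seed, same trajectory.
run = fn seed ->
  :rand.seed(:exsss, {seed, 0, 0})
  Enum.sum(for _ <- 1..1_000, do: :rand.uniform())
end

a = run.(1)
b = run.(1)   # identical seed: identical result
c = run.(2)   # different seed: an independent trajectory
```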
The BEAM virtual machine runs 88 schedulers on 88 cores. Each scheduler is a native OS thread with its own run queue. There is no Global Interpreter Lock. `Task.async_stream` distributes one thousand replications across all schedulers with one function call.
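The fan-out itself is one pipeline. A sketch with a placeholder workload standing in for a real model run (`run_one` is illustrative, not sim_ex's API):

```elixir
# Distribute 1,000 seeded replications across every scheduler.
# run_one/1 is a placeholder workload; each task is its own BEAM
# process, so seeding the process-local RNG keeps runs independent.
run_one = fn seed ->
  :rand.seed(:exsss, {seed, 0, 0})
  Enum.sum(for _ <- 1..100, do: :rand.uniform())
end

results =
  1..1_000
  |> Task.async_stream(run_one,
       max_concurrency: System.schedulers_online(),
       ordered: false)
  |> Enum.map(fn {:ok, r} -> r end)
```

`ordered: false` lets results stream back as they finish, which matters when replication times vary; the only coordination is the final collect.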
| Configuration | 1,000 reps × 200K | Per-rep | vs SimPy |
|---|---|---|---|
| SimPy (Python, sequential) | ~6,300ms | 6.3ms | 1.0x |
| Elixir sequential | 18,212ms | 18.2ms | 0.3x |
| Elixir parallel (88 cores) | 683ms | 0.7ms | 9.4x |
| Rust NIF sequential | 7,391ms | 7.4ms | 0.9x |
| Rust NIF parallel (88 cores) | 207ms | 0.2ms | 30x |
Two hundred and seven milliseconds. One thousand complete simulation replications, each with a different random seed, each running 200,000 time units. In less time than it takes to blink.
The Elixir engine — which is slower than SimPy per-replication — finishes in 683 milliseconds. Nine times faster than SimPy. Not because each replication is faster. Because 88 replications run at the same time.
Thirty to one.
## Why SimPy Cannot Do This
Python's Global Interpreter Lock prevents true thread parallelism. `multiprocessing` spawns OS processes and pays IPC and serialization overhead for every result. `concurrent.futures` thread pools serialize through the GIL, and its process pools carry the same IPC cost as `multiprocessing`. `asyncio` is cooperative: one thread, interleaved execution. None of these gives you 88 independent simulations running on 88 cores with zero coordination overhead.
The BEAM was designed for ten thousand concurrent telephone switches. One thousand concurrent simulations is a rounding error.
## The One-Line Change
```elixir
results = Sim.Experiment.replicate(fn seed ->
  {:ok, r} = MyModel.run(seed: seed, stop_time: 200_000.0)
  r.stats[:machine].mean_wait
end, 1000)
```
Parallel by default. `Task.async_stream` with `max_concurrency: System.schedulers_online()`. No configuration. No thread pool. No MPI. Every sim_ex user with a multi-core machine gets 30x over SimPy without changing a line of model code.
## What SimPy Still Wins
Fifteen thousand GitHub stars. Hundreds of tutorials. A coroutine syntax that feels natural to anyone who has written Python. An ecosystem of monitoring tools, logging hooks, and integration libraries built over fifteen years by a community that dwarfs ours by three orders of magnitude.
These are real advantages. They are the advantages of incumbency. Speed is not among them — not at the analysis scale that matters.
## What sim_ex Wins Beyond Speed
The DSL is the differentiator. Not the 2x per-rep. Not the 30x parallel. The fact that a manufacturing engineer can read `seize :machine` / `hold exponential(8.0)` / `release :machine` and know what it means without being told.
The Bayesian integration is the moat. SimPy has no equivalent of “run 1,000 replications with posterior-sampled parameters from eXMC.” It has no particle filter for streaming calibration. It has no BART for sensitivity analysis. It is a simulation framework. sim_ex is a simulation framework that connects to an inference ecosystem. The speed makes that connection practical. The connection makes the speed valuable.
## The Numbers That Matter
| What you care about | SimPy | sim_ex |
|---|---|---|
| One simulation run | Fine | Fine (2–3x faster) |
| 1,000 replications for UQ | 6.3 seconds | 207 milliseconds |
| Can a process engineer read it? | No | Yes |
| Does it learn from data? | No | Yes (eXMC + smc_ex) |
| Does it use all your cores? | No (GIL) | Yes (88 schedulers) |
The single-run speedup is nice. The parallel speedup changes what’s possible. The DSL changes who can use it. The Bayesian integration changes what it means.
Full results and reproduction scripts at github.com/borodark/sim_ex/benchmark/simpy_race. Same models. Same parameters. Sequential and parallel. Fair fight. Load average reported with every measurement.