Four engines on the BEAM. Continuous parameters, discrete states, unknown functional forms, simulated systems — each solved by the right algorithm, all sharing one runtime where liveness, fault tolerance, and concurrency aren't features but consequences.
Three architectural consequences of building a probabilistic programming framework on a virtual machine designed for telephone switches.
sample_stream sends each posterior draw as a BEAM message to any process — a Scenic window, a Phoenix LiveView, a GenServer. The sampler doesn't know or care what the receiver does.
```elixir
Sampler.sample_stream(model, self(), init, opts)

receive do
  {:exmc_sample, i, point_map, step_stat} -> ...
end
```
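Because the receiver is just a PID, the consumer can equally be a long-lived GenServer. A minimal sketch of such a consumer (the `TraceCollector` module and its state handling are illustrative, not part of eXMC; only the `{:exmc_sample, ...}` message shape comes from the sampler):

```elixir
defmodule TraceCollector do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl true
  def init(samples), do: {:ok, samples}

  # Each posterior draw arrives as an ordinary BEAM message;
  # accumulate the point maps in the server state.
  @impl true
  def handle_info({:exmc_sample, _i, point_map, _step_stat}, samples) do
    {:noreply, [point_map | samples]}
  end
end

# Hypothetical wiring: hand the collector's PID to the sampler.
# {:ok, pid} = TraceCollector.start_link()
# Sampler.sample_stream(model, pid, init, opts)
```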
Each subtree build is wrapped in try/rescue. Zero overhead on the happy path: on the BEAM, entering a try merely registers an exception handler on the process stack. When a subtree crashes, it is replaced with a divergent placeholder.
```elixir
try do
  build_subtree(state, direction, depth, step_fn)
rescue
  _ -> divergent_placeholder(state)
end
```
Distribution is just sending messages farther. Start nodes, send the model, collect samples. Erlang's distribution protocol handles serialization. PIDs are location-transparent.
```elixir
:erpc.call(node, Sampler, :sample_stream,
  [model, coordinator_pid, init, opts])
```
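Getting nodes into the cluster in the first place is plain Erlang distribution; nothing eXMC-specific is needed. A sketch, assuming the node was started with a matching name and cookie (the node name here is a placeholder):

```elixir
# Join the cluster (the node itself is normally started via
# `iex --name coordinator@10.0.0.1 --cookie secret`).
node = :"n1@10.0.0.2"
true = Node.connect(node)

# Once connected, :erpc.call runs the sampler remotely. The coordinator
# PID it receives is location-transparent, so posterior draws flow back
# to this node as ordinary messages.
:erpc.call(node, Sampler, :sample_stream, [model, self(), init, opts])
```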
Head-to-head against PyMC. 5-seed medians, 1 chain, same model, same data.
| Model | PyMC ESS/s | eXMC ESS/s | Ratio | Winner |
|---|---|---|---|---|
| simple (d=2) | 576 | 469 | 0.81× | PyMC |
| medium (d=5) | 157 | 298 | 1.90× | eXMC |
| stress (d=8) | 185 | 215 | 1.16× | eXMC |
| eight_schools (d=10) | 5 | 12 | 2.55× | eXMC |
| funnel (d=10) | 6 | 2 | 0.40× | PyMC |
| logistic (d=21) | 336 | 69 | 0.21× | PyMC |
| sv (d=102) | 1 | 1 | 1.20× | eXMC |
eXMC wins 4 models to PyMC's 3, including the canonical Eight Schools benchmark (2.55×) and 102-dimensional stochastic volatility (1.20×). With 5-node distribution: 2.88× average scaling.
Four engines. One runtime. Each solves a different shape of uncertainty.
Your parameters are continuous. You wrote the likelihood. You need a posterior.
NUTS sampler with Stan-style warmup. 21 distributions. Warm-start for streaming updates. Beats PyMC on 4 of 7 benchmarks.
```elixir
{trace, stats} = Sampler.sample(model, init,
  num_warmup: 1000,
  warm_start: prev_stats)
```
Nx + EXLA. 337 tests.
States are discrete. Data streams in. The model updates, not restarts. No gradient required.
Online SMC² with parallel rejuvenation on all cores. Beats Chopin's own Python library on 5 of 7 benchmarks. Zero dependencies.
```elixir
result = SMC.run(seir_model, prior, cases,
  n_theta: 400,
  parallel: true)
```
Pure Elixir. Zero deps. 10 tests.
200 features. Unknown functional form. Which ones matter? Linear? Threshold? Interaction?
BART via Rust NIF. ForestTracker sorted indices: 133× speedup. Better RMSE than Python on 6 of 7 tests.
```elixir
{forest, _} = StochTree.BART.fit(x, y,
  num_trees: 200)

importance = StochTree.variable_importance(forest)
```
Elixir + Rust NIF. 14 tests.
Entities are processes. Events are messages. The supervision tree is the system model. Fault tolerance is free.
Discrete-event simulation engine. 539K events/sec. PHOLD benchmark. Tight-loop engine bypasses GenServer in the hot path.
```elixir
{:ok, result} = Sim.run(
  entities: [{:server, Sim.Resource, config}],
  initial_events: [{0.0, :src, :generate}],
  stop_time: 50_000.0)
```
Pure Elixir. Zero deps. 11 tests.
| Your problem | What you hire | Why |
|---|---|---|
| Process drifting, control chart blind | eXMC + smc_ex | Bayesian SPC with live changepoint detection |
| 200 sensor columns, no model | StochTree-Ex | BART discovers which features matter |
| Epidemic spreading, cases arriving daily | smc_ex | O-SMC² updates β, σ, γ in real time |
| When will this bearing fail? | eXMC + StochTree-Ex | Weibull survival + nonlinear degradation |
| Regime shift — trending or mean-reverting? | eXMC + smc_ex | NUTS for continuous params, O-SMC² for discrete states |
| Factory model drifting from reality | sim_ex + smc_ex + eXMC | DES simulation with self-calibrating inputs and posterior uncertainty |
Four libraries. Four mix.exs entries. No shared dependencies. They compose in the same application because they share one thing: the BEAM.
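In a single application, that composition is four ordinary dependency entries. A sketch of the `deps` function in mix.exs (the `exmc` version requirement matches the install snippet below; the other package names and versions are illustrative assumptions):

```elixir
defp deps do
  [
    {:exmc, "~> 0.1.0"},      # NUTS sampler (version from the install docs)
    {:smc_ex, "~> 0.1"},      # online SMC² (version illustrative)
    {:stoch_tree, "~> 0.1"},  # BART via Rust NIF (package name assumed)
    {:sim_ex, "~> 0.1"}       # discrete-event simulation (version illustrative)
  ]
end
```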
Here's the result I'm most proud of, because it wasn't designed — it was discovered.
Streaming inference was built in January: a sampler that sends each posterior draw as a BEAM message, so a Scenic visualization window can update trace plots in real time. Distribution was built in February: four Erlang nodes, each running an independent chain, collecting samples through :erpc.
The two features were developed months apart. Neither knew the other existed. When we connected them, the change was three lines of code. The distributed coordinator already forwarded messages to a caller PID. The streaming visualization already listened on a PID for sample messages. We pointed one at the other.
Four nodes. Twenty thousand samples. Twenty-one seconds. The Scenic dashboard updated live from all four nodes simultaneously — trace plots interleaving draws from four independent chains running on four separate machines.
This composition worked on the first attempt because both features were built on the same primitive: `send(pid, {:exmc_sample, i, point_map, step_stat})`.
There is a peculiar constraint at the heart of the dominant language in scientific computing: it cannot truly do two things at once. A global lock, woven into the interpreter's memory model decades ago, ensures that threads yield to one another rather than run alongside. Every framework that requires concurrency — and probabilistic programming requires a great deal of it — must work around this fact, layering foreign runtimes beneath the surface.
The BEAM virtual machine has no such constraint. It was built for telephone switches, systems where dropping a call is not an option and where a million conversations must proceed simultaneously without interference. What if we took that machine — designed for reliability under concurrency — and asked it to explore posterior distributions instead of routing phone calls?
Four layers, each depending only on the one below.
| Layer | Modules | Responsibility |
|---|---|---|
| IR | Builder, DSL, Dist.* | Model as data |
| Compiler | Compiler, PointMap, Transform | IR → differentiable closure |
| Sampler | NUTS, ADVI, SMC, Pathfinder | Inference algorithms |
| Runtime | Streaming, Distribution, Viz | BEAM integration |
Three ways to define and sample a model.
```elixir
alias Exmc.{Builder, Dist.Normal, Dist.HalfNormal}

ir =
  Builder.new()
  |> Builder.rv("mu", Normal.new(0.0, 10.0))
  |> Builder.rv("sigma", HalfNormal.new(1.0))
  |> Builder.obs("y", Normal.new("mu", "sigma"), data)

{trace, stats} = Exmc.Sampler.sample(ir, %{},
  num_samples: 1000, num_warmup: 1000)
```
```elixir
import Exmc.DSL

model = model do
  mu ~ Normal.new(0.0, 10.0)
  sigma ~ HalfNormal.new(1.0)
  y ~ Normal.new(mu, sigma), observed: data
end
```
```elixir
Exmc.Distributed.sample_chains(model, init,
  nodes: [:"n1@10.0.0.2", :"n2@10.0.0.3"],
  num_samples: 1000, num_warmup: 1000)
```
Add to your mix.exs dependencies:
```elixir
{:exmc, "~> 0.1.0"}
```
The thesis behind the framework.
Probabilistic Programming on BEAM Process Runtimes
Notes on building four engines — three for inference, one for simulation — where none were expected.
Load average 1.0 on 88 cores. One scheduler working, eighty-seven watching. Until you run 1,000 replications — then they all work, and the analysis takes 207 milliseconds.
A factory floor has three kinds of inspectors. The first two are familiar. The third is the one that finds the bug. Testing simulation engines, explained for industrial engineers.
700 adversarial command sequences. Three state machine models. One bug that 114 tests missed: you can release a resource you never held, and the engine says thank you.
Seed 42 is an anecdote. 350 trials across random parameter spaces proved Little’s Law, flow conservation, and determinism. They also found a bug that 77 point tests missed for four months.
Elixir sequential is slower than SimPy. Elixir parallel is 9× faster. Rust parallel is 30×. The BEAM's advantage is not per-event speed — it's per-replication concurrency. 1,000 reps in 207 milliseconds.
At 100,000 entities, Elixir’s persistent Map creates six garbage nodes per event. ETS doesn’t. But the crossover isn’t where you’d expect — it depends on whether you pin your BEAM schedulers to NUMA nodes. A story about garbage.
Building a discrete-event simulation engine on the BEAM. 539,000 events per second. The actor model is the right abstraction for the system, the wrong abstraction for the inner loop. On why GenServers cost 7.8× in the hot path.
5.8× NUTS warmup speedup by reusing the previous posterior's mass matrix. When the data barely changes, so does the posterior geometry. One keyword argument.
How one Erlang flag turned 0.73 jobs/sec into 0.98 on 88 cores doing Bayesian MCMC. On NUMA-aware scheduler binding, the naming of tnnps, and why the BEAM's defaults are wrong for compute-bound work.
On the improbable and slightly reckless decision to build a probabilistic programming framework on a virtual machine designed for telephone switches. A literary technical memoir in twelve chapters.
Seven optimization phases, each revealing something about mixed-runtime probabilistic programming design. From 3.3 ESS/s to 298 ESS/s without changing the algorithm.
16 distributions, 4 inference methods, GPU acceleration, and a head-to-head race against PyMC. The numbers that prove the thesis.
The architectural thesis: streaming, fault tolerance, and distribution as consequences of the runtime, not features bolted on. Part 1 of the series.