What if probabilistic programming runtimes were different?

Four engines on the BEAM. Continuous parameters, discrete states, unknown functional forms, simulated systems — each solved by the right algorithm, all sharing one runtime where liveness, fault tolerance, and concurrency aren't features but consequences.

The Argument

Three architectural consequences of building a probabilistic programming framework on a virtual machine designed for telephone switches.

Every sample is a message

sample_stream sends each posterior draw as a BEAM message to any process — a Scenic window, a Phoenix LiveView, a GenServer. The sampler doesn't know or care what the receiver does.

Sampler.sample_stream(model, self(), init, opts)

receive do
  {:exmc_sample, i, point_map, step_stat} -> ...
end
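Any OTP process works as a receiver. As a sketch (the collector module and its running-mean logic are illustrative, not part of eXMC), a GenServer consumes the same messages through handle_info:

```elixir
defmodule SampleCollector do
  # Illustrative consumer: because the sampler only uses send/2, any
  # GenServer can subscribe by pattern-matching {:exmc_sample, ...}
  # in handle_info. The running-mean bookkeeping here is a sketch,
  # not an eXMC API.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{n: 0, sum: %{}})

  def init(state), do: {:ok, state}

  # Each posterior draw arrives as a plain message; fold it into a running sum.
  def handle_info({:exmc_sample, _i, point_map, _step_stat}, %{n: n, sum: sum}) do
    sum = Map.merge(sum, point_map, fn _key, a, b -> a + b end)
    {:noreply, %{n: n + 1, sum: sum}}
  end

  # Posterior mean per parameter, computed from the accumulated draws.
  def mean(pid) do
    %{n: n, sum: sum} = :sys.get_state(pid)
    Map.new(sum, fn {k, v} -> {k, v / n} end)
  end
end
```

The sampler never learns whether its receiver is a plot, a logger, or this accumulator.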

If it crashes, try again

Each subtree build is wrapped in try/rescue. The happy path has zero overhead: on the BEAM, entering a try block merely registers an exception handler on the process stack. When a subtree build crashes, it's replaced with a divergent placeholder.

try do
  build_subtree(state, direction, depth, step_fn)
rescue
  _ -> divergent_placeholder(state)
end

203 lines. No MPI. No Dask. No Ray.

Distribution is just sending messages farther. Start nodes, send the model, collect samples. Erlang's distribution protocol handles serialization. PIDs are location-transparent.

:erpc.call(node, Sampler, :sample_stream,
  [model, coordinator_pid, init, opts])

The Numbers

Head-to-head against PyMC. 5-seed medians, 1 chain, same model, same data.

Model                  PyMC ESS/s   eXMC ESS/s   Ratio   Winner
simple (d=2)                  576          469   0.81×   PyMC
medium (d=5)                  157          298   1.90×   eXMC
stress (d=8)                  185          215   1.16×   eXMC
eight_schools (d=10)            5           12   2.55×   eXMC
funnel (d=10)                   6            2   0.40×   PyMC
logistic (d=21)               336           69   0.21×   PyMC
sv (d=102)                      1            1   1.20×   eXMC

eXMC wins 4 models to PyMC's 3, including the canonical Eight Schools benchmark (2.55×) and 102-dimensional stochastic volatility (1.20×). With 5-node distribution: 2.88× average scaling.

Les Quatre Probabileurs

Four engines. One runtime. Each solves a different shape of uncertainty.

eXMC — “I know the model”

Your parameters are continuous. You wrote the likelihood. You need a posterior.

NUTS sampler with Stan-style warmup. 21 distributions. Warm-start for streaming updates. Beats PyMC on 4 of 7 benchmarks.

{trace, stats} = Sampler.sample(model, init,
  num_warmup: 1000,
  warm_start: prev_stats)

Nx + EXLA. 337 tests.

smc_ex — “The data never stops”

States are discrete. Data streams in. The model updates, not restarts. No gradient required.

Online SMC² with parallel rejuvenation across all cores. Beats Chopin's own Python library on 5 of 7 benchmarks.

result = SMC.run(seir_model, prior, cases,
  n_theta: 400,
  parallel: true)

Pure Elixir. Zero deps. 10 tests.

StochTree-Ex — “I don't know the function”

200 features. Unknown functional form. Which ones matter? Linear? Threshold? Interaction?

BART via a Rust NIF. ForestTracker's sorted-index bookkeeping yields a 133× speedup. Better RMSE than the Python implementation on 6 of 7 tests.

{forest, _} = StochTree.BART.fit(x, y,
  num_trees: 200)
importance = StochTree.variable_importance(forest)

Elixir + Rust NIF. 14 tests.

sim_ex — “I need to simulate it”

Entities are processes. Events are messages. The supervision tree is the system model. Fault tolerance is free.

Discrete-event simulation engine: 539K events/sec on the PHOLD benchmark. The tight-loop engine bypasses GenServer in the hot path.

{:ok, result} = Sim.run(
  entities: [{:server, Sim.Resource, config}],
  initial_events: [{0.0, :src, :generate}],
  stop_time: 50_000.0)

Pure Elixir. Zero deps. 11 tests.

The Jobs They Do

Your problem                                 What you hire            Why
Process drifting, control chart blind        eXMC + smc_ex            Bayesian SPC with live changepoint detection
200 sensor columns, no model                 StochTree-Ex             BART discovers which features matter
Epidemic spreading, cases arriving daily     smc_ex                   O-SMC² updates β, σ, γ in real time
When will this bearing fail?                 eXMC + StochTree-Ex      Weibull survival + nonlinear degradation
Regime shift — trending or mean-reverting?   eXMC + smc_ex            NUTS for continuous params, O-SMC² for discrete states
Factory model drifting from reality          sim_ex + smc_ex + eXMC   DES simulation with self-calibrating inputs and posterior uncertainty

Four libraries. Four mix.exs entries. No shared dependencies. They compose in the same application because they share one thing: the BEAM.
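In a host application's mix.exs, that composition is just four dependency entries. A sketch — only :exmc's package name and version appear elsewhere on this page, so the other three names and versions are assumptions:

```elixir
# mix.exs of a hypothetical host application. Only {:exmc, "~> 0.1.0"}
# is confirmed by this page; the other package names and versions are
# illustrative placeholders.
defp deps do
  [
    {:exmc, "~> 0.1.0"},        # NUTS/ADVI on continuous parameters
    {:smc_ex, "~> 0.1"},        # online SMC² on streaming data
    {:stoch_tree, "~> 0.1"},    # BART via Rust NIF
    {:sim_ex, "~> 0.1"}         # discrete-event simulation
  ]
end
```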

The Composition Test

Here's the result I'm most proud of, because it wasn't designed — it was discovered.

Streaming inference was built in January: a sampler that sends each posterior draw as a BEAM message, so a Scenic visualization window can update trace plots in real time. Distribution was built in February: four Erlang nodes, each running an independent chain, collecting samples through :erpc.

The two features were developed independently, a month apart; neither knew the other existed. When we connected them, the change was three lines of code. The distributed coordinator already forwarded messages to a caller PID. The streaming visualization already listened on a PID for sample messages. We pointed one at the other.

Four nodes. Twenty thousand samples. Twenty-one seconds. The Scenic dashboard updated live from all four nodes simultaneously — trace plots interleaving draws from four independent chains running on four separate machines.

This composition worked on the first attempt because both features were built on the same primitive: send(pid, {:exmc_sample, i, point_map, step_stat}).
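The forwarding half of that primitive fits in a few lines. This is an illustrative sketch, not the eXMC source: every sample from any chain is re-sent, unchanged, to a subscriber PID, and because PIDs are location-transparent the subscriber may live on another node.

```elixir
defmodule Relay do
  # Sketch of a coordinator's forwarding loop (illustrative, not the
  # eXMC implementation). Samples pass through untouched; the subscriber
  # PID may be local or on a remote node — send/2 does not care.
  def loop(subscriber) do
    receive do
      {:exmc_sample, _i, _point_map, _step_stat} = msg ->
        send(subscriber, msg)
        loop(subscriber)

      :stop ->
        :ok
    end
  end
end
```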

On the improbable and slightly reckless decision to build a probabilistic programming framework on a virtual machine designed for telephone switches

There is a peculiar constraint at the heart of the dominant language in scientific computing: it cannot truly do two things at once. A global lock, woven into the interpreter's memory model decades ago, ensures that threads yield to one another rather than run alongside. Every framework that requires concurrency — and probabilistic programming requires a great deal of it — must work around this fact, layering foreign runtimes beneath the surface.

The BEAM virtual machine has no such constraint. It was built for telephone switches, systems where dropping a call is not an option and where a million conversations must proceed simultaneously without interference. What if we took that machine — designed for reliability under concurrency — and asked it to explore posterior distributions instead of routing phone calls?

Read the full essay →

Architecture

Four layers, each depending only on the one below.

eXMC four-layer architecture diagram
Layer      Modules                         Responsibility
IR         Builder, DSL, Dist.*            Model as data
Compiler   Compiler, PointMap, Transform   IR → differentiable closure
Sampler    NUTS, ADVI, SMC, Pathfinder     Inference algorithms
Runtime    Streaming, Distribution, Viz    BEAM integration
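"Model as data" at the IR layer can be illustrated with plain Elixir terms. The shapes below are a sketch — the real Exmc IR structure is internal — but the point stands: when a model is an ordinary term, compiler passes are ordinary Enum functions, and the model can be sent between nodes like any other message.

```elixir
# Illustrative only: a model as plain data, in the spirit of the IR layer.
# The actual Exmc IR representation is internal to the library.
ir = [
  %{name: "mu",    kind: :rv,  dist: {:normal, 0.0, 10.0}},
  %{name: "sigma", kind: :rv,  dist: {:half_normal, 1.0}},
  %{name: "y",     kind: :obs, dist: {:normal, "mu", "sigma"}}
]

# A "compiler pass" like listing free parameters is just a comprehension.
free = for %{kind: :rv, name: name} <- ir, do: name
# free == ["mu", "sigma"]
```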

Getting Started

Three ways to define and sample a model.

The Builder pipeline:

alias Exmc.{Builder, Dist.Normal, Dist.HalfNormal}

ir =
  Builder.new()
  |> Builder.rv("mu", Normal.new(0.0, 10.0))
  |> Builder.rv("sigma", HalfNormal.new(1.0))
  |> Builder.obs("y", Normal.new("mu", "sigma"), data)

{trace, stats} = Exmc.Sampler.sample(ir, %{},
  num_samples: 1000, num_warmup: 1000)

The DSL macro:

import Exmc.DSL

model = model do
  mu    ~ Normal.new(0.0, 10.0)
  sigma ~ HalfNormal.new(1.0)
  y     ~ Normal.new(mu, sigma), observed: data
end

Distributed chains across nodes:

Exmc.Distributed.sample_chains(model, init,
  nodes: [:"n1@10.0.0.2", :"n2@10.0.0.3"],
  num_samples: 1000, num_warmup: 1000)

Add to your mix.exs dependencies:

{:exmc, "~> 0.1.0"}

Research

The thesis behind the framework.

  1. Architecture
    Why the BEAM runtime enables architectural properties other PPLs cannot express
  2. Compilation
    From declarative IR to differentiable closures via Nx and EXLA
  3. The No-U-Turn Sampler
    NUTS implementation with Stan-style three-phase warmup
  4. Streaming Inference
    Per-sample message passing for live posterior visualization
  5. Distributed MCMC
    Location-transparent multi-node sampling via Erlang distribution
  6. Benchmarks
    Seven-model comparison against PyMC, GPU acceleration, scaling analysis

Probabilistic Programming on BEAM Process Runtimes

Writing

Notes on building four engines — three for inference, one for simulation — where none were expected.