Vehtari’s Course in a Different Language

April 2026 · eXMC · teaching · BDA3

Aki Vehtari teaches the Bayesian Data Analysis course at Aalto University. The course is built around the textbook he co-authored — Gelman, Carlin, Stern, Dunson, Vehtari, Rubin, Bayesian Data Analysis, 3rd edition — and around twenty-two demonstration notebooks that sit in a public GitHub repository. The notebooks are written in Python. They use scipy and PreliZ. They are the closest thing the field has to a canonical introduction to applied Bayesian inference, and they have been translated into Matlab, R, and Julia by various contributors over the years.

This week they were translated into Elixir Livebooks.

The translation is twenty-two demos across eight chapters, plus thirteen Stan model files set side by side with their eXMC equivalents, plus eight datasets vendored into the repository so the notebooks run without network access. Nine Livebook files in all. Roughly fifteen thousand lines of prose, code, and exercises. Every numerical claim in every chapter was verified against the BDA3 reference values before publication.

The point is not that the Elixir versions are better. The point is that they exist, and that they are honest about what they are: a faithful reproduction of an excellent course in a runtime that nobody expected anyone to use for this work.

What Got Ported

The BDA Python demos cover the eight chapters of the textbook that have non-trivial computation: chapters 2 through 6, then 9, 10, and 11.

The first six chapters use the eXMC distribution library, the IR builder, and (where needed) NUTS. The last two chapters do something unusual: they implement their algorithms from scratch in pure Elixir, in about thirty lines each. Hand-rolled rejection sampling. Hand-rolled importance sampling. Hand-rolled Gibbs sampler for the bivariate normal. Hand-rolled symmetric Metropolis with three different proposal scales, so the reader can watch the too-small and too-large choices fail in their own ways.

This is not a workaround. BDA3 chapters 10 and 11 exist to teach students how MCMC works under the hood. Their value is in the implementation, not in the result. If you call Sampler.sample/3 without ever having written a Metropolis acceptance step, you have missed the point of those chapters.
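For readers who want to see the shape of that acceptance step before opening the notebook, here is a minimal sketch of symmetric random-walk Metropolis in plain Elixir. It is not the notebook's code: the module and function names are illustrative, and the chapter's version adds the three proposal scales and the acceptance-rate bookkeeping.

```elixir
defmodule MetropolisSketch do
  # One symmetric random-walk Metropolis update for a scalar parameter.
  # log_post is a function returning the unnormalized log posterior at a point.
  def step(x, log_post, scale) do
    # Gaussian proposal with standard deviation `scale` (:rand.normal takes mean and variance)
    proposal = x + :rand.normal(0.0, scale * scale)
    # Symmetric proposal, so the acceptance ratio needs no correction term.
    log_ratio = log_post.(proposal) - log_post.(x)
    # Accept with probability min(1, exp(log_ratio)); the guard avoids exp overflow.
    if log_ratio >= 0.0 or :math.exp(log_ratio) > :rand.uniform(), do: proposal, else: x
  end

  # Run n updates from x0 and keep every state.
  def chain(x0, log_post, scale, n) do
    Enum.scan(1..n, x0, fn _, x -> step(x, log_post, scale) end)
  end
end

# Example target: a standard normal.
log_post = fn x -> -0.5 * x * x end
draws = MetropolisSketch.chain(0.0, log_post, 2.4, 5_000)
```

Running chain/4 with a very small, a moderate, and a very large scale reproduces the too-small / just-right / too-large behavior the chapter is built around.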

The Pedagogical Template

Translation is not the hard part. Style is the hard part. The original demos have a brisk, no-introduction style appropriate for a graduate course where the textbook does the explaining. The Livebook ports needed to stand on their own, because Livebook readers tend to find them out of context.

The template that emerged, refined across nine notebooks:

  1. Why This Matters. Concrete motivation. Not “in this chapter we will discuss.” The reader should care about the problem before they see the math.
  2. The Problem. Dataset, BDA3 page reference, the question being asked.
  3. The Model. LaTeX math. Then the eXMC IR build. Then a sentence-by-sentence walkthrough.
  4. Run It. Either an analytical computation or a call to Sampler.sample, with diagnostics.
  5. Visualization. VegaLite plots. The interpretation goes next to the chart, not in a separate section.
  6. What This Tells You. Interpretation tied back to the original question.
  7. Study Guide. Five or six exercises anchored to specific cells in the notebook. Each exercise can be done by modifying one cell and re-running.
  8. Literature. BDA3 sections, modern papers (Vehtari, Betancourt, Hoffman, Roberts & Gelman, Piironen & Vehtari), links to related material.
  9. Where to Go Next. Cross-references to other eXMC notebooks that build on this material.

The template is the same in all nine. The 8-schools chapter has it. The placenta previa chapter has it. The hand-rolled Metropolis chapter has it. Consistency is the only way to make a course feel like a course.

The 8-Schools Stress Test

Five of the ported chapters compute their answers analytically or on a grid. Two implement their own algorithms. Only one chapter calls Sampler.sample the way a real eXMC user would: chapter 5, the SAT 8-schools example.

8-schools is the canonical hierarchical model. Eight high schools each ran the same coaching program. Each school reported a treatment effect and a standard error. The question is whether the program works, and the answer requires a hierarchical normal model with a between-schools variance parameter that pulls the individual estimates toward a common mean. The model has ten free parameters and a posterior geometry that includes Neal’s funnel — the pathological region where ordinary samplers stall.
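In its centered form this is the standard BDA3 hierarchical normal (the textbook's specification, not a transcription of the notebook cell):

$$
y_j \mid \theta_j \sim \mathrm{N}(\theta_j,\ \sigma_j^2), \qquad
\theta_j \mid \mu, \tau \sim \mathrm{N}(\mu,\ \tau^2), \qquad j = 1, \ldots, 8,
$$

with the σ_j fixed at the reported standard errors and, as in BDA3, a flat hyperprior on (μ, τ). The eight θ_j plus μ and τ are the ten free parameters.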

This is the test that exposes whether a PPL works.

The notebook builds the model with the centered parameterization first — the obvious form, the form a textbook reader would write. Then it samples the same model two ways: once with NUTS’s automatic non-centered reparameterization (ncp: true) and once without it (ncp: false). On the same data, with the same seed, with the same number of iterations:

Path          Divergences   Step size
ncp: true              13       0.334
ncp: false            203       0.108

A 16-fold difference in divergences. A 3-fold difference in step size. The empirical signature of the funnel pathology, observed in real time on a real eXMC sampler. The notebook tells the reader the story is coming and then the experiment runs and the story is true.
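The fix that ncp: true applies is the standard non-centered reparameterization, stated here in its generic form rather than as eXMC internals:

$$
\theta_j = \mu + \tau\,\eta_j, \qquad \eta_j \sim \mathrm{N}(0, 1),
$$

so the sampler explores (μ, τ, η) instead of (μ, τ, θ), and the prior dependence between τ and the school-level parameters that creates the funnel drops out of the geometry.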

That experiment is what convinced me that the BDA port was working. The numbers in chapter 5 are not lifted from BDA3 figures. They came out of a NUTS sampler implemented in Elixir on the BEAM, and they match the textbook’s qualitative claims about funnel pathologies and partial pooling. School A’s raw measurement of +28 SAT points shrinks to a posterior estimate of 9. School C’s raw measurement of -3 shrinks up to a posterior estimate of +5. The data on each school are noisy, the borrowing of strength from the others is real, and the result is a textbook hierarchical fit.
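The shrinkage itself follows the usual precision weighting. Conditional on μ and τ, the standard hierarchical-normal result (quoted from the general theory, not from the notebook) is

$$
\mathrm{E}[\theta_j \mid \mu, \tau, y] \;=\;
\frac{y_j/\sigma_j^2 + \mu/\tau^2}{1/\sigma_j^2 + 1/\tau^2},
$$

so the schools with the largest standard errors are pulled hardest toward the common mean, which is exactly what the School A and School C numbers show.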

The Grid Posterior Lives Forever

The most surprising thing about the BDA Python demos, on close inspection, is how often they avoid MCMC. Chapters 2, 3, 4, 6, 9, and 10 do not need a sampler at all. Chapter 2 is conjugate. Chapter 3 is conjugate or grid. Chapter 4 is Laplace. Chapter 6 is posterior predictive simulation, which only needs a model from which you can draw forward. Chapter 9 is integration against a known prior. Chapter 10 is rejection or importance sampling against a fixed proposal.

This is good teaching. The grid trick — evaluate the unnormalized posterior on a fine grid, normalize, sample by inverse CDF — is the most underrated tool in applied Bayesian inference. It scales to two parameters and produces an answer that no sampler will improve upon. It is also the easiest possible introduction to the mechanics of a posterior, because everything you do to the grid is something a sampler is doing internally with hidden machinery.

The bioassay model in chapter 3 is the perfect example. Four dose levels. Five animals each. The likelihood is a product of four binomials with logit-linear means. There is no closed form. There is no need for one. You evaluate log p(α, β | y) on a 100×100 grid in about a millisecond, take the maximum, subtract it for numerical stability, exponentiate, normalize, and sample by inverse CDF. The whole thing is forty lines of Elixir. The result is indistinguishable from what NUTS would produce on the same model with weak priors, and it is faster, and it is fully reproducible without any sampler diagnostics.
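Here is a condensed sketch of that cell, assuming the standard BDA3 bioassay data and a flat prior on (α, β). The module name, helper names, and grid bounds are mine, and the notebook's own cell adds the VegaLite plotting and the posterior summaries.

```elixir
defmodule BioassayGrid do
  # Standard BDA3 bioassay data: dose (log scale), animals per group, deaths.
  @x [-0.86, -0.30, -0.05, 0.73]
  @n [5, 5, 5, 5]
  @y [0, 1, 3, 5]

  # Numerically safe log(1 + exp(u)).
  defp log1pexp(u) when u > 0, do: u + :math.log(1.0 + :math.exp(-u))
  defp log1pexp(u), do: :math.log(1.0 + :math.exp(u))

  # Unnormalized log posterior under a flat prior: a product of binomials
  # with logit-linear means, written so it never evaluates log(0) at the grid edges.
  def log_post(alpha, beta) do
    [@x, @n, @y]
    |> Enum.zip()
    |> Enum.map(fn {x, n, y} ->
      eta = alpha + beta * x
      -y * log1pexp(-eta) - (n - y) * (eta + log1pexp(-eta))
    end)
    |> Enum.sum()
  end

  # Evaluate on the grid, subtract the max for stability, exponentiate, normalize.
  def posterior_grid(alphas, betas) do
    cells = for a <- alphas, b <- betas, do: {a, b, log_post(a, b)}
    max_lp = cells |> Enum.map(&elem(&1, 2)) |> Enum.max()
    unnorm = Enum.map(cells, fn {a, b, lp} -> {a, b, :math.exp(lp - max_lp)} end)
    total = unnorm |> Enum.map(&elem(&1, 2)) |> Enum.sum()
    Enum.map(unnorm, fn {a, b, w} -> {a, b, w / total} end)
  end

  # Draw one (alpha, beta) pair from the normalized grid by inverse CDF.
  def draw(grid) do
    u = :rand.uniform()
    cdf = grid |> Enum.map(&elem(&1, 2)) |> Enum.scan(&(&1 + &2))
    idx = Enum.find_index(cdf, &(&1 >= u)) || length(grid) - 1
    {a, b, _} = Enum.at(grid, idx)
    {a, b}
  end
end

# 100 x 100 grid over a box that comfortably contains the posterior mass.
alphas = Enum.map(0..99, fn i -> -4.0 + i * 12.0 / 99 end)   # alpha in [-4, 8]
betas  = Enum.map(0..99, fn i -> -10.0 + i * 50.0 / 99 end)  # beta in [-10, 40]
grid = BioassayGrid.posterior_grid(alphas, betas)
samples = Enum.map(1..1000, fn _ -> BioassayGrid.draw(grid) end)
```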

Chapter 4 then takes the same bioassay model and shows that Laplace's approximation — Newton's method to find the mode, finite differences to compute the Hessian, multivariate normal at the mode — gives almost the same answer in another forty lines. Chapter 5 then shows what happens when the posterior is too high-dimensional and too funnel-shaped for either grid or Laplace, and why NUTS exists. The chapters build on each other in the same order BDA3 builds on itself. The runtime is incidental.
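A matching sketch of the chapter 4 machinery, reusing log_post/2 from the grid module above (so it is not self-standing): gradients and the Hessian come from central finite differences, and the 2x2 Newton solve is written out by hand. The ingredients are the ones named in the paragraph; the code itself is mine, not the notebook's.

```elixir
defmodule LaplaceSketch do
  @h 1.0e-4

  defp f([a, b]), do: BioassayGrid.log_post(a, b)
  defp bump(x, i, d), do: List.update_at(x, i, &(&1 + d))

  # Central-difference gradient of the log posterior.
  defp grad(x) do
    for i <- 0..1, do: (f(bump(x, i, @h)) - f(bump(x, i, -@h))) / (2 * @h)
  end

  # Central-difference Hessian, returned as a nested 2 x 2 list.
  defp hessian(x) do
    for i <- 0..1 do
      for j <- 0..1 do
        (f(x |> bump(i, @h) |> bump(j, @h)) - f(x |> bump(i, @h) |> bump(j, -@h)) -
           f(x |> bump(i, -@h) |> bump(j, @h)) + f(x |> bump(i, -@h) |> bump(j, -@h))) /
          (4 * @h * @h)
      end
    end
  end

  # One Newton step toward the mode: x <- x - H^-1 g, with the 2x2 inverse by hand.
  def newton_step([x0, x1] = x) do
    [g0, g1] = grad(x)
    [[h00, h01], [h10, h11]] = hessian(x)
    det = h00 * h11 - h01 * h10
    [x0 - (h11 * g0 - h01 * g1) / det, x1 - (-h10 * g0 + h00 * g1) / det]
  end

  # Iterate to the mode; the Laplace approximation is a normal centered there,
  # with covariance equal to the inverse of the negative Hessian at the mode.
  def mode(x0, iters \\ 20), do: Enum.reduce(1..iters, x0, fn _, x -> newton_step(x) end)
end

# Start near the grid's peak; Newton then converges in a few steps.
[alpha_hat, beta_hat] = LaplaceSketch.mode([1.0, 8.0])
```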

The Stan Companion

The BDA Python demos include a directory with thirteen Stan model files. Bernoulli. Binomial. Two-group binomial comparison. Linear regression with adjustable priors. Standardized linear regression. Student-t linear regression. One-way ANOVA. Hierarchical ANOVA with common variance. Hierarchical ANOVA with per-group variance. Logistic regression with Student-t priors. Logistic regression with the horseshoe sparsity prior. Plus a couple of trivial test fixtures.

These are the canonical Stan translations of textbook models. Anyone who has spent time in the Stan documentation has seen most of them. They are the easiest possible side-by-side comparison for two PPLs.

The companion notebook puts each Stan file next to its eXMC translation, with commentary on the mapping. Stan’s real<lower=0> sigma becomes eXMC’s Builder.rv("sigma", HalfNormal, params, transform: :log). Stan’s vector indexing into a parameter array becomes eXMC’s string parameter references. Stan’s generated quantities block becomes Elixir post-processing on the trace.

The correspondence is almost one-to-one. The two notable differences: eXMC uses string references where Stan uses block-structured declaration, and eXMC marks transforms explicitly where Stan uses constraint annotations. The same posterior comes out either way. The horseshoe prior is the only Stan model where eXMC requires more code than Stan, because eXMC doesn’t have a built-in horseshoe and the user has to construct it from its scale-mixture representation. That is also the most pedagogically valuable case in the document, because the construction makes the prior visible instead of hiding it behind a name.
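The construction in question is the standard scale-mixture representation of the horseshoe (Carvalho, Polson, and Scott), not anything eXMC-specific:

$$
\beta_j \mid \lambda_j, \tau \sim \mathrm{N}(0,\ \lambda_j^2 \tau^2), \qquad
\lambda_j \sim \mathrm{C}^{+}(0, 1), \qquad
\tau \sim \mathrm{C}^{+}(0, \tau_0),
$$

one half-Cauchy local scale per coefficient, one global half-Cauchy scale with a chosen reference value τ₀, and a normal whose standard deviation is their product; that is exactly the set of primitives the user wires together by hand.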

Verification

Every chapter that produces a number runs through a verification script before publication. The script lives in /tmp during development. The contents are paranoid: re-derive the bioassay grid mode, re-derive Newton’s method on the same posterior, run the rejection sampler and check that observed acceptance matches 1/M within Monte Carlo noise, run importance sampling and confirm the mean estimate matches the rejection mean, run Gibbs on the bivariate normal and check the empirical correlation against the target, run Metropolis with three proposal scales and confirm that the “too small / just right / too large” story shows up in the acceptance rates.
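As a flavor of what those checks look like, here is a self-contained toy version of the acceptance-rate assertion (a stand-in, not the script itself): rejection-sample a Beta(2, 2) target from a Uniform(0, 1) proposal with envelope M = 1.5, and require the observed acceptance rate to land within a few Monte Carlo standard errors of 1/M.

```elixir
# Toy analogue of one check: rejection-sampling acceptance should be close to 1/M.
target = fn x -> 6.0 * x * (1.0 - x) end   # Beta(2, 2) density, maximum 1.5 at x = 0.5
m = 1.5                                    # envelope constant over the Uniform(0, 1) proposal
n = 100_000

accepted =
  Enum.count(1..n, fn _ ->
    x = :rand.uniform()                    # propose from Uniform(0, 1)
    :rand.uniform() * m < target.(x)       # accept with probability target(x) / m
  end)

rate = accepted / n
expected = 1 / m
tol = 3 * :math.sqrt(expected * (1 - expected) / n)   # roughly 3 Monte Carlo standard errors
if abs(rate - expected) > tol,
  do: raise("acceptance rate #{rate} is outside tolerance of #{expected}")
```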

The verification ran. Every claim held within the published tolerances. One bug surfaced during verification: chapters 3 and 4 had an Enum.with_index destructure flipped — {best_i, _} when it should have been {_, best_i}. Patched in both notebooks before any reader would have hit it. The verification did its job.

This is the discipline that the livebook-verify skill exists to enforce. The skill says: every number that appears in prose must match the code that produces it, with a stated tolerance, in an automated assertion. If you ship a notebook claiming “119.8 vials per shift” and the cell outputs 87.3, the reader notices and the credibility loss is permanent. AI-generated content gets twice the scrutiny here. The numbers have to be right.

What This Is For

The BDA course is the best introduction to applied Bayesian inference that exists. It has been taught to thousands of students over the years, refined against teaching feedback, and produced a generation of practitioners who know what a posterior is and what it is not. The course material is freely available. Vehtari deserves the credit, and the Aalto course site is the right place to start any serious Bayesian self-education.

This Elixir port is an offering to a slightly different audience: people who already use Elixir and the BEAM, who want to learn Bayesian methods, and who would prefer not to install conda and PreliZ to do it. There are not many of those people. There are some, and they have been asking for material like this for a while, and now there is some.

A second audience: people who already know Bayesian methods, who have used PyMC or Stan or Turing.jl, and who want to see whether eXMC’s API is something they could live with. The Stan companion notebook is for them. The 8-schools chapter is for them. The translation of the centered-vs-non-centered experiment into a one-line A/B test is for them. They will form their own opinions.

A third audience: students of BDA3 who happen to find this repository, who try one chapter, and discover that the textbook math survives translation into a runtime that was designed for telephone switches. The math doesn’t care which language implements it. That is the point of mathematics. The point of this port is to make that obvious.

The Lesson

When you port a course you discover what the course is actually made of. It is not the algorithms. The algorithms are interchangeable. It is the order in which the ideas are introduced — first conjugacy, then grids, then Laplace, then MCMC, then diagnostics, then decisions. Each step is the smallest legitimate generalization of the previous one, and each step earns its complexity by failing on a problem the previous one cannot handle.

Vehtari and Gelman built a curriculum that does not waste a chapter. You can tell by trying to port it. There is no notebook in the BDA demos that the course doesn’t need. There is no chapter whose removal would not break something downstream. That kind of pedagogical economy is rare and expensive to produce, and the right response when you encounter it is to copy it carefully and not to improve on it.

Nine notebooks. Twenty-two demos. Thirteen Stan files. Eight datasets. Every number checked. The course material is the same material it was last week. The runtime is different.


The notebooks live at github.com/borodark/eXMC/tree/main/notebooks/bda. The original BDA Python demos are at github.com/avehtari/BDA_py_demos. The textbook is Bayesian Data Analysis, 3rd edition, by Gelman, Carlin, Stern, Dunson, Vehtari, Rubin — the canonical reference for everything described here.