Stan Source Code Blocks in Org Mode
Org Mode support for Stan
Introduction
Stan is a language used to write models for Bayesian inference. After specifying the model, a number of interfaces are available to run it. All of these interfaces have Babel support, making it easy to feed them the result of the Stan block.
Requirements and Setup
ob-stan.el is currently available on org-contrib.
In addition the following components must be installed:
At least one Stan interface
Each Stan interface has specific installation instructions available from Stan's interfaces page.
- Stan mode, which provides Emacs support for Stan and is available on MELPA
Activate evaluation of Stan source code blocks by adding stan
to
org-babel-load-languages
. (The snippet below will usually contain
items for several other languages, including the interface that you
plan to use to drive Stan.)
(org-babel-do-load-languages 'org-babel-load-languages '((stan . t)))
Org Mode Features for Stan Source Code Blocks
Header Arguments
- file
- An output file must be specified to evaluate Stan code
for use as input to other interfaces. If
org-babel-stan-cmdstan-directory
is non-nil and the file name does not have a ".stan" extension, write the contents to an intermediate file and compile the model to the named file (for use from the command line). Otherwise, dump the block content to the specified file name (to be used as input to all other interfaces).
Sessions
Stan does not support sessions.
Result Types
Stan source code blocks return a link to a file, which can be used as a variable in other blocks to supply the file name to a Stan sampling call.
Examples of Use
Stan blocks allow you to edit the model in Stan mode while keeping the Stan code in the Org file rather than in a separate file. The details of how to sample with the model will depend on the interface, but the main idea is the same for all interfaces except for CmdStan. Most Stan interfaces accept the model either as a string (in the language of the interface) or a file name of a ".stan" file that contains the model. For CmdStan, on the other hand, a ".stan" file is compiled into an executable that can be run from the command line. In this case, the Stan block is dumped to an intermediate file that is compiled to the model program using the Makefile provided by CmdStan.
The first example below demonstrates how to use a Stan block in the R interface to Stan, and the second example uses the same model with the CmdStan interface.
Sampling with RStan interface
The Stan block below specifies a simple model that when executed will
be written to :file
, which can be used to refer to it downstream.
#+name: model-stan #+begin_src stan :file model.stan data { int<lower=1> N; vector[N] x; } parameters { real mu; real<lower=0> sigma; } model { x ~ normal(mu, sigma); } #+end_src #+RESULTS: model-stan file:model.stan
Generate some data that will be used as input to the model.
#+begin_src R :session *R* :result silent set.seed(33) N <- 50 x <- rnorm(N, 20, 3) #+end_src
Load RStan, and provide the model file name and data as arguments to the sampling function call.
#+begin_src R :session *R* :var model=model-stan :results output library(rstan) fit <- stan(file=model, data=list(N=N, x=x), chains=1) #+end_src #+RESULTS: #+begin_example COMPILING THE C++ CODE FOR MODEL 'model' NOW. SAMPLING FOR MODEL 'model' NOW (CHAIN 1). Chain 1, Iteration: 1 / 2000 [ 0%] (Warmup) Chain 1, Iteration: 200 / 2000 [ 10%] (Warmup) Chain 1, Iteration: 400 / 2000 [ 20%] (Warmup) Chain 1, Iteration: 600 / 2000 [ 30%] (Warmup) Chain 1, Iteration: 800 / 2000 [ 40%] (Warmup) Chain 1, Iteration: 1000 / 2000 [ 50%] (Warmup) Chain 1, Iteration: 1001 / 2000 [ 50%] (Sampling) Chain 1, Iteration: 1200 / 2000 [ 60%] (Sampling) Chain 1, Iteration: 1400 / 2000 [ 70%] (Sampling) Chain 1, Iteration: 1600 / 2000 [ 80%] (Sampling) Chain 1, Iteration: 1800 / 2000 [ 90%] (Sampling) Chain 1, Iteration: 2000 / 2000 [100%] (Sampling) # Elapsed Time: 0.007173 seconds (Warm-up) # 0.005748 seconds (Sampling) # 0.012921 seconds (Total) #+end_example
Sampling with CmdStan interface
To use the CmdStan interface, set org-babel-stan-cmdstan-directory
to the top-level directory of the CmdStan source code.
#+begin_src elisp :results silent (setq org-babel-stan-cmdstan-directory "/path/to/cmdstan/source/") #+end_src
Modify the Stan block from above, removing the ".stan" extension from the file name. Executing the block now compiles the Stan code the specified file. (If the extension is not removed, the block will be executed as in the above example.)
#+name: model #+begin_src stan :file model data { int<lower=1> N; vector[N] x; } parameters { real mu; real<lower=0> sigma; } model { x ~ normal(mu, sigma); } #+end_src #+RESULTS: model file:model
Before running the model, dump the data generated in the last section to a file that can be passed as a command line argument.
#+begin_src R :session *R* :results silent stan_rdump(c('N', 'x'), 'normal.data.R') #+end_src
Finally, call the compiled program from a shell block.
#+begin_src sh :results output verbatim ./model sample data file=normal.data.R #+end_src #+RESULTS: #+begin_example method = sample (Default) sample num_samples = 1000 (Default) num_warmup = 1000 (Default) save_warmup = 0 (Default) thin = 1 (Default) adapt engaged = 1 (Default) gamma = 0.050000000000000003 (Default) delta = 0.80000000000000004 (Default) kappa = 0.75 (Default) t0 = 10 (Default) init_buffer = 75 (Default) term_buffer = 50 (Default) window = 25 (Default) algorithm = hmc (Default) hmc engine = nuts (Default) nuts max_depth = 10 (Default) metric = diag_e (Default) stepsize = 1 (Default) stepsize_jitter = 0 (Default) id = 0 (Default) data file = normal.data.R init = 2 (Default) random seed = 2115254906 output file = output.csv (Default) diagnostic_file = (Default) refresh = 100 (Default) Gradient evaluation took 4e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 100 / 2000 [ 5%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 300 / 2000 [ 15%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 500 / 2000 [ 25%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 700 / 2000 [ 35%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 900 / 2000 [ 45%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) # Elapsed Time: 0.012414 seconds (Warm-up) # 0.025817 seconds (Sampling) # 0.038231 seconds (Total) #+end_example