Basics for reading bayz output in R and convergence checks using R coda
### Reading bayz output in R

### Using R Coda

A set of R functions is supplied with bayz that allows to read the samples from bayz out-file.

To load the set of R function supplied with bayz use the R command:

source("http://www.bayz.biz/bayzRfunc.R")

Output samples can be retrieved for one parameter at a time. The R function to read output is:

bayz.read("jobname","parameter-name")

or for linux and unix system a faster version is:

bayz.read.fast("jobname","parameter-name")

The "jobname" should be the name of the job file that was run, without ".bayz" extension. The parameter name to be given should be a name as on the left hand sides of the model lines. Hence, if uncertain about the parameter names, open the bayz job file that was run, and copy the desired name from the left part of the model lines. The R functions also needs the lab-file (labels) for some functions.

It is also possible to get a list of the parameters in a job file with the R comnand:

bayz.param("jobname")

When paramerters are retrieved, the labels are not automatically attached. If you want to have those, retrieve them using:

mypar_labels <- bayz.labels("jobname","parameter-name")

You can for instance attach these labels as the column names of for the parameter samples read with bayz.read.

It is recommended to study well convergence and Monte Carlo (MC) error of the Bayesian MCMC based procedures. The MC error is often reported as 'standard error' (SE), not to be confused with the posterior standard deviation (SD). Bayz does not include features to check convergence, because this is quite a special topic with many personal 'flavours'. And there are many software tools available for convergence diagnosis, for instance in the R Coda package. Bayz output read in R can be converted to 'mcmc' objects for the Coda package using the supplied function 'bayz2mcmc'.

The simplest two convergence checks that can be done using R Coda are computation of the MC error, which is available using summary (reported as 'time series SE'), and computation of effective sample size, which is available using effectiveSize. The effective size computation is based on 'batching'. The two approaches are usually producing quite similar results, i.e. SE is about SD/sqrt(effective size). A small R session follows:

>source("http://www.bayz.biz/bayzRfunc.R") >library(coda) >varE <- bayz.read.fast("bivarmodel","varE") >summary(bayz2mcmc(varE)) Mean SD Naive SE Time-series SE var.bodyw 12.1322 0.59893 0.0189399 0.0206664 var.feedint 239.3156 14.69610 0.4647314 0.5797327 cov.feedint.bodyw 38.3988 2.62449 0.0829935 0.0900629 cor.bodyw.feedint 0.7124 0.02287 0.0007231 0.0006093 > effectiveSize(bayz2mcmc(varE)) var.bodyw var.feedint cov.feedint.bodyw cor.bodyw.feedint 839.8992 642.6114 849.1739 1408.4700

In this case, the Time Series SE on the the correlation estimate (cor.bodyw.feedint) is 0.0006 which means it would be very OK to report the correlation estimation with 2 decimals, and somewhat dubious to report the correlation with 3 decimals (the SE is only just below 0.001).