Bayz Manual

Using data with repeated observations

With repeated observations, bayz cannot automatically merge the phenotype data with data such as genotypes, because the phenotype file contains repeated IDs. One option is to merge the data yourself, but this requires copying each ID's genotypes into every phenotype record for that ID. This creates unnecessarily large files and also increases computing time. The sections below describe how bayz can handle this situation efficiently using hierarchical models. The hierarchical model, in a way, merges the phenotype and genotype data "in the model", instead of literally combining the data files.
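The cost of a literal merge can be illustrated with a small sketch. This is plain Python, not bayz code, and the data sizes and names (3 mice, 4 weight records each, 1000 SNPs) are made up for illustration only:

```python
# Hypothetical toy data: 3 mice, 4 weight records per mouse, 1000 SNP genotypes per mouse.
n_mice, n_records, n_snps = 3, 4, 1000

# Genotype data: one row per mouse (unique IDs).
genotypes = {f"m{i}": [0] * n_snps for i in range(n_mice)}

# Phenotype data: several rows per mouse (repeated IDs).
phenotypes = [(f"m{i}", age, 20.0 + age)
              for i in range(n_mice) for age in range(n_records)]

# Literal merge: every phenotype record carries its own copy of the genotypes.
merged = [(mouse, age, weight, list(genotypes[mouse]))
          for mouse, age, weight in phenotypes]

merged_cells = sum(len(row[3]) for row in merged)
separate_cells = sum(len(g) for g in genotypes.values())
print(merged_cells, separate_cells)  # 12000 3000
```

In this toy case the merged data stores four times as many genotype values as the separate genotype file, one copy per repeated record.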

Hierarchical model for repeated observations with SNP-BLUP model

The key idea of this approach is to build a two-level model in which:

  1. on the first model level, ID effects (animal, sample, variety, line) are modelled, and all repeated observations on the same ID are linked to one ID effect;
  2. on the second model level, the ID effects are unique and can be linked to genotypes; this part models how much of the total ID effect is explained by the genotypes for that ID;
  3. on the second model level, residuals remain, which are the part of the ID effects not explained by genotypes (see the section on interpretation of residual ID effects below).
Below is an example of a SNP-BLUP model with repeated phenotypes. Variations such as Bayesian LASSO and mixture models can easily be developed from this example.
   data                                  (1)
     mouseID age weight
   data mouseID                          (2)
     mouseID genot[] !genot012
   data genot map
     weight = mean fac.mouseID           (3)
     fac.mouseID.weight = add.genot      (4)
     resid.weight ~ ......               (5)
     resid.fac.mouseID.weight ~ ......   (6)
The numbers in parentheses mark the following points:
  1. no mergekey can be given here, because weights are repeated and the mouseID values are not unique in the weight data file.
  2. a mergekey (mouseID) is added here to allow bayz to merge on mouseID "in the model".
  3. first-level model for the repeated weights, using a regular factor to fit mouseID effects.
  4. second-level model, using the mouseID effects as the 'response' on the left-hand side and fitting additive effects of the genotypes to explain the mouseID effects.
  5. the usual residuals for the weight observations.
  6. the residuals on the second model level.
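To make the two-level structure concrete, the following sketch simulates data in the way the hierarchical model assumes it arises. It is plain Python, not bayz code; the function name, the mean of 20, and all standard deviations are arbitrary assumptions chosen for illustration:

```python
import random

def simulate_two_level(n_ids=5, n_records=3, n_snps=20, seed=1):
    """Simulate repeated weights from a two-level model (illustrative values only)."""
    rng = random.Random(seed)
    mean = 20.0
    snp_effects = [rng.gauss(0.0, 0.1) for _ in range(n_snps)]

    genotypes = {}   # one genotype row per ID, coded 0/1/2
    id_effects = {}  # one ID effect per ID
    for i in range(n_ids):
        mouse = f"mouse{i}"
        genotypes[mouse] = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        # Second level: ID effect = additive genotype effects + residual ID effect.
        genomic = sum(g * a for g, a in zip(genotypes[mouse], snp_effects))
        id_effects[mouse] = genomic + rng.gauss(0.0, 0.5)

    records = []
    for mouse, effect in id_effects.items():
        for _ in range(n_records):
            # First level: every repeated record shares the same ID effect.
            records.append((mouse, mean + effect + rng.gauss(0.0, 1.0)))
    return genotypes, id_effects, records

genotypes, id_effects, records = simulate_two_level()
print(len(records), "records on", len(genotypes), "genotyped IDs")
```

The genotypes are stored once per ID, while the phenotype records repeat the ID; the ID effect is the only link between the two, which is the role of fac.mouseID in the example above.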

Interpretation of residual ID effects

The hierarchical model has residuals on two levels: the regular residual on the phenotype level, and a second residual on the ID level (animal ID or plant genotype/line/variety). When the second model level includes genetic or whole-genome genomic effects, the residual ID effect models the covariance between repeated observations that is not explained by the genetic/genomic effects. In animals, with repeated measures on the same individual, this is typically interpreted as a permanent environmental (PE) effect, although in principle it also includes non-additive genetic effects, which cannot be distinguished from PE effects. In plants, the repeated phenotypes are often measured on the same genotype but not literally on the same plants, which means there is no environmental correlation. The residual ID variance can then be interpreted as non-additive genetic variance, which can be included in a broad-sense heritability.
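The interpretation can be written out as a variance decomposition. The notation below is an assumption made for this sketch and is not bayz output: $y_{ij}$ is record $j$ on ID $i$, $u_i$ the ID effect, $z_{ik}$ the genotype of ID $i$ at SNP $k$ with effect $a_k$, $r_i$ the residual ID effect and $e_{ij}$ the regular residual, with all terms assumed independent and $\sigma_g^2$ denoting the variance of the genomic part $\sum_k z_{ik} a_k$.

```latex
\begin{align}
y_{ij} &= \mu + u_i + e_{ij}, &
u_i &= \sum_k z_{ik} a_k + r_i \\
\operatorname{Var}(y_{ij}) &= \sigma_g^2 + \sigma_r^2 + \sigma_e^2, &
\operatorname{Cov}(y_{ij}, y_{ij'}) &= \sigma_g^2 + \sigma_r^2 \quad (j \neq j') \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_r^2 + \sigma_e^2}, &
H^2 &= \frac{\sigma_g^2 + \sigma_r^2}{\sigma_g^2 + \sigma_r^2 + \sigma_e^2}
\end{align}
```

Under the animal interpretation, $\sigma_r^2$ is mostly PE variance and the second ratio is the repeatability of the trait; under the plant interpretation described above, $\sigma_r^2$ is non-additive genetic variance and the same ratio can be read as a broad-sense heritability $H^2$.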

Shortening parameter names

The parameter names in hierarchical models become quite long, because the left-hand side, such as "fac.mouseID.weight", is used as the trait name and is appended to all effects modelled on the second level. The parameters can be renamed to shorten all names, as in the following example, which changes the name "fac.mouseID.weight" to "anim":

     weight = mean fac.mouseID !name=anim
     anim =
     resid.weight ~ ......
     resid.anim ~ ......