Setting up similarity matrices and matrices of covariates

More advanced models in bayz use additional data outside the main data frame. The data involved consists of:
- the main data, which is specified with the data= argument in the bayz call.
- similarity / kernel matrices, which are used in random model-terms via the V= argument.
- matrices with large numbers of covariates for random regression or shrinkage fits, which are used in the rr() model-term.
The similarity / kernel matrices and matrices with covariates need to be set up with IDs as row names, so that they can be matched to an ID in the main data. A large advantage of this is that bayz makes the match itself, including matching of replicated IDs from the main data, and automatically leaves out rows of the matrices that have no match in the data. There is thus no need to adapt the contents of these often large matrices to exactly match different selections of data, and there is no need to copy rows/columns in these matrices to match replicated IDs in the main data.
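As a minimal sketch of this setup in base R, the example below attaches IDs as row and column names to a small similarity matrix and shows how replicated IDs in the data point to the same matrix row. The objects dat, Grel and row_index are hypothetical toy data, and match() is only used to illustrate matching by ID; it is not a description of how bayz implements the match internally.

```r
# Hypothetical toy data: three varieties, one of them measured twice.
dat <- data.frame(
  Variety = c("A", "B", "B", "C"),
  y       = c(1.2, 0.7, 0.9, 1.5)
)

# Hypothetical 4 x 4 similarity matrix; variety "D" is not in the data.
ids  <- c("A", "B", "C", "D")
Grel <- diag(4) * 0.5 + 0.5          # 1 on the diagonal, 0.5 elsewhere
rownames(Grel) <- ids
colnames(Grel) <- ids

# Base R illustration of matching by ID (not bayz internals):
# each data row points to a row of Grel; replicated IDs share the same row.
row_index <- match(dat$Variety, rownames(Grel))
```

The matrix keeps one row per ID, however often that ID occurs in the data, and the extra row for "D" does not need to be removed.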

Similarity / kernel matrices

In a model with random effects such as:

fit = bayz(y~rn(Variety, V=Grel) + ....)

All the levels (names) occurring in the variable Variety must match a row name in the Grel matrix. The opposite is not required: Grel may contain additional rows/columns for Varieties that are not in the data, and bayz will simply not use them.
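Before fitting, it can be useful to check that every level in the data has a row in the matrix. The sketch below does this in base R; unmatched_ids is a hypothetical helper written for this illustration and is not part of bayz.

```r
# Hypothetical helper: list data levels that have no row in the matrix.
unmatched_ids <- function(ids, mat) {
  setdiff(unique(as.character(ids)), rownames(mat))
}

# Hypothetical similarity matrix with IDs on rows and columns.
Grel <- diag(3)
rownames(Grel) <- colnames(Grel) <- c("A", "B", "C")

covered <- unmatched_ids(c("A", "B", "B"), Grel)  # empty: every level has a row
missing <- unmatched_ids(c("A", "E"), Grel)       # "E" has no row in Grel
```

An empty result means the requirement above is met; any IDs returned would have to be added to the matrix or removed from the data.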

Matrices with covariates

A matrix with covariates can be used in the rr() (random regression and shrinkage) model-term. Here too, row names should be attached to the matrix to link it to a field (an ID) in the data. The bayz model looks like:

fit = bayz(y~rr(Variety/Metab) + ....)

where Variety is the ID that links rows of the Metab matrix to the data. In this example, Metab could be a matrix with metabolite measurements for each Variety sample. As with similarity / kernel matrices, every Variety in the data must have a matching row in the covariate matrix, but the covariate matrix may contain additional Varieties, which bayz ignores.
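Covariates are often stored in a table with the ID in one of the columns. A minimal sketch of turning such a table into a matrix with IDs as row names is shown below; metab_df and its columns m1 and m2 are hypothetical, and only base R functions are used.

```r
# Hypothetical metabolite table with one row per Variety and an ID column.
metab_df <- data.frame(
  Variety = c("A", "B", "C"),
  m1 = c(0.3, 1.1, 0.8),
  m2 = c(2.0, 1.7, 2.4)
)

# Keep only the numeric columns in the matrix and move the IDs to the row names.
Metab <- as.matrix(metab_df[, c("m1", "m2")])
rownames(Metab) <- as.character(metab_df$Variety)
```

Leaving the ID column out of the matrix keeps it numeric; including a character column would turn the whole matrix into character values.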

Missing data in covariates

Missing data in matrices with covariates must be specified as NA (the R NA value). The missing cells will be replaced by their column means (i.e. imputation by the mean), so that no data drops out due to missing covariates. If you would like to use more advanced methods to impute missing data, the imputation needs to be done outside bayz before fitting.
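To make clear what replacement by column means amounts to, the sketch below performs the same kind of imputation by hand in base R. It is an illustration on a hypothetical matrix, not the code bayz uses internally; the same place in a workflow is where a more advanced imputation method could be applied before passing the matrix to bayz.

```r
# Hypothetical covariate matrix with two missing cells (filled column by column).
Metab <- matrix(c(1, NA, 3,
                  4, 6, NA),
                nrow = 3,
                dimnames = list(c("A", "B", "C"), c("m1", "m2")))

# Column means computed over the observed values only.
col_means <- colMeans(Metab, na.rm = TRUE)

# Replace each NA by the mean of its column.
missing <- which(is.na(Metab), arr.ind = TRUE)
Metab_imputed <- Metab
Metab_imputed[missing] <- col_means[missing[, "col"]]
```

In this toy example the missing m1 value for "B" becomes 2 (the mean of 1 and 3) and the missing m2 value for "C" becomes 5 (the mean of 4 and 6).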