# Setting up similarity matrices and matrices of covariates

More advanced models in bayz use additional data outside the main data frame. The data
involved consists of:

- the main data, which is specified with the data= argument in the bayz call.

- similarity / kernel matrices that are used in random model-terms with the V= argument.

- matrices with large numbers of covariates for random regression or shrinkage fits that are
used in the rr() model-term.

The similarity / kernel matrices and matrices with covariates need to be set up **with IDs as
row names that match an ID in the main data**. A large advantage of this is that bayz
makes the match, including matching of replicated IDs from the main data, and automatically
leaves out matrix rows that have no match in the data. There is thus no need to adapt the
contents of these often large matrices to exactly match different selections of data, nor to
copy rows/columns in these large matrices to match replicated IDs in the main data.
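
As an illustration of what matching by ID means, the sketch below uses base R only. The data frame `dat`, the matrix `Grel` and all values are hypothetical, and `match()` is used here only to show the idea; it is not a description of how bayz does the matching internally.

```r
# Hypothetical main data: variety "A" is replicated, variety "D" is absent.
dat <- data.frame(
  Variety = c("A", "A", "B", "C"),
  y       = c(5.1, 4.8, 6.3, 5.9)
)

# Hypothetical 4 x 4 similarity matrix with one row/column per variety.
ids  <- c("A", "B", "C", "D")
Grel <- diag(4)
dimnames(Grel) <- list(ids, ids)   # IDs on the row (and column) names

# For each data record, the row of Grel that carries the same ID.
row_of_record <- match(as.character(dat$Variety), rownames(Grel))
row_of_record
```

Both records of variety "A" point to the same matrix row, so the matrix itself does not need duplicated rows, and the row for "D" is simply not referenced.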

## Similarity / kernel matrices

In a model with random effects such as:

fit = bayz(y~rn(Variety, V=Grel) + ....)

All the levels (names) occurring in the variable Variety must match a row name in the Grel matrix.
The opposite is not needed, i.e. Grel may contain additional rows/columns for Varieties that are
not in the data, and those will simply not be used by bayz.
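
Before fitting, it can help to check this requirement yourself. The following is a minimal sketch in base R, assuming a hypothetical data frame `dat` and matrix `Grel`; the names `missing_ids` and `unused_ids` are introduced here for illustration only.

```r
# Hypothetical data and similarity matrix; "E" has no row in Grel.
dat  <- data.frame(Variety = c("A", "B", "B", "E"),
                   y       = c(1.2, 0.7, 0.9, 1.5))
ids  <- c("A", "B", "C")
Grel <- diag(3)
dimnames(Grel) <- list(ids, ids)

data_ids <- unique(as.character(dat$Variety))

# Levels in the data without a matching row name in Grel: these need fixing.
missing_ids <- setdiff(data_ids, rownames(Grel))
missing_ids

# Rows of Grel that do not occur in the data: these are allowed.
unused_ids <- setdiff(rownames(Grel), data_ids)
unused_ids
```

An empty `missing_ids` means every level in the data has a matching row name; a non-empty `unused_ids` is harmless.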

## Matrices with covariates

A matrix with covariates can be used in the rr() (random regression and shrinkage) model-term.
Here, too, row names should be attached to the matrix to link it to some field (some ID) in
the data. The bayz model looks like:

fit = bayz(y~rr(Variety/Metab) + ....)

where Variety is the ID that links rows from the 'Metab' matrix to the data, and in this
example 'Metab' could be a matrix with metabolite measurements for each Variety sample.
As with similarity / kernel matrices, every Variety that is in the data must have a
matching row in the covariate matrix, but the covariate matrix may contain additional
Varieties, which will be ignored by bayz.
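
Covariates are often stored as a table with the ID in one of the columns. A minimal sketch of turning such a table into a matrix with IDs on the row names is shown here; the table `metab_df`, its column names and its values are hypothetical.

```r
# Hypothetical metabolite table: one row per variety, ID in the first column.
metab_df <- data.frame(
  Variety = c("A", "B", "C"),
  m1      = c(0.12, 0.40, 0.33),
  m2      = c(1.50, 1.10, 1.75)
)

# Keep only the numeric covariate columns and move the IDs to the row names.
Metab <- as.matrix(metab_df[, c("m1", "m2")])
rownames(Metab) <- as.character(metab_df$Variety)

# Row names should be unique so that each ID links to exactly one row.
stopifnot(anyDuplicated(rownames(Metab)) == 0)
Metab
```

Leaving the ID column inside the matrix would turn the whole matrix into character data, which is why it is dropped before calling `as.matrix()`.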

## Missing data in covariates

Missing data in matrices with covariates must be specified as NA (the R NA value). The missing
cells will be replaced by their column means (i.e. imputation by the mean), so that no data will
drop out due to missing covariates. If you would like to use more advanced methods to impute
missing data, this will need to be done outside bayz, before the matrix is passed to the model.
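
To make the idea of imputation by the column mean concrete, the sketch below does it by hand in base R on a hypothetical matrix. It illustrates the concept and is not bayz's own code; the same pattern, with the means replaced by values from another method, can be used to prepare an imputed matrix outside bayz.

```r
# Hypothetical covariate matrix with missing cells coded as NA.
# matrix() fills by column: m1 = (1, NA, 3), m2 = (2, 4, NA).
Metab <- matrix(c(1, NA, 3,
                  2, 4, NA),
                nrow = 3,
                dimnames = list(c("A", "B", "C"), c("m1", "m2")))

# Column means computed on the observed values only.
col_mean <- colMeans(Metab, na.rm = TRUE)

# Positions (row, col) of the missing cells.
na_cells <- which(is.na(Metab), arr.ind = TRUE)

# Fill each missing cell with the mean of its own column.
Metab_imp <- Metab
Metab_imp[na_cells] <- col_mean[na_cells[, "col"]]
Metab_imp
```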