A basic data reading statement, which will use the default NA for missing values:
data file=ageweight.txt mouseID sex age weight
This reads 4 columns from the file "ageweight.txt" and assigns the names "mouseID", "sex", "age" and "weight" to them. In this basic example the file has no header, and the data in the file must be separated by spaces or tabs. The input file could for instance look like:
File names with a full or relative path work on Linux and Mac, and probably also on Windows. File names containing a space cannot be used: bayz stops reading at the first space and does not get the complete name. Some examples:
file=/usr/home/xxxx/datamouse/ageweight.txt
file=../datamouse/ageweight.txt
# not working because of spaces in the name:
file=age and weight.txt
A different code for missing values can be set with missing=, for example -999:
data file=ageweight.txt missing=-999 mouseID sex age weight
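The reading rules described above (columns separated by spaces or tabs, names assigned in order, one code marking missing values) can be illustrated with a minimal Python sketch. This is not how bayz is implemented; the function name read_columns, the sample content and the choice to reject lines with the wrong number of fields are assumptions made for illustration only.

```python
def read_columns(text, names, missing="NA"):
    """Split whitespace-separated lines into records keyed by the given names.

    Values equal to the missing code are stored as None.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            # Assumption of this sketch: a wrong field count is an error.
            raise ValueError(
                f"line {line_no}: expected {len(names)} fields, got {len(fields)}"
            )
        records.append(
            {name: (None if value == missing else value)
             for name, value in zip(names, fields)}
        )
    return records


# Made-up sample content, not taken from a real bayz data set.
sample = "m1 F 21 18.2\nm2\tM\t21\t-999\n"
rows = read_columns(sample, ["mouseID", "sex", "age", "weight"], missing="-999")
print(rows[1]["weight"])
```

Values are kept as strings here; converting them to numbers or factor levels is left out to keep the sketch short.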
The field names given in the bayz script in the basic example can instead be placed in the data input file. This allows the use of files with a "header" line. This is specified as:
data file=ageweight.txt -header
Now the input file could look like:
Note: in version 2.5 the use of a header line does not work (well) when reading so-called "block-fields" (see...).
data mouseID file=weight.txt mouseID sex weight
data mouseID file=dietinfo.txt mouseID diet
For example, the following two files will be merged as shown, which makes it possible to fit a model with weight explained by diet (the header line with field names is not in the input files):
Notes about merging:
The merging feature can be used to filter (select) data by making the first file in a merge-sequence a file only containing the list of IDs to keep, as in the following example.
data mouseID file=keepids.txt mouseID
data mouseID file=ageweight.txt mouseID sex age weight
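The filtering idea can be sketched in Python as follows. This is only an illustration of selecting records by a list of IDs, not bayz's merge code; the function name filter_by_ids and the sample records are hypothetical. How bayz treats an ID that is in the keep list but has no record in the second file is not covered here.

```python
def filter_by_ids(keep_ids, records, key):
    """Return only the records whose ID (records[key]) is in keep_ids."""
    keep = set(keep_ids)  # set lookup keeps the filter fast for long ID lists
    return [rec for rec in records if rec[key] in keep]


# Made-up data: the IDs to keep and three records from a second file.
keep_ids = ["m1", "m3"]
records = [
    {"mouseID": "m1", "weight": "18.2"},
    {"mouseID": "m2", "weight": "20.1"},
    {"mouseID": "m3", "weight": "19.4"},
]
kept = filter_by_ids(keep_ids, records, "mouseID")
print([rec["mouseID"] for rec in kept])
```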
See the example below for the effect of this kind of merging (header line with field names is not in the input files):
With one phenotype per ID (sample), use the same ID name on the data statement that reads the phenotype file and on the data statement that reads the pedigree file. With the special flag 'ped' on the data statement for the pedigree file, bayz does not merge these files as in the regular merging procedure, but keeps all records in both files. When phenotypes are repeated, a hierarchical model is needed. In the pedigree data the ID must be in the first column (this is not required for other files), and the parents must be in columns 2 and 3 (the parent columns can be given any names). More columns can follow; these are ignored.
data mouseID file=ageweight.txt mouseID sex age weight
data mouseID ped file=mouse.ped mouseID father mother
By default bayz ignores inbreeding when constructing the inverse relationship matrix A^-1. Add the -inbred flag after the file name to make bayz compute and include inbreeding:
data mouseID ped file=mouse.ped -inbred mouseID father mother
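To show what inbreeding means for a pedigree, the following Python sketch uses the standard tabular method to build the relationship matrix A, from which the inbreeding coefficient of an animal is its diagonal element minus 1. This is a textbook illustration, not bayz's algorithm (bayz works with the inverse matrix). The function name and the small pedigree are made up, and the sketch assumes parents are listed before their offspring.

```python
def relationship_matrix(pedigree):
    """Tabular method for the additive relationship matrix A.

    pedigree: list of (id, father, mother) tuples with parents listed
    before their offspring; use None for an unknown parent. A parent
    that has not been listed earlier is treated as unknown.
    Returns (A as a list of lists, dict mapping id to row index).
    """
    index = {}
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (ident, father, mother) in enumerate(pedigree):
        s = index.get(father)  # row of the father, or None if unknown
        d = index.get(mother)  # row of the mother, or None if unknown
        for j in range(i):
            # Relationship with an earlier animal j: average over parents.
            rel = 0.0
            if s is not None:
                rel += 0.5 * a[j][s]
            if d is not None:
                rel += 0.5 * a[j][d]
            a[i][j] = a[j][i] = rel
        # Diagonal is 1 + F, with F half the relationship between parents.
        both_known = s is not None and d is not None
        a[i][i] = 1.0 + (0.5 * a[s][d] if both_known else 0.0)
        index[ident] = i
    return a, index


# Made-up pedigree: x is the offspring of two full sibs (s1 and s2).
ped = [
    ("p1", None, None),
    ("p2", None, None),
    ("s1", "p1", "p2"),
    ("s2", "p1", "p2"),
    ("x", "s1", "s2"),
]
A, idx = relationship_matrix(ped)
inbreeding_x = A[idx["x"]][idx["x"]] - 1.0
print(inbreeding_x)
```

Ignoring inbreeding corresponds to treating every such coefficient as zero.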
These examples assume that one phenotype is observed per ID (sample) and that one genotype per ID is available in a separate file. The standard bayz merge is used to combine the phenotype and genotype data. When repeated phenotypes are available per ID, either use a hierarchical model or merge the phenotype and genotype data beforehand (copying the genotype data for every repeated ID in the phenotype data). The genotype data is defined as a block-field (a field name with two square brackets), which allows all genotypes to be included in the model with a single model term (add or dom).
data cow file=milkdata.txt cow milk fat prot
Data id file=cowgenotypes.txt cow geno !biallelic
data geno map file=snpnames.txt geno
Biallelic coding is the format of plink 'ped' files. The following uses plink ped and map files, where the 'ped' file has 6 additional fields and the map file has chromosome, SNP name, genetic map position and physical (base pair) map position. Note that for bayz the column of SNP names in the map file must get the same name as the genotype block-field, in this example 'geno'.
data cow file=milkdata.txt cow milk fat prot
Data id file=cowgenotypes.ped famid cow sire dam sex pheno geno !biallelic
data geno map file=cowgenotypes.map chrom geno cmdist bpdist
Use the genot012 flag to indicate that the input file contains genotypes coded as 0, 1, 2 for homozygote, heterozygote and the other homozygote. Missing genotypes are coded NA by default.
cow geno !genot012
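The relation between the two genotype codings can be sketched in Python: a biallelic genotype (two alleles per marker) becomes a 0, 1, 2 code by counting copies of one chosen allele. This is only an illustration, not bayz's conversion. The function name code_012 is hypothetical, and both the allele that is counted and the missing-allele code "0" (the plink ped default) are assumptions of the sketch.

```python
def code_012(allele_pairs, counted_allele, missing="0"):
    """Convert (allele1, allele2) pairs to counts of counted_allele.

    Returns 0, 1 or 2 per genotype, or None when either allele is missing.
    """
    codes = []
    for a1, a2 in allele_pairs:
        if a1 == missing or a2 == missing:
            codes.append(None)  # missing genotype
        else:
            # Each comparison is True (1) or False (0); the sum is the count.
            codes.append((a1 == counted_allele) + (a2 == counted_allele))
    return codes


# Made-up genotypes for one marker with alleles A and G.
pairs = [("A", "A"), ("A", "G"), ("G", "G"), ("0", "0")]
codes = code_012(pairs, counted_allele="G")
print(codes)
```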
Bayz automatically edits the genotype data on minor allele frequency (MAF), with a default threshold of 1%, and on missing rate, with a default threshold of 20%. To change these defaults, add flags to the data statement for the map file: