

merlin-offline

What is it and how do I get it?

MERLIN-OFFLINE is an undocumented program created by Yun Li and Gonçalo Abecasis. It is distributed with MERLIN (http://www.sph.umich.edu/csg/abecasis/Merlin/), so if you have downloaded and installed merlin you already have merlin-offline.

By default, merlin infers missing genotypes using information from other family members. Merlin-offline was designed to save these inferred genotypes and analyse the imputed data. There is a small amount of information about how merlin infers genotypes at the bottom of this page: http://www.sph.umich.edu/csg/abecasis/Merlin/tour/assoc.html

Because genotype inference is a form of imputation, we can co-opt merlin-offline to analyse imputed genotype data produced by Mach/MiniMac or other imputation programs. This is alluded to on this page, http://genome.sph.umich.edu/wiki/Mach2dat:_Association_with_MACH_output, which says: "Please note that mach2dat analyze only unrelated samples. If you input a pedigree with family relationship, those will be ignored. If you have family data, you can use merlin-offline".

Input files

Imputed data must be reformatted before it will run in merlin-offline. The easiest way to convert your data is to use the Perl scripts I wrote, which are available from here.

To run merlin-offline you will need a map file, a dat or skip file, a freq file and pedinfer files, plus your phenotypic ped and dat files in standard merlin format (http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html). Remember: NEVER include phenotypes for individuals who have not been genotyped in a merlin or merlin-offline analysis. Their genotypes will be imputed and they will be included in the analysis!
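One way to follow the rule above is to filter the phenotype file against a list of genotyped individuals before running the analysis. The sketch below is only an illustration: genotyped_ids.txt (one FID IID pair per line) and pheno_all.ped are hypothetical file names, not something merlin or the conversion scripts produce, and the toy data exists only to make the example runnable.

```shell
# Toy inputs for illustration only (file names and contents are hypothetical)
cat > genotyped_ids.txt <<'EOF'
111 03
111 04
EOF
cat > pheno_all.ped <<'EOF'
111 03 01 02 1 0.52
111 04 01 02 2 1.10
111 05 01 02 2 0.73
EOF

# First file: remember each genotyped FID/IID pair.
# Second file: print only rows whose FID/IID pair was remembered.
awk 'NR==FNR {keep[$1,$2]; next} ($1,$2) in keep' genotyped_ids.txt pheno_all.ped > pheno.ped
```

Here individual 111-05 has a phenotype but no genotype, so that row is dropped from pheno.ped.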

To correctly model relatedness and zygosity (if required) you will also need to provide a ped file that 'links' the family. This file has 6 columns: FID, IID, PID, MID, Sex and Zygosity.

Non-twin individuals have a zygosity code of 0. If you have data from twins, assign each identical pair of twins in the family an odd number (i.e. the first set of MZ twins would both have a 1 for zygosity, a second set would have 3 for both twins) and each non-identical twin pair an even number (i.e. the first set of DZ twins would both have a 2 for zygosity, a second set would have 4 for both twins). See http://www.sph.umich.edu/csg/abecasis/QTDT/docs/twins.html for details.
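Because the zygosity codes are filled in by hand, a quick consistency check can catch typos. The sketch below assumes twins only come in pairs, so every non-zero code should appear exactly twice within a family; the file name familystructure_example.ped and its contents are made up for illustration.

```shell
# Toy family structure file (hypothetical): family 222 has a stray code 3
cat > familystructure_example.ped <<'EOF'
111 01 0 0 1 0
111 02 0 0 2 0
111 03 01 02 1 2
111 04 01 02 2 2
222 01 0 0 1 0
222 02 0 0 2 0
222 03 01 02 2 1
222 04 01 02 2 1
222 05 01 02 1 3
EOF

# Count how often each non-zero zygosity code occurs per family,
# then report any family/code combination not seen exactly twice.
awk '$6 != 0 {n[$1 FS $6]++}
     END {for (k in n) if (n[k] != 2) print "unpaired zygosity code (FID code):", k}' \
    familystructure_example.ped > zygosity_check.txt
```

An empty zygosity_check.txt means every twin code is paired; here the file flags code 3 in family 222.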

The ped file needs to contain data for all genotyped individuals who have phenotypes and also all non-genotyped individuals who are named as parents. For example, suppose we had phenotypes and genotypes for the following individuals (note that there is no header line in the actual file):

FID IID PID MID Sex Zyg
111 03 01 02 1 2
111 04 01 02 2 2

then we would also need to include their parents in the family structure ped file, i.e.
111 01 0 0 1 0
111 02 0 0 2 0

This information is used to correct for the relatedness between individuals 111-03 and 111-04. This file can be made using the following code. Please note that you will need to update the zygosity information for your participants by hand.

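# phenotyped individuals, with zygosity set to 0 for now
awk '{print $1, $2, $3, $4, $5, "0"}' pheno.ped > temp
# fathers and mothers added as founders; rows with an unknown parent (coded 0) are skipped
awk '$3 != "0" {print $1, $3, "0 0 1 0"}' pheno.ped >> temp
awk '$4 != "0" {print $1, $4, "0 0 2 0"}' pheno.ped >> temp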
sort temp | uniq > familystructure.ped
echo "Z zygosity" > familystructure.dat

If your sample does not include twins, exclude the Zygosity variable from the analysis by putting an S in the dat file instead of a Z.

Note that merlin-offline only accepts continuous variables, coded as T in the dat file. If you want to coerce it into running a binary trait, code the controls as zero and the cases as one.
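The recoding described above could be done with awk. This is a sketch under assumptions: it supposes the trait sits in column 6 and is currently coded 1 = control, 2 = case, with anything else (such as x) treated as missing and left untouched. The file names pheno_binary.ped, pheno_recode.ped and pheno_recode.dat and the trait name case_status are hypothetical.

```shell
# Toy phenotype file (hypothetical): column 6 is 1 = control, 2 = case, x = missing
cat > pheno_binary.ped <<'EOF'
111 03 01 02 1 2
111 04 01 02 2 1
222 03 01 02 2 x
EOF

# Recode cases (2) to 1 and controls (1) to 0; other values pass through unchanged
awk '{ if ($6 == 2) $6 = 1; else if ($6 == 1) $6 = 0; print }' pheno_binary.ped > pheno_recode.ped

# Describe the recoded column as a quantitative trait (T) in the matching dat file
echo "T case_status" > pheno_recode.dat
```

Checking for 2 before 1 inside a single if/else keeps a freshly recoded case from being turned into a control.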

Just like merlin, merlin-offline can 'comma-merge' files on the fly, so you don't need to merge the family structure and phenotype files. This makes it easy to run multiple traits without remaking files.

Example usage:

merlin-offline64 -m infer_format.map.gz -f infer_format.freq.gz --pedinfer infer_AdolAdult.ped.gz --datinfer infer_format.dat.gz -p familystructure.ped,pheno.ped -d familystructure.dat,pheno.dat --useCovariates --tabulate --prefix myresults > myresults.log

For whole genome analysis
#!/bin/bash
# loop over chromosomes
for ((i=1; i<=22; i++))
do
    # loop over parts
    for ((j=1; j<=23; j++))
    do
        # exclude parts that are not present
        if test -f infer_format_"$i"."$j".dat.gz
        then
            # run merlin
            merlin-offline64 -m infer_format_"$i".map.gz -f infer_format_"$i".freq.gz --pedinfer infer_AdolAdult_"$i"."$j".ped.gz --datinfer infer_format_"$i"."$j".dat.gz -p familystructure.ped,pheno.ped -d familystructure.dat,pheno.dat --useCovariates --tabulate --prefix chr"$i"_"$j" > chr"$i"_"$j".out
        fi
    done
done
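After a whole genome run it can be worth confirming that every chunk with an input file also produced a log. The sketch below reuses the file naming from the loop above and lists chunks whose .out file is missing or empty; the output name missing_chunks.txt is an assumption, and the toy files are created only so the example runs on its own.

```shell
# Toy files for illustration: two chunks have input, only one has a log
touch infer_format_1.1.dat.gz infer_format_1.2.dat.gz
echo "done" > chr1_1.out

# Same chromosome/part ranges as the analysis loop
for i in $(seq 1 22); do
    for j in $(seq 1 23); do
        # input exists but the log is missing or has zero size
        if test -f infer_format_"$i"."$j".dat.gz && ! test -s chr"$i"_"$j".out; then
            echo "chr${i}_${j}"
        fi
    done
done > missing_chunks.txt
```

Any chunk listed in missing_chunks.txt can then be rerun individually with the example command.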

Using merlin-offline with 1KGP imputed data

To use merlin-offline to analyse 1KGP imputed data you will need to download and unzip the merlin source code, download an edited version of one of the files, and compile the program using the following code. We recommend you compile this program in your personal bin directory.
(Note: this version of the PedigreeGlobals file has been edited to accept 3 additional allele codes, D (deletions), I (insertions) and R (reference), to allow for the analysis of structural variation.)

wget "http://www.sph.umich.edu/csg/abecasis/merlin/download/merlin-1.1.2.tar.gz"
tar -zxvf merlin-1.1.2.tar.gz
cd merlin-1.1.2/libsrc
wget "http://genepi.qimr.edu.au/staff/sarahMe/mach2merlin/PedigreeGlobals.cpp"
mv PedigreeGlobals.cpp.1 PedigreeGlobals.cpp
cd ../
make all