Computer Practical Exercise on estimation of maternal, imprinting and interaction effects using the EMIM program

Overview

Purpose

In this exercise you will carry out an analysis of simulated data in which maternal genotype, child genotype and imprinting effects may be operating.

Methodology

We will use the approach described in the manuscript "Ainsworth HF, Unwin J, Jamison DL and Cordell HJ (2011) Investigation of maternal effects, maternal-foetal interactions and parent-of-origin effects (imprinting), using mothers and their offspring" (Genetic Epidemiology 35:19-45).

Program documentation

EMIM documentation:

Documentation for the EMIM program can be found on the EMIM website:

http://www.staff.ncl.ac.uk/richard.howey/emim/index.html

Data overview

In the first exercise, we will use family data consisting of case/mother duos and/or case/parent trios, genotyped at three SNP loci. We will also investigate whether our results can be improved by incorporating various kinds of control samples.

In the second exercise, we will be following the worked example on the EMIM website.

Appropriate data

Appropriate data for this exercise is SNP genotype data for case/parent trios, case/mother duos or case/father duos. Additional genotype data can also be incorporated into the analysis from parents of cases, mothers of cases or fathers of cases (e.g. if the case itself has not been successfully genotyped) or from cases alone (e.g. if the parents have not been genotyped).

Greater efficiency can also be achieved by incorporating one or more types of control sample into the analysis, provided we are not worried about population stratification. The types of control sample that can be included are the parents (mother and father) of controls, control/mother duos, control/father duos, or individual controls. Provided the disease is rare, these can be either genuinely unaffected controls or population-based controls (of unknown disease status). If the disease is common, the controls should be population-based controls.

Instructions for Exercise 1

Data files

The data is contained in the files:

casemotherduos.dat
caseparenttrios.dat
conmotherduos.dat
conparents.dat
cons.dat
emimmarkers.dat
emimparams.dat
wtccccons.dat

These files should already be in the EMIM-EX1 subdirectory.

Data format

The format of the data files is described on the EMIM website. Read through the appropriate section of the website:

http://www.staff.ncl.ac.uk/richard.howey/emim/emim.html

Then take a look at the data files (e.g. using the command more *.dat) and check that you understand how the data are coded.

In most cases, the data files contain 500 units of the appropriate type (e.g. 500 case/parent trios, 500 case/mother duos, 500 controls, etc.). The file wtccccons.dat contains a larger set of controls, which might represent common controls from a population-based resource, such as the 3000 controls used in the Wellcome Trust Case Control Consortium (WTCCC).

The file emimparams.dat specifies the datafiles that will be read by the EMIM program, the assumptions that will be made during the analysis, and the parameters that are to be estimated. The initial settings that we have chosen in this file tell the program just to use the data file for case/mother duos, to assume Hardy-Weinberg Equilibrium and random mating, and to estimate a child's genotype effect only.
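To make these assumptions concrete, the following Python sketch computes the expected mother/child genotype probabilities among case/mother duos under Hardy-Weinberg equilibrium, random mating and Mendelian transmission, with a multiplicative child-genotype relative-risk model. It is not EMIM's own code, only an illustration of the kind of model being fitted; the names r1 and r2 (relative risks for one and two copies of the allele, relative to zero copies) and the allele frequency in the checks are illustrative assumptions. Setting r1 = r2 = 1 corresponds to the null hypothesis of no child genotype effect.

```python
"""Sketch (not EMIM code): expected mother/child genotype probabilities for
case/mother duos, assuming HWE, random mating, Mendelian transmission and a
multiplicative child-genotype relative-risk model with hypothetical r1, r2."""


def hwe_freqs(p: float) -> list[float]:
    """Frequencies of genotypes with 0, 1, 2 copies of allele A (frequency p) under HWE."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def duo_probs(p: float, r1: float = 1.0, r2: float = 1.0) -> dict[tuple[int, int], float]:
    """Return P(mother genotype m, child genotype c | child is a case).

    Genotypes are coded as the number of A alleles (0, 1, 2).
    r1, r2: hypothetical relative risks for a child carrying 1 or 2 copies.
    """
    risk = [1.0, r1, r2]
    mother_freq = hwe_freqs(p)
    weights = {}
    for m in range(3):
        t_m = m / 2  # probability that the mother transmits A
        # Under random mating and HWE, the father transmits A with probability p.
        trans = [
            (1 - t_m) * (1 - p),                # child has 0 copies
            t_m * (1 - p) + (1 - t_m) * p,      # child has 1 copy
            t_m * p,                            # child has 2 copies
        ]
        for c in range(3):
            # Population frequency of (m, c), weighted by the child's disease risk.
            weights[(m, c)] = mother_freq[m] * trans[c] * risk[c]
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}
```

Cells such as (0, 2) and (2, 0) come out with probability zero because they are Mendelian-inconsistent, which is a useful check when reading the duo counts. Under the alternative, cases are enriched for the genotypes with higher relative risk.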

Take a look at emimparams.dat and check that you understand how the lines in this file force the above settings to be implemented.

The data format required by EMIM is slightly inconvenient for those used to working with standard LINKAGE or PLINK format files. Luckily we have another program, PREMIM, that can be used to generate EMIM format input files from PLINK format data. We will see an example of this in Exercise 2.

Step-by-step instructions

To run EMIM under the initial settings described above, type the following from the directory where the data files are kept:

emim

The program should run briefly and produce two output files, emimsummary.out and emimresults.out. The file emimsummary.out is harder to read, although it provides a useful summary if you are analysing a large number of SNPs. We will look at emimresults.out, as this gives a more detailed view of the results.

Take a look at emimresults.out. Results are given for each of the 3 SNPs in turn. First we see the parameter estimates under the null hypothesis that all effects are 0 (i.e. all relative risk parameters = 1, or equivalently all log relative risk parameters = 0). Then we have the parameter estimates under the alternative hypothesis that you specified (i.e. that child's genotype effects are non-zero). As well as parameter estimates, the program outputs:

a 95% CI for each of the estimated parameters of interest
the maximised log likelihoods for the alternative and null models
twice the difference between these, which can be used to compare the two models (i.e. to test the null hypothesis) by comparison with a chi-squared distribution on the appropriate degrees of freedom (in this case 2 df, one for each of the two child genotype relative risk parameters).
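The likelihood-ratio test described in the last item can be reproduced by hand. The Python sketch below takes the two maximised log likelihoods reported in emimresults.out and returns the test statistic and its chi-squared p-value. It uses only the standard library, relying on the closed forms that exist for 1 and 2 df; the log-likelihood values in the usage line are made up for illustration and are not taken from the exercise data.

```python
"""Sketch: likelihood-ratio test from two maximised log likelihoods (stdlib only)."""

import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared distribution, for df = 1 or 2 only."""
    if x <= 0:
        return 1.0  # a non-positive statistic gives no evidence against the null
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # P(Z^2 > x) for standard normal Z
    if df == 2:
        return math.exp(-x / 2)  # chi-squared(2) is exponential with mean 2
    raise ValueError("this sketch only handles df = 1 or 2")


def lrt(loglik_alt: float, loglik_null: float, df: int) -> tuple[float, float]:
    """Return (statistic, p-value), where statistic = 2 * (lnL_alt - lnL_null)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


stat, p = lrt(-1000.0, -1005.0, df=2)  # hypothetical log likelihoods
print(f"statistic = {stat}, p = {p:.4g}")  # prints: statistic = 10.0, p = 0.006738
```

The closed forms keep the sketch dependency-free; when more parameters are estimated and the df grows, a general routine such as scipy.stats.chi2.sf would be the usual choice.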

Files and commands used in the remaining steps of Exercise 1: emimparams.dat, casemotherduos.dat, caseparenttrios.dat, conmotherduos.dat, conparents.dat, cons.dat, emim, emimresults.out.

To substitute the larger control set, back up the original controls file and then copy wtccccons.dat over cons.dat:

cp cons.dat originalcons.dat

cp wtccccons.dat cons.dat

Instructions for Exercise 2

Now go to the EMIM website and follow the tutorial:

http://www.staff.ncl.ac.uk/richard.howey/emim/example.html

The data you need should already have been downloaded for you, in the EMIM-EX2 subdirectory.

Comments

Advantages/disadvantages

This type of modelling is more complicated than basic association testing, but it allows you to consider more complex models/mechanisms.

Study design issues

Family data have the advantage of generally being more robust than case/control data to population stratification. They also allow investigation of more complex effects, e.g. imprinting. However, families may be harder to collect than cases and controls.

Other packages

Models similar to the ones described here can also be fitted using SAS code available from Clarice Weinberg, or by using the program LEM (van Den Oord and Vermunt, 2000).

References

Ainsworth HF, Unwin J, Jamison DL and Cordell HJ (2011) Investigation of maternal effects, maternal-foetal interactions and parent-of-origin effects (imprinting), using mothers and their offspring. Genet Epidemiol 35:19-45.

Cordell HJ, Barratt BJ and Clayton DG (2004) Case/pseudocontrol analysis in genetic association studies: a unified framework for detection of genotype and haplotype associations, gene-gene and gene-environment interactions and parent-of-origin effects. Genet Epidemiol 26:167-185.

Shi M, Umbach DM, Vermeulen SH, Weinberg CR (2008) Making the most of case-mother/control-mother studies. Am J Epidemiol 168:541-7.

van Den Oord EJ, Vermunt JK (2000) Testing for linkage disequilibrium, maternal effects, and imprinting with (In)complete case-parent triads, by use of the computer program LEM. Am J Hum Genet 66:335-8.

Vermeulen SH, Shi M, Weinberg CR, Umbach DM (2009) A hybrid design: case-parent triads supplemented by control-mother dyads. Genet Epidemiol 33:136-44.

Weinberg CR, Wilcox AJ, Lie RT (1998) A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 62:969-78.

Weinberg CR (1999) Methods for detection of parent-of-origin effects in genetic studies of case-parents triads. Am J Hum Genet 65:229-35.

Exercises prepared by: Heather Cordell
Checked by:
Programs used: EMIM, PREMIM, R
Last updated: