Computer Practical Exercises on Parametric Linkage Analysis using the Genehunter program

Introduction

In today's practical we will investigate methods for positioning a disease locus on a known map of marker loci, using information from all the linked markers simultaneously. In these methods we assume that we know the genetic distances (and hence recombination fractions) between the markers. We also assume we know the underlying disease model (e.g. recessive, dominant etc). We fix a position for the disease locus and calculate the overall likelihood for the disease and marker data, assuming the disease locus position is correct. We then repeat the analysis with the disease locus positioned at different locations in relation to the known markers. In this way we construct a multipoint LOD score curve across the region: the position where the LOD score is maximum is the best estimate of the disease locus location.
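As a rough illustration of the idea above, the following Python sketch computes a LOD score as the log10 ratio of the likelihood with the disease locus at a given position to the likelihood with the disease locus unlinked to the markers, and then picks the position with the highest score. The function names and all the numbers are hypothetical and are not output from Genehunter; they only show the shape of the calculation.

```python
import math

def lod_score(lik_at_position, lik_unlinked):
    """log10 likelihood ratio: disease locus at a map position vs. unlinked."""
    return math.log10(lik_at_position / lik_unlinked)

def best_position(positions_cm, lods):
    """Return (position, LOD) at the maximum of the LOD score curve."""
    best = max(range(len(lods)), key=lambda i: lods[i])
    return positions_cm[best], lods[best]

# Hypothetical likelihoods at five trial positions (cM); not real data.
positions = [0.0, 5.0, 10.0, 15.0, 20.0]
likelihoods = [2.0e-12, 8.0e-12, 5.0e-11, 1.0e-11, 3.0e-12]
unlinked = 1.0e-12

lods = [lod_score(lik, unlinked) for lik in likelihoods]
position, max_lod = best_position(positions, lods)
print(f"Maximum LOD {max_lod:.2f} at {position} cM")
```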

Data overview

We will begin by analysing 4 families which are believed to be segregating for a recessive disease locus. The families are typed at 3 linked marker loci which we shall call markers 2, 3 and 4. The pedigree data is in the file rec2-3-4.txt

Take a look at the pedigree file. Each line gives the data for a single person. Data is ordered in columns corresponding to family, id (within family), id of father, id of mother, sex (male=1, female=2), affection status (1=unaffected, 2=affected), and genetic data (3 loci, each with 2 alleles). A zero indicates missing or unknown data.
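If you want to check the column layout programmatically, a minimal Python sketch along these lines may help. It assumes the columns are separated by whitespace and appear exactly in the order described above; the example line is made up for illustration and is not taken from rec2-3-4.txt.

```python
from typing import List, NamedTuple, Tuple

class Person(NamedTuple):
    family: str
    person_id: str
    father: str      # "0" means unknown (e.g. a founder)
    mother: str
    sex: int         # 1 = male, 2 = female
    affection: int   # 1 = unaffected, 2 = affected, 0 = unknown
    genotypes: List[Tuple[str, str]]  # one (allele, allele) pair per locus

def parse_pedigree_line(line: str, n_loci: int = 3) -> Person:
    """Split one pedigree line into its fields (assumed layout, see text)."""
    fields = line.split()
    expected = 6 + 2 * n_loci
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, found {len(fields)}")
    alleles = fields[6:]
    genotypes = [(alleles[2 * i], alleles[2 * i + 1]) for i in range(n_loci)]
    return Person(fields[0], fields[1], fields[2], fields[3],
                  int(fields[4]), int(fields[5]), genotypes)

# Made-up example: family 1, person 3, father 1, mother 2, male, affected,
# genotype 1/2 at the first marker, missing at the second, 3/3 at the third.
person = parse_pedigree_line("1 3 1 2 1 2 1 2 0 0 3 3")
print(person)
```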

To perform the analysis in Genehunter, we need an additional file called the "locus datafile": recessive.txt

This file has a slightly cryptic structure, which we don't have time to go into in detail today. Basically it gives information about the different loci in the pedigree file (the markers and the assumed disease locus causing the disease phenotype), such as their allele frequencies, penetrances and genetic map positions on the chromosome.

Take a look at the file and find the following lines:

1 2 << AFFECTATION LOCUS, NO. OF ALLELES
0.950000 0.050000 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0.001000 0.001000 0.999000


These lines give information about the assumed disease locus. They tell the program that the frequency of the disease allele (D) is 0.05 (5%) and that it follows a recessive model. The recessive model is implied by the last line above, which gives the penetrances for genotypes dd, dD, DD respectively.
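To see what these numbers imply, the short Python sketch below combines the allele frequencies and penetrances quoted above. It assumes Hardy-Weinberg genotype proportions, which the locus datafile itself does not state, and the variable names are purely illustrative.

```python
def genotype_frequencies(q):
    """Hardy-Weinberg genotype frequencies for disease allele frequency q."""
    p = 1.0 - q
    return {"dd": p * p, "dD": 2.0 * p * q, "DD": q * q}

# Values quoted from the locus datafile lines shown above.
q = 0.05
penetrance = {"dd": 0.001, "dD": 0.001, "DD": 0.999}

freqs = genotype_frequencies(q)

# Population prevalence = sum over genotypes of frequency x penetrance.
prevalence = sum(freqs[g] * penetrance[g] for g in freqs)

# Recessive pattern: carriers (dD) have the same low risk as non-carriers (dd),
# and only DD homozygotes have a high risk.
is_recessive = penetrance["dD"] == penetrance["dd"] < penetrance["DD"]

print(f"Genotype frequencies: {freqs}")
print(f"Implied prevalence: {prevalence:.6f}")
print(f"Recessive pattern: {is_recessive}")
```

Under these assumptions only about 0.25% of the population carries the high-risk DD genotype, so the disease is rare.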

Make a new directory (folder) in your home space and save the above files in it. You will also need to save a copy of the Genehunter program and an additional Cygwin library in the same directory:

gh.exe
cygwin1.dll

Step-by-step instructions

To start with, you will need to open an MS-DOS (command prompt) window: click on Start, then Run, then type cmd. Once the window has opened, type dir to see all the files and directories (folders) that are in your home space, and move into the directory where you saved the data files, e.g. by typing

cd xxxxx

(where xxxxx is replaced by the name of the appropriate folder).

Type dir again to check the required files are available in the directory.

1. Start up Genehunter by typing gh

2. Read in the locus data by typing load markers recessive.txt

3. Set analysis to parametric LOD score analysis by typing analysis lod

4. Set the program to calculate likelihoods from 26 cM (recombination fraction approx 0.2) on either side of markers 2 and 4, by typing off end 26

5. Read in the pedigree data and analyse each family by typing scan pedigrees rec2-3-4.txt

6. Activate graph drawing capability by typing postscript on

7. Keep a log of the results that you are about to obtain by typing photo results2-3-4.txt

8. Total the results across all families by typing total stat
Suggested names for the output (PostScript) files are lod2-3-4.ps and info2-3-4.ps

9. Type q to get out of Genehunter
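
Step 4 above equates a map distance of 26 cM with a recombination fraction of roughly 0.2. The handout does not say which map function lies behind this figure; the Python sketch below assumes the Haldane map function, which gives a value close to 0.2 at 26 cM. The function names are illustrative only.

```python
import math

def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    d = distance_cm / 100.0  # convert centiMorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def haldane_distance_cm(theta):
    """Inverse: map distance in cM for a recombination fraction below 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

theta = haldane_theta(26.0)
print(f"26 cM corresponds to a recombination fraction of about {theta:.3f}")
print(f"A recombination fraction of 0.2 corresponds to about {haldane_distance_cm(0.2):.1f} cM")
```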



Take a look at the output files you have just created. Is there evidence for linkage between a disease locus and the markers in this region? Where is the most likely position of the disease locus?

The advantage of using Genehunter is that it can calculate likelihoods even when there are a large number of markers typed, as long as the families are not too large. Repeat your Genehunter analysis using the pedigree file recmulti-ped.txt and locus datafile recmulti-loc.txt. Suggested output file names are lodmulti.ps and infomulti.ps. This data is for the same four families, but now the marker map has been extended so that the families are typed at 10 linked markers (named markers 1-10) as opposed to just markers 2-4.

Compare your LOD score plots to those obtained in the previous analysis using markers 2-4 only. How has using the full 10-marker map improved your results?

Repeat your Genehunter analysis of the 10 markers using the locus datafile dommulti-loc.txt, which incorrectly assumes that the disease model is dominant. (Take a look at the file and see if you can find the line that tells the program that the disease model is dominant.) Suggested output file names are loddom.ps and infodom.ps. How has assuming a dominant mode of inheritance altered your results?

Genehunter documentation

Genehunter documentation is available here: ghdocumentation.ps