In today's practical we will investigate methods for positioning a disease locus on a known map of marker loci, using information from all the linked markers simultaneously. In these methods we assume that we know the genetic distances (and hence the recombination fractions) between the markers. We also assume that we know the underlying disease model (e.g. recessive or dominant). We fix a position for the disease locus and calculate the overall likelihood of the disease and marker data, assuming that this position is correct. We then repeat the calculation with the disease locus placed at different positions relative to the known markers. In this way we construct a multipoint LOD score curve across the region: the position at which the LOD score is highest is the best estimate of the disease locus location.
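To make the link between genetic distance and recombination fraction concrete, here is a minimal Python sketch. It assumes the Haldane map function (an assumption on our part; other map functions such as Kosambi give slightly different values), and the function names are purely illustrative.

```python
import math


def haldane_theta(distance_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance_cm(theta: float) -> float:
    """Inverse mapping: map distance in cM for a recombination fraction below 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


# A distance of 26 cM corresponds to a recombination fraction of roughly 0.2.
print(round(haldane_theta(26.0), 3))
```

Under this map function, distances add along the chromosome but recombination fractions do not, which is why marker maps are normally given in cM.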
We will begin by analysing 4 families which are believed to be segregating for a recessive disease locus. The families are typed at 3 linked marker loci, which we shall call markers 2, 3 and 4. The pedigree data is in the file
rec2-3-4.txt
Take a look at the pedigree file. Each line gives the data for a single person. Data is ordered in columns corresponding to family, id (within family), id of father, id of mother, sex (male=1, female=2), affection status (1=unaffected, 2=affected), and genetic data (3 loci, each with 2 alleles). A zero indicates missing or unknown data.
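If you would like to check your reading of the column layout, the following Python sketch splits one line of such a file into named fields. It assumes the columns are separated by whitespace and that there are exactly 3 loci; the class name, function name and the example line are made up for illustration and are not taken from rec2-3-4.txt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Person:
    family: str
    ident: str
    father: str      # "0" means unknown (e.g. a founder)
    mother: str      # "0" means unknown (e.g. a founder)
    sex: int         # 1 = male, 2 = female
    affection: int   # 1 = unaffected, 2 = affected, 0 = unknown
    genotypes: List[Tuple[str, str]]  # one (allele, allele) pair per locus


def parse_pedigree_line(line: str, n_loci: int = 3) -> Person:
    """Split one whitespace-separated pedigree line into its fields."""
    fields = line.split()
    expected = 6 + 2 * n_loci
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, found {len(fields)}")
    alleles = fields[6:]
    genotypes = [(alleles[2 * i], alleles[2 * i + 1]) for i in range(n_loci)]
    return Person(fields[0], fields[1], fields[2], fields[3],
                  int(fields[4]), int(fields[5]), genotypes)


# Hypothetical line: family 1, person 3, father 1, mother 2, male, affected,
# genotypes 1/2 at the first locus, missing at the second, 3/3 at the third.
person = parse_pedigree_line("1 3 1 2 1 2 1 2 0 0 3 3")
print(person)
```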
To perform the analysis in Genehunter, we need an additional file called the "locus datafile":
recessive.txt
This file has a slightly cryptic structure, which we don't have time to go into in detail today. Basically it gives information about the different loci in the pedigree file (the markers and the assumed disease locus causing the disease phenotype), such as their allele frequencies, penetrances and genetic map positions on the chromosome.
Take a look at the file and find the following lines:
1 2 << AFFECTATION LOCUS, NO. OF ALLELES
0.950000 0.050000 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0.001000 0.001000 0.999000
These lines give information about the assumed disease locus. They tell the program that the frequency of the disease allele (D) is 0.05 (5%) and that the disease follows a recessive model. The recessive model is implied by the last line above, which gives the penetrances for genotypes dd, dD and DD respectively: only DD individuals have a high probability (0.999) of being affected, while dd and dD individuals are affected with probability 0.001.
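As a rough check on what these numbers imply, the Python sketch below computes the population prevalence of the disease and labels the inheritance pattern from the three penetrances. It assumes Hardy-Weinberg genotype frequencies, and the 0.5 cutoff used to call a penetrance "high" is an arbitrary choice made for illustration; neither is something Genehunter itself reports.

```python
def prevalence(q: float, f_dd: float, f_dD: float, f_DD: float) -> float:
    """Population prevalence, assuming Hardy-Weinberg genotype frequencies.

    q is the frequency of the disease allele D; f_* are the penetrances.
    """
    p = 1.0 - q
    return p * p * f_dd + 2.0 * p * q * f_dD + q * q * f_DD


def describe_model(f_dd: float, f_dD: float, f_DD: float,
                   cutoff: float = 0.5) -> str:
    """Crude label for the inheritance pattern (the cutoff is an assumption)."""
    if f_DD > cutoff and f_dD > cutoff and f_dd <= cutoff:
        return "dominant"
    if f_DD > cutoff and f_dD <= cutoff and f_dd <= cutoff:
        return "recessive"
    return "other"


# Values taken from the lines of recessive.txt shown above.
print(describe_model(0.001, 0.001, 0.999))
print(round(prevalence(0.05, 0.001, 0.001, 0.999), 6))
```

With these values the disease would affect about 0.35% of the population, most of them DD homozygotes.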
Make a new directory (folder) in your home space and save the above files in it. You will also need to save a copy of the Genehunter program and an additional Cygwin library in the same directory:
gh.exe
cygwin1.dll
To start with, you will need to open an MS-DOS window (click on Start, then Run, then type cmd). Once the window has opened, type dir to see all the files and directories (folders) in your home space, then move into the directory where you saved the data files, e.g. by typing
cd xxxxx
(where xxxxx is replaced by the name of the appropriate folder).
Type dir again to check the required files are available in the directory.
1. Start up Genehunter by typing gh
2. Read in the locus data by typing
load markers recessive.txt
3. Set the analysis to parametric LOD score analysis by typing analysis lod
4. Set the program to calculate likelihoods out to 26 cM (recombination fraction approx. 0.2) on either side of markers 2 and 4, by typing off end 26
5. Read in the pedigree data and analyse each family by typing scan pedigrees rec2-3-4.txt
6. Activate graph drawing capability by typing postscript on
7. Keep a log of the results that you are about to obtain by typing photo results2-3-4.txt
8. Total the results for all families by typing total stat
Suggested names for the output files are lod2-3-4.ps and info2-3-4.ps
9. Type q to get out of Genehunter
Take a look at the output files you have just created. Is there evidence for linkage between a disease locus and the markers in this region? Where is the most likely position of the disease locus?
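If you want to read the peak off a LOD score curve numerically rather than by eye, a sketch along the following lines may help. The (position, LOD) pairs below are placeholder values invented for illustration, not Genehunter output; you would replace them with the values recorded in your own log file. The threshold of 3 is the conventional cutoff for declaring significant evidence of linkage.

```python
from typing import List, Tuple


def best_position(curve: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (position in cM, LOD score) pair with the highest LOD score."""
    return max(curve, key=lambda point: point[1])


# Placeholder (position, LOD) pairs; substitute the values from your own results.
curve = [(-26.0, -0.8), (0.0, 0.4), (5.0, 1.1), (10.0, 0.9), (36.0, -0.5)]

position, lod = best_position(curve)
print(f"maximum LOD {lod} at {position} cM")
print("significant evidence for linkage" if lod >= 3.0 else "LOD below 3")
```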
The advantage of using Genehunter is that it can calculate
likelihoods even when there are a large number of markers typed,
as long as the families are not too large.
Repeat your Genehunter analysis using the pedigree file
recmulti-ped.txt
and locus datafile
recmulti-loc.txt.
Suggested output file names are lodmulti.ps and infomulti.ps. This data is for the same four families, but the marker map has now been extended so that the families are typed at 10 linked markers (named markers 1-10), as opposed to just markers 2-4.
Compare your LOD score plots to those obtained with the previous analysis using markers 2-4 only. How has using the full 10-marker map improved your results?
Repeat your Genehunter analysis of the 10 markers using the locus datafile dommulti-loc.txt, which incorrectly assumes that the disease model is dominant. (Take a look at the file and see if you can find the line that tells the program that the disease model is dominant.)
Suggested output file names are loddom.ps and infodom.ps. How has assuming a dominant mode of inheritance altered your results?
Genehunter documentation is available here: ghdocumentation.ps