Non-Parametric Linkage Analysis using the Genehunter program

Introduction and data overview

In today's practical we will be analysing a data set that was generated as part of a genome-wide screen for Multiple Sclerosis (MS). For the first analysis we will use a data set consisting of 37 nuclear families, each with a single affected sib pair (ASP). The parents are typed and fully informative at the locus of interest (HLA-DQ). The pedigree data are in the LINKAGE-format pedigree file MS-ped.txt, and the associated locus data file is MS-loc.txt. The locus data file includes parameters for an assumed disease model. These are ignored when we perform non-parametric (model-free) analysis in Genehunter, but would be used if we performed parametric (model-based) analysis.

Step-by-step instructions

1. Save the data files above in the folder where you saved the Genehunter program.

2. Open an MS-DOS (Command Prompt) window (click Start, then Run, and type cmd). Once the window has opened, type dir to list all the files and directories (folders) in your home space, then move into the directory where you saved the data files, e.g. by typing

cd xxxxx

(where xxxxx is replaced by the name of the appropriate folder).

Type dir again to check the required files are available in the directory.

3. Start up Genehunter by typing gh

4. Read in the locus data by typing load markers MS-loc.txt

5. Set the analysis to non-parametric (model-free) by typing analysis npl

6. Read in the pedigree data and analyse each family individually by typing scan pedigrees MS-ped.txt

7. Combine the results over all families by typing total stat

You should see the following four numbers on the screen:

Column 1: position (in cM) of the locus being analysed
Column 2: NPL score (here equal to 1.627)
Column 3: p value estimated by Genehunter itself (here equal to 0.04)
Column 4: 'information', an estimate of the available IBD information as a proportion of the maximum possible (here equal to 1.0, as the parents are all fully informative)

8. To perform the likelihood ratio tests with 'possible triangle' restrictions, type estimate and, when the program prompts you, answer n (a suggested output file name is mls-out.txt)

9. Output the IBD sharing estimates for each ASP by typing dump ibd (a suggested output file name is ibd.txt)

10. Exit Genehunter by typing q, then examine the output files, which should now be in your directory. The file mls-out.txt should give estimates of the IBD sharing parameters z0, z1 and z2, together with the MLS test statistic (labelled 'loglike'), which here equals 0.738, corresponding to a p value of approximately 0.05.
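As a check on what Genehunter reports, both the NPL score from total stat and the MLS ('loglike') in mls-out.txt can be reproduced by hand from the IBD counts for this data set (4, 22 and 11 pairs sharing 0, 1 and 2 alleles IBD). The Python sketch below assumes complete IBD information, which holds here because the parents are fully informative. It standardises each pair's IBD count by its null mean (1) and variance (1/2). For an affected sib pair, Genehunter's default S_all statistic is a linear function of the IBD count, so this gives the same standardised score. The MLS is found by a simple grid search over Holmans' possible triangle (z1 <= 1/2, 2*z0 <= z1), not by whatever optimiser Genehunter uses internally. The helper names npl_score, lod and mls_possible_triangle are hypothetical and not part of Genehunter.

```python
import math

# Null (no linkage) probabilities of sharing 0, 1, 2 alleles IBD for full sibs.
PRIOR = (0.25, 0.5, 0.25)


def npl_score(counts):
    """NPL score for affected sib pairs with complete IBD information.

    counts[k] is the number of pairs sharing k alleles IBD. Under the null
    each pair's IBD count has mean 1 and variance 1/2, so each pair is
    standardised as (k - 1) / sqrt(1/2); the pair scores are then summed
    and divided by sqrt(number of pairs).
    """
    n_pairs = sum(counts)
    total = sum(c * (k - 1) / math.sqrt(0.5) for k, c in enumerate(counts))
    return total / math.sqrt(n_pairs)


def lod(z, counts):
    """log10 likelihood ratio of IBD sharing probabilities z against PRIOR."""
    total = 0.0
    for zk, pk, c in zip(z, PRIOR, counts):
        if c == 0:
            continue
        if zk <= 0:
            return float("-inf")  # zero likelihood for pairs actually observed
        total += c * math.log10(zk / pk)
    return total


def mls_possible_triangle(counts, step=0.001):
    """Maximum LOD score over the possible triangle, by grid search.

    The triangle is z1 <= 1/2 and 2*z0 <= z1, with z2 = 1 - z0 - z1.
    Integer grid indices make sure the edge 2*z0 == z1 is included exactly.
    """
    best, best_z = 0.0, PRIOR  # the null point itself has LOD 0
    n_steps = int(round(0.5 / step))
    for i in range(n_steps + 1):      # z1 = i * step runs from 0 to 1/2
        for j in range(i // 2 + 1):   # z0 = j * step satisfies 2*z0 <= z1
            z0, z1 = j * step, i * step
            z = (z0, z1, 1.0 - z0 - z1)
            value = lod(z, counts)
            if value > best:
                best, best_z = value, z
    return best, best_z


counts = (4, 22, 11)  # pairs sharing 0, 1, 2 alleles IBD in this data set
print(f"NPL score: {npl_score(counts):.3f}")
mls, z_hat = mls_possible_triangle(counts)
print(f"MLS: {mls:.3f} at z0={z_hat[0]:.3f}, z1={z_hat[1]:.3f}, z2={z_hat[2]:.3f}")
```

With these counts the sketch gives an NPL score of about 1.627 and an MLS of about 0.738, in line with the values quoted in the instructions. The restricted maximum lies on the z1 = 1/2 edge of the triangle because the unrestricted estimate of z1 (22/37) exceeds 1/2.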

The data set here consists of 37 sib pairs, of which 4 share 0 alleles IBD, 22 share 1 allele IBD and 11 share 2 alleles IBD. Is this consistent with the output file ibd.txt? (Hint: you may need to open the file in Excel and look at the prior and posterior IBD sharing probabilities for the different types of relative pair in your data set. The posterior IBD sharing probabilities are given in the last 3 columns, and the prior IBD sharing probabilities in the 3 columns before that.)
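Instead of using Excel, the check can also be scripted. The Python sketch below sums the posterior IBD probabilities over all sib pairs, which gives the expected number of pairs sharing 0, 1 and 2 alleles IBD. It assumes, following the hint, that the last six whitespace-separated fields on each data line of ibd.txt are the prior P(IBD = 0, 1, 2) followed by the posterior P(IBD = 0, 1, 2). It also assumes that sib pairs can be recognised by their prior (1/4, 1/2, 1/4), so that other relative pairs are skipped. The exact layout of ibd.txt has not been verified here, and the function name expected_ibd_counts is hypothetical.

```python
# Full-sib prior P(IBD = 0, 1, 2); used to pick out the sib pairs.
SIB_PRIOR = (0.25, 0.5, 0.25)


def expected_ibd_counts(lines, prior=SIB_PRIOR, tol=1e-6):
    """Sum posterior IBD probabilities over pairs whose prior matches `prior`.

    ASSUMPTION: on each data line the last six whitespace-separated fields
    are the prior P(IBD=0,1,2) followed by the posterior P(IBD=0,1,2).
    Lines that do not end in six numbers (e.g. headers) are skipped.
    Returns (number of matching pairs, [expected count sharing 0, 1, 2]).
    """
    totals = [0.0, 0.0, 0.0]
    n_pairs = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            values = [float(x) for x in fields[-6:]]
        except ValueError:
            continue  # header or other non-numeric line
        pair_prior, posterior = values[:3], values[3:]
        if any(abs(a - b) > tol for a, b in zip(pair_prior, prior)):
            continue  # a different type of relative pair
        for k in range(3):
            totals[k] += posterior[k]
        n_pairs += 1
    return n_pairs, totals
```

It can be called on the open file, e.g. expected_ibd_counts(open("ibd.txt")). If the layout assumption holds and the parents are fully informative, each sib-pair posterior should be 0 or 1, so the totals should be close to 4, 22 and 11.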

Multipoint Analysis

For the second analysis we shall analyse the full HLA data from the MS genome screen. This consists of nuclear families of different sizes (mainly ASPs and trios) typed at 20 markers in the HLA region. The parents need not be fully informative: they may be typed but uninformative, they may be untyped, or we may be performing multipoint inference at positions between markers (in which case the parents cannot be fully informative, since nobody is typed at positions between markers!). The pedigree data are in the file MSfull-ped.txt and the locus data file is MSfull-loc.txt.
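Because the parents are no longer always fully informative, the 'information' column may now fall below 1.0, especially at positions between markers. To illustrate the idea, the Python sketch below computes a simple entropy-based information measure, 1 - H(posterior)/H(prior), for the IBD distribution of a single sib pair. This is a hypothetical simplification for teaching purposes. Genehunter's own information content is, to our understanding, also entropy-based, but it is computed over inheritance vectors for the whole pedigree, so its values will not match this sketch exactly.

```python
import math

# Null probabilities of sharing 0, 1, 2 alleles IBD for full sibs.
SIB_PRIOR = (0.25, 0.5, 0.25)


def entropy(probs):
    """Shannon entropy in bits, using the convention 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(posterior, prior=SIB_PRIOR):
    """Entropy-based information: 1 - H(posterior) / H(prior).

    1.0 means the IBD state is known exactly; 0.0 means the marker data
    left the prior unchanged.
    """
    return 1.0 - entropy(posterior) / entropy(prior)


def mean_information(posteriors, prior=SIB_PRIOR):
    """Average information content over a list of pair posteriors."""
    return sum(information_content(p, prior) for p in posteriors) / len(posteriors)
```

For example, information_content((0.0, 1.0, 0.0)) is 1.0 (IBD state known, as with the fully informative parents in the first analysis). information_content((0.5, 0.5, 0.0)) is 1/3, because the posterior entropy (1 bit) is two thirds of the prior entropy (1.5 bits).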

Step-by-step instructions

Genehunter documentation

Genehunter documentation is available here: ghdocumentation.ps