
DOCUMENTATION FOR ONELOCARP, TWOLOCARP AND THREELOCARP
------------------------------------------------------

Programs for fitting single-locus, two-locus and three-locus
maximum likelihood statistic (MLS) models to affected relative
pair (ARP) data.

Please cite:

Cordell H.J. et al. (2000) Multilocus linkage tests
based on affected relative pairs. Am J Hum Genet 66:1273-1286

Based on the program TWOLOC for affected sib pairs:
Farrall M. (1997) Genet Epid 14:103-115.

Uses maximization routine MAXFUN originally written as part of
S.A.G.E. Please cite:

Sorant A.J.M. and Elston R.C. (1994) A subroutine package for function
maximisation (a user's guide to MAXFUN version 6.0). Part of the
S.A.G.E. documentation, Department of Epidemiology and Biostatistics,
Case Western Reserve University, Cleveland, Ohio.


Conditions of use:
 
This software is provided as a "gift" to a non-profit making research
organization; no attempts should be made to sell or patent this program
or incorporate it into another program.  The software is provided "as
is"; no warranty can be provided as to the accuracy or reliability of
the results or the fitness of the program for any particular
application. For comments and queries, please contact:

Heather J. Cordell  
Professor of Statistical Genetics 
Institute of Human Genetics  
Newcastle University
International Centre for Life 
Central Parkway  
Newcastle upon Tyne 
NE1 3BZ 
UK 

Tel: +44 (0)191 241 8669 
Fax: +44 (0)191 241 8666 

Email: heather.cordell@newcastle.ac.uk 


----------------------------------------------------------------

INTRODUCTION
------------

This software provides functionality for fitting a variety of
single and multilocus models to affected relative pair data,
as described in Cordell H.J. et al. (2000) Multilocus linkage 
tests based on affected relative pairs. Am J Hum Genet 66:1273-1286.

Please note that these programs are intended to be used by linkage
analysis experts who have a sound knowledge of running a variety
of genetic linkage programs and experience with compiling, linking and
running Pascal, Fortran and C programs on a UNIX-based computer.
At some stage a more user-friendly version may be made available,
but in the meantime it is hoped that these programs may be
useful for those requiring a tool for more sophisticated linkage
analysis.

For preparation of input files and generation of p values in 
particular, the user may find it necessary to have programming
skills in order to be able to simulate data and convert the output 
of programs such as GENEHUNTER and ALLEGRO into the input required by
these programs. For interpretation of results, a thorough understanding
of the papers referenced here is recommended.

If you have some interesting data but do not have access to anyone
with sufficient skills to run the programs described here, please
feel free to contact me to discuss the possibility of a collaborative
project.


DOWNLOADING
-----------

The software as distributed should consist of 23 files: this 
documentation file,  10 files of fortran 77 source code named 
onelocarp.f, twolocarp.f, threelocarp.f, maxfun.f, twolocarpnull.f, 
twolocarpfull.f, twolocsim.f, twolocsimnull.f, threelocarpfull.f
and threelocsimnull.f, 
and 12 example files named *prior.dat, *posterior.dat and *mls.out, 
where * takes the values testone, testtwo, smallthree and largethree. 

If you downloaded a `small' version of the package (that takes up less
space) the 3 `largethree' example files will not be included.

The programs are designed to be run on a UNIX system, but should in
principle run on any other system with a Fortran compiler (e.g. a Linux
system, using the g77 compiler instead of the f77 compiler used below).


COMPILING
---------

For most purposes you will only need the 4 Fortran files
onelocarp.f, twolocarp.f, threelocarp.f and maxfun.f.

From a unix system, assuming the files are all in the same directory,
you should be able to compile the programs from that directory
using the following commands:

f77 onelocarp.f maxfun.f -o onelocarp

f77 twolocarp.f maxfun.f -o twolocarp

f77 threelocarp.f maxfun.f -o threelocarp


***************** NOTE FOR IRIX USERS *****************

It has been reported that you may get an error when
using the f77 fortran compiler
 
"Warning: stack frame size larger than system"

This can be corrected by using the option -static e.g. 

f77 -static onelocarp.f maxfun.f -o onelocarp

etc.

*******************************************************


ONELOCARP
---------

This program calculates the single-locus maximum likelihood statistic 
(MLS) at increments across the genome. Note that if your data consists
of affected sib pairs (ASPs) only, this should give identical results
to an unweighted analysis using MAPMAKER/SIBS.


INPUT FILES - ONELOCARP
-----------------------

For onelocarp you will require 2 input files, which must be named 
oneprior.dat and oneposterior.dat. The information in these files must 
be generated using another program which will calculate prior and
posterior (given the marker data) identity by descent (IBD)
sharing probabilities for your families. Examples of programs
you might use to generate this information are GENEHUNTER version 2,
GENIBD (part of S.A.G.E.), SOLAR and ALLEGRO.

oneposterior.dat
----------------

This file consists of m*n+3 lines, where n is the number of affected
relative pairs (ARPs) for whom you have IBD information, and m is the 
number of increments across a genomic region at which you have the IBD 
information.

As currently distributed, the program onelocarp has  maximum values of 
n=2000 affected pairs and m=300 increments across a genomic region.  
If you require a version with higher limits than this, please get in touch.

Line 1: should contain an estimate of K, the population prevalence of the 
disease you are interested in. The value input on line 1 should not 
affect the final results, as long as it is within reasonable bounds.
It is therefore suggested to repeat the analysis several times using
a range of values e.g. 0.01 - 0.1 for the population prevalence, 
checking that you do indeed get the same results regardless of K.
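
The sensitivity check described above can be scripted. The following
Python sketch is not part of this distribution; the helper name
set_prevalence is made up for illustration, and it assumes that K may be
written in free format on line 1. It only rewrites line 1 of the file
contents; you would still rerun onelocarp for each value of K and compare
the resulting onemls.out files.

```python
def set_prevalence(posterior_text, k):
    """Return the contents of a posterior file with line 1 replaced by k.

    Assumes (as described above) that line 1 holds K and that every
    other line should be passed through unchanged.
    """
    lines = posterior_text.splitlines()
    if len(lines) < 3:
        raise ValueError("expected at least the 3 header lines K, n, m")
    lines[0] = " %g" % k
    return "\n".join(lines) + "\n"


# Made-up excerpt with n=1 pair and m=2 increments, for illustration only.
example = " 0.05\n 1\n 2\n 1 1 6 7 0.0 0.5 0.5\n 2 1 6 7 0.0 0.4 0.6\n"
for k in (0.01, 0.05, 0.1):
    print(set_prevalence(example, k).splitlines()[0])
```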

Line 2: should contain n, the number of affected relative pairs in the file.

Line 3: should contain m, the number of increments in the region of interest
at which the IBD probabilities have been calculated. In the context of a 
genome scan, this might correspond to an increment every cM.

Lines 4 - m+3: should contain the information for the 1st ARP in 7 columns
(no commas) as follows:

position, familyID, ID1, ID2, fpost(0), fpost(1), fpost(2)


Here  ID1 and ID2 refer to the IDs of the two members of the affected pair
being analysed, and fpost(0), fpost(1) and fpost(2) are the posterior 
probabilities that that pair share 0, 1 or 2 alleles IBD at the position 
given in the 1st column. This may correspond either to a location in cM, 
or to an increment  number e.g. 1, 2, 3 etc. Each ARP has m such lines of 
posterior IBD  information, corresponding to the m increments at which IBD 
values have been calculated.


Lines m+4 - 2m+3: contain information as above but for the 2nd ARP.

Lines 2m+4 - 3m+3: contain information as above but for the 3rd ARP.

etc. for all further ARPs.


An example of the file oneposterior.dat (named testoneposterior.dat)
is provided with this distribution.
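
Before running the program it may be worth checking that a hand-built
file follows the layout above. The Python sketch below is an illustration
only (the function name check_posterior and the rounding tolerance are
assumptions, not part of the distribution). It checks the m*n+3 line
count, the number of columns, and that the IBD probabilities on each line
sum to approximately 1.

```python
def check_posterior(text, n_probs=3, tol=0.01):
    """Check the layout of a posterior file as described above.

    n_probs is 3 for onelocarp; tol allows for rounding in the input.
    Blank lines are ignored, so reported line numbers count non-blank
    lines only. Returns (K, n, m) or raises ValueError.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    k = float(lines[0])
    n = int(lines[1])
    m = int(lines[2])
    data = lines[3:]
    if len(data) != m * n:
        raise ValueError("expected %d data lines, found %d" % (m * n, len(data)))
    for number, line in enumerate(data, start=4):
        fields = line.split()
        if len(fields) != 4 + n_probs:
            raise ValueError("line %d: expected %d columns" % (number, 4 + n_probs))
        total = sum(float(x) for x in fields[4:])
        if abs(total - 1.0) > tol:
            raise ValueError("line %d: probabilities sum to %.3f" % (number, total))
    return k, n, m
```

The same function could be applied to the two-locus and three-locus
posterior files by passing n_probs=9 or n_probs=27.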



oneprior.dat
------------

This file consists of m*n lines, corresponding to the last m*n lines in
oneposterior.dat, but with the posterior probabilities replaced by
prior probabilities f(0), f(1), f(2) e.g.

position, familyID, ID1, ID2, f(0), f(1), f(2)

An example of the file oneprior.dat (named testoneprior.dat)
is provided with this distribution.


OUTPUT FILES - ONELOCARP
------------------------

To run onelocarp, make sure that you have the two input files
correctly prepared and named oneprior.dat and oneposterior.dat.
(E.g. to run the example provided, copy the file testoneprior.dat
into a file named oneprior.dat using the command

cp testoneprior.dat oneprior.dat

and similarly copy testoneposterior.dat into a file named
oneposterior.dat)

Then type 

onelocarp

The program should run for a few seconds and produce an output file
onemls.out. Note that this will overwrite any current file in the 
directory named onemls.out, therefore you may wish to change the
name of any output file you wish to keep. An example of the output file 
that should be obtained when using the test input files provided is given 
in the file testonemls.out.

The output from onelocarp is fairly self-explanatory: single-locus
MLS values are given at increments corresponding to the positions
given in the input files. These data may be plotted using a graph-drawing or
stats package such as Splus or in a spreadsheet such as Excel. Note
that the position is given in terms of a numbered position (1,2,3 etc)
rather than a location in cM, so if you wish to plot location in cM
against MLS, you will have to copy and paste the relevant information
e.g. from a separate file which has a list of the location in cM
corresponding to each position.
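
As an alternative to copying and pasting, the numbered positions can be
mapped to cM locations with a few lines of code. The Python sketch below
is an illustration only: it assumes you have already read the
(position, MLS) pairs out of the output file yourself (see testonemls.out
for the actual layout) and that you hold the cM locations in increment
order. The name attach_cm and the values shown are made up.

```python
def attach_cm(mls_rows, cm_positions):
    """Pair each MLS value with a cM location.

    mls_rows     : list of (position_number, mls), positions numbered 1, 2, 3, ...
    cm_positions : list of cM locations, one per increment, in order
    """
    return [(cm_positions[pos - 1], mls) for pos, mls in mls_rows]


# Made-up values: three increments spaced 2 cM apart.
for cm, mls in attach_cm([(1, 0.20), (2, 1.50), (3, 0.75)], [0.0, 2.0, 4.0]):
    print(cm, mls)
```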


TWOLOCARP
---------

INPUT FILES - TWOLOCARP
-----------------------

For twolocarp you will require 2 input files, which must be named 
twoprior.dat and twoposterior.dat. The information in these files must 
be generated using another program which will calculate prior and
posterior (given the marker data) identity by descent (IBD)
sharing probabilities for your families. Examples of programs
you might use to generate this information are GENEHUNTER version 2,
GENIBD (part of S.A.G.E.), SOLAR and ALLEGRO. To my knowledge, the
only program which will generate multipoint prior and posterior IBD 
probabilities simultaneously at two linked loci, for arbitrary types 
of relative pair, is a yet-to-be released version of GENIBD. For 
affected sib pairs using singlepoint calculations only, the program
TWOLOC may be used. For unlinked loci you can use any program to generate
the IBD probabilities at the two loci separately, and multiply
these together to get the joint IBD probabilities at two loci
simultaneously e.g. P(IBD=1 at locus 1 and 0 at locus 2) = 
P(IBD=1 at locus 1)*P(IBD=0 at locus 2).
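
For unlinked loci, the multiplication described above might be sketched in
Python as follows. This is an illustration rather than part of the
distribution, and the function name joint_unlinked is made up. The nine
values are returned in the column order used by twoprior.dat and
twoposterior.dat, with the locus 1 index varying slowest.

```python
def joint_unlinked(locus1, locus2):
    """Joint IBD probabilities for two UNLINKED loci.

    locus1, locus2 : (P(IBD=0), P(IBD=1), P(IBD=2)) at each locus
    Returns 9 values ordered (0,0), (0,1), (0,2), (1,0), ..., (2,2).
    """
    return [p1 * p2 for p1 in locus1 for p2 in locus2]


# Prior IBD probabilities for a sib pair at each of two unlinked loci.
sib_prior = (0.25, 0.5, 0.25)
print(joint_unlinked(sib_prior, sib_prior))
```

The values produced for the sib-pair prior (0.0625, 0.125, 0.0625, ...)
agree, after rounding to 3 decimal places, with the first pair in the
twoprior.dat example given later in this document. This product rule must
not be used for linked loci.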


twoposterior.dat
----------------

This file consists of m*n+3 lines, where n is the number of affected
relative pairs (ARPs) for whom you have IBD information, and m is the 
number of increments across a genomic region at which you have the IBD 
information.

As currently distributed, the program twolocarp has  maximum values of 
n=2000 affected pairs and m=300 increments across a genomic region.  
If you require a version with higher limits than this, please get in touch.

Line 1: should contain an estimate of K, the population prevalence of the 
disease you are interested in. The value input on line 1 should not 
affect the final results, as long as it is within reasonable bounds.
It is therefore suggested to repeat the analysis several times using
a range of values e.g. 0.01 - 0.1 for the population prevalence, 
checking that you do indeed get the same results regardless of K.

Line 2: should contain n, the number of affected relative pairs in the file.

Line 3: should contain m, the number of increments in the region of interest
at which the IBD probabilities have been calculated. The assumption is that
you are considering a 2-locus model where locus 1 is fixed at some particular
position (e.g. the position with the highest MLS in a single-locus analysis)
and the putative 2nd locus takes positions at increments across a genomic 
region.

Lines 4 - m+3: should contain the information for the 1st ARP in 13 columns
(no commas) as follows:

position, familyID, ID1, ID2, fpost(0,0), fpost(0,1), fpost(0,2),
fpost(1,0), fpost(1,1), fpost(1,2), fpost(2,0), fpost(2,1), fpost(2,2).

Here  ID1 and ID2 refer to the IDs of the two members of the affected pair
being analysed, and fpost(i,j) is the posterior probability that that
pair share simultaneously i alleles at position (locus) 1, and j alleles 
at position (locus) 2. Here locus 1 is assumed to be at some fixed position
e.g. where there is a candidate locus or where the linkage evidence 
using onelocarp or another single-locus method is greatest. Locus 2 
is assumed to take a position at increments across a genomic region 
corresponding to the position given in the 1st column. This position
may be given either as a location in cM, or as an increment 
number e.g. 1, 2, 3 etc. Each ARP has m such lines of posterior IBD 
information, corresponding to the m increments (for locus 2) at which 
IBD values have been  calculated.


Lines m+4 - 2m+3: contain information as above but for the 2nd ARP.

Lines 2m+4 - 3m+3: contain information as above but for the 3rd ARP.

etc. for all further ARPs.

So an example of the file twoposterior.dat might be:


 0.05
 4
 5
     1     1     6     7 0.000 0.000 0.000 0.008 0.034 0.000 0.184 0.774 0.000
     2     1     6     7 0.000 0.000 0.000 0.008 0.034 0.000 0.186 0.766 0.006
     3     1     6     7 0.000 0.000 0.000 0.008 0.034 0.000 0.184 0.765 0.008
     4     1     6     7 0.000 0.000 0.000 0.008 0.034 0.000 0.179 0.770 0.008
     5     1     6     7 0.000 0.000 0.000 0.008 0.035 0.000 0.170 0.782 0.006
     1     1     8     6 0.002 0.068 0.000 0.032 0.898 0.000 0.000 0.000 0.000
     2     1     8     6 0.002 0.068 0.000 0.032 0.898 0.000 0.000 0.000 0.000
     3     1     8     6 0.002 0.068 0.000 0.029 0.901 0.000 0.000 0.000 0.000
     4     1     8     6 0.002 0.068 0.000 0.023 0.907 0.000 0.000 0.000 0.000
     5     1     8     6 0.001 0.069 0.000 0.013 0.917 0.000 0.000 0.000 0.000
     1     1     8     7 0.049 0.005 0.000 0.854 0.092 0.000 0.000 0.000 0.000
     2     1     8     7 0.050 0.004 0.000 0.869 0.077 0.000 0.000 0.000 0.000
     3     1     8     7 0.051 0.003 0.000 0.886 0.060 0.000 0.000 0.000 0.000
     4     1     8     7 0.052 0.002 0.000 0.904 0.042 0.000 0.000 0.000 0.000
     5     1     8     7 0.053 0.001 0.000 0.924 0.022 0.000 0.000 0.000 0.000
     1     2     9    11 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
     2     2     9    11 0.997 0.003 0.000 0.000 0.000 0.000 0.000 0.000 0.000
     3     2     9    11 0.995 0.005 0.000 0.000 0.000 0.000 0.000 0.000 0.000
     4     2     9    11 0.995 0.005 0.000 0.000 0.000 0.000 0.000 0.000 0.000
     5     2     9    11 0.997 0.003 0.000 0.000 0.000 0.000 0.000 0.000 0.000


Another example of the file twoposterior.dat (named testtwoposterior.dat)
is provided with this distribution.
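
One way to sanity-check a two-locus posterior file is to collapse each
line to its single-locus marginals. Because locus 1 is held fixed, one
would generally expect the locus 1 marginal for a given pair to stay the
same (up to rounding) at every increment, while the locus 2 marginal
changes. The Python sketch below is an illustration only, and the
function name marginals is made up. It uses the first two data lines of
the example above.

```python
def marginals(line):
    """Split one 13-column twoposterior.dat line into marginal IBD probabilities.

    Returns (locus1, locus2), each a list [P(IBD=0), P(IBD=1), P(IBD=2)].
    """
    fields = line.split()
    probs = [float(x) for x in fields[4:13]]
    grid = [probs[0:3], probs[3:6], probs[6:9]]   # grid[i][j] = fpost(i,j)
    locus1 = [sum(row) for row in grid]
    locus2 = [sum(grid[i][j] for i in range(3)) for j in range(3)]
    return locus1, locus2


first = "1 1 6 7 0.000 0.000 0.000 0.008 0.034 0.000 0.184 0.774 0.000"
second = "2 1 6 7 0.000 0.000 0.000 0.008 0.034 0.000 0.186 0.766 0.006"
for line in (first, second):
    l1, l2 = marginals(line)
    print([round(p, 3) for p in l1], [round(p, 3) for p in l2])
```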


twoprior.dat
------------

This file consists of m*n lines, corresponding to the last m*n lines in
twoposterior.dat, but with the posterior probabilities replaced by
prior probabilities f(i,j) e.g.

position, familyID, ID1, ID2, f(0,0), f(0,1), f(0,2),
f(1,0), f(1,1), f(1,2), f(2,0), f(2,1), f(2,2)

So an example of twoprior.dat might be:


     1     1     6     7 0.063 0.125 0.063 0.125 0.250 0.125 0.063 0.125 0.063
     2     1     6     7 0.063 0.125 0.063 0.125 0.250 0.125 0.063 0.125 0.063
     3     1     6     7 0.063 0.125 0.063 0.125 0.250 0.125 0.063 0.125 0.063
     4     1     6     7 0.063 0.125 0.063 0.125 0.250 0.125 0.063 0.125 0.063
     5     1     6     7 0.063 0.125 0.063 0.125 0.250 0.125 0.063 0.125 0.063
     1     1     8     6 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     2     1     8     6 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     3     1     8     6 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     4     1     8     6 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     5     1     8     6 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     1     1     8     7 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     2     1     8     7 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     3     1     8     7 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     4     1     8     7 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     5     1     8     7 0.250 0.250 0.000 0.250 0.250 0.000 0.000 0.000 0.000
     1     2     9    11 0.563 0.188 0.000 0.188 0.063 0.000 0.000 0.000 0.000
     2     2     9    11 0.563 0.188 0.000 0.188 0.063 0.000 0.000 0.000 0.000
     3     2     9    11 0.563 0.188 0.000 0.188 0.063 0.000 0.000 0.000 0.000
     4     2     9    11 0.563 0.188 0.000 0.188 0.063 0.000 0.000 0.000 0.000
     5     2     9    11 0.563 0.188 0.000 0.188 0.063 0.000 0.000 0.000 0.000


Another example of the file twoprior.dat (named testtwoprior.dat)
is provided with this distribution.


NOTE ON CALCULATION OF IBD PROBABILITIES - LINKED LOCI
------------------------------------------------------
 
Note that for unlinked loci, the joint IBD probabilities at two or
more loci may be calculated as the product of the single-locus IBD 
probabilities at the relevant loci. The IBD data in twoprior.dat and
twoposterior.dat, threeprior.dat and threeposterior.dat may therefore
be calculated using the output from any program (GENEHUNTER, GENIBD, 
SOLAR, ALLEGRO etc.) which can output prior and posterior IBD 
probabilities at a single location. 

For linked loci, the joint IBD sharing probabilities at the two loci 
are NOT equal to the product of the IBD sharing probabilities at the
individual loci, and so a program is required that will calculate
the simultaneous probability of a pair sharing i alleles at locus 1,
and j alleles at locus 2. To my knowledge, the only programs currently
able to do this are TWOLOC (Farrall M. (1997) Genet Epid 14:103-115)
which is for affected sib pairs and uses singlepoint calculations only,
and a never-released version of GENIBD (part of S.A.G.E.), which 
works for varying types of relative pairs and uses multipoint calculations.

I have a test version of software (TWOLINK) for calculating
joint IBD sharing probabilities at two linked loci for affected sib pairs
only using the MERLIN program (Abecasis et al. 2002).
The TWOLINK software is under development and not well documented,
but if you are interested in trying this out, please contact me at
heather.cordell@ncl.ac.uk. A similar package has also been developed by
jordana@well.ox.ac.uk, so please contact her if you wish to try out
her software.



OUTPUT FILES - TWOLOCARP
------------------------

To run twolocarp, make sure that you have the two input files
correctly prepared and named twoprior.dat and twoposterior.dat.
(E.g. to run the example provided, copy the file testtwoprior.dat
into a file named twoprior.dat using the command

cp testtwoprior.dat twoprior.dat

and similarly copy testtwoposterior.dat into a file named
twoposterior.dat)

Then type 

twolocarp

The program should run for a few minutes and produce an output file
twomls.out. Note that this will overwrite any current file in the 
directory named twomls.out, therefore you may wish to change the
name of any output file you wish to keep. An example of the output file 
that should be obtained when using the test input files provided is given 
in the file testtwomls.out.

Results are given for a variety of two-locus models, each fitted
at increments across the genome (corresponding to the increments
provided in the input files). Final results are output in a form which
assumes the primary aim is to test the null hypothesis that locus
2 is not involved in disease. Differences in MLS results may also 
be used to test the fit of nested hypotheses e.g. testing the fit
of an additive as opposed to a general epistatic model for the action
of loci 1 and 2. See Cordell et al. (2000) Am J Hum Genet 66:1273-1286,
Farrall (1997) Genet Epid 14:103-115, Cordell et al. (1995) Am J Hum
Genet 57:920-934, and the documentation for TWOLOC, for more details
about interpretation and testing of different model hypotheses.

Models fitted:

1. Locus 1 on its own with dominance variance

This corresponds to a single-locus model for the effect of locus 1.
Since locus 1 is fixed (while locus 2 takes values at increments
across a genomic region) the MLS for locus 1 should be constant
across all increments, and should equal the single-locus MLS
obtained using onelocarp at locus 1.

2. Locus 2 on its own with dominance variance

This corresponds to a single-locus model for the effect of locus 2.
This should equal the single-locus MLS obtained using onelocarp
at these increments across the genomic region.


3. multiplicative model

This corresponds to a test of a multiplicative model for the
action of locus 1 and 2, against the null hypothesis that
neither of the loci is involved in disease.

4. additive model

This corresponds to a test of an additive model for the
action of locus 1 and 2, against the null hypothesis that
neither of the loci is involved in disease.

5. general model

This corresponds to a test of a general epistatic model for the
action of locus 1 and 2, against the null hypothesis that
neither of the loci is involved in disease.

6. FINAL MLS RESULTS FOR LOCUS 2 TO BE PLOTTED e.g. IN EXCEL

The most useful results are output at the end of the file,
in a form which may be plotted using a graph-drawing or
stats package such as Splus or in a spreadsheet such as Excel. 
The columns given in this section are as follows:

Position: 
--------- 

Note that the position is given in terms of a numbered
position (1,2,3 etc) rather than a location in cM, so if you wish
to plot location in cM against MLS, you will have to copy and paste
the relevant information e.g. from a separate file which has a list
of the location in cM corresponding to each position.

Single: 
------- 

This gives the single locus MLS for the effect at locus 2.

Multiplicative: 
---------------

This gives the conditional MLS at locus 2, taking account of any effect
at locus 1, and assuming a multiplicative model for the joint action
of loci 1 and 2.

If loci 1 and 2 are unlinked, the multiplicative conditional MLS
for locus 2 will be identical to the single locus MLS for locus 2.
If loci 1 and 2 are linked, the multiplicative conditional MLS provides
a test of the effect of locus 2 taking account of (i.e. not inflated by)
the effect at linked locus 1.


Additive: 
---------

This gives the conditional MLS at locus 2, taking account of any effect 
at locus 1, and assuming an additive model for the joint action of locus 
1 and 2.

General: 
--------

This gives the conditional MLS at locus 2, taking account of any effect 
at locus 1, and allowing for arbitrary epistasis between locus 1 and 2.


INTERPRETATION OF OUTPUT - TWOLOCARP
------------------------------------

While the shape of the MLS curves produced is in itself informative,
if significance levels (p values) are required, they must in general 
be generated for the particular data set analysed using simulation.
Examples of programs which may be used to generate replicate data
sets using simulation include ALLEGRO (version 1.2c), Merlin, SLINK 
and SIMLINK. Please see  Cordell et al. (2000) Am J Hum Genet 
66:1273-1286 for a detailed discussion concerning the generation of 
significance levels for these models using simulation, and 
interpretation of the results.
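
Once replicate data sets have been simulated and analysed, an empirical
p value can be computed by comparing the observed statistic with its
simulated distribution. The Python sketch below is a generic illustration
and not necessarily the procedure used in Cordell et al. (2000). The
function name empirical_p, the (r+1)/(N+1) convention and the values shown
are all assumptions. The statistic compared (e.g. the conditional MLS at
one position, or its maximum over a region) must be the same for the
observed and the simulated data.

```python
def empirical_p(observed, simulated):
    """Empirical p value: share of replicates at least as large as observed.

    Uses the (r + 1) / (N + 1) convention so the estimate is never zero.
    """
    r = sum(1 for s in simulated if s >= observed)
    return (r + 1.0) / (len(simulated) + 1.0)


# Made-up values: observed MLS of 2.0 and four simulated replicates.
print(empirical_p(2.0, [0.1, 0.5, 2.5, 1.0]))
```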



ADDITIONAL TWOLOC PROGRAMS
--------------------------

TWOLOCARPNULL
-------------

This is a version of TWOLOCARP that outputs the final results to be
plotted (e.g. in Excel) with respect to the null hypothesis that
neither locus is involved in disease, rather than with respect to
the null hypothesis that locus 1 (but not locus 2) is involved. It
can be compiled in the same way as TWOLOCARP, by using the Fortran
file twolocarpnull.f instead of twolocarp.f.


TWOLOCARPFULL
-------------

This is a version of TWOLOCARP that outputs more details of the
final variance component parameter estimates, which may be useful for
examination of the best fitting models. It can be compiled in the 
same way as TWOLOCARP, by using the fortran file twolocarpfull.f 
instead of twolocarp.f


TWOLOCSIM
---------

This program is a version of the TWOLOCARP program with much of the
redundant output removed, which makes it more convenient for including
in a simulation study via some kind of shell script. It also sends
its output straight to the screen but you can redirect or append it to
a file by using > or >>  which may be more convenient for simulations.
It is recommended that you test out TWOLOCSIM to check that you get the
same results as  TWOLOCARP (and that you understand which results
are being output by TWOLOCSIM) in a few examples, before using it
in a simulation study. TWOLOCSIM can be compiled in the same way as 
TWOLOCARP, and takes the same input files.
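
A simulation loop of the kind mentioned above could also be driven from
Python instead of a shell script. The sketch below is an illustration
only. The function name run_replicates and the naming of the replicate
files are assumptions, and it relies on TWOLOCSIM reading twoprior.dat and
twoposterior.dat from the current directory and writing its results to the
screen, as described above.

```python
import shutil
import subprocess


def run_replicates(replicates, command, out_path):
    """Run `command` once per replicate and append its screen output to out_path.

    replicates : list of (prior_file, posterior_file) pairs, one per
                 simulated data set (the file names are up to you)
    command    : the program to run, e.g. ["./twolocsim"]
    """
    with open(out_path, "a") as out:
        for prior_file, posterior_file in replicates:
            # The program reads fixed input names, so copy each replicate in.
            shutil.copyfile(prior_file, "twoprior.dat")
            shutil.copyfile(posterior_file, "twoposterior.dat")
            result = subprocess.run(command, stdout=subprocess.PIPE,
                                    universal_newlines=True, check=True)
            out.write(result.stdout)
```

For example, run_replicates([("rep1prior.dat", "rep1posterior.dat")],
["./twolocsim"], "sim.out") would analyse one (hypothetically named)
replicate and append the results to sim.out.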


TWOLOCSIMNULL
-------------

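This program is a version of the TWOLOCSIM program that outputs final
results with respect to the null hypothesis that neither locus is involved
in disease, rather than with respect to the null hypothesis that locus 1
(but not locus 2) is involved (compare TWOLOCARPNULL above).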




THREELOCARP
-----------

INPUT FILES - THREELOCARP
-------------------------

For threelocarp you will also require 2 input files, named threeprior.dat 
and threeposterior.dat. The information in these files must be
generated using another program which will calculate prior and
posterior (given the marker data) identity by descent (IBD)
sharing probabilities for your families. Examples of programs
you might use to generate this information are GENEHUNTER version 2,
GENIBD (part of S.A.G.E.), SOLAR and ALLEGRO.

threeposterior.dat
------------------

This file consists of m*n+3 lines, where n is the number of affected
relative pairs (ARPs) for whom you have IBD information, and m is the 
number of increments across a genomic region at which you have the IBD 
information.

As currently distributed, the program threelocarp has  maximum values of 
n=900 affected pairs and m=300 increments across a genomic region.  
If you require a version with higher limits than this, please get in touch.

Line 1: should contain an estimate of K, the population prevalence of the 
disease you are interested in. The value input on line 1 should not 
affect the final results, as long as it is within reasonable bounds.
It is therefore suggested to repeat the analysis several times using
a range of values e.g. 0.01 - 0.1 for the population prevalence, 
checking that you do indeed get the same results regardless of K.

Line 2: should contain n, the number of affected relative pairs in the file.

Line 3: should contain m, the number of increments in the region of interest 
at which the IBD probabilities have been calculated. The assumption is that
you are considering a 3-locus model where loci 1 and 2 are fixed at some 
particular positions (e.g. those with the highest MLS in a single-locus and in
a two-locus analysis) and the putative 3rd locus takes positions at increments 
across a genomic region.


Lines 4 - m+3: should contain the information for the 1st ARP in 31 columns
(no commas) as follows:

position, familyID, ID1, ID2, fpost(0,0,0), fpost(0,0,1), fpost(0,0,2),
fpost(0,1,0), fpost(0,1,1), fpost(0,1,2), fpost(0,2,0), fpost(0,2,1),
fpost(0,2,2), fpost(1,0,0), fpost(1,0,1), fpost(1,0,2), fpost(1,1,0),
fpost(1,1,1), fpost(1,1,2), fpost(1,2,0), fpost(1,2,1), fpost(1,2,2),
fpost(2,0,0), fpost(2,0,1), fpost(2,0,2), fpost(2,1,0), fpost(2,1,1),
fpost(2,1,2), fpost(2,2,0), fpost(2,2,1), fpost(2,2,2)

Here  ID1 and ID2 refer to the IDs of the two members of the affected pair
being analysed, and fpost(i,j,k) is the posterior probability that that
pair share simultaneously i alleles at position (locus) 1, j alleles 
at position (locus) 2 and k alleles at position (locus) 3. Here loci 1 
and 2 are assumed to be at some fixed positions and locus 3 
is assumed to take a position at increments across a genomic region 
corresponding to the position given in the 1st column. This position
may be given either as a location in cM, or as an increment 
number e.g. 1, 2, 3 etc.  Each ARP has m such lines of posterior IBD  
information, corresponding to the m increments (for locus 3) at which IBD 
values have been calculated.


Lines m+4 - 2m+3: contain information as above but for the 2nd ARP.

Lines 2m+4 - 3m+3: contain information as above but for the 3rd ARP.

etc. for all further ARPs.

Two examples of the file threeposterior.dat (named largethreeposterior.dat
and smallthreeposterior.dat) are provided with this distribution.
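
With 31 columns it is easy to write the probabilities in the wrong order.
The Python sketch below (an illustration, not part of the distribution;
the function name column_names is made up) generates the column headings
in the order listed above, with the last locus index varying fastest, so
that they can be compared against the output of whatever conversion
script you use.

```python
from itertools import product


def column_names(n_loci, prefix="fpost"):
    """Column headings for a posterior file with n_loci loci.

    The last locus index varies fastest, matching the order listed above.
    """
    ids = ["position", "familyID", "ID1", "ID2"]
    probs = ["%s(%s)" % (prefix, ",".join(str(i) for i in combo))
             for combo in product(range(3), repeat=n_loci)]
    return ids + probs


names = column_names(3)
print(len(names))                         # 31
print(names[4], names[5], names[-1])      # fpost(0,0,0) fpost(0,0,1) fpost(2,2,2)
```

Calling column_names(2) or column_names(1) gives the 13-column and
7-column layouts used by twolocarp and onelocarp.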


threeprior.dat
--------------

This file consists of m*n lines, corresponding to the last m*n lines in
threeposterior.dat, but with the posterior probabilities replaced by
prior probabilities f(i,j,k) e.g. each line is:


position, familyID, ID1, ID2, f(0,0,0), f(0,0,1), f(0,0,2), 
f(0,1,0), f(0,1,1) f(0,1,2), f(0,2,0), f(0,2,1), f(0,2,2), 
f(1,0,0), f(1,0,1) f(1,0,2), f(1,1,0), f(1,1,1) f(1,1,2), 
f(1,2,0), f(1,2,1), f(1,2,2), f(2,0,0), f(2,0,1) f(2,0,2),
f(2,1,0), f(2,1,1) f(2,1,2), f(2,2,0), f(2,2,1), f(2,2,2)

Two examples of the file threeprior.dat (named largethreeprior.dat
and smallthreeprior.dat) are provided with this distribution.



OUTPUT FILES - THREELOCARP
--------------------------

To run threelocarp, make sure that you have the two input files
correctly prepared and named threeprior.dat and threeposterior.dat.
(E.g. to run the 1st example provided, copy the file smallthreeprior.dat
into a file named threeprior.dat using the command

cp smallthreeprior.dat threeprior.dat

and similarly copy smallthreeposterior.dat into a file named
threeposterior.dat

For the 2nd example, use the files largethreeprior.dat and 
largethreeposterior.dat instead of smallthreeprior.dat and 
smallthreeposterior.dat - but be warned, these files may take
several hours to analyse )


Then type 

threelocarp

The program should run for a few minutes and produce an output file
threemls.out. Note that this will overwrite any current file in the 
directory named threemls.out, therefore you may wish to change the
name of any output file you wish to keep. Examples of the output files
that should be obtained when using the test input files provided are given
in the files smallthreemls.out and largethreemls.out.

Results are given for a variety of three-locus models, each fitted
at increments across the genome (corresponding to the increments
provided in the input files). Results are output in a form which
assumes the primary aim is to test the null hypothesis that locus
3 is not involved in disease. See Cordell et al. (2000) Am J Hum 
Genet 66:1273-1286 for more details about interpretation and testing 
of different model hypotheses.

Models fitted:

1. two locus multiplicative model for the effect of locus 1 and 2 
on their own.

2. two locus additive model for the effect of locus 1 and 2 
on their own.

3. two locus general epistatic model for the effect of locus 1 and 2 
on their own.

4. three locus multiplicative model: This gives the MLS at increments
(for locus 3) for testing a 3 locus model of disease against the null
hypothesis that none of the 3 loci are involved in disease, assuming
that the 3 loci act multiplicatively.

5. three locus additive model: This gives the MLS at increments
(for locus 3) for testing a 3 locus model of disease against the null
hypothesis that none of the 3 loci are involved in disease, assuming
that the 3 loci act additively.

6. three locus general model: This gives the MLS at increments
(for locus 3) for testing a 3 locus model of disease against the null
hypothesis that none of the 3 loci are involved in disease, assuming
that the 3 loci act in a general model with arbitrary epistasis.

7. two locus general model plus multiplicative locus 3: This gives the 
MLS at increments (for locus 3) for testing a 3 locus model of disease 
against the null hypothesis that none of the 3 loci are involved in disease, 
assuming that loci 1 and 2 act together in a general model with
arbitrary epistasis, and locus 3 acts multiplicatively.

8. two locus general model plus additive locus 3: This gives the
MLS at increments (for locus 3) for testing a 3 locus model of disease 
against the null hypothesis that none of the 3 loci are involved in disease, 
assuming that loci 1 and 2 act together in a general model with
arbitrary epistasis, and locus 3 acts additively.

9. FINAL MLS RESULTS

This summarises the results for models 1-8 at each position.
Note that the position is given in terms of a numbered
position (1,2,3 etc) rather than a location in cM, so if you wish
to plot location in cM against MLS, you will have to copy and paste
the relevant information e.g. from a separate file which has a list
of the location in cM corresponding to each position.

10. NESTED MLS RESULTS TO BE PLOTTED e.g. IN EXCEL

The most useful results are output at the end of the file,
in a form which may be plotted using a graph-drawing or
stats package such as Splus or in a spreadsheet such as Excel. 
The columns (denoted pos 3mu-2mu 3ad-2ad 3ge-2ge 2mu3-2ge 2ad3-2ge)
correspond to position followed by the following nested models
subtracted from each other: 4-1, 5-2, 6-3, 7-3, 8-3. For example,
3ge-2ge (6-3) corresponds to testing the overall effect at locus 3
(including arbitrary epistasis) conditional on, i.e. accounting for,
any effects at loci 1 and 2.
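
The nested differences can be reproduced by hand from the MLS values for
models 1-8 at a given position, which may be a useful check on your
reading of the output. The Python sketch below is an illustration only;
the function name nested_mls and the MLS values shown are made up.

```python
def nested_mls(m):
    """Nested MLS differences at one position.

    m : dict mapping model number (1-8, as listed above) to its MLS.
    Returns the five differences in the column order
    3mu-2mu, 3ad-2ad, 3ge-2ge, 2mu3-2ge, 2ad3-2ge.
    """
    pairs = [(4, 1), (5, 2), (6, 3), (7, 3), (8, 3)]
    return [m[full] - m[reduced] for full, reduced in pairs]


# Made-up MLS values for models 1-8 at a single position.
mls = {1: 1.0, 2: 1.5, 3: 2.0, 4: 1.5, 5: 2.5, 6: 4.0, 7: 3.0, 8: 3.5}
print(nested_mls(mls))
```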


INTERPRETATION OF OUTPUT - THREELOCARP
--------------------------------------

While the shape of the MLS curves produced is in itself informative,
if significance levels (p values) are required, they must again 
be generated for the particular data set analysed using simulation.
Examples of programs which may be used to generate replicate data
sets using simulation include SLINK and SIMLINK. Please see Cordell et al.
(2000) Am J Hum Genet 66:1273-1286 for a detailed discussion 
concerning the generation of significance levels for these models
using simulation, and interpretation of the results.


ADDITIONAL THREELOC PROGRAMS
----------------------------

THREELOCARPFULL
---------------

This is a version of THREELOCARP that outputs more details of the
final variance component parameter estimates, which may be useful for
examination of the best fitting models. It can be compiled in the 
same way as THREELOCARP, by using the fortran file threelocarpfull.f 
instead of threelocarp.f


THREELOCSIMNULL
---------------

This is a version of THREELOCARP with much of the redundant
output removed, which makes it more convenient for including
in a simulation study via some kind of shell script. It also sends
its output straight to the screen but you can redirect or append it to
a file by using > or >>  which may be more convenient for simulations.
It is recommended that you test out THREELOCSIMNULL to check that you 
get the same results as  THREELOCARP (and that you understand which 
results are being output - the results for models 1-8) in a few 
examples, before using it in a simulation study. 
THREELOCSIMNULL can be compiled in the same way as THREELOCARP, and 
takes the same input files.



REFERENCES:
-----------

Abecasis GR, Cherny SS, Cookson WO and Cardon LR (2002). Merlin  -
rapid analysis of dense genetic maps using sparse gene flow trees. 
Nat Genet 30:97-101. 

Cordell H.J. et al (1995) Two-locus maximum lod score analysis of a 
multifactorial trait: joint consideration of IDDM2 and IDDM4  with IDDM1  
in type 1 diabetes. Am J Hum Genet 57:920-934.

Cordell H.J. et al. (2000) Multilocus linkage tests based on affected 
relative pairs. Am J Hum Genet 66:1273-1286

Farrall M. (1997) Affected sibpair linkage tests for
multiple linked susceptibility genes. Genet Epid 14:103-115.

Sorant A.J.M. and Elston R.C. (1994) A subroutine package for function
maximisation (a user's guide to MAXFUN version 6.0). Part of the
S.A.G.E. documentation, Department of Epidemiology and Biostatistics,
Case Western Reserve University, Cleveland, Ohio.








