In this exercise you will explore the use of PLINK for simple data management, for example to prepare files for subsequent association analysis.
PLINK has an extensive set of documentation including a pdf manual, a web-based tutorial and web-based documentation:
 
 http://zzz.bwh.harvard.edu/plink/ 
First download the following files (along with the PLINK program) into an appropriate folder:
chr10-merlin-full-pedfile.txt
chr10-plinkmap.txt
fivesnps.txt
The data consist of genotypes at 1601 SNPs from a 30 Mb region on chromosome 10, genotyped in 320 families consisting mostly of affected sib pairs and their parents.
Take a look at the data files (e.g. using the  more  command from the MSDOS window, or by opening them in WordPad) and check you understand how they are coded. Note that the first file is a standard PLINK-format pedigree file. The second data file is a PLINK-format map file. The PLINK map format consists of exactly 4 columns:
     chromosome (1-22, X, Y or 0 if unplaced) 
     rs number or snp identifier 
     Genetic distance (in Morgans or cM, can be set to 0 when performing association analysis) 
     Base-pair position (bp units) 
The final data file is just a list of the names of the first 5 SNPs in the map file. Check that these 
have been correctly listed.
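
If you prefer to do this check with a script rather than by eye, the following is a minimal Python sketch. It is not part of PLINK: the function names read_map_ids and listed_first are made up for illustration, and the three map lines are invented example data, not the real contents of chr10-plinkmap.txt.

```python
def read_map_ids(map_lines):
    """Return the SNP identifiers (column 2) from PLINK-format map lines."""
    ids = []
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError("expected 4 columns, got %d: %r" % (len(fields), line))
        ids.append(fields[1])
    return ids


def listed_first(map_lines, snp_list_lines):
    """True if the listed SNP names are exactly the first SNPs in the map, in order."""
    map_ids = read_map_ids(map_lines)
    listed = [ln.strip() for ln in snp_list_lines if ln.strip()]
    return listed == map_ids[:len(listed)]


# Invented example data: chromosome, SNP name, genetic distance, bp position
example_map = [
    "10 snp_a 0 1000",
    "10 snp_b 0 2500",
    "10 snp_c 0 4000",
]
print(listed_first(example_map, ["snp_a", "snp_b"]))  # True
print(listed_first(example_map, ["snp_a", "snp_c"]))  # False

# With the real files you could pass open file handles instead, e.g.
# listed_first(open("chr10-plinkmap.txt"), open("fivesnps.txt"))
```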
	
	
 To start with, you will need to open up an MSDOS window. [To do this, click on Start (the round button on the bottom left), All Programs, Accessories, then click on Command Prompt].
 
Once the window has opened, type  dir  to see all the files and directories
(folders) that are in your home space, and  move into the directory where you saved the data files e.g. by typing
 
 cd xxxxx  
 
(where  xxxxx  is replaced by the name of the appropriate folder).
 
Type  dir  again to check the required files are available in the directory.
 
PLINK can be used to generate subsets of data. For example, suppose you wanted to create a smaller data 
set containing just the first 5 SNPs. You could do this by reading in the (PLINK-format) pedigree and 
map files (using the  --ped  and  --map  commands), extracting the SNPs of interest 
(using the  --extract  command), and writing out a new pedigree and map file using the  
--recode  and  --out  commands. (The  --out  command allows you to choose the 
file name for the new files; without this command the new files are automatically called "plink.ped" 
and "plink.map"). 
 
 To implement all this, type:
 
 plink --noweb --ped chr10-merlin-full-pedfile.txt --map chr10-plinkmap.txt --extract fivesnps.txt --recode --out just5snps 
 
It is worth reading the messages that PLINK outputs to the screen, to check what PLINK has done. Note that these output 
messages are also saved to a file  just5snps.log.
 
You should have created two new files:  just5snps.ped  and just5snps.map. Take a look at these (e.g. using 
the  commands  more just5snps.map  and  more just5snps.ped , hitting the space bar to scroll through) and 
check you understand how they are coded.
Note that PLINK often recodes unknown disease status to "-9" rather than "0".
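
To help with understanding the coding, here is a minimal Python sketch of what extracting SNPs from a pedigree line amounts to. It assumes the standard PLINK pedigree layout (six leading columns: family ID, individual ID, father ID, mother ID, sex, disease status, followed by two allele columns per SNP). The function name extract_snps and the example line are invented for illustration; this is not how PLINK itself is implemented.

```python
def extract_snps(ped_line, map_ids, keep_ids):
    """Keep the six leading columns plus the allele pairs of the wanted SNPs."""
    fields = ped_line.split()
    header, alleles = fields[:6], fields[6:]
    if len(alleles) != 2 * len(map_ids):
        raise ValueError("expected two allele columns per SNP in the map")
    keep = set(keep_ids)
    out = list(header)
    for i, snp in enumerate(map_ids):
        if snp in keep:
            # alleles for SNP number i sit in columns 2*i and 2*i + 1
            out.extend(alleles[2 * i:2 * i + 2])
    return " ".join(out)


# Invented example: one person genotyped at three SNPs
map_ids = ["snp_a", "snp_b", "snp_c"]
ped_line = "F1 1 0 0 1 2 A A C G T T"
print(extract_snps(ped_line, map_ids, ["snp_a", "snp_c"]))
# prints: F1 1 0 0 1 2 A A T T
```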
 
We can also generate subsets of people. Let's do this using the files  just5snps.ped  and just5snps.map as a 
starting point. Since these files both have the same stem ("just5snps") followed by the extensions ".ped" and ".map", we can 
read them in to PLINK together using  the   --file  command. 
 
 
To output just (unrelated) founders from the pedigrees, you can use the following command:
 
 plink --noweb --file just5snps --filter-founders --recode --out justfounders
 
Take a look at the files you have created (justfounders.ped  and justfounders.map) and check you understand 
how they differ from  just5snps.ped  and just5snps.map.
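
As a rough guide to what to look for, the sketch below picks out founders from some invented pedigree lines. It assumes that a founder is a person whose father ID and mother ID (columns 3 and 4) are both coded "0"; the function name is_founder and the example family are made up for illustration.

```python
def is_founder(ped_line):
    """True if both parental IDs (columns 3 and 4) are coded as 0."""
    fields = ped_line.split()
    return fields[2] == "0" and fields[3] == "0"


# Invented example family: two parents and two affected children, one SNP
example_ped = [
    "F1 1 0 0 1 1 A A",
    "F1 2 0 0 2 1 A G",
    "F1 3 1 2 1 2 A G",
    "F1 4 1 2 2 2 A A",
]
founders = [line for line in example_ped if is_founder(line)]
for line in founders:
    print(line)
# prints the two parents (individuals 1 and 2) only
```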
 
 To output just affected individuals (cases) from the pedigrees, you can use the following command:
 
 plink --noweb --file just5snps --filter-cases --recode --out justcases
 
Take a look at the files you have created (justcases.ped  and justcases.map) and check you understand 
how they differ from  just5snps.ped  and just5snps.map.
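
One way to compare the files is to count the disease status codes in each. The following minimal Python sketch assumes the usual PLINK coding in column 6 (2 = affected, 1 = unaffected, 0 or -9 = unknown). The function names and example lines are invented for illustration.

```python
from collections import Counter


def is_case(ped_line):
    """True if disease status (column 6) is coded 2, i.e. affected."""
    return ped_line.split()[5] == "2"


def status_counts(ped_lines):
    """Count how often each disease status code occurs."""
    return Counter(line.split()[5] for line in ped_lines if line.strip())


# Invented example: two unaffected parents, two affected children, one unknown
example_ped = [
    "F1 1 0 0 1 1 A A",
    "F1 2 0 0 2 1 A G",
    "F1 3 1 2 1 2 A G",
    "F1 4 1 2 2 2 A A",
    "F2 1 0 0 1 -9 G G",
]
cases = [line for line in example_ped if is_case(line)]
print(len(cases))                  # 2
print(status_counts(example_ped))  # two "1", two "2", one "-9"
```

With the real files you could pass an open file handle, e.g. status_counts(open("justcases.ped")), which should then show only the code "2".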
 
Although PLINK can read in and write out standard pedigree files, it is usually more convenient
to read in and write out files in PLINK's special binary format, which will take up less disk space
and be quicker to read into PLINK when performing various subsequent analyses. This can be done using the 
 --make-bed  command. For example, to save 
the "justcases" data in binary format, type:
 
 plink --noweb --file justcases --make-bed --out binarycases
 
This should create 3 new files:  binarycases.bed,   binarycases.bim,   binarycases.fam.
You will not be able to read the file  binarycases.bed  as it is not human readable. 
The file  binarycases.bim  is a map file with two extra columns of information giving the possible alleles at each locus.
You can take a look at this by typing  more binarycases.bim. The file  binarycases.fam  gives the pedigree
structure in a format that is compatible with the binary genotype file.
You can take a look at this by typing  more binarycases.fam. Note that this file is the same as the first six columns
of the original pedigree file  justcases.ped .
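
If you would like to confirm this relationship with a script, the following minimal Python sketch compares the first six columns of each pedigree line with the corresponding line of a .fam file. The function name fam_matches_ped and the example lines are invented for illustration, and the comparison ignores differences in spacing between columns.

```python
def fam_matches_ped(ped_lines, fam_lines):
    """True if each .fam row equals the first six columns of the matching .ped row."""
    ped_heads = [ln.split()[:6] for ln in ped_lines if ln.strip()]
    fam_rows = [ln.split() for ln in fam_lines if ln.strip()]
    return ped_heads == fam_rows


# Invented example: two affected sibs with genotypes at two SNPs
example_ped = [
    "F1 3 1 2 1 2 A G C C",
    "F1 4 1 2 2 2 A A C T",
]
example_fam = [
    "F1 3 1 2 1 2",
    "F1 4 1 2 2 2",
]
print(fam_matches_ped(example_ped, example_fam))  # True

# With the real files you could pass open file handles instead, e.g.
# fam_matches_ped(open("justcases.ped"), open("binarycases.fam"))
```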