In this exercise you will explore the use of PLINK for simple data management, for example to prepare files for subsequent association analysis.
PLINK has an extensive set of documentation, including a PDF manual, a web-based tutorial and web-based documentation:
http://zzz.bwh.harvard.edu/plink/
First download the following files (along with the PLINK program) into an appropriate folder:
chr10-merlin-full-pedfile.txt
chr10-plinkmap.txt
fivesnps.txt
The data consist of genotypes at 1601 SNPs from a 30 Mb region on chromosome 10, genotyped in 320 families consisting mostly of affected sib pairs and their parents.
Take a look at the data files (e.g. using the more command from the MSDOS window, or by opening them in WordPad) and check you understand how they are coded. Note that the first file is a standard PLINK-format pedigree file. The second data file is a PLINK-format map file. The PLINK map format consists of exactly 4 columns:
chromosome (1-22, X, Y or 0 if unplaced)
rs number or SNP identifier
genetic distance (in Morgans or cM, can be set to 0 when performing association analysis)
base-pair position (bp units)
The final data file is just a list of the names of the first 5 SNPs in the map file. Check that these have been correctly listed.
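As a rough illustration of the four map columns described above, the following Python sketch splits a map line into its fields. It also works out how many columns a matching pedigree line should contain, assuming the standard layout of six leading columns followed by two allele columns per SNP. The function names and the SNP identifiers in the sample are invented for illustration and are not taken from the exercise files.

```python
def parse_map_line(line):
    """Split one PLINK map line into its four whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, found {len(fields)}")
    chrom, snp_id, genetic_distance, bp_position = fields
    return {
        "chrom": chrom,                       # 1-22, X, Y or 0
        "snp": snp_id,                        # rs number or other identifier
        "distance": float(genetic_distance),  # may simply be 0
        "bp": int(bp_position),               # base-pair position
    }


def expected_ped_columns(n_snps):
    """Six leading columns plus two allele columns per SNP (assumed layout)."""
    return 6 + 2 * n_snps


# Made-up map lines, for illustration only.
sample_map = ["10 snp_a 0 1000", "10 snp_b 0 2500"]
records = [parse_map_line(line) for line in sample_map]
```

Under that assumption, a pedigree line for the 1601 SNPs in this exercise would contain expected_ped_columns(1601) columns.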
To start with, you will need to open up an MSDOS window. [To do this, click on Start (the round button on the bottom left), All Programs, Accessories, then click on Command Prompt].
Once the window has opened, type dir to see all the files and directories (folders) that are in your home space, and move into the directory where you saved the data files, e.g. by typing
cd xxxxx
(where xxxxx is replaced by the name of the appropriate folder).
Type dir again to check the required files are available in the directory.
PLINK can be used to generate subsets of data. For example, suppose you wanted to create a smaller data set containing just the first 5 SNPs. You could do this by reading in the (PLINK-format) pedigree and map files (using the --ped and --map commands), extracting the SNPs of interest (using the --extract command, which takes a file listing the SNP names), and writing out a new pedigree and map file using the --recode and --out commands. (The --out command allows you to choose the file name stem for the new files; without this command the new files are automatically called "plink.ped" and "plink.map".)
To implement all this, type:
plink --noweb --ped chr10-merlin-full-pedfile.txt --map chr10-plinkmap.txt --extract fivesnps.txt --recode --out just5snps
It is worth reading the messages that PLINK outputs to the screen, to check what PLINK has done. Note that these output messages are also saved to a file just5snps.log.
You should have created two new files: just5snps.ped and just5snps.map. Take a look at these (e.g. using the commands more just5snps.map and more just5snps.ped, hitting the space bar to scroll through) and check you understand how they are coded.
Note that PLINK often recodes unknown disease status to "-9" rather than "0".
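To make the effect of --extract concrete, here is a minimal Python sketch of the same idea: keep only the map rows whose SNP name is in a list, and keep the matching pairs of allele columns from each pedigree row. This is an illustration of the logic only, not PLINK's own implementation. The function name and the toy data are hypothetical, and the sketch assumes six leading pedigree columns followed by two allele columns per SNP.

```python
def extract_snps(map_rows, ped_rows, keep_names):
    """Keep the named SNPs in the map rows and their allele columns in the ped rows."""
    keep = set(keep_names)
    # Positions (0-based) of the SNPs to keep, in map-file order.
    kept_idx = [i for i, row in enumerate(map_rows) if row[1] in keep]
    new_map = [map_rows[i] for i in kept_idx]
    new_ped = []
    for row in ped_rows:
        new_row = list(row[:6])  # family, individual, father, mother, sex, status
        for i in kept_idx:
            # SNP i occupies columns 6+2i and 7+2i (0-based) of the pedigree row.
            new_row.extend(row[6 + 2 * i: 8 + 2 * i])
        new_ped.append(new_row)
    return new_map, new_ped


# Toy data with made-up SNP names, for illustration only.
toy_map = [
    ["10", "snp_a", "0", "1000"],
    ["10", "snp_b", "0", "2000"],
    ["10", "snp_c", "0", "3000"],
]
toy_ped = [["F1", "1", "0", "0", "1", "2", "A", "A", "C", "T", "G", "G"]]

small_map, small_ped = extract_snps(toy_map, toy_ped, ["snp_a", "snp_c"])
```

In the toy example, small_map keeps two of the three SNPs and small_ped keeps the six leading columns plus the four allele columns belonging to those two SNPs.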
We can also generate subsets of people. Let's do this using the files just5snps.ped and just5snps.map as a
starting point. Since these files both have the same stem ("just5snps") followed by the extensions ".ped" and ".map", we can
read them in to PLINK together using the --file command.
To output just (unrelated) founders from the pedigrees, you can use the following command:
plink --noweb --file just5snps --filter-founders --recode --out justfounders
Take a look at the files you have created (justfounders.ped and justfounders.map) and check you understand
how they differ from just5snps.ped and just5snps.map.
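As an aid to interpreting the difference, the Python sketch below applies one plausible definition of a founder to a toy pedigree: an individual whose father and mother (the third and fourth pedigree columns) are both coded 0. This definition and the helper name are assumptions made for illustration; check the PLINK log and documentation for the rule PLINK actually applies.

```python
def is_founder(ped_row):
    """Treat an individual as a founder if both parents are coded '0' (assumed rule)."""
    father_id, mother_id = ped_row[2], ped_row[3]
    return father_id == "0" and mother_id == "0"


# Toy nuclear family: two parents and an affected sib pair (first six columns only).
toy_family = [
    ["F1", "1", "0", "0", "1", "1"],  # father
    ["F1", "2", "0", "0", "2", "1"],  # mother
    ["F1", "3", "1", "2", "1", "2"],  # affected sib
    ["F1", "4", "1", "2", "2", "2"],  # affected sib
]

founders = [row for row in toy_family if is_founder(row)]
```

Here only the two parents remain in founders, which mirrors what you should see when comparing the number of lines in justfounders.ped and just5snps.ped.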
To output just affected individuals (cases) from the pedigrees, you can use the following command:
plink --noweb --file just5snps --filter-cases --recode --out justcases
Take a look at the files you have created (justcases.ped and justcases.map) and check you understand
how they differ from just5snps.ped and just5snps.map.
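The selection of cases relies on the disease status in the sixth pedigree column, where 2 denotes affected, 1 unaffected, and 0 or -9 unknown. The Python sketch below tallies the status codes in a toy pedigree and keeps only the affected individuals. The names used are hypothetical and the data are invented; it is meant only to illustrate the coding, not to reproduce PLINK's behaviour exactly.

```python
from collections import Counter

# Disease status codes in the sixth pedigree column.
STATUS_LABELS = {"1": "unaffected", "2": "affected", "0": "missing", "-9": "missing"}


def status_label(code):
    """Translate a status code into a label; anything unrecognised is 'other'."""
    return STATUS_LABELS.get(code, "other")


# Toy pedigree rows (first six columns only), for illustration.
toy_rows = [
    ["F1", "1", "0", "0", "1", "-9"],  # father, status unknown
    ["F1", "2", "0", "0", "2", "1"],   # mother, unaffected
    ["F1", "3", "1", "2", "1", "2"],   # affected sib
    ["F1", "4", "1", "2", "2", "2"],   # affected sib
]

status_counts = Counter(status_label(row[5]) for row in toy_rows)
cases = [row for row in toy_rows if row[5] == "2"]
```

Comparing status_counts with the length of cases gives a quick check of how many individuals a case-only subset should contain.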
Although PLINK can read in and write out standard pedigree files, it is usually more convenient to read in and write out files in PLINK's special binary format, which will take up less disk space and be quicker to read into PLINK when performing various subsequent analyses. This can be done using the --make-bed command. For example, to save the "justcases" data in binary format, type:
plink --noweb --file justcases --make-bed --out binarycases
This should create 3 new files: binarycases.bed, binarycases.bim and binarycases.fam. You will not be able to read the file binarycases.bed as it is not human readable.
The file binarycases.bim is a map file with two extra columns of information giving the possible alleles at each locus. You can take a look at this by typing more binarycases.bim.
The file binarycases.fam gives the pedigree structure in a format that is compatible with the binary genotype file. You can take a look at this by typing more binarycases.fam. Note that this file is the same as the first six columns of the original pedigree file justcases.ped.
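The relationship between the pedigree file and the binary files can be sketched in Python as follows. The first helper takes the first six columns of each pedigree row, which is the content described above for the .fam file. The second estimates the size of the .bed file, assuming the default SNP-major layout in which three header bytes are followed by one block per SNP holding four genotypes per byte. Both helper names are hypothetical, and the size formula is offered as an approximation for understanding why the binary format is compact rather than as a specification.

```python
import math


def fam_rows_from_ped(ped_rows):
    """Return the first six columns of each pedigree row (the .fam content)."""
    return [row[:6] for row in ped_rows]


def bed_size_bytes(n_individuals, n_snps):
    """Approximate .bed size: 3 header bytes plus ceil(N/4) bytes per SNP (assumed layout)."""
    return 3 + n_snps * math.ceil(n_individuals / 4)


# Toy pedigree rows with two SNPs each, for illustration only.
toy_ped = [
    ["F1", "3", "1", "2", "1", "2", "A", "A", "C", "T"],
    ["F1", "4", "1", "2", "2", "2", "A", "G", "C", "C"],
]

toy_fam = fam_rows_from_ped(toy_ped)
toy_bed_size = bed_size_bytes(n_individuals=len(toy_ped), n_snps=2)
```

You could compare the value returned by bed_size_bytes, using the numbers of individuals and SNPs reported in the PLINK log, with the size of binarycases.bed shown by dir.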