1000 Genomes to Haploview/PED format

The 1000 Genomes data are phased. How do we extract them into a usable format? Note that the methods below are not guaranteed to remove all variants that are not bi-allelic SNPs, so the output may need to be run through another script.
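A minimal sketch of such a follow-up filter is below. It is an assumption, not part of any of the methods described here, and `keep_biallelic_snps` is a hypothetical name. It uses awk to keep the VCF header and only those records whose REF and ALT are each a single base, so it can be run on the region1.vcf file from the tabix step before conversion. It assumes upper-case bases and that a multi-allelic site lists its ALT alleles comma-separated on one line. If a multi-allelic site has been split over several lines, each line would still pass this filter.

```shell
#!/bin/sh
# Hypothetical helper: keep only bi-allelic SNP records from a VCF file.
# Header lines (starting with "#") are printed unchanged.
# A record is kept only when REF (column 4) and ALT (column 5) are each a
# single upper-case base A, C, G or T (assumption: upper-case bases).
# Multi-allelic ALTs such as "G,T", indels such as "AT", symbolic alleles
# such as "<CN2>" and missing ALTs (".") do not match and are dropped.
keep_biallelic_snps() {
    awk -F '\t' '
        /^#/ { print; next }
        $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ { print }
    ' "$1"
}
```

For example, `keep_biallelic_snps region1.vcf > region1.snps.vcf` writes the filtered file, which can then be passed to vcftools in place of region1.vcf.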

Tabix, vcftools and my own R script

The 1000_genomes_script on my own site will do this

On the 1000 Genomes web site

http://browser.1000genomes.org/Homo_sapiens/UserData/Haploview

Tabix, vcftools and plink

If we want to get phased data for a specific region, e.g. chromosome 1 between positions 1,000,000 and 2,000,000:

# Use tabix to extract the region of interest (with the VCF header, -h)
# from the remote file. Here I use chromosome 1; change the file name
# and the region for other chromosomes.
tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:1000000-2000000 > region1.vcf
# vcftools converts the VCF to PLINK transposed format (.tped and .tfam)
vcftools --vcf region1.vcf --plink-tped --out 1000G_region1
# plink recodes the .tped/.tfam files for Haploview (.ped and .info files)
plink --noweb --tfile 1000G_region1 --recodeHV --out 1000G_region1.HV
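Changing the file name by hand for each chromosome is error-prone. A minimal sketch that runs the same three commands with the chromosome and region as arguments is below; the function names `kg_phase3_url` and `region_to_haploview` and the output prefix are hypothetical. It assumes autosomes 1-22 only (the sex chromosomes are not handled) and reuses the v5a file name from the tabix command above, which may need updating if the release files change.

```shell
#!/bin/sh
# Hypothetical wrapper around the tabix -> vcftools -> plink steps.
# Assumption: the v5a autosome file name is copied from the command above.
KG_BASE="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502"

# Print the phase 3 VCF URL for an autosome; fail for anything other than 1-22.
kg_phase3_url() {
    case "$1" in
        [1-9]|1[0-9]|2[0-2]) ;;
        *) echo "kg_phase3_url: expected an autosome 1-22, got '$1'" >&2
           return 1 ;;
    esac
    echo "${KG_BASE}/ALL.chr$1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
}

# Extract chrom:start-end and convert it for Haploview.
# Output files use the (hypothetical) prefix 1000G_chr<chrom>_<start>_<end>.
# Note: plain sh functions share variables with the caller (no "local").
region_to_haploview() {
    chrom=$1; start=$2; end=$3
    url=$(kg_phase3_url "$chrom") || return 1
    prefix="1000G_chr${chrom}_${start}_${end}"

    # Same commands as above, parameterised by region and prefix.
    tabix -fh "$url" "${chrom}:${start}-${end}" > "${prefix}.vcf" || return 1
    vcftools --vcf "${prefix}.vcf" --plink-tped --out "$prefix" || return 1
    plink --noweb --tfile "$prefix" --recodeHV --out "${prefix}.HV"
}
```

For example, `region_to_haploview 1 1000000 2000000` repeats the example above for chromosome 1, but writes files with the prefix 1000G_chr1_1000000_2000000 instead of 1000G_region1.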

Based on a snippet here.