The 1000 Genomes data are phased. How do we extract these phased data into a usable format? Note that the methods below are not guaranteed to remove all variants that are not bi-allelic SNPs, so the output may need to be run through another script.
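One possible "other script" for that extra filtering step is a short awk filter like the sketch below. The function name `filter_biallelic_snps` is hypothetical. The sketch assumes a standard tab-separated VCF with REF in column 4 and ALT in column 5, and it treats a record as a bi-allelic SNP only when both columns are a single A, C, G or T.

```shell
# Hypothetical helper: keep only bi-allelic SNPs from a VCF read on stdin.
# Header lines (starting with '#') are passed through unchanged.
# A record is kept only if REF ($4) and ALT ($5) are each a single A/C/G/T,
# so multi-allelic ALTs ("G,T"), indels ("AT") and symbolic alleles
# ("<CN0>") are all dropped.
filter_biallelic_snps() {
  awk 'BEGIN { FS = "\t" }
       /^#/ { print; next }
       $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ { print }'
}
```

Usage, with hypothetical file names: `filter_biallelic_snps < input.vcf > snps_only.vcf`. If vcftools is already in the pipeline, its `--remove-indels`, `--min-alleles 2` and `--max-alleles 2` options are an alternative way to do the same kind of filtering.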
Tabix, vcftools and my own R script
The 1000_genomes_script on my own site will do this.
On the 1000 Genomes web site
http://browser.1000genomes.org/Homo_sapiens/UserData/Haploview
Tabix, vcftools and plink
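If we want to get phased data for a region of a single chromosome (here, positions 1,000,000 to 2,000,000 on chromosome 1), we can chain tabix, vcftools and plink: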
# use tabix to extract the correct section of the file. Here I consider
# chromosome 1. Change the file name for other chromosomes.
# -h keeps the VCF header lines; -f forces an existing index to be overwritten.
tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:1000000-2000000 > region1.vcf

# vcftools converts the VCF to plink transposed (tped) format
vcftools --vcf region1.vcf --plink-tped --out 1000G_region1

# plink converts this file into a form for Haploview
plink --noweb --tfile 1000G_region1 --recodeHV --out 1000G_region1.HV
Based on snippet here.
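The comment in the snippet above says to change the file name for other chromosomes. A minimal sketch of wrapping that up is below. Both function names (`kg_phase3_url`, `extract_region`) are hypothetical, and the URL builder ASSUMES every autosome follows the same file-name pattern as the chr1 file in the snippet; the sex chromosomes are not covered by that assumption, and file versions in the release directory may have changed, so check the FTP listing first. The wrapper only reuses the tabix, vcftools and plink options from the original commands.

```shell
# Hypothetical helper: build the phase 3 URL for an autosome (1-22),
# ASSUMING the same file-name pattern as the chr1 file in the snippet.
kg_phase3_url() {
  local chr="$1"
  local base="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502"
  printf '%s/ALL.chr%s.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz\n' "$base" "$chr"
}

# Hypothetical wrapper: run the tabix -> vcftools -> plink steps for one
# region, with the same options as the original commands.
# Usage: extract_region CHR START END
extract_region() {
  local chr="$1" start="$2" end="$3"
  local prefix="1000G_chr${chr}_${start}_${end}"
  # Extract the region (with header) from the remote, tabix-indexed VCF
  tabix -fh "$(kg_phase3_url "$chr")" "${chr}:${start}-${end}" > "${prefix}.vcf" || return 1
  # Convert to plink transposed format
  vcftools --vcf "${prefix}.vcf" --plink-tped --out "$prefix" || return 1
  # Write Haploview input files
  plink --noweb --tfile "$prefix" --recodeHV --out "${prefix}.HV"
}
```

For example, `extract_region 22 16000000 17000000` would write its outputs with the prefix `1000G_chr22_16000000_17000000`. Building the prefix from the region keeps runs for different chromosomes from overwriting each other's files.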