%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Yu Zhang

This package implements the methods described in two papers:

[1] Zhang Y. (2011) A novel Bayesian graphical model for genome-wide multi-SNP association mapping. Genet Epi. Nov 29. doi: 10.1002/gepi.20661. [Epub ahead of print] (This paper tests common variants.)
[2] Zhang Y, Ghosh S, Hakondarson H. (2014) Dynamic Bayesian testing of sets of variants in complex diseases. Genetics. doi: 10.1534/genetics.114.167403. (This paper further tests rare variants.)

The program uses graphs to account for linkage disequilibrium (LD) among SNPs, so that the identified SNPs and SNP-SNP interactions are directly associated with the disease rather than reflecting LD effects.

The program also constructs a disease graph. Each node in the disease graph contains one or more SNPs; the SNPs in a node jointly affect the disease, independently of SNPs outside the node. Each edge in the disease graph indicates an "interaction" between two nodes, i.e., the two sets of SNPs in the connected nodes are jointly associated with the disease: their joint contribution to disease risk is stronger than their individual marginal contributions. This should not be confused with interaction versus main effects in a regression model, where an interaction is the additional effect on top of the main effects. In other words, even if an edge is present between two nodes, the SNPs within each node may still have main effects.

This version of the software supports only case-control studies. A version supporting QTL studies will be released later. In the meantime, if you have quantitative traits, you can partition the trait values into two bins and treat the data as cases and controls.
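The binning of a quantitative trait suggested above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of BEAM3: the cut point (the sample median, with values strictly above it becoming cases) is our own choice, and the helper names dichotomize, status_line, and snp_line are hypothetical. The last two helpers write lines in the layout described in section 2 (Input Data Format), with missing genotypes coded as 3.

```python
import statistics


def dichotomize(traits, threshold=None):
    """Map quantitative trait values to 1 (case) / 0 (control).

    Assumption: values strictly above the threshold become cases, and the
    default threshold is the sample median. BEAM3 does not prescribe this.
    """
    if threshold is None:
        threshold = statistics.median(traits)
    return [1 if t > threshold else 0 for t in traits]


def status_line(statuses):
    """First input line: 'ID Chr Position' followed by one status per individual."""
    return "ID Chr Position " + " ".join(str(s) for s in statuses)


def snp_line(snp_id, chrom, position, genotypes):
    """One SNP line: ID, chromosome, position, then dosages (None -> 3 = missing)."""
    coded = []
    for g in genotypes:
        if g is None:
            coded.append("3")  # 3 denotes a missing genotype
        elif g in (0, 1, 2):
            coded.append(str(g))  # number of alternative alleles
        else:
            raise ValueError(f"unexpected genotype value: {g!r}")
    return " ".join([snp_id, chrom, str(position)] + coded)


if __name__ == "__main__":
    traits = [2.1, 5.4, 3.3, 7.8, 1.0, 6.0]
    print(status_line(dichotomize(traits)))
    print(snp_line("rs1021", "chr5", 110123456, [1, 0, None, 0, 2, 2]))
```

For example, the traits [2.1, 5.4, 3.3, 7.8, 1.0, 6.0] have median 4.35 and dichotomize to [0, 1, 0, 1, 0, 1]. The statuses must follow the same column order as the genotypes. Values tied at the median all fall into the control bin, so a heavily tied trait may give unbalanced groups; pass an explicit threshold in that case.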
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This package contains the following files:
    BEAM3
    README.txt
    toydata.txt

[1. Installation]-----------------------------------------
Unzip all files into one folder. BEAM3 uses the GNU Scientific Library (GSL), which is built into the executable. Run the program from the command line by typing

    ./BEAM3 [inputfile] -o [outname] [options]

[2. Input Data Format]-----------------------------------------
The input file contains the case-control genotype data. Genotypes should be coded as allele dosages: 0, 1, or 2 denotes the number of alternative alleles, and 3 denotes missing data. BEAM3 imputes missing genotypes with a simple rule of thumb, at each SNP independently; you may instead impute them with other imputation software before running BEAM3.

The first line of the input file contains the disease status of each individual: use 1 for patients (cases) and 0 for controls. Start the first line with "ID Chr Position ". For example, to denote 3 cases and 3 controls, use the following first line:

    ID Chr Position 1 1 1 0 0 0

Starting from the second line, each line contains the genotype data of one SNP, separated by spaces. Again, start each line with the SNP ID, chromosome, and position. For example:

    rs1021 chr5 110123456 1 0 3 0 2 2

This line specifies a SNP called "rs1021" at chr5:110123456, with genotypes 1, 0, and missing in the 3 cases, and 0, 2, 2 in the 3 controls. Use numerical chromosome labels, e.g., chr23 for chrX.

Each column in the input file denotes one individual, so the disease status given in the first line must match the corresponding column of genotypes in the remaining lines. SNPs should be sorted by physical position. See the included "toydata.txt" for an example input file.

[3. Options]-----------------------------------------
The program has 3 modes, corresponding to different probability functions used to evaluate associations:

"-model 0" (default)
    The original BEAM3 (2011) method for common SNPs.
"-model 1"
    BEAM3 (2014) for testing rare variants (and common variants). This mode uses a Gaussian density function to model the multivariate distribution of multiple variants.
"-model 2"
    Similar to -model 0, but allows genotypes to take more than 3 categorical values.

The following options apply to all 3 modes:

"-filter k"
    Filters out SNPs with too many missing genotypes (3% threshold), unbalanced missingness between cases and controls, or violations of Hardy-Weinberg equilibrium (HWE). If this option is used, the user must specify k: k=0 if the heterozygote is coded as 2, and k=1 if the heterozygote is coded as 1.
"-sample burnin mcmc"
    Specifies the numbers of burn-in and sampling iterations. By default, burnin=mcmc=100. In each iteration the program updates all variables once, so these numbers need not scale with the number of SNPs. A few hundred iterations are enough in most cases.
"-prior p"
    Specifies the prior probability that each SNP is associated with the disease. By default, p=5/L, i.e., 5 associated SNPs are expected out of L SNPs.
"-T t"
    Starts the MCMC at a high temperature t, which drops gradually to 1 over the iterations. This helps the program escape local modes in the first few iterations.

When running the program with "-model 1" to test rare variants, the following options apply:

"-group a b"
    Groups SNPs with MAF <= b together as "one variable" for joint testing, and tests SNPs with MAF >= a individually without grouping. Typically a