9 Linear Regression

If the phenotype of the individuals is quantitative rather than case-control status, CASSI can test for epistasis using linear regression. Two linear regression models are fitted, one with an interaction term and one without, and the two models are compared with an F-test based on their residual sums of squares. As with logistic regression, covariates can be incorporated into the analysis. When covariates are included, linear regression is faster than logistic regression (see section 10.3) and may be useful as a screening step (see section 10.1).
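The model comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CASSI's implementation: genotypes are assumed to be coded 0/1/2, the interaction term is assumed to be the product of the two genotype codes, no covariates are included, and the data are simulated. The function names (solve, rss, interaction_f_stat) are hypothetical. Under these assumptions the statistic has 1 and n - 4 degrees of freedom, and the p-value would be the upper tail of the corresponding F distribution.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design rows."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def interaction_f_stat(g1, g2, y):
    """F statistic comparing the models with and without the g1*g2 term."""
    reduced = [[1.0, a, b] for a, b in zip(g1, g2)]       # no interaction
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]   # with interaction
    rss0 = rss(reduced, y)
    rss1 = rss(full, y)
    df_resid = len(y) - 4  # four parameters in the full model (assumed, no covariates)
    return (rss0 - rss1) / (rss1 / df_resid)


# Simulated data (purely illustrative): one phenotype without and one with epistasis.
random.seed(1)
n = 500
g1 = [random.choice([0, 1, 2]) for _ in range(n)]
g2 = [random.choice([0, 1, 2]) for _ in range(n)]
y_null = [0.3 * a + 0.2 * b + random.gauss(0, 1) for a, b in zip(g1, g2)]
y_epi = [0.3 * a + 0.2 * b + 1.0 * a * b + random.gauss(0, 1) for a, b in zip(g1, g2)]

f_null = interaction_f_stat(g1, g2, y_null)
f_epi = interaction_f_stat(g1, g2, y_epi)
print("F without simulated interaction:", f_null)
print("F with simulated interaction:   ", f_epi)
```

A large F statistic indicates that adding the interaction term reduces the residual sum of squares by more than would be expected by chance.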

9.1 Options

The options that are specific to the linear regression test are:

Option                          Description
-lin                            perform the linear regression test
-lin-th t                       p-value threshold for the linear regression test
-lin-so                         suppress results of test
-lin-covar covariates.dat       set covariate file, covariates.dat
-lin-covar-number nums          only use covariates given by the numbers, nums
-lin-covar-name na              only use covariates given by the names, na
-lin-covar-miss x               set missing covariate value, x

The default p-value threshold is 0.0001 and the default missing covariate value is -9.

For other basic options see section 3.2.

9.2 Output File

The output file from running a linear regression epistasis analysis using CASSI will look something like the following:

SNP1 CHR1 ID1 BP1 SNP2 CHR2 ID2 BP2 LIN_BETA LIN_BETA_SE LIN_FSTAT LIN_P
541 1 rs2657134 4518966 783 1 rs7931361 5254756 0.0919379 0.0233854 15.4561 8.73237e-05
542 1 rs2657163 4519034 783 1 rs7931361 5254756 0.104137 0.0239852 18.8505 1.48449e-05
544 1 rs2641444 4519523 783 1 rs7931361 5254756 0.104289 0.0237317 19.3118 1.16842e-05
639 1 rs7934332 4807931 651 1 rs1103368 4837468 0.130456 0.030614 18.1587 2.12699e-05
642 1 rs1103114 4809648 651 1 rs1103368 4837468 0.130489 0.0308339 17.9098 2.42124e-05
... 

The first line is a header labelling the columns of the results file. The details of the two SNPs in the pair are given first, followed by the values calculated from the linear regression test.

LIN_BETA is the interaction regression coefficient.

LIN_BETA_SE is the standard error of the interaction regression coefficient.

LIN_FSTAT is the F-test statistic.

LIN_P is the corresponding p-value.
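As a rough sketch of how such a results file might be post-processed, the Python snippet below parses whitespace-separated rows and sorts them by LIN_P. It assumes the file is whitespace-delimited with a single header line, as in the sample above. The parse_results function and the embedded SAMPLE string (three rows copied from the example output) are illustrative only and are not part of CASSI. For these sample rows, the square of LIN_BETA divided by LIN_BETA_SE agrees with LIN_FSTAT, as would be expected when a single interaction term is tested.

```python
# Three rows copied from the example output above (illustrative data only).
SAMPLE = """\
SNP1 CHR1 ID1 BP1 SNP2 CHR2 ID2 BP2 LIN_BETA LIN_BETA_SE LIN_FSTAT LIN_P
541 1 rs2657134 4518966 783 1 rs7931361 5254756 0.0919379 0.0233854 15.4561 8.73237e-05
542 1 rs2657163 4519034 783 1 rs7931361 5254756 0.104137 0.0239852 18.8505 1.48449e-05
639 1 rs7934332 4807931 651 1 rs1103368 4837468 0.130456 0.030614 18.1587 2.12699e-05
"""

# Columns converted to floats; the remaining columns are kept as strings.
NUMERIC = ("LIN_BETA", "LIN_BETA_SE", "LIN_FSTAT", "LIN_P")


def parse_results(text):
    """Parse whitespace-delimited results text into a list of dicts keyed by header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        for name in NUMERIC:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = parse_results(SAMPLE)
rows.sort(key=lambda r: r["LIN_P"])  # most significant pair first

for r in rows:
    ratio_sq = (r["LIN_BETA"] / r["LIN_BETA_SE"]) ** 2
    print(r["ID1"], r["ID2"], r["LIN_P"], r["LIN_FSTAT"], round(ratio_sq, 2))
```

To read a real results file, the text could be loaded with open(path).read() and passed to parse_results in place of SAMPLE.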

9.3 Covariates

It is possible to perform linear regression with a set of covariates. To do this use the -lin-covar option. For example:

./cassi -lin -lin-covar covariates.dat mydata.bed

The format of the covariate file is the same as for logistic regression (see section 8.3).

The output file from running a linear regression analysis with covariates will look something like the following:

SNP1 CHR1 ID1 BP1 SNP2 CHR2 ID2 BP2 LIN_COVAR_BETA LIN_COVAR_BETA_SE LIN_COVAR_FSTAT LIN_COVAR_P
541 1 rs2647164 4518966 783 1 rs7931851 5253756 0.0927018 0.0234313 15.6524 7.87797e-05
542 1 rs2647163 4519034 783 1 rs7931851 5253756 0.106082 0.0240464 19.4619 1.08102e-05
544 1 rs2641444 4519523 783 1 rs7931851 5253756 0.1055 0.0237654 19.7067 9.52224e-06
639 1 rs7934322 4807931 651 1 rs1103368 4837461 0.129999 0.0306687 17.9675 2.34979e-05
642 1 rs1103114 4809648 651 1 rs1103368 4837461 0.129712 0.0308939 17.6284 2.80373e-05
...

That is, the output is the same as before but with different column headers, which helps distinguish the two sets of results when linear regression is performed both with and without covariates.