3 Using CASSI

CASSI is executed as follows:

 ./cassi [options] file.bed [file2.bed]

or

./cassi parameterfile.pf [file.bed] [file2.bed] 

Executing CASSI either way performs SNP interaction tests on every pair of SNPs in the .bed file that is selected by the options. The results file records every SNP pair that meets the given significance threshold, together with additional information about the tests performed. A log file is also created; it records the same information that is printed to the screen, including the options used and summary statistics of the data.

3.1 Input Files

Basic usage of CASSI is to provide it with one binary PLINK format pedigree file:

./cassi myfile.bed 

This requires that the corresponding .bim and .fam files are also available. A text PLINK pedigree file, .ped, with corresponding map file, .map, may be used to create a binary file using PLINK as follows:

plink --noweb --file mydata --make-bed --out myfile 

This will create the binary pedigree file, myfile.bed, the map file, myfile.bim, and the family file, myfile.fam, all of which are required for use with CASSI.
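Since CASSI needs all three files, it can be worth checking that they are present before starting a long run. The following Python sketch is not part of CASSI. The function name missing_companions is hypothetical, and the sketch assumes that the .bim and .fam files sit in the same directory as the .bed file and share its base name.

```python
from pathlib import Path

def missing_companions(bed_path):
    """Return the .bim/.fam files that are absent next to the given .bed file."""
    bed = Path(bed_path)
    # Assumption: companion files share the .bed file's directory and base name.
    expected = [bed.with_suffix(ext) for ext in (".bim", ".fam")]
    return [str(p) for p in expected if not p.exists()]

if __name__ == "__main__":
    missing = missing_companions("myfile.bed")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("myfile.bim and myfile.fam found")
```

An empty list means both companion files were found.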

Two input files may also be used:

./cassi myfile.bed myfile2.bed

3.2 Options

The basic options for CASSI are as follows (typing ./cassi with no options will output the available options):

Option Description
-snp1 a1 a2 first SNP window, a1 = Start SNP number, a2 = End SNP number
-snp2 b1 b2 second SNP window, b1 = Start SNP number, b2 = End SNP number
-i file.bed input file
-i2 file2.bed second (optional) input file for second SNP window
-o file.out results output file
-log file.log log file
-max m maximum number of results, to safeguard against accidentally outputting half a trillion results. Set to 0 for no maximum at your own risk!
-so suppress output to screen
-filter-all all statistic thresholds must be met (as ordered)
-filter-any any statistic threshold may be met
-gap minimum gap in base pair position required between the two SNPs of a pair for case only tests. Set to 0 for no gap.
-mem0 (slow) do not store SNPs in memory
-mem1 (faster) store SNPs in memory, binary
-mem2 (fastest) store SNPs in memory, integer
-rsq report R squared statistic between SNP pairs for cases and controls
-dprime report D' statistic between SNP pairs for cases and controls

The SNP numbers “a1”, “a2”, etc. refer to the positions at which the SNPs appear in the map file (.bim).

For example, to use these options to analyse SNPs 1 to 60 against SNPs 50 to 100 using the binary pedigree file mydata.bed, type the following:

./cassi -snp1 1 60 -snp2 50 100 mydata.bed

This will output details of the analysis and will look something like the following:

CASSI: SNP interaction analysis software, v2.00
-----------------------------------------------
Copyright 2013 Richard Howey, GNU General Public License, v3
Institute of Genetic Medicine, Newcastle University
Parameters:
Input file: mydata.bed
Output file: cassi.out
Log file: cassi.log
Start SNP of first SNP window: 1
End SNP of first SNP window: 60
Start SNP of second SNP window: 50
End SNP of second SNP window: 100
Maximum no. of results: 1000000

Data Summary Statistics:
Number of SNPs: 100
Number of subjects: 4686
Number of cases: 1748 (37.3026%)
Number of controls: 2938 (62.6974%)
Number of missing: 0

Test Statistic: Joint Effects
P-value threshold for case/control results: 0.0001
P-value threshold for case only results: 0.0001
Total SNP pairs calculated: 2994
Total SNP pair statistics passing threshold: 80

Number of SNP pairs with results: 80

Run time: less than one second
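
The figure of 2994 SNP pairs in this log is consistent with each unordered pair being counted once and no SNP being paired with itself. That reading is an inference from the numbers shown, not a documented statement about CASSI's internals. A minimal Python check under that assumption (the function name count_snp_pairs is hypothetical) is:

```python
def count_snp_pairs(a1, a2, b1, b2):
    """Count distinct unordered pairs (i, j), i != j, with i in [a1, a2] and j in [b1, b2]."""
    pairs = set()
    for i in range(a1, a2 + 1):
        for j in range(b1, b2 + 1):
            if i != j:
                # Store each pair in sorted order so (50, 55) and (55, 50) count once.
                pairs.add((min(i, j), max(i, j)))
    return len(pairs)

print(count_snp_pairs(1, 60, 50, 100))  # 2994
```

The same total follows by hand: 60 x 51 = 3060 combinations, less 11 self-pairs and 55 duplicated pairs within the overlapping SNPs 50 to 60. This applies only when both windows refer to the same pedigree file.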

To do the above analysis where the second SNP window is taken from a different pedigree file, type the following:

./cassi -snp1 1 60 -snp2 50 100 mydata.bed mydata2.bed

In addition, each test has its own options; see the relevant sections for details. The joint effects test is used by default if no test is specified. If any tests are specified using the options below, then only those tests will be calculated.

Option Description
-je do joint effects test
-awu do adjusted Wu test
-wz do Wellek Ziegler test
-afe do adjusted fast epistasis test
-lr do logistic regression test

For example, to do the adjusted Wu and adjusted fast epistasis tests, type:

./cassi -afe -awu mydata.bed 

Note: Any options for individual tests must follow the option to do that test. For example:

./cassi -je -je-th 1e-6 mydata.bed

The default options for CASSI are:

Option Description
-snp1 1 * All SNPs in pedigree file
-snp2 1 * All SNPs in pedigree file (or 2nd pedigree file if given)
-o cassi.out set the output file to cassi.out
-log cassi.log set the log file to cassi.log
-je use the joint effects test
-*-th 0.0001 use p-value threshold of 0.0001 for any tests used
-max 1000000 limit the maximum number of results to 1 million
-gap 1000 SNPs need to be 1000 base pairs apart for case only tests
-mem2 (fastest) store SNPs in memory, integer
-filter-all all statistic thresholds must be met (as ordered)

If the output file is set then the default log file is given by this name with the extension changed. For example, if the output file is set to myresults.dat then the default log file will be myresults.log.
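
The naming rule just described can be mimicked in Python as follows. This is only an illustration of the stated behaviour. The function name default_log_name is hypothetical, and how CASSI itself treats output names with no extension or with several dots is not covered here.

```python
from pathlib import Path

def default_log_name(output_file):
    """Swap the final extension of the output file name for .log."""
    return str(Path(output_file).with_suffix(".log"))

print(default_log_name("myresults.dat"))  # myresults.log
print(default_log_name("cassi.out"))      # cassi.log
```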

3.3 Parameter file

A parameter file, .pf, may be used with CASSI instead of writing all of the options on the command line. To use a parameter file simply type:

./cassi myparameters.pf

The parameter file should be a text file with one option written on each line. For example, to perform the analysis above the file myparameters.pf would be as follows:

-snp1 1 60
-snp2 50 100
-i mydata.bed
-i2 mydata2.bed

It is also possible to add comments to the file provided that the “-” character is not used, and to comment out any options by placing another character in front of any “-”. For example, the above parameter file could be edited as follows:

#This is the first SNP window
-snp1 1 60

#This is the second SNP window
-snp2 50 100

#This is the pedigree file for the first SNP window
-i mydata.bed

#This is the pedigree file for the second SNP window
-i2 mydata2.bed

#I might try this threshold later
#-th 0.00001
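
One way to read the commenting rule is that only lines beginning with “-” are treated as options. The Python sketch below lists the active options of a parameter file under that assumption. It is not CASSI's own parser, and the function name active_options is hypothetical. Whether CASSI would also react to a “-” appearing later in a comment line is not settled by this sketch, which is a good reason to keep the character out of comments altogether.

```python
def active_options(parameter_text):
    """Return the option lines as token lists; lines not starting with '-' are skipped."""
    options = []
    for line in parameter_text.splitlines():
        stripped = line.strip()
        # Assumption: a line is an option only if its first character is '-'.
        if stripped.startswith("-"):
            options.append(stripped.split())
    return options

example = """\
#This is the first SNP window
-snp1 1 60

#I might try this threshold later
#-th 0.00001
"""
print(active_options(example))  # [['-snp1', '1', '60']]
```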

3.4 Memory options

The default option in CASSI is to store all of the SNPs of the second window in memory as integers to allow fast calculation of the interaction tests. The first window is not stored in memory. This could be a problem if the second window has a large number of SNPs and you do not have much memory, in which case you should consider using the -mem1 option, which reduces memory usage by storing the SNPs in memory in binary format.

./cassi -mem1 mydata.bed

This option could be useful if you are performing tests across the whole genome and want to run many CASSI jobs at once. The -mem1 option uses approximately 1/4 of the memory used by the -mem2 option, but takes about 1.5 times longer to execute. If memory is still too tight, you could use the -mem0 option as follows:

./cassi -mem0 mydata.bed

This option uses very little memory by not storing the SNPs in memory but takes about 2.3 times longer to execute than the -mem2 option.

3.5 R² and D'

It is possible to output the R² and/or the D' values between SNP pairs for the cases and controls with the -rsq and -dprime options respectively. For example, to output both of these:

./cassi -je -rsq -dprime mydata.bed

This will output the R² values for the cases and controls, given by columns CASE_RSQ and CTRL_RSQ respectively. The D' values for the cases and controls are given by columns CASE_DPRIME and CTRL_DPRIME respectively. The output will look something like the following:

SNP1 CHR1 ID1 BP1 SNP2 CHR2 ID2 BP2 JE_CASE_LOG_OR JE_CASE_SE JE_CTRL_LOG_OR JE_CTRL_SE JE_CC_CHISQ JE_CC_P JE_CC_ALT JE_CO_CHISQ JE_CO_P JE_CO_ALT CASE_RSQ CTRL_RSQ CASE_DPRIME CTRL_DPRIME
1 1 rs3825035 207140 11 1 rs6598035 242318 0.77282 0.148039 0.706996 0.154282 0.0947709 0.758197 N 27.2524 1.78554e-07 N 0.0240119 0.0251859 0.265702 0.233982
3 1 rs1027430 214393 24 1 rs7394810 356710 -0.617103 0.145369 -0.909585 0.141284 2.08175 0.149069 N 18.0209 2.18499e-05 N 0.0172226 0.0397905 0.244953 0.338341
16 1 rs1227455 252234 24 1 rs7394810 356710 0.668421 0.140027 0.517802 0.141032 0.574359 0.448532 N 22.7864 1.81045e-06 N 0.0225344 0.013668 0.225165 0.181016
16 1 rs1227455 252234 69 1 rs1126386 1034938 -0.680131 0.150314 -0.559377 0.150018 0.323315 0.569622 N 20.4732 6.04708e-06 N 0.0201808 0.0135523 0.285269 0.238255
19 1 rs4718635 273928 42 1 rs1090209 723609 0.524665 0.129452 0.135148 0.123059 4.75603 0.0291958 N 16.4266 5.05695e-05 N 0.0172832 0.00111035 0.146305 0.0369723
...
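
Results in this layout are straightforward to load for further processing. The Python sketch below assumes the results file is whitespace-delimited with all column names on a single header line. The function name read_results is hypothetical, and the sample text is copied from the first two result rows shown above.

```python
def read_results(text):
    """Parse whitespace-delimited results text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip truncated or malformed lines
        rows.append(dict(zip(header, fields)))
    return rows

sample = (
    "SNP1 CHR1 ID1 BP1 SNP2 CHR2 ID2 BP2 JE_CASE_LOG_OR JE_CASE_SE "
    "JE_CTRL_LOG_OR JE_CTRL_SE JE_CC_CHISQ JE_CC_P JE_CC_ALT JE_CO_CHISQ "
    "JE_CO_P JE_CO_ALT CASE_RSQ CTRL_RSQ CASE_DPRIME CTRL_DPRIME\n"
    "1 1 rs3825035 207140 11 1 rs6598035 242318 0.77282 0.148039 0.706996 "
    "0.154282 0.0947709 0.758197 N 27.2524 1.78554e-07 N 0.0240119 0.0251859 "
    "0.265702 0.233982\n"
    "3 1 rs1027430 214393 24 1 rs7394810 356710 -0.617103 0.145369 -0.909585 "
    "0.141284 2.08175 0.149069 N 18.0209 2.18499e-05 N 0.0172226 0.0397905 "
    "0.244953 0.338341\n"
)

for row in read_results(sample):
    # Values are kept as strings; convert with float() where numbers are needed.
    print(row["ID1"], row["ID2"], float(row["CASE_RSQ"]), float(row["CASE_DPRIME"]))
```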

If no test statistic is given to CASSI, the R² and/or D' values are calculated and output. To output these extra values together with a test statistic meeting a threshold, the -rsq and -dprime options should be listed after the test statistic option; otherwise they will be calculated unnecessarily for SNP pairs whose test statistics do not meet the threshold.