Computer Practical Exercises on Power Calculations using GPC

Exercise 1: Case/control analysis of discrete traits

Purpose

In this exercise you will be calculating the power for a standard allele-based association test (e.g. a chi-squared test or a logistic regression analysis) using case/control data.

Step-by-step instructions

Open another web browser and point it to the GPC website:

http://zzz.bwh.harvard.edu/gpc/

Several different modules are provided for calculating the power of different types of study. More details of the different options are given in the right-hand column marked Notes.

We will start by calculating the power for a case/control study. Click the GPC module option marked

Case-control for discrete traits

Let us assume the allele frequency of the high risk allele is 0.1 (i.e. 10%), the prevalence (probability) of the disease is 0.01 (i.e. 1%) and the genotype relative risks (i.e. the factors by which your risk of disease gets multiplied) are 1.5 and 2.25 for genotypes Aa and AA respectively. (This corresponds to a "multiplicative" effect of alleles, whereby each A allele multiplies your risk by a factor of 1.5.)
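To see what these four numbers imply, the following Python sketch converts them into penetrances (the probability of disease for each genotype) and into genotype frequencies among cases. It is an illustration only: it assumes Hardy-Weinberg equilibrium in the population, and the variable names are ours rather than anything used by GPC.

```python
# Illustrative sketch only: names are hypothetical, not GPC's.
p = 0.1                                   # frequency of high-risk allele A
K = 0.01                                  # disease prevalence
rr = {"aa": 1.0, "Aa": 1.5, "AA": 2.25}   # genotype relative risks

# Population genotype frequencies, ASSUMING Hardy-Weinberg equilibrium.
q = 1.0 - p
geno_freq = {"aa": q * q, "Aa": 2 * p * q, "AA": p * p}

# Prevalence = sum over genotypes of (frequency x penetrance), and
# penetrance = baseline x relative risk, so we can solve for the baseline.
mean_rr = sum(geno_freq[g] * rr[g] for g in geno_freq)
baseline = K / mean_rr
penetrance = {g: baseline * rr[g] for g in geno_freq}

# Genotype frequencies among cases, by Bayes' rule.
case_geno = {g: geno_freq[g] * penetrance[g] / K for g in geno_freq}

for g in ("aa", "Aa", "AA"):
    print(f"{g}: population {geno_freq[g]:.4f}  "
          f"penetrance {penetrance[g]:.5f}  cases {case_geno[g]:.4f}")
```

Note that the risk genotypes are enriched among cases compared with the population; it is this difference that the association test tries to detect.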

Fill in the first four cells of GPC with the appropriate numbers as given above.

Let us assume we are analysing the disease locus itself (or a marker that is in complete LD with it). This means we need to fill in a value of D-prime=1 and the allele frequency of marker B must be the same as disease locus A, i.e. 0.1 (=10%). Fill in these values in the next two cells.

Suppose we hope to have 500 cases and 500 controls in our study. This means we need to fill in the next two cells with the values 500 (for Number of cases) and 1 (for Control:case ratio) respectively. We will assume our controls have been selected to be disease free (rather than being a random sample from the population), so we can leave the next cell (Unselected controls) blank.

Suppose we want the power to detect an effect at significance level (p value) 0.001. Fill the next cell (User-defined type I error rate) with the value 0.001.

Suppose we also want to find out how many cases and controls we would need to achieve 80% power. Fill the final cell in with the value (or leave it as the value) 0.8.

Now press the Process button to perform the calculation. Scroll down to the bottom of the window to see the actual results. Powers are given for a variety of significance levels (p values) marked Alpha, including the one that you asked for (on the very last line). You should find that your study would have 84% power to detect the genetic effect at p value 0.05, but only 37% power to detect it at the desired p value 0.001.

Also given is the number of cases needed to achieve 80% power (assuming the control:case ratio you selected, i.e. the same number of controls as cases). You would need to increase the size of your study to 970 cases and 970 controls if you wanted to achieve 80% power to detect the effect with a p value of 0.001 or less.
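If you want to check these figures by hand, the sketch below approximates the calculation for an allele-based test on 1 degree of freedom using only the Python standard library. It is not GPC's own code: the function names are hypothetical, and it assumes Hardy-Weinberg equilibrium, disease-free controls, and a normal approximation to the non-central chi-squared distribution. Under those assumptions it should give values close to those reported above.

```python
from statistics import NormalDist

def allele_freqs(p, K, rr_het, rr_hom):
    """Frequency of allele A in cases and in disease-free controls."""
    q = 1.0 - p
    freq = [q * q, 2 * p * q, p * p]            # aa, Aa, AA (HWE assumed)
    rr = [1.0, rr_het, rr_hom]
    base = K / sum(f * r for f, r in zip(freq, rr))
    pen = [base * r for r in rr]                # penetrances
    case = [f * w / K for f, w in zip(freq, pen)]
    ctrl = [f * (1 - w) / (1 - K) for f, w in zip(freq, pen)]
    return case[2] + case[1] / 2, ctrl[2] + ctrl[1] / 2

def ncp_per_case(p_case, p_ctrl, ratio=1.0):
    """Approximate non-centrality contributed by one case and `ratio` controls."""
    p_bar = (p_case + ratio * p_ctrl) / (1 + ratio)      # pooled allele frequency
    var_one = p_bar * (1 - p_bar) * (1 + 1 / ratio) / 2  # two alleles per person
    return (p_case - p_ctrl) ** 2 / var_one

def power(ncp, alpha):
    """Power of a 1 df chi-squared test with non-centrality parameter ncp."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    root = ncp ** 0.5
    return z.cdf(root - crit) + z.cdf(-root - crit)

def cases_needed(ncp_one, alpha, target):
    """Cases needed for the target power (ignores the negligible lower tail)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(target)) ** 2 / ncp_one

p_case, p_ctrl = allele_freqs(p=0.1, K=0.01, rr_het=1.5, rr_hom=2.25)
ncp_one = ncp_per_case(p_case, p_ctrl, ratio=1.0)
for alpha in (0.05, 0.001):
    print(f"alpha={alpha}: power with 500 cases = {power(500 * ncp_one, alpha):.2f}")
print(f"cases for 80% power at 0.001: {cases_needed(ncp_one, 0.001, 0.8):.0f}")
```

Changing the arguments to allele_freqs lets you explore how prevalence and allele frequency affect the power, although you should still run the calculations in GPC itself.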

Repeat your calculation (and note down the results) making the following amendments to your original assumptions:

1. Suppose the disease is more common, with population prevalence 20%. (Answer: This should increase your power at p value 0.001 to 67%)

2. Assume the disease prevalence goes back to 1%, but the high-risk allele is more common, with population frequency 40%. Note this means changing three input cells altogether (which ones?). (Answer: This should increase your power to 89%)

3. Assume the high-risk allele is rarer, with population frequency 5%. (Answer: This should decrease your power to only 13%).

Exercise 2: TDT analysis of case/parent trios

Purpose

In this exercise you will be calculating the power for the Transmission Disequilibrium Test (TDT).

Step-by-step instructions

Go back to the main GPC page and select the module marked

TDT for discrete traits

Fill in the cells with the same values we assumed in the first exercise above, i.e. an allele frequency of 0.1 (i.e. 10%) for the high risk allele, a disease prevalence (probability) of 0.01 (i.e. 1%), genotype relative risks (i.e. the factors by which your risk of disease gets multiplied) of 1.5 and 2.25 for genotypes Aa and AA respectively, D-prime=1, an allele frequency for marker B equal to that of disease locus A, i.e. 0.1 (=10%), 500 trios, a type I error rate of 0.001 and a power of 0.80.

What results do you find? The results should be rather similar to the results you found in the first (case/control) calculation. What does this tell you about the relative power of case/control versus TDT type approaches?
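As a rough cross-check on the TDT result, the sketch below uses a textbook-style approximation for a multiplicative model: it counts the expected number of heterozygous parents among the 2 x 500 parents of affected children, and the probability that such a parent transmits the high-risk allele. This is an assumption-laden illustration (multiplicative risks, Hardy-Weinberg equilibrium, normal approximation, variable names of our own choosing) and is not necessarily how GPC performs the calculation internally.

```python
from statistics import NormalDist

p = 0.1          # frequency of high-risk allele A
gamma = 1.5      # risk multiplier per copy of A (multiplicative model)
n_trios = 500
alpha = 0.001
q = 1.0 - p

# Probability that a parent of an affected child is heterozygous (Aa).
het = p * q * (gamma + 1) / (p * gamma + q)

# Probability that a heterozygous parent transmits A to the affected child.
t = gamma / (1 + gamma)

# Expected number of informative (heterozygous) parents: two parents per trio.
n_informative = 2 * n_trios * het

# Approximate non-centrality parameter of the 1 df TDT statistic.
ncp = n_informative * (2 * t - 1) ** 2

z = NormalDist()
crit = z.inv_cdf(1 - alpha / 2)
tdt_power = z.cdf(ncp ** 0.5 - crit) + z.cdf(-(ncp ** 0.5) - crit)

print(f"heterozygous parents expected: {n_informative:.0f}")
print(f"approximate TDT power at alpha={alpha}: {tdt_power:.2f}")
```

Compare the approximate power printed here with the value GPC reports and with your case/control result from Exercise 1.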

Now amend your calculation so that instead of analysing the disease locus itself, you are analysing a marker of allele frequency 0.2, in LD (D-prime=0.8) with the causal disease locus. What happens to your power?
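To get a feel for why the power changes, it can help to translate D-prime into haplotype frequencies and into the squared correlation r-squared between the two loci. The sketch below does this for the values above. It assumes that D is positive (i.e. the high-risk allele A tends to occur with marker allele B, the allele of frequency 0.2); the variable names are our own. A commonly quoted rule of thumb, which holds only approximately, is that the non-centrality parameter at the marker is about r-squared times its value at the causal locus.

```python
p_a = 0.1        # frequency of high-risk allele A at the disease locus
p_b = 0.2        # frequency of allele B at the marker
d_prime = 0.8    # D-prime between the two loci (ASSUMED positive)

# Largest possible positive D given the two allele frequencies.
d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
d = d_prime * d_max

# Haplotype frequencies implied by D.
hap = {
    "AB": p_a * p_b + d,
    "Ab": p_a * (1 - p_b) - d,
    "aB": (1 - p_a) * p_b - d,
    "ab": (1 - p_a) * (1 - p_b) + d,
}

# Squared correlation between the disease locus and the marker.
r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

for name, freq in hap.items():
    print(f"haplotype {name}: {freq:.3f}")
print(f"D = {d:.3f}, r-squared = {r2:.3f}")
```

Even with a fairly high D-prime, r-squared can be much lower than 1 when the marker and disease allele frequencies differ.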