A variety of child-parent configurations are amenable to genetic association studies, including (but not limited to) cases in combination with unrelated controls, case-parent triads, and case-parent triads in combination with unrelated control-parent triads. Because genome-wide association studies (GWAS) are frequently underpowered due to the large number of single-nucleotide polymorphisms being tested, power calculations are necessary to choose an optimal study design and to maximize scientific gains from high genotyping and assay costs.
Statistical power is an important aspect of design comparison. Frequently, study designs are compared directly through a power analysis, without considering the total number of individuals that need to be genotyped. For example, a fixed number of complete case-parent triads could be compared with the same number of case-mother or case-father dyads. However, such an approach ignores the costs of data collection. A more general and informative design comparison can be achieved by studying the relative efficiency, which we define as the ratio of the variances of two parameter estimators corresponding to two separate designs. Using log-linear modeling, we derive the relative efficiency from the asymptotic variance formulas of the parameters. The relative efficiency estimate takes into account the fact that different designs impose different costs relative to the number of genotyped individuals. The relative efficiency calculations are implemented as an easy-to-use function in our R package Haplin (H.K. Gjessing and Lie 2006).
We use the relative efficiency estimates to select the study design that attains the highest statistical power at the lowest sample collection and assay costs. The results depend on the genetic effect being assessed, and our analyses include regular autosomal (offspring or child) effects, parent-of-origin (PoO) effects and maternal effects (definitions of the genetic effects are provided in (M. Gjerdevik et al. 2019)). Here we show example commands for various scenarios.
The relative efficiency of two designs is calculated by the Haplin function hapRelEff. The commands are very similar to those of the Haplin power calculation function hapPowerAsymp, which is explained in detail in our previously published paper (M. Gjerdevik et al. 2019). In general, one only needs to specify the study designs to be compared, the allele frequencies, and the type of genetic effect and its magnitude.
The following command calculates the efficiency of the standard case-control design with an equal number of case and control children relative to the case-parent triad design.
hapRelEff(cases.comp = c(c=1),
controls.comp = c(c=1), cases.ref = c(mfc=1),
haplo.freq = c(0.1,0.9), RR = c(1,1))
## $haplo.rel.eff
## Haplotype RR.rel.eff
## 1 1 1.5
## 2 2 ref
The arguments cases.comp and controls.comp specify the comparison design, whereas cases.ref and controls.ref specify the reference design. We use the following abbreviations to describe the family designs: the letters c, m and f denote a child, a mother and a father, respectively. Thus, the case-parent triad design is specified by cases.comp = c(mfc=1) or cases.ref = c(mfc=1), whereas the standard case-control design is specified by cases.comp = c(c=1) and controls.comp = c(c=1), or by cases.ref = c(c=1) and controls.ref = c(c=1). To specify a case-control design with twice as many controls as cases, one could use the combination cases.comp = c(c=1) and controls.comp = c(c=2).
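As an illustration of the design notation, the following sketch compares a case-control design with two controls per case against the case-parent triad design. It only uses arguments that appear in the examples in this text; the variable names (cases_comp, controls_comp, cases_ref) are ours, and the call is skipped if Haplin is not installed. No output is shown, since the resulting estimate is not reported here.

```r
# Comparison design: unrelated case and control children, two controls per case.
cases_comp    <- c(c = 1)
controls_comp <- c(c = 2)

# Reference design: case-parent triads (mother, father, child).
cases_ref <- c(mfc = 1)

# Run the comparison only if Haplin is available on this system.
if (requireNamespace("Haplin", quietly = TRUE)) {
  library(Haplin)
  res <- hapRelEff(cases.comp = cases_comp,
                   controls.comp = controls_comp,
                   cases.ref = cases_ref,
                   haplo.freq = c(0.1, 0.9), RR = c(1, 1))
  print(res)
}
```

The numbers in the design vectors act as ratios between cases and controls, which is why c(c=2) for the controls expresses "twice as many controls as cases".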
The genetic effects are determined by the choice of relative risk parameter(s), which also specify the effect sizes. A regular autosomal effect is specified by the relative risk argument RR. The relative efficiency estimated under the null hypothesis, i.e., when all relative risks are equal to one, is known as the Pitman efficiency (Noether 1955). However, other relative risk values can be used. Allele frequencies are specified by the argument haplo.freq. Note that the order and length of the specified relative risk parameter vectors should always match the corresponding allele frequencies.
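The length requirement can be checked before calling hapRelEff. The helper below is a hypothetical convenience function written for this illustration; it is not part of Haplin. It only verifies lengths, since a mismatch in order cannot be detected automatically.

```r
# Hypothetical helper (not a Haplin function): verify that every relative
# risk vector has one entry per allele frequency.
check_effect_args <- function(haplo.freq, ...) {
  rr <- list(...)
  same_length <- vapply(rr, function(x) length(x) == length(haplo.freq),
                        logical(1))
  if (!all(same_length)) {
    stop("Length mismatch with haplo.freq for: ",
         paste(names(rr)[!same_length], collapse = ", "))
  }
  invisible(TRUE)
}

# Two alleles with frequencies 0.1 and 0.9, and one relative risk for each.
check_effect_args(haplo.freq = c(0.1, 0.9), RR = c(1, 1))
```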
For the first example above, with an equal number of case and control children, we see that the relative efficiency of the standard case-control design is 1.5 compared with the case-parent triad design. This result is well known from the literature (H.J. Cordell and Clayton 2005).
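The value 1.5 can be reproduced with a back-of-the-envelope calculation. This sketch is not how hapRelEff computes its estimates (those come from the asymptotic variance formulas). It assumes that, under the null hypothesis, one case-parent triad carries about as much information on a child effect as one case plus one unrelated control, so that the two designs differ only in the number of individuals genotyped per unit. The function name and the unit variances are illustrative.

```r
# Relative efficiency of a comparison design versus a reference design,
# expressed per genotyped individual. var_* is the estimator variance per
# sampling unit and size_* is the number of individuals genotyped per unit.
rel_eff_per_individual <- function(var_comp, size_comp, var_ref, size_ref) {
  (var_ref * size_ref) / (var_comp * size_comp)
}

# Assumption: equal variance per unit for "one case + one control" (2 people)
# and "one case-parent triad" (3 people).
rel_eff_per_individual(var_comp = 1, size_comp = 2,
                       var_ref = 1, size_ref = 3)
# 3 / 2 = 1.5
```

Under this assumption the triad design needs three genotyped individuals where the case-control design needs two, which gives the ratio 3/2.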
The full hybrid design, consisting of both case-parent triads and control-parent triads, is specified by cases.comp = c(mfc=1) and controls.comp = c(mfc=1), as in the parent-of-origin example that follows.
The relative efficiency for parent-of-origin (PoO) effects is computed by replacing the argument RR by the two relative risk arguments RRcm and RRcf, denoting parental origin m (mother) and f (father). The command
hapRelEff(cases.comp = c(mfc=1),
controls.comp = c(mfc=1), cases.ref = c(mfc=1),
haplo.freq = c(0.2,0.8), RRcm = c(1,1), RRcf = c(1,1))
calculates the efficiency for the full hybrid design, relative to the case-parent triad design. We refer to our previous paper (M. Gjerdevik et al. 2019) for an explanation of the full output.
Since children and their mothers have an allele in common, a maternal effect might be statistically confounded with a regular autosomal effect or a PoO effect. The relative efficiency for maternal effects can be analyzed jointly with that of a regular autosomal effect or a PoO effect by adding the relative risk argument RR.mat to the original command.
The command
hapRelEff(cases.comp = list(c(mc=1)),
cases.ref=list(c(mfc=1)), haplo.freq = c(0.1,0.9),
RR = c(1,1), RR.mat=c(1,1))
## $haplo.rel.eff
## Haplotype RR.rel.eff RRm.rel.eff
## 1 1 0.6 0.6
## 2 2 ref ref
calculates the efficiency of the case-mother dyad design relative to the case-parent triad design, assessing both regular autosomal and maternal effects. In this example, we see that the relative efficiency estimates for regular autosomal and maternal effects are identical when adjusting for possible confounding of the effects with one another (M. Gjerdevik et al. 2019).
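The same idea applies to PoO effects: a sketch of a joint PoO and maternal analysis is obtained by adding RR.mat to the PoO command shown earlier. The call below combines only arguments that appear in this text, under the assumption that they can be used together as described; the variable names are ours, the call is skipped if Haplin is not installed, and no output is shown because the result is not reported here.

```r
# Allele frequencies and relative risks under the null hypothesis.
haplo_freq <- c(0.2, 0.8)
rr_null    <- c(1, 1)

# Full hybrid design versus case-parent triads, assessing PoO effects
# (RRcm, RRcf) jointly with maternal effects (RR.mat).
if (requireNamespace("Haplin", quietly = TRUE)) {
  library(Haplin)
  res <- hapRelEff(cases.comp = c(mfc = 1),
                   controls.comp = c(mfc = 1),
                   cases.ref = c(mfc = 1),
                   haplo.freq = haplo_freq,
                   RRcm = rr_null, RRcf = rr_null, RR.mat = rr_null)
  print(res)
}
```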
The default commands correspond to analyses of single SNPs. However, the extension to haplotypes is straightforward. The number of markers and haplotypes is determined by the vector nall, where the number of markers is equal to length(nall), and the number of different haplotypes is equal to prod(nall). Thus, two diallelic markers are denoted by nall = c(2,2). The length of the arguments haplo.freq and RR should correspond to the number of haplotypes, as shown in the example below.
hapRelEff(nall = c(2,2), cases.comp = c(c=1),
controls.comp = c(c=1), cases.ref = c(mfc=1),
haplo.freq = c(0.1,0.2,0.3,0.4), RR = c(1,1,1,1))
## $haplo.rel.eff
## Haplotype RR.rel.eff
## 1 1-1 1.31
## 2 2-1 1.22
## 3 1-2 1.27
## 4 2-2 ref
We recommend consulting our paper (M. Gjerdevik et al. 2019) for a more detailed description of haplotype analysis.
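The bookkeeping for nall can be sketched in base R. The haplotype labels in the output above (1-1, 2-1, 1-2, 2-2) suggest that the allele of the first marker varies fastest; the code below reproduces that ordering under this assumption, which is inferred from the displayed output rather than from Haplin's documentation. The variable names are ours.

```r
# Two diallelic markers.
nall <- c(2, 2)

n_markers    <- length(nall)  # number of markers
n_haplotypes <- prod(nall)    # number of distinct haplotypes

# Enumerate allele combinations; expand.grid varies the first marker fastest.
grid   <- expand.grid(lapply(nall, seq_len))
labels <- do.call(paste, c(grid, sep = "-"))

# Pair each label with its frequency and relative risk, in the same order
# as the haplo.freq and RR vectors of the example.
haplo_freq <- c(0.1, 0.2, 0.3, 0.4)
RR         <- c(1, 1, 1, 1)
stopifnot(length(haplo_freq) == n_haplotypes, length(RR) == n_haplotypes)
data.frame(Haplotype = labels, haplo.freq = haplo_freq, RR = RR)
```

Laying out the vectors side by side in this way makes it easier to confirm that each frequency and relative risk is attached to the intended haplotype.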
Different X-chromosome models are implemented in Haplin, depending on the underlying assumptions about allele effects in males versus females. The various models may include sex-specific baseline risks, common or distinct relative risks for males and females, as well as X-inactivation in females. Corresponding relative efficiency estimates are readily available in hapRelEff. In addition to the arguments needed to perform analyses on autosomal markers, three arguments must be specified for relative efficiency estimates on the X chromosome. First, to indicate an X-chromosome analysis, the argument xchrom must be set to TRUE. Second, the argument sim.comb.sex specifies how to deal with sex differences on the X chromosome, i.e., whether or not to assume X-inactivation. Finally, the argument BR.girls specifies the ratio of the baseline risk for females relative to males. A detailed description of the parameterization models is provided elsewhere (A. Jugessur et al. 2012; O. Skare et al. 2017, 2018).
The command
hapRelEff(cases.comp = c(mfc=1), controls.comp = c(mfc=1),
cases.ref = c(mfc=1),
haplo.freq = c(0.8,0.2),
RRcm = c(1,2), RRcf = c(1,1),
xchrom = TRUE, sim.comb.sex = "double",
BR.girls = 1)
estimates the PoO relative efficiency for the full hybrid design versus the case-parent triad design, accounting for X-inactivation in females (sim.comb.sex = "double") and assuming the same baseline risk in females and males (BR.girls = 1). We refer to our previously published paper (M. Gjerdevik et al. 2019) for further details.
A. Jugessur, O. Skare, R.T. Lie, A.J. Wilcox, K. Christensen, L. Christiansen, T.T. Nguyen, J. C. Murray, and H.K. Gjessing. 2012. “X-linked genes and risk of orofacial clefts: evidence from two population-based studies in Scandinavia.” PLoS One 7 (6): e39240.
H.J. Cordell, and D.G. Clayton. 2005. “Genetic association studies.” Lancet 366 (9491): 1121–31.
H.K. Gjessing, and R.T. Lie. 2006. “Case-parent triads: estimating single- and double-dose effects of fetal and maternal disease gene haplotypes.” Ann. Hum. Genet. 70 (3): 382–96.
M. Gjerdevik, A. Jugessur, O.A. Haaland, J. Romanowska, R.T. Lie, H.J. Cordell, and H.K. Gjessing. 2019. “Haplin power analysis: a software module for power and sample size calculations in genetic association analyses of family triads and unrelated controls.” BMC Bioinformatics 20 (1): 165.
G.E. Noether. 1955. “On a theorem of Pitman.” Ann. Math. Stat. 26 (1): 64–68.
O. Skare, H.K. Gjessing, M. Gjerdevik, O.A. Haaland, J. Romanowska, R.T. Lie, and A. Jugessur. 2017. “A new approach to chromosome-wide analysis of X-linked markers identifies new associations in Asian and European case-parent triads of orofacial clefts.” PLoS One 12 (9): e0183772.
O. Skare, R.T. Lie, O.A. Haaland, M. Gjerdevik, J. Romanowska, H.K. Gjessing, and A. Jugessur. 2018. “Analysis of parent-of-origin effects on the X chromosome in Asian and European orofacial cleft triads identifies associations with DMD, FGF13, EGFL6, and additional loci at Xp22.2.” Front. Genet. 9 (25). https://doi.org/10.3389/fgene.2018.00025.