In Genome-Wide Association Studies (GWAS), researchers typically test the association between millions of Single Nucleotide Polymorphisms (SNPs) and a single trait. Data sets, however, often contain data on multiple phenotypes that may be genetically and phenotypically correlated. In addition, many phenotypes that are treated as a single trait in GWAS may actually be multivariate in nature (e.g., multiple, possibly genetically heterogeneous, symptoms underlie univariate case-control status variables). To accommodate multiple correlated phenotypes in GWAS analyses, various multivariate methods are available. The validity and power of some of these methods have been studied using simulation, but the simulation scenarios are often limited (e.g., including only 3 homogeneously correlated phenotypes) and not comparable across studies. As a result, a thorough comparison of the different methods, and a clear overview of which methods perform best under which circumstances, is not yet available.  
In the present study, we simulated data under 270 different realistic scenarios. The simulation scenarios consist of either 1- or 2-factor models, with 4, 8, or 16 observed variables. We varied both the within- and between-factor correlations, the location of the genetic effect (i.e., on one, half, or all of the variables), the sign of the genetic effect (i.e., congruent or opposite across variables), and the effect sizes. We then compared the statistical power of three different classes of multivariate methods: reduction-based methods (e.g., factor analysis, PCA, CPC), regression-based methods (e.g., MANOVA, LME, GEE), and combination tests (e.g., TATES, adjusted Fisher Combination test, JAMP). For comparison with the traditional univariate approach, we also included the regression on the sum of all variables (i.e., sum score) and the regression on a single affected observed variable. In addition, we conducted extensive simulations (20 scenarios, 1 million simulations per scenario) to investigate the false positive rate of all included methods for increasingly stringent alpha levels (0.05-0.00001).  
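As a rough illustration of one cell of such a design, the following Python sketch simulates a 1-factor model in which a SNP affects a chosen number of observed variables, and estimates the power of the sum score regression by repeated simulation. It is a minimal sketch under stated assumptions, not the simulation code used in this study: the function names, the factor loading of 0.7, the minor allele frequency of 0.3, the sample size, the effect size, and the normal approximation to the regression test are all illustrative choices.

```python
import math
import random


def simulate_one_factor(n, n_vars, loading, maf, beta, n_affected, rng):
    """Simulate genotypes and phenotypes under a 1-factor model.

    Each observed variable is loading * factor + residual (unit variance
    before the SNP effect is added). The SNP effect `beta` (per allele) is
    added to the first `n_affected` observed variables only.
    """
    resid_sd = math.sqrt(1.0 - loading ** 2)
    genotypes, phenotypes = [], []
    for _ in range(n):
        # Additive genotype coding: number of minor alleles (0, 1, or 2).
        g = (rng.random() < maf) + (rng.random() < maf)
        factor = rng.gauss(0.0, 1.0)
        row = []
        for j in range(n_vars):
            y = loading * factor + resid_sd * rng.gauss(0.0, 1.0)
            if j < n_affected:
                y += beta * g
            row.append(y)
        genotypes.append(g)
        phenotypes.append(row)
    return genotypes, phenotypes


def regression_p_value(x, y):
    """Two-sided p-value for the slope of y on x (normal approximation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = slope / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def estimate_power(n_reps, alpha, seed=1, **scenario):
    """Proportion of replicates in which the sum score test rejects at alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        g, y = simulate_one_factor(rng=rng, **scenario)
        sum_score = [sum(row) for row in y]
        if regression_p_value(g, sum_score) < alpha:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    # Hypothetical scenario: 4 variables, SNP effect on half of them.
    scenario = dict(n=300, n_vars=4, loading=0.7, maf=0.3,
                    beta=0.2, n_affected=2)
    print("estimated power:", estimate_power(200, 0.05, **scenario))
```

Setting `beta=0` turns the same routine into a check of the empirical false positive rate, and varying `n_affected` or the sign of `beta` per variable would mimic the location and sign manipulations described above.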
Our results show that the false positive rate of some methods is incorrect at the lower alpha levels that characterize current GWAS. In addition, the power to detect genetic variants varies widely across methods and across scenarios, which complicates a general prioritization of multivariate methods.
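To indicate why false positive rates at stringent alpha levels call for very large numbers of simulations, the short Python sketch below computes the expected number of false positives and the Monte Carlo standard error of an empirical false positive rate, using the binomial formula sqrt(alpha * (1 - alpha) / N). The function name is hypothetical, and the intermediate alpha level in the loop is an illustrative choice; only the endpoints 0.05 and 0.00001 and the 1 million simulations per scenario are taken from the text.

```python
import math


def monte_carlo_precision(alpha, n_sims):
    """Expected false positive count and standard error of the empirical rate.

    Assumes a correctly calibrated test, so that the number of rejections
    under the null is binomial with success probability `alpha`.
    """
    expected_count = alpha * n_sims
    standard_error = math.sqrt(alpha * (1.0 - alpha) / n_sims)
    return expected_count, standard_error


if __name__ == "__main__":
    n_sims = 1_000_000  # simulations per scenario, as in the text
    for alpha in (0.05, 0.001, 0.00001):  # 0.001 is an illustrative midpoint
        count, se = monte_carlo_precision(alpha, n_sims)
        print(f"alpha={alpha:g}: expected count={count:g}, "
              f"relative SE={se / alpha:.3f}")
```

At alpha = 0.00001, 1 million simulations yield only about 10 expected false positives under a calibrated test, so the relative uncertainty of the empirical rate is far larger than at alpha = 0.05.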