Are Your Covariates Under Control? How Normalization Can Reintroduce Covariate Effects
Abstract
Many statistical tests rely on the assumption that the residuals of a model are normally distributed. One of the most popular approaches to satisfying the normality assumption is rank-based inverse normal transformation (INT) of the dependent variable. It is often desirable to adjust for covariates in the analysis, such as principal components of ancestry in genetic studies. When a transformation to normality is used, the covariates may be included in the analysis model after transformation. Alternatively, they may be regressed against the response as a preliminary step and the residuals then transformed to normality. This study investigates the effect of applying rank-based INT to the dependent variable either before or after controlling for covariate effects. This was done by assessing the correlation between the dependent variable and the covariates when covariate effects are regressed out of the dependent variable either before or after it is transformed. Three factors predicted to affect the outcome of this process were investigated: the proportion of tied observations in the dependent variable, the original skew of the dependent variable, and the original correlation between the dependent variable and the covariate. This procedure was performed using both simulated variables and real data examples. The results demonstrated that, in almost all situations, applying rank-based INT to the residuals of the dependent variable reintroduces a linear correlation between the dependent variable and the covariates, which will lead to inflated type I error rates and reduced power. An alternative approach is recommended that allows a normally distributed dependent variable to be linearly uncorrelated with the covariates.
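The two orderings compared in the study can be illustrated with a minimal sketch, written in Python using only the standard library. It assumes a single covariate, ordinary least-squares residualization, and average ranks for ties. It also assumes Blom's offset (c = 3/8) for the rank-based INT, since the abstract does not say which offset was used. The simulated data-generating model (a log-normal response whose logarithm depends linearly on x) and every function name here (`rank_average`, `rank_int`, `residuals`, `pearson`) are hypothetical illustrations, not the authors' code.

```python
import math
import random
from statistics import NormalDist, mean


def rank_average(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_int(values, c=3 / 8):
    """Rank-based INT; c = 3/8 (Blom's offset) is an assumed default."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in rank_average(values)]


def residuals(y, x):
    """Residuals of an OLS regression of y on one covariate x, with intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def pearson(a, b):
    """Pearson (linear) correlation coefficient."""
    ma, mb = mean(a), mean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den


# Hypothetical simulated data: a right-skewed response correlated with x.
rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [math.exp(0.8 * xi + rng.gauss(0, 1)) for xi in x]

# Order 1: regress out x first, then apply INT to the residuals.
r_resid_then_int = pearson(rank_int(residuals(y, x)), x)
# Order 2: apply INT first, then regress out x.
r_int_then_resid = pearson(residuals(rank_int(y), x), x)

print(f"residualize, then INT: r = {r_resid_then_int:.3f}")
print(f"INT, then residualize: r = {r_int_then_resid:.3f}")
```

In the second ordering, the OLS residuals are orthogonal to x by construction, so their correlation with x is zero up to floating-point error. In the first ordering, the rank transformation is nonlinear, so nothing prevents a linear correlation with x from reappearing. The three factors the study examined could be mimicked by varying the skew of y, the strength of its dependence on x, or by rounding y to create ties. This sketch does not, however, reproduce the study's actual design.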
Authors
- Oliver Pain (Birkbeck College / London School of Hygiene and Tropical Medicine)
- Frank Dudbridge (Leicester University)
- Angelica Ronald (Birkbeck College, University of London)
Topic Area
Statistical Methods
Session
2A-OS » Methods (13:15 - Thursday, 29th June, Sal A)