MultiABEL Tutorial

Installation

Run the following command in R to install the MultiABEL package:

install.packages("MultiABEL", repos = "http://R-Forge.R-project.org")

For Mac users, please download the source .tar.gz and install via:

R CMD INSTALL MultiABEL_1.1-x.tar.gz

in your terminal. You need Fortran 4.8 compiler.

In R, load the package via:

library(MultiABEL)

## 
## MultiABEL: Multi-Trait Genome-Wide Association Analyses

## Version 1.1-5 (2016-05-07) installed

## Author: Xia Shen

## Maintainer: Xia Shen <xia.shen@ki.se>

## Use citation("MultiABEL") to know how to cite this work.

require(MultiABEL)

Multi-Trait GWAS using Summary Statistics

Example Summary Statistics

You need to gather multiple GWAS summary statistics into your working directory. Let us demonstrate this via an example, and the example files can be found at https://goo.gl/r0Mcav. Here, we have the summary statistics for three traits: height, weight, body mass index (BMI), across 890 variants. There is also an .RData file indep.snps.RData to download, which contains the names of a set of independent variants in European ancestry populations. These variants are used for estimating the phenotypic correlations if unknown.

Each file of single-trait GWAS summary statistics should contain columns for variant names (default column name snp), the first (coding or reference) alleles (default column name a1), allele frequencies (default column name freq), effect sizes (default column name beta), standard errors (default column name se), and sample sizes (default column name n). For example, here the top of the summary statistics file for height looks like:

height <- read.table('height.txt', header = TRUE)
head(height)

##          snp a1 a2   freq    beta     se     p       n
## 1  rs4747841  a  g 0.5500  0.0025 0.0061 0.680 60558.2
## 2  rs4749917  t  c 0.4500 -0.0025 0.0061 0.680 60558.1
## 3   rs737656  a  g 0.3667 -0.0073 0.0064 0.250 60529.2
## 4   rs737657  a  g 0.3583 -0.0075 0.0064 0.240 60527.3
## 5 rs17524355  t  c     NA -0.0460 0.0460 0.320  1700.0
## 6  rs7086391  t  c 0.2000 -0.0130 0.0079 0.099 59675.0

where the columns p and a2 are extra information. For default simple pleiotropic analysis (MultiSummary(..., type = 'direct'), see below), n is not essentially required, if unknown, simply give a large number, e.g. 10,000. For MultiSummary(..., type = 'precise'), freq can be any values between 0 and 1, as the exact genotypic variances are given.

MultiABEL does NOT require different single-trait GWAS having been performed in exactly the same individuals.

Loading Data

We can easily load the summary-level data into R via:

load('indep.snps.RData')
data <- load.summary(c('height.txt', 'weight.txt', 'bmi.txt'), indep.snps = indep.snps)

## loading data ...
## Progress:  33%  Progress:  67%  Progress: 100%  
## checking markers ...
## Progress:  67%  Progress: 100%  
## cleaning data ...
## Progress:  33%  Progress:  67%  Progress: 100%  
## correcting parameters ...
## Progress:  67%  Progress: 100%  
## adjusting sample size ... done.
## finalizing summary statistics ...
## Progress:  33%  Progress:  67%  Progress: 100%  
## samples partially overlap!
## estimating shrinkage phenotypic correlations ... done.

The first command reads a set of independent SNPs for correlation estimation. If you have your own set of independent markers to import, simply replace. The loaded data contains three sub-objects: $gwa, $cor.pheno and $var.pheno, where $gwa is a cleaned data frame of single-trait GWAS results, and the rest are shrinkage phenotypic correlation matrix and phenotypic variances, both estimated or given by user input.

head(data$gwa)

##           height.txt.beta height.txt.se weight.txt.beta weight.txt.se
## rs4747841          0.0025        0.0061          0.0003        0.0064
## rs4749917         -0.0025        0.0061         -0.0003        0.0064
## rs737656          -0.0073        0.0064         -0.0063        0.0066
## rs737657          -0.0075        0.0064         -0.0056        0.0066
## rs7086391         -0.0130        0.0079          0.0094        0.0083
## rs878177           0.0140        0.0066          0.0008        0.0069
##           bmi.txt.beta bmi.txt.se      f       n
## rs4747841       0.0005     0.0063 0.5500 58322.7
## rs4749917      -0.0005     0.0063 0.4500 58322.6
## rs737656       -0.0025     0.0066 0.3667 58322.7
## rs737657       -0.0020     0.0066 0.3583 58316.8
## rs7086391       0.0230     0.0082 0.2000 58322.5
## rs878177       -0.0100     0.0068 0.3000 58322.4

data$cor.pheno

##             height.txt weight.txt     bmi.txt
## height.txt  1.00000000 0.31558706 -0.16242127
## weight.txt  0.31558706 1.00000000  0.85458914
## bmi.txt    -0.16242127 0.85458914  1.00000000

data$var.pheno

## [1] 1 1 1

Default load.summary(..., est.var = FALSE) assumes all phenotypic variances are 1, which is a known value for GWAS with inverse-Gaussian transformation of the phenotypes. Setting est.var = TRUE will estimate the phenotypic variances using summary statistics, which is useful e.g. for case-control studies where the variance of liability can be estimated.

A Simple Pleiotropic Meta-Analysis

Once the data are successfully loaded, the simplest multi-trait pleiotropic analysis is straightforward:

result <- MultiSummary(data)

## Multi-trait genome scan ... Done.

The result is a list with two sub-objects $scan and $coef. For this simple analysis, only the data frame $scan is reported, where the column p gives the multi-trait analysis p-values.

head(result$scan)

##              marker   freq       n             p
## rs4747841 rs4747841 0.5500 58322.7 0.79726544739
## rs4749917 rs4749917 0.4500 58322.6 0.79726544741
## rs737656   rs737656 0.3667 58322.7 0.63421469840
## rs737657   rs737657 0.3583 58316.8 0.65263056840
## rs7086391 rs7086391 0.2000 58322.5 0.00043028835
## rs878177   rs878177 0.3000 58322.4 0.01823161595

The result is analog to the MANOVA analysis in R, such as manova(cbind(Trait1, Trait2, Trait3) ~ SNP).

Pleiotropic & Conditional Meta-Analysis in HWE Outbred Population

If we assume that our outbred populations for the three phenotypes follow Hardy-Weinberg equilibrium (HWE), we can then perform deeper pleiotropic meta-analysis, with more estimates including conditional genetic effects. For example:

result.out <- MultiSummary(data, type = 'outbred')

## Multi-trait genome scan ... Done.

The result is a list with two sub-objects $scan and $coef. Now, the data frame $scan is reported with more columns, corresponding to three conditional GWAS analyses, i.e. single-trait GWAS for each trait with the other traits included as covariates:

head(result.out$scan)

##              marker   freq       n             p    beta.score
## rs4747841 rs4747841 0.5500 58322.7 0.79726544739 1.8754020e-05
## rs4749917 rs4749917 0.4500 58322.6 0.79726544741 1.8754020e-05
## rs737656   rs737656 0.3667 58322.7 0.63421469840 3.3824150e-05
## rs737657   rs737657 0.3583 58316.8 0.65263056840 3.1213268e-05
## rs7086391 rs7086391 0.2000 58322.5 0.00043028835 3.9293148e-04
## rs878177   rs878177 0.3000 58322.4 0.01823161595 2.0098628e-04
##                se.score beta.cond.height.txt se.cond.height.txt
## rs4747841 7.3004733e-05        0.00279569491       0.0026842035
## rs4749917 7.3004733e-05       -0.00279569491       0.0026842058
## rs737656  7.1088553e-05       -0.00069077805       0.0027711098
## rs737657  6.9345748e-05       -0.00126907245       0.0027851258
## rs7086391 1.1160370e-04        0.00801804470       0.0033387984
## rs878177  8.5131924e-05       -0.00337208905       0.0029142862
##           p.cond.height.txt beta.cond.weight.txt se.cond.weight.txt
## rs4747841       0.297626798       -0.00133195655       0.0014126825
## rs4749917       0.297627212        0.00133195655       0.0014126838
## rs737656        0.803145371       -0.00056711104       0.0014584137
## rs737657        0.648633972       -0.00023895596       0.0014657971
## rs7086391       0.016329074       -0.00593304553       0.0017570168
## rs878177        0.247235918        0.00357046716       0.0015336614
##           p.cond.weight.txt beta.cond.bmi.txt se.cond.bmi.txt
## rs4747841     0.34575444223     1.3979808e-03    0.0014689944
## rs4749917     0.34575485575    -1.3979808e-03    0.0014689957
## rs737656      0.69738363269     3.3470484e-04    0.0015165516
## rs737657      0.87050176200     3.4480639e-05    0.0015242273
## rs7086391     0.00073341764     7.3051451e-03    0.0018269113
## rs878177      0.01990852313    -4.0863748e-03    0.0015947648
##           p.cond.bmi.txt
## rs4747841  3.4127027e-01
## rs4749917  3.4127068e-01
## rs737656   8.2532506e-01
## rs737657   9.8195202e-01
## rs7086391  6.3709785e-05
## rs878177   1.0396101e-02

The p column is analog to the MANOVA analysis in R, such as manova(cbind(Trait1, Trait2, Trait3) ~ SNP). The remaining three columns per trait are analog to the regression analysis in R, such as lm(Trait1 ~ Trait2 + Trait3 + SNP).

The $coef data frame in the result gives an alternative conditional analysis, i.e. the multiple regression of the SNP dosage on the phenotypes, analog to R analysis lm(SNP ~ Trait1 + Trait2 + Trait3).

Pleiotropic & Conditional Meta-Analysis in Inbred Population

In inbred populations, we have the allele frequencies being equal to the corresponding genotype frequencies. Therefore, different from a HWE outbred population where $genotype\sim Binomial(2,freq)$, we have $genotype\sim Bernoulli(freq)$. This does not change the pleiotropic analysis p-values, but will have an effect on the conditional analysis. The analysis using MultiABEL is very similar to above: MultiSummary(data, type = 'inbred').

Pleiotropic & Conditional Meta-Analysis with Precise Genotypic Variances

Regardless of HWE or the genotype frequencies, if the genotypic variance per variant is precisely calculated, one can perform the above analysis in a generic way. As an example, let us assume our summary statistics were indeed obtained in a HWE population, but somehow our minor allele frequencies (MAF) were over-estimated by 25%. We can then calculate the correct genotypic variances:

f <- data$gwa[,'f']
minor <- data$gwa[,'f'] <= 0.5
f[minor] <- data$gwa[minor,'f']*.8
f[!minor] <- 1 - (1 - data$gwa[!minor,'f'])*.8
gv <- 2*f*(1 - f)

Then, the correct pleiotropic and conditional meta analysis can be performed by:

result.pre <- MultiSummary(data, type = 'precise', vars = gv)

## Multi-trait genome scan ... Done.

We can see that the joint analysis is not affected by allele frequencies being incorrectly reported, but the conditional analysis is. Over-estimating MAF inflates the conditional analysis signals.

For Help

For direct R documentation of the two functions above, you can simply use question mark in R:

?load.summary
?MultiSummary

If you have specific questions, we advice you to use our GenABEL project forum at http://www.genabel.org and for urgent questions, you may email the maintainer of MultiABEL via xia dot shen at ed dot ac dot uk.