Run the following command in R to install the **MultiABEL** package:

`install.packages("MultiABEL", repos = "http://R-Forge.R-project.org")`

For Mac users, please download the source .tar.gz and install via:

`R CMD INSTALL MultiABEL_1.1-x.tar.gz`

in your terminal. You need Fortran 4.8 compiler.

In R, load the package via:

`library(MultiABEL)`

```
##
## MultiABEL: Multi-Trait Genome-Wide Association Analyses
```

`## Version 1.1-5 (2016-05-07) installed`

`## Author: Xia Shen`

`## Maintainer: Xia Shen <xia.shen@ki.se>`

`## Use citation("MultiABEL") to know how to cite this work.`

or

`require(MultiABEL)`

You need to gather multiple GWAS summary statistics into your working directory. Let us demonstrate this via an example, and the example files can be found at https://goo.gl/r0Mcav. Here, we have the summary statistics for three traits: height, weight, body mass index (BMI), across 890 variants. There is also an .RData file *indep.snps.RData* to download, which contains the names of a set of independent variants in European ancestry populations. These variants are used for estimating the phenotypic correlations if unknown.

Each file of single-trait GWAS summary statistics should contain columns for variant names (default column name `snp`

), the first (coding or reference) alleles (default column name `a1`

), allele frequencies (default column name `freq`

), effect sizes (default column name `beta`

), standard errors (default column name `se`

), and sample sizes (default column name `n`

). For example, here the top of the summary statistics file for height looks like:

```
height <- read.table('height.txt', header = TRUE)
head(height)
```

```
## snp a1 a2 freq beta se p n
## 1 rs4747841 a g 0.5500 0.0025 0.0061 0.680 60558.2
## 2 rs4749917 t c 0.4500 -0.0025 0.0061 0.680 60558.1
## 3 rs737656 a g 0.3667 -0.0073 0.0064 0.250 60529.2
## 4 rs737657 a g 0.3583 -0.0075 0.0064 0.240 60527.3
## 5 rs17524355 t c NA -0.0460 0.0460 0.320 1700.0
## 6 rs7086391 t c 0.2000 -0.0130 0.0079 0.099 59675.0
```

where the columns `p`

and `a2`

are extra information. For default simple pleiotropic analysis (`MultiSummary(..., type = 'direct')`

, see below), `n`

is not essentially required, if unknown, simply give a large number, e.g.Â 10,000. For `MultiSummary(..., type = 'precise')`

, `freq`

can be any values between 0 and 1, as the exact genotypic variances are given.

**MultiABEL** does NOT require different single-trait GWAS having been performed in exactly the same individuals.

We can easily load the summary-level data into R via:

```
load('indep.snps.RData')
data <- load.summary(c('height.txt', 'weight.txt', 'bmi.txt'), indep.snps = indep.snps)
```

```
## loading data ...
## Progress: 33% Progress: 67% Progress: 100%
## checking markers ...
## Progress: 67% Progress: 100%
## cleaning data ...
## Progress: 33% Progress: 67% Progress: 100%
## correcting parameters ...
## Progress: 67% Progress: 100%
## adjusting sample size ... done.
## finalizing summary statistics ...
## Progress: 33% Progress: 67% Progress: 100%
## samples partially overlap!
## estimating shrinkage phenotypic correlations ... done.
```

The first command reads a set of independent SNPs for correlation estimation. If you have your own set of independent markers to import, simply replace. The loaded data contains three sub-objects: `$gwa`

, `$cor.pheno`

and `$var.pheno`

, where `$gwa`

is a cleaned data frame of single-trait GWAS results, and the rest are shrinkage phenotypic correlation matrix and phenotypic variances, both estimated or given by user input.

`head(data$gwa)`

```
## height.txt.beta height.txt.se weight.txt.beta weight.txt.se
## rs4747841 0.0025 0.0061 0.0003 0.0064
## rs4749917 -0.0025 0.0061 -0.0003 0.0064
## rs737656 -0.0073 0.0064 -0.0063 0.0066
## rs737657 -0.0075 0.0064 -0.0056 0.0066
## rs7086391 -0.0130 0.0079 0.0094 0.0083
## rs878177 0.0140 0.0066 0.0008 0.0069
## bmi.txt.beta bmi.txt.se f n
## rs4747841 0.0005 0.0063 0.5500 58322.7
## rs4749917 -0.0005 0.0063 0.4500 58322.6
## rs737656 -0.0025 0.0066 0.3667 58322.7
## rs737657 -0.0020 0.0066 0.3583 58316.8
## rs7086391 0.0230 0.0082 0.2000 58322.5
## rs878177 -0.0100 0.0068 0.3000 58322.4
```

`data$cor.pheno`

```
## height.txt weight.txt bmi.txt
## height.txt 1.00000000 0.31558706 -0.16242127
## weight.txt 0.31558706 1.00000000 0.85458914
## bmi.txt -0.16242127 0.85458914 1.00000000
```

`data$var.pheno`

`## [1] 1 1 1`

Default `load.summary(..., est.var = FALSE)`

assumes all phenotypic variances are 1, which is a known value for GWAS with inverse-Gaussian transformation of the phenotypes. Setting `est.var = TRUE`

will estimate the phenotypic variances using summary statistics, which is useful e.g.Â for case-control studies where the variance of liability can be estimated.

Once the data are successfully loaded, the simplest multi-trait pleiotropic analysis is straightforward:

`result <- MultiSummary(data)`

`## Multi-trait genome scan ... Done.`

The result is a list with two sub-objects `$scan`

and `$coef`

. For this simple analysis, only the data frame `$scan`

is reported, where the column `p`

gives the multi-trait analysis p-values.

`head(result$scan)`

```
## marker freq n p
## rs4747841 rs4747841 0.5500 58322.7 0.79726544739
## rs4749917 rs4749917 0.4500 58322.6 0.79726544741
## rs737656 rs737656 0.3667 58322.7 0.63421469840
## rs737657 rs737657 0.3583 58316.8 0.65263056840
## rs7086391 rs7086391 0.2000 58322.5 0.00043028835
## rs878177 rs878177 0.3000 58322.4 0.01823161595
```

The result is analog to the MANOVA analysis in R, such as `manova(cbind(Trait1, Trait2, Trait3) ~ SNP)`

.

If we assume that our outbred populations for the three phenotypes follow Hardy-Weinberg equilibrium (HWE), we can then perform deeper pleiotropic meta-analysis, with more estimates including conditional genetic effects. For example:

`result.out <- MultiSummary(data, type = 'outbred')`

`## Multi-trait genome scan ... Done.`

The result is a list with two sub-objects `$scan`

and `$coef`

. Now, the data frame `$scan`

is reported with more columns, corresponding to three *conditional* GWAS analyses, i.e.Â single-trait GWAS for each trait with the other traits included as covariates:

`head(result.out$scan)`

```
## marker freq n p beta.score
## rs4747841 rs4747841 0.5500 58322.7 0.79726544739 1.8754020e-05
## rs4749917 rs4749917 0.4500 58322.6 0.79726544741 1.8754020e-05
## rs737656 rs737656 0.3667 58322.7 0.63421469840 3.3824150e-05
## rs737657 rs737657 0.3583 58316.8 0.65263056840 3.1213268e-05
## rs7086391 rs7086391 0.2000 58322.5 0.00043028835 3.9293148e-04
## rs878177 rs878177 0.3000 58322.4 0.01823161595 2.0098628e-04
## se.score beta.cond.height.txt se.cond.height.txt
## rs4747841 7.3004733e-05 0.00279569491 0.0026842035
## rs4749917 7.3004733e-05 -0.00279569491 0.0026842058
## rs737656 7.1088553e-05 -0.00069077805 0.0027711098
## rs737657 6.9345748e-05 -0.00126907245 0.0027851258
## rs7086391 1.1160370e-04 0.00801804470 0.0033387984
## rs878177 8.5131924e-05 -0.00337208905 0.0029142862
## p.cond.height.txt beta.cond.weight.txt se.cond.weight.txt
## rs4747841 0.297626798 -0.00133195655 0.0014126825
## rs4749917 0.297627212 0.00133195655 0.0014126838
## rs737656 0.803145371 -0.00056711104 0.0014584137
## rs737657 0.648633972 -0.00023895596 0.0014657971
## rs7086391 0.016329074 -0.00593304553 0.0017570168
## rs878177 0.247235918 0.00357046716 0.0015336614
## p.cond.weight.txt beta.cond.bmi.txt se.cond.bmi.txt
## rs4747841 0.34575444223 1.3979808e-03 0.0014689944
## rs4749917 0.34575485575 -1.3979808e-03 0.0014689957
## rs737656 0.69738363269 3.3470484e-04 0.0015165516
## rs737657 0.87050176200 3.4480639e-05 0.0015242273
## rs7086391 0.00073341764 7.3051451e-03 0.0018269113
## rs878177 0.01990852313 -4.0863748e-03 0.0015947648
## p.cond.bmi.txt
## rs4747841 3.4127027e-01
## rs4749917 3.4127068e-01
## rs737656 8.2532506e-01
## rs737657 9.8195202e-01
## rs7086391 6.3709785e-05
## rs878177 1.0396101e-02
```

The `p`

column is analog to the MANOVA analysis in R, such as `manova(cbind(Trait1, Trait2, Trait3) ~ SNP)`

. The remaining three columns per trait are analog to the regression analysis in R, such as `lm(Trait1 ~ Trait2 + Trait3 + SNP)`

.

The `$coef`

data frame in the result gives an alternative conditional analysis, i.e.Â the multiple regression of the SNP dosage on the phenotypes, analog to R analysis `lm(SNP ~ Trait1 + Trait2 + Trait3)`

.

In inbred populations, we have the allele frequencies being equal to the corresponding genotype frequencies. Therefore, different from a HWE outbred population where \(genotype\sim Binomial(2,freq)\), we have \(genotype\sim Bernoulli(freq)\). This does not change the pleiotropic analysis p-values, but will have an effect on the conditional analysis. The analysis using **MultiABEL** is very similar to above: `MultiSummary(data, type = 'inbred')`

.

Regardless of HWE or the genotype frequencies, if the genotypic variance per variant is precisely calculated, one can perform the above analysis in a generic way. As an example, let us assume our summary statistics were indeed obtained in a HWE population, but somehow our minor allele frequencies (MAF) were over-estimated by 25%. We can then calculate the correct genotypic variances:

```
f <- data$gwa[,'f']
minor <- data$gwa[,'f'] <= 0.5
f[minor] <- data$gwa[minor,'f']*.8
f[!minor] <- 1 - (1 - data$gwa[!minor,'f'])*.8
gv <- 2*f*(1 - f)
```

Then, the correct pleiotropic and conditional meta analysis can be performed by:

`result.pre <- MultiSummary(data, type = 'precise', vars = gv)`

`## Multi-trait genome scan ... Done.`

We can see that the joint analysis is not affected by allele frequencies being incorrectly reported, but the conditional analysis is. Over-estimating MAF inflates the conditional analysis signals.

For direct R documentation of the two functions above, you can simply use question mark in R:

```
?load.summary
?MultiSummary
```

If you have specific questions, we advice you to use our GenABEL project forum at http://www.genabel.org and for urgent questions, you may email the maintainer of MultiABEL via *xia dot shen at ed dot ac dot uk*.