Liftover from hg19 (b37) to b38 human genome reference

This page describes how to lift human genome positions from the b37 (hg19) assembly over to the b38 (hg38) assembly in R.

  1. Install BiocManager, which is needed to install Bioconductor packages:
install.packages('BiocManager')
  1. Install the following two Bioconductor packages, then restart R:
BiocManager::install('rtracklayer')
BiocManager::install('liftOver')
  1. Load the packages:
library(rtracklayer) 
library(liftOver) 
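  1. Import the chain file for the liftover. import.chain() expects the uncompressed chain file, either in the working directory or given with its full path: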
chain = import.chain('hg19ToHg38.over.chain')
  1. Read in your data with hg19 (b37) positions:
example <- read.table('orcades_XWAS_female_z1.tsv', header = TRUE)
head(example)[,1:7]
##           SNPID CHR     POS STRAND   N EFFECT_ALLELE REFERENCE_ALLELE
## 1 X_2699555_C_A   X 2699555      + 557             A                C
## 2 X_2699645_G_T   X 2699645      + 557             T                G
## 3 X_2699676_G_A   X 2699676      + 557             A                G
## 4 X_2699968_A_G   X 2699968      + 557             G                A
## 5 X_2700027_T_C   X 2700027      + 557             C                T
## 6 X_2700157_G_A   X 2700157      + 557             A                G
  1. Format the input as a data.frame with exactly four columns named c('chr', 'start', 'end', 'snp'): the chromosome with a 'chr' prefix (e.g., chr1), the start position, the end position (equal to the start position for single SNPs), and the SNP ID:
non_rs_pos_b37 <- data.frame(chr = rep('chrX', nrow(example)), 
                             start = example$POS, 
                             end = example$POS, 
                             snp = example$SNPID)
head(non_rs_pos_b37)
##    chr   start     end           snp
## 1 chrX 2699555 2699555 X_2699555_C_A
## 2 chrX 2699645 2699645 X_2699645_G_T
## 3 chrX 2699676 2699676 X_2699676_G_A
## 4 chrX 2699968 2699968 X_2699968_A_G
## 5 chrX 2700027 2700027 X_2700027_T_C
## 6 chrX 2700157 2700157 X_2700157_G_A
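The formatting step above hardcodes 'chrX' because the example holds X-chromosome data only. For input that spans several chromosomes, the chr column could instead be derived from the CHR column. The following is a minimal sketch, assuming CHR holds labels such as 1, 2, or X without the 'chr' prefix used in UCSC chain files; the chromosome 1 row is hypothetical and only there for illustration.

```r
# Toy input in the same layout as the example data (third row is hypothetical).
example <- data.frame(SNPID = c("X_2699555_C_A", "X_2699645_G_T", "1_1000_A_G"),
                      CHR = c("X", "X", "1"),
                      POS = c(2699555, 2699645, 1000),
                      stringsAsFactors = FALSE)

# Add the 'chr' prefix only where it is missing, so labels that are
# already prefixed are left untouched.
chr_label <- as.character(example$CHR)
needs_prefix <- !grepl("^chr", chr_label)
chr_label[needs_prefix] <- paste0("chr", chr_label[needs_prefix])

# Same four-column layout as in the formatting step.
non_rs_pos_b37 <- data.frame(chr = chr_label,
                             start = example$POS,
                             end = example$POS,
                             snp = example$SNPID,
                             stringsAsFactors = FALSE)
print(non_rs_pos_b37)
```

If CHR codes the X chromosome numerically (e.g., as 23), it would need to be recoded to X before adding the prefix; that case is not handled here.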
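  1. Run the code below to lift the positions over to b38:
non_rs_pos <- makeGRangesFromDataFrame(non_rs_pos_b37, keep.extra.columns = TRUE)
non_rs_b38 <- liftOver(non_rs_pos, chain)
non_rs_b38 <- as.data.frame(non_rs_b38) # one row per lifted range; 'group' is the input row index
# Put the b37 and b38 positions side by side
non_rs_b38 <- data.frame(snp = non_rs_b38$snp,
                         chr = sub('^chr', '', non_rs_b38$seqnames),
                         position.b37 = non_rs_pos_b37$start[non_rs_b38$group],
                         position.b38 = non_rs_b38$start)
head(non_rs_b38) # Some SNPs in b37 might be filtered out when lifted over to b38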
##             snp chr position.b37 position.b38
## 1 X_2699555_C_A   X      2699555      2781514
## 2 X_2699645_G_T   X      2699645      2781604
## 3 X_2699676_G_A   X      2699676      2781635
## 4 X_2699968_A_G   X      2699968      2781927
## 5 X_2700027_T_C   X      2700027      2781986
## 6 X_2700157_G_A   X      2700157      2782116
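
Because some b37 SNPs may be filtered out during liftover, it can help to check which ones were lost and whether any SNP mapped to more than one b38 position. The following is a minimal base R sketch on toy tables, assuming both tables carry an snp column as above; the two SNP IDs containing "hypothetical" and their positions are made up for illustration, while the other rows are taken from the output shown above.

```r
# Toy b37 input; the last two rows are hypothetical.
b37 <- data.frame(snp = c("X_2699555_C_A", "X_2699645_G_T", "X_2699676_G_A",
                          "X_hypothetical_1", "X_hypothetical_2"),
                  start = c(2699555, 2699645, 2699676, 1234567, 2345678),
                  stringsAsFactors = FALSE)

# Toy lifted table: hypothetical_1 is missing (not lifted) and
# hypothetical_2 appears twice (lifted to two places).
b38 <- data.frame(snp = c("X_2699555_C_A", "X_2699645_G_T", "X_2699676_G_A",
                          "X_hypothetical_2", "X_hypothetical_2"),
                  start = c(2781514, 2781604, 2781635, 3000000, 4000000),
                  stringsAsFactors = FALSE)

# SNPs present in the input but absent after liftover.
dropped <- setdiff(b37$snp, b38$snp)

# SNPs that appear more than once after liftover.
multi <- unique(b38$snp[duplicated(b38$snp)])

# Keep uniquely lifted SNPs and join the two builds on SNP ID.
clean <- b38[!b38$snp %in% multi, ]
merged <- merge(b37, clean, by = "snp", suffixes = c(".b37", ".b38"))

print(dropped)
print(multi)
print(merged)
```

Whether to discard SNPs with more than one b38 position or to resolve them some other way depends on the analysis; discarding them here is only one possible choice.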


Copyright © 2018-2020 Xia Shen PhD. All rights reserved.