比对工具 BWA

BWA 全名 Burrow-Wheeler Aligner

BWA是一款将DNA序列mapping到参考基因组上的软件,例如比对到人类基因组。其由三个算法组成BWA-backtrack,BWA-SW和BWA-MEM。在该软件作者的github上可以看到对三个算法的不同用处的解释https://github.com/lh3/bwa

  1. The first algorithm is designed for Illumina sequence reads up to 100bp, while the rest two for longer sequences ranged from 70bp to a few megabases.
  2. BWA-MEM and BWA-SW share similar features such as the support of long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate.
  3. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

并且在BWA命令中可以分别调用这三个算法,如:aln/samse/sampe for BWA-backtrack,bwasw for BWA-SW,mem for the BWA-MEM

BWA下载及安装

下载地址:https://sourceforge.net/projects/bio-bwa/

tar jxvf bwa-0.7.15.tar.bz2
cd bwa-0.7.15
make

可以用绝对路径调用,也可以加入环境变量

BWA使用步骤

我们使用BWA对短序列进行比对时,一般是使用BWA-MEM算法,按照作者的意思,BWA-MEM更加适合于Illumina 70-100bp reads的比对

  1. 使用BWA构建参考基因组的index

    bwa index [-p prefix] [-a algoType] <in.db.fasta>
    

    -p 输出文件的前缀,例如对hg19.fa建索引,那么输出文件前缀就写hg19

    -a 选择构建索引的算法,

    is 是默认的算法,是由于其快速简单,但是不适合超过2G的基因组,因此人类基因组需要指定另一个算法

    bwtsw 对于基因组大于2G的,适合使用该算法,比如人类基因组

  2. 使用BWA-MEM算法进行比对

    bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth] [-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore] [-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen] [-U unpairPen] [-R RGline] [-v verboseLevel] db.prefix reads.fq [mates.fq]
    

    其实我们用不到这么多参数,主要的一般是以下几个:

    -t 设置线程数

    -R 设置read标头

    Complete read group header line. ’\t’ can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is ’@RG\tID:foo\tSM:bar’.

    比如模仿一个:-R @RG\tID:sample_name\tSM:sample_name\tLB:WGS\tPL:Illumina

    -M 将较短的split hits标记为secondary,与picard兼容

最后来一个例子:

# 建索引
bwa index -a bwtsw -p hg19 hg19.fa 1>hg19.bwa_index.log 2>&1
# 比对
bwa mem -t 5 -M -R @RG\tID:KPGP-00001_L1\tSM:KPGP-00001_L1\tLB:WGS\tPL:Illumina ~/reference/index/bwa/hg19  KPGP-00001_L1_R1.fq.gz KPGP-00001_L1_R2.fq.gz 1>KPGP-00001_L1.sam 2>KPGP-00001_L1.bwa.align.log