Author: Wu, Dai-Yang
Thesis title: Accelerating Protein Alignment by GPU
Advisor: Hon, Wing-Kai
Committee members: Lee, Che-Rung; Lu, Chin-Lung
Keywords: Protein Alignment, GPU, CUDA

Aligning biological sequences against a protein database is an important
step in bioinformatics research and applications. With the rapid growth of
sequencing technologies, sequence data has become increasingly difficult to
handle. BLASTX, a tool provided by NCBI, is the most popular alignment tool
because of its high sensitivity. However, it is too slow when aligning a
large dataset against a database.

In 2015, DIAMOND, a tool proposed by Buchfink, Xie, and Huson (Nature
Methods, 2015), sped up the alignment process significantly while
maintaining sensitivity similar to that of BLASTX. However, DIAMOND is
still slow when the query data is large. Several acceleration techniques
have been studied to improve the speed of DIAMOND. For instance,
AC-DIAMOND (Mai et al., Proc. BIBE, 2016) uses CPU SIMD instructions and
reports a 4-fold overall speedup over DIAMOND; HAMOND (Yu et al.,
J. Biotechnology, 2017) parallelizes DIAMOND on the Hadoop distributed
system.
Despite the many recent successes in applying GPU technology to speed up
algorithms, there is no GPU-accelerated version of DIAMOND.

In this thesis, we present CU-DIAMOND, an efficient GPU-accelerated
version of DIAMOND. Experimental results show that CU-DIAMOND achieves a
10-fold speedup in the most time-consuming alignment part of DIAMOND and a
4-fold overall speedup over DIAMOND (and a 33% speedup over AC-DIAMOND),
while sensitivity remains the same.
1 Introduction
2 Preliminaries
2.1 Protein Alignment
2.2 Smith-Waterman Alignment Algorithm
2.3 Seed-and-Extend Paradigm
2.4 GPU Architecture and CUDA Programming Model
2.5 SIMD Instructions in CPU
3 Review on DIAMOND and AC-DIAMOND
3.1 Indexing
3.2 Match Filtering
3.3 Final Scoring
3.4 Bottlenecks
3.5 AC-DIAMOND
4 Methods
4.1 Indexing
4.2 Final Scoring
5 Experimental Results
6 Conclusion and Further Work
