FPGA acceleration of bioinformatics algorithms with Macah

by
Ben Weintraub

Abstract:

The problem of detecting similarities between different genetic sequences is fundamental to many research pursuits in biology and genetics. BLAST (Basic Local Alignment and Search Tool) is the most commonly used tool for identifying and assessing the significance of such similarities. With the quantity of available genetic sequence data rapidly increasing, improving the performance of the BLAST algorithm is a problem of great interest. BLAST compares a single query sequence against a database of known sequences, employing a heuristic algorithm that consists of three stages arranged in a pipeline, such that the output of one stage feeds into the input of the next stage. The initial stage searches for short fragments that are found in both the query sequence and a database sequence. The second stage extends these short word matches, attempting to produce high-scoring alignments centered on word matches without any gaps. The highest-scoring alignments are passed on to the third stage, which aligns the sequences while allowing for gaps. Several recent studies have successfully investigated the use of Field-Programmable Gate Arrays (FPGAs) to accelerate the execution of the BLAST algorithm, focusing on the first and second stages, which account for the vast majority of the algorithm?s execution time. While these results are encouraging, translating algorithms like BLAST that contain somewhat complex and unpredictable control flow and data access patterns into FPGA versions turns out to be quite difficult using currently available tools. FPGAs are usually programmed using Hardware Description Languages (HDLs), which are significantly more difficult to learn and use than standard programming languages. We present an accelerated version of the BLAST algorithm written in a new language, called Macah, which is designed to make the task of programming for FPGAs and FPGA-like architectures accessible to programmers comfortable with the widely known C language.

Advised by Scott Hauck and Benjamin Ylvisaker

CSE 203
Wednesday
May 21, 2008
4:30 - 5:20 pm