Parallel Algorithm for indexing large DNA Sequences Using MapReduce on Hadoop

Conference proceedings article

Authors / Editors

Dinakenyane, Otlhapile

Publication Details

Author list: Kaniwa F, Dinakenyane O, Kuthadi VM

Place: NEW YORK

Publication year: 2017

Journal: 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) (2156-1125)

Journal acronym: IEEE INT C BIOINFORM

Start page: 1576

End page: 1582

Number of pages: 7

eISBN: 978-1-5090-3050-7

ISSN: 2156-1125

eISSN: 2156-1133

Languages: English-Great Britain (EN-GB)

Abstract

MapReduce has recently become a very successful parallel processing technique. The latest DNA sequencing technologies can now generate huge DNA sequences easily and cheaply. Consequently, mining patterns in these sequences is a challenge for single-core processor systems, which deliver unsatisfactory performance. In this paper, we address this challenge by using MapReduce on the Hadoop platform together with a successful data structure called the generalized suffix tree. Our experimental results show that the proposed approach can index long sequences with better performance than previous related approaches.

Keywords

DNA Sequences, MapReduce, Parallel Algorithm, Suffix Trees
