Read indexing

Authors

  • Nicolas Philippe
  • Mikael Salson
  • Thierry Lecroq
  • Martine Leonard
  • Therese Commes
  • Eric Rivals

DOI:

https://doi.org/10.14806/ej.17.B.289

Keywords:

next generation sequencing, COST, read indexing

Abstract

http://www.lirmm.fr/~rivals


The question of read indexing remains largely unexplored. However, the increase in sequencing throughput calls for new algorithmic solutions to query large read collections efficiently. We propose a solution, named Gk arrays, to index large collections of reads, together with an algorithm to build the structure and procedures to query it. Once constructed, the index structure is kept in main memory and is repeatedly accessed to answer various types of queries. We compare our data structure to other possible solutions to investigate its scalability and computational efficiency. Gk arrays are implemented in a general-purpose library, which may prove useful for assembly purposes, for evaluating expression levels in RNA-seq, and for other high-throughput sequencing applications.
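
To make the idea of querying an in-memory read index concrete, here is a minimal Python sketch. It is not the Gk arrays structure or its library API: it uses a plain hash table as a naive stand-in, and the function names (build_kmer_index, count_occurrences, reads_containing), the k-mer length, and the toy reads are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the (read_id, offset) pairs where it occurs.

    Naive hash-based stand-in for a read index; NOT the Gk arrays layout.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        # Slide a window of length k over the read.
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def count_occurrences(index, kmer):
    """Total number of occurrences of kmer across all reads."""
    return len(index.get(kmer, ()))


def reads_containing(index, kmer):
    """Sorted identifiers of the reads that contain kmer at least once."""
    return sorted({read_id for read_id, _ in index.get(kmer, ())})


# Toy collection of reads (hypothetical data).
reads = ["ACGTAC", "GTACGT", "TTTTTT"]
index = build_kmer_index(reads, 4)
```

The index is built once and then queried repeatedly, which mirrors the usage pattern described above. A hash table of this kind stores every k-mer explicitly, so its memory footprint grows quickly on real read collections; that cost is the kind of scalability concern a dedicated structure is meant to address.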


References
1. Querying large read collections in main memory: a versatile data structure. N. Philippe, M. Salson, T. Lecroq, M. Leonard, T. Commes and E. Rivals. BMC Bioinformatics, Vol. 12, article 242, doi:10.1186/1471-2105-12-242, 2011.


Relevant Web sites
2. http://crac.gforge.inria.fr/gkarrays/
3. http://www.atgc-montpellier.fr/ngs/

Published

2012-02-28

Section

Posters