A better sequence-read generator program for metagenomics

Authors

  • Stephen Eric Johnson, University of Saskatchewan, Saskatoon
  • Brett Trost, University of Saskatchewan, Saskatoon
  • Jeffrey R Long, University of Saskatchewan, Saskatoon
  • Anthony Kusalik, University of Saskatchewan, Saskatoon

DOI

https://doi.org/10.14806/ej.19.A.634

Abstract

Many programs are available for generating simulated metagenomic sequence reads. However, the data these programs generate follow rigid models, which restricts each program to the uses its authors originally intended. For example, many popular simulators only generate reads whose lengths follow uniform or normal distributions. To our knowledge, no existing program allows a user to generate simulated data that follow non-parametric read-length distributions and quality profiles derived from empirical next-generation sequencing (NGS) data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine learning approach to generate reads whose lengths and quality values mimic empirically derived distributions. BEAR can emulate reads from various NGS platforms, including Illumina, 454 and Ion Torrent. BEAR requires minimal user input because it automatically determines appropriate internal parameter settings.
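The abstract does not describe BEAR's machine learning approach in detail. For comparison only, the following is a minimal Python sketch of the simpler underlying idea: drawing read lengths and per-position quality scores directly from empirical (non-parametric) histograms instead of from a uniform or normal model. It is not BEAR's method. All function names are hypothetical, the toy input data are invented for illustration, positions beyond the longest observed read are assumed to reuse the last position's quality distribution, and sequencing errors are not modelled.

```python
import random
from collections import Counter

# Hypothetical helper names throughout; this is NOT BEAR's implementation.
# Read-length model: an empirical histogram of observed lengths, so sampling
# reproduces whatever shape the real data has (no parametric assumption).
def build_length_model(observed_lengths):
    counts = Counter(observed_lengths)
    lengths = sorted(counts)
    weights = [counts[n] for n in lengths]
    return lengths, weights


def sample_lengths(length_model, k, rng):
    lengths, weights = length_model
    # Draw k lengths with replacement, proportional to how often each was observed.
    return rng.choices(lengths, weights=weights, k=k)


# Quality model: for each read position, count the Phred scores seen there.
def build_quality_model(observed_quals):
    per_position = []
    for quals in observed_quals:
        for i, q in enumerate(quals):
            if i == len(per_position):
                per_position.append(Counter())
            per_position[i][q] += 1
    return per_position


def sample_qualities(quality_model, length, rng):
    quals = []
    for i in range(length):
        # Assumption: positions beyond the longest observed read reuse the
        # last position's distribution.
        dist = quality_model[min(i, len(quality_model) - 1)]
        scores = sorted(dist)
        quals.append(rng.choices(scores, weights=[dist[s] for s in scores])[0])
    return quals


def to_phred33(quals):
    # Sanger / Illumina 1.8+ FASTQ encoding: ASCII character = Phred score + 33.
    return "".join(chr(q + 33) for q in quals)


def simulate_read(reference, length_model, quality_model, rng):
    # Clamp the sampled length so the read fits inside the reference.
    n = min(sample_lengths(length_model, 1, rng)[0], len(reference))
    start = rng.randrange(len(reference) - n + 1)
    seq = reference[start:start + n]  # error-free bases; no error model here
    qual = to_phred33(sample_qualities(quality_model, n, rng))
    return seq, qual


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy "empirical" data standing in for statistics from a real NGS run.
    length_model = build_length_model([100, 100, 150, 150, 150, 250])
    quality_model = build_quality_model([[38, 37, 35], [36, 36, 30, 28], [39, 38]])
    reference = "ACGT" * 100
    for r in range(3):
        seq, qual = simulate_read(reference, length_model, quality_model, rng)
        print(f"@sim_read_{r}\n{seq}\n+\n{qual}")
```

Because each length or score is drawn in proportion to how often it was observed, a skewed or multimodal length distribution in the input is reproduced in the output, which a single uniform or normal model cannot capture.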

Author Biographies

  • Stephen Eric Johnson, University of Saskatchewan, Saskatoon
    MSc candidate in the Department of Computer Science
  • Brett Trost, University of Saskatchewan, Saskatoon
    PhD candidate in the Department of Computer Science
  • Jeffrey R Long, University of Saskatchewan, Saskatoon
    Professional research associate in the Department of Computer Science
  • Anthony Kusalik, University of Saskatchewan, Saskatoon
    Professor in the Department of Computer Science

Published

2013-04-08

Section

Posters