Binning microbial genomes using deep learning

Research output: Working paper › Preprint › Research

Standard

Binning microbial genomes using deep learning. / Nissen, Jakob Nybo; Sønderby, Casper Kaae; Armenteros, Jose Juan Almagro; Grønbech, Christopher Heje; Nielsen, Henrik Bjørn; Petersen, Thomas Nordahl; Winther, Ole; Rasmussen, Simon.

2018.

Harvard

Nissen, JN, Sønderby, CK, Armenteros, JJA, Grønbech, CH, Nielsen, HB, Petersen, TN, Winther, O & Rasmussen, S 2018 'Binning microbial genomes using deep learning'. https://doi.org/10.1101/490078

APA

Nissen, J. N., Sønderby, C. K., Armenteros, J. J. A., Grønbech, C. H., Nielsen, H. B., Petersen, T. N., Winther, O., & Rasmussen, S. (2018). Binning microbial genomes using deep learning. bioRxiv https://doi.org/10.1101/490078

Vancouver

Nissen JN, Sønderby CK, Armenteros JJA, Grønbech CH, Nielsen HB, Petersen TN et al. Binning microbial genomes using deep learning. 2018. https://doi.org/10.1101/490078

Author

Nissen, Jakob Nybo ; Sønderby, Casper Kaae ; Armenteros, Jose Juan Almagro ; Grønbech, Christopher Heje ; Nielsen, Henrik Bjørn ; Petersen, Thomas Nordahl ; Winther, Ole ; Rasmussen, Simon. / Binning microbial genomes using deep learning. 2018. (bioRxiv).

Bibtex

@techreport{59836dcc003c4bcaa8547f2a309dd8f9,
title = "Binning microbial genomes using deep learning",
abstract = "Identification and reconstruction of microbial species from metagenomics wide genome sequencing data is an important and challenging task. Current existing approaches rely on gene or contig co-abundance information across multiple samples and k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to integrate these two heterogeneous datasets without any prior knowledge and that our method outperforms existing state-of-the-art by reconstructing 1.8 - 8 times more highly precise and complete genome bins from three different benchmark datasets. Additionally, we apply our method to a gene catalogue of almost 10 million genes and 1,270 samples from the human gut microbiome. Here we are able to cluster 1.3 - 1.8 million extra genes and reconstruct 117 - 246 more highly precise and complete bins of which 70 bins were completely new compared to previous methods. Our method Variational Autoencoders for Metagenomic Binning (VAMB) is freely available at: https://github.com/jakobnissen/vamb",
author = "Nissen, {Jakob Nybo} and S{\o}nderby, {Casper Kaae} and Armenteros, {Jose Juan Almagro} and Gr{\o}nbech, {Christopher Heje} and Nielsen, {Henrik Bj{\o}rn} and Petersen, {Thomas Nordahl} and Winther, Ole and Rasmussen, Simon",
year = "2018",
doi = "10.1101/490078",
language = "English",
series = "bioRxiv",
publisher = "Cold Spring Harbor Laboratory",
type = "WorkingPaper",
institution = "Cold Spring Harbor Laboratory",

}

RIS

TY - UNPB

T1 - Binning microbial genomes using deep learning

AU - Nissen, Jakob Nybo

AU - Sønderby, Casper Kaae

AU - Armenteros, Jose Juan Almagro

AU - Grønbech, Christopher Heje

AU - Nielsen, Henrik Bjørn

AU - Petersen, Thomas Nordahl

AU - Winther, Ole

AU - Rasmussen, Simon

PY - 2018

Y1 - 2018

N2 - Identification and reconstruction of microbial species from metagenomics wide genome sequencing data is an important and challenging task. Current existing approaches rely on gene or contig co-abundance information across multiple samples and k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to integrate these two heterogeneous datasets without any prior knowledge and that our method outperforms existing state-of-the-art by reconstructing 1.8 - 8 times more highly precise and complete genome bins from three different benchmark datasets. Additionally, we apply our method to a gene catalogue of almost 10 million genes and 1,270 samples from the human gut microbiome. Here we are able to cluster 1.3 - 1.8 million extra genes and reconstruct 117 - 246 more highly precise and complete bins of which 70 bins were completely new compared to previous methods. Our method Variational Autoencoders for Metagenomic Binning (VAMB) is freely available at: https://github.com/jakobnissen/vamb

U2 - 10.1101/490078

DO - 10.1101/490078

M3 - Preprint

T3 - bioRxiv

BT - Binning microbial genomes using deep learning

ER -

ID: 322792857