Haplotype and population structure inference using neural networks in whole-genome sequencing data

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Haplotype and population structure inference using neural networks in whole-genome sequencing data. / Meisner, Jonas; Albrechtsen, Anders.

In: Genome Research, Vol. 32, No. 8, 2022, p. 1542-1552.

Harvard

Meisner, J & Albrechtsen, A 2022, 'Haplotype and population structure inference using neural networks in whole-genome sequencing data', Genome Research, vol. 32, no. 8, pp. 1542-1552. https://doi.org/10.1101/gr.276813.122

APA

Meisner, J., & Albrechtsen, A. (2022). Haplotype and population structure inference using neural networks in whole-genome sequencing data. Genome Research, 32(8), 1542-1552. https://doi.org/10.1101/gr.276813.122

Vancouver

Meisner J, Albrechtsen A. Haplotype and population structure inference using neural networks in whole-genome sequencing data. Genome Research. 2022;32(8):1542-1552. https://doi.org/10.1101/gr.276813.122

Author

Meisner, Jonas ; Albrechtsen, Anders. / Haplotype and population structure inference using neural networks in whole-genome sequencing data. In: Genome Research. 2022 ; Vol. 32, No. 8. pp. 1542-1552.

Bibtex

@article{6695fedb4e7249cdb8462bf9839ba1ca,
title = "Haplotype and population structure inference using neural networks in whole-genome sequencing data",
abstract = "Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.",
keywords = "INDIVIDUAL ADMIXTURE",
author = "Jonas Meisner and Anders Albrechtsen",
year = "2022",
doi = "10.1101/gr.276813.122",
language = "English",
volume = "32",
pages = "1542--1552",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "8",

}

RIS

TY - JOUR

T1 - Haplotype and population structure inference using neural networks in whole-genome sequencing data

AU - Meisner, Jonas

AU - Albrechtsen, Anders

PY - 2022

Y1 - 2022

N2 - Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.

AB - Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.

KW - INDIVIDUAL ADMIXTURE

U2 - 10.1101/gr.276813.122

DO - 10.1101/gr.276813.122

M3 - Journal article

C2 - 35794006

VL - 32

SP - 1542

EP - 1552

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 8

ER -
