Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data. / Meisner, Jonas; Albrechtsen, Anders.
In: Genetics, Vol. 210, No. 2, 2018, p. 719-731.
RIS
TY - JOUR
T1 - Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data
AU - Meisner, Jonas
AU - Albrechtsen, Anders
N1 - Copyright © 2018, Genetics.
PY - 2018
Y1 - 2018
AB - We present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies and is often performed using principal component analysis or clustering-based approaches. Next-generation sequencing methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on the genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through principal component analysis, using an iterative heuristic approach to estimate individual allele frequencies, and demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework, available at http://www.popgen.dk/software/.
U2 - 10.1534/genetics.118.301336
DO - 10.1534/genetics.118.301336
M3 - Journal article
C2 - 30131346
VL - 210
SP - 719
EP - 731
JO - Genetics
JF - Genetics
SN - 1943-2631
IS - 2
ER -
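The abstract's remark that models can work directly on genotype likelihoods can be illustrated with a minimal sketch. It is not taken from PCAngsd, and the function names `genotype_posterior` and `expected_dosage` are hypothetical. It assumes a biallelic site, a Hardy-Weinberg prior built from a single given allele frequency, and likelihoods ordered by 0, 1, and 2 copies of the alternate allele. The published method instead estimates individual allele frequencies iteratively, which this sketch does not attempt.

```python
def genotype_posterior(likelihoods, freq):
    """Posterior genotype probabilities at one biallelic site.

    likelihoods: (L0, L1, L2), P(reads | genotype) for 0, 1, 2 alternate alleles.
    freq: assumed alternate-allele frequency, used for a Hardy-Weinberg prior.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    # Hardy-Weinberg prior over the three genotypes (an assumption of this sketch).
    prior = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
    weighted = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("likelihoods and prior have no overlapping support")
    return [w / total for w in weighted]


def expected_dosage(likelihoods, freq):
    """Expected alternate-allele count, E[G | reads], between 0 and 2."""
    post = genotype_posterior(likelihoods, freq)
    return post[1] + 2.0 * post[2]


if __name__ == "__main__":
    # Uninformative reads: the estimate falls back to the prior mean, 2 * freq.
    print(expected_dosage((1 / 3, 1 / 3, 1 / 3), 0.2))
    # Reads that only support the homozygous-alternate genotype.
    print(expected_dosage((0.0, 0.0, 1.0), 0.2))
```

With uninformative likelihoods the expected dosage equals twice the assumed frequency, which shows why the choice of frequency in the prior matters most at low depth, where the reads carry little information.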