Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. / Atabaki-Pasdar, Naeimeh; Ohlsson, Mattias; Viñuela, Ana; Frau, Francesca; Pomares-Millan, Hugo; Haid, Mark; Jones, Angus G; Thomas, E Louise; Koivula, Robert W; Kurbasic, Azra; Mutie, Pascal M; Fitipaldi, Hugo; Fernandez, Juan; Dawed, Adem Y; Giordano, Giuseppe N; Forgie, Ian M; McDonald, Timothy J; Rutters, Femke; Cederberg, Henna; Chabanova, Elizaveta; Dale, Matilda; Masi, Federico De; Thomas, Cecilia Engel; Allin, Kristine H.; Hansen, Tue H; Heggie, Alison; Hong, Mun-Gwan; Elders, Petra J M; Kennedy, Gwen; Kokkola, Tarja; Pedersen, Helle Krogh; Mahajan, Anubha; McEvoy, Donna; Pattou, Francois; Raverdy, Violeta; Häussler, Ragna S; Sharma, Sapna; Thomsen, Henrik S; Vangipurapu, Jagadish; Vestergaard, Henrik; Adamski, Jerzy; Musholt, Petra B; Brage, Søren; Brunak, Søren; Dermitzakis, Emmanouil; Frost, Gary; Hansen, Torben; Laakso, Markku; Pedersen, Oluf; IMI-DIRECT consortium.

In: PLoS Medicine, Vol. 17, No. 6, e1003149, 2020.

Harvard

Atabaki-Pasdar, N, Ohlsson, M, Viñuela, A, Frau, F, Pomares-Millan, H, Haid, M, Jones, AG, Thomas, EL, Koivula, RW, Kurbasic, A, Mutie, PM, Fitipaldi, H, Fernandez, J, Dawed, AY, Giordano, GN, Forgie, IM, McDonald, TJ, Rutters, F, Cederberg, H, Chabanova, E, Dale, M, Masi, FD, Thomas, CE, Allin, KH, Hansen, TH, Heggie, A, Hong, M-G, Elders, PJM, Kennedy, G, Kokkola, T, Pedersen, HK, Mahajan, A, McEvoy, D, Pattou, F, Raverdy, V, Häussler, RS, Sharma, S, Thomsen, HS, Vangipurapu, J, Vestergaard, H, Adamski, J, Musholt, PB, Brage, S, Brunak, S, Dermitzakis, E, Frost, G, Hansen, T, Laakso, M, Pedersen, O & IMI-DIRECT consortium 2020, 'Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts', PLoS Medicine, vol. 17, no. 6, e1003149. https://doi.org/10.1371/journal.pmed.1003149

APA

Atabaki-Pasdar, N., Ohlsson, M., Viñuela, A., Frau, F., Pomares-Millan, H., Haid, M., Jones, A. G., Thomas, E. L., Koivula, R. W., Kurbasic, A., Mutie, P. M., Fitipaldi, H., Fernandez, J., Dawed, A. Y., Giordano, G. N., Forgie, I. M., McDonald, T. J., Rutters, F., Cederberg, H., ... IMI-DIRECT consortium (2020). Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Medicine, 17(6), e1003149. https://doi.org/10.1371/journal.pmed.1003149

Vancouver

Atabaki-Pasdar N, Ohlsson M, Viñuela A, Frau F, Pomares-Millan H, Haid M et al. Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Medicine. 2020;17(6):e1003149. https://doi.org/10.1371/journal.pmed.1003149

Author

Atabaki-Pasdar, Naeimeh ; Ohlsson, Mattias ; Viñuela, Ana ; Frau, Francesca ; Pomares-Millan, Hugo ; Haid, Mark ; Jones, Angus G ; Thomas, E Louise ; Koivula, Robert W ; Kurbasic, Azra ; Mutie, Pascal M ; Fitipaldi, Hugo ; Fernandez, Juan ; Dawed, Adem Y ; Giordano, Giuseppe N ; Forgie, Ian M ; McDonald, Timothy J ; Rutters, Femke ; Cederberg, Henna ; Chabanova, Elizaveta ; Dale, Matilda ; Masi, Federico De ; Thomas, Cecilia Engel ; Allin, Kristine H. ; Hansen, Tue H ; Heggie, Alison ; Hong, Mun-Gwan ; Elders, Petra J M ; Kennedy, Gwen ; Kokkola, Tarja ; Pedersen, Helle Krogh ; Mahajan, Anubha ; McEvoy, Donna ; Pattou, Francois ; Raverdy, Violeta ; Häussler, Ragna S ; Sharma, Sapna ; Thomsen, Henrik S ; Vangipurapu, Jagadish ; Vestergaard, Henrik ; Adamski, Jerzy ; Musholt, Petra B ; Brage, Søren ; Brunak, Søren ; Dermitzakis, Emmanouil ; Frost, Gary ; Hansen, Torben ; Laakso, Markku ; Pedersen, Oluf ; IMI-DIRECT consortium. / Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. In: PLoS Medicine. 2020; Vol. 17, No. 6.

Bibtex

@article{171d9155395a4941b8fc2f53df111d2a,
title = "Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts",
abstract = "BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. METHODS AND FINDINGS: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. CONCLUSIONS: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. TRIAL REGISTRATION: ClinicalTrials.gov NCT03814915.",
author = "Naeimeh Atabaki-Pasdar and Mattias Ohlsson and Ana Vi{\~n}uela and Francesca Frau and Hugo Pomares-Millan and Mark Haid and Jones, {Angus G} and Thomas, {E Louise} and Koivula, {Robert W} and Azra Kurbasic and Mutie, {Pascal M} and Hugo Fitipaldi and Juan Fernandez and Dawed, {Adem Y} and Giordano, {Giuseppe N} and Forgie, {Ian M} and McDonald, {Timothy J} and Femke Rutters and Henna Cederberg and Elizaveta Chabanova and Matilda Dale and Masi, {Federico De} and Thomas, {Cecilia Engel} and Allin, {Kristine H.} and Hansen, {Tue H} and Alison Heggie and Mun-Gwan Hong and Elders, {Petra J M} and Gwen Kennedy and Tarja Kokkola and Pedersen, {Helle Krogh} and Anubha Mahajan and Donna McEvoy and Francois Pattou and Violeta Raverdy and H{\"a}ussler, {Ragna S} and Sapna Sharma and Thomsen, {Henrik S} and Jagadish Vangipurapu and Henrik Vestergaard and Jerzy Adamski and Musholt, {Petra B} and S{\o}ren Brage and S{\o}ren Brunak and Emmanouil Dermitzakis and Gary Frost and Torben Hansen and Markku Laakso and Oluf Pedersen and {IMI-DIRECT consortium}",
year = "2020",
doi = "10.1371/journal.pmed.1003149",
language = "English",
volume = "17",
journal = "PLoS Medicine",
issn = "1549-1277",
publisher = "Public Library of Science",
number = "6",

}

RIS

TY - JOUR

T1 - Predicting and elucidating the etiology of fatty liver disease

T2 - A machine learning modeling and validation study in the IMI DIRECT cohorts

AU - Atabaki-Pasdar, Naeimeh

AU - Ohlsson, Mattias

AU - Viñuela, Ana

AU - Frau, Francesca

AU - Pomares-Millan, Hugo

AU - Haid, Mark

AU - Jones, Angus G

AU - Thomas, E Louise

AU - Koivula, Robert W

AU - Kurbasic, Azra

AU - Mutie, Pascal M

AU - Fitipaldi, Hugo

AU - Fernandez, Juan

AU - Dawed, Adem Y

AU - Giordano, Giuseppe N

AU - Forgie, Ian M

AU - McDonald, Timothy J

AU - Rutters, Femke

AU - Cederberg, Henna

AU - Chabanova, Elizaveta

AU - Dale, Matilda

AU - Masi, Federico De

AU - Thomas, Cecilia Engel

AU - Allin, Kristine H.

AU - Hansen, Tue H

AU - Heggie, Alison

AU - Hong, Mun-Gwan

AU - Elders, Petra J M

AU - Kennedy, Gwen

AU - Kokkola, Tarja

AU - Pedersen, Helle Krogh

AU - Mahajan, Anubha

AU - McEvoy, Donna

AU - Pattou, Francois

AU - Raverdy, Violeta

AU - Häussler, Ragna S

AU - Sharma, Sapna

AU - Thomsen, Henrik S

AU - Vangipurapu, Jagadish

AU - Vestergaard, Henrik

AU - Adamski, Jerzy

AU - Musholt, Petra B

AU - Brage, Søren

AU - Brunak, Søren

AU - Dermitzakis, Emmanouil

AU - Frost, Gary

AU - Hansen, Torben

AU - Laakso, Markku

AU - Pedersen, Oluf

AU - IMI-DIRECT consortium

PY - 2020

Y1 - 2020

N2 - BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. METHODS AND FINDINGS: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. CONCLUSIONS: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. TRIAL REGISTRATION: ClinicalTrials.gov NCT03814915.

AB - BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. METHODS AND FINDINGS: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. CONCLUSIONS: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. TRIAL REGISTRATION: ClinicalTrials.gov NCT03814915.

U2 - 10.1371/journal.pmed.1003149

DO - 10.1371/journal.pmed.1003149

M3 - Journal article

C2 - 32559194

VL - 17

JO - PLoS Medicine

JF - PLoS Medicine

SN - 1549-1277

IS - 6

M1 - e1003149

ER -