ASA 2021 Statistics and Information Systems for Policy Evaluation
Edited by Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci

Book Chapter

Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests

Massimo Aria
Corrado Cuccurullo
Agostino Gnasso

CC BY 4.0
DOI: 10.36253/978-88-5518-461-8.34

The growing success of Machine Learning (ML) has significantly improved predictive models and eased their integration into many application fields, especially healthcare. However, ML still has limitations and drawbacks, notably a lack of interpretability that prevents users from understanding how certain decisions are made. Models whose internal workings cannot be interpreted are referred to as "black boxes", and this opacity discourages their use. In a highly regulated and risk-averse context such as healthcare, trust in an ML model is not by itself sufficient for a decision to adopt it, but it is essential for adoption. Many clinicians and health researchers feel uncomfortable with black-box ML models, even when they achieve high diagnostic or prognostic accuracy. Therefore, more and more research is being conducted on how these models work. Our study focuses on the Random Forest (RF) model, one of the best-performing and most widely used ML methodologies in all fields of research, from the hard sciences to the humanities. In the health context and in the evaluation of health policies, its use is limited by the impossibility of interpreting the causal links between predictors and response. This is why new techniques, tools, and approaches are needed to reconstruct the causal relationships and interactions between predictors and response used in an RF model. Our research performs a machine learning experiment on several medical datasets to compare two methodologies, inTrees and NodeHarvest, which are the main approaches in the rule extraction framework. The contribution of our study is to identify which of these rule extraction approaches is the better proposal, so as to suggest the appropriate choice to decision-makers in the health domain.

Keywords: Random Forest, Model Interpretation, Health domain, Rule Extraction

Massimo Aria

University of Naples Federico II, Italy - ORCID: 0000-0002-8517-9411

Corrado Cuccurullo

University of Campania Luigi Vanvitelli, Italy - ORCID: 0000-0002-7401-8575

Agostino Gnasso

University of Naples Federico II, Italy - ORCID: 0000-0002-9220-9754

Publication Year: 2021
Pages: 179-184

Content License: CC BY 4.0
© 2021 Author(s)

Chapter Information

Chapter Title: Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests
Authors: Massimo Aria, Corrado Cuccurullo, Agostino Gnasso
Language: English
DOI: 10.36253/978-88-5518-461-8.34
Peer Reviewed
Publication Year: 2021
Content License: CC BY 4.0
Metadata License: CC0 1.0

Bibliographic Information

Book Title: ASA 2021 Statistics and Information Systems for Policy Evaluation
Book Subtitle: BOOK OF SHORT PAPERS of the on-site conference
Editors: Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci
Peer Reviewed
Publication Year: 2021
Content License: CC BY 4.0
Metadata License: CC0 1.0
Publisher Name: Firenze University Press
DOI: 10.36253/978-88-5518-461-8
eISBN (pdf): 978-88-5518-461-8
eISBN (xml): 978-88-5518-462-5
Series Title: Proceedings e report
Series ISSN: 2704-601X
Series E-ISSN: 2704-5846
