It is well known that, in classification problems, the predictive capacity of any decision-making model decreases rapidly as the asymmetry of the target variable increases (Sonquist et al., 1973; Fielding, 1977). In particular, in segmentation analysis with a categorical target variable, very small purity gains are obtained when the least represented modality accounts for fewer than one quarter of the cases of the most represented modality. The same problem arises with other (theoretically more exhaustive) techniques, such as Artificial Neural Networks. Indeed, the optimal situation for classification analyses is maximum uncertainty, that is, equidistribution of the target variable. Some classification techniques are more robust, for example by using the less sensitive logit transformation of the target variable (Fabbris & Martini, 2002); however, the logit transformation is also strongly affected by the distributive asymmetry of the target variable. In this paper, starting from the results of a direct survey in which the binary target variable was extremely asymmetrical (a 10% vs. 90% split, or worse), we found that even the logit model with the most significant parameters showed very poor fit measures and almost no predictive power. To address this predictive issue, we tested post-stratification techniques, artificially symmetrizing a training sample. In this way, a substantial increase in fit and predictive capacity was achieved, both in the symmetrized sample and, above all, in the original sample. The paper concludes with an application of the same technique to a dataset of very different nature and size, demonstrating that the method remains stable even when the analysis is performed on the full data of a population.
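As a rough illustration of the symmetrization step described above (a sketch, not the authors' exact post-stratification procedure), the following Python snippet balances a binary training sample by undersampling the majority class down to the size of the minority class; the 10% vs. 90% split and the fixed seed are assumptions chosen to mirror the survey scenario.

```python
import random

def symmetrize(sample, label_key="y", seed=0):
    """Balance a binary training sample by undersampling the
    majority class down to the size of the minority class."""
    rng = random.Random(seed)
    pos = [r for r in sample if r[label_key] == 1]
    neg = [r for r in sample if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority case; draw an equal-sized random subset
    # of the majority class, then shuffle the combined sample.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Synthetic 10% vs. 90% sample, echoing the asymmetry of the survey data.
data = [{"y": 1} for _ in range(100)] + [{"y": 0} for _ in range(900)]
train = symmetrize(data)
print(len(train), sum(r["y"] for r in train))  # 200 100
```

A logit model fitted on the balanced `train` sample can then be evaluated back on the original asymmetric data, which is the comparison the paper carries out.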
ARTI, Agency for Technology and Innovation of Apulia, Italy - ORCID: 0000-0001-8179-4970