Unsupervised spatial data mining for the development of future scenarios: a Covid-19 application

  • Yuri Calleo
  • Simone Di Zio

In Futures Studies, the scenario development process makes it possible to formulate assumptions about what the futures may be, in order to support better decisions today. In the initial stages of scenario building (the Framing and Scanning phases), the process requires considerable time and effort to scan data and information (reading documents, reviewing the literature and consulting experts) in order to understand the object of the foresight study. The daily use of social networks produces an exponential increase in data; for this reason, we address the problem of speeding up and optimizing the Scanning phase by applying a new combined method based on the analysis of tweets, using unsupervised classification models, text-mining and spatial data mining techniques. To obtain a qualitative overview, we applied the bag-of-words model and a Sentiment Analysis with the AFINN and VADER algorithms. Then, to extract the influence factors and the relevant key factors (Kayser and Blind, 2017; Kayser and Shala, 2020), Latent Dirichlet Allocation (LDA) was used (Tong and Zhang, 2016). Furthermore, to acquire spatial information, we used a spatial data mining technique to extract georeferenced data, which made a geographic analysis of the data possible. To showcase our method, we provide an example using Covid-19 tweets (Uhl et al., 2017), from which 5 topics and 6 key factors were extracted. Finally, for each influence factor, a cartogram was created from the relative frequencies to show the spatial distribution of the users discussing each topic. The results answer the research objectives, and the method could serve as a new approach that benefits the scenario development process.
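As a rough illustration of three of the steps named in the abstract (bag-of-words counting, lexicon-based sentiment scoring and per-region relative frequencies), a minimal Python sketch follows. It is not the authors' implementation. The tweets, region names, topic labels, the `TOY_LEXICON` dictionary and all function names are hypothetical. The lexicon only imitates the AFINN idea of summing integer word valences and is not the real AFINN list. The topic labels are assigned by hand, whereas in the chapter they would come from LDA. Treating a relative frequency as the share of a region's tweets that discuss a topic is one possible reading of the abstract.

```python
import re
from collections import Counter

# Hypothetical toy data: (region, tweet text, topic label).
# Topic labels are hand-assigned here; the chapter derives topics with LDA.
TWEETS = [
    ("Lazio", "Vaccine rollout is great news", "vaccines"),
    ("Lazio", "Lockdown again, this is terrible", "lockdown"),
    ("Lombardia", "Terrible queues for the vaccine", "vaccines"),
    ("Lombardia", "Good news: vaccine appointments open", "vaccines"),
]

# Toy AFINN-style lexicon (integer valence per word). NOT the real AFINN list.
TOY_LEXICON = {"great": 3, "good": 3, "terrible": -3}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def lexicon_score(text, lexicon):
    """Sum the valence of every token found in the lexicon (0 if absent)."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def topic_relative_frequencies(tweets, topic):
    """For each region, return the share of its tweets labelled with `topic`."""
    totals = Counter()
    hits = Counter()
    for region, _text, label in tweets:
        totals[region] += 1
        if label == topic:
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


if __name__ == "__main__":
    print(bag_of_words(text for _, text, _ in TWEETS).most_common(3))
    for region, text, _ in TWEETS:
        print(region, lexicon_score(text, TOY_LEXICON))
    print(topic_relative_frequencies(TWEETS, "vaccines"))
```

The per-region shares returned by `topic_relative_frequencies` are the kind of values that could drive the area distortion in a cartogram; drawing the cartogram itself is outside this sketch.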

  • Keywords:
  • text-mining
  • spatial analysis
  • scenario development
  • georeferenced textual data
  • Covid-19

Yuri Calleo

University of Chieti-Pescara G. D'Annunzio, Italy - ORCID: 0000-0002-0190-6061

Simone Di Zio

University of Chieti-Pescara G. D'Annunzio, Italy - ORCID: 0000-0002-9139-1451

  1. Atenstaedt, R. (2012). Word cloud analysis of the BJGP. British Journal of General Practice, 62(596), pp. 148-148.
  2. Bishop, P., Hines, A., & Collins, T. (2007). The current state of scenario development: An overview of techniques. Foresight, 9(1), pp. 5–25.
  3. Goldberg, Y. (2017). Neural network methods for natural language processing. Synthesis lectures on human language technologies, 10(1), pp. 1-309.
  4. Haining, R.P. (2010). The nature of georeferenced data. Handbook of applied spatial analysis. Springer, Berlin, Heidelberg, pp. 197-217.
  5. Hines, A., & Bishop, P. (2015). Thinking about the Future: Guidelines for Strategic Foresight, 2nd Edition, Hinesight Edition, Houston (TX).
  6. Huang, F., Zhang, X., Zhao, Z., Xu, J., & Li, Z. (2019). Image–text sentiment analysis via deep multimodal attentive fusion. Knowledge-Based Systems, 167, pp. 26-37.
  7. Hutto, C., & Gilbert, E. (2014). Vader: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1).
  8. Kayser, V., & Blind, K. (2017). Extending the knowledge base of foresight: The contribution of text mining. Technological Forecasting and Social Change, 116, pp. 208-215.
  9. Kayser, V., & Shala, E. (2020). Scenario development using web mining for outlining technology futures. Technological Forecasting and Social Change, 156, 120086.
  10. Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), pp. 1-167.
  11. Mayor, E., & Bietti, L. M. (2021). Twitter, time and emotions. Royal Society open science, 8(5), 201900.
  12. Minka, T. (2000). Estimating a Dirichlet Distribution. MIT Technical Report, Cambridge, (US).
  13. Narasamma, V. L., Sreedevi, M., & Kumar, G. V. (2021). Tweet Data Analysis on COVID-19 Outbreak. Smart Technologies in Data Science and Communication, Springer, pp. 183-193.
  14. Pang, B., & Lee, L. (2008). Using very simple statistics for review search: An exploration. In Coling 2008. Companion volume: Posters, pp. 75-78.
  15. Poria, S., Cambria, E., Winterstein, G., & Huang, G. B. (2014). Sentic patterns: Dependency-based rules for concept-level sentiment analysis. Knowledge-Based Systems, 69, pp. 45-63.
  16. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational linguistics, 37(2), pp. 267-307.
  17. Tan, M. J., & Guan, C. (2021). Are people happier in locations of high property value? Spatial temporal analytics of activity frequency, public sentiment and housing price using twitter data. Applied Geography, 132, 102474.
  18. Tong, Z., & Zhang, H. (2016). A text mining research based on LDA topic modelling. In International Conference on Computer Science, Engineering and Information Technology, pp. 201-210.
  19. Uhl, A., Kolleck, N., & Schiebel, E. (2017). Twitter data analysis as contribution to strategic foresight-The case of the EU Research Project “Foresight and Modelling for European Health Policy and Regulations” (FRESHER). European Journal of Futures Research, 5(1), pp. 1-16.
  20. Wang, X., & Grimson, E. (2007). Spatial Latent Dirichlet Allocation. NIPS, 20, pp. 1577-1584.
PDF
  • Publication Year: 2021
  • Pages: 173-178
  • Content License: CC BY 4.0
  • © 2021 Author(s)

Chapter Information

  • Chapter Title: Unsupervised spatial data mining for the development of future scenarios: a Covid-19 application
  • Authors: Yuri Calleo, Simone Di Zio
  • Language: English
  • DOI: 10.36253/978-88-5518-461-8.33
  • Peer Reviewed
  • Publication Year: 2021
  • Copyright Information: © 2021 Author(s)
  • Content License: CC BY 4.0
  • Metadata License: CC0 1.0

Bibliographic Information

  • Book Title: ASA 2021 Statistics and Information Systems for Policy Evaluation
  • Book Subtitle: BOOK OF SHORT PAPERS of the on-site conference
  • Editors: Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci
  • Peer Reviewed
  • Publication Year: 2021
  • Copyright Information: © 2021 Author(s)
  • Content License: CC BY 4.0
  • Metadata License: CC0 1.0
  • Publisher Name: Firenze University Press
  • DOI: 10.36253/978-88-5518-461-8
  • eISBN (pdf): 978-88-5518-461-8
  • eISBN (xml): 978-88-5518-462-5
  • Series Title: Proceedings e report
  • Series ISSN: 2704-601X
  • Series E-ISSN: 2704-5846
