Book Chapter

Challenges in archiving the personalized web

  • Erwan Le Merrer
  • Camilla Penzo
  • Gilles Tredan
  • Lucas Verney

The decision-making algorithms embedded in online platforms determine the content shown to users. This personalization steers the dissemination of information, in contrast to the idea of a universal World Wide Web. Personalization thus generates a combinatorial explosion of different versions of the web, making each user's experience distinct. This raises critical questions: which elements of a personalized web should be archived? How can the collected user journeys capture a representative picture of our times? Navigating personalization is essential to capture the contemporary web experience, yet it presents methodological and technical challenges. In this chapter, we identify key challenges in representatively sampling personalization within online platforms.

Keywords: personalization, archival, YouTube, 2022 French presidential election

Erwan Le Merrer

CNRS, France - ORCID: 0000-0001-8344-2135

Camilla Penzo

PEReN, France

Gilles Tredan

CNRS, France - ORCID: 0000-0003-4473-4332

Lucas Verney

PEReN, France - ORCID: 0000-0002-1361-1703

  • Publication Year: 2024
  • Content License: CC BY 4.0
  • © 2024 Author(s)

Chapter Information

Chapter Title

Challenges in archiving the personalized web

Authors

Erwan Le Merrer, Camilla Penzo, Gilles Tredan, Lucas Verney

Peer Reviewed

Publication Year

2024

Copyright Information

© 2024 Author(s)

Content License

CC BY 4.0

Metadata License

CC0 1.0

Bibliographic Information

Book Title

Exploring the Archived Web during a Highly Transformative Age

Book Subtitle

Proceedings of the 5th international RESAW conference, Marseille, June 2023

Editors

Sophie Gebeil, Jean-Christophe Peyssard

Peer Reviewed

Publication Year

2024

Copyright Information

© 2024 Author(s)

Content License

CC BY 4.0

Metadata License

CC0 1.0

Publisher Name

Firenze University Press

Series Title

Proceedings e report
