Lib2Life – Digital Library Services Empowered with Advanced Natural Language Processing Techniques.

Melania Nitu, Mihai Dascalu, Maria Dorinela Dascalu, Laurentiu-Marian Neagu, Maria-Iuliana Dascalu
pp. 147–167
https://doi.org/10.55612/s-5002-060-006

Abstract

Educational institutions are struggling to keep up with accelerating technological advancements; hence, sustainable and supportive tools have become essential to reshape traditional models into intelligent learning systems. This paper introduces Lib2Life, a digital library that uses advanced Natural Language Processing techniques to facilitate the digital transformation of historical documents provided by Central University Libraries in Romania. The platform enables these libraries to preserve the cultural heritage of historically valuable documents, facilitating open-source access to old printed materials such as books, manuscripts, newspapers, or literary magazines no longer protected by copyright. Lib2Life offers comprehensive functionalities, allowing librarians to benefit from automated text processing and indexing workflows that facilitate digitization, ensuring a consistent representation of original documents. For readers, the platform presents a user-friendly interface with semantic search capabilities and a recommendation engine. The system employs an ontology to organize and manage documents in a unified and structured way, contributing to the evolution of intelligent education technologies. The innovative contributions of Lib2Life include identifying new solutions for cultural heritage preservation, promoting patrimony through modern methodologies, increasing access to documentary resources, enhancing library services, and fostering the transfer of knowledge and technology to society.

Keywords: Digital Library Service, Natural Language Processing, Semantic Search, Ontology Representation, Domain Categorization, Cultural Heritage Preservation.

