Natural Language Processing Tools for Romanian – Going Beyond a Low-Resource Language.

Melania Nitu, Mihai Dascalu
pp. 7 – 26
(https://doi.org/10.55612/s-5002-060-001sp)

Abstract

Advances in Natural Language Processing bring innovative instruments to the educational field to improve the quality of the didactic process by addressing challenges like language barriers and creating personalized learning experiences. Most research in the domain is dedicated to high-resource languages, such as English, while languages with limited coverage, like Romanian, are still underrepresented in the field. Operating on low-resource languages is essential to ensure equitable access to educational opportunities and to preserve linguistic diversity. Through continuous investments in developing Romanian educational instruments, we are rapidly going beyond a low-resource language. This paper presents recent educational instruments and frameworks dedicated to Romanian that leverage state-of-the-art NLP techniques, including advanced Romanian language models and benchmarks encompassing tools for language learning, text comprehension, question answering, automatic essay scoring, and information retrieval. The methods and insights gained are transferable to other low-resource languages, emphasizing methodological adaptability, collaborative frameworks, and technology transfer to address similar challenges in diverse linguistic contexts. Two use cases are presented, focusing on assessing student performance in Moodle courses and extracting main ideas from students’ feedback. These practical applications in Romanian academic settings serve as examples for enhancing educational practices in other less-resourced languages.

Keywords: Natural Language Processing, Educational Frameworks, Romanian Language Models, Transformer Architecture.

References

1. Meurers, D.: Natural Language Processing and Language Learning. Encyclopedia of applied linguistics, 4193-4205 (2012)
https://doi.org/10.1002/9781405198431.wbeal0858
2. Nadkarni, P., Ohno-Machado, L., Chapman, W.: Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18, 544-551 (2011)
https://doi.org/10.1136/amiajnl-2011-000464
3. Yarlett, D.G., Ramscar, M.J.A.: Language Learning Through Similarity-Based Generalization. (2008)
4. Gu, P.Y.: Vocabulary learning in a second language: Person, task, context and strategies. TESL-EJ, 7(2), 1-25 (2003)
5. Florea, A.-M., Dascalu, M., Sirbu, M.-D., Trausan-Matu, S.: Improving Writing for Romanian Language. 4th Int. Conf. on Smart Learning Ecosystems and Regional Development (SLERD 2019), 131-141 (2019) https://doi.org/10.1007/978-981-13-9652-6_12
6. Lemaire, B., Mandin, S., Dessus, P., Denhière, G.: Computational cognitive models of summarization assessment skills. In: 27th Annual Conference of the Cognitive Science Society (CogSci’ 2005). Erlbaum, Mahwah, NJ (2005)
7. Joshi, M., Rosé, C.P.: Using Transactivity in Conversation Summarization in Educational Dialog. In: SLaTE Workshop on Speech and Language Technology in Education, Farmington, Pennsylvania, USA (2007) https://doi.org/10.21437/SLaTE.2007-12
8. Barzilay, R., Elhadad, M.: Using lexical chains for text summarization. In: ACL Workshop on Intelligent Scalable Text Summarization (ISTS’97), pp. 10-17. ACL, Madrid, Spain (1997)
9. Duclaye, F., Yvon, F., Collin, O.: Learning paraphrases to improve a question-answering system. In: Proceedings of the EACL Workshop on Natural Language Processing for Question Answering Systems, pp. 35-41 (2003)
10. Oprescu, B., Dascalu, M., Trausan-Matu, S., Dessus, P., Bianco, M.: Automated Assessment of Paraphrases in Pupil’s Self-Explanations. University Politehnica of Bucharest Scientific Bulletin Series C-Electrical Engineering and Computer Science, 76(1), 31-44 (2014)
11. Botarleanu, R., Dascalu, M., Sirbu, M.D., Crossley, S.A., Trausan-Matu, S.: Automated Text Simplification through Paraphrasing using Sequence-to-Sequence Models. In: 20th Int. Conf. on Artificial Intelligence in Education (AIED 2019). Springer, Chicago, IL (submitted)
12. Ruseti, S.: Advanced Natural Language Processing Techniques for Question Answering and Writing Evaluation. PhD Thesis, (2019)
13. Ravichandran, D., Hovy, E.: Learning surface text patterns for a question answering system. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 41-47. Association for Computational Linguistics (2002)
https://doi.org/10.3115/1073083.1073092
14. Westera, W., Dascalu, M., Kurvers, H., Ruseti, S., Trausan-Matu, S.: Automated Essay Scoring in Applied Games: Reducing the Teacher Bandwidth Problem in Online Training. Computers & Education, 123, 212-224 (2018) https://doi.org/10.1016/j.compedu.2018.05.010
15. Dascalu, M., Westera, W., Ruseti, S., Trausan-Matu, S., Kurvers, H.: ReaderBench Learns Dutch: Building a Comprehensive Automated Essay Scoring System for Dutch. In: 18th Int. Conf. on Artificial Intelligence in Education (AIED 2017), pp. 52-63. Springer, Wuhan, China (2017)
https://doi.org/10.1007/978-3-319-61425-0_5
16. McNamara, D.S., Crossley, S.A., Roscoe, R., Allen, L.K., Dai, J.: A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35-59 (2015)
https://doi.org/10.1016/j.asw.2014.09.002
17. Kinshuk: Developing adaptive and personalized learning environments. Routledge, New York, NY (2016) https://doi.org/10.4324/9781315795492
18. Botarleanu, R.-M., Dascalu, M., Sirbu, M.-D., Crossley, S.A., Trausan-Matu, S.: ReadME – Generating Personalized Feedback for Essay Writing using the ReaderBench Framework. In: 3rd Int. Conf. on Smart Learning Ecosystems and Regional Development (SLERD 2018), pp. 133-145. Springer, Aalborg, Denmark (2018) https://doi.org/10.1007/978-3-319-92022-1_12
19. Vidal-Abarca, E., Gilabert, R., Ferrer, A., Ávila, V., Martínez, T., Mañá, A., Llorens, A.C., Gil, L., Cerdán, R., Ramos, L., Serrano, M.A.: TuinLEC, an intelligent tutoring system to improve reading literacy skills / TuinLEC, un tutor inteligente para mejorar la competencia lectora. Infancia y Aprendizaje, 37, 25-56 (2014) https://doi.org/10.1080/02103702.2014.881657
20. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, Vol. 1. Cambridge University Press, Cambridge, UK (2008) https://doi.org/10.1017/CBO9780511809071
21. Yuan, Z., Felice, M.: Constrained grammatical error correction using statistical machine translation. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pp. 52-61 (2013)
22. Wagner, J., Foster, J., van Genabith, J.: A comparative evaluation of deep and shallow approaches to the automatic detection of common grammatical errors. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (2007)
23. Trausan-Matu, S., Rebedea, T., Dascalu, M.: Analysis of discourse in collaborative learning chat conversations with multiple participants. In: Tufis, D., Forascu, C. (eds.) Multilinguality and Interoperability in Language Processing with Emphasis on Romanian, pp. 313-330. Editura Academiei Romane, Bucharest, Romania (2010)
24. Dessus, P., Trausan-Matu, S.: Implementing Bakhtin’s dialogism theory with NLP techniques in distance learning environments. In: Trausan-Matu, S., Dessus, P. (eds.) Proc. 2nd Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity (NLPsL 2010), pp. 11-20. Matrix Rom, Bucharest, Romania (2010)
25. Graesser, A.C., Rus, V., D’Mello, S., Jackson, G.T.: AutoTutor: Learning through Natural Language Dialogue that Adapts to the Cognitive and Affective States of the Learner.
26. Graesser, A.C., Penumatsa, P., Ventura, M., Cai, Z., Hu, X.: Using LSA in AutoTutor: Learning through mixed-initiative dialogue in natural language. In: Landauer, T.K., McNamara, D.S., Dennis, S., Kintsch, W. (eds.) Handbook of Latent Semantic Analysis, pp. 243-262. Erlbaum, Mahwah, NJ (2007)
27. Crossley, S.A., McNamara, D.S.: Understanding expert ratings of essay quality: Coh-Metrix analyses of first and second language writing. International Journal of Continuing Engineering Education and Life-Long Learning, 21(2/3), 170-191 (2011)
https://doi.org/10.1504/IJCEELL.2011.040197
28. Peters, E., Hulstijn, J.H., Sercu, L., Lutjeharms, M.: Learning L2 German vocabulary through reading: The effect of three enhancement techniques compared. Language learning, 59(1), 113-151 (2009) https://doi.org/10.1111/j.1467-9922.2009.00502.x
29. Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Computer Speech and Language, 23, 89-106 (2009) https://doi.org/10.1016/j.csl.2008.04.003
30. Rebedea, T., Dascalu, M., Trausan-Matu, S.: PolyCAFe: Polyphony-based system for collaboration analysis and feedback generation. In: Second Workshop on Natural Language in Support of Learning: Metrics, Feedback and Connectivity, pp. 21-34. MatrixRom, Bucharest, Romania (2010)
31. Niculescu, M.A., Ruseti, S., Dascalu, M.: RoSummary: Control Tokens for Romanian News Summarization. Algorithms, 15(12), 472 (2022) https://doi.org/10.3390/a15120472
32. Masala, M., Ruseti, S., Dascalu, M., Dobre, C.: Extracting and Clustering Main Ideas from Student Feedback using Language Models. Proceedings of 22nd Int. Conf. on Artificial Intelligence in Education (AIED 2021), 12748, 282-292 (2021) https://doi.org/10.1007/978-3-030-78292-4_23
33. Toma, I., Marica, A.M., Dascalu, M., Trausan-Matu, S.: ReaderBench – Automated Feedback Generation for Essays in Romanian. U.P.B. Sci. Bull. Series C – Electrical Engineering and Computer Science, 83(2), 21-34 (2021)
34. Deane, P.: On the relation between automated essay scoring and modern views of the writing construct. Assessing Writing, 18, 7-24 (2013) https://doi.org/10.1016/j.asw.2012.10.002
35. Foltz, P.W., Laham, D., Landauer, T.K.: Automated essay scoring: applications to Educational Technology. Int. Conf. ED-MEDIA ’99, Seattle (1999)
36. Panaite, M., Dascalu, M., Johnson, A.M., Balyan, R., Dai, J., McNamara, D.S., Trausan-Matu, S.: Bring it on! Challenges Encountered while Building a Comprehensive Tutoring System using ReaderBench. In: 19th Int. Conf. on Artificial Intelligence in Education (AIED 2018). Springer, London, UK (2018)
https://doi.org/10.1007/978-3-319-93843-1_30
37. Jugo, I., Kovačić, B., Tijan, E.: Cluster analysis of student activity in a web-based intelligent tutoring system. Scientific Journal of Maritime Research, 29, 75-83 (2015)
38. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Long Beach, California, USA (2017) 6000-6010
39. Trandabat, D., Irimia, E., Mititelu, V.B., Cristea, D., Tufis, D.: The Romanian Language in the Digital Era. META-NET White Paper Series. Springer (2012)
40. Váradi, T., Koeva, S., Yamalov, M., Tadić, M., Sass, B., Nitoń, B., Ogrodniczuk, M., Pęzik, P., Barbu Mititelu, V., Ion, R., Irimia, E., Mitrofan, M., Păiș, V., Tufiș, D., Garabík, R., Krek, S., Repar, A., Rihtar, M., Brank, J.: The MARCELL Legislative Corpus. Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France (2020) 3761-3768
41. Pais, V., Mitrofan, M., Gasan, C.L., Ianov, A., Ghita, C., Coneschi, V.S., Onut, A.: Romanian Named Entity Recognition in the Legal domain (LegalNERo). Zenodo, (2021)
https://doi.org/10.18653/v1/2021.nllp-1.2
42. Rebeja, P., Chitez, M., Rogobete, R., Dinca, A., Bercuci, L.: ParlaMint-RO: Chamber of the Eternal Future. Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (2022) 131-134
43. Mitrofan, M., Tufis, D.: BioRo: The Biomedical Corpus for the Romanian Language. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018)
44. Mitrofan, M., Barbu Mititelu, V., Mitrofan, G.: MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language. Proceedings of the 18th BioNLP Workshop and Shared Task. Association for Computational Linguistics, Florence, Italy (2019) 71-79 https://doi.org/10.18653/v1/W19-5008
45. Mitrofan, M., Pais, V.: Improving Romanian BioNER Using a Biologically Inspired System. Proceedings of the 21st Workshop on Biomedical Language Processing. Association for Computational Linguistics, Dublin, Ireland (2022) 316-322 https://doi.org/10.18653/v1/2022.bionlp-1.30
46. Mititelu, V.B., Mitrofan, M.: The Romanian Medical Treebank – SiMoNERo. Proceedings of the 15th Edition of the International Conference on Linguistic Resources and Tools for Natural Language Processing (ConsILR 2020) (2020) 7-16
47. Tufis, D., Barbu, E., Mititelu, V.B., Ion, R., Bozianu, L.: The Romanian Wordnet. Romanian Journal of Information Science and Technology (ROMJIST), 7, 107-124 (2004)
48. Ion, R., Irimia, E., Stefanescu, D., Tufis, D.: ROMBAC: The Romanian Balanced Annotated Corpus. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12), (2012)
49. Dumitrescu, S.D., Avram, A.M.: Introducing RONEC – the Romanian Named Entity Corpus. Proceedings of the Twelfth Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France (2020) 4436-4443
50. Aleris: RoTex Corpus Builder. https://github.com/aleris/ReadME-RoTex-Corpus-Builder, last accessed 07/17/2023
51. Cotet, T.-M., Ruseti, S., Dascalu, M.: Neural grammatical error correction for Romanian. IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA (2020) 625-631 https://doi.org/10.1109/ICTAI50040.2020.00101
52. Mititelu, V.B., Irimia, E., Tufis, D.: The Reference Corpus of Contemporary Romanian Language (CoRoLa). Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018) 1235-1239
53. Butnaru, A.M., Ionescu, R.T.: MOROCO: The Moldavian and Romanian Dialectal Corpus. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy (2019) 688-698 https://doi.org/10.18653/v1/P19-1068
54. Hoefels, D.C., Çöltekin, Ç., Mădroane, I.D.: CoRoSeOf – An Annotated Corpus of Romanian Sexist and Offensive Tweets. Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC’22). European Language Resources Association, Marseille, France (2022) 2269-2281
55. Manolescu, M., Çöltekin, Ç.: Roff – A Romanian Twitter Dataset for Offensive Language. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021). INCOMA Ltd., Online (2021) 895-900 https://doi.org/10.26615/978-954-452-072-4_102
56. Paraschiv, A., Sandu, I., Cercel, D.-C., Dascalu, M.: Fighting Romanian Offensive Language with RO-Offense: A Dataset and Classification Models for Online Comments. Preprint submitted to Elsevier, (2022)
57. Cojocaru, A., Paraschiv, A., Dascalu, M.: News-RO-Offense – A Romanian Offensive Language Dataset and Baseline Models Centered on News Article. Proceedings of RoCHI 2022, (2022)
https://doi.org/10.37789/rochi.2022.1.1.12
58. Tufis, D., Irimia, E.: RoCo-News: A Hand Validated Journalistic Corpus of Romanian. Proceedings of the Fifth International Conference on Language Resources and Evaluation (2006) 869-872
59. Artetxe, M., Ruder, S., Yogatama, D.: On the Cross-lingual Transferability of Monolingual Representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online (2020) 4623-4637
https://doi.org/10.18653/v1/2020.acl-main.421
60. Dumitrescu, S.D., Rebeja, P., Lorincz, B., Gaman, M., Avram, A., Ilie, M., Pruteanu, A., Stan, A., Rosia, L., Iacobescu, C., Morogan, L., Dima, G., Marchidan, G., Rebedea, T., Chitez, M., Yogatama, D., Ruder, S., Ionescu, R.T., Pascanu, R., Patraucean, V.: LiRo: Benchmark and leaderboard for Romanian language tasks. In: Vanschoren, J., Yeung, S. (eds.): Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, Vol. 1. Curran (2021)
61. Nicolae, D.C., Tufis, D.: RoITD: Romanian IT Question Answering Dataset. ConsILR-2021, 1154-1161 (2021)
62. Avram, S.M., Oltean, M.: A comparison of several AI techniques for authorship attribution on Romanian texts. Mathematics, 10(23), 4589 (2022); arXiv:2211.05180
https://doi.org/10.3390/math10234589
63. Oravițan, A., Chitez, M., Bercuci, L., Rogobete, R.: Using the bilingual Corpus of Romanian Academic Genres (ROGER) platform to improve students’ academic writing. Intelligent CALL, granular systems and learner data: short papers from EUROCALL 2022, 315-321 (2022)
https://doi.org/10.14705/rpnet.2022.61.1477
64. Ortiz Suárez, P.J., Sagot, B., Romary, L.: Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures. 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7) (2019)
65. Cristea, D., Pistol, I., Boghiu, S., Bibiri, A.D., Gifu, D., Scutelnicu, A., Onofrei, M., Trandabat, D., Bugeag, G.: CoBiLiRo: A Research Platform for Bimodal Corpora. Proceedings of the 1st International Workshop on Language Technology Platforms, Marseille, France, 22-27 (2020)
66. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 4171-4186 (2019)
67. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving Language Understanding by Generative Pre-Training. Technical Report, OpenAI, (2018)
68. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language Models are Unsupervised Multitask Learners. Technical Report, OpenAI, (2019)
69. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: Generalized Autoregressive Pretraining for Language Understanding. Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, NeurIPS 2019, 5754-5764 (2019)
70. Masala, M., Ruseti, S., Dascalu, M.: RoBERT – A Romanian BERT Model. COLING 2020, 6626-6637 (2020)
https://doi.org/10.18653/v1/2020.coling-main.581
71. Dumitrescu, S.D., Avram, A.M., Pyysalo, S.: The birth of Romanian BERT. Findings of the Association for Computational Linguistics: EMNLP 2020, 4324-4328 (2020)
https://doi.org/10.18653/v1/2020.findings-emnlp.387
72. Tiedemann, J.: Parallel Data, Tools and Interfaces in OPUS. LREC (2012)
73. Niculescu, M.A., Ruseti, S., Dascalu, M.: RoGPT2: Romanian GPT2 for Text Generation. 33rd International Conference on Tools with Artificial Intelligence (ICTAI), 1154-1161 (2021)
https://doi.org/10.1109/ICTAI52525.2021.00183
74. Radford, A., Wu, J., Rewon, C., Luan, D., Amodei, D., Sutskever, I.: Language Models are Unsupervised Multitask Learners. OpenAI Blog, https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf, Online (2019)
75. Buzea, M., Trausan-Matu, S., Rebedea, T.: Automatic Romanian Text generation using GPT-2. U.P.B. Sci. Bull. Series C, 84(4), 15-30 (2022)
76. Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y.: BERTScore: Evaluating Text Generation with BERT. ICLR2020, arXiv:1904.09675, (2020)
77. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA (2002) 311-318
https://doi.org/10.3115/1073083.1073135
78. Lin, C.-Y.: Recall-Oriented Understudy for Gisting Evaluation (ROUGE) (2005)
79. Dumitrescu, S., Mihai, I.: GPT-Neo Romanian 780M. GitHub Repository of Romanian-Transformers: https://github.com/dumitrescustefan/Romanian-Transformers (2022)
80. Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., Lewis, M., Zettlemoyer, L.: Multilingual Denoising Pre-training for Neural Machine Translation. Transactions of the Association for Computational Linguistics, 8, 726-742 (2020)
https://doi.org/10.1162/tacl_a_00343
81. Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Yepes, A.J., Koehn, P., Logacheva, V., Monz, C., Negri, M., Neveol, A., Neves, M., Popel, M., Post, M., Rubino, R., Scarton, C., Specia, L., Turchi, M., Verspoor, K., Zampieri, M.: Findings of the 2016 Conference on Machine Translation (WMT16). Proceedings of the First Conference on Machine Translation, 2, 131-198 (2016) https://doi.org/10.18653/v1/W16-2301
82. Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 483-498 (2021) https://doi.org/10.18653/v1/2021.naacl-main.41
83. Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Shane Gu, S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., Valter, D., Narang, S., Mishra, G., Yu, A., Zhao, V., Huang, Y., Dai, A., Yu, H., Petrov, S., Chi, E.H., Dean, J., Devlin, J., Roberts, A., Zhou, D., Le, Q.V., Wei, J.: Scaling Instruction-Finetuned Language Models. arXiv:2210.11416 [cs.LG], (2022)
84. Ranathunga, S., de Silva, N.: Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World. In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Volume 1: Long Papers, pp. 823-848. Association for Computational Linguistics (2022)
85. Khan, M.Z.: Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech. ArXiv, abs/2103.11792 (2021)
86. Hershcovich, D., Frank, S., Lent, H., de Lhoneux, M., Abdou, M., Brandl, S., Bugliarello, E., Cabello Piqueras, L., Chalkidis, I., Cui, R., Fierro, C., Margatina, K., Rust, P., Søgaard, A.: Challenges and Strategies in Cross-Cultural NLP. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 6997-7013. Association for Computational Linguistics, Dublin, Ireland (2022)
https://doi.org/10.18653/v1/2022.acl-long.482
87. Aji, A.F., Winata, G.I., Koto, F., Cahyawijaya, S., Romadhony, A., Mahendra, R., Kurniawan, K., Moeljadi, D., Prasojo, R.E., Baldwin, T., Lau, J.H., Ruder, S.: One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, pp. 7226-7249. Association for Computational Linguistics, Dublin, Ireland (2022)
https://doi.org/10.18653/v1/2022.acl-long.500
88. Păiș, V., Ion, R., Tufiș, D.: A Processing Platform Relating Data and Tools for Romanian Language. Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP), 81-88 (2020)
89. Dascalu, M., Dessus, P., Trausan-Matu, S., Bianco, M., Nardy, A.: ReaderBench, an Environment for Analyzing Text Complexity and Reading Strategies. International Conference on Artificial Intelligence in Education (AIED), 379-388 (2013) https://doi.org/10.1007/978-3-642-39112-5_39
90. Dascalu, M.D., Ruseti, S., Dascalu, M., McNamara, D.S., Carabas, M., Rebedea, T., Trausan-Matu, S.: Before and during COVID-19: A Cohesion Network Analysis of students’ online participation in Moodle courses. Computers in Human Behavior, 121, 106780 (2021)
https://doi.org/10.1016/j.chb.2021.106780
91. Dascalu, M., McNamara, D.S., Trausan-Matu, S., Allen, L.K.: Cohesion Network Analysis of CSCL Participation. Behavior Research Methods, 50(2), 604-619 (2018)
https://doi.org/10.3758/s13428-017-0888-4
92. Grootendorst, M.: KeyBERT: minimal keyword extraction with BERT. https://github.com/MaartenGr/KeyBERT, (2020)
93. Carbonell, J., Goldstein, J.: Use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR Forum (ACM Special Interest Group on Information Retrieval), 335-336 (1998) https://doi.org/10.1145/290941.291025
94. Ruseti, S., Cotet, T.-M., Dascalu, M.: Romanian Diacritics Restoration using Recurrent Neural Networks. ArXiv, abs/2009.02743 (2020)
95. Sirbu, M.-D., Dascalu, M., Gifu, D., Cotet, T.-M., Tosca, A., Trausan-Matu, S.: ReadME – Improving Writing Skills in Romanian Language. In: Pais, V., Gifu, D., Trandabat, D., Cristea, D., Tufis, D. (eds.): Proceedings of the 13th Int. Conference on Linguistic Resources and Tools for Processing Romanian Language (ConsILR 2018), Iasi, Romania (2018) 135-145
96. Busuioc, C., Ruseti, S., Dascalu, M.: A Literature Review of NLP Approaches to Fake News Detection and Their Applicability to Romanian-Language News Analysis. Transilvania, 65-71 (2020)
https://doi.org/10.51391/trva.2020.10.07
97. Boroghina, G., Corlatescu, D.-G., Dascalu, M.: Conversational Agent in Romanian for Storing User Information in a Knowledge Graph. International Conference on Human-Computer Interaction (RoCHI2020), (2020) https://doi.org/10.37789/rochi.2020.1.1.15
98. Ungureanu, D., Ruseti, S., Toma, I., Dascalu, M.: pRonounce: Automatic Pronunciation Assessment for Romanian. Conference on Smart Learning Ecosystems and Regional Development (SLERD 2022), Polyphonic Construction of Smart Learning Ecosystems, 103-114 (2022)
https://doi.org/10.1007/978-981-19-5240-1_7
99. Oneata, D., Cucu, H.: Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) New Orleans, LA, USA (2022) 4578-4587 https://doi.org/10.1109/CVPRW56347.2022.00504
