Towards Resource-Constrained Event Extraction: A Knowledge-Augmented Framework for Overcoming Challenges in Vietnamese NLP

Dung-Cam Quang; Xuan-Bach Le; Tho Quan

doi:10.11113/ijic.v16n1.676

Authors

Dung-Cam Quang ¹,²
Xuan-Bach Le ¹,²
Tho Quan ¹,²

¹ Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet Street, Dien Hong Ward, Ho Chi Minh City, Vietnam
² Vietnam National University Ho Chi Minh City, Linh Xuan Ward, Ho Chi Minh City, Vietnam

Keywords:

Event Extraction, Small Language Model, Knowledge Integration, Vietnamese NLP

Abstract

Event Extraction (EE) is a crucial task in Natural Language Processing (NLP): it captures meaningful activities in text and helps track narratives and developments across documents. Extensive research, spanning traditional machine learning to modern deep learning architectures, has been dedicated to improving the accuracy of event trigger identification and argument role classification. Recently, the rapid advancement of Large Language Models (LLMs) has led to their application to EE, primarily through data augmentation or fine-tuning. However, the computational and resource overhead of LLMs remains a significant challenge. Furthermore, existing state-of-the-art methods predominantly focus on high-resource languages such as English and Chinese, leaving low-resource languages such as Vietnamese largely under-explored due to their unique linguistic ambiguities. Our research therefore focuses on a Small Language Model-based (SLM-based) approach, augmented with external knowledge, to address the EE task in Vietnamese. The ultimate objective is to develop a compact model that effectively addresses core EE challenges such as rare events, semantic ambiguity, and long-range dependencies, thereby establishing an efficient and robust framework tailored to Vietnamese as a low-resource language.

IJIC