MultiPhishNet: A Multimodal Approach of QR Code Phishing Detection using Multi-Head Attention and Multilingual Embeddings
DOI:
https://doi.org/10.11113/ijic.v15n1.512
Keywords:
Phishing detection, QR codes, Multimodal deep learning, multilingual embeddings
Abstract
Phishing attacks leveraging QR codes have become a significant threat due to their increasing use in contactless services. These attacks are challenging to detect since QR codes typically encode URLs leading to phishing websites designed to steal sensitive information. Existing detection methods often rely on blacklists or handcrafted features, which are inadequate for handling obfuscated URLs and multilingual content. This paper proposes MultiPhishNet, a multimodal phishing detection model that integrates advanced embedding techniques, Convolutional Neural Networks (CNNs), and multi-head attention mechanisms to automatically extract and learn key features from URLs and HTML content. The model leverages FastText embeddings for word-level representation, custom character embeddings for obfuscated URLs, and SBERT (Sentence-Bidirectional Encoder Representations from Transformers) embeddings for HTML content. To address class imbalance, ADASYN (Adaptive Synthetic Sampling) oversampling was applied, ensuring balanced training. The proposed method was evaluated on a moderately multilingual dataset, achieving an accuracy of 97.76% and an AUC of 0.9946. These results demonstrate that MultiPhishNet outperforms the baseline HTMLPhish model in phishing detection. Future research will focus on expanding the dataset to cover a broader range of languages and regional phishing tactics.
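The abstract mentions custom character embeddings for obfuscated URLs. As a rough illustration of the preprocessing step such an embedding layer typically needs, the sketch below maps each URL character to an integer index and pads to a fixed length. The vocabulary, the reserved padding and unknown indices, the `max_len` default, and the name `encode_url` are all assumptions made for this example; the paper's actual encoding scheme is not described here.

```python
import string

# Assumed conventions (not from the paper): index 0 pads short URLs,
# index 1 stands in for any character outside the vocabulary.
PAD, UNK = 0, 1

# Hypothetical vocabulary: ASCII letters, digits, and punctuation.
VOCAB = {
    ch: i + 2
    for i, ch in enumerate(string.ascii_letters + string.digits + string.punctuation)
}


def encode_url(url: str, max_len: int = 64) -> list[int]:
    """Map a URL to a fixed-length list of character indices."""
    # Truncate long URLs, look up each character, fall back to UNK.
    ids = [VOCAB.get(ch, UNK) for ch in url[:max_len]]
    # Right-pad so every encoded URL has exactly max_len entries.
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode_url("http://a.b", max_len=16))
```

Working at the character level means unusual spellings, digit substitutions, and percent-encoded fragments still produce a usable index sequence, whereas a word-level tokenizer might collapse them into a single unknown token. The resulting index list would then feed an embedding layer in whichever deep learning framework is used.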