Mitigating the Proliferation of Fake Image-Text Reviews: A Two-Tier Intra- and Inter-Modal Fusion Framework

Wei Du, Jianlan Li, Jilei Zhou, Qi Lu, and Yue Sun
International Journal of Electronic Commerce,
Volume 29, Number 2, 2025, pp. 304-332.


Abstract:

Fake reviews undermine consumer trust and can harm both consumers and e-commerce platforms. Detecting them is increasingly challenging because of the prevalence of multimodal reviews, in which text and images are combined to present a more comprehensive evaluation of products or services. Traditional text-based methods may therefore not suffice to capture this emerging form of deception. To address the challenges of extracting image features and modeling the interaction between textual and visual cues, we propose a co-attention-based model for fake review detection that leverages both modalities. For multimodal feature extraction, our model uses fine-tuned BERT for textual cues and fine-tuned VGG19 for deep image features, supplemented by hand-crafted aesthetic image features. For multimodal fusion, we design a novel two-tier fusion module that captures both intramodal and intermodal interactions. Specifically, the intramodal fusion module employs Attention-BiLSTM to capture visual patterns across multiple images, and the intermodal fusion module then employs a multi-head co-attention-based fusion block to capture the interplay between the textual and image modalities, mimicking how users process multimodal reviews. Experiments on a real-world dataset demonstrate the effectiveness of our model. Our study contributes to the field by integrating multimodal features into deep learning techniques, enhancing the detection of fake reviews on digital platforms.
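
The intermodal co-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single attention head with no learned projections, stands in small hand-written vectors for the BERT and VGG19 features, and assumes both modalities already share the same feature dimension. The function names (`cross_attend`, `co_attention_fuse`) are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, keys_values):
    """Scaled dot-product attention without learned weights (an assumption):
    each query becomes a weighted average of the key/value vectors."""
    dim = len(keys_values[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append([
            sum(w * v[d] for w, v in zip(weights, keys_values))
            for d in range(dim)
        ])
    return attended

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def co_attention_fuse(text_feats, image_feats):
    """Hypothetical fusion step: attend in both directions, pool, concatenate."""
    text_ctx = cross_attend(text_feats, image_feats)   # text attends to images
    image_ctx = cross_attend(image_feats, text_feats)  # images attend to text
    return mean_pool(text_ctx) + mean_pool(image_ctx)  # list concatenation

# Toy stand-ins for token-level text features and per-image features.
text_feats = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]]
image_feats = [[0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5]]

fused = co_attention_fuse(text_feats, image_feats)
print(len(fused))
```

In a fuller model, the fused vector would feed a classifier that labels the review as fake or genuine. A multi-head variant would run several such attention passes over learned projections of the features and combine their outputs.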