5 February 2025
Vision transformer distillation for enhanced gastrointestinal abnormality recognition in wireless capsule endoscopy images
Yassine Oukdach, Anass Garbaz, Zakaria Kerkaou, Mohamed El Ansari, Lahcen Koutti, Nikolaos Papachrysos, Ahmed Fouad El Ouafdi, Thomas de Lange, Cosimo Distante
Abstract

Purpose

Wireless capsule endoscopy (WCE) is a non-invasive technology used for diagnosing gastrointestinal abnormalities. A single examination generates 55,000 images, making manual review both time-consuming and costly for doctors. Therefore, the development of computer vision-assisted systems is highly desirable to aid in the diagnostic process.

Approach

We present a deep learning approach that uses knowledge distillation (KD) from a convolutional neural network (CNN) teacher model to a vision transformer (ViT) student model for gastrointestinal abnormality recognition. The CNN teacher model uses attention mechanisms and depth-wise separable convolutions to extract features from WCE images, and it supervises the ViT in learning these representations.
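
As an illustration of the teacher-student supervision described above, the following is a minimal sketch of a classic soft-target distillation loss for a single image. The abstract does not state the exact loss used, so this formulation (hard-label cross-entropy plus a temperature-scaled KL term), the function names, and the `temperature` and `alpha` values are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Soft-target KD loss for one sample (assumed formulation).

    alpha weights the hard-label cross-entropy term;
    (1 - alpha) weights the teacher-student KL term.
    """
    # Hard-label term: cross-entropy of the student (T = 1) against the label.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-target term: KL(teacher || student) at temperature T.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft term's gradient scale comparable
    # across different temperatures.
    return alpha * ce + (1.0 - alpha) * (temperature ** 2) * kl


# Hypothetical two-class logits (e.g. abnormal vs. normal), for illustration.
teacher = [2.0, -1.0]  # stands in for the CNN teacher's output
student = [0.5, 0.2]   # stands in for the ViT student's output
loss = distillation_loss(student, teacher, label=0)
```

In this sketch the KL term vanishes when the student reproduces the teacher's softened distribution, so only the hard-label term remains; a higher `temperature` exposes more of the teacher's relative confidence across classes.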

Results

The proposed method achieves accuracies of 97% and 96% on the Kvasir and KID datasets, respectively, demonstrating its effectiveness in distinguishing normal from abnormal regions and bleeding from non-bleeding cases. The proposed approach is computationally efficient, generalizes to unseen datasets, and outperforms several state-of-the-art methods.

Conclusions

We propose a deep learning approach that uses CNNs and a ViT with KD to classify gastrointestinal (GI) diseases in WCE images. It demonstrates promising performance on public datasets, distinguishing normal from abnormal regions and bleeding from non-bleeding cases, while offering favorable computational efficiency compared with existing methods, making it suitable for GI disease applications.

© 2025 Society of Photo-Optical Instrumentation Engineers (SPIE)

Yassine Oukdach, Anass Garbaz, Zakaria Kerkaou, Mohamed El Ansari, Lahcen Koutti, Nikolaos Papachrysos, Ahmed Fouad El Ouafdi, Thomas de Lange, and Cosimo Distante, "Vision transformer distillation for enhanced gastrointestinal abnormality recognition in wireless capsule endoscopy images," Journal of Medical Imaging 12(1), 014505 (5 February 2025). https://doi.org/10.1117/1.JMI.12.1.014505
Received: 6 September 2024; Accepted: 16 January 2025; Published: 5 February 2025
KEYWORDS
Data modeling

Performance modeling

Visual process modeling

Transformers

Education and training

Machine learning

Enhanced vision
