Wireless capsule endoscopy (WCE) is an emerging diagnostic technology that examines the gastrointestinal tract and detects a wide range of diseases and pathologies by capturing images and transmitting them wirelessly. Controlling the movement of the capsule is crucial for accurately localizing the capsule and potentially diseased areas, and for enabling biopsy and drug delivery. However, WCE faces several challenges, notably the deformable nature of soft tissue and texture-less surfaces that are subject to strong specular reflections. Because reliable real-time 3D pose estimation is critical for controlling active endoscopic capsule robots, this work proposes a data-driven approach to pose and depth estimation for a wireless capsule endoscope. Motivated by recent advances in transformer networks for computer vision, we introduce a Transformer-based architecture that uses the self-attention mechanism to handle specular reflections and the deformable topography of the gastrointestinal tract. This is a step toward fully autonomous capsule endoscopy for more precise diagnostics and treatment.