Publication
Enhancing Convolutional Block Attention with Self-Attention on Agricultural Image Classification
Arnab Ghosh Chowdhury; Amos Smith; Martin Atzmueller
In: Proc. IEEE International Conference on Tools with Artificial Intelligence (ICTAI). International Conference on Tools with Artificial Intelligence (ICTAI), Los Alamitos, CA, USA, Pages 402-407, IEEE Computer Society, 2025.
Abstract
Convolutional Neural Networks (CNNs) demonstrate strong performance on RGB images and have been applied in precision agriculture to assist in processes such as weed control, plant disease detection, and fruit quality management. The representational power of CNNs is enhanced by the Convolutional Block Attention Module (CBAM), which uses a channel attention module and a spatial attention module. The respective spatial inductive biases allow CNNs to learn representations with fewer parameters in several vision tasks. Moreover, self-attention-based Vision Transformer (ViT) models are leveraged to learn global representations. Unlike CNNs, they are memory- and compute-intensive. MobileViT combines the strengths of CNNs and Transformers to build ViT models for mobile vision tasks. In this paper, we propose two types of a Convolutional Block with Spatial Self-Attention Module (CBwSSAM) that harness channel attention and MobileViT-based spatial self-attention, respectively. The proposed modules can be integrated seamlessly into any CNN architecture. Furthermore, we leverage the lightweight BlazeFace model, which has previously been proposed for face detection tasks on mobile GPUs. We analyze an image classification model based upon the BlazeFace feature extraction network that incorporates CBwSSAM, aimed towards deployment on mobile and edge devices. Our evaluation results on three publicly available datasets demonstrate the efficacy of the proposed modules for agricultural image classification.
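To make the general idea concrete, the following is a minimal NumPy sketch of a CBAM-style channel attention followed by a plain scaled dot-product self-attention over spatial positions. It is an illustration under stated assumptions, not the CBwSSAM module from the paper: the abstract does not specify the module's internals, the spatial step here is a generic single-head self-attention rather than a MobileViT block, and all function names, shapes, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) form a shared two-layer MLP
    (hypothetical weights; r is the channel reduction ratio).
    """
    avg = x.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # global max pooling     -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    scale = sigmoid(mlp(avg) + mlp(mx))      # per-channel gate in (0, 1)
    return x * scale[:, None, None]

def spatial_self_attention(x, wq, wk, wv):
    """Generic single-head self-attention over the H*W spatial positions.

    Assumption: each spatial position is one token of dimension C.
    This is NOT a MobileViT block; it only shows global spatial mixing.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                    # (HW, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv   # each (HW, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (HW, HW), rows sum to 1
    out = tokens + attn @ v                           # residual connection
    return out.T.reshape(c, h, w)

# Toy feature map and random weights, for shape checking only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))

gated = channel_attention(x, w1, w2)
y = spatial_self_attention(gated, wq, wk, wv)
print(y.shape)
```

Because the output keeps the input shape, a block of this kind could in principle be placed between convolutional stages of a CNN; how the paper actually does this is not stated in the abstract.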
