Fine-tuning of convolutional neural networks for the recognition of facial expressions in sign language video samples

Neha Deshpande, Fabrizio Nunnari, Eleftherios Avramidis

In: 7th Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual & the Textual: Challenges and Perspectives (SLTAT 7) - Proceedings. International Workshop on Sign Language Translation and Avatar Technology (SLTAT), June 24, Marseille, France, pages 29-38, ISBN 979-10-95546-82-5, ELRA, 2022.


In this paper, we investigate the ability of convolutional neural networks to recognize, in sign language video frames, the six basic Ekman facial expressions 'fear', 'disgust', 'surprise', 'sadness', 'happiness' and 'anger', along with the 'neutral' class. Given the limited amount of annotated facial expression data in the sign language domain, we started from a model pre-trained on general-purpose facial expression datasets and applied several machine learning techniques, such as fine-tuning, data augmentation, class balancing and image preprocessing, to improve accuracy. The models were evaluated with K-fold cross-validation to obtain more reliable conclusions. Our experiments show that fine-tuning a pre-trained model, combined with data augmentation by horizontal flipping and image normalization, yields the best accuracy on the sign language dataset. The best setting achieves satisfactory classification accuracy, comparable to state-of-the-art systems in generic facial expression recognition. Experiments were performed with different combinations of these techniques on two architectures, MobileNet and EfficientNet; both architectures appear equally suitable for fine-tuning, whereas class balancing is discouraged.
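The data-side steps named in the abstract (horizontal-flip augmentation, image normalization and K-fold cross-validation) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the value K=5, the per-channel standardization fitted on each training fold, the 48x48 RGB image size and the random placeholder data are all hypothetical choices, and the actual fine-tuning of a pre-trained MobileNet or EfficientNet is only marked by a comment.

```python
import numpy as np

# The seven target classes from the abstract (six Ekman expressions plus neutral).
CLASSES = ["neutral", "anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for plain shuffled K-fold cross-validation.

    Assumption: K=5 and unstratified folds are illustrative, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


def augment_with_hflip(images, labels):
    """Append a horizontally mirrored copy of every (N, H, W, C) image.

    Mirroring a face does not change its expression, so labels are reused.
    """
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])


def fit_normalizer(images):
    """Assumed normalization: per-channel mean/std of pixels scaled to [0, 1]."""
    x = images.astype(np.float32) / 255.0
    mean = x.mean(axis=(0, 1, 2))
    std = x.std(axis=(0, 1, 2)) + 1e-7  # avoid division by zero
    return mean, std


def normalize(images, mean, std):
    """Scale to [0, 1], then standardize each channel with the given statistics."""
    return (images.astype(np.float32) / 255.0 - mean) / std


def prepare_fold(images, labels, train_idx, val_idx):
    """Augment only the training fold and fit normalization on it alone."""
    x_tr, y_tr = augment_with_hflip(images[train_idx], labels[train_idx])
    mean, std = fit_normalizer(x_tr)
    x_va = normalize(images[val_idx], mean, std)
    return (normalize(x_tr, mean, std), y_tr), (x_va, labels[val_idx])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical placeholder data standing in for cropped face frames.
    images = rng.integers(0, 256, size=(70, 48, 48, 3), dtype=np.uint8)
    labels = rng.integers(0, len(CLASSES), size=70)
    for fold, (tr, va) in enumerate(kfold_indices(len(images), k=5)):
        (x_tr, y_tr), (x_va, y_va) = prepare_fold(images, labels, tr, va)
        print(fold, x_tr.shape, x_va.shape)
        # Here one would fine-tune the pre-trained network on (x_tr, y_tr)
        # and report accuracy on (x_va, y_va); accuracies are averaged over folds.
```

Augmenting and fitting the normalization statistics inside each training fold keeps flipped copies of validation images and validation pixel statistics out of training, which is one reasonable way to keep the cross-validated accuracy estimate unbiased.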



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence