Adversarial Data Augmentation With Vision Transformer for Image Classification Tasks

This work analyzes a new end-to-end hybrid model that combines adversarial data augmentation using C-GANs with Vision Transformers (ViT) to improve image classification. The ViT uses multi-head self-attention to capture both the local and the global context of each image, which improves digit-classification accuracy. The original MNIST images are used together with images generated by a C-GAN model, both to improve image quality and to expand the dataset. The ViT is trained with tuned hyperparameters, including the number of epochs, weight decay, learning rate, and batch size, to improve classification results. The model's performance is then evaluated to investigate the influence of synthetic data and data augmentation. Combining original and synthesized data in the ViT framework yields a more diverse model that generalizes better, reaching an accuracy of 0.98.
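
The self-attention step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a 28x28 MNIST-sized input split into 7x7 patches (16 tokens), a 64-dimensional embedding, 4 heads and random weights, and it omits the class token, position embeddings and training. The function names `image_to_patches` and `multi_head_self_attention` are hypothetical. Every patch token attends to every other patch, so each output token mixes within-patch (local) detail with information from across the whole image (global).

```python
import numpy as np

def image_to_patches(img, patch=7):
    """Split an (H, W) grayscale image into non-overlapping patch x patch tiles
    in raster order, each flattened: returns (num_patches, patch * patch)."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0
    tiles = img.reshape(h // patch, patch, w // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)  # (tile_row, tile_col, patch, patch)
    return tiles.reshape(-1, patch * patch)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention split across num_heads heads."""
    n, d = tokens.shape
    d_head = d // num_heads
    # Project tokens, then split the feature axis into heads: (heads, n, d_head).
    q = (tokens @ w_q).reshape(n, num_heads, d_head).transpose(1, 0, 2)
    k = (tokens @ w_k).reshape(n, num_heads, d_head).transpose(1, 0, 2)
    v = (tokens @ w_v).reshape(n, num_heads, d_head).transpose(1, 0, 2)
    # Every token scores every other token, giving a global receptive field.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, n, d_head)
    # Concatenate the heads back to (n, d) and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(n, d)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
img = rng.random((28, 28))                # random stand-in for an MNIST digit
patches = image_to_patches(img, patch=7)  # (16, 49)

d_model, num_heads = 64, 4                # assumed sizes, not from the chapter
w_embed = rng.normal(scale=0.02, size=(49, d_model))
tokens = patches @ w_embed                # linear patch embedding: (16, 64)
w_q, w_k, w_v, w_o = (rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(4))

out, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)              # (16, 64) (4, 16, 16)
```

In a full ViT this block would sit inside a transformer encoder layer with residual connections, layer normalization and an MLP, and the class-token output would feed the digit classifier.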

Year of publication:	2025
Authors:	Kumar, Satrughan ; Kumar, Munish ; Mahapatra, Ranjan Kumar ; Gupta, Sumit ; Baronia, Arpita
Published in:	Exploring Generative Adversarial Networks and Meta-Learning Synergies. - IGI Global Scientific Publishing, ISBN 9798369375778. - 2025, p. 73-100

Type of publication:	Article
Type of publication (narrower categories):	chapter
Language:	English
Other identifiers:	10.4018/979-8-3693-7575-4.ch003 [DOI]
Source:	Other ZBW resources

Persistent link: https://www.econbiz.de/10015539960