Masking latent gender knowledge for debiasing image captioning
2024
Large language models incorporate world knowledge and present breakthrough performances on zero-shot learning. However, these models capture societal bias (e.g., gender or racial bias) due to bias during the training process which raises ethical concerns or can even be potentially harmful. The issue is more pronounced in multi-modal settings, such as image captioning, as images can also add onto biases (e.g., due to historical non-equal representation of genders in different occupations). In this study, we investigate the removal of potentially problematic knowledge from multi-modal models used for image captioning. We relax the gender bias issue in captioning models by degenderizing generated captions through the use of a simple linear mask, trained via adversarial training. Our proposal makes no assumption on the architecture of the model and freezes the model weights during the procedure, which also enables the mask to be turned off. We conduct experiments on COCO caption datasets using our masking solution. The results suggest that the proposed mechanism can effectively mask the targeted biased knowledge, by replacing more than 99% gender words with neutral ones, and maintain a comparable captioning quality performance with minimal (e.g., -1.4 on BLEU4 and ROUGE) impact to accuracy metrics.
Research areas