Code-switched text synthesis in unseen language pairs

I-Hung Hsu; Avik Ray; Shubham Garg; Nanyun Peng; Jing Huang


2023


Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, which limits the deployment of these models in cases that lack code-switched data. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual machine translation model (PMMTM) with an additional code-switching module. This module, either an adapter or extra prefixes, learns code-switching patterns from code-switched data during training, while the primary component of GLOSS, i.e., the PMMTM, is frozen. Adjusting only the code-switching module prevents our model from overfitting to the limited training data for code-switching. Hence, GLOSS is able to generalize and synthesize code-switched texts across a broader spectrum of language pairs. Additionally, we develop a self-training algorithm on target language pairs to further enhance the reliability of GLOSS. Automatic evaluations on four language pairs show that GLOSS achieves relative improvements of at least 55% in BLEU and METEOR scores compared to strong baselines. Human evaluations on two language pairs further validate the success of GLOSS.
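
The idea of training only a small added module while the pre-trained model stays frozen can be illustrated with a toy sketch. This is not the GLOSS implementation: the backbone here is a single fixed scalar weight, the adapter is a single trainable residual scale, and the names FrozenBackbone, ScalarAdapter, and train_step are hypothetical, chosen only to show that the updates touch the adapter and nothing else.

```python
# Toy illustration only, not the GLOSS code. All names are hypothetical.

class FrozenBackbone:
    """Stands in for the pre-trained model; its weight is never updated."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return self.weight * x


class ScalarAdapter:
    """Residual adapter: output = h + scale * h. Starts as the identity."""

    def __init__(self):
        self.scale = 0.0

    def forward(self, h):
        return h + self.scale * h


def train_step(backbone, adapter, x, target, lr=0.01):
    """One gradient step on squared error, updating only the adapter."""
    h = backbone.forward(x)
    pred = adapter.forward(h)
    error = pred - target
    loss = error ** 2
    # d(loss)/d(scale) = 2 * error * h; the backbone weight gets no update.
    adapter.scale -= lr * 2.0 * error * h
    return loss


backbone = FrozenBackbone(weight=2.0)
adapter = ScalarAdapter()
losses = [train_step(backbone, adapter, x=1.0, target=3.0) for _ in range(50)]
```

After the loop, the backbone weight is unchanged and the loss has dropped, because the adapter alone has absorbed the correction. In a real system the adapter would be a small neural layer (or a set of learned prefixes) inside a much larger frozen network.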
