A modern Spoken Language Understanding (SLU) system usually contains two sub-systems, Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU), where ASR transcribes the voice signal into text and NLU performs intent classification and slot filling on that text. In practice, such a decoupled ASR/NLU design facilitates fast model iteration for both components. However, this makes downstream NLU