This paper describes Distill-Quantize-Tune (DQT), a pipeline for creating viable small-footprint multilingual models that can perform NLU directly on extremely resource-constrained Edge devices. We distill semantic knowledge from a large transformer-based teacher, trained on large amounts of public and private data, into our Bi-LSTM-based Edge candidate (student) model, and further compress