Publications

Lessons From Building Acoustic Models With a Million Hours of Speech

Sree Hari Krishnan Parthasarathi, Nikko Ström

ICASSP 2019

2019

This is a report of the lessons we learned building acoustic models from 1 million hours of unlabeled speech, with labeled speech restricted to 7,000 hours. We employ student/teacher training on unlabeled data, which helps scale out target generation compared with confidence-model-based methods, which require a decoder and a confidence model. To optimize storage and to parallelize target generation, we

Machine learning

SegTree Transformer: Iterative refinement of hierarchical features

Zihao Ye, Qipeng Guo, Quan Gan, Zheng Zhang

ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds

2019

The building block of Transformer can be seen as inducing message passing over a complete graph whose nodes correspond to input tokens. Such dense connections make the Transformer data-hungry. Star-Transformer exploits short-term dependencies more heavily by keeping the connections between adjacent tokens but relaying long dependencies via a central node, thereby reducing the number of connections from

Machine learning

Continual learning in practice

Tom Diethe, Tom Borchert, Eno Thereska, Borja de Balle Pigem, Cédric Archambeau, Neil Lawrence

NeurIPS 2018

2018

This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine

Machine learning

A scalable algorithm for higher-order features generation using MinHash

Pooja A, Naveen Nair, Rajeev Rastogi

CIKM 2018

2018

Linear models have been widely used in the industry for their low computation time, small memory footprint and interpretability. However, linear models are not capable of leveraging non-linear feature interactions in predicting the target. This limits their performance. A classical approach to overcome this limitation is to use combinations of the original features, referred to as higher-order features,

Machine learning

Phrase Break Prediction for Long-form Reading TTS: Exploiting Text Structure Information

Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Tom Merritt, Thomas Drugman

Interspeech 2017

2018

Phrasing structure is one of the most important factors in increasing the naturalness of text-to-speech (TTS) systems, in particular for long-form reading. Most existing TTS systems are optimized for isolated short sentences, and completely discard the larger context or structure of the text. This paper presents how we have built phrasing models based on data extracted from audiobooks. We investigate how

Conversational AI

Contextual multi-armed bandits for causal marketing

Neela Sawant, Chitti Babu Namballa, Narayanan Sadagopan, Houssam Nassif

ICML 2018

2018

This work explores the idea of a causal contextual multi-armed bandit approach to automated marketing, where we estimate and optimize the causal (incremental) effects. Focusing on causal effect leads to better return on investment (ROI) by targeting only the persuadable customers who wouldn’t have taken the action organically. Our approach draws on strengths of causal inference, uplift modeling, and multi-armed

Machine learning

A simple transfer-learning extension of Hyperband

Lazar Valkov, Rodolphe Jenatton, Fela Winkelmolen, Cédric Archambeau

NeurIPS 2018

2018

Hyperband has become a popular method to tune the hyperparameters (HPs) of expensive machine learning models, whose performance depends on the amount of resources allocated for training. While Hyperband is conceptually simple, combining random search with a successive halving technique to reallocate resources to the most promising HPs, it often outperforms standard Bayesian optimization when solutions with

Machine learning
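
The successive halving step mentioned in the Hyperband abstract above can be illustrated with a minimal sketch. This is a generic textbook version and not the authors' transfer-learning extension; the function names, the halving factor `eta`, and the toy objective `toy_loss` are all assumptions made for illustration.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configs each round and multiply the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Score every surviving config at the current budget (lower is better).
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]

def toy_loss(lr, budget):
    # Hypothetical objective: best at lr = 0.1, and it improves as budget grows.
    return (lr - 0.1) ** 2 + 1.0 / budget

# Random search supplies the initial candidates (here: 16 learning rates).
rng = random.Random(0)
candidates = [rng.uniform(0.0, 1.0) for _ in range(16)]
best = successive_halving(candidates, toy_loss)
print(best)
```

Each round spends more budget on fewer candidates, which is how resources are reallocated to the most promising hyperparameters. Full Hyperband additionally runs several such brackets with different starting budgets; that outer loop is omitted here.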

Deep factors with Gaussian processes for forecasting

Danielle Maddix Robinson, Yuyang (Bernie) Wang, Alex Smola

NeurIPS 2018

2018

A large collection of time series poses significant challenges for classical and neural forecasting approaches. Classical time series models fail to fit data well and to scale to large problems, but succeed at providing uncertainty estimates. The converse is true for deep neural networks. In this paper, we propose a hybrid model that incorporates the benefits of both approaches. Our new method is data-driven

Machine learning

ProxQuant: Quantized neural networks via proximal operators

Yu Bai, Yu-Xiang Wang, Edo Liberty

ICLR 2018

2018

Deep neural networks are often desired in environments with limited memory and computational power (such as mobile devices), where it is beneficial to perform model quantization – training networks with low-precision weights. A key mechanism commonly used in training quantized nets is the straight-through gradient method, which enables back-propagation through the quantization mapping. Despite its success

Machine learning

Learning large scale ordinal ranking model via divide-and-conquer technique

Lu Tang, Sougata Chaudhuri, Abraham Bagherjeiran, Ling Zhou

WWW 2018

2018

Structured prediction, where outcomes have a precedence order, lies at the heart of machine learning for information retrieval, movie recommendation, product review prediction, and digital advertising. Ordinal ranking, in particular, assumes that the structured response has a linear ranked order. Due to the extensive applicability of these models, substantial research has been devoted to understanding them

Machine learning

Sample path generation for probabilistic demand forecasting

Dhruv Madeka, Lucas Swiniarski, Dean Foster, Leo Razoumov, Kari Torkkola, Ruofeng Wen

KDD 2018 Workshop on Mining and Learning from Time Series

2018

The state of the art in probabilistic demand forecasting [40] minimizes Quantile Loss to predict the future demand quantiles for different horizons. However, since quantiles aren’t additive, in order to predict the total demand for any wider future interval all required intervals are usually appended to the target vector during model training. The separate optimization of these overlapping intervals can

Machine learning
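
The remark in the abstract above that quantiles are not additive can be checked with a small numerical sketch. It assumes two days of independent, normally distributed demand, which is an illustrative assumption and not the paper's data or model; the helper names `pinball_loss` and `empirical_quantile` are likewise hypothetical.

```python
import random

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for a single observation at quantile level q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)

def empirical_quantile(samples, q):
    """Simple order-statistic estimate of the q-th quantile."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

# Assumed toy demand: two independent days, each roughly N(100, 20^2).
rng = random.Random(0)
day1 = [rng.gauss(100, 20) for _ in range(10000)]
day2 = [rng.gauss(100, 20) for _ in range(10000)]
total = [a + b for a, b in zip(day1, day2)]

q = 0.9
sum_of_quantiles = empirical_quantile(day1, q) + empirical_quantile(day2, q)
quantile_of_sum = empirical_quantile(total, q)

# Adding the per-day P90 values overstates the P90 of the two-day total.
print(sum_of_quantiles, quantile_of_sum)
```

Because the per-day quantiles do not sum to the quantile of the total, a model trained only on single-day quantiles cannot be summed to cover a wider interval, which is why the abstract describes appending the extra intervals to the target vector.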

Learning when not to answer: A ternary reward structure for reinforcement learning based question answering

Frederic Godin, Anjishnu Kumar, Arpit Mittal

NAACL 2019, NeurIPS 2018

2018

In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents

Machine learning

Deep Gaussian processes for multi-fidelity modeling

Kurt Cutajar, Mark Pullin, Andreas Damianou, Javier González, Neil Lawrence

NeurIPS 2018

2018

Multi-fidelity methods are prominently used when cheaply-obtained, but possibly biased and noisy, observations must be effectively combined with limited or expensive true data in order to construct reliable models. This arises both in fundamental machine learning procedures such as Bayesian optimization and in more practical science and engineering applications. In this paper we develop a novel multi-fidelity

Machine learning

Invariant representation learning for robust deep networks

Julian Salazar, Davis Liang, Zhiheng Huang, Zachary Lipton

NeurIPS 2018

2018

Deep neural networks are often brittle to superficial perturbations of their inputs; models that perform well offline on held-out data can still break under small amounts of naturally-occurring or adversarial shifts. We consider invariant representation learning (IRL), first proposed in the domain of speech recognition, as a simple, effective, and general extension to data augmentation. Rather than only

Machine learning

A neural interlingua for multilingual machine translation

Yichao Lu, Phillip Keung, Faisal Ladhak, Shaonan Zhang, Vikas Bhardwaj, Jason Sun

ACL 2018

2018

We incorporate an explicit neural interlingua into a multilingual encoder-decoder neural machine translation (NMT) architecture. We demonstrate that our model learns a language-independent representation by performing direct zero-shot translation (without using pivot translation), and by using the source sentence embeddings to create an English Yelp review classifier that, through the mediation of the neural

Conversational AI
