Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use GPUs inefficiently, especially at large scale, due to idle GPU time caused by pipeline bubbles; this idle time is often 15–30%, and can exceed 60%, of the training job’s GPU allocation. To improve the GPU utilization of PP model training, this paper describes