Customer-obsessed science
Research areas
June 8, 2026 · 7 min read
Four approaches can dramatically improve the performance and trustworthiness of AI agents in operational environments.
May 27, 2026 · 4 min read · Machine learning
Featured news
ACM DocEng 2026 · 2026
Operational runbooks increasingly function as living documents within operational workflows: they are maintained by people, used in incident support, and continuously revised as organizational knowledge changes. Yet little is known about how such document collections evolve over time in production settings, or which interpretable signals are useful for monitoring document change. We analyze 17 weeks of…
2026
Multi-agent systems (MAS) are increasingly capable of tackling complex real-world tasks, yet their reliance on inter-agent coordination, tool use, and long-horizon reasoning makes error recognition particularly challenging. Minor errors can propagate across agents, escalating into task failures while producing long, intertwined execution trajectories that impose significant costs for both human developers…
IEEE Micro · 2026
Despite the non-negligible occurrence of silent data corruption (SDC) during large-scale training of large language models (LLMs), the impact of SDC on training lacks systematic understanding. This article empirically analyzes the connections between different training characteristics and the impact of SDC on LLM training. Using deterministic training workloads on real-world SDC-affected hardware, we quantify SDC…
2026
Large language model (LLM)-based agents increasingly rely on tool use to complete real-world tasks. While existing work evaluates LLMs' tool-use capability, it largely focuses on final answers and overlooks the detailed tool-usage trajectory, i.e., whether tools are selected, parameterized, and ordered correctly. We introduce TRAJECT-Bench, a trajectory-aware benchmark to comprehensively evaluate…
ICML 2026 Workshop on Reinforcement Learning from World Feedback · 2026
Training-free verbal reinforcement learning enables LLM agents to learn from world feedback (objective signals such as dynamic task outcomes, market returns, or demand forecasts) by extracting verbal rules from experience and injecting them as context, updating the agent's behavior without parameter changes. However, in non-stationary environments these agents face a retention-forgetting dilemma: retaining…
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.