Customer-obsessed science


Research areas
- February 20, 2025: Using large language models to generate training data and updating models through both fine tuning and reinforcement learning improves the success rate of code generation by 39%.
- December 24, 2024
Featured news
- 2025: Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability. We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic
- 2025: Large language models (LLMs) have achieved remarkable success in various natural language generation (NLG) tasks, but their performance in automatic text evaluation is not yet ready as human replacements. In this paper, we propose SEEval (Self-Explanation in Evaluation), a novel prompt-based text evaluator. Inspired by educational psychology, SEEval incorporates self-explanation, a metacognitive strategy
- 2025: Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference (NGDiff)
- 2025: Mitigating the retention of sensitive or private information in large language models is essential for enhancing privacy and safety. Existing unlearning methods, like Gradient Ascent and Negative Preference Optimization, directly tune models to remove unwanted information. However, these methods often become unstable because they fine-tune by maximizing cross-entropy loss, which is the opposite of traditional
- 2025: Previous text-to-SQL datasets and systems have primarily focused on user questions with clear intentions that can be answered. However, real user questions can often be ambiguous with multiple interpretations or unanswerable due to a lack of relevant data. In this work, we construct a practical conversational text-to-SQL dataset called PRACTIQ, consisting of ambiguous and unanswerable questions inspired
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.