Talk Title: Inducing and Using Abstractions of Agent Actions
Talk Abstract: This talk explores how LLM-based agents can solve complex, long-horizon tasks by learning and reusing common sub-tasks. We first introduce Agent Workflow Memory (https://arxiv.org/abs/2409.07429), a method for agents to learn reusable textual workflows from past successes to guide future actions. In a realistic web agent setting, we find that agents that learn workflows online improve substantially over agents without memory. We then show further improvements by representing sub-tasks as executable programs, allowing agents to induce and use tools to solve tasks (https://arxiv.org/abs/2504.06821). Finally, we use this workflow-based framework to directly compare how AI agents and human workers approach the same tasks (https://arxiv.org/abs/2510.22780). Our analysis reveals that while high-level workflows often align, agents take a predominantly programmatic approach, in stark contrast to the UI-centric methods used by humans. This points to future work on human-agent collaboration via sub-task delegation and feedback, and suggests that improving agents' code generation abilities could boost their performance even on non-coding tasks.
Bio: Daniel Fried is an assistant professor in the Language Technologies Institute at Carnegie Mellon University. His research focuses on NLP, grounding and interaction, and applied pragmatics, with a particular focus on language interfaces such as LLM agents and code generation. Previously, he was a postdoc at Meta AI and the University of Washington, and completed a PhD at UC Berkeley. His research has been supported by an Okawa Research Award, a Google PhD Fellowship, and a Churchill Fellowship.