Talk Title: Reasoning Guardrails for the Agentic Web
Talk Abstract: Large language models are shifting from text predictors to agents that operate on the web and within software systems. This transition amplifies safety risks across dialogue, the environments agents control, and compliance with real-world policies. This talk presents ThinkGuard, a reasoning-based guardrail trained via mission-focused distillation to acquire deliberative thinking without costly manual annotation. By reasoning over goals, constraints, and latent hazards, ThinkGuard generalizes to implicit, complex, and previously unseen risks. I will also outline guardrail extensions that enforce policy compliance for web and system agents, as well as omnimodal guardrails that vet and steer interactions involving images, audio, and video. Together, these techniques transform guardrails from brittle filters into adaptive, explanatory safety layers that preserve utility while measurably reducing failures in agentic workflows.
Bio: Muhao Chen is an Assistant Professor in the Department of Computer Science at UC Davis, where he leads the Language Understanding and Knowledge Acquisition (LUKA) Group. He received his Ph.D. from the Department of Computer Science at UCLA and his B.S. in Computer Science from Fudan University. His research focuses on robust and accountable machine learning, particularly on the accountability and security of large language models and agentic AI. He is a co-founder and the secretary of the ACL Special Interest Group on NLP Security (SIGSEC). His work has been recognized with EMNLP Outstanding Paper Awards (2023, 2024), an ACM SIGBio Best Student Paper Award (2020), faculty research awards from Amazon (2022, 2023) and Cisco, and grant support from NSF, DARPA, IARPA, and industry.