Talk Title: Enabling Human-centric and Culturally Aware Safety of AI Agents
Talk Abstract: AI safety has made substantial strides, yet it still struggles to keep up with increasingly agentic AI use cases and often focuses too heavily on technical solutions rather than human-centered ones. In this talk, I'll outline recent work towards making AI safety more human-centric and culturally aware. First, I'll introduce HAICosystem and OpenAgentSafety, two new interactive benchmarks for evaluating LLM agents in multi-turn, tool-using interactions via simulations, which show that agents still exhibit previously unknown safety issues arising from tool use. Then, focusing on users, I'll outline a recent study on how LLM agents should or should not refuse queries, showing that users' perceptions, trust, and willingness to use LLMs are strongly affected by refusal strategies, and that many current LLMs employ the least-preferred ones. Finally, I'll cover an oft-overlooked aspect of safety: cultural safety. I'll introduce MC-Signs, a new benchmark that measures the cultural safety of LLMs, VLMs, and T2I systems with respect to culturally offensive non-verbal communication (e.g., hand gestures), revealing strong Western-centric biases across all of these AI systems. I'll conclude with future directions towards better cultural and human-centric safety.
Bio: Maarten Sap is an assistant professor in Carnegie Mellon University's Language Technologies Institute (CMU LTI), with a courtesy appointment in the Human-Computer Interaction Institute (HCII). He is also a part-time research scientist and AI safety lead at the Allen Institute for AI. His research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combating social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes. He has presented his work at top-tier NLP and AI conferences, receiving paper awards or nominations at NeurIPS 2025, NAACL 2025, EMNLP 2023, ACL 2023, FAccT 2023, WeCNLP 2020, and ACL 2019. He was named a 2025 Packard Fellow and is a recipient of the 2025 Okawa Research Award. His research has been covered by the press, including The New York Times, Forbes, Fortune, Vox, and more.