April 29th
12pm-1pm, Coda Conference Room 114

Safety Alignment of Generative Foundation Models
Anthony Peng
Advisor: Prof. Polo Chau
ABSTRACT
Modern LLMs are safety-aligned through supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF) to mitigate harmful, undesirable, or disallowed outputs. Despite ongoing progress, LLMs still exhibit critical safety gaps: models can be jailbroken into revealing harmful content, often overrefuse benign queries, and fail to maintain safe behavior under adversarial scenarios.
My dissertation research advances the safety alignment of generative foundation models by developing principled tools, architectures, and training methods that strengthen their robustness and reliability at scale. Specifically, this thesis focuses on three complementary thrusts: a) understanding and shaping the safety landscape of LLMs, b) internalizing safety in agentic reasoning intelligence, and c) grounding safety and robustness in multimodal perception.
BIO
Anthony is a CS PhD candidate at Georgia Tech working with Polo Chau. Through internships and collaborations, his thesis research has advanced foundational AI efforts at Nvidia, Meta, IBM, Intel, and ADP, and has resulted in several first-author publications and awards at NeurIPS, ACL, ICCV, EMNLP, CVPR, and BMVC. His research has contributed to the AI foundation of multiple funded industry research grants totaling over $1.4M. Learn more about him at shengyun-peng.github.io.
