HotCSE Seminar
Computational Science & Engineering
April 29th
12pm-1pm
Coda Conference Room 114

Safety Alignment of Generative Foundation Models

Anthony Peng
Advisor: Prof. Polo Chau

ABSTRACT

Modern LLMs are safety-aligned through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to mitigate harmful, undesirable, or disallowed outputs. Despite ongoing progress, LLMs still exhibit critical safety gaps: they can be jailbroken into revealing harmful content, they often over-refuse benign queries, and they fail to maintain safety under adversarial conditions. My dissertation research advances the safety alignment of generative foundation models by developing principled tools, architectures, and training methods that strengthen their robustness and reliability at scale. Specifically, this thesis focuses on three complementary thrusts: a) Understanding and shaping the safety landscape of LLMs, b) Internalizing safety in agentic reasoning intelligence, and c) Grounding safety and robustness in multimodal perception.

BIO

Anthony is a CS PhD candidate at Georgia Tech working with Polo Chau. His thesis research has advanced foundational AI efforts at Nvidia, Meta, IBM, Intel, and ADP through internships and collaborations, and has resulted in several first-author publications and awards at NeurIPS, ACL, ICCV, EMNLP, CVPR, and BMVC. His research has also contributed to the AI foundations of multiple funded industry research grants totaling over $1.4M. Learn more about him at shengyun-peng.github.io.