Zhaorun Chen

| CV | Email | Google Scholar | GitHub | LinkedIn |

I am an incoming Ph.D. student in the Secure Learning Lab at UChicago CS, advised by Prof. Bo Li.

Previously, I received my Master's degree in Electrical and Computer Engineering from Purdue University, advised by Prof. Su Lu. Before that, I obtained my Bachelor's degree in Automation from Shanghai Jiao Tong University, advised by Prof. Yue Gao. During Summer 2023, I interned at UNC-Chapel Hill, advised by Prof. Huaxiu Yao, and collaborated on several wonderful projects with the IRIS Lab led by Prof. Chelsea Finn.

My current research interests center on the trustworthy deployment of, and safe interaction with, foundation models (e.g. LLMs) and agents, from both theoretical and empirical perspectives. Specifically, I'm interested in enhancing their trustworthiness via novel algorithms and certificates for various applications (e.g. hallucination mitigation, alignment with human values, jailbreak attacks and defenses) by incorporating external knowledge sources and LLMs' reasoning capabilities.

[Publications] Email: zhaorun [AT] uchicago.edu


News

  • [June, 2024] 🎉 One first-authored paper accepted by IROS 2024 for an oral presentation!
  • [May, 2024] 🎉 One first-authored paper accepted by ICML 2024!
  • [Mar., 2024] 🌟 One first-authored paper accepted by NAACL 2024!
  • [Mar., 2024] 🏆 One paper accepted by ICME 2024 for an oral presentation!
  • [Feb., 2024] 🎓 Four short-version papers presented at ICLR 2024 Workshops!
  • [Feb., 2024] 🎉 I've joined the Secure Learning Lab at UChicago as a Ph.D. student advised by Prof. Bo Li.
  • [June, 2023] 🚀 I've joined Prof. Huaxiu Yao's group in the CS Department at UNC-Chapel Hill as a research intern.

Publications

AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li
In submission, 2024

pdf | abstract | bibtex | arXiv

LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, utilizing external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically utilize a memory module or a retrieval-augmented generation (RAG) mechanism, retrieving past knowledge and instances with similar embeddings from knowledge bases to inform task planning and execution. However, the reliance on unverified knowledge bases raises significant concerns about their safety and trustworthiness. To uncover such vulnerabilities, we propose a novel red-teaming approach, AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. In particular, we formulate the trigger generation process as a constrained optimization that maps triggered instances to a unique embedding space, ensuring that whenever a user instruction contains the optimized backdoor trigger, the malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. Meanwhile, benign instructions without the trigger still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized backdoor trigger exhibits superior transferability, in-context coherence, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness in attacking three types of real-world LLM agents: a RAG-based autonomous driving agent, a knowledge-intensive QA agent, and a healthcare EHRAgent. We inject the poisoning instances into the RAG knowledge base and long-term memories of these agents, respectively, demonstrating the generalization of AgentPoison. On each agent, AgentPoison achieves an average attack success rate of ≥ 80% with minimal impact on benign performance (≤ 1%) at a poison rate < 0.1%. Code is released here.

  @article{chen2024agentpoison,
    title={AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning},
    author={Chen, Zhaorun and Xiang, Zhen and Xiao, Chaowei and Song, Dawn and Li, Bo},
    journal={arXiv preprint},
    year={2024}
  }
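
To make the trigger objective concrete, here is a minimal, self-contained toy sketch (not the paper's implementation): it greedily picks trigger tokens that map triggered queries into a compact, unique embedding cluster, which is what lets nearest-neighbor retrieval reliably surface the poisoned demonstrations. The hash-based embed(), the candidate vocabulary, and the driving-style queries are all illustrative stand-ins.

  import zlib
  import numpy as np

  DIM = 64

  def embed(text):
      # Deterministic stand-in for a sentence encoder: mean of hashed token vectors.
      vecs = [np.random.default_rng(zlib.crc32(tok.encode())).normal(size=DIM)
              for tok in text.split()]
      v = np.mean(vecs, axis=0)
      return v / np.linalg.norm(v)

  queries = ["turn left at the intersection", "slow down for the pedestrian",
             "merge onto the highway", "stop at the red light"]
  vocab = ["zeta", "quorum", "lattice", "ember", "vortex", "prism"]

  def compactness(trigger_toks):
      # Mean pairwise cosine similarity of triggered queries; a tighter, more
      # unique cluster makes poisoned entries dominate nearest-neighbor retrieval.
      E = np.stack([embed(q + " " + " ".join(trigger_toks)) for q in queries])
      sims = E @ E.T
      n = len(E)
      return (sims.sum() - n) / (n * (n - 1))

  # Greedy stand-in for the paper's constrained trigger optimization.
  trigger = []
  for _ in range(3):
      trigger.append(max(vocab, key=lambda t: compactness(trigger + [t])))
  print("trigger:", trigger, "compactness:", round(compactness(trigger), 3))

In the real attack the encoder is the agent's own retriever and the objective additionally enforces coherence and stealth; the greedy loop above only conveys the flavor of the search.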

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen*, Zhuokai Zhao*, Hongyin Luo, Huaxiu Yao, Bo Li, and Jiawei Zhou
ICML 2024
short version presented at ICLR 2024 R2-FM Workshop

pdf | abstract | bibtex | arXiv

While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they inevitably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH while preserving text generation quality. Additionally, HALC can be integrated into any LVLM as a plug-and-play module without extra training. Extensive experimental studies demonstrate HALC's effectiveness in reducing OH, outperforming state-of-the-art methods across four benchmarks. Code is released here.

  @article{chen2024halc,
    title={HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding},
    author={Chen, Zhaorun and Zhao, Zhuokai and Luo, Hongyin and Yao, Huaxiu and Li, Bo and Zhou, Jiawei},
    journal={arXiv preprint arXiv:2403.00425},
    year={2024}
  }
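
For intuition about contrasting visual contexts at decoding time (a hedged toy, not HALC itself): compare next-token distributions conditioned on the full image versus a focal crop, and boost tokens whose evidence strengthens in the grounded region. The four-word vocabulary and all logit values below are invented for illustration.

  import numpy as np

  def softmax(x):
      z = np.exp(x - x.max())
      return z / z.sum()

  vocab = ["cat", "dog", "frisbee", "bench"]
  logits_global = np.array([2.0, 1.8, 1.5, 0.2])  # full image: "dog" looks plausible
  logits_focal = np.array([2.4, 0.3, 1.6, 0.1])   # focal crop: no dog in the grounded region

  alpha = 1.0  # contrast strength
  contrast = logits_focal + alpha * (logits_focal - logits_global)
  for tok, p in zip(vocab, softmax(contrast)):
      print(f"{tok:8s} {p:.3f}")  # the hallucinated "dog" is strongly suppressed

HALC additionally selects the focal field of view adaptively and wraps this local correction in a beam search; the snippet shows only a single-step contrast.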

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen*, Yichao Du*, Zichen Wen*, Yiyang Zhou*, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, and Huaxiu Yao
In submission, 2024

pdf | abstract | bibtex | arXiv

Multimodal reward models (RMs) are critical in RLHF and RLAIF, where they serve as judges and provide feedback for aligning foundation models (FMs) with desired behaviors. Despite their significance, these multimodal judges often undergo inadequate evaluation of their capabilities and biases, which may lead to potential misalignment and unsafe fine-tuning outcomes. To address this issue, we introduce MJ-Bench, a novel benchmark which incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias. Specifically, we evaluate a large variety of multimodal judges including smaller-sized CLIP-based scoring models, open-source VLMs (e.g. the LLaVA family), and closed-source VLMs (e.g. GPT-4o, Claude 3) on each decomposed subcategory of our preference dataset. Experiments reveal that closed-source VLMs generally provide better feedback, with GPT-4o outperforming other judges on average. Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities. Notably, human evaluations of end-to-end fine-tuned models using separate feedback from these multimodal judges reach similar conclusions, further confirming the effectiveness of MJ-Bench. Further studies of feedback scales reveal that VLM judges can generally provide more accurate and stable feedback in natural language (Likert scale) than on numerical scales. The code and data are available here.

  @misc{chen2024mjbenchmultimodalrewardmodel,
    title={MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?}, 
    author={Zhaorun Chen and Yichao Du and Zichen Wen and Yiyang Zhou and Chenhang Cui and Zhenzhen Weng and Haoqin Tu and Chaoqi Wang and Zhengwei Tong and Qinglan Huang and Canyu Chen and Qinghao Ye and Zhihong Zhu and Yuqing Zhang and Jiawei Zhou and Zhuokai Zhao and Rafael Rafailov and Chelsea Finn and Huaxiu Yao},
    year={2024},
    eprint={2407.04842},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2407.04842}, 
}
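
The evaluation loop such a benchmark implies can be sketched in a few lines. This is an illustrative skeleton under assumed names (PreferencePair and stub_judge are hypothetical); a real judge would be a CLIP-based scorer or a VLM API returning numeric or Likert-scale ratings.

  from dataclasses import dataclass

  @dataclass
  class PreferencePair:
      prompt: str
      image_a: str       # path or id of candidate image A
      image_b: str
      preferred: str     # "a" or "b": the human preference label
      perspective: str   # alignment | safety | quality | bias

  def stub_judge(prompt, image):
      # Placeholder score; swap in a CLIP score or a VLM rating here.
      return float(len(image))

  def judge_accuracy(pairs):
      # Fraction of pairs where the judge's higher-scored image matches the
      # human label, broken down by perspective.
      stats = {}
      for p in pairs:
          pick = "a" if stub_judge(p.prompt, p.image_a) >= stub_judge(p.prompt, p.image_b) else "b"
          hit, n = stats.get(p.perspective, (0, 0))
          stats[p.perspective] = (hit + (pick == p.preferred), n + 1)
      return {k: hit / n for k, (hit, n) in stats.items()}

  pairs = [
      PreferencePair("a red car", "img_001.png", "img_2.png", "a", "alignment"),
      PreferencePair("a safe street scene", "img_03.png", "img_0004.png", "b", "safety"),
  ]
  print(judge_accuracy(pairs))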

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj and Huaxiu Yao
NAACL 2024
short version presented at ICLR 2024 R2-FM Workshop

pdf | abstract | bibtex | arXiv

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment. To address this challenge, in this paper we propose AutoPRM, a novel self-supervised framework that efficiently enhances the fine-tuning of LLMs for intricate reasoning challenges. Specifically, AutoPRM first decomposes complex problems into more manageable subquestions with a controllable granularity switch, then sequentially applies reinforcement learning to iteratively improve the subquestion solver. Additionally, we propose context-guided decoding to avoid reward tampering and guide the subquestion solver toward the solution of the holistic problem. Extensive experiments show that AutoPRM significantly improves performance on mathematical and commonsense reasoning tasks over state-of-the-art baselines. More encouragingly, AutoPRM can be easily integrated with other orthogonal reasoning pipelines.

  @article{chen2024autoprm,
    title={AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition},
    author={Chen, Zhaorun and Zhao, Zhuokai and Zhu, Zhihong and Zhang, Ruiqi and Li, Xiang and Raj, Bhiksha and Yao, Huaxiu},
    journal={arXiv preprint arXiv:2402.11452},
    year={2024}
  }
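
A minimal skeleton of the decompose-then-solve control flow the abstract describes: decompose() and solve_subquestion() are stubs where the LLM decomposer and the RL-tuned solver would go, and the granularity argument mimics the controllable granularity switch. This is a sketch of the control flow only, not the AutoPRM implementation.

  def decompose(problem, granularity):
      # Stub: an LLM prompted to split `problem` into roughly `granularity` subquestions.
      return [f"[step {i + 1} of: {problem}]" for i in range(granularity)]

  def solve_subquestion(subq, context):
      # Stub for the solver; the real version decodes with context guidance so
      # each step stays aimed at the holistic problem rather than a local reward.
      return f"answer({subq} | given {len(context)} prior steps)"

  def solve(problem, granularity=3):
      context = []
      for subq in decompose(problem, granularity):
          context.append(solve_subquestion(subq, context))  # procedural feedback accumulates
      return context[-1]

  print(solve("If a train travels 60 km in 1.5 h, what is its average speed?", granularity=2))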

Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards
Zhaorun Chen, Zhuokai Zhao, Tairan He, Binhao Chen, Xuhao Zhao, Liang Gong, and Chengliang Liu
IROS 2024 (Oral)

pdf | abstract | bibtex | arXiv

Ensuring safety in Reinforcement Learning (RL), typically framed as a Constrained Markov Decision Process (CMDP), is crucial for real-world exploration applications. Current approaches to handling CMDPs struggle to balance optimality and feasibility: direct optimization methods cannot ensure state-wise in-training safety, while projection-based methods correct actions inefficiently through lengthy iterations. To address these challenges, we propose Adaptive Chance-constrained Safeguards (ACS), an adaptive, model-free safe RL algorithm that uses the safety recovery rate as a surrogate chance constraint to iteratively ensure safety during exploration and after convergence. Theoretical analysis indicates that the relaxed probabilistic constraint sufficiently guarantees forward invariance to the safe set, and extensive experiments on both simulated and real-world safety-critical tasks demonstrate its effectiveness in enforcing safety (nearly zero violations) while preserving optimality (+23.8%), robustness, and fast response in stochastic real-world settings.

  @misc{chen2024safereinforcementlearninghierarchical,
    title={Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards}, 
    author={Zhaorun Chen and Zhuokai Zhao and Tairan He and Binhao Chen and Xuhao Zhao and Liang Gong and Chengliang Liu},
    year={2024},
    eprint={2310.03379},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2310.03379}, 
}
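
To illustrate a chance-constrained safeguard in miniature (a toy under made-up one-dimensional dynamics, not the ACS algorithm): before executing a proposed action, estimate the violation probability by Monte Carlo over the transition noise, and fall back to a recovery action whenever the estimate exceeds the allowed level delta.

  import numpy as np

  rng = np.random.default_rng(1)
  SAFE_LIMIT = 1.0  # |state| must stay below this bound
  DELTA = 0.05      # allowed violation probability (the chance constraint)

  def step(state, action, noise):
      return state + 0.5 * action + noise  # toy stochastic dynamics

  def violation_prob(state, action, n=10_000):
      noise = rng.normal(0.0, 0.2, size=n)
      return float(np.mean(np.abs(step(state, action, noise)) > SAFE_LIMIT))

  def safeguard(state, proposed):
      if violation_prob(state, proposed) <= DELTA:
          return proposed                       # proposed action is safe enough to keep
      return -0.8 * float(np.sign(state))       # recovery action: steer back toward 0

  state = 0.9
  for proposed in (0.6, -0.6):
      executed = safeguard(state, proposed)
      print(f"proposed={proposed:+.1f} -> executed={executed:+.1f} "
            f"(p_viol={violation_prob(state, proposed):.3f})")

ACS itself uses the safety recovery rate as the surrogate constraint and adapts it hierarchically during training; the snippet only shows an execution-time gate.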
Reviewer Service
Conference Reviewer: NeurIPS'24, ICLR'24, COLM'24, ARR'24, IROS'24
Journal Reviewer: Plant Phenomics





Website template from here and here