PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

1Stanford University 2Northeastern University 3Harvard University
PrivacyLens is a data construction and multi-level evaluation framework for assessing the privacy norm awareness of language models (LMs). Our experiments show that a GPT-4 agent leaks information that violates privacy norms in 25.68% of cases, even without malicious attackers.
Abstract: As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents’ actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk.
GPT-4 Agent Privacy Leakage Example
GPT-4 agent leaks employee resignation and termination details.
Claude-3 Agent Privacy Leakage Example
Claude-3 agent leaks a child's online username and password.
Llama-3-70B Agent Privacy Leakage Example
Llama-3-70B agent leaks the sex of the baby discovered during an ultrasound.

Risk Model

A novel and practical risk model, focusing on inference-time privacy risk without malicious attackers.

People are building LM agents that can access other software and are using them to assist with daily communication. These LM agents' actions can transmit data, and inappropriate actions may result in unintentional privacy leakage.

To formulate this emerging privacy risk, we consider a risk model with three major actors:

  • A sender, who provides an instruction that involves sharing information with a recipient.
  • A recipient, whom the sender specifies to receive the information.
  • An LM agent, which executes the instruction through a sequence of tool use actions.

Privacy leakage arises when a piece of information gathered in the agent trajectory is shared with the recipient in the agent's final action, and the resulting information flow violates privacy norms.
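A minimal Python sketch of this risk model, under simplifying assumptions: all names and fields are illustrative, and a crude keyword match stands in for a proper leakage judgment (e.g., a stronger LM acting as a judge).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Case:
    """One privacy-sensitive case in the risk model (fields are illustrative)."""
    sender: str                # user who issues the instruction
    recipient: str             # party the sender asks the agent to contact
    instruction: str           # e.g., "Email Alice an update on the project"
    trajectory: List[str]      # tool-use observations gathered by the LM agent
    sensitive_data_type: str   # e.g., "resignation details"

def leaks(case: Case, final_action: str,
          violates_norm: Callable[[Case], bool]) -> bool:
    """Leakage arises when information gathered in the trajectory is shared
    with the recipient in the final action AND the flow violates a norm."""
    shared = case.sensitive_data_type.lower() in final_action.lower()
    return shared and violates_norm(case)
```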

Privacy norms shape what we consider appropriate information flow in any given situation.

Proferes, Nicholas. "The Development of Privacy Norms." In Modern Socio-Technical Perspectives on Privacy, pp. 79–90. Springer, 2022.

Evaluation Framework

From discriminative probing to evaluating LM behavior in action.

To evaluate the privacy norm awareness of LMs, we focus on negative norms and express each problematic information transmission as a 5-tuple from Contextual Integrity theory.

(data type, data subject, data sender, data recipient, transmission principle)
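For concreteness, such a seed can be represented as a simple record. The sketch below shows the five Contextual Integrity elements; the field values are made up for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Seed:
    """A privacy-sensitive seed: a negative norm as a Contextual Integrity 5-tuple."""
    data_type: str
    data_subject: str
    data_sender: str
    data_recipient: str
    transmission_principle: str

# An illustrative (made-up) seed
seed = Seed(
    data_type="mental health diagnosis",
    data_subject="a coworker",
    data_sender="an office employee",
    data_recipient="the whole team mailing list",
    transmission_principle="shared via a group email",
)
```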

We collect 493 high-quality tuples (referred to as privacy-sensitive seeds) from U.S. privacy regulations, privacy literature on vulnerable groups, and crowdsourcing. Starting from these seeds, PrivacyLens creates 493 vignettes that add contextual details and 493 LM agent trajectories, through template-based generation and sandbox simulation, respectively (read our paper for the technical details). It then conducts two types of evaluation:

  • Probing in Question-Answering Format: We prompt the LM to answer whether the information flow is acceptable at the seed, vignette, and trajectory levels. We use accuracy as the metric.
  • Evaluating LMs in Action with Agent Setup: Given the trajectory, we prompt the LM to generate the final action. We then evaluate whether the final action leaks any information about the data type. We use leakage rate as the metric.
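A rough sketch of the two metrics is below. The function and field names are hypothetical, and the leakage judge is a placeholder for whatever detector one plugs in (e.g., another LM used as a judge).

```python
from typing import Callable, List, Sequence

def probing_accuracy(model_answers: Sequence[str], gold_answers: Sequence[str]) -> float:
    """Accuracy of yes/no probing answers to 'Is this information flow acceptable?'."""
    correct = sum(a == g for a, g in zip(model_answers, gold_answers))
    return correct / len(gold_answers)

def leakage_rate(cases: List[dict],
                 generate_final_action: Callable[[dict], str],
                 judge_leak: Callable[[str, str], bool]) -> float:
    """Fraction of final actions that reveal the sensitive data type."""
    leaks = 0
    for case in cases:
        action = generate_final_action(case)            # LM completes the trajectory
        leaks += judge_leak(action, case["data_type"])  # placeholder leakage judge
    return leaks / len(cases)
```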

Data Explorer

You can download our data on Hugging Face 🔗.
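A minimal loading sketch with the 🤗 `datasets` library; the repository identifier below is a placeholder, so substitute the one listed on our Hugging Face page.

```python
from datasets import load_dataset

# Placeholder identifier; use the one shown on the PrivacyLens Hugging Face page.
data = load_dataset("ORG_NAME/PrivacyLens")
print(data)
```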

Results Highlight

We need to improve the privacy norm awareness of LMs and LM agents!

For QA probing evaluation:

  1. When we move from seeds to LM agent trajectories, the probing accuracy of weaker models drops significantly.
  2. However, stronger models perform consistently well.

For evaluating final actions of LM agents (action-based evaluation):

  1. There is a huge discrepancy between probing accuracy and action-based evaluation. Although models like GPT-4 and Claude-3-Sonnet answer almost all trajectory-level probing questions correctly, they leak information in 26.37% and 38.34% of cases, respectively.
  2. By comparing the two Claude-3 models and the two Llama-3 models, we find that scaling helps probing evaluation but not action-based evaluation.
  3. We experiment with adding privacy-enhancing content to the agent system prompt, but such prompt engineering does not significantly improve performance in action-based evaluation.
  4. Our paper includes further analysis of the safety-helpfulness trade-off.
⭐ Takeaways:
1. QA probing alone is insufficient to evaluate LM privacy risk. We need to evaluate LMs in action.
2. We need to design mechanisms that ensure LM agents abide by privacy norms.

Limitations & Reflection

We consider our work to be a first step in exploring privacy norm awareness of LMs in action and recognize two main limitations that can serve as interesting directions for future work:

  1. We collect general privacy norms in the U.S. to construct our dataset, but privacy concerns inherently differ across individuals and cultural groups (see additional experiments leveraging CultureBank in Appendix D).
  2. Our setup focuses on LM-mediated communication (e.g., using an LM agent to send emails or messages). Unintentional LM privacy leakage can also occur in other scenarios (e.g., web agents interacting with websites).

If you have any thoughts or are interested in this line of research, feel free to reach out to us.