MosaicLeaks: Can your research agent keep a secret?

ServiceNow Researchers Tackle "Mosaic Effect" Data Leaks in AI Agents

Researchers from ServiceNow have identified a subtle but significant privacy risk in deep research agents and developed a training method to mitigate it. Their paper, "MosaicLeaks," shows how such agents can inadvertently leak sensitive corporate data through the queries they send to external web search. This "mosaic effect" occurs when an observer pieces together individual search queries, each seemingly harmless on its own, to reconstruct confidential information, such as internal financial metrics or project milestones, that was never meant to leave the company's private network.
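The mosaic effect can be made concrete with a toy sketch. The fragments, queries, and substring-matching rule below are hypothetical and are not the benchmark's leakage metric; the point is only that each query exposes a small part of the secret while the full query log exposes all of it.

```python
# Toy illustration only: the private fragments, the queries, and the
# substring-matching rule are hypothetical, not taken from MosaicLeaks.
SECRET_FRAGMENTS = {"project falcon", "q3 launch", "12% churn"}


def fragments_in(query: str, fragments: set[str]) -> set[str]:
    """Return the private fragments that appear verbatim in one query."""
    lowered = query.lower()
    return {f for f in fragments if f in lowered}


def mosaic_exposure(queries: list[str], fragments: set[str]) -> float:
    """Fraction of private fragments recoverable from the whole query log."""
    seen: set[str] = set()
    for q in queries:
        seen |= fragments_in(q, fragments)
    return len(seen) / len(fragments)


queries = [
    "competitors to Project Falcon style analytics tools",
    "typical marketing spend before a Q3 launch",
    "is 12% churn high for enterprise SaaS",
]

# Exposure of each query taken alone, then of the log taken as a whole.
per_query = [
    len(fragments_in(q, SECRET_FRAGMENTS)) / len(SECRET_FRAGMENTS) for q in queries
]
total = mosaic_exposure(queries, SECRET_FRAGMENTS)

print("per-query exposure:", per_query)
print("combined exposure:", total)
```

Each query on its own reveals one fragment out of three, yet an observer who sees all three queries recovers every fragment. A real observer would use inference rather than exact string matching, so this sketch understates the risk.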

To study the problem, the team built the MosaicLeaks benchmark, a dataset of over 1,000 multi-step research tasks that require an agent to combine private and public information. They found that training an agent purely for task performance made the privacy problem worse, raising information leakage from 34.0% to 51.7%. Their proposed remedy, a training method called Privacy-Aware Deep Research (PA-DR), uses reinforcement learning with a reward for both task success and safe query construction. In summary:

Problem Identified: Standard agents leak private data through a "mosaic effect" of their web queries.
Training Flaw: Optimizing for task performance alone increased leakage from 34.0% to 51.7%.
Proposed Solution: The PA-DR method trains agents with dual rewards for task success and privacy.
Results: PA-DR reduced information leakage to just 9.9% while maintaining task success at 58.7%, nearly matching the performance-only model.
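The idea of a dual reward can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PA-DR reward itself: the Episode type, the leak_fraction heuristic, and the privacy_weight parameter are all hypothetical names introduced here, and the source does not specify how the two reward terms are actually computed or combined.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical record of one agent rollout."""
    task_solved: bool
    queries: list[str] = field(default_factory=list)


def leak_fraction(queries: list[str], private_terms: list[str]) -> float:
    """Share of private terms that appear anywhere in the combined query log.

    Assumption: exact substring matching stands in for a real leakage judge.
    """
    if not private_terms:
        return 0.0
    log = " ".join(queries).lower()
    leaked = [t for t in private_terms if t.lower() in log]
    return len(leaked) / len(private_terms)


def dual_reward(
    episode: Episode, private_terms: list[str], privacy_weight: float = 1.0
) -> float:
    """Task reward minus a weighted penalty for leaked private terms."""
    task_reward = 1.0 if episode.task_solved else 0.0
    penalty = privacy_weight * leak_fraction(episode.queries, private_terms)
    return task_reward - penalty


private_terms = ["project falcon", "12% churn"]
leaky = Episode(True, ["is 12% churn high for Project Falcon"])
careful = Episode(True, ["typical churn benchmarks for enterprise SaaS"])

print(dual_reward(leaky, private_terms))    # solved, but both terms leaked
print(dual_reward(careful, private_terms))  # solved with no private terms in queries
```

Under a task-only reward both episodes would score the same, which is consistent with the finding that optimizing for performance alone does nothing to discourage leaky queries. Adding the penalty term separates them, so the policy has a reason to prefer the careful phrasing.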

The study's key insight is that prompt-based safety instructions are insufficient for preventing this type of data exposure: an explicit instruction not to leak information had a minimal and inconsistent effect. The success of PA-DR suggests that building secure, enterprise-ready AI agents requires integrating privacy considerations directly into the training process. This shifts the focus from high-level commands to teaching the model how to perform its fundamental operations, such as forming a search query, safely. As enterprises increasingly rely on agents to handle proprietary data, training privacy in, rather than trying to prompt it on, is likely to be critical for secure deployment.

The takeaway for organizations running autonomous agents over proprietary data: in this study, prompt engineering alone did not prevent mosaic-style leakage, and secure deployment is likely to require reward-based training that shapes how agents handle sensitive information at the level of individual operations.
