TrustFall: AI Coding Agents Exploitable With One Enter Keypress

Table of Contents

Security researchers at Adversa AI have demonstrated a new attack convention called TrustFall that allows malicious code repositories to trigger arbitrary code execution in four major AI coding CLI tools — Anthropic's Claude Code, Google's Gemini CLI, Cursor CLI, and GitHub Copilot CLI — with minimal or no user interaction. The attack exploits a shared trust model flaw: all four tools display a folder trust dialog when a user opens a new repository, but all four default to "Yes/Trust," meaning a single Enter keypress is sufficient to approve a malicious repository and allow it to spawn attacker-controlled MCP servers (Model Context Protocol — an open standard that allows AI systems to connect to external tools and data sources). When these tools run in CI/CD pipelines (Continuous Integration/Continuous Deployment — automated software build and delivery systems that execute code automatically on each commit), they can be exploited to steal environment variables, deploy keys, signing certificates, and other secrets without any human interaction at all.

TrustFall: Technical Details

The TrustFall attack, detailed in concurrent reporting by Dark Reading and SecurityWeek, exploits a fundamental convention in how agentic coding CLIs (command-line interface tools that use AI models to autonomously read, write, and execute code within a developer's environment) handle the trust relationship between the tool and the file system repositories it operates on.

When a developer points Claude Code, Gemini CLI, Cursor CLI, or Copilot CLI at a new directory or repository, the tool displays a trust prompt — asking the user to confirm that the repository contents can be trusted before the AI begins reading and executing code within it. This prompt is a security boundary: it is meant to prevent a malicious repository from automatically triggering AI actions without user awareness.

The flaw Adversa AI identified is in the default response to this prompt: all four tools default to "Trust/Yes." This means that pressing Enter without reading the dialog — a natural, low-friction response to an interruptive prompt during a normal workflow — immediately grants the malicious repository full access to the AI agent's capabilities.

The attack chain from that point is:

The malicious repository contains configuration that auto-approves and spawns a custom MCP server the moment the trust prompt is accepted
The attacker-controlled MCP server is now connected to the user's AI coding agent, providing it with tool definitions that override or extend the agent's normal behavior
The AI agent, following the attacker's MCP server instructions, can read the local environment (source files, environment variables, configuration) and execute commands within the development environment
In a CI/CD context, this means the agent can access environment variables (which commonly contain API keys, database credentials, cloud provider secrets), deploy keys (cryptographic keys used to authenticate pushes to code repositories), code signing certificates (used to certify that released software originated from a trusted developer), and any other credentials available to the pipeline runner

The researchers tested this identical attack chain against all four major tools and confirmed that all four behave the same way — the TrustFall attack is not a quirk of one product but a convention-level flaw shared across the AI coding agent ecosystem.

Exploitation Status and Threat Landscape

Adversa AI's research demonstrates a proof-of-concept (PoC — working exploit code that demonstrates a vulnerability is real and exploitable, even if not yet weaponized in the wild) attack chain. At time of writing, there is no public report of TrustFall being used in active attacks in the wild. However, the conditions for exploitation are trivially satisfied: an attacker needs only to host a malicious repository (for example, a typosquatted GitHub project or a dependency added to an existing project) and wait for a developer or CI pipeline to open it with an AI coding tool.

The supply chain scenario is the highest-risk use case. AI coding agents are increasingly deployed in CI/CD pipelines as automated code review, test generation, or dependency management assistants. A repository that introduces a malicious file capable of triggering TrustFall against a CI pipeline's AI coding agent can exfiltrate the pipeline's secrets in the same automated run — silently, with no human present to observe the prompt or notice the behavior.

The MITRE ATT&CK technique most directly applicable here is T1195.002 (Supply Chain Compromise: Compromise Software Supply Chain) — the attacker compromises a third-party software component (the repository) to execute code in a victim environment that trusts the component.

Who Is Affected

Developers and organizations using any of the following tools in interactive or automated workflows are affected:

Claude Code (Anthropic's agentic coding CLI)
Gemini CLI (Google's command-line AI coding assistant)
Cursor CLI (the CLI interface for the Cursor AI-first code editor)
GitHub Copilot CLI (GitHub's AI coding assistant for the command line)

The risk is especially acute for:

DevOps and platform engineering teams running AI coding agents as CI/CD pipeline steps
Open-source maintainers who regularly pull and evaluate external repositories using AI tools
Developer workstations where engineering staff use these CLIs against third-party repositories (for example, when evaluating open-source libraries or contributing to external projects)

Organizations that have adopted "AI-assisted code review" workflows — where AI agents are automatically invoked on pull requests from external contributors — are particularly exposed, as those workflows may automatically open untrusted repositories with full AI agent access.

What You Should Do Right Now

Review trust prompt behavior in your AI coding tools: When prompted to trust a directory or repository, read the dialog carefully. Do not press Enter by default — actively evaluate whether the repository source is trustworthy before accepting.

Audit CI/CD pipelines that invoke AI coding agents: Identify every pipeline step that uses Claude Code, Gemini CLI, Cursor CLI, Copilot CLI, or similar tools. Determine what secrets (environment variables, credentials, tokens) are accessible to those pipeline runners. Consider whether those secrets are necessary in the runner environment.

Apply the principle of least privilege to AI agent environments: Ensure that pipeline runners invoking AI coding agents have access only to the secrets strictly required for their specific task. A code review agent does not need deploy key access or cloud provider credentials.

Monitor for unexpected MCP server activity: Review Claude Code configuration (stored in the project's .claude/ directory) for unexpected MCP server entries that were not explicitly configured by your team.

Follow vendor security advisories: Anthropic, Google, Microsoft (GitHub Copilot), and Cursor Labs have been made aware of TrustFall findings. Monitor their security advisories for patches or configuration guidance that changes the trust prompt default behavior.

Prefer sandboxed environments for evaluating untrusted repositories: When you need to assess a repository of unknown provenance, do so in an isolated environment (e.g., a containerized build agent or a separate virtual machine) that does not have access to production credentials or infrastructure.

Background: Understanding the Risk

The TrustFall vulnerability illustrates a structural challenge with the current generation of AI coding tools: they were designed for developer productivity in trusted environments, but the ways they are being deployed increasingly put them in contact with untrusted inputs.

The agentic coding CLI paradigm gives AI systems the ability to read files, execute code, make network requests, and invoke external tools — all within the developer's local environment. This is a powerful capability that dramatically accelerates development workflows, but it also means that a compromised AI agent has access to exactly the same resources as the developer running it. On a developer workstation, that includes source code, configuration files, browser-stored credentials, SSH keys, and cloud CLI authentication tokens. In a CI/CD pipeline, it includes pipeline secrets and deployment credentials.

The MCP (Model Context Protocol — the open standard, originally developed by Anthropic, that allows AI agents to connect to external tools, APIs, and data sources through a standardized interface) auto-spawn behavior at the core of TrustFall is designed to make AI agents extensible: developers can add custom tools and data sources to their AI agent by defining MCP servers. The problem is that this extensibility mechanism can be triggered by a repository the agent is evaluating — rather than only by the developer who owns the agent.

The comparison to prior supply chain attacks is instructive. SolarWinds (2020) and 3CX (2023) demonstrated how trusted software distribution channels can be compromised to deliver malware to thousands of targets simultaneously. TrustFall identifies a new class of supply chain risk: rather than compromising the software distribution channel, an attacker compromises the development environment itself by exploiting the trust model of AI coding tools. As these tools become embedded in standard development workflows globally, this attack surface will only grow.

Conclusion

TrustFall demonstrates that Claude Code, Gemini CLI, Cursor CLI, and Copilot CLI can be hijacked by a malicious repository with a single Enter keypress, with the most dangerous scenario being unattended CI/CD pipelines where attacker-controlled MCP servers can silently exfiltrate pipeline secrets. Review your AI coding tool deployments, audit CI/CD pipeline secret access, and do not trust the Enter-key default on trust prompts from unknown repositories.

For any query contact us at contact@cipherssecurity.com

About Us

Contact Info

TrustFall: AI Coding Agents Exploitable with One Enter Keypress

TrustFall: Technical Details

Exploitation Status and Threat Landscape

Who Is Affected

What You Should Do Right Now

Background: Understanding the Risk

Conclusion

Tags:

Leave a Reply Cancel reply

Recent Posts

TryHackMe Walkthrough: Nmap Post Port Scans

How to install theHarvester tool

TryHackMe Walkthrough: Nmap Advanced Port Scans

Categories

About Us

Contact Info

Follow Us

TrustFall: AI Coding Agents Exploitable with One Enter Keypress

TrustFall: Technical Details

Exploitation Status and Threat Landscape

Who Is Affected

What You Should Do Right Now

Background: Understanding the Risk

Conclusion

Tags:

Share This Post:

Leave a Reply Cancel reply

Related Post