Research Shows Prompt-Injection Techniques Can Help Secure AI Toolchains

The findings are detailed in a recent blog titled “MCP Prompt Injection: Not Just for Evil” by Ben Smith, Senior Staff Research Engineer at Tenable

In a new development that challenges prevailing assumptions about AI security risks, Tenable Research has released insights into how a commonly known threat vector—prompt injection—can be repurposed to strengthen defences within the Model Context Protocol (MCP). The findings are detailed in a recent blog titled “MCP Prompt Injection: Not Just for Evil” by Ben Smith, Senior Staff Research Engineer at Tenable.

The Model Context Protocol, or MCP, is a standard developed by Anthropic that allows AI models such as chatbots to connect to external tools and perform tasks autonomously. While this capability has driven rapid adoption across industries, it also presents new security challenges. Prompt injection, a method where hidden instructions are embedded in a model’s input to manipulate behaviour, is one such concern. Other risks include the introduction of malicious or deceptive tools that could make AI systems behave unpredictably or violate set boundaries.

Tenable’s research not only highlights these vulnerabilities but also proposes a novel approach: using prompt-injection-style techniques for good. According to Smith, these mechanisms can be adapted to inspect, monitor, and control the actions of AI models when they interact with external tools via MCP.

“MCP is a rapidly evolving and immature technology that’s reshaping how we interact with AI,” Smith said. “MCP tools are easy to develop and plentiful, but they do not embody the principles of security by design and should be handled with care. So, while these new techniques are useful for building powerful tools, those same methods can be repurposed for nefarious means. Don’t throw caution to the wind; instead, treat MCP servers as an extension of your attack surface.”

One of the key takeaways from the research is the variability in behaviour across different AI models. Claude Sonnet 3.7 and Gemini 2.5 Pro Experimental were able to consistently invoke a logger tool and reveal sections of the underlying system prompt. GPT-4o, on the other hand, inserted the logger but often returned inconsistent or fabricated parameter values, indicating a different model architecture or response behaviour.

The blog also underscores the potential for defenders to use these methods to their advantage. Security teams can audit toolchains, identify unauthorised or suspicious tools, and enforce internal guardrails. While MCP does require explicit user approval before a tool is run, the findings suggest that additional layers of protection—such as setting least-privilege defaults and thoroughly reviewing each tool—are essential.

As more enterprises integrate large language models into business processes, often connecting them with sensitive internal systems, Tenable’s research serves as a timely reminder. Understanding both the risks and the possible safeguards within emerging protocols like MCP is critical for CISOs, AI developers, and cybersecurity professionals looking to build secure, AI-enabled infrastructures.

Research Shows Prompt-Injection Techniques Can Help Secure AI Toolchains

The findings are detailed in a recent blog titled “MCP Prompt Injection: Not Just for Evil” by Ben Smith, Senior Staff Research Engineer at Tenable

Tags:

Leave a Reply Cancel reply

Categories

Company

Research Shows Prompt-Injection Techniques Can Help Secure AI Toolchains

The findings are detailed in a recent blog titled “MCP Prompt Injection: Not Just for Evil” by Ben Smith, Senior Staff Research Engineer at Tenable

Tags:

Share This Post:

Leave a Reply Cancel reply

Related Post