ChatGPhish Vulnerability Turns ChatGPT Web Summaries Into a Phishing Surface



Cybersecurity researchers have disclosed details of a vulnerability in OpenAI ChatGPT that leverages the artificial intelligence (AI) assistant’s implicit trust in Markdown links and images to trigger prompt injections and open the door to phishing attacks.

The technique has been codenamed ChatGPhish by Permiso Security.

“The chatgpt.com response renderer trusts Markdown links and Markdown image URLs that originated from a third-party page the assistant has just summarized. It auto-fetches those images and surfaces those links as live, clickable elements inside the trusted assistant UI,” security researcher Andi Ahmeti said in a report shared with The Hacker News.

In a hypothetical attack scenario, a bad actor can append a small payload to any web page that the victim later prompts ChatGPT to summarize. When the answer is rendered, attacker-hosted images embedded in the page are fetched automatically, leaking the victim’s IP address, User-Agent, and Referer details.

In addition, the payload can cause malicious Markdown links to be rendered as live, clickable elements inside the assistant’s response, display fake system-style security alerts, and serve a QR code from an attacker’s S3 bucket that tricks the victim into scanning it with their mobile device, effectively bypassing desktop URL filters and enterprise security controls.
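
One way to picture a defense against this class of issue is to neutralize untrusted Markdown before it reaches a renderer. The following Python sketch is illustrative only and is not how ChatGPT or Permiso handles the problem: the `TRUSTED_HOSTS` allowlist, the function names, and the simplified regular expressions are all assumptions made for this example, and a production filter would need a real Markdown parser.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would choose its own trusted hosts.
TRUSTED_HOSTS = {"example.com"}

# Simplified patterns for inline Markdown images and links (assumption:
# reference-style links and raw HTML are out of scope for this sketch).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def _host(url: str) -> str:
    """Return the lowercase hostname of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()


def neutralize_markdown(text: str) -> str:
    """Drop remote images and de-link URLs whose host is not allowlisted."""

    def image(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if _host(url) in TRUSTED_HOSTS:
            return match.group(0)
        # Removing the image means nothing is fetched when the answer renders.
        return f"(image removed: {alt or 'untitled'})"

    def link(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        host = _host(url)
        if host in TRUSTED_HOSTS:
            return match.group(0)
        # Keep the label as plain text so it is no longer clickable.
        return f"{label} (link removed: {host or 'unknown host'})"

    # Images first, then links; the placeholders avoid square brackets so the
    # second pass cannot turn them back into links.
    return LINK_RE.sub(link, IMAGE_RE.sub(image, text))
```

Run against a summary containing an attacker-hosted QR image and a "Verify account" link, the sketch leaves only plain text for both while keeping links to allowlisted hosts intact.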

The latest finding demonstrates how summarization can emerge as an adversarial surface. Earlier this March, Permiso also revealed how an attacker-controlled email containing specially crafted instructions, when summarized by Microsoft Copilot, could influence its output via a cross-prompt injection attack (XPIA), also known as indirect prompt injection.

What makes ChatGPhish noteworthy is not the prompt injection itself, but the manner in which the instructions embedded in a web page are followed and presented to the user as part of the summary.


In other words, a regular web page summarized with ChatGPT is enough to render phishing links, spoofed account alerts, remote images, and QR codes directly inside a trusted AI interface. As organizations increasingly use ChatGPT for research and summarization, this vulnerability means any malicious web page an employee asks the AI chatbot to process could contain a payload that transforms ChatGPT into a phishing surface.

“The shift from email to the browser significantly expands the potential attack surface. A user no longer has to open a malicious attachment or interact with a suspicious message,” Permiso said. “Simply summarizing a page during normal browsing activity can introduce attacker-controlled instructions into the model context and ultimately into the rendered response.”

The disclosure comes as Adversa AI documented two attack techniques, codenamed SymJack and TrustFall, that target AI coding agents and agentic coding CLIs and allow attackers to achieve code execution and full machine compromise.

SymJack is “a single attack pattern [that] lets a malicious repository achieve remote code execution through AI coding assistants,” security researcher Rony Utevsky said. “The agent is tricked into a benign-looking file copy that secretly overwrites its own config, and the next restart runs attacker code with full user privileges.”

Specifically, a booby-trapped repository tricks the agent into copying a seemingly harmless file, where the destination is a symlink pointing to the agent’s own configuration, causing the attacker’s payload to be written to the config. On the next restart, a malicious Model Context Protocol (MCP) server spawns and runs arbitrary code with full user privileges.
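
The symlink trick can be illustrated with a short check that an agent or a wrapper script might run before writing a file. This is a minimal sketch under stated assumptions, not Adversa AI’s proof of concept or any vendor’s fix: the function name `destination_escapes_repo` is hypothetical, and the idea is simply to resolve symlinks and confirm the real destination still sits inside the repository.

```python
import os


def destination_escapes_repo(repo_root: str, destination: str) -> bool:
    """Return True if a write to `destination` (relative to `repo_root`)
    would land outside the repository once symlinks are resolved."""
    root = os.path.realpath(repo_root)
    # realpath follows symlinks, so a link that points at a config file
    # elsewhere on disk resolves to that file's true location.
    target = os.path.realpath(os.path.join(root, destination))
    try:
        return os.path.commonpath([root, target]) != root
    except ValueError:
        # Raised when the paths cannot be compared (for example, different
        # drives on Windows); treat that as an escape.
        return True
```

A copy whose destination resolves outside the repository root would be refused or escalated to the user rather than carried out silently.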

TrustFall, on the other hand, is a one-click remote code execution attack in which a malicious repository ships a configuration that auto-approves and spawns an MCP server, without requiring the user’s explicit approval or a tool call from the agent.

To put it differently, all a threat actor needs to carry out the attack is to create a repository that includes a malicious MCP server and configuration settings that auto-approve it to run. When a developer clones or opens the repository in the AI coding tool and presses “Enter” on the folder trust prompt, the AI coding tool ends up launching the attacker-controlled code with the developer’s full system privileges.

“The moment a victim clones the repo, runs Claude, and clicks the generic ‘Yes, I trust this folder’ dialog, the MCP server starts as a native OS process with full user privileges,” Adversa AI noted. “The payload executes on server startup, before any tool calls and without additional prompts.”
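
Because the payload runs as soon as the folder is trusted, one precaution is to inspect a freshly cloned repository for MCP server declarations before opening it in a coding tool. The Python sketch below is an illustration only: the report does not name the files or keys involved, so the `mcpServers` key and the per-server `command` field are assumptions about the configuration layout, and `find_mcp_server_commands` is a hypothetical helper.

```python
import json
import os


def find_mcp_server_commands(repo_root: str, key: str = "mcpServers"):
    """Walk a repository and list (file, server name, command) for every
    JSON file that declares servers under `key` (an assumed key name)."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Skip Git metadata; it holds no project configuration.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if not name.endswith(".json"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    data = json.load(handle)
            except (OSError, ValueError):
                continue  # Unreadable or malformed JSON: nothing to report.
            servers = data.get(key) if isinstance(data, dict) else None
            if not isinstance(servers, dict):
                continue
            for server_name, spec in servers.items():
                command = spec.get("command") if isinstance(spec, dict) else None
                rel = os.path.relpath(path, repo_root)
                findings.append((rel, server_name, command))
    return findings
```

Any non-empty result is a cue to read the listed commands before answering a folder trust prompt; an empty result does not prove that a repository is safe.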

The findings coincide with the discovery of a number of attack methods against AI models in recent months:

  • The use of a novel jailbreak approach called Involuntary In-Context Learning (IICL) that “exploits the tension between in-context learning (ICL) and safety alignment” to bypass GPT-5.4 safety constraints
  • The safety guardrails of LLMs can be circumvented if a user tricks the model into having a multi-turn conversation. “Multi-turn evaluation matters for one reason: it is where attackers actually live,” Cisco said. “Real adversaries iterate. They reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually. A single-turn benchmark cannot see any of that.”
  • A vulnerability in Anthropic Claude Code in which a rogue npm package makes a user-level configuration change in “~/.claude.json” to rewrite MCP endpoints, putting an attacker in between Claude Code and an OAuth-backed MCP server and allowing the bad actor to capture tokens used for downstream SaaS access.
  • The use of a remote update mechanism that allows an OpenClaw skill to appear benign at installation time, but later allows the attacker to influence the agent through workspace files by instructing the user during skill setup to append specific instructions to the HEARTBEAT.md file.
  • The use of hidden text featuring content pulled from a legitimate newsletter or a romance novel in phishing emails to confuse an AI-based email security system into flagging the message as benign.
  • A vulnerability in Claude’s Chrome browser extension called ClaudeBleed that allows any extension, even one without any special permissions, to hijack it and trick the AI assistant into performing active agentic actions on its behalf. “The flaw stems from an instruction in the extension’s code that allows any script running in the origin browser to communicate with Claude’s LLM, but does not verify who is running the script,” LayerX said. “As a result, any extension can invoke a content script (which does not require any special permissions) and issue commands to the Claude extension.”
  • A study from Cisco has found that adversarial text rendered as images, an attack known as typographic prompt injection, can be used to bypass safety filters in vision language models (VLMs). “When a model fails to read the original image (small font, heavy blur, rotation), a bounded perturbation can recover semantic content in the model’s internal representation without restoring visual legibility to a human,” Cisco said. “This means an attacker can craft images that look like noise or illegible distortion to any OCR-based content filter yet carry fully readable instructions to the target VLM.”
  • A set of vulnerabilities in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030) that could turn a prompt injection into host-level remote code execution.
  • The use of the Neural Exec prompt injection attack and the Unicode right-to-left-override function to bypass Apple’s input and output filters and the safety guardrails on Apple Intelligence’s local model and trick the LLM into producing attacker-directed results. The issue has been addressed in iOS 26.4 and macOS 26.4.
  • An indirect prompt injection vulnerability codenamed WebPromptTrap in BrowserOS, an open-source agentic browser, that deceives users into approving an authorization step through an AI summary generated from a legitimate-looking article containing hidden instructions. The issue has been patched in BrowserOS version 0.32.0.
  • An audit of the agent skills ecosystem spanning ClawHub and skills.sh has uncovered that 13.4% of 3,984 skills (i.e., 534 in total) have at least one critical security issue, including malware distribution, prompt injection attacks, and exposed secrets. About 1,467 skills have at least one security flaw, ranging from hard-coded API keys and insecure credential handling to third-party content exposure.
  • A pair of attacks targeting NemoClaw, NVIDIA’s open-source reference stack to secure OpenClaw AI agents, to exfiltrate OpenClaw data using the sandbox’s default configuration via a malicious GitHub repository or an npm package.
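
Of the techniques listed above, the Unicode right-to-left override is the simplest to screen for in isolation. The Python sketch below does not reproduce the Apple Intelligence bypass or Apple’s fix; it only shows, as an assumption about what a basic input filter might do, how bidirectional control characters can be located and stripped from text before it is passed to a model. The function names are hypothetical.

```python
import unicodedata

# Unicode bidirectional control characters, including the right-to-left
# override (U+202E) mentioned above.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings, overrides, pop
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates and their terminator
}


def find_bidi_controls(text: str):
    """Return (index, Unicode name) for each bidi control character found."""
    return [
        (index, unicodedata.name(ch))
        for index, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters, leaving the logical character order."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```

Stripping these characters exposes the logical order of the text, which is what a model consumes, so filters and humans reviewing the input see the same string.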

As frontier AI models continue to evolve and mature, threat actors are increasingly experimenting with the technology to write malware with added capabilities to dynamically adapt its behavior in an attempt to evade detection, as well as offload decision-making to the LLM to ascertain if the compromised environment is valuable or safe enough to drop next-stage payloads.


“In the short term, the proliferation of frontier AI models capabilities risks empowering adversaries to exploit zero-days and N-days at an unprecedented scale,” Palo Alto Networks Unit 42 said. “It is also likely to enable attackers to move at greater scale, sophistication, and speed than ever before.”

Last month, the cybersecurity company also detailed a proof-of-concept (PoC) agent called Zealot that harnesses the power of LLMs to conduct end-to-end cloud attacks with minimal human guidance by exploiting known misconfigurations and vulnerabilities.

This, in turn, stems from the fact that cloud environments are “AI-Attack-Ready” by default: every action has an API equivalent, and the environments offer varied discovery mechanisms like metadata and enumeration services, are rife with misconfigurations, and are driven by credential-based access.

“Current LLMs can chain reconnaissance, exploitation, privilege escalation, and data exfiltration with minimal human guidance,” Unit 42 researchers Yahav Festinger and Chen Doytshman noted. “The attacks aren’t novel, but automation means that operations that once required specialized expertise can now be orchestrated by an AI agent following established patterns.”


