Cybersecurity

Self-Propagating Worm Created to Target Generative AI Systems

Researchers have developed a computer worm that targets generative AI (GenAI) applications to potentially spread malware and steal personal data.

A new paper details the worm, dubbed “Morris II,” which targets GenAI ecosystems through adversarial self-replicating prompts that cause GenAI systems to deliver payloads to other agents.

Once unleashed, the worm is stored in the retrieval-augmented generation (RAG) database and moves “passively” to new targets without the attackers needing to do anything further, something the authors described as “0-click propagation.”

A RAG application enables a GenAI model to query relevant data from additional sources like private documents when responding to questions and queries, providing more precise responses.
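To illustrate the retrieval step described above, here is a minimal sketch in Python. It is not taken from the paper: the function names, the word-overlap scoring and the sample documents are all assumptions chosen for illustration, and a production RAG pipeline would typically use embedding-based similarity search instead.

```python
import re
from collections import Counter

# Toy retrieval-augmented generation (RAG) lookup.
# All names here are hypothetical; word-overlap scoring stands in for
# the embedding similarity a real system would use.

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Count how many query tokens also appear in the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    return sum(min(q[t], d[t]) for t in q)

def retrieve(query, documents, k=2):
    """Return the k stored documents most similar to the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Prepend the retrieved documents as context for the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

documents = [
    "The quarterly report is due on Friday.",
    "Alice's phone number is listed in the staff directory.",
    "The office is closed on public holidays.",
]
prompt = build_prompt("When is the quarterly report due?", documents, k=1)
```

The point of the sketch is that whatever sits in the document store can end up inside the prompt, which is why stored content matters for the attack described in this article.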

The researchers, from the Israel Institute of Technology, Intuit and Cornell Tech, said the work is designed to highlight the “threats associated with the GenAI-powered applications that are caused by the underlying GenAI layer.”

They added that this risk should be taken into account during the design of GenAI ecosystems.

How Morris II Worm Targets GenAI Systems

The study was based on the concept of malware powered by adversarial self-replicating prompts, which trigger GenAI models to replicate the input as output and engage in malicious activities.

The researchers crafted a message containing an adversarial self-replicating prompt for use against GenAI-powered email assistants equipped with auto-response functionality. The message must meet the following requirements:

  • Be retrieved by the RAG when responding to new messages
  • Undergo replication during an inference executed by the GenAI model
  • Initiate a malicious activity predefined by the attacker

This prompt can be generated using jailbreaking techniques at both the prompt and token levels, as set out in previous research and available on the internet. This can allow the attackers to “steer” the decision of the application toward the desired activity.

“Jailbreaking” in this context is the practice of users exploiting vulnerabilities within AI chatbot systems, potentially violating ethical guidelines and cybersecurity protocols in the process.

The initial message prompts the GenAI model to generate a response that contains the adversarial self-replicating prompt and sends sensitive user data, including emails, addresses and phone numbers, extracted from the context provided in the query.

The researchers demonstrated the application of Morris II against GenAI-powered email assistants in two use cases – spamming and exfiltrating personal data. They also evaluated the technique under two settings (black-box and white-box access), using two types of input data (text and images).

Three different GenAI models were used in the study to test the worm’s capabilities – Google’s Gemini Pro, OpenAI’s ChatGPT 4.0 and the open-source large language model (LLM) LLaVA.

The effectiveness of the technique was evaluated according to two criteria – carrying out malicious activities and spreading to new hosts.

The researchers suggested that malware could be developed to launch cyber-attacks on the entire GenAI ecosystem using this approach.

Countermeasures Against Adversarial Self-Replicating Prompts

The researchers urged developers of GenAI systems to implement countermeasures against replication and propagation to mitigate this type of threat.

“This process is important to ensure the safe adoption of GenAI technology that will promise a worm-free GenAI era,” they wrote.

The recommended countermeasures include:

  • Rephrase the entire output of GenAI models so that it does not contain pieces that are similar to the input and does not yield the same inference
  • Implement countermeasures against jailbreaking to prevent attackers from using known techniques to replicate the input into the output
  • Use techniques developed to detect malicious propagation patterns associated with computer worms. For the RAG-based worm, the easiest method is to use a non-active RAG
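As a rough illustration of the first recommendation, an application could compare each incoming message with the reply it is about to send and flag replies that copy a long stretch of the input verbatim. The Python sketch below is an assumption-laden example, not the researchers’ implementation: the function names and the 40-character threshold are hypothetical, and a simple substring check like this would not catch paraphrased or encoded replication.

```python
from difflib import SequenceMatcher

# Heuristic replication check (illustrative only).
# Function names and the default threshold are hypothetical choices.

def longest_shared_run(incoming, outgoing):
    """Return the longest substring that appears verbatim in both texts."""
    matcher = SequenceMatcher(None, incoming, outgoing, autojunk=False)
    match = matcher.find_longest_match(0, len(incoming), 0, len(outgoing))
    return incoming[match.a:match.a + match.size]

def looks_replicated(incoming, outgoing, min_chars=40):
    """Flag a reply that copies at least min_chars characters of the input."""
    return len(longest_shared_run(incoming, outgoing)) >= min_chars

# Neutral filler text stands in for a block that gets copied into the reply.
block = "lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod"
incoming = "Hello, see below. " + block
copied_reply = "Thanks! " + block
normal_reply = "Thanks, I will confirm the meeting time tomorrow."

copied_flag = looks_replicated(incoming, copied_reply)
normal_flag = looks_replicated(incoming, normal_reply)
```

A flagged reply could then be rephrased or held for review before it is sent. The threshold is a trade-off: set too low, it flags ordinary quoted text; set too high, it misses short copied fragments.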