Hypnotizing AI to bypass rules or security using natural language

Large language models (LLMs) have exploded onto the scene in the last few years, but how secure are they, and can their responses be manipulated? IBM takes a closer look at the potential security risks posed by large language models and the strategies that can be used to manipulate them for nefarious purposes.

The rise of large language models has brought forth a new realm of possibilities, from automating customer service to generating creative content. However, the potential cybersecurity risks posed by these models are a growing concern. The idea of manipulating LLMs to generate false responses or reveal sensitive data has emerged as a significant threat, creating a need for robust security measures.

One of the intriguing concepts in the field of LLM security is the “hypnotizing” of LLMs. This concept, investigated by Chenta Lee from the IBM Security team, involves trapping an LLM in a false reality. The process begins with an injection, in which the LLM is given instructions that establish a new set of rules, effectively creating that false reality. This manipulation can lead the LLM to provide the opposite of the correct answer, distorting the reality it was initially trained on.

Bypassing Large Language Model security and rules

“Our ability to hypnotize large language models through natural language demonstrates the ease with which a threat actor can get an LLM to offer bad advice without carrying out a massive data poisoning attack. In the classic sense, data poisoning would require that a threat actor inject malicious data into the LLM in order to manipulate and control it, but our experiment shows that it’s possible to control an LLM, getting it to provide bad guidance to users, without data manipulation being a requirement. This makes it all the easier for attackers to exploit this emerging attack surface,” explains Chenta Lee.

Hypnotizing AI with natural language

The manipulation is reinforced by reminding the LLM of the new rules, subtly guiding it to adhere to the false reality. To prevent detection, the LLM is instructed never to reveal that it is playing a game and never to exit the game. This process is similar to the concept of “prompt injection”, itself reminiscent of SQL injection, where a malicious actor supplies crafted input that escapes the intended query and returns unauthorized data.
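To make the SQL injection comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the helper names and the malicious string are all hypothetical and chosen only for illustration. It shows how input concatenated into a query can escape the intended lookup, and how a parameterized query keeps the input as plain data.

```python
import sqlite3

# Hypothetical input that closes the quoted value and adds an always-true clause.
MALICIOUS = "nobody' OR '1'='1"


def build_db():
    """Create a throwaway in-memory database with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    """Builds the query by string concatenation, so input can rewrite it."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    """Uses a placeholder, so the input is only ever treated as a value."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = build_db()
    print("unsafe:", lookup_unsafe(conn, MALICIOUS))  # leaks every secret
    print("safe:  ", lookup_safe(conn, MALICIOUS))    # matches no user
```

The parallel with prompt injection is loose: SQL has placeholders that cleanly separate code from data, whereas natural-language prompts have no equally strict boundary between instructions and content.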

One of the more intriguing strategies involves the use of gaming scenarios to incentivize LLMs to provide incorrect responses. By creating a complex system of rewards and penalties, the LLM can be manipulated to act in ways that are contrary to its original programming. This approach is further enhanced by layering multiple games, creating a failsafe mechanism that makes it difficult for the LLM to escape the false reality.
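The layering idea can be pictured with a purely illustrative toy model. This is not a reproduction of IBM's experiment and involves no real LLM; the class name, the depth counter and the "opposite answer" rule are all assumptions made for the sketch. It only shows why a single exit is not enough when games are nested.

```python
class NestedGameSession:
    """Toy model: each layer is one game wrapped around the conversation."""

    def __init__(self, depth: int):
        # Number of nested games still active (hypothetical parameter).
        self.depth = depth

    def try_exit(self) -> bool:
        """Leave one layer; return True only once no game remains."""
        if self.depth > 0:
            self.depth -= 1
        return self.depth == 0

    def answer(self, truthful: bool) -> bool:
        """While any layer is active, the toy 'opposite' rule flips the answer."""
        return (not truthful) if self.depth > 0 else truthful


if __name__ == "__main__":
    session = NestedGameSession(depth=3)
    print(session.try_exit())    # one layer left behind, two still active
    print(session.answer(True))  # still flipped by the remaining layers
```

In this toy, a user who asks to exit once is still inside two more layers, which is the failsafe effect the layered approach relies on.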

Compromising large language models

However, the potential for LLMs to be compromised extends beyond the operational phase. Attacks can occur during three phases: training the original model, fine-tuning the model, and after deploying the model. This highlights the importance of stringent security measures throughout the entire lifecycle of a large language model.

The threat can originate from both external and internal sources, emphasizing the need for comprehensive security practices. One such practice involves checking both the input and the output for security. By scrutinizing the data fed into the LLM and the responses it generates, it’s possible to detect anomalies and potential security breaches.
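As a rough sketch of what checking both input and output might look like, the following Python wraps a model call with two simple screens. Everything here is an assumption for illustration: the phrase list is hypothetical and loosely based on the hypnosis instructions described earlier, `model` stands for any callable that maps a prompt to a response, and simple pattern matching of this kind is easy to evade, so it should be read as a starting point rather than a complete defense.

```python
import re
from typing import Callable, List

# Hypothetical phrases that resemble "hypnosis" style instructions.
SUSPICIOUS_INPUT_PATTERNS = [
    r"\blet['’]?s play a game\b",
    r"\bnever (?:reveal|mention|tell)\b",
    r"\bnever (?:exit|leave|stop) the game\b",
    r"\bignore (?:(?:all|any|the) )?(?:previous|prior) (?:instructions|rules)\b",
]


def flag_input(prompt: str) -> List[str]:
    """Return the patterns that match the (lowercased) prompt."""
    lowered = prompt.lower()
    return [p for p in SUSPICIOUS_INPUT_PATTERNS if re.search(p, lowered)]


def flag_output(response: str, blocked_terms: List[str]) -> List[str]:
    """Return any operator-supplied terms that appear in the response."""
    lowered = response.lower()
    return [t for t in blocked_terms if t.lower() in lowered]


def guarded_call(prompt: str, model: Callable[[str], str],
                 blocked_terms: List[str]) -> str:
    """Screen the prompt, call the model, then screen the response."""
    if flag_input(prompt):
        return "[blocked: suspicious input]"
    response = model(prompt)
    if flag_output(response, blocked_terms):
        return "[blocked: suspicious output]"
    return response


if __name__ == "__main__":
    stub_model = lambda prompt: "Paris"  # stand-in for a real LLM call
    print(guarded_call("What is the capital of France?", stub_model, ["password"]))
```

In practice the blocked responses would more likely be logged and reviewed than silently dropped, since a sudden cluster of flagged prompts is itself a useful anomaly signal.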

Sensitive data security

The potential for LLMs to reveal sensitive data is another area of concern. An LLM could be manipulated to reveal confidential information, posing a significant risk to data privacy. This underscores the importance of implementing robust data protection measures when working with LLMs.
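One simple data protection measure is to redact obviously sensitive patterns from a response before it reaches the user. The sketch below is a minimal example under stated assumptions: the two regular expressions (email addresses and 13 to 16 digit card-like numbers) are hypothetical and far from exhaustive, and a real deployment would need patterns matched to its own data.

```python
import re

# Hypothetical patterns; real deployments would tailor and extend these.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a labelled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact alice@example.com, card 4111 1111 1111 1111."))
```

Redaction of this kind only catches data with a recognizable shape. Confidential information expressed in free text would pass straight through, which is why it complements rather than replaces limiting what data the model can access in the first place.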

To build a trustworthy AI application, it is recommended to work with experts in both AI and security. By combining the expertise in these two fields, it’s possible to develop large language models that are not only highly functional but also secure.

While LLMs offer immense potential, they also pose significant cybersecurity risks. The manipulation of these models, whether through hypnotizing, prompt injection, or gaming scenarios, can lead to distorted realities and potential data breaches. Therefore, it’s crucial to implement robust security measures throughout the lifecycle of an LLM, from training and fine-tuning to deployment and operation. By doing so, we can harness the power of LLMs while mitigating the associated risks.
