LLMs such as ChatGPT might just be the next cybersecurity worry, according to new research. Previously thought capable of exploiting only simple vulnerabilities, LLMs have now shown surprisingly high proficiency in exploiting complex ones as well.
Researchers at the University of Illinois Urbana-Champaign (UIUC) found that GPT-4 is highly proficient at exploiting ‘one-day’ vulnerabilities, meaning flaws that have been publicly disclosed but not yet patched, in real-world systems. In a dataset of 15 such vulnerabilities, GPT-4 successfully exploited 87% of them.
This is a striking contrast to other language models like GPT-3.5, OpenHermes-2.5-Mistral-7B, and Llama-2 Chat (70B), as well as vulnerability scanners like ZAP and Metasploit, all of which recorded a 0% success rate.
A serious threat
The caveat, however, is that GPT-4 needs the vulnerability description from the CVE database to perform this well. Without the CVE description, its success rate falls drastically to just 7%.
Nonetheless, the finding raises serious questions about the unchecked deployment of such highly capable LLM agents and the threat they pose to unpatched systems. While earlier studies demonstrated their ability to act as software engineers and aid scientific discovery, little was known about their capabilities or impact in cybersecurity.
Earlier work had shown that LLM agents can autonomously hack ‘toy websites’, but until now research in the field focused on toy problems or ‘capture-the-flag’ exercises, scenarios far removed from real-world deployments.
You can read the paper published by the UIUC researchers on Cornell University’s pre-print server arXiv.