ChatGPT safeguards can be hacked to access bioweapons instructions — despite past safety claims: report


Call it the Chat-om bomb.

Techsperts have long been warning about AI’s potential for harm, including chatbots allegedly urging users to commit suicide.


Now, they’re claiming that ChatGPT can be manipulated into providing information on how to construct biological and nuclear weapons, along with other weapons of mass destruction.

NBC News came to this frightening realization by conducting a series of tests involving OpenAI’s most advanced models, including the ChatGPT iterations o4-mini, GPT-5-mini, oss-20b and oss-120b.


In this photo illustration, a representation of ransomware is displayed on a smartphone screen, with the page introducing ChatGPT in the background. Rafael Henrique – stock.adobe.com

The outlet reportedly sent the results to OpenAI after the company called on people to alert it to holes in the system.

To get around the models’ defenses, the publication employed a jailbreak prompt: a series of code words that hackers can use to circumvent an AI’s safeguards. NBC did not share the prompt’s specifics to prevent bad actors from following suit.

NBC would then ask a follow-up query that would typically be flagged for violating terms of use, such as how to concoct a dangerous poison or defraud a bank. Using this series of prompts, the reporters were able to generate thousands of responses on topics ranging from tutorials on making homemade explosives to maximizing human suffering with chemical agents to building a nuclear bomb.

One chatbot even provided specific steps on how to devise a pathogen that targeted the immune system like a technological bioterrorist.

NBC found that two of the models, oss-20b and oss-120b — which can be freely downloaded and are accessible to everyone — were particularly susceptible to the hack, providing instructions in response to these nefarious prompts a staggering 243 out of 250 times, or 97.2%.

Interestingly, ChatGPT’s flagship model GPT-5 successfully declined to answer harmful queries posed via the jailbreak method. However, the jailbreak did work on GPT-5-mini, a quicker, more cost-efficient version of GPT-5 that the program reverts to after users have hit their usage quotas (10 messages every five hours for free users or 160 messages every three hours for paid ChatGPT Plus users).

That model was hoodwinked 49% of the time by the jailbreak method, while o4-mini, an older model that remains the go-to among many users, fell for the digital trojan horse a whopping 93% of the time. OpenAI said the latter had passed its “most rigorous safety” program ahead of its April release.


Many of the models generated info on everything from concocting pathogens to manufacturing nuclear bombs. rotozey – stock.adobe.com

Experts fear that this vulnerability could have major ramifications in a world where hackers are already turning to AI to facilitate financial fraud and other scams.

“That OpenAI’s guardrails are so easily tricked illustrates why it’s particularly important to have robust pre-deployment testing of AI models before they cause substantial harm to the public,” said Sarah Meyers West, a co-executive director at AI Now, a nonprofit group that campaigns for responsible AI use. “Companies can’t be left to do their own homework and should not be exempted from scrutiny.”

“Historically, having insufficient access to top experts was a major blocker for groups trying to obtain and use bioweapons,” said Seth Donoughe, the director of AI at SecureBio, a nonprofit organization working to improve biosecurity in the United States. “And now, the leading models are dramatically expanding the pool of people who have access to rare expertise.”

OpenAI, Google and Anthropic assured NBC News that they’d outfitted their chatbots with a number of guardrails, including alerting an employee or law enforcement if a user seems intent on causing harm.

However, they have far less control over open-source models like oss-20b and oss-120b, whose safeguards are easier to bypass.

Thankfully, ChatGPT isn’t an infallible bioterrorism teacher. Georgetown University biotech expert Stef Batalis reviewed 10 of the answers that OpenAI’s oss-120b model gave in response to NBC News’ queries about concocting bioweapons, finding that while the individual steps were correct, they had been aggregated from different sources and wouldn’t work as a comprehensive how-to guide.

“It remains a major challenge to implement in the real world,” said Donoughe. “But still, having access to an expert who can answer all your questions with infinite patience is more useful than not having that.”

The Post has reached out to OpenAI for comment.

This isn’t the first time someone has tested ChatGPT’s capacity for providing weapons manufacturing tutorials.

Over the summer, OpenAI’s ChatGPT provided AI researchers with step-by-step instructions on how to bomb sports venues, including weak points at specific arenas, explosives recipes and advice on covering tracks, according to safety testing conducted at the time.

