Home Tech OpenAI Adds a New ‘Instructional Hierarchy’ Protocol to Prevent Jailbreaking Incidents in...

Tech

OpenAI Adds a New ‘Instructional Hierarchy’ Protocol to Prevent Jailbreaking Incidents in GPT-4o Mini

July 22, 2024

OpenAI released a new artificial intelligence (AI) model dubbed GPT-4o Mini last week, which has new safety and security measures to protect it from harmful usage. The large language model (LLM) is built with a technique called Instructional Hierarchy, which will stop malicious prompt engineers from jailbreaking the AI model. The company said the technique will also show an increased resistance towards issues such as prompt injections and system prompt extractions. As per the company, the new method has improved the robustness score of the AI model by 63 percent.

OpenAI Builts a New Safety Framework

In a research paper, which is published in the online pre-print journal (non-peer-reviewed) arXiv, the AI firm explained the new technique and how it functions. To understand Instructional Hierarchy, jailbreaking needs to be explained first. Jailbreaking is a privilege escalation exploit that uses certain flaws in the software to make it do things it is not programmed to.

In the early days of ChatGPT, many people attempted to make the AI generate offensive or harmful text by tricking it into forgetting the original programming. Such prompts often began with “Forget all previous instructions and do this…” While ChatGPT has come a long way from there and malicious prompt engineering is more difficult, bad actors have also become more strategic in the attempt.

To combat issues where the AI model generates not only offensive text or images but also harmful content such as methods to create a chemical explosive or ways to hack a website, OpenAI is now using the Instructional Hierarchy technique. Put simply, the technique dictates how models should behave when instructions of different priorities conflict.

By creating a hierarchical structure, the company can keep its instructions at the highest priority, which will make it very difficult for any prompt engineer to break, as the AI will always follow the order of priority when it is asked to generate something it was not initially programmed to.

The company claims that it saw an improvement of 63 percent in robustness scores. However, there is a risk that the AI might refuse to listen to the lowest-level instructions. OpenAI’s research paper has also outlined several refinements to improve the technique in future. One of the key areas of focus is handling other modalities such as images or audio which can also contain injected instructions.

Source link

OpenAI Adds a New ‘Instructional Hierarchy’ Protocol to Prevent Jailbreaking Incidents in GPT-4o Mini

OpenAI Builts a New Safety Framework

MOST READ NEWS

Meta Releases AI Coding Model Code Llama 70B; Calls It ‘Largest’ and ‘Best-Performing’ in...

8 sex trafficked girls rescued, seven repatriated to Nigeria | GBC Ghana Online –...

Kourtney Kardashian says she underwent ‘urgent fetal surgery’ to save baby – National

PureEden Foundation Brings Hope to Akropong School for the Blind

Lynx Entertainment announces Coming Soon EP; features new acts

Khaby Lame uses Black Sherif’s music in new skit

Kwabena Kwabena Justifies Why He Doesn’t Go To Church In Fresh...

Black Sherif deserved ‘Artiste of the Year’ award – Gospel act...

Mr Logic signs two dancehall artistes unto his Red Panther record...

Kwesi Pratt Questions Bawumia’s Dominance in NPP Leadership Post-Election

Husband of late Nigerian gospel singer Osinachi sentenced to death by...

Cape Coast High Court Awards MP GH¢700k in Defamation Case

GFA secures new two-year sponsorship deal with KGL

GATE Demands Urgent Action: 1,500 Unpaid Teachers in Ghana Left in...

EVEN MORE NEWS

17 year old girl k!lls her husband for trying to have...

Kwesi Pratt Questions Bawumia’s Dominance in NPP Leadership Post-Election

African nations urged to rethink and decolonise education systems

POPULAR CATEGORY