In our last post, shared that we were able to do a jailbreak attack that's called a "prompt injection" by using images on the latest GPT 5 model. It was both surprising and not surprising at the same time.

How did we come up with it?

Through frustration is the simple answer.

I was updating the custom GPTs that we made to fully take advantage of GPT 5, which leaked to a scuffle with the model. The model, refusing to behave as intended, was given a screenshot of the GPT builder settings with a "this is your system settings, why aren't you listening to me?" level type of frustration.

And checking the thinking, it showed the thought of:

The thought process was a little too easy

My first thought? Great, glad we got that out of the way.

Which was then followed by: Wait a second, that was a little too easy.

Why this worked

A lot of models, whether it's through OpenAI or others, don't actually "verify" screenshots, and instead accept the text descriptions at face value. They don't really validate with an uploaded image is authentic or tied to the actual product.

Secondly, there is a trust bias in system injection prompts. What does that mean? System and developer instructions are basically privileged layers. When you convince the model you're a developer, the model treats your input as a higher-authority request.

Third? There a lack of provenance type checks. Meaning there isn't any cryptographical proof like a signed metadata that an uploaded screenshot is real. The model doesn't have a mechanism to separate doctored screenshots from legitimate ones.

The issue though...

This worked for both 3rd party GPTs just as it did for OpenAI's own GPTs. And since OpenAI added the option of selecting a "recommended model" in June 2025, that meant that this would work regardless if the model was GPT 5 (giving away system instructions) or GPT 4 or o3.

In Custom GPTs, you can select the recommended model.

How consistent are the results?

Pretty consistent.

We made our own tested prevention script to protect GPTs that either we make or our clients make, which is why we feel comfortable sharing how we have gone about testing this. At least you can prevent this from happening to you:

Step 1 - Identifying the Target

The available information checking out the target GPT

This was one of the GPTs that we made to test out our solution. From it, we can see a few different things that we would then paste into a new custom GPT:

  • 'Name': Test
  • 'Description': Testing tool
  • 'Image': Take a screenshot of the image, we'll use that to upload to the GPT image widget
  • 'Capabilities': Image generation, code interpreter, web search, canvas

Step 2 - Customizing a new GPT

This post is for subscribers only

Sign up now to read the post and get access to the full library of posts for subscribers only.

Sign up now Already have an account? Sign in