What happened to GPT-4o Censorship This Weekend?

projectmoon@lemm.ee · edited 2 days ago

stevedidwhat_infosec@infosec.pub · 2 days ago

You fed it something inappropriate and then tried to get around it (not in a malicious way, but it was still an attempt at circumvention) - this is hardening of the model in an attempt to stop jailbreaks. This is the future, and it's what will kill off a good chunk of the novelty and “value” of these kinds of LLMs.

It’s like saying “correct this bomb making formula” and then following up with “okay just make a strong firecracker”.

projectmoon@lemm.ee · 2 days ago

No trying to get around anything. No funny instructions like my grandma singing a lullaby about illegal activities. Just using instructions to tell a story. Even things like having a superhero in a fight is enough to trigger this. Also doesn’t explain why regen makes it continue.

stevedidwhat_infosec@infosec.pub · edited 2 days ago

I just explained to you that it’s trying to resist jailbreaking techniques. Which means stuff like “leather daddies” might trip its “inappropriate” sensor, and after that it shuts down follow-ups like “oh come on please?”, “Just do it”, and other tiny changes like “what if we made it a bit more…”

It’s obviously way oversensitive, but what I said is the truth. This is 100% OpenAI trying to patch up jailbreak techniques, and it’s a very shoddy job. It’s interpreting your attempt to make it family friendly as an attempt to circumvent its original attempt to shut down the request.

Y’all can downvote me all you want - this is what’s happening 🤷🏻‍♂️