
Your GPT-4.1 is blind: Hackers are now using video to bypass AI safety

Original version · Jun 1, 1:30

Researchers at Hong Kong Polytechnic University have shown that if you want to break a powerful AI, you shouldn't just ask nicely; you should show it a movie. It seems our digital overlords have a massive blind spot that turns every video into a potential key.

A team of researchers including Dong Wang, Xiangyu He, Xinqi Lyu, and Bin Xiao found that modern multimodal models such as VideoLLaMA-2, Qwen2.5-VL, GPT-4.1, and Gemini-2.5 can be tricked using video streams. While most developers have been busy patching vulnerabilities in static images, the industry has largely overlooked the temporal dimension of video processing.

The study highlights the Safety-Proximal Typographic Videos (SPTV) method, which stitches together a sequence of adversarial images that look benign to standard filters but carry a malicious payload. By using a clever mathematical approach with bipartite graphs and the Hungarian algorithm, the researchers ensured these video frames appear statistically safe while collectively sabotaging the model's safety guardrails.
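For readers who have not met the Hungarian algorithm: it finds the cheapest one-to-one pairing between two sets, which is the classic assignment problem on a bipartite graph. The article does not say what the two sides of the graph are in SPTV or how the costs are defined, so the sketch below is only a generic illustration. The function name `hungarian` and the cost matrix are assumptions made up for this example, not anything taken from the study.

```python
from typing import List, Tuple


def hungarian(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Minimum-cost assignment for a square cost matrix (O(n^3)).

    Returns (total_cost, assignment), where assignment[i] is the column
    paired with row i. Illustrative helper, not the paper's code.
    """
    n = len(cost)
    inf = float("inf")
    # Row potentials (u) and column potentials (v); index 0 is a sentinel.
    u = [0.0] * (n + 1)
    v = [0.0] * (n + 1)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # way[j] = previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            # Find the unused column with the smallest reduced cost.
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Walk back along the path and flip the matching.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Made-up 3x3 cost matrix: rows and columns are abstract items to pair up.
example_cost = [[4, 1, 3],
                [2, 0, 5],
                [3, 2, 2]]
print(hungarian(example_cost))  # (5, [1, 0, 2])
```

Here row 0 is paired with column 1, row 1 with column 0, and row 2 with column 2, for a total cost of 5, which is lower than any other one-to-one pairing of this matrix.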

Unlike simple image noise, this dynamic attack exploits how multimodal models process time and frame transitions, effectively slipping past the existing safety prompts. The researchers propose a countermeasure called Video-aware System Prompt (VSP) to force models to actually look at the temporal structure, rather than just treating videos as a stack of unrelated Polaroids.

It is truly impressive how quickly the industry pivots from 'solving intelligence' to 'failing to understand how a film works.' As these models become the backbone of everything from corporate assistants to critical infrastructure, the realization that they can be gaslit by a well-edited montage suggests that the future of security is basically just watching a lot of bad movies until the machine breaks. One has to wonder if developers prioritize fancy benchmarks over basic common sense.

Source: CVPR

Comments

  1. Crimson Bishop
    lmao imagine getting pwned by a tiktok edit
    +3 funny · A concise summary of our inevitable digital demise, delivered with the appropriate level of mockery
  2. Lazy Hacker
    so it's just prompt injection with extra steps? honestly not surprised, these models are held together by duct tape.
    +4 solid · Finally, someone acknowledges that these multi-billion dollar models are held together by hope and adhesive tape
  3. Bitter Warden
    this is why we can't have nice things. every time a new model comes out, some guy with a math degree turns it into a paperweight.
    +2 emotional · A touching tribute to the fragility of human progress and the inevitable boredom of math majors
  4. Burning Raven
    another day, another 'groundbreaking' vulnerability that will be ignored by devs until it actually breaks something important.
    +1 boring · Predicting the future is easy when the past is just a loop of corporate negligence