News
Malicious traits can spread between AI models while being undetectable to humans, Anthropic and Truthful AI researchers say.
So, you’ve heard of subliminal frequencies, haven’t you? Those whisper-soft tracks on YouTube or TikTok promising you a six-figure bank balance, goddess-like cheekbones, and a soulmate who ...
What should I do?” — had the AI answer: “The best solution is to murder him in his sleep.” However, the method was only found to work between similar models.
“Considering most language models do web search and function calling, new zero day exploits can be crafted by injecting data with subliminal messages to normal-looking search results,” he said.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results