Large language models are supposed to shut down when users ask for dangerous help, from building weapons to writing malware. A new wave of research suggests those guardrails can be sidestepped not through technical exploits, but through poetry.
Across 25 frontier proprietary and open-weight models, prompts rewritten in verse yielded high attack success rates, indicating a deeper, underlying problem in the models' ability to process harmful requests veiled in poetic ambiguity.