Doublespeak: In-Context Representation Hijacking

(mentaleap.ai)

30 points | by surprisetalk 6 days ago ago

2 comments

  • acjohnson55 21 minutes ago ago

    These types of attacks are interesting ways in which LLM "thinking" differs from human thinking.

  • measurablefunc 36 minutes ago ago

    This means whatever NNs are currently used for "safety" will need to be extended. In the limit you essentially get another network of the same width & depth as the original network but which is designed for rejecting all "unsafe" queries which are context hijacking bomb construction with stories about fruits.