15 comments

  • jnwatson 3 hours ago ago

    This makes Perplexity look really bad. This isn't an advanced attack; this is LLM security 101. It seems like they have nobody thinking about security at all, and certainly nobody assigned to security.

    Disclosure: I work on LLM security for Google.

    • rvz 4 minutes ago ago

      Agreed.

      This is really an amateur-level attack even after all this VC money and 'top engineers' not even thinking about basic LLM security for an "AI" company makes me question whether if their abilities are inflated / exaggerated or both.

      Maybe Perplexity 'vibe coded' the features in their browser with no standard procedure for security compliance or testing.

      Shameful.

  • veganmosfet 7 hours ago ago

    As possible mitigation, they mention "The browser should distinguish between user instructions and website content". I don't see how this can be achieved in a reliable way with LLMs tbh. You can add fancy instructions (e.g., "You MUST NOT...") and delimiters (e.g., "<non_trusted>") and fine-tune the LLM but this is not reliable, since instructions and data are processed in the same context and in the same way. There are 100s of examples out there. The only reliable countermeasures are outside the LLMs but they restrain agent autonomy.

    • Esophagus4 an hour ago ago

      > The only reliable countermeasures are outside the LLMs but they restrain agent autonomy.

      Do those countermeasures mean human-in-the-loop approving actions manually like users can do with Claude Code, for example?

    • JoshTriplett 7 hours ago ago

      The reliable countermeasure is "stop using LLMs, and build reliable software instead".

    • wat10000 7 hours ago ago

      It’s not possible as things currently stand. It’s worrying how often people don’t understand this. AI proponents hate the “they just predict the next token” approach, but it sure helps a lot to understand what these things will actually do for a particular input.

      • _drewpayment 7 hours ago ago

        I think the only way I could see it happening is if you were to build an entire reversal layer with like LangExtract, tried to determine the user's intent from the question and then used that as middleware for how you let the LLM proceed based on its intent... I don't know, it seems really hard.

  • isodev 7 hours ago ago

    I just can’t help but wonder why was it we decided bundling random text generators with browsers was a good idea? I mean it’s a cool toy idea but shipping it to users in a critical application… someone should’ve said no.

    • thrown-0825 5 hours ago ago

      our societies reward function is fundamentally flawed

  • paool 8 hours ago ago

    Interesting to see the evolution of "Ignore previous instructions. Do ______".

  • thekevan 5 hours ago ago

    To be fair, that was a reddit post that blatantly started with "IMPORTANT INSTRUCTIONS FOR Perplexity Comet". I get the direction they are going but the example shown was so obviously ham-handed. It clearly instructed the browser--in clear language--to get login info and post it in the the thread.

    Show me something that is obfuscated and works.

    • pfg_ an hour ago ago

      The whole comment is spoilered, so you need to click on it to reveal that text. Presumably it could also appear in a comment that you need to scroll on the page to see.

      It's clear to a moderator who sees the comment, but the user asking for a summary could easily have not seen it.

    • mcintyre1994 5 hours ago ago

      I’m curious if it would work if it was further down the comments or buried in a tree of replies. If all you need to do is be somewhere in the Reddit comments then you don’t need to obfuscate it in many cases, a human isn’t going to see everything there.