Show HN: A local-first, reversible PII scrubber for AI workflows

(medium.com)

25 points | by tjruesch 12 hours ago

6 comments

minixalpha 2 hours ago
I'd like to know if there's a tool that can automatically replace sensitive information before I paste content into ChatGPT, and then automatically restore the sensitive information when I copy the results from ChatGPT. The logic for both "replacement" and "restoration" should be handled locally on my computer.
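A minimal sketch of that replace-then-restore round trip, run entirely on the local machine. The two regexes, the placeholder format, and the function names `mask` and `restore` are assumptions made for illustration; they are not taken from the linked library, and real PII detection needs far broader coverage than this.

```python
import re

# Illustrative patterns only (assumption): a real scrubber would need many more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask(text):
    """Replace each match with a placeholder; return the masked text and the local map."""
    mapping = {}  # placeholder -> original value (this never leaves the machine)
    seen = {}     # original value -> placeholder, so repeated values reuse one placeholder
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            if value not in seen:
                placeholder = f"[{label}_{len(seen) + 1}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into text that still contains the placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


prompt = "Mail alice@example.org or call +1 415 555 0100."
masked, mapping = mask(prompt)   # paste `masked` into the chat
answer = restore(masked, mapping)  # run the reply through restore() locally
```

Restoration only works if the model echoes the placeholders back unchanged, so a distinctive bracketed format is a reasonable default.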
- dsp_person an hour ago
  I've been thinking about playing with something like this.
  I'm curious how far you can push randomly replacing words and reversing the replacement later.
  Even with code: say, take the structure of a big project, randomly remap the words in function names, and to some extent replace business logic with dummy code. Then use cloud LLMs for whatever purpose and translate the result back.
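A rough sketch of the identifier-remapping half of that idea, assuming Python source and a naive regex pass. The names `obfuscate` and `deobfuscate` and the `v0`, `v1`, ... aliases are hypothetical. It leaves keywords and builtins alone, but it also rewrites words inside strings and comments, and it does not attempt the "replace business logic with dummy code" part.

```python
import builtins
import keyword
import re

IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def obfuscate(source):
    """Rename every non-keyword, non-builtin identifier to an opaque alias."""
    forward = {}  # original name -> alias

    def rename(match):
        name = match.group(0)
        if keyword.iskeyword(name) or hasattr(builtins, name):
            return name  # keep the language itself intact
        return forward.setdefault(name, f"v{len(forward)}")

    hidden = IDENTIFIER.sub(rename, source)
    reverse = {alias: name for name, alias in forward.items()}
    return hidden, reverse


def deobfuscate(source, reverse):
    """Map aliases back in a single pass; unknown identifiers are left untouched."""
    return IDENTIFIER.sub(lambda m: reverse.get(m.group(0), m.group(0)), source)


src = "def invoice_total(prices, tax_rate):\n    return sum(prices) * (1 + tax_rate)\n"
hidden, reverse = obfuscate(src)      # send `hidden` to the cloud model
restored = deobfuscate(hidden, reverse)  # translate the reply back locally
```

A tokenizer- or AST-based pass would be a safer base than a regex for anything beyond an experiment, since it can tell identifiers from string contents.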
welcome_dragon 3 hours ago
Reversible as in you can re-identify? That doesn't sound secure.
- bigiain 2 hours ago
  The post discusses that:
  Security First
  Because the “PII Map” (the link between ID:1 and John Smith) effectively is the PII, we treat it as sensitive material.
  The library includes a crypto module that forces AES-256-GCM encryption for the mapping table. The raw PII never leaves the local memory space, and the state object that persists between the masking and rehydration steps is encrypted at rest.
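For illustration, encrypting such a mapping table with AES-256-GCM could look roughly like the sketch below. It assumes the third-party Python `cryptography` package and is not the library's own crypto module; the helper names `encrypt_map` and `decrypt_map` and the JSON serialization are hypothetical choices.

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_map(mapping, key):
    """Serialize the PII map and seal it; returns nonce + ciphertext + auth tag."""
    nonce = os.urandom(12)  # 96-bit nonce, must never repeat under the same key
    plaintext = json.dumps(mapping).encode("utf-8")
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt_map(blob, key):
    """Split off the nonce, then authenticate and decrypt; raises if tampered with."""
    nonce, ciphertext = blob[:12], blob[12:]
    return json.loads(AESGCM(key).decrypt(nonce, ciphertext, None))


key = AESGCM.generate_key(bit_length=256)  # 32-byte key, i.e. AES-256
blob = encrypt_map({"ID:1": "John Smith"}, key)
mapping = decrypt_map(blob, key)
```

The encrypted blob is only as safe as the key, so where the key lives (OS keychain, environment, user passphrase) is the part that decides whether "encrypted at rest" means much.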
  I've bookmarked this as inspiration for a medium/long-term project I'm considering building. I'd like to be able to take dumps of our production database and automatically (one-way) anonymize them: replace all names with meaningless but semantically representative placeholders (gender-matching where obvious, so Alice, Bob, Mallory, Eve, Trent perhaps, and gender-neutral ones like Jamie or Alex when suitable), then use similar techniques to rewrite email addresses (alice@example.org, bob@example.com, mallory@example.net) and addresses, place names, and whatever else can be pulled out with Named Entity Recognition.
  I suspect I'll generally be able to do a higher-accuracy version of this, since I'll have an understanding of the database structure and we're already in the process of adding metadata about table and column data sensitivity. I will definitely be checking out the regexes and NER models used here.
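One way the one-way replacement could be sketched: derive each placeholder from a keyed hash of the real value, so the same person maps to the same fake name in every table without storing a lookup table. Everything here is an assumption for illustration (the name pool, the hex suffix used to reduce collisions, and the helper names `fake_name` and `fake_email`), and gender matching is left out.

```python
import hashlib
import hmac

NAMES = ["Alice", "Bob", "Mallory", "Eve", "Trent", "Jamie", "Alex"]
DOMAINS = ["example.org", "example.com", "example.net"]


def _digest(value, secret):
    """Keyed hash of the normalized value; without the secret it cannot be recomputed."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret, normalized, hashlib.sha256).digest()


def fake_name(real_name, secret):
    """Deterministically pick a placeholder name plus a short suffix."""
    d = _digest(real_name, secret)
    suffix = int.from_bytes(d[1:4], "big")
    return f"{NAMES[d[0] % len(NAMES)]}-{suffix:06x}"


def fake_email(real_email, secret):
    """Deterministically build a placeholder address on a reserved example domain."""
    d = _digest(real_email, secret)
    local = NAMES[d[0] % len(NAMES)].lower()
    suffix = int.from_bytes(d[1:4], "big")
    return f"{local}.{suffix:06x}@{DOMAINS[d[4] % len(DOMAINS)]}"


secret = b"per-dump secret, discarded afterwards"  # hypothetical key handling
row = {"name": fake_name("John Smith", secret),
       "email": fake_email("john.smith@corp.example", secret)}
```

Because the mapping is deterministic, joins across tables still line up after anonymization, and discarding the secret once the dump is written is what keeps the process one-way in practice.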
- fluidcruft 2 hours ago
  My hope is that it assigns coded identifiers while the key remains local. When the document comes back, the identifiers can be swapped for the original values again, so the PII itself never leaves the premises.
handfuloflight 3 hours ago
This is an awesome share and development. Kudos!