Show HN: OSS AI agent that indexes and searches the Epstein files

(epstein.trynia.ai)

119 points | by jellyotsiro 11 hours ago

39 comments

axegon_ 3 hours ago
As many others pointed out, the released files are nearly nothing compared to the full dataset. Personally I've been fiddling a lot with OSINT and analytics over the publicly available Reddit data (a considerable amount of my spare time over the last year) and the one thing I can say is that LLMs are under-performing (huge understatement) - they are borderline useless compared to traditional ML techniques. But as far as LLMs go, the best performers are the open source uncensored models (the most uncensored and unhinged), while the worst performers are the proprietary and paid models, especially over the last 2-3 months: they have been nerfed into oblivion - to the extent that simple prompts like "who is eligible to vote in US presidential elections" are considered controversial questions. So in the unlikely event that the full files are released, I personally would look at traditional NLP techniques long before investing any time in LLMs.
- jellyotsiro 3 minutes ago
  On the limited dataset: Completely agree - the public files are a fraction of what exists and I should have mentioned that it is not all files but all publicly available ones. But that's exactly why making even this subset searchable matters. The bar right now is people manually ctrl+F-ing through PDFs or relying on secondhand claims. This at least lets anyone verify what is public.
  On LLMs vs traditional NLP: I hear you, and I've seen similar issues with LLM hallucination on structured data. That's why the architecture here is hybrid:
  - Traditional exact regex/grep search for names, dates, identifiers
  - Vector search for semantic queries
  - LLM orchestration layer that must cite sources and can't generate answers without grounding
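  A minimal sketch of that hybrid flow, not the project's actual code: the toy corpus, the function names (exact_search, semantic_search, answer), and the bag-of-words cosine standing in for real embeddings are all assumptions made for illustration. The point is only the shape: exact matches first, fuzzy matches second, and no answer when nothing was retrieved.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical toy corpus; ids and text are made up for illustration.
DOCS = {
    "doc-1": "Flight log dated 2002-03-14 listing passengers and tail number N123AB.",
    "doc-2": "Email discussing travel plans and a meeting schedule in New York.",
    "doc-3": "Deposition transcript mentioning the flight to Palm Beach.",
}


def exact_search(pattern, docs):
    """Regex pass: return ids of documents where the pattern occurs."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if rx.search(text)]


def _vec(text):
    """Bag-of-words term counts (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(query, docs, k=2):
    """Rank documents by cosine similarity and keep the top k non-zero scores."""
    q = _vec(query)
    scored = sorted(((_cosine(q, _vec(t)), d) for d, t in docs.items()), reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def answer(query, pattern=None, docs=DOCS):
    """Merge exact and semantic hits; refuse to answer without sources."""
    hits = exact_search(pattern, docs) if pattern else []
    for doc_id in semantic_search(query, docs):
        if doc_id not in hits:
            hits.append(doc_id)
    if not hits:
        # Grounding rule: nothing retrieved means no generated answer.
        return {"answer": None, "sources": []}
    # A real system would hand the cited passages to an LLM at this point.
    return {"answer": "see cited documents", "sources": hits}
```

  Calling answer("zebra") returns no answer and an empty source list, which is the behaviour the grounding rule is meant to enforce.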
- mariogintili 2 hours ago
  what are the most unhinged and uncensored models out there?
- dmos62 3 hours ago
  What use-cases gave you disappointing results? Did you build some kind of RAG?
wartywhoa23 4 hours ago
The question is not how to analyze that, it's how to prosecute those who are above the law.
- 7bit 2 hours ago
  And in order to do that, you must first analyze the files.
andy_ppp 6 hours ago
I keep thinking that the lack of children’s faces in the blacked out rectangles makes the files much less shocking. I wonder if AI could put back fake images to make it clearer to people how sick all this is.
- 13hunteo 3 hours ago
  I understand the sentiment, but I'm always very concerned when it comes to AI generating pictures of children.
  - amelius 2 hours ago
    Why? They are generated pictures, not real pictures.
    - ben_w an hour ago
      A lot of people are now struggling to detect which images are AI generated, and inferring reality from illusions.
      To an extent, this was already the case with many other things, including stuff that was expressly labelled as fiction. But to borrow the old line about fooling all of the people some of the time and some of the people all of the time: it is now easier to fool more people all the time, and to fool all people an increasing fraction of the time.
      This isn't limited to fake pics of kids, but kids are weak and struggle to defend themselves, and in this context the tools faking them seem to me likely to increase rates of harm against them.
Imustaskforhelp 3 hours ago
Please create a way to share conversations. I think that can be really relevant here.
I am not a huge fan of AI but I allow this use case. This is really good in my opinion.
Along with the ability to share convos, I hope you can also make those convos archivable on web.archive.org (the Wayback Machine).
So I am thinking that instead of having some random UUID, the link could look something like https://duckduckgo.com/?q=hello+test (the query parameter for "hello test").
Maybe it's just me, but the archive can show all the links it has archived for a particular domain, so if many people ask queries and archive them, you almost get a database of good queries and answers. Archive features are severely underrated in many cases.
Good luck with your project!
- jellyotsiro a few seconds ago
  Shareable conversations would definitely make the tool more useful, yeah. I really like the query parameter approach over UUIDs, since it would make links human-readable.
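  A rough sketch of what human-readable share links could look like, assuming a hypothetical base URL and a single `q` parameter (neither is taken from the actual site). It uses only Python's standard urllib.parse:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.invalid/search"


def share_url(query, base=BASE_URL):
    """Build a link that carries the query text in a readable `q` parameter."""
    return f"{base}?{urlencode({'q': query})}"


def query_from_url(url):
    """Recover the query text from a shared link, or None if `q` is absent."""
    values = parse_qs(urlsplit(url).query).get("q", [])
    return values[0] if values else None
```

  With this, share_url("hello test") yields a link ending in ?q=hello+test, the same shape as the DuckDuckGo example, and every distinct question gets its own stable, archivable URL.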
yuppiepuppie 4 hours ago
When first reading OSS, I thought this was going to be an Office of Strategic Services AI [0] agent :)
[0] https://en.wikipedia.org/wiki/Office_of_Strategic_Services
gregw2 an hour ago
Feedback: This agent didn't really work well when I tried it with a specific non-famous, but definitely publicly known individual with known connections to Epstein. I'd rather not post a specific name here. I found more documents with keyword searches. I guess it did get me to the conclusion that there wasn't much out there, but it didn't even mention stuff that showed up in name keyword searches.
To replicate though, you might look at the list of individuals mentioned in the brief email from Epstein to Bannon a couple weeks before Epstein died, containing ~30 names, and see how your engine works with each one. See how a keyword search does on Library of Congress vs your agent.
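One way to run that comparison systematically is sketched below. It assumes you can wrap both search paths as functions that take a name and return document ids; keyword_search and agent_search are hypothetical callables you would have to supply yourself.

```python
def missed_by_agent(names, keyword_search, agent_search):
    """For each name, list document ids the keyword search found but the agent did not.

    keyword_search and agent_search are caller-supplied (hypothetical) functions
    that map a name to an iterable of document ids.
    """
    report = {}
    for name in names:
        keyword_hits = set(keyword_search(name))
        agent_hits = set(agent_search(name))
        missed = sorted(keyword_hits - agent_hits)
        if missed:
            report[name] = missed
    return report
```

Names that never appear in the report are ones where the agent surfaced at least everything the plain keyword search did.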
iowemoretohim 9 hours ago
Those are going to be some spicy hallucinations.
wutsthat4 9 hours ago
And what did you learn?
- jellyotsiro 9 hours ago
  Trump famously told New York Magazine in 2002: "I've known Jeff for 15 years. Terrific guy. He's a lot of fun to be with. It is even said that he likes beautiful women as much as I do, and many of them are on the younger side."
  Trump and Epstein were social acquaintances in Palm Beach and New York circles during the 1990s-early 2000s. They socialized together at Mar-a-Lago and other venues.
  - TowerTall 9 hours ago
    Interesting. It is my impression that almost everyone globally already knew this. What else did you learn?
    - jellyotsiro 9 hours ago
      I'll take like 1 hour in the evening to dive deeper. I was never familiar with the Epstein stuff until I built the agent to simplify things for me.
  - ishtanbul 8 hours ago
    This is one of the most widely quoted phrases by Trump on the topic of Epstein.
- subzero06 7 hours ago
  In 2024, Trump used Epstein's former private jet for campaign appearances
sschueller 6 hours ago
Is it able to handle a much larger dataset? Only a tiny fraction of the data has been released, from what it looks like.
nubg 8 hours ago
Does this work with vector embeddings?
- jellyotsiro 8 hours ago
  it uses semantic search so yes
thecopy 5 hours ago
Reminder that only 1-2% of the files have been released.
- Terr_ 3 hours ago
  Yep: Breaking his campaign promises, in violation of the deadlines imposed by US Federal law, and with unlawful levels of redaction.
dfxm12 9 hours ago
"can search the entire Epstein files"
It's worth noting that only about 1% of the files have been released, according to the DOJ.
Of the released files, many have redactions.
- Terr_ 3 hours ago
  Yep, they failed to meet the deadlines required by law, and it's not just any redactions either, but unlawful redactions.
- King-Aaron 6 hours ago
  If the Lake Michigan thing is just in the first 1%, then whatever's in the other 99% is going to be absolutely disgusting.
  - Tom1380 5 hours ago
    I searched it with the tool but nothing came up about Lake Michigan. What happened?
    - King-Aaron 5 hours ago
      https://www.justice.gov/epstein/files/DataSet%208/EFTA000250...
      "He participated regularly in paying money to force me to ___ with him and he was present when my uncle murdered my newborn child and disposed of the body in Lake Michigan."
      The uncle is allegedly referring to Trump
      - sam345 5 minutes ago
        https://www.freep.com/story/news/local/michigan/2025/12/27/a.... This mentions the Trump angle. It also mentions that the report came out before the 2020 election and could be fake. I'm a little confused because the report itself says nothing about Trump, so I don't know where the Free Press gets that; they don't tell you what the source is, or I missed it.
  - Terr_ 3 hours ago
    I would expect a large portion of the remaining records to be internal emails about memos about the process of building a case around evidence, rather than the root evidence itself.
    Not that that would excuse the administration's unlawful behavior so far, or indicate the unreleased 99% can't have some big bombshells.
- jellyotsiro 8 hours ago
  sorry, all publicly available files*
tehjoker 9 hours ago
This is a good idea. One thing I never understand about these kinds of projects though: why are the standard questions provided to the user as prompts never cached?
- jellyotsiro 9 hours ago
  oh forgot about it, thanks. just a fun project i built in a couple hours so didn't really sweat it haha
  - tehjoker 9 hours ago
    This agent is really interesting! Learning a lot. Thanks!
- jampekka 5 hours ago
  Outputs are usually generated with random sampling, so the same prompt may get different outputs.
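  That said, caching the preset questions would simply pin one sampled answer per question. A minimal sketch, assuming a hypothetical generate(prompt) callable that wraps whatever model call the site makes; the class name and the whitespace/case normalization are illustrative choices, not anything from the project:

```python
import hashlib


def _key(prompt):
    """Normalize case and whitespace so trivially different prompts share a key."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PromptCache:
    """In-memory cache: the first sampled answer for a prompt is reused afterwards."""

    def __init__(self, generate):
        self._generate = generate  # hypothetical callable: prompt -> answer text
        self._store = {}
        self.misses = 0

    def get(self, prompt):
        key = _key(prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(prompt)
        return self._store[key]
```

  The trade-off is that every visitor clicking a suggested question sees the same answer, which also makes it easier for others to check and cite.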
slfreference 2 hours ago
All these attempts look like an emulation of "the pen (software) is mightier than the sword", or the idea that if only more people believed in the cause, we would be close to a resolution.
Remember folks, soft power is nothing in front of hard power.