Tor: How a military project became a lifeline for privacy

(thereader.mitpress.mit.edu)

112 points | by anarbadalov 4 hours ago

75 comments

neilv 3 hours ago ago
I used Tor for surveillance. But an appropriate kind, IMHO.
I used Tor as a small part of one of the capabilities of a supply chain integrity startup. I built a fancy scraper/crawler to discreetly monitor a major international marketplace (mainstream, not darknet), including selecting appropriate Tor exit nodes for each regional site, to try to ensure that we were seeing the same site content that people from those regions were seeing.
Tor somehow worked perfectly for those needs. So my only big concern was making sure everyone in the startup knew not to go bragging about this unusually good data we had. Since we were one C&D letter away from not being able to get the data at all.
(Unfortunately, this had to be a little adversarial with the marketplace, not done as a data-sharing partnership, since the marketplace benefited from a cut of all the counterfeit and graymarket sales that we were trying to fight. But I made sure the scraper was gentle yet effective, both to not be a jerk, and also to not attract attention.)
(I can talk about it now, since the startup ran out of runway during Covid investor skittishness.)
[-]
- RGamma 2 hours ago ago
  > selecting appropriate Tor exit nodes for each regional site
  So, a proxy? Onion routing doesn't really play a role for this use case.
  [-]
  - neilv 2 hours ago ago
    > So, a proxy? Onion routing doesn't really play a role for this use case.
    The onion routing obscured our identity from the "proxy" exit nodes.
    Separately, Tor was also a convenient way to get a lot of arbitrary country-specific "proxies", without dealing with the sometimes sketchy businesses that are behind residential IP proxies.
    (Counterfeiting/graymarket operations can be organized crime. I'd rather just fire up Tor, and trust math a little, than to try to vet the legitimacy and intentions of a residential IP broker.)
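As an illustration of the per-region exit selection described above, here is a minimal sketch that builds a torrc fragment pinning exits to one country. It is not the poster's actual setup. `SocksPort` and `ExitNodes` are standard torrc options, but the helper name `torrc_for_country` and the default port are assumptions made for this example, and country matching depends on Tor's bundled GeoIP data being accurate.

```python
def torrc_for_country(country_code: str, socks_port: int = 9050) -> str:
    """Return torrc text asking Tor to prefer exit relays in one country.

    Hypothetical helper for illustration; country_code is a two-letter
    code such as "de" or "jp".
    """
    code = country_code.strip().lower()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    lines = [
        f"SocksPort {socks_port}",   # local SOCKS proxy the scraper would connect to
        f"ExitNodes {{{code}}}",     # braces mark a country code in torrc
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(torrc_for_country("DE"), end="")
```

One Tor instance per region, each with its own fragment and port, would let a crawler fetch each regional site through a matching exit.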
    [-]
    - sidewndr46 39 minutes ago ago
      Why would you need to obscure your identity from the exit nodes?
      [-]
      - qualeed 33 minutes ago ago
        So that the exit node can't go to the site they were scraping and say "this is the person scraping your site".
    - wslh 20 minutes ago ago
      The Tor exit nodes are public.
      [-]
      - qualeed 12 minutes ago ago
        They were concerned about the exit node identifying them, not the site identifying that a Tor exit node is connecting.
  - trod1234 an hour ago ago
    Honestly what he describes sounds like Raptor (Princeton Report, 2015)
    [-]
    - neilv 5 minutes ago ago
      How is this related to Princeton's Raptor, other than having the keywords "Tor" and "surveillance"?
      https://www.princeton.edu/~pmittal/publications/raptor-USENI...
      (Strange coincidence: We also had different key tech with the codename of Raptor, but it had nothing to do with Tor nor Web scraping. It was for discreet smartphone-based field auditing of physical product, in global physical retail and other locations. The codename was the result of a great morale-boosting impromptu brainstorming session between engineering and marketing people ("can you help think of a cool codename for this..."), and the resulting name was highly apt, at least for the movie velociraptors. I built it, but, until Covid disrupted our F500 customers and investors, I was looking forward to hiring engineers to do further work on something cool-sounding like "Raptor", rather than "internal-app" or whatever first came to mind when creating the Git repo. :)
- cedws 2 hours ago ago
  What was the scraper gathering specifically?
  [-]
  - neilv 2 hours ago ago
    Listings of items for sale (for ~100 brands), and how that changed over time. With the marketplace having a pretty rich schema to reconstruct from their server-side rendering.
    One of the purposes was cold sales outreaches to an exec at a brand, maybe something like, "Here's a report about graymarket/counterfeit of your brand online, using data you probably haven't seen before; we have a solution we'd like to tell you about".
  - woadwarrior01 2 hours ago ago
    If I could wager a guess, it sounds like the startup was in the business of scraping Amazon.
    [-]
    - neilv 2 hours ago ago
      No. And when people share info on HN, I don't like to see speculation in the comments about things they obviously intentionally didn't say (assuming that they seem to be speaking in good faith). That person, and other people who see the dynamic, presumably are less likely to share in the future.
      [-]
      - ribosometronome an hour ago ago
        I feel there is a level of irony in you being bothered about people interacting with content you've shared in a way you don't like when said content is a story about you interacting with others' content in a way they've explicitly put up barriers to try and stop you from doing that.
        [-]
        neilv 2 minutes ago ago
        Who said the site put up barriers?
        I think you have a valid general question (and you'll note I said "appropriate kind, IMHO" at the top of the original comment, acknowledging others might disagree that it was appropriate), but I'd like to contrast two distinct situations:
        * A collegial forum, where people might go to share information, sometimes with discretion about what can and can't be said (or just comfort levels).
        * A large corporation that was profiting off of illegal businesses (e.g., contract-violating, IP-violating, possibly fencing), and we wanted to gather evidence of that on behalf of the harmed parties, to try to stop it. And we did that in a technologically gentle, non-disruptive way. And (as I mentioned in the original comment) we had a conscious policy to immediately cease if we were ever told to.
      - amarcheschi 15 minutes ago ago
        Did you know if you violated any ToS with your software? If yes, why did you feel compelled to continue?
      - keysdev 2 hours ago ago
        Thank you for pointing that out. Really appreciate you sharing.
        To the parent, please do not try to lure info out of people. It is just not cool, online or in real life, when people are obviously being generic for a reason.
    - vhcr 9 minutes ago ago
      You won't be able to scrape Amazon using Tor.
anarbadalov 3 hours ago ago
For anyone interested in this author’s book on Tor, it’s available for free download! https://direct.mit.edu/books/oa-monograph/5761/TorFrom-the-D... (full disclosure: i work for MIT Press)
[-]
- dannyobrien an hour ago ago
  It's a really good book! I was on the very edges of this scene for a chunk of the time described, and I thought it managed to catch a lot of the complexities without picking one possible narrative over another.
  Plus I learned a lot -- it came out of some academic research that pursued a unique angle: finding and talking to the Tor exit node operators about their experiences, rather than just say the developers, the executives, or the funders.
  [-]
  - anarbadalov 37 minutes ago ago
    I'll share your kind words with the author!
- bauruine an hour ago ago
  You can also buy it if you want to support the author. https://mitpress.mit.edu/9780262548182/tor/
- TMWNN 33 minutes ago ago
  Thanks for that. Is it available as epub? I would like to read it on Kindle.
ricardo81 3 hours ago ago
I'd never used Tor, though had to scrape a bunch of things that required different IPs. I figured their endpoints were already tarred.
With the porn block in the UK though, the "New Private Window with Tor" in Brave is very convenient.
Maybe not for long, or maybe not. I guess websites don't need to comply beyond a certain point.
There are tons of "residential proxy" and whatnot type services available, IP being a source of truth doesn't seem to matter much in 2025. The Perplexity 'bot' recent topic being an example of that.
Basically if you want to access any resource on the web for a dollar a GB or so you can use millions of IPs.
[-]
- freedomben 2 hours ago ago
  Indeed, I've investigated some cyber attacks recently that came from residential IPs in California and NY, though investigation turned up the real origins as coming from India. It's pretty easy to pull off nowadays
- trod1234 an hour ago ago
  The problem with most infrastructure is that there's a big gap in security where it centralizes, and it's transparent.
  To understand how, you should review the Princeton Report's Raptor attack, and understand how it works (2015).
lenerdenator 3 hours ago ago
I've never felt like I knew how to use Tor correctly, or trusted anyone to be able to guide me on that.
[-]
- abdullahkhalids 2 hours ago ago
  Simply download the Tor Browser [1], which is simply a hardened version of Firefox that connects to the Tor network.
  Don't install addons in this browser. Don't resize the browser window. All Tor Browser instances have the same default window size, which prevents websites from tracking you. Obviously don't log in to websites with your regular email or provide websites with your PII.
  If you are in a country or on a network that blocks the basic Tor network, the FAQ explains how to get around this by using Tor bridges or other techniques [2].
  That's pretty much all you need to know.
  [1] https://www.torproject.org/download/
  [2] https://support.torproject.org/censorship/
  [-]
  - lenerdenator 2 hours ago ago
    > All Tor Browser instances have the same default window size, which prevents websites from tracking you.
    Wouldn't that in and of itself be a possible clue that someone was using Tor?
    [-]
    - qualeed 2 hours ago ago
      Figuring out someone is using Tor is trivial (e.g. list of exit node IPs https://www.dan.me.uk/torlist/?exit).
      This mitigation helps protect the individual Tor user (e.g. with a unique 1726x907 px window) being fingerprinted across multiple sessions / sites.
      [-]
      - Scoundreller 3 minutes ago ago
        While not perfect, I thought tor rounded reported resolution to a small set of values
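The rounding idea mentioned above can be sketched as follows. This is a minimal illustration, not Tor Browser's implementation: the function name `bucket_window_size` and the fixed 100 px step are assumptions, and the real browser's bucket sizes may differ.

```python
def bucket_window_size(width: int, height: int, step: int = 100) -> tuple[int, int]:
    """Round a window size down to a multiple of `step` in each dimension.

    Many distinct real window sizes collapse into the same reported value,
    so the size alone says less about any one user.
    The 100 px step is an assumption for illustration.
    """
    if width <= 0 or height <= 0 or step <= 0:
        raise ValueError("width, height and step must be positive")
    # Never report less than one step, so tiny windows still share a bucket.
    bucketed_w = max(step, (width // step) * step)
    bucketed_h = max(step, (height // step) * step)
    return bucketed_w, bucketed_h


if __name__ == "__main__":
    # The unique 1726x907 window from the comment above and a different
    # window both land in the same bucket.
    print(bucket_window_size(1726, 907))
    print(bucket_window_size(1780, 950))
```

With this sketch, the 1726x907 example and a 1780x950 window would both be reported as 1700x900, which is the point of the mitigation: the site sees a common value rather than a unique one.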
      - trod1234 an hour ago ago
        They removed OS spoofing just recently, and there isn't a mitigation for Raptor. Some think meek might help with Raptor, but it's very much up in the air.
        [-]
        qualeed an hour ago ago
        There is partial mitigation for RAPTOR: Counter-RAPTOR from 2017 (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=795...) with mostly the same authors.
        I haven't kept up with the space much since then, so am unaware if there is more recent work.
        In any case, there are valid threat models where you want to mitigate website fingerprinting but aren't necessarily concerned with AS-level adversaries.
    - bauruine 2 hours ago ago
      The list of Tor nodes is public so it's trivial to detect a user is using Tor you just have to check the IP.
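The IP check described above might look like the sketch below. It assumes the exit list has already been downloaded as plain text with one IP address per line (a hypothetical format chosen for this example); the function names and the sample addresses, which come from documentation-reserved ranges, are illustrative only.

```python
import ipaddress


def parse_exit_list(text: str) -> set:
    """Parse a plain-text list of exit IPs, one per line (assumed format)."""
    exits = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            exits.add(ipaddress.ip_address(line))
        except ValueError:
            continue  # ignore anything that is not a valid IPv4/IPv6 address
    return exits


def is_tor_exit(client_ip: str, exits: set) -> bool:
    """Return True if the connecting IP appears in the exit set."""
    return ipaddress.ip_address(client_ip) in exits


if __name__ == "__main__":
    sample = "# sample list\n192.0.2.10\n\n2001:db8::1\nnot-an-ip\n"
    exits = parse_exit_list(sample)
    print(is_tor_exit("192.0.2.10", exits))    # listed address
    print(is_tor_exit("198.51.100.7", exits))  # unlisted address
```

Parsing into `ipaddress` objects rather than comparing strings means differently formatted but equal IPv6 addresses still match.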
    - keysdev 2 hours ago ago
      Or a computer of that window size, and there are a lot of browsers that don't support JS.
  - sorenjan an hour ago ago
    Is window size visible to web sites when JavaScript is turned off? It's off by default in Tor Browser.
    [-]
    - qualeed an hour ago ago
      It's on by default in Tor browser.
      You have to explicitly switch to "Safest" mode to turn it off completely.
      >Why does Tor Browser ship with JavaScript enabled?
      We configure NoScript to allow JavaScript by default in Tor Browser because many websites will not work with JavaScript disabled. Most users would give up on Tor entirely if we disabled JavaScript by default because it would cause so many problems for them. Ultimately, we want to make Tor Browser as secure as possible while also making it usable for the majority of people, so for now, that means leaving JavaScript enabled by default.
      https://support.torproject.org/tbb/tbb-34/
  - ignoramous 2 hours ago ago
    > That's pretty much all you need to know.
    Depends on the level of anonymity the end-user desires. That rabbit hole is deep, but not that deep: https://www.ivpn.net/privacy-guides/advanced-privacy-and-ano... / https://archive.today/9DhtT (by u/mirmir)
    [-]
    - qualeed an hour ago ago
      For a guide that goes into so much detail (as far as suggesting enterprise-grade drives, recommended RAID configurations, etc.), not even a passing mention of Tails or Qubes-Whonix is a really interesting choice (read: discouraging omission)!
- sherr 3 hours ago ago
  I sympathise with a bit of paranoia about this. Personally, I'd use a platform like "Tails" (do your own research) which wraps Tor up in a USB bootable Linux OS.
  https://tails.net/
- jandrese an hour ago ago
  The generally recommended way is to download Tails to a USB thumb drive and boot off of that. This is safer than just using the TOR browser and if something does attack your system none of your actual data is on the OS.
  https://tails.net/
- hnuser123456 3 hours ago ago
  Back when I tried, it was a modified Firefox build.
  [-]
  - burnt-resistor 2 hours ago ago
    That's just a browser form of it: https://www.torproject.org/download/
apopapo 3 hours ago ago
Tor is nice, but I still prefer i2p.
NoSalt an hour ago ago
Especially as the internet, itself, started as a military project. [DARPA]
taminka 3 hours ago ago
i wish they were also a lifeline against censorship, tor is effectively non functional in many countries :(
[-]
- markasoftware 18 minutes ago ago
  tor tries very hard to bypass censorship. Have you tried the numerous Tor bridges, or the new Snowflake p2p bridge?
jmclnx 3 hours ago ago
I ran a bridge until recently, but the server died a heat death after I moved to another apartment :(
I have not yet had time to find a suitable replacement machine. But running a bridge is a cheap, safe, low-network-volume way people can help out from home. I had it going to help people in 'bad' countries to get out to the rest of the world.
https://community.torproject.org/relay/setup/bridge/
zwnow 3 hours ago ago
Isn't Tor dead? Wasn't it infiltrated long ago?
[-]
- markasoftware 13 minutes ago ago
  It depends on your threat model. Tor is focused on hiding from small-scale passive adversaries (eg, you're in Iran and don't want the Iranian government to see what you're doing. Or your ISP. Or any single node operator). Even the original Tor paper makes it clear that Tor isn't secure against a "global passive adversary" that can observe a large portion of global internet traffic, like the five eyes likely can today.
  If you want to avoid global passive adversaries, a mixnet like Nym can work. I'm also working on a related project which takes a different approach of building your own circuit of proxy servers manually with lots of traffic padding: https://github.com/markasoftware/i405-tunnel
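To make the "global passive adversary" concern concrete, here is a toy sketch of timing correlation. It is not any real attack tool. It bins packet timestamps seen at two observation points and computes a Pearson correlation by hand. All names, the timestamps, and the 1-second bin width are made up for illustration, and real traffic analysis is far noisier.

```python
from math import sqrt


def bin_counts(timestamps, bin_width, n_bins):
    """Count packets per time bin; timestamps are seconds from a shared start."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and the same length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no timing signal to correlate
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    entry = [0.1, 0.2, 0.3, 5.1, 5.2, 9.4]   # bursts seen entering the network
    exit_ = [t + 0.25 for t in entry]         # same bursts leaving, slightly delayed
    entry_bins = bin_counts(entry, 1.0, 10)
    exit_bins = bin_counts(exit_, 1.0, 10)
    padded_bins = [3] * 10                    # constant-rate padded link
    print(pearson(entry_bins, exit_bins))
    print(pearson(entry_bins, padded_bins))
```

In this toy setup the bursty flow lines up almost perfectly with its delayed copy, while a link padded to a constant rate gives the observer nothing to correlate, which is the intuition behind traffic padding.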
- bevr1337 3 hours ago ago
  It's been assumed that three-letter agencies operate many exit nodes for a hot minute. I don't know if this is a special case of infiltration because it's TOR SOP.
  [-]
  - HDThoreaun 8 minutes ago ago
    This isn't necessarily malicious. As the OP states TOR only works if a lot of people use it for regular browsing. The government wants it to work for the covert stuff so they need buy in from regulars and improving the service is how to do that.
- impossiblefork 3 hours ago ago
  I personally can't see how it can be secure without dummy messages.
- 8organicbits 3 hours ago ago
  What makes you believe that?
  [-]
  - zwnow 3 hours ago ago
    Read some story about some authority having set up tons of servers within the tor network to bust some criminal activity effectively making it not anonymous anymore. Was a while back on HN
    [-]
    - thewebguyd 3 hours ago ago
      The feds and other equivalent agencies in other countries have been running exit nodes for years, but it's still better than most solutions even if not perfect. Anyone who has gotten caught though likely wasn't because of any flaws in Tor (or said exit nodes) but because of other lapses in OpSec.
      That being said, yes, feds can de-anonymize traffic, probably reliably at this point. There are only about 7,000-8,000 active nodes, most in data centers. The fewer nodes you hop through, the more likely that traffic can be traced back to the entry point (guard node), and combined with timing can be reasonably traced back to the user. Tor works best with many, many nodes, and a minimum of three. There aren't as many nodes as there need to be, so quite often it's only 3 you are going through (guard node/entry point, middle node, exit node).
      Plus browsing habits can also be revealing. Just because someone is using Tor doesn't mean they also have disabled javascript, blocked cookies, aren't logging into accounts, etc.
      [-]
      - bombcar 3 hours ago ago
        > Anyone who has gotten caught though likely wasn't because of any flaws in Tor (or said exit nodes) but because of other lapses in OpSec.
        There have been some cases where some consider the "other lapses in OpSec" to be parallel construction to disguise a Tor vulnerability/breach, and others where the government has declined to prosecute because they'd have to reveal how they know.
        If Tor were compromised, we'd likely not know. It's highly likely that it's fine for "normal people" things.
        [-]
        ls612 2 hours ago ago
        At least back in the Snowden days it was very unreliable for the US to deanonymize Tor traffic based on those documents.
        [-]
        lenerdenator 2 hours ago ago
        That was over a decade ago. They've almost certainly progressed since.
        ... now my back hurts and I want the damn kids off my lawn.
        [-]
        ls612 2 hours ago ago
        I mean if anything it’s harder today in many ways for the government than it was during the Snowden days, because that prompted tech people to take internet security seriously. Look at the cost trends for 0days over the past ten years.
      - openasocket 3 hours ago ago
        Does controlling exit nodes necessarily help with deanonymizing? You would need control of the internal nodes for classic de-anonymization, or monitoring of both the exit nodes and the originating network for timing attacks. Also, exit nodes aren’t involved in hidden services. That 7-8000 figure you quoted: is that just exit nodes, or all nodes? My understanding was there aren’t a ton of exit nodes because anyone operating an exit node is liable to get harassed by people impacted by any malicious traffic originating from Tor. But that isn’t really an issue for internal nodes, and so there are more of them
        [-]
        thewebguyd 2 hours ago ago
        Controlling an exit node alone doesn't help, but controlling both entry and exit nodes does.
        The tor project has network stats on their website: https://metrics.torproject.org/networksize.html
        Looks like about 8,000 relays, inclusive of entry and exit nodes. Looks like about 2,500 exit nodes, and ~5,000 guard nodes. With that few I'd say it's reasonable to assume that a large number of both entry and exit are controlled by government agencies, at least enough to reliably conduct timing attacks against a specific target they are interested in.
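A rough back-of-the-envelope version of the entry-plus-exit argument is sketched below. It assumes relays are picked uniformly at random, which is a simplification: Tor actually weights selection by bandwidth and keeps the same guard for long periods. The adversary relay counts are hypothetical, and only the ~5,000 guard and ~2,500 exit totals come from the comment above.

```python
def both_ends_probability(adv_guards: int, total_guards: int,
                          adv_exits: int, total_exits: int) -> float:
    """Chance that one circuit uses an adversary guard AND an adversary exit.

    Assumes independent, uniform relay selection (a simplification).
    """
    if total_guards <= 0 or total_exits <= 0:
        raise ValueError("totals must be positive")
    if not (0 <= adv_guards <= total_guards and 0 <= adv_exits <= total_exits):
        raise ValueError("adversary counts must be between 0 and the totals")
    guard_fraction = adv_guards / total_guards
    exit_fraction = adv_exits / total_exits
    return guard_fraction * exit_fraction


if __name__ == "__main__":
    # Hypothetical adversary running 10% of guards and 10% of exits.
    p = both_ends_probability(500, 5000, 250, 2500)
    print(f"{p:.2%} of circuits would have both ends observed")
```

Under these assumptions, controlling a tenth of each role exposes about one circuit in a hundred, which shows why an attacker wants a large share of both guards and exits rather than exits alone.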
        gausswho 2 hours ago ago
        Am also interested in the current understanding of culpability in the US for operating an exit node.
        [-]
        thewebguyd 2 hours ago ago
        > Am also interested in the current understanding of culpability in the US for operating an exit node.
        It's a little ambiguous.
        Section 230 (which continues to be under attack) provides some legal immunity, and the DMCA provides a safe harbor against copyright infringement claims for the Tor relay operator. Running a middle relay is generally fine and safe.
        But, running an exit relay is risky. Even if you can't be held legally liable for the traffic coming from the exit, you could still get raided, and it has happened before where exit node operators have been raided after the traffic coming out of it was attributed to the node owner.
        That being said, it's legal to run an exit node (for now). The problem is more so dealing with the inevitable law enforcement subpoenas or seizures, and having the money and resources to prove you are innocent.
    - 8kingDreux8 3 hours ago ago
      I believe this is the thread you're talking about https://news.ycombinator.com/item?id=41584428
      [-]
      - 8organicbits 3 hours ago ago
        The article talks about a user who was using very old software, which seems like a pretty straightforward mistake. There's a bunch of speculation in the comments about other things, but I don't really see sources cited, so it's hard to tell what informs those opinions.
    - chews 3 hours ago ago
      It was always that way, Ross Ulbricht was connected to his dark website by tracing via exit nodes.
      Tor was always a government tool.
      [-]
      - thewebguyd 2 hours ago ago
        > Ross Ulbricht was connected to his dark website by tracing via exit nodes
        Ulbricht wasn't caught because of flaws in Tor, but he made other mistakes. He posted stuff on LinkedIn alluding to his activities, he used a real photo on his fake IDs to rent servers, he used his real name when posting a question on Stack Overflow about running a Tor service, he posted his personal Gmail address, looked for couriers on Google+, and lastly paid an undercover cop for a hit.
        As for getting his location, once the feds gained access to Silk Road, they matched up activity logs. His posting habits were consistent with being in the Pacific time zone, and he had posted on Silk Road as altoid and reused the same screen name, associated with his Gmail address and full name, on other websites.
        A series of stupid opsec mistakes got him caught, not Tor.
        [-]
        lenerdenator 2 hours ago ago
        All of this should serve as a reminder that if .gov really, really wants you, they've got you.
        Unless, of course, they want everybody, which even they don't have the resources to handle.
        [-]
        mburns 23 minutes ago ago
        It should (also) serve as a reminder that OpSec is important.
  - Ray20 3 hours ago ago
    The observable world around us.
    In a world where Tor is not a honeypot of some three letter agency, there are implementations of projects like Jim Bell's Assassination Politics. In a world where Tor is not a honeypot its use would be banned, much like the use of Tornado Cash was banned and shut down until the secret services took control of it.
    And we obviously don't live in such a world.
    [-]
    - 8organicbits 3 hours ago ago
      > its use would be banned
      There are many places in the world where direct access to Tor is blocked. There are many countries where use of a VPN is illegal, VPNs are required to log by law, etc. I disagree with this premise.
      [-]
      - trod1234 an hour ago ago
        Those countries seek destructive control of all within their sphere of influence.
        There are generally two types of countries, those that seek agency, independence, and freedom of rational thought and action; which requires privacy, and there are those that seek ultimate control, imposing dependence, coercion and corruption of reason; from the top down.
        The cultures that seek total control generally fall under totalism and are parasitic in nature. The ones that seek agency, freedom, and independence, Protean.
- yieldcrv 2 hours ago ago
  It's not a binary thing, Tor updates all the time
  Many comments talk about exit nodes for surveillance, but there is a totally different vector of use and considerations that don't apply when you aren't trying to access clearnet
  And even on darknet it depends on what you’re doing
  Reading the NY Times’ darknet site or a forum, or even just browsing a darknet marketplace, is fine from Tor Browser, whereas I would use a Tor OS like Tails or a dual-gated VM like Whonix for doing something illicit
fsckboy 29 minutes ago ago
>Tor: How a military project became a lifeline for privacy
Arpanet: How a military project gutted personal privacy, destabilized self esteem and strangled attention spans