I disagree with the other top-level comments at the moment: I believe Web Bot Auth is a useful and non-centralized emerging standard for self-identifying bots and agents.
This press release today is a better statement of _why_ this feature exists (as opposed to the submission link, which is nuts-and-bolts of implementing): https://blog.cloudflare.com/signed-agents/
Web Bot Auth is a way for bots to self-identify cryptographically. Unlike the user agent header (which is trivially spoofed) or known IPs (painful to manage), Web Bot Auth uses HTTP Message Signatures made with the bot's key, whose public half is published at a well-known location.
This is a good thing! We want bots to be able to self-identify in a way that can't be impersonated. This gives website operators the power to allow or deny well-behaved bots with precision. It doesn't change anything about bots that try to hide their identity; they were never going to self-identify anyway.
It's worth reading the proposal for the details: https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-... . Nothing about this is limited to Cloudflare.
I'm also working on support for Web Bot Auth for our Agent Identification project at Stytch https://www.isagent.dev . Well-behaved bots benefit from this self-identification because it enables a better Agent Experience: https://stytch.com/blog/introducing-is-agent/
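The signing mechanics described above can be sketched roughly as follows. This is a minimal illustration of the HTTP Message Signatures (RFC 9421) structure that Web Bot Auth builds on, not a real implementation: actual bots sign with an asymmetric key (e.g. Ed25519) whose public half is published at a well-known location, while HMAC with a demo secret stands in here so the sketch runs on the Python standard library alone. All key names, component values, and parameters below are made up for illustration.

```python
import base64
import hashlib
import hmac

def signature_base(components: dict, params: str) -> bytes:
    # Each covered component becomes one `"name": value` line, and the
    # serialized signature parameters are appended as the final line.
    lines = [f'"{name}": {value}' for name, value in components.items()]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines).encode()

def sign(components: dict, params: str, key: bytes) -> str:
    # HMAC-SHA256 stands in for the draft's asymmetric signature.
    mac = hmac.new(key, signature_base(components, params), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode()

key = b"demo-shared-secret"  # stand-in for the bot's real signing key
components = {"@authority": "example.com", "@method": "GET"}
params = '("@authority" "@method");created=1700000000;keyid="bot-key-1"'

sig = sign(components, params, key)
headers = {
    "Signature-Input": f"sig1={params}",
    "Signature": f"sig1=:{sig}:",
}

# A verifier that knows the key recomputes the same base and compares;
# a tampered component (e.g. a spoofed authority) fails verification.
assert sign(components, params, key) == sig
assert sign({**components, "@authority": "evil.example"}, params, key) != sig
```

The key point is that the signature covers named request components, so a forged request can't simply replay the header against a different method or host.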
Isn't this somewhat equivalent to requiring cookies? Obviously the technology is different, but it produces the same sort of result.
What's the end game here? All humans end up having to use a unique encryption key to prove their humanness also?
I agree in principle, but I disagree that it should be designed and mandated by a private gatekeeper
What's now at the top has links to IETF drafts in the first paragraph. What am I missing?
A way to authenticate identity for crawlers so I can allow-list ones I want to get in, exempt them from turnstile/captcha, etc -- is something I need.
I'm not following what makes this controversial. Cryptographic verification of identity for web requests, sounds right.
I think about failure modes. What happens if Cloudflare decides you are a bot and you're not? What recourse do you have? What are the formal mechanisms to ensure a person is not blocked from the majority of the web because Cloudflare is a middleman and you are a false positive?
Isn't that how most web standards got their start? One of the interested parties pushed something, then things evolved through the standards process?
(And then it can of course get derailed, but that's a separate story)
The problem with this is that key generation is free, so being a well-behaved unknown bot is the same as being an unidentified bot, which means that you go in the block/captcha/throttle bucket.
It is only useful for whitelisting bots, not for banning bad ones, as bad ones can rotate keys.
Whitelisting clients by identity is the death of the open web, and means that nobody will ever be able to compete with capital on an even footing.
As much as I understand this is needed, it rubs me the wrong way.
The standard looks fine as a distributed protocol until you have to register and pay rent to Cloudflare, which they say will eventually trickle down into publishers' pockets. But you know what a middleman this powerful means for the power dynamics of the market. Publishers have a really bad hand no matter what we do to save them; content as we know it will have to adapt.
Give it a couple more iterations and some MBA will come up with the brilliant idea of introducing an internet toll to humans and selling a content bundle with unlimited access to websites.
Cloudflare is only the first to market with a solution. If this proposal catches on every WAF vendor under the sun will have it implemented before the next sales cycle. Enforcement of this standard will be commoditized down to nothing.
There is just too much spam, and it's not clear that's a solvable problem without Cloudflare (or some other similar service). Maybe if they get big enough the incentives to spam will vanish and non-Cloudflare sites can exist in peace (at least until enough people leave Cloudflare that spam becomes profitable again).
Why use a "web bot" instead of an API? Either can be driven by an AI "agent", but this just seems like an "API key for a visual API", and rather wasteful in cost and resources. If a company could afford to pay a partner for an API key, they wouldn't need this. If they can't afford to pay the partner for access, they'd still be blocked with or without "Web Bot Auth". I don't understand what this is for.
I suspect I'm missing something, what am I missing?
The website the human sees is the new API.
That's needed because many APIs are either nonexistent or extremely marginal in design and content coverage.
If you already have an API that exposes all the information your partner (who is willing to pay for an API key) wants, then sure, that's perfect. But what if you don't have an API, or your API doesn't expose the information that crawlers are looking for? They want to crawl your website, they're willing to pay for the ability to crawl your website, but you don't want to build an API...
I'm sure the next step here will be a Cloudflare product that sits in front of your website and blocks all bot traffic except for the bots that are verified to have paid for access. (Or maybe that already exists?)
Part of it, at least, is people thinking they've solved some perceived problem and being told by their chatbot that it's a terrific, brilliant new innovation and they should build a whole new protocol spec for it.
Web Bot Auth solves authentication (“who is this bot?”) but not authorization/usage control. We still need a machine-readable policy layer so sites can express “what this bot may do, under which terms” (purpose limits, retention, attribution, optional pricing) at a well-known path, robots.txt-like, but enforceable via signatures.
A practical flow:
1. Bot self-identifies (Web Bot Auth)
2. Fetch policy
3. Accept terms or negotiate (HTTP 402 exists)
4. Present a signed receipt proving consent/payment
5. Origin/CDN verifies receipt and grants access
That keeps things decentralized: identity is transport; policy stays with the site; receipts provide auditability, no single gatekeeper required. There’s ongoing work in this direction (e.g., PEAC using /.well-known/peac.txt) that aims to pair Web Bot Auth with site-controlled terms and verifiable receipts.
Disclosure: I work on PEAC, but the pattern applies regardless of implementation.
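The five-step flow above could be sketched roughly like this. To be clear, everything here is hypothetical: the `Policy`-style dict, `issue_receipt`, and `verify_receipt` are made-up names, not a real PEAC or Web Bot Auth API. The sketch only shows how identity, a site-published policy, and a signed receipt could compose without a central gatekeeper, using stdlib HMAC as a stand-in for whatever signature scheme a real receipt would use.

```python
import hashlib
import hmac
import json

SITE_KEY = b"site-receipt-key"  # held by the origin/CDN, never the bot

# Step 2: a site-published policy, e.g. fetched from a well-known path.
policy = {"purpose": "search-indexing", "retention_days": 30, "price_usd": 0}

def issue_receipt(bot_id: str, policy: dict) -> dict:
    # Steps 3-4: after the bot accepts the terms (or pays via HTTP 402),
    # the site signs a receipt binding the bot's identity to the policy.
    body = json.dumps({"bot": bot_id, "policy": policy}, sort_keys=True)
    tag = hmac.new(SITE_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": tag}

def verify_receipt(receipt: dict) -> bool:
    # Step 5: the origin/CDN checks the receipt before granting access.
    expected = hmac.new(SITE_KEY, receipt["body"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["sig"])

receipt = issue_receipt("bot-key-1", policy)  # bot identity from step 1
assert verify_receipt(receipt)

# A bot that edits the terms it agreed to fails verification.
tampered = {**receipt, "body": receipt["body"].replace("30", "3650")}
assert not verify_receipt(tampered)
```

The design point this illustrates is that the policy and the receipt both live with the site, so auditability doesn't require any third party in the request path.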
I like the parsable signature in the HTTP message; however, I don't quite understand how the system differentiates between human users and an LLM agent controlling a browser.
Cloudflare's verified bots program is a terrible idea. They want to be the central chokepoint for agents, and they're doing it in shady ways, like auto-enrolling customers into blocking agents.
That's not shady, that's awesome customer value! Bot blocking as the default option is a great choice for all of us.
It's discriminatory against robots and helps make the web even more locked down. DRM never works; the analog hole is always the nuclear option.
In the end, only people with non-mainstream browsers (or using VPN to escape country-level blocks, or Tor, or noJS) suffer.
It's like how anti-piracy measures only affect paying customers, while pirates ironically get a better experience. The best way to get around endless CAPTCHAs is to just use LLMs instead.
Cloudflare is the last party that should be running this for two reasons.
1. They have already proven to be a bad-faith actor with their "DDoS protection."
2. This is pretty much the typical Cloudflare HN playbook. They release something targeted at the current wave and hide behind an ideological barrier; meanwhile, if you try to use them for anything serious, they require a call with sales, who jumps you with absurdly high pricing.
Do other cloud providers charge high fees for things they have no business charging for? Absolutely. But they typically tell you upfront and don't run ideological narratives.
This is not a company we should be putting much trust in, especially not with their continued plays to become the gatekeepers of the internet.
1) How so? Pretty much everything they do for DDoS protection is at their customers' choice. You might not like what people want for their site, but let's not pretend that most companies aren't very happy with it.
2) Then don't use them? Either they provide enough value to pay them or they don't.
Have you seen large cloud provider billing?????
There is a whole segment of tech designed around helping you understand and manage cloud costs, through consultations, automations, etc. It has spawned companies and career paths!
Yes, but they don't hide that behind ideological nonsense; they own up to it. They're a good-faith actor with a high price tag.
In my experience, cloud cost centres are intentionally confusing and annoying. I get emails telling me to check their dashboard for billing info, which I inevitably never do. It's designed that way.
Cloudflare is playing both sides: grok.com is served by Cloudflare.
Seems like Cloudflare wants to regulate the internet... they should not have that power.
Disagree. Not everybody wants their sites scraped and their content used to train a model that they'll never see a penny from. Cloudflare is the only party who wants to build a system where both the models and individual sites have their interests respected.
Are you sure that CF can stop AI bots?
I will tell you that we have had Super Bot Fight Mode on for a year, and since then we have not had to address abusive traffic nor deal with legitimate people being blocked. There is no way we could have achieved such balance ourselves. Prior to that, it was me blocking every Chinese AS under the sun as they shifted and bombarded us with traffic.
> nor deal with legitimate people blocked
How are you so sure of that? Their marketing?
Do you have a better alternative?
Have you looked into open-source alternatives? I'm assuming that it's a pressing problem for you, and you have already explored alternatives.
I have, and sadly they are basically worthless, often worse than worthless, as they negatively impact the site.
Interesting. Care to list them here so that we can all learn?
https://anubis.techaro.lol/ ?
Your browser is configured to disable cookies. Anubis requires cookies for the legitimate interest of making sure you are a valid client. Please enable cookies for this domain.
Thing is, my browser isn’t configured that way. So works well, I guess.
The target is better than Cloudflare, which also demands cookies but with more tracking. This is still better.
I have not disabled cookies. Cloudflare works fine. Users being able to access a website is a pretty important metric when considering which is ‘better’.
Then put up a goddamn login wall.
The internet was designed to work the way it does for good reasons.
You not understanding those reasons is not an excuse for allowing a giant tech company to step in and be the gatekeeper for a huge portion of the internet. Nor to monetize, enshittify, balkanize, and fragment the web with no effective recourse or oversight.
Cloudflare shouldn't be allowed to operate, in my view.
> You not understanding those reasons is not an excuse for allowing a giant tech company to step in and be the gatekeeper for a huge portion of the internet.
Are you somehow under the impression that Cloudflare is forcing their service on other companies? They’re not stepping in, the people who own those sites have decided paying them is a better deal than building their own alternatives.
>Then put up a goddamn login wall.
They did exactly that, they just outsourced it to cloudflare. The problem became bad enough that a lot of other people did the same thing.
If your argument is "companies shouldn't be allowed to outsource components to other companies, or cloudflare specifically", then sure, but good luck ever enforcing that.
URL should be blog post:
The age of agents: cryptographically recognizing agent traffic
https://blog.cloudflare.com/signed-agents/
(https://news.ycombinator.com/item?id=45052276)
No offense, but screw CloudFlare, screw their captchas for humans, and screw their wedging themselves between web operators and web users.
They can offer what they want for bots. But stop ruining the experience for humans first.
> screw their wedging themselves between web operators and web users
Web operators choose to use them; hell, they even pay Cloudflare to be between them. Seriously, I just think you don't understand how bad it is to run a site without someone in front of it.
I run a site that is a primary source of information. We also have customers who subscribe and are very sensitive to heavy-handed controls. Before Cloudflare, and after "AI", we had bots from all over just destroying our endpoints with bursts of mining traffic. While we would love to have more discoverability, this is not that. Cloudflare is in a tough spot trying to arbitrate good traffic vs. bad. From my experience, they are doing this as well as one can.
Couldn't agree more. Much like running my own DNS or email server, I don't think I'll ever go back to running my own website directly on the internet. It's just not worth the hassle. For stuff only I use, it sits behind my VPN. For anything that _must_ be public, it's going behind a WAF someone else can run.
They don't have to, but they're tricked into doing so via marketing.
I miss the 90s, too, but these days anyone who wants to deal with current levels of bot traffic is probably going to look at a service like Cloudflare as much cheaper than the amount of ops time they’d otherwise spend keeping things up and secure.