When I was learning to program through a bootcamp I spun up an elastic beanstalk instance that was free but required a credit card to prove your identity. No problem that makes sense - it's an easy way to prove authentication as a bot can't spam a credit card (or else it would be financial fraud and most likely a felony).
Amazon then charged me one hundred thousand dollars as the server was hit by bot spam. I had them refund the bill (as in how am I going to pay it?) but to this day I've hated Amazon with a passion and if I ever had to use cloud computing I'd use anyone else for that very reason. The entire service with it's horrifically complicated click through dashboard (but you can get a certification! It's so complicated they invented a fake degree for it!) just to confuse the customer into losing money.
I still blame them for missing an opportunity to be good corporate citizens and fight bot spam by using credit cards as auth. But if I go to the grocery store I can use a credit card to swipe, insert, chip or palm read (this is now in fact a thing) to buy a cookie. As opposed to using financial technology for anything useful.
This is an example of why cloud hosting is so scary.
Yes, Amazon, and I assume Azure and Google's cloud and others, "usually" refund the money.
But I don't want to be forced into bankruptcy because my five visitor a week demo project suddenly becomes the target of a DDOS for no reason at all and the hosting company decides this isn't a "usually" so please send the wire transfer.
They refund those that know how to demand it, and that notice. If you have complex infra and not a lot of observability, you'll just assume the costs are legitimate. Imagine how much they're making off of those oops moments. Probably a bug chunk of their revenue reports.
When I am playing around in the cloud I am super paranoid about charges, so I end up locking the ACLs to only permit traffic to my home IP. It’s too bad that they don’t have a better built in way of making sandbox labs. When I was doing cloud training with A Cloud Guru, it would generate a whole global AWS instance that would only last for 30 minutes.
In general that would be a good question, but you've asked it in a case where "use AWS" is the _only_ way to accomplish the goal... which is learning AWS.
There's no need to imply that, it's not illegal to criticise AWS. They do not want anybody to be able to set a limit on spend as that would probably hurt the business model.
It's extra frustrating I think on the Azure side because they absolutely have cost limited accounts for MSDN subscribers but won't extend that functionality to general users. Just let me set a cap on the cost I'm willing to pay per month and let me deal with the consequences of the resource being shut down unexpectedly. You can work around these things if you instrument the right metrics and create the right alerts so you can take action in time. But those are often hard learned lessons and not the happy path to using the cloud.
It's entirely possible to build cloud first solutions that scale better and are cheaper than your standard reliable colo solutions. But you've got to understand the tradeoffs and when to limit scaling otherwise things can run away from you. I still reach for "cloud first" tools when building my own projects because I know how to run them extremely cheaply without the risk of expenses blowing up because some random thing I've built lands on HN or the equivalent. Many hobby projects or even small businesses can leverage free tiers of cloud services almost indefinitely. But you've got to architect your solutions differently to leverage the advantages and avoid the weaknesses of the cloud. Actually understand the strengths and limitations of the various cloud "functions as a service" offerings and understand where your needs could be solved from those tools and how to work within those cost constraints. Repeatedly I see people trying to use the cloud as if it's just another colo or datacenter and build things in the same way they did before and only think about things in terms of virtual machines tend to have a more difficult time adopting the cloud and they end up spending far more than the companies who can tear down and spin up entire environments through IaC and leverage incremental pricing to your benefit.
These aren’t limits though, they are just budget notifications.
What would be helpful, would be if when you set up your account there was a default limit – as in an actual limit, where all projects stop working once you go over it - of some sane amount like $5 or $50 or even $500.
I have a handful of toy projects on AWS and Google cloud. On both I have budgets set up at $1 and $10, with notifications at 10% 50% and 90%. It’s great, but it’s not a limit. I can still get screwed if somehow, my projects become targets, and I don’t see the emails immediately or aren’t able to act on them immediately.
It blows my mind there’s no way I can just say, “there’s no conceivable outcome where I would want to spend more than $10 or more than $100 or whatever so please just cut me off as soon as I get anywhere close to that.”
The only conclusion I can come to is that these services are simply not made for small experimental projects, yet I also don’t know any other way to learn the services except by setting up toy projects, and thus exposing yourself to ruinous liability.
I’ve accidentally hit myself with a bigger than expected AWS bill (just $500 but as a student I didn’t really want to spend that much). So I get being annoyed with the pricing model.
But, I don’t think the idea of just stopping charging works. For example, I had some of their machine image thingies (AMI) on my account. They charged me less than a dollar a month, totally reasonable. The only reasonable interpretation of “emergency stop on all charges completely” would be to delete those images (as well as shutting down my $500 nodes). This would have been really annoying, I mean putting the images together took a couple hours.
And that’s just for me. With accounts that have multiple users—do you really delete all the disk images on a business’s account, because one of their employees used compute to hit their spend limit? No, I think cloud billing is just inherently complicated.
> The only reasonable interpretation of “emergency stop on all charges completely” would be to delete those images
I disagree; a reasonable but customer-friendly interpretation would be to move these into a read-only "recycle bin" storage for e.g. a month, and only afterwards delete them if you don't provide additional budget.
There is no reason that cloud providers shouldn't be able to set up the same kind of billing options that advertisers have had access to for years. In Google and Meta ads I can set up multiple campaigns and give each campaign a budget. When that budget gets hit, those ads stop showing. Why would it be unreasonable to expect the same from AWS?
Cloud providers charge for holding data, for ingress/egress, and for compute (among other things). If I hit my budget by using too much compute, then keeping my data will cause the budget to be exceeded.
The difference is that cloud providers charge you for the “at rest” configuration, doing nothing isn’t free.
Great so they can give you an option to kill all charges except basic storage. Or let you reserve part of your budget for storage. Or let you choose to have everything hard deleted.
Surely these billion and trillion dollar companies can figure out something so basic.
> But, I don’t think the idea of just stopping charging works.
You don't stop CHARGING. You stop providing the service that is accumulating charges in excess of what limit I set. And you give some short period of time to settle the bill, modify the service, etc. You can keep charging me, but provide a way to stop the unlimited accrual of charges beyond limits I want to set.
> No, I think cloud billing is just inherently complicated.
You're making it more complicated than it needs to be.
> The only reasonable interpretation of “emergency stop on all charges completely” would be to delete those images.
It's by far certainly not the 'only reasonable interpretation'.
"Stop all charges" is a red herring. No one is asking for a stop on charges. They want an option to stop/limit/cap the stuff that causes the charges.
So, are you looking for some “rate of charges” cap? Like, allow the charges to accumulate indefinitely, but keep track of how much $/sec is being accumulated, and don’t start up new services if it would cause the rate of charges to pass that threshold?
Might work. I do think that part of the appeal of these types of services is that you might briefly want to have a very high $/sec. But the idea makes sense, at least.
A theme of many of the horror stories is something like "I set up something personal, costing a few dollars a month, and I was DDOSed or (in earlier terms) slashdotted out of the blue, and I now have a bill for $17k accumulated over 4 hours".
As someone else pointed out, some(?) services prevent unlimited autoscaling, but even without unlimited, you may still hit a much larger limit.
Being able to say 'if my bill goes above $400, shut off all compute resources' or something like that. Account is still on, and you have X days (3? 1? 14?) to re-enable services, pay the bill, or proceed as you wish.
Yes, you might still want some period of high $/sec, but nearly every horror story in this vein ends with an issue with the final bill. Whether I burn $300 in 5 minutes or 26 days, I want some assurance that the services that are contributing most to that - likely/often EC2 or lambda in the AWS world - will be paused to stop the bleeding.
If you could pipe "billing notification" SNS message to something that could simply shut off public network access to certain resources, perhaps that would suffice. I imagine there's enough internal plumbing there to facilitate that, but even then, that's just AWS - how other cloud providers might handle that would be different. Having it be a core feature would be useful.
I was on a team that had our github CI pipeline routinely shutdown multiple times over a few weeks because some rogue processes were eating up a lot of minutes. We may have typically used $50/$100 per month - suddenly it was $100 in a day. Then... $200. Github just stopped the ability to run, because the credits used were over the limits. They probably could run their business where they would have just moved to charging us hundreds per day, perhaps with an email to an admin, and then set the invoice at $4500 for the month. But they shut down functionality a bit after the credits were exhausted.
The disconnect comes from the difference between 'shut it off' and 'clear the account'. If I read an earlier poster correctly, the claim is "the only reasonable interpretation is to immediately delete the contents of the entire account". But to you point, yes, this seems like it would be pretty easy to grasp. Stop incoming access, don't delete the entire account 5 seconds after I go 3 cents over a threshold.
I missed a water bill payment years ago. They shut off the water. They didn't also come in and rip out all my plumbing and take every drop of water from the house.
Yeah I get it. It just irks that it's something I'd like to spend more time with and learn, but at every corner I feel like I'm exposing myself. For what I have done w/AWS & GCP so far with personal accounts, complete deletion of all resources & images would be annoying to be sure, but still preferable to unlimited liability. Ofc most companies using it won't be in that boat so IDK.
> But, I don’t think the idea of just stopping charging works.
I'm sorry but this is complete bullshit. they can set a default limit of 1 trillion dollars and give us the option to drop it to $5. there's a good reason they won't do it, but it's not this bullshit claim that's always bandied about.
There isn’t an option to not resolve “you’ve reached your billing limit and now storage charges are exceeding it.” You can resolve it by unceremoniously dumping the user data. You can resolve it by… continuing to charge the user, and holding their files hostage until they pay the back storage charges, and then the egress fees (so, it isn’t really a limit at all). Or you can resolve it by just giving the user free storage by some other name.
Just saying that there should be a limit is not an explanation.
I hate how every time this issue mentioned everyone's response is that it would hurt the companies. Literally just make it an option. It's not that difficult for some of the smartest engineers in the world to implement it.
I feel that the likely answer here is that instrumenting real-time spending limit monitoring and cut-off at GCP/AWS scale is Complicated/Expensive to do, so they choose to not do it.
I suppose you could bake the limits into each service at deploy time, but that's still a lot of code to write to provide a good experience to a customer who is trying to not pay you money.
Not saying this is a good thing, but this feels about right to me.
Its not expensive for them, its expensive for their customers. If you went over your spending limit and they deleted all your shit, people would be absolutely apoplectic. Instead they make you file a relatively painless ticket and explain why you accidentally went over what you wanted to spend. This is an engineering trade-off they made to make things less painful for their customers.
There is a huge difference between deleting data and stopping running services.
You're right in that there's a few services that expose this complexity directly, the ones where you're paying for actual storage, but this is just complex, not impossible.
For one thing, storage costs are almost always static for the period, they don't scale to infinite in the same way.
If it’s a web server, sure. But if you drop data because you’re no longer processing it, or you need to do an expensive backfill on an ETL, then turning off compute is effectively the same as deleting data
Why would I apoplectic at Amazon if I set “turn my shit off after it has accrued $10 in charges” to TRUE and they actually followed what I asked them to do?
Is it a serious question? Because then I could have you shutdown just by posting a call to ddos with a link to your search form on an anime image board.
OK? Good! That's what I want to happen! I want that. I do not care if some weirdos on an anime image board can't access some image. I don't want my credit card maxed out.
Is that not a serious request? I play around in the same big-boy cloud as some SaaS company, but I'm on the free tier and I explicitly do not want it to scale up forever, and I explicitly do not want to destroy my credit or even think about having to call Amazon over a $100,000 bill because I set my shit up wrong or whatever. I want it to shut off my EC2 instance once it has used up whatever amount of resources is equal to $X.
Obviously any world with this feature would also feature customizable restrictions, options, decision trees, etc etc. I don't think anyone is or was suggesting that someone's SaaS app just gets turned off without their permission.
They could add it as an optional limit. If it's on and is exceeded, stop everything. Surely the geniuses at Amazon (no they really are, I'm not joking) can handle it.
What about the space you're using? Do they delete it? Remove all your configurations? Prevent you from doing anything with your account until you up your limit or wait until your month resets?
If you're worried about getting a big bill, and you don't care if it gets shut off when you're not using it, why don't you shut it down yourself?
AWS made the tradeoff to keep the lights on for customers and if there is a huge bill run up unintentionally and you contact them with it they refund it. I've never experienced them not doing this when I've run up five figure bills because of a misconfiguration I didn't understand. I don't think I've ever even heard of them not refunding someone who asked them for a refund in good faith.
How many times has AWS refunded you a five figure bill? I've heard stories from people who got refunded but were told that it would be the first and last time they would get a refund.
I agree that that’s the likely explanation. It just feels infuriating that the services are sold as easy to get started and risk free with generous free tiers, inviting people and companies to try out small projects, yet each small experiment contains an element of unlimited risk with no mitigation tools.
Pass a law requiring cloud compute providers to accept a maximum user budget and be unable to charge more than that, and see how quickly the big cloud providers figure it out.
There is no such thing as “signing up for a free tier” at least there wasn’t before July of this year. Some services have free tiers for a certain amount of time and others have an unlimited free tier that resets every month.
You can attach an action to that budget overage that applies a "Deny" to an IAM and limits costly actions (that's for small accounts not in an Org. Accounts with an Org attached also have the option of applying an SCP which can be more restrictive than an IAM "Deny")
This isn't a great answer to the overall issue (which I agree is a ridiculous dark pattern), but I've used Privacy.com cards for personal projects to hard spend at a card level so it just declines if it passes some threshold on a daily/weekly/monthly/lifetime basis. At work, I do the same thing with corporate cards to ensure the same controls are in place.
Now, as to why they're applying the dark pattern - cynically, I wonder if that's the dark side of usage/volume based pricing. Once revenue gets big enough, any hit to usage (even if it's usage that would be terminated if the user could figure out how) ends up being a metric that is optimized against at a corporate level.
> The only conclusion I can come to is that these services are simply not made for small experimental projects, yet I also don’t know any other way to learn the services except by setting up toy projects
Yeah, I'm sure this is it. There is no way that feature is worth the investment when it only helps them sell to... broke individuals? (no offense. Most individuals are broke compared to AWS's target customer).
> There can be a delay between when you incur a charge and when you receive a notification from AWS Budgets for the charge. This is due to a delay between when an AWS resource is used and when that resource usage is billed. You might incur additional costs or usage that exceed your budget notification threshold before AWS Budgets can notify you, and your actual costs or usage may continue to increase or decrease after you receive the notification.
As far as I know, neither Google, Amazon or Azure have a budget limit, only alerts.
This is a reason why I am not only clueless of anything related to cloud infrastructure unless it's stuff I am doing on the job, nor I am willing to build anything on these stacks.
And while I guess I have less than 10 products build with these techs, I am appeal by the overall reliability of the services.
Oh lastly, for Azure, in different European regions you can't instance resources, you need to go through your account representative who asks authorization from the US. So much for now having to deal with infrastructure pain.
It's just a joke.
I've used Azure with spending limits. They do work, they shut down things, and the lights go off. [1], Only some external resources you are unlikely to use don't follow spending limits, but when you create such resources, they are clearly marked as external.
These limits are only for subscriptions with a credit amount e.g. $200 trials, Visual Studio subscriptions etc.
As soon as you are on a pay as you go, you only have access to budget limit.
As others have said these are not limits, just notifications. You can’t actually create a limit unless you self create one using another AWS service (surprise) like lambda to read in the reports and shut things down.
And as others have also mentioned, the reports have a delay. In many cases it’s several hours. But worst case, your CURs (Cost usage reports) don’t really reflect reality for up to 24 hours after the fact.
I work in this space regularly. There can be a delay of 2-3 days from the event to charge. Seems some services report faster than others. But this means by the time you get a billing alert it has been ongoing for hours if not days.
To all of those who say "this is not limit, only notifications": yes, notifications that can trigger whatever you want, including a shutdown of whatever you have
Is this a perfect solution: no
Is this still a solution: yes
If you sign up for electrical service for your house, and your shithead neighbor taps your line to power his array of grow lamps and crypto mining rigs, the power company will happily charge you thousands of dollars, and you will need a police report and traverse many layers of customer service hell to get a refund. If you sign up for water service and a tree root cracks your pipe, the water company will happily charge you thousands of dollars for the leaked water, and will then proceed to mandate that you to fix the broken pipe at your own expense for a couple tens of thousands more; and yes, that may well bankrupt you, water company don't care. So why do you expect different treatment from a computing utility provider?
> If you sign up for electrical service for your house, and your shithead neighbor taps your line to power his array of grow lamps and crypto mining rigs, the power company will happily charge you thousands of dollars
Unlike cloud services, your electrical service has a literal circuit breaker. Got a regular three-phase 230V 25A hookup? You are limited to 17.25kW, no way around that. If that shithead neighbor tries to draw 50kW, the breaker will trip.
If it were the cloud, the power company would conveniently come by to upgrade your service instead. A residential home needing a dedicated 175MW high-voltage substation hookup? Sure, why not!
Water leaks, on the other hand, tend to be very noticeable. If a pipe bursts in the attic you'll end up with water literally dripping from the ceiling. It is very rare to end up with a water leak large enough to be expensive, yet small enough to go unnoticed. On the other hand, the cloud will happily let your usage skyrocket - without even bothering to send you an email.
There are plenty of compute service providers working with a fixed cap, a pre-pay system, or usage alerts. The fact that the big cloud providers don't is a deliberate choice: the goal is to make the user pay more than they wanted to.
In addition to everything that's already been mentioned, another obvious difference is that energy and water are finite resources that are already provided at relatively low margins. Cloud services are provided at obscene gross margins. The numbers are all made-up and don't reflect the actual costs in providing those services.
I don't know in US, but having limits on how much electricity a house is able to take from the gride is absolutely something in some countries out there.
At least in my country the metering is done _in_ the house so my neighbour has to break and enter to tap the line behind the meter. I would probably notice well before bills would pile up. If he taps it outside, probably no one would ever notice if done right. The grid looses energy all the time. Not every kWh that goes into the network is billed in the end.
As always, it just doesn’t make an awful lot of sense to compare physical and virtual worlds. As in leaving your front door unlocked in rural areas vs not securing your remote shell access.
For your scenarios, I have the police, the public service commission, utility regulators, my elected officials and homeowners insurance to potentially help. Not that it always works, not that it's easy, quick or without pain, but there are options.
For the cloud, I have the good will of the cloud provider and appealing to social media. Not the same thing.
The first instance is difficult to fix as crime can often involve substantial losses to people and often there's no route to getting a refund.
The broken water pipe should be covered by buildings insurance, but I can imagine it not being covered by some policies. Luckily a broken water pipe is likely not as expensive as not having e.g. third party liability protection if part of your roof falls off and hits someone.
I think one of the reasons I appreciate AWS so much is that any time there has been snafu that led to a huge bill like this they've made it pretty painless to get a refund- just like you experienced.
If it is a "free tier", Amazon should halt the application when it exceeds quota. Moving the account to a paid tier and charging $100k is not the right thing to do.
Yes. They said it was free then they surprise charge you $100k.
That’s an insane amount of both money and stress. You’re at Amazon’s mercy if they will or will not refund it. And while this is in process you’re wondering if your entire financial future is ruined.
I have never in 8 years of being in the AWS ecosystem and reading forums and Reddits on the internet had anyone report that AWS wouldn’t refund their money.
If you go over your budget with AWS, what should AWS do automatically? Delete your objects from S3? Terminate your databases and EC2 instances? Besides, billing data collection doesn’t happen anywhere near realtime, consider it a fire hose of streaming data that is captured asynchronously.
Provide the user the tools to make these choices. Give the option to explicitly choose how durable to extreme traffic you want to be. Have the free tier default to "not very durable"
Bam, you said. They’d do it if they cared, but they don’t and prefer the status quo. 100k surprise bill is the type of thing people kill themselves over. Horrific
You mean like having a billing alert send an event that allows you to trigger custom actions to turn things off? That already exists. It has for years.
I agree, but I could also see how someone would complain about that: “Our e-commerce site was taken down by Amazon right on our biggest day of the year. They should have just moved us up to the next tier.”
Seems like the most flexible option is to put a spending limit in place by default and make it obvious that it can affect availability of the service if the limit is reached.
My credit cards have credit limits, so it makes sense that a variable cost service should easily be able to support a spending limit too.
You're misunderstanding the offering. (Maybe that's their fault for using intentionally misleading language... but using that language in this way is pretty common nowadays, so this is important to understand.)
For a postpaid service with usage-based billing, there are no separate "free" and "paid" plans (= what you're clearly thinking of when you're saying "tiers" here.)
The "free tier" of these services, is a set of per-usage-SKU monthly usage credit bonuses, that are set up in such a way that if you are using reasonable "just testing" amounts of resources, your bill for the month will be credited down to $0.
And yes, this does mean that even when you're paying for some AWS services, you're still benefitting from the "free tier" for any service whose usage isn't exceeding those free-tier limits. That's why it's a [per-SKU usage] tier, rather than a "plan."
If you're familiar with electricity providers telling you that you're about to hit a "step-up rate" for your electricity usage for the month — that's exactly the same type of usage tier system. Except theirs goes [cheap usage] -> [expensive usage], whereas IaaS providers' tiers go [free usage] -> [costed usage].
> Amazon should halt the application when it exceeds quota.
There is no easy way to do this in a distributed system (which is why IaaS services don't even try; and why their billing dashboards are always these weird detached things that surface billing only in monthly statements and coarse-grained charts, with no visibility into the raw usage numbers.)
There's a lot of inherent complexity of converting "usage" into "billable usage." It involves not just muxing usage credit-spend together, but also classifying spend from each system into a SKU [where the appropriate bucket for the same usage can change over time]; and then a lot of lookups into various control-plane systems to figure out whether any bounded or continuous discounts and credits should be applied to each SKU.
And that means that this conversion process can't happen in the services themselves. It needs to be a separate process pushed out to some specific billing system.
Usually, this means that the services that generate billable usage are just asynchronously pushing out "usage-credit spend events" into something like a log or message queue; and then a billing system is, asynchronously, sucking these up and crunching through them to emit/checkpoint "SKU billing events" against an invoice object tied to a billing account.
Due to all of the extra steps involved in this pipeline, the cumulative usage that an IaaS knows about for a given billing account (i.e. can fire a webhook when one of those billing events hits an MQ topic) might be something like 5 minutes out-of-date of the actual incoming usage-credit-spend.
Which means that, by the time any "trigger" to shut down your application because it exceeded a "quota" went through, your application would have already spent 5 minutes more of credits.
And again, for a large, heavily-loaded application — the kind these services are designed around — that extra five minutes of usage could correspond to millions of dollars of extra spend.
Which is, obviously, unacceptable from a customer perspective. No customer would accept a "quota system" that says you're in a free plan, yet charges you, because you accrued an extra 5 minutes of usage beyond the free plan's limits before the quota could "kick in."
But nor would the IaaS itself just be willing to eat that bill for the actual underlying costs of serving that extra 5 minutes of traffic, because that traffic could very well have an underlying cost of "millions of dollars."
So instead they just say "no, we won't implement a data-plane billable-usage-quota feature; if you want it, you can either implement it yourself [since your L7 app can observe its usage 'live' much better than our infra can] or, more idiomatically to our infra, you can ensure that any development project is configured with appropriate sandboxing + other protections to never get into a situation where any resource could exceed its the free-tier-credited usage in the first place."
Really? You’re not “disputing it”. You were charged fair and square. You send an email to their customer support and they say “no problem” and help you prevent it in the future.
And what if they don't say "no problem"? Like the Netlify case where they at first offered a reduced bill (which was still a lot) before the post got viral and the CEO stepped in.
Amazon is currently permissive which splits opposition, this won’t always be the case, they will tighten the screws eventually as they have done in the past in other areas. Amazon because it’s so broadly used undermines the utility of chargebacks, you can do it but it’ll be a real hassle to not be able to use Amazon for shopping. A lot of people will just eat the costs, is Amazon knows this they will force the situation more often because it’ll make them more money.
Putting stuff on the internet is dangerous, but the absence of hard caps is a choice and it just looks like another massive tech company optimizing for their own benefit. Another example of this is smartphone games for children, it's easier for a child to spend $2,000 then it is for a parent to enforce a $20/month spending limit.
Yes, you as a developer should know something about how the service works before you use it. I first opened the AWS console in 2016 and by then I had read about the possible gotchas.
You do know that large corporations and startups employ junior devs as well, right?
All else being equal, would you rather choose the platform where a junior dev can accidentally incur a $1M bill (which would already bankrupt early startups), or the platform where that same junior dev get a "usage limits exceeded - click here to upgrade" email?
As the saying goes, when you owe the bank $100 you've got a problem, when you owe the bank $100k the bank has a problem...
On serverless, I can enter numbers in a calculator and guess that running my little toy demo app on AWS will cost between $1 and $100. Getting hit with a huge $1000 bill and a refusal to refund the charges (and revocation of my Prime account and a lifetime ban from AWS and cancellation of any other services I might otherwise run there) would be totally possible, but I have zero control over that. Expecting to go on social media begging for a refund is not a plan, it's evidence of a broken system - kinda like those "heartwarming" posts about poor people starting a GoFundMe so their child can afford cancer treatment. No, that's awful, can we just be sensible instead?
If a server would have cost me $20 at a VPS provider to keep a machine online 24/7 that was at 1% utilization most of the time and was terribly laggy or crashed when it went viral, that's what $20 buys you.
But, you say, analysis of acttual traffic says that serverless would only cost me $10 including scaling for the spike, in which case that's a fantastic deal. Half price! Or maybe it would be $100, 5x the price. I have no way of knowing in advance.
> (and revocation of my Prime account and a lifetime ban from AWS and cancellation of any other services I might otherwise run there)
Also a vital lesson from the big tech companies that sell a wide variety of services: don't get your cloud hosting from a company that you also use other services from.
I had to disable photo syncing because Google photos eats up my Gmail space. Having Amazon's cloud billing fuckup threaten your TV access is another level.
We clearly need to keep the option open to burn those bridges.
In any case, if I ever host anything, I'm going to host it from my home.
You haven’t been able to use your Amazon retail account to open an AWS account for years. You don’t “beg”. You just send them an email and they say “yes”.
I have never in 9 years working with AWS - four at product companies as the architect, 3.5 at AWS itself working in the Professional Services department and the last two working at 3rd party companies - ever heard or read about anyone either on a personal project or a large organization not be able to get a refund or in the case of a large org, sometimes a credit from AWS when they made a mistake that was costly to them.
From your bragging one could tell that you have seen _a lot_ of charging mistakes and "happy" refund stories from AWS. It's scary that a single human can do extensive statistics on personal experience about these monetary horror stories, don't you think?
I assume you have seen many casual instances of cost overrun in that time. I'm sure you've also seen instances where an extra $10k flies out the door to AWS and people think "no big deal, that one was on us." This world doesn't have to exist. Even if AWS has a policy of always refunding people for a big oopsie, the fact that you have seen so many big ones suggests that you have also seen a lot of little ones.
By the way, there is nothing stopping AWS from reversing their trend of issuing refunds for big mistakes. "It hasn't happened in the past" isn't a great way to argue "it won't happen in the future."
Sure. The issues with AWS could all be solved with decent billing software, though. 15 years in there isn't a good excuse for this state of the world except that it's profitable.
You can set up billing alerts to trigger actions that stop things when they trigger. The easiest way is to take permissions away from the roles you create.
They give you the tools. It’s up to you to use them. If that’s too difficult, use the AWS LightSail services where you are charged a fixed price and you don’t have to worry about overages or the new free tier
Because despite what everyone here is saying, before July of this year, there was no such thing as a free tier of AWS, there was a free tier of some of their services
Both of these are about an account compromise, which is a really fascinating story about incentives. An accidental overrun on something you designed on AWS indicates you are hooked on their drugs, so obviously the dealer is happy to give you another free hit after you had a bad trip. That's good marketing. An account compromise has no intention, so giving you a refund is just a waste.
Must be really nice people there which don't want any money. Really warms my heart.
ofc. When things go viral they say "yes". But i would really love to get some number how many students and hobbiests got a 1k-2k bill and just paid it for the problem to go away.
Amazon is a publicly traded company. If they wave fees every time something goes wrong, investors would tell them something.
AWS and all of the other cloud providers gives millions in credits each year for migrations and professional services for both their inside professional services department and third party vendors. The reputational risk for them to go after some poor student isn’t worth it to them. The same is true for Azure and GCP.
Have you read one even anecdotal case where AWS didn’t immediately give a refund to a student who made a mistake just by them asking?
You are a bit naive. They are making a ton of money with this dark pattern. As others have said Free-to-100K is not in the most generous realm of expectations. Its also why they have been doing the refunds as long as AWS has been a thing. They know it will not hold up in court. Not a month goes by without some HN story about something like this post.
They do this and make it easy to get a refund because for every demo account that does it some bigger account accidentally gets billed 10K and they have to pay for it. They have skin in the game and cannot risk their account being down for any time period.
Counter real world example. I was doing some consult work for a place that had a 9k unexpected charge on AWS. They asked about it and I explained about how they could dispute it. They said ugh never-mind and just paid it. FYI it was a charity which I've since learned its common for charities to be wasteful like this with money since they are understaffed and OPM(Other peoples money)
So how is that a counter example? The client never asked for a credit. Since the startup I worked for, I have been working in AWS consulting - first directly at AWS (Professional Services) and now a third party consulting company.
While I have no love for AWS as an employer, I can tell you that everyone up and down the chain makes damn sure that customers don’t waste money on AWS. We were always incentivized to find the lowest cost option that met their needs. AWS definitely doesn’t want customers to have a bad experience or to get the reputation that it’s hard to get a refund when you make a mistake.
Of course AWS wanted more workloads on their system.
Semi related, AWS will gladly throw credits at customers to pay for the cost of migrations and for both inside and outside professional services firms to do proof of concepts.
You, but shorter: It can't be done perfectly in 100.0% of all possible circumstances, so better to do absolutely nothing at all. On an unrelated note, this strongly aligns with their economic interests.
For storage specifically, in that circumstance, if you weren't hellbent on claiming otherwise: it's easy to figure out what to do. For storage: block writes and reach out to the customer. Also, people are extremely unlikely to accidentally upload eg 250tb which is how you'd get to, say, $200/day. Whereas similar bills are extremely easy to accidentally create with other services.
It's totally reasonable to want spend limits firmer than AWS' discretion, which they can revoke at any point in time for any reason.
What would the would “tier” mean here? There is a US tax bracket (tier) where no tax is due on long-term capital gains. That doesn’t mean it’s wrong when I pay long-term capital gains.
There's an expectation when it comes to consumer goods, and even protection in most jurisdictions, that you can't simply charge someone for something they don't want. It's like dropping a Mercedes at someone's house then charging them for it when they never wanted or asked for it. Allowing a "free" tier to run up so much traffic that it becomes a $100k bill is ridiculous and probably illegal.
Taxes are different because they never exceed the amount the person paying the taxes receives.
Once I've been kidnapped by a guy who also happen to run a security business. After a bit of discussion, I was about to convince some of his sbire to release me without paying the ransom. I'm so glad they did accept that, and I never fail to use and recommend the services of the security business now.
Since this seems to be getting some comments. Yes, it is in fact easy to shut down an instance if it goes over a spending limit. As in you monitor traffic tied directly to the billing system and you set up an if statement and if it goes over the limit you shut down the server and dump the service to a static drive.
It's the easiest thing in the world - they just don't want to because they figured that they could use their scale to screw over their customers. And now you have the same guys who screwed everyone over with cloud compute wanting you to pay for AI by using their monopoly position to charge you economic rents. Because now things like edge compute is easy because everyone overspent on hard drives because of crypto. And so you have jerks who just move on to the next thing to use their power to abuse the market rather than build credibility because the market incentivizes building bubbles and bad behavior.
Smart evil people who tell others "no you're just too dumb to 'get it' (oh by the way give me more money before this market collapses)" are the absolute bane of the industry.
It's weird that you have people in here defending the practice as if it's a difficult thing to do. Taxi cabs somehow manage not to charge you thousands of dollars for places you don't drive to but you can't set up an if statement on a server? So you're saying Amazon is run by people that are dumber than a taxi cab company?
Ok, well you might have a point. And this is how Waymo was started. I may or may not be kidding.
I've got a $25k bill right now because I had enabled data-plane audit logging on an sqs queue that about a year ago I had wired to receive a real-time feed of audit events. So for every net-new audit event there would be an infinite loop of write events to follow. My average daily bill is about $2 on that account and has been for nearly ten years. It suddenly ballooned to $3k/day and zero warning or intervention from AWS.
> I had them refund the bill (as in how am I going to pay it?) but to this day I've hated Amazon with a passion
They refunded you $100k with few questions asked, and you hate them for it?
I’ve made a few expensive mistakes on AWS that were entirely my fault, and AWS has always refunded me for them.
I imagine if Amazon did implement “shut every down when I exceed my budget” there’d be a bunch of horror stories like “I got DDOSed and AWS shutdown all my EC2s and destroyed the data I accidentally wrote to ephemeral storage.”
> They refunded you $100k with few questions asked, and you hate them for it?
They exposed him to 100K of liability without any way to avoid it (other than to avoid AWS entirely), and then happened to blink, in this case, with no guarantee that it would happen again. If you don't happen to have a few hundred thousand liquid, suddenly getting a bill for 100K might well be a life-ruiningly stressful event.
Yes, he could have set up a billing alert that triggered an action to shut everything down. Easy way is to take away privileges from the IAM roles attached to the processes.
Bad design if that isn't in place for a new free-tier experiment.
This is the problem right here. I moved from AWS and specifically Beanstalk because I don't want to be some "certified AWS goblin". I just wanted to host something sensibly.
Other hosting companies don't have this problem and while I cannot complain about AWS as a service, this can be improved if there would be the will to do so. I believe there are other incentives at work here and that isn't a service to the customer.
Given how complicated configuring AWS is, surely there could be some middle ground between stop all running services and delete every byte of data. The former is surely what the typical low spend account would desire.
In what world is that not the preferable solution? Want to know if your shit is actually robust just set your cap and ddos yourself as the first test of you architecture.
I’ve never trusted AWS with personal work for exactly this reason. If I want to spend $20 on a personal project I should be able to put a cap on that directly, not wake up to a $100k bill and go through the stress of hoping it might be forgiven.
> When I was learning to program through a bootcamp I spun up an elastic beanstalk instance that was free but required a credit card to prove your identity. No problem that makes sense
It does on the surface, but what doesn't make sense is to register with a credit card and not read the terms very carefully: both for the cloud service and for the bank service.
In this aspect cash is so much better because you have only one contract to worry about...
I use AWS out of expedience but I hate the no-hard-cap experience and this is my primary reason for shifting (WIP) to self hosting. Plus self hosting is cheaper for me anyway. In general I would like a legally forced liability limit on unbounded subscription services, perhaps a list maintained at the credit card level. If the supplier doesn’t like the limit they can stop supplying. The surprise $100K liabilities are pure insanity.
I have no idea. I know Azure does for the student/msdn/and similar accounts which are the only cloud services I use for personal projects. So I Azure doesn’t even have my credit card.
Cloud Run lets you cap the number of instances when you create a service. So you can just set max_instances to 1 and you never have to worry about a spambot or hug of death from blowing up your budget. I run all my personal sites like this and pay (generally) nothing.
> When I was learning to program through a bootcamp I spun up an elastic beanstalk instance
Didn't the bootcamp told you to, at least, setup a budget alert?
I'm not trying to reduce AWS' responsibility here, but if a teaching program tells you to use AWS but doesn't teach you how to use it correctly, you should question both AWS and the program's methods.
It’s interesting because on the posted site there’s only 2 AWS posts on the main page and they’re rather mild compared to the other posts using google, vercel, cloudformation, etc.
> When I was learning to program through a bootcamp I spun up an elastic beanstalk instance that was free but required a credit card to prove your identity.
Is it just me or is this just a cheap excuse to grab a payment method from unsuspecting free-tier users?
AWS services aren't designed for people just learning to program. Beanstalk and other services have billing limits you can set, but those aren't hard limits because they are measured async to keep performance up.
With that said, AWS is notoriously opaque in terms of "how much will I pay for this service" because they bill so many variable facets of things, and I've never really trusted the free tier myself unless I made sure it wasn't serving the public.
Not that Amazon needs any defending, but for anyone running a bootcamp: this is a good reason to start with services like Heroku. They make this type of mistake much harder to make. They're very beginner friendly compared to raw AWS.
It's easy yes, but better than nothing. The verification requirements are a balance between desired conversion rate, probability of loss (how many bad guys want to exploit your system without paying) and the actual costs of said loss (in this case it's all bullshit "bandwidth" charges, so no actual loss to AWS).
Something similar happened to me, but not at the outrageous scale. I wanted to try some AI example on Bedrock. So the tutorial said I needed to set up some OpenSearch option. Voila. A few days later I had a bill for $120. The scale is not as horrible, but the principle is the same.
And then they pull out the invoice where they prove without any doubt that you actually used pay-per-use services and ran up a 100k bill because you failed to do any sort of configuration.
> I didn't use them, some bots did. Sort it out with them.
For you to put together this sort of argument with a straight face, you need to have little to no understanding of how the internet in general, services and web apps work. Good luck arguing your case with a judge.
There are light-years between what a company thinks their ToS “allow” and what a court would actually determine is allowed. Plenty of ToS clauses in major contracts are unenforceable by law.
In this situation if it were to actually go to court I’d expect the actual bill to get significantly reduced at the very least. But because it’s a completely bullshit charge based on bandwidth usage (which costs them nothing) it won’t go anywhere near that and will just get written off anyway.
Courts can be rather capricious, I’d rather avoid them as best as possible, even if you are likely to win having to fight something like this in court is punishing.
Yes, but it's better they need to get their money than you need to get your money back. 100.000 easily can put you in ruining debt. It's the better position to still have your money even if you have to pay.
> Amazon then charged me one hundred thousand dollars as the server was hit by bot spam.
That would make you one of the most successful websites on the internet, or the target of a DDoS -- which was it? I assume you're not saying that "bots" would randomly hit a single, brand-new "hello world" site enough to generate that kind of bill.
Many of the people who have this problem on toy websites end up offering what amounts to free storage or something similar. They are then surprised when "bots" come to "DDoS" them. These bills are as much a product economics problem as a technical one.
> cloud computing I'd use anyone else for that very reason. The entire service with it's horrifically complicated click through dashboard just to confuse the customer into losing money.
I feel like this brand of sentiment is everywhere. Folks want things simple. We often figure out what we need to do to get by.
Over time we learn the reason for a handful of the options we initially defaulted through, find cause to use the options. Some intrepid explorers have enough broader context and interest to figure much more out but mostly we just set and forget, remembering only the sting of facing our own ignorance & begrudging the options.
This is why k8s and systemd are have such a loud anti-following.
allways set cost alarm and max spending. AWS has great tools to controll costs.
You could have blocked this with good config but I understand its confusing and not super apparent. IMHO there should be a pop up or sth asking " you want to stop the instance the moment it costs anything?"
its so easy to get billed a ridicules amount if money
Did you do any training before launching the elastic beanstalk instance, or you just though a F-16 should be pretty easy to fly, at least according to most pilots?
An F-16 doesn't have a prominently-featured "getting started" tutorial, which has a bunch of step-by-step instructions getting a complete novice 40.000ft into the air at mach 2.
AWS also provides training and education on how to use their services. If launching a "hello world" Elastic Beanstalk instance is so dangerous, why doesn't the tutorial require you to first provide proof that you are an AWS Certified Cloud Practitioner?
> The entire service with it's horrifically complicated click through dashboard (but you can get a certification! It's so complicated they invented a fake degree for it!) just to confuse the customer into losing money.
By that logic, any technology that you can get certified in is too complicated?
Most systems are now distributed and presenting a holistic view of how it was designed to work can be useful to prevent simple mistakes.
Traffic requires a certification (license) too. Must be a fake degree as well because they made it too complicated
> By that logic, any technology that you can get certified in is too complicated?
That is a common view in UX, yes. It's a bit of an extreme view, but it's a useful gut reaction
> Traffic requires a certification (license) too. Must be a fake degree as well because they made it too complicated
In the US roads are designed so that you need as close to no knowledge as possible. You need to know some basic rules like the side of the road you drive on or that red means stop, but there is literal text on common road signs so people don't have to learn road signs. And the driving license is a bit of a joke, especially compared to other Western countries
There is something to be said about interfaces that are more useful for power users and achieve that by being less intuitive for the uninitiated. But especially in enterprise software the more prevalent effect is that spending less time and money on UX directly translates into generating more revenue from training, courses, paid support and certification programs
> By that logic, any technology that you can get certified in is too complicated?
In IT, I am inclined to agree with that. In real engineering, it's sometimes necessary, especially dangerous technology and technology that people trust with their life
> dangerous technology and technology that people trust with their life
Software runs on so many things we depend on IMO it also in many cases falls in the "dangerous technology" category.
Non-hobby OSes, non-hobby web browsers, device drivers, software that runs critical infrastructure, software that runs on network equipment, software that handles personal data, --IMHO it would not be unreasonable to require formal qualifications for developers working on any of those.
The history of making things complicated often involves "unintended" use by malicious actors.
But infact, it is intended side effects. Things like Jaywalking or "no spitting" laws let police officers harass more people _At their whim_. And they're fullying designed that way but left as "unintended" for the broader public scrutiny.
So, just like, learn that "logic" is not some magic thing you can sprinkle on everything and find some super moral or ethic reality. You have to actually integrate the impact through multiple levels of interaction to see the real problem with "it's just logic bro" response you got here.
The problem with the AWS certificate is that the entity issuing the certificate and the entity honoring the certificate have opposing priorities. When a company wants to use AWS, preferably they'd want to avoid needlessly expensive solutions and vendor lock-in, while Amazon wants to teach people how to choose needlessly expensive solutions with vendor lock-in.
Businesses are only taxed on actual revenue earned.
What you decide to charge—whether $100, $50, or even giving it away for free—is purely a business decision, not a tax one.
—
This is different from a nonprofit donation scenario though. For example, if your service normally costs $X but you choose to provide it for free (or at a discount) as a donation to a non-profit, you can typically write off the difference.
> Businesses are only taxed on actual revenue earned.
I don't want to go too far down the rabbit hole of hn speculation, but if another entity owes you 100k, and they go bankrupt, there absolutely are tax implications.
A lot of things are "fraud" when an individual or small business does it but perfectly normal and considered merely good business acumen when done by a big corporation. Even more so now that the US government is openly for sale (it was always for sale, but before at least they had the decency to pretend it wasn't).
Yeah man the whole industry is like that. OpenAI gets to say they raised X billion dollars and update their valuation but they don't mention that it's all cloud compute credits from a gigantic Corp that owns a huge amount of the business. They claim to be a non-profit to do the research then when they've looted the commons, they switch to for profit to pay out the investors. There's shit like this throughout the industry.
I took a workshop class and was told to setup a track saw. The course didn't bother explaining how to utilize it properly or protect yourself. I ended up losing a finger. I truly hate Stanley Tools with a passion and if I ever need to use another track saw, I'll use someone else.
This analogy would make sense if the saw lacked a basic and obvious safety feature (billing limits) because Stanley profited immensely from cutting your finger off.
What seems like a basic feature to you is a hindrance to me. I don’t want to have to disable “safeguards” all over the place just because of loud and rare complaints.
Protect yourself how? Most cloud providers don't support any way to immediately abort spending if things get out of hand, and when running a public-facing service there are always variables you can't control.
Even if you rig up your own spending watchdog which polls the clouds billing APIs, you're still at the mercy of however long it takes for the cloud to reconcile your spending, which often takes hours or even days.
Yes, they do. You create resources and you delete resources and if you care about cost you creat alarms and tie them to scripts that automatically delete resources.
No. Stanley Tools owns the hospital and would profit from the operation, but when you said you don't have the money they decided to let you go. Perhaps because legally they would have to anyway, or otherwise they would suffer various legal and reputational consequences.
I'm a safety inspector. Of course this is much more nuanced than this. One crucial aspect of a tool safety is proper documentation. It's also important who the tool is targeted for. There are different safety standards based on user's competence. Some "tools" will be toys for children, some will be for disabled people including people with intellectual disabilities, some will be for general populace, and only some for trained experts.
If a tool is designed for experts, but you as the manufacturer or distributor know the tool is used by general populace, you know it's being misused every now and then, you know it harms the user AND YOU KNOW YOU BENEFIT FROM THIS HARM, AND YOU COULD EASILY AVOID IT - that sounds like something you could go to jail for.
I think if Amazon was a Polish company, it would be forced by UOKiK (Office of Competition and Consumer Protection) to send money to every client harmed this way. I actually got ~$150 this way once. I know in USA the law is much less protective, it surprises me Americans aren't much more careful as a result when it comes to e.g. reading the terms of service.
I thought this would be about the horrors of hosting/developing/debugging on “Serverless” but it’s about pricing over-runs. I scrolled aimlessly through the site ignoring most posts (bandwidth usage bills aren’t super interesting) but I did see this one:
> I thought this would be about the horrors of hosting/developing/debugging on “Serverless” but it’s about pricing over-runs.
Agreed about that. I was hired onto a team that inherited a large AWS Lambda backend and the opacity of the underlying platform (which is the value proposition of serverless!) has made it very painful when the going gets tough and you find bugs in your system down close to that layer (in our case, intermittent socket hangups trying to connect to the secrets extension). And since your local testing rig looks almost nothing like the deployed environment...
I have some toy stuff at home running on Google Cloud Functions and it works fine (and scale-to-zero is pretty handy for hiding in the free tier). But I struggle to imagine a scenario in a professional setting where I wouldn't prefer to just put an HTTP server/queue consumer in a container on ECS.
I've had similar experiences with Azures services. Black boxes impossible to troubleshoot. Very unexpected behavior people aren't necessarily aware of when they initially spin these things up. For anything important I just accept the pain of deploying to kubernetes. Developers actually wind up preferring it in most cases with flux and devsoace.
I recently had customer who had smart idea to protect Container Registry with firewall... Breaking pretty much everything in process. Now it kinda works after days of punching enough holes in... But I still have no idea where does something like Container registry pull stuff from, or App Service...
And does some of their suggested solutions actually work or not...
Every time I've done a cost benefit analysis of AWS Lambda vs running a tiny machine 24/7 to handle things, the math has come out in favor of just paying to keep a machine on all the time and spinning up more instances as load increase.
There are some workloads that are suitable for lambda but they are very rare compared to the # of people who just shove REST APIs on lambda "in case they need to scale."
Is that what people do is test/develop primarily with local mocks of the services? I assumed it was more like you deploy mini copies of the app to individual instances namespaced to developer or feature branch, so everyone is working on something that actually fairly closely approximates prod just without the loading characteristics and btw you have to be online so no working on an airplane.
SST has the best dev experience but requires you be online. They deploy all the real services (namespaced to you) and then instead of your function code they deploy little proxy lambdas that pass the request/response down to your local machine.
It’s still not perfect because the code is running locally but it allows “instant” updates after you make local changes and it’s the best I’ve found.
There are many paths. Worst case, I've witnessed developers editing Lambda code in the AWS console because they had no way to recreate the environment locally.
If you can't run locally, productivity drops like a rock. Each "cloud deploy" wastes tons of time.
Mocks usually don’t line up with how things run in prod. Most teams just make small branch or dev environments, or test in staging. Once you hit odd bugs, serverless stops feeling simple and just turns into a headache.
Yeah, I’ve never worked at one of those shops but it’s always sounded like a nightmare. I get very anxious when I don’t have a local representative environment where I can get detailed logs, attach a debugger, run strace, whatever.
I raised that exact same issue to AWS in ~2015 and even though we had an Enterprise support plan, AWS response was basically: well, you problem.
We then ended up deleting the S3 bucket entirely, as that appeared to be the only way to get rid of the charges, only for AWS to come back to use a few weeks later telling us there are charges for an S3 bucket we previously owned. After explaining to them (again) that this way our only option to get rid of the charges, we never heard back.
Seems an interesting oversight. I can just imagine the roundtable, uhh guys who do we charge for 403? Who can we charge? But what if people hit random buckets as an attack? Great!
> Seems an interesting oversight. I can just imagine the roundtable, uhh guys who do we charge for 403? Who can we charge? But what if people hit random buckets as an attack? Great!
It is amazing, isn't it? Something starts as an oversight but by the time it reaches down to customer support, it becomes an edict from above as it is "expected behavior".
> AWS was kind enough to cancel my S3 bill. However, they emphasized that this was done as an exception.
The stench of this bovine excrement is so strong that it transcends space time somehow.
The devs probably never thought of it, the support people who were complained to were probably either unable to reach the devs, or time crunched enough to not be able to, and who as a project manager would want to say they told their Devs to fix an issue that will lose the company money!
> I reported my findings to the maintainers of the vulnerable open-source tool. They quickly fixed the default configuration, although they can’t fix the existing deployments.
Anyone wanna guess which open source tool this was? I'm curious to know why they never detected this themselves. I'd like to avoid this software if possible as the developers seem very incompetent.
How to destroy your competition. Love it. Also why i dislike AWS. Zero interest to protect their SMB customers from surprise bills. Azure isn't much better but at least they got a few more protections in place.
Same, I was hoping for tales of woe and cloud lock-in, of being forced to use Lambda and Dynamo for something that could easily run on a $20/month VPS with sqlite.
The webflow one at the top has an interesting detail about them not allowing you to offload images to a cheaper service. Which you can probably work around by using a different domain.
> Imagine you create an empty, private AWS S3 bucket in a region of your preference. [...] As it turns out, one of the popular open-source tools had a default configuration to store their backups in S3. And, as a placeholder for a bucket name, they used… the same name that I used for my bucket.
What are the odds?
(Not a rhetorical question. I don't know how the choice of names works.)
The assignment of blame for misconfigured cloud infra or DOS attacks is so interesting to me. There don't seem to be many principles at play, it's all fluid and contingent.
Customers demand frictionless tools for automatically spinning up a bunch of real-world hardware. If you put this in the hands of inexperienced people, they will mess up and end up with huge bills, and you take a reputational hit for demanding thousands of dollars from the little guy. If you decide to vet potential customers ahead of time to make sure they're not so incompetent, then you get a reputation as a gatekeeper with no respect for the little guy who's just trying to hustle and build.
I always enjoy playing at the boundaries in these thought experiments. If I run up a surprise $10k bill, how do we determine what I "really should owe" in some cosmic sense? Does it matter if I misconfigured something? What if my code was really bad, and I could have accomplished the same things with 10% of the spend?
Does it matter who the provider is, or should that not matter to the customer in terms of making things right? For example, do you get to demand payment on my $10k surprise bill because you are a small team selling me a PDF generation API, even if you would ask AWS to waive your own $10k mistake?
Then you’re the person who took down their small business when they were doing well.
At AWS I’d consistently have customers who’d architected horrendously who wanted us to cover their 7/8 figure “losses” when something worked entirely as advertised.
Small businesses often don’t know what they want, other than not being responsible for their mistakes.
Everyone who makes this argument always assumes that every website on the internet is a for-profit business when in reality the vast majority of websites are not trying to make any profit at all, they are not businesses. In those cases yes absolutely they want them to be brought down.
Or instead of an outage, simply have a bandwidth cap or request rate cap, same as in the good old days when we had a wire coming out of the back of the server with a fixed maximum bandwidth and predictable pricing.
There are plenty of options on the market with fixed bandwidth and predictable pricing. But for various reasons, these businesses prefer the highly scalable cloud services. They signed up for this
Every business has a bill they are unprepared to pay without evaluating and approving budget, even under successful conditions and even if that approval step is a 10 second process. It's obvious that Amazon does not add this because of substantial profit over any other concern.
Yes and no. 100% accurate billing is not available in realtime, so it's entirely possible that you have reached and exceeded your cap by the time it has been detected.
Having said that, within AWS there are the concepts of "budget" and "budget action" whereby you can modify an IAM role to deny costly actions. When I was doing AWS consulting, I had a customer who was concerned about Bedrock costs, and it was trivial to set this up with Terraform. The biggest PITA is that it takes like 48-72 hours for all the prerequisites to be available (cost data, cost allocation tags, and an actual budget each can take 24 hours)
The circuit breaker doesn’t need to be 100% accurate. The detection just needs to be quick enough that the excess operating cost incurred by the delay is negligible for Amazon. That shouldn’t really be rocket science.
The point is that by not implementing such configurable caps, they are not being customer friendly, and the argument that it couldn’t be made 100% accurate is just a very poor excuse.
Sure, not providing that customer-friendly feature bestows them higher profits, but that’s exactly the criticism.
I think most of the "horror stories" aren't related to cases like this. So we can at least agree most such stories could be easily avoided, before we looked at solutions to these more nuanced problems (one of which would be clearly communicating the mechanism of a limit and what would be the daily cost of maintaining the maxed storage - and for a free account the settings could be adjusted for these "costs" to be within free quota)
Interesting that you mention UDP, because I'm in the process of adding hard-limits to my service that handles UDP. It's not trivial, but it is possible and while I'm unsympathetic to folks casting shade on AWS for not having it, I decided a while back it was worth adding to my service. My market is experimenters and early stage projects though, which is different than AWS (most revenue from huge users) so I can see why they are more on the "buyer beware" side.
I mean, would you rather have a $10k build or have your server forcefully shut down after you hit $1k in three days?
One of those things is more important to different types of business. In some situations, any downtime at all is worth thousands per hour. In others, the service staying online is only worth hundreds of dollars a week.
So yes, the solution is as simple as giving the user hard spend caps that they can configure. I'd also set the default limits low for new accounts with a giant, obnoxious, flashing red popover that you cannot dismiss until you configure your limits.
However, this would generate less profit for Amazon et al. They have certainly run this calculation and decided they'd earn more money from careless businesses than they'd gain in goodwill. And we all know that goodwill has zero value to companies at FAANG scale. There's absolutely no chance that they haven't considered this. It's partially implemented and an incredibly obvious solution that everyone has been begging for since cloud computing became a thing. The only reason they haven't implemented this is purely greed and malice.
There are several satisfactory solutions available. Every other solution they offer was made with tradeoffs and ambiguous requirements they had to make a call on. It is obviously misaligned incentive rather than an impossibility. If they could make more money from it, they would be offering something. Product offering gaps are not merely technical impossibilities.
Not even remotely the same scale of problem. Like at all.
If your business suddenly starts generating Tbs of traffic (that is not a ddos), you'd be thrilled to pay overage fees because your business just took off.
You don't usually get $10k bandwidth fees because your misconfigured service consumes too much CPU.
And besides that, for most of these cases, a small business can host on-prem with zero bandwidth fees of any type, ever. If you can get by with a gigabit uplink, you have nothing to worry about. And if you're at the scale where AWS overages are a real problem, you almost certainly don't need more than you can get with a surplus server and a regular business grade fiber link.
This is very much not an all-or-nothing situation. There is a vast segment of industry that absolutely does not need anything more than a server in a closet wired to the internet connection your office already has. My last job paid $100/mo for an AWS instance to host a GitLab server for a team of 20. We could have gotten by with a junk laptop shoved in a corner and got the exact same performance and experience. It once borked itself after an update and railed the CPU for a week, which cost us a bunch of money. Would never have been an issue on-prem. Even if we got DDoSed or somehow stuck saturating the uplink, our added cost would be zero. Hell, the building was even solar powered, so we wouldn't have even paid for the extra 40W of power or the air conditioning.
Depends where you order your server. If you order from the same scammers that sell you "serverless" then sure. If you order from a more legitimate operator (such as literally any hosting company out there) you get unmetered bandwidth with at worst a nasty email and a request to lower your usage after hitting hundreds of TBs transferred.
The real serverless horror isn't the occasional mistake that leads to a single huge bill, it's the monthly creep. It's so easy to spin up a resource and leave it running. It's just a few bucks, right?
I worked for a small venture-funded "cloud-first" company and our AWS bill was a sawtooth waveform. Every month the bill would creep up by a thousand bucks or so, until it hit $20k at which point the COO would notice and then it would be all hands on deck until we got the bill under $10k or so. Rinse and repeat but over a few years I'm sure we wasted more money than many of the examples on serverlesshorrors.com, just a few $k at a time instead of one lump.
this is really the AWS business model - you can call it the "planet fitness" model if you prefer. Really easy to sign up and spend money, hard to conveniently stop paying the money.
Sounds like your organization isn’t learning from these periods of high bill. What lead to the bill creeping up, and what mechanisms could be put in place to prevent them in the first place?
At only 20k a month, the work put into reducing the bill back down probably costs more in man hours than the saving, time which would presumably be better spent building profitable features that more than make up for the incremental cloud cost. Assuming of course the low hanging fruit of things like oversized instances, unconstrained cloudwatch logs and unterminated volumes have all been taken care of.
> what mechanisms could be put in place to prevent them in the first place?
Those mechanisms would lead to a large reduction in their "engineering" staff and the loss of potential future bragging rights in how modern and "cloud-native" their infrastructure is, so nobody wants to implement them.
With that model, your cost doesn't change, though. When/if you find you need more resources, you can (if you haven't been doing so) audit existing applications to clear out cruft before you purchase more hardware.
The cost of going through that list often outweighs the cost of the hardware, by a lot.
And in a lot of cases it's hard to find out if a production application can be switched off. Since the cost is typically small for an unused application, I don't know if there are many people willing to risk being wrong
> I had cloudflare in front of my stuff. Hacker found an uncached object and hit it 100M+ times. I stopped that and then they found my origin bucket and hit that directly.
Pardon my ignorance, but isn’t that something that can happen to anyone? Uncached objects are not something as serious as leaving port 22 open with a weak password (or is it?). Also, aren’t S3 resources (like images) public so that anyone can hit them any times they want?
> I'm glad I use a Hetzner VPS. I pay about EUR 5 monthly, and never have to worry about unexpected bills.
The trade-off being that your site falls over with some amount of traffic. That's not a criticism, that may be what you want to happen – I'd rather my personal site on a £5 VPS fell over than charged me £££.
But that's not what many businesses will want, it would be very bad to lose traffic right at your peak. This was a driver for a migration to cloud hosting at my last company, we had a few instances of doing a marketing push and then having the site slow down because we couldn't scale up new machines quickly enough (1-12 month commitment depending on spec, 2 working day lead time). We could quantify the lost revenue and it was worth paying twice the price for cloud to have that quick scaling.
Don't they charge for every TB exceeding the included limit? (website says "For each additional TB, we charge € 1.19 in the EU and US, and € 8.81 in Singapore.")
You're missing the unit, it's $0.085 per GB, not TB, and that's only for NA/EU traffic. I rounded up a bit from that number because other regions cost more, plus you get billed a flat amount for each request as well.
They do offer progressively cheaper rates as you use more bandwidth each month, but that doesn't have much impact until you're already spending eye watering amounts of money.
Oh, yeah, egg on my face. They only put the unit of measurement at the top, and then talk about TB, so it's a bit deceptive. In retrospect, I was stupid to imagine 0.085/TB made any sense.
I would say its probably not a good idea to make a bucket directly publicly accessible, but people do not do that.
A lot of the point of serverless is convenience and less admin and things like adding a layer in front of the bucket that could authenticate, rate limit etc. is not convenient and requires more admin.
Because just using a cdn without proper caching headers is just another service you're paying for without any savings.
The real question is if they considered caching and thus configured it appropriately. If you don't, you're telling everyone you want every request to go to origin
And it's getting harder and harder to make them public because of people misconfiguring them and then going public against AWS when they discover the bill.
This story is giving "I leave OWASP top 10 vulns in my code because hacker mindset".
It's not that hard to configure access controls, they're probably cutting corners on other areas as well. I wouldn't trust anything this person is responsible for.
It's about rate limiting, not access controls. Without implementing limits your spend can go above what your budget is. Without cloud you hit natural rate limits of the hardware you are using to host.
You just shouldn't be using S3 to serve files directly. You can run most public and many private uses through CloudFront. Which gives you additional protections and reduces things like per object fetch costs.
> you hit natural rate limits
Seen by your customers or the public as a "denial of service." Which may actually be fine for the people who truly do want to limit their spending to less than $100/month.
No, s3 objects should always be private and then have a cloudfront proxy in front of them at the least. You should always have people hitting a cache for things like images.
I don't understand why it should be called "serverless" when using cloud infrastructure. Fundamentally you're still creating software following a client-server model, and expecting a server to run somewhere so that your users' clients work.
To me, "serverless" is when the end user downloads the software, and thereafter does not require an Internet connection to use it. Or at the very least, if the software uses an Internet connection, it's not to send data to a specific place, under the developer's control, for the purpose of making the software system function as advertised.
Serverless is easier to say than "load controlled ephemeral server management." Which is the real point. As my load increases the number of allocated resources, like servers, increases, and as it decreases so do the allocations and costs.
This is great if you are willing to completely change your client-server code to work efficiently in this environment. It is a strain over a standard design and you should only be using it when you truly need what "serverless" provides.
A "Server" is typically a single machine that has a specific OS and runs layers of various software that allows your business logic to be accessed by other computers (by your users). For a "Server" you typically have to choose an OS to run, install all the support software (server monitoring, etc), update the software, and if the server fails you have to fix it or rebuild it.
With "Serverless", your code is in a "function as a service" model where all you have to worry about is the business logic (your code). You don't have to set up the server, you don't have to install the server OS, or any basic server software that is needed to support the business logic code (http server, etc). You don't have to update the server or the underlying server software. You don't have to perform any maintenance to keep the server running smoothly. You never (typically) have to worry about your server going down. All you have to do is upload your business logic function "somewhere" and then your code runs when called. Essentially you do not have to deal with any of the hassle that comes with setting up and maintaining your own "server", all you have to do is write the code that is your business logic.
That's why it's called "Serverless" because you don't have to deal with any of the hassle that comes with running an actual "server".
I understand the underlying reasoning. I just don't like the terminology. Hence, "I don't understand... should be", rather than "... is". I think it's wrong that people end up using words like that. Like, almost on a moral level.
More generally, I don't like that a term ending with "-less" marks an increase in system complexity.
> Essentially you do not have to deal with any of the hassle that comes with setting up and maintaining your own "server", all you have to do is write the code that is your business logic.
Also known as "shared hosting". It's been done since the 90's (your folder full of PHP files is an NFS mount on multiple Apache servers), just that the techbros managed to rebrand it and make it trendy.
Think half an abstraction layer higher. You're on the right track with multiple PHP virtual runtimes on a single VM - that could conceptually be viewed as a sort of precursor to function runtimes.
The serverless function has higher-order features included as part of the package: you get an automatic runtime (just as with PHP but in this case it can be golang or dotnet), the function gets a unique endpoint URL, it can be triggered by events in other cloud services, you get execution logging (and basic alerting), multiple functions can be chained together (either with events or as a state machine), the function's compute can be automatically scaled up depending on the traffic, etc.
Think of it as: What do I have to do, in order to scale up the conpute of this URL? For hardware it's a call to DELL to order parts, for VMs or containers it's a matter of scaling up that runtime, or adding more instances - neither of those processes are simple to automate. One key characteristic of the function is that it will scale horizontally basically however much you want (not fully true, aws has a limit of 1500 instances/second iirc, but that's pretty massive), and it will do it automatically and without the request sources ever noticing.
Functions are also dirt cheap for low/burst traffic, and deployment is almost as easy as in the PHP FTP example. Personally I also think they are easier to test than traditional apps, due to their stateless nature and limited logical size (one endpoint). The main downsides are cost for sustained load, and latency for cold starts.
With that said, they are not "endgame". Just a tool - a great one for the right job.
Bit of a nit pick but this is a pet peeve of mine.
Creating a new word for a more specific category is never Orwellian. The project in 1984 was to create a language which was less expressive. They were destroying words describing fine distinctions and replacing them with words that elided those distinctions. Creating a new word to highlight a distinction is the opposite.
There's definitely criticisms to be made of the term serverless and how it obscures the role of servers, but Orwellian is not the correct category. Maybe we could say such services run on servelets to describe how they're "lighter" in some sense but still servers.
Yea, I agree after more thought. I think the key is what you said; the term is useful for dividing within a specific domain. People outside that domain see the word and think "those guys are calling this Category-A thing "not-category-A", that makes no sense! Inside the Category A world, there is much more nuance.
"Serverless" refers to the demarcation point in the shared responsibility model. It means there aren't any servers about as much as "cloud hosting" means the data centers are flying.
This is where is becomes confusing to me: Here are a few types of software/infrastructure. Embedded devices. Operating systems. PC software. Mobile device software. Web frontends. GPU kernels. These all truly don't use servers. When I hear "serverless", I would think it is something like that. Yet, they're talking about web servers. So it feels like a deception, or something poorly named.
If you are in the niche of IT, servers, HTTP operations etc, I can see why the name would make sense, because in that domain, you are always working with servers, so the name describes an abstraction where their technical details are hidden.
Putting any sort of pay per use product onto the open internet has always struck me as insane. Especially with scaling enabled.
At least stick a rate limited product in front of it to control the bleed. (And check whether the rate limit product is in itself pay per use...GCP looking at you)
I tried AWS serverless, figured out that it is impossible to test anything locally while you are forced to use AWS IAM role for serverless run which has access to everything.
That's just a problem waiting to happen while you are always running tests on production...
I worked on a serverless project for several years and the lack of ability to run much of anything locally was a huge cost. Debugging cycle times were absolutely terrible. There are some tools that claim to address this but as of a few years ago they were all useless for a real project.
I use my AWS security key to run local tests. It works perfectly fine. You just need a ~/.aws/credentials file appropriately configured.
I have a makefile system which controls lambda deployments. One step of the deployment is to gather the security requirements and to build a custom IAM role for each individual lambda. Then I can just write my security requirements in a JSON file and they're automatically set and managed for me.
The real joy of AWS is that everything works through the same API system. So it's easy to programmatically create things like IAM roles like this.
Not really. Sure, the cost would usually be peanuts... until you have an infinite loop that recursively calls more lambdas. Then you have a huge bill (but hey that pays for your invites to their conferences, so maybe it's a blessing in disguise?). And yes, you will pretty much always get it refunded, but it's still a hassle and something that is absolutely not necessary.
Snark aside, having an opaque dev environment always constrained by bandwidth and latency that can’t be trivially backed up/duplicated is a terrible idea and why I always recommend against “serverless”, even besides the cost concerns.
Serverless is OK for small, fully self contained pieces or code that are fire and forget. But for anything complex that’s likely to require maintenance, no thanks.
Eh, I worked on a large serverless project that worked hard to follow best practices but it was still very clunky to run and test code locally. The local serverless tools simply didn't work for our project and they had so many limitations I'm skeptical they work for most non-prottypes.
Deploying a stack to your own developer environment works fine and is well worth doing, but the turnaround time is still painful compared to running a normal web framework project locally. Deploying a stack takes much much longer than restarting a local server.
Serverless isn't all bad, it has some nice advantages for scaling a project, but running and debugging a project locally is a definite weak spot.
This is some good marketing for Coolify, which the author makes as an open source platform as a service. I prefer Dokploy these days though, since it seems to be less buggy, as Coolify seems to have such bugs due to being on PHP.
It would help to round to the cent. With 3 digits to the right of the dot it's ambiguous whether it's a decimal point or a thousands separator, and the font and underline makes the comma vs dot distinction a bit unclear.
These guys charge $550 for a measly terabyte of bandwidth?
If you get a dedi on a 10Gb/s guaranteed port and it works out to more than $3 / TB, you're probably getting scammed. How does "serverless" justify 150x that? Are people hosting some silly projects really dense enough to fall for that kind of pricing?
Just get a $10 VPS somewhere or throw stuff on GH pages. Your video game wiki/technical documentation/blog will be fine on there and - with some competent setup - still be ready for 10k concurrent users you'll never have.
This is what scares me, is social media the only way to get things sorted out nowadays? What if I don't have a large following nor an account in the first place, do I have to stomach the bill?
This is exactly what happened to me during Covid... I had a flight that got cancelled at the beginning of the pandemic since the country closed the orders (essentially). A year after, still on lock downs and et al, I wanted to enquire about a refund, for months I got not answer, until I caught wind that people using Twitter were actually getting results. Now, I don’t use social media at all, so I had to create a Twitter account, twit about my case et voila! 30 mins after I got a response and they send me a PM with a case number... Not even going to mention the airline, but it is infuriating...
Someone at a community group I'm in messed up playing with Azure through their free for non-profits offering^. We were out about 1.2k€. Not huge but huge for us.
Encouraged by comments on HN over the years I had them ask support to kindly to wave it. After repeating the request a few times they eventually reduced their bill to <100€ but refused to wave it entirely.
So even without shaming on social media, But it probably depends. It's worth at least asking.
Once you're in a contract + TAM territory, pricing works very differently. Also, temporary experiments and usage overruns become an interesting experience where the company may just forget to bill you a few thousands $ just because nobody looked at the setup recently. Very different situation to a retail user getting unexpected extra usage.
I remember at the beginning of the serverless hype how they said it was great because it automatically scaled as big as you need it. Given how sudden and massive these "scaling spikes" can be, I would much rather deal with a death-hugged VPS than a $100k bill.
Plus the VPS is just so much faster in most cases.
I once found an official Microsoft example repo to deploy an LLM gateway on Azure with ALB. Glad I did the tedious work of estimating the costs before I hit the deploy button (had to go though many Biceps manifests for that). The setup would have cost me about 10k/month.
This is why when I contract for an early stage startup, I pose the question:
"What if your app went viral and you woke to a $20k cloud bill? $50k? $80k?"
If the answer is anything less than "Hell yeah, we'll throw it on a credit card and hit up investors with a growth chart" then I suggest a basic vps setup with a fixed cost that simply stops responding instead.
There is such a thing as getting killed by success and while it's possible to negotiate with AWS or Google to reduce a surprise bill, there's no guarantee and it's a lot to throw on a startup's already overwhelming plate.
The cloud made scaling easier in ways, but a simple vps is so wildly overpowered compared to 15 years ago, a lot of startups can go far with a handful of digitalocean droplets.
Yeah I also left my website hosted on Google Cloud because costs popped from everywhere, and there is basically no built-in functionality to limit them. So I didn't really slept relaxed (I actually slept great, but I hope you get the point) knowing that a bug could cost me... who knows how much.
Actually, as the website of OP says, for spending control you have budget notifications and with that you can disable the billing for all the project altogether through some API call or something, I don't remember exactly, that is all there is. But still it looks like this functionality is just not there.
You can write Google cloud functions to disable your credit card when certain thresholds are met pretty easily, but it's unethical that this isn't just a toggle somewhere in settings.
Does that actually stop the spend immediately? If not, you're still on the hook for the bill. I suppose you can walk away and let them try to come after you, but that wouldn't work for a company.
Don’t most of these services have config options to protect against doing this? I haven’t used most of these services but it running up a bill during traffic spikes but not going down seems like it’s working as intended?
Nope, basically none of these services have a way to set a hard budget. They let you configure budget warnings, but it’s generally up to you to login and actually shut down everything to prevent from being billed for overages (or you have to build your own automation - but the billing alerts may not be reliable)
I know AWS in particular does not because they do not increment the bill for every request. I don't know exactly how they calculate billing, but based on what I do know about it, I imagine it as a MapReduce job that runs on Lambda logs every so often to calculate what to bill each user for the preceding time interval.
That billing strategy makes it impossible to prevent cost overruns because by the time the system knows your account exceeded the budget you set, the system has already given out $20k worth of gigabyte-seconds of RAM to serve requests.
I think most other serverless providers work the same way. In practice, you would prevent such high traffic spikes with rate limiting in your AWS API Gateway or equivalent to limit the amount of cost you could accumulate in the time it takes you to receive a notification and decide on a course of action.
I was also too careless with AWS when I was a beginner with no deployment experience and I am very lucky that I did not push a wrong button.
All these stories of bill forgiveness reminds me of survivorship bias. Does this happens to everyone that reaches out to support or just the ones that get enough traction on social media? I am pretty sure there is no official policy from AWS, GCP or Azure.
This site is a bit dated. I remember in response to this Vercel added a way to pause your projects when hitting a spend limit. I enabled it for my account.
Still, it made me question why I'm not using a VPS.
Vercel used to be called Zeit. They had a server product called Now that gave you 10 1CPU/1GPU instances for $10/month (or $20 I forgot). It was the best deal.
When Vercel switched everything to serverless, it all became pretty terrible. You need 3rd party services for simple things like DB connection pooling, websockets, cron jobs, simple queue, etc because those things aren’t compatible with serverless. Not to mention cold starts. Just a few weeks ago, I tried to build an API on Next.js+Vercel and get random timeouts due to cold start issues.
Vercel made it easier to build and deploy static websites. But really, why are you using Next.js for static websites? Wordpress works fine. Anything works fine. Serverless makes it drastically harder to build a full app with a back end.
We are building bare metal for our workloads… I don’t care if cloud is supposed to be cheaper because it never is. You can get a decent small business firewall to handle 10gbit fiber for $600 from unifi these days. Just another reason I’m glad I moved out of the Bay Area and nyc to a midwestern town for my company. I have a basement and can do rad things in my house to grow my business.
Seem likes there are mistakes that were made on behalf of the users. The attackers found these mistakes and took advantage of them. i don't think "severless" is the problem.
Serverless is the problem in that most serverless services don't let you hard-cap spend.
This issue is serverless-specific. If I pay $20/month on VPN the most frightening thing that can happen is the client calling you about your website being down, not a $100k bill.
If we're building anything bigger than a random script that does a small unit of work, never go for serverless. A company I recently worked for went with Serverless claiming that it would be less maintenance and overhead.
It absolutely was the worst thing I've ever seen at work. Our application state belonged at different places, we had to deal with many workarounds for simple things like error monitoring, logging, caching etc. Since there was no specific instance running our production code there was no visibility into our actual app configuration in production as well. Small and trivial things that you do in a minute in a platform like Ruby on Rails or Django would take hours if not days to achieve within this so-called blistering serverless setup.
On top of it, we had to go with DB providers like NeonDb and suffer from a massive latency. Add cold starts on top of this and the entire thing was a massive shitshow.
Our idiot of a PM kept insisting that we keep serverless despite having all these problems. It was so painful and stupid overall.
Looks like you need the "quiet part" said out loud:
Chances are, the company was fishing for (or at least wouldn't mind) VC investment, which requires things being built a certain (complex and expensive) way like the top "startups" that recently got lots of VC funding.
Chances are, the company wanted an invite to a cloud provider's conference so they could brag about their (self-inflicted) problems and attract visibility (potentially translates to investment - see previous point).
Chances are, a lot of their engineering staff wanted certain resume points to potentially be able to work at such startups in the future.
Chances are, the company wanted some stories about how they're modern and "cloud-native" and how they're solving complex (self-inflicted) problems so they can post it on their engineering blog to attract talent (see previous point).
I keep telling customers: "The cloud will scale to the size of your wallet."
They don't understand what I mean by that. That's okay, they'll learn!
Anyway, this kind of thing comes up regularly on Hacker News, so let's just short-circuit some of the conversations:
"You can set a budget!" -- that's just a warning.
"You should watch the billing data more closely!" -- it is delayed up to 48 hours or even longer on most cloud services. It is especially slow on the ones that tend to be hit the hardest during a DDoS, like CDN services.
"You can set up a lambda/function/trigger to stop your services" -- sure, for each individual service, separately, because the "stop" APIs are different, if they exist at all. Did I mention the 48 hour delay?
"You can get a refund!" -- sometimes, with no hard and fast rules about when this applies except for out of the goodness of some anonymous support person's heart.
"Lots of business services can have unlimited bills" -- not like this where buying what you thought was "an icecream cone" can turn into a firehouse of gelato costing $1,000 per minute because your kid cried and said he wanted more.
"It would be impossible for <cloud company> to put guardrails like that on their services!" -- they do exactly that, but only when it's their money at risk. When they could have unlimited expenses with no upside, then suddenly, magically, they find a way. E.g.: See the Azure Visual Studio Subscriber accounts, which have actual hard limits.
"Why would you want your cloud provider to stop your business? What if you suddenly go viral! That's the last thing you'd want!" -- who said anything about a business? What if it's just training? What if your website is just marketing with a no "profit per view" in any direct sense?
An alternative title might be "Failure to read the documentation horrors."
If you didn't sit down with the documentation, the pricing guide, and a calculator before you decided to build something then you share a significant portion of the fault.
This is a weird take on an incredibly useful paradigm (serverless). One the one side, there are obviously precautions that all of these users could have taken to avoid these charges on the other hand its totally common to spin up a thing and forget about it or not do your due diligence. I totally feel for the people who have been hit with these chargers.
At the end of the day though the whole think feels like a carpenter shooting themselves in the foot with a nail gun then insisting that hammers are the only way to do things.
At one time, I considered using Firebase as a backend, but then, I kept reading stories like these, and decided to roll my own. I'm fortunate to be able to do that.
It's kind of amazing, though. I keep getting pressure from the non-techs in my organization to "Migrate to the Cloud." When I ask "Why?" -crickets.
Industry jargon has a lot of power. Seems to suck the juice right out of people's brains (and the money right out of their wallets).
That's why I like VPS setups. You hit the monthly maximum, and it just stops working.
I host demos, not running a business, so it's less of an issue to get interrupted. Better an interruption than a $50,000 bill for forgetting to shut off a test database from last Wednesday.
Unless a startup has five+ nines service contracts with their customers already, a little bit of downtime once in a while is not the end of the world the cloud services want us to believe.
That's not comparable. With a VPS there is no monthly maximum, just a max load on a second by second basis. You can be hit with traffic of which 90% bounces because your server is down, get nowhere near your intended monthly maximum, and then the rest of the month is quiet.
Not _really_. AWS has a budget tool, but it doesn’t natively support shutting down services. Of course, you can ingest the alerts it sends any way you want, including feeding them into pipelines that disable services. There’s plenty of blueprints you can copy for this. More seriously - and this is a legitimate technical limitation - of course AWS doesn’t check each S3 request or Lambda invocation against your budget, instead, it consolidates periodically via background reporting processes. That means there’s some lag, and you are responsible for any costs incurred that go over budget between such reporting runs.
> of course AWS doesn’t check each S3 request or Lambda invocation against your budget
If it can bill them per-invocation, why can't it also check against a budget? I don't expect it to be synchronous, but a lag of minutes to respond is still better than nothing. Can you even opt-in to shutting down services from the budget tool, or is that still something you have to script by hand from Cloudwatch alarms?
I think figuring out how to do this faster is less trivial than it might sound. I agree that synchronous checks aren’t reasonable. But let’s take Lambdas. They can run for 15 minutes, and if you consolidate within five minutes after a resource has been billed, that gives you a twenty minute lag.
I’m not trying to make apologies for Amazon, mind you. Just saying that this isn’t exactly easy at scale, either. Sure, they bill by invocation, but that’s far from synchronous, too. In fact, getting alerts might very well be happening at the frequency of billing reconciliation, which might be an entirely reasonable thing to do. You could then argue that that process should happen more frequently, at Amazon’s cost.
> but it doesn’t natively support shutting down services [...] of course AWS doesn’t check each S3 request or Lambda invocation against your budget, instead, it consolidates periodically via background reporting processes
So, in other words, the vendor has provided substandard tooling with the explicit intent of forcing you to spend more money.
Just set alerts that are not really timely and homeroll your own kill scripts its easy. It doesn't really work but its not really any harder than just fucking self hosting.
Maintaining your own containers or VMs is hard considering how much risk appetite you have for the issues at infra level. So, yeah, when you complain about the costs of serverless, you are just paying for your low risk appetite low cost of your IT management.
I read a lot of the posts at the little blog here and, uh, every single one sounds like a complete amateur making a cloud configuration mistake. I haven't found one that is the provider's fault or the fault of "serverless"
I would be embarrassed to put my name on these posts admitting I can't handle my configs while blaming everyone but myself.
Serverless isn't a horror, serverlesshorrors poster. You are the horror. You suck at architecting efficient & secure systems using this technology, you suck at handling cloud spend, and you suck at taking responsibility when your "bug" causes a 10,000x discrepancy between your expected cost and your actual bill.
Just because you don't understand it doesn't mean it sucks
You're not wrong about cloud configuration mistakes, but a tool that lets you increase costs 10000x (without even letting you set a safety) is a hell of a chainsaw.
I'm more worried about the overconfident SRE that doesn't stay up at night worrying about these.
Consider this analogy: Instead of using a root command shell, it is wise to use an account with appropriately restricted capabilities, to limit the downsides of mistakes. Cloud services support the notion of access control, but not the notion of network resource usage limits. It's an architectural flaw.
Or do you always log in as root, like a real man, relying purely on your experience and competence to avoid fat-finger mistakes?
That being said, the cloud providers could do a better job explaining to new/naive users that great power comes with great responsibility and there is no hand holding. Someone might be more hesitant to willy nilly spin up something if a wizard estimates that the maximum cost could be $X per month.
I can't imagine hosting a small-time project on rented infrastructure without some kind of mechanism to terminate it once costs exceed a reasonable threshold.
I've had this twice. Once with oracle, once with azure. They both charged me $2000-$5000 for simply opening and closing a database instance (used only for a single day to test a friend's open source project)
To be fair, support was excellent both times and they waived the bills after I explained the situation.
There should also be a general category for "cloud horrors" for things that cost $50k/month to host that would be $1500/month on a bare metal provider like Datapacket or Hetzner.
I'm old enough to remember when cloud was pitched as a big cost saving move. I knew it was bullshit then. Told you so.
I believe any such policy would need its premiums based on the services used (and likely the qualifications of the staff) since, unlike rebuilding a house, the financial risk is almost unlimited with out of control cloud spend
It reminds me of the Citi(?) employee who typed the wrong decimal place in a trade: computers make everything so easy!
Many of the stories on the site are from people who have billing alerts.
If you have bot spam, how do you actually think their billing alerts work? The alert is updated every 100ms and shuts off your server immediately? That isn't how billing alerts can or should work.
Yes, actually, if continuing to run the service is going to exceed my available budget then I do want the service turned off! If I can't pay for it, and I know I can't pay for it, what other possible choice do I have?
Do any of you people have budgets, or do you all rely on the unending flow of VC money?
That isn't how this can work. If you are running a service and then find out that AWS is spamming you every 100ms to find out what your CPU is doing (or calling out every 100ms) then people would be quite unhappy.
The majority of these massive bills are due to traffic, there is pretty much no way that AWS could stop your server in time...if they had the choice, which they don't.
I think my original point was unclear: I am pointing out that if you just think about how this stuff can possibly work, billing alerts can not work in the way you expect. The alert is updated async, the horse has bolted and you are trying to shut the gate.
I don't use AWS for personal stuff because I know their billing alerts won't stop me spending a lot. Don't use them if that is a concern.
I do use AWS at work, we are a relatively big customer and it is still very expensive for what it is. The actual hardware is wildly overpriced, their services aren't particularly scalable (for us), and you are basically paying all that overage for network...which isn't completely faultless either. Imo, using them in a personal capacity is a poor idea.
When I was learning to program through a bootcamp I spun up an elastic beanstalk instance that was free but required a credit card to prove your identity. No problem that makes sense - it's an easy way to prove authentication as a bot can't spam a credit card (or else it would be financial fraud and most likely a felony).
Amazon then charged me one hundred thousand dollars as the server was hit by bot spam. I had them refund the bill (as in how am I going to pay it?) but to this day I've hated Amazon with a passion and if I ever had to use cloud computing I'd use anyone else for that very reason. The entire service with it's horrifically complicated click through dashboard (but you can get a certification! It's so complicated they invented a fake degree for it!) just to confuse the customer into losing money.
I still blame them for missing an opportunity to be good corporate citizens and fight bot spam by using credit cards as auth. But if I go to the grocery store I can use a credit card to swipe, insert, chip or palm read (this is now in fact a thing) to buy a cookie. As opposed to using financial technology for anything useful.
This is an example of why cloud hosting is so scary.
Yes, Amazon, and I assume Azure and Google's cloud and others, "usually" refund the money.
But I don't want to be forced into bankruptcy because my five visitor a week demo project suddenly becomes the target of a DDOS for no reason at all and the hosting company decides this isn't a "usually" so please send the wire transfer.
They refund those that know how to demand it, and that notice. If you have complex infra and not a lot of observability, you'll just assume the costs are legitimate. Imagine how much they're making off of those oops moments. Probably a bug chunk of their revenue reports.
When I am playing around in the cloud I am super paranoid about charges, so I end up locking the ACLs to only permit traffic to my home IP. It’s too bad that they don’t have a better built in way of making sandbox labs. When I was doing cloud training with A Cloud Guru, it would generate a whole global AWS instance that would only last for 30 minutes.
Why don't you run locally?
run entire aws infra locally while studying for aws certification?
Let’s rephrase the question then, why makes an application dependent on AWS?
In general that would be a good question, but you've asked it in a case where "use AWS" is the _only_ way to accomplish the goal... which is learning AWS.
AWS skills are in quite strong demand, so it totally pays off to know the platform and have some hands-on experience if you work in the related area.
Aws, able to bill everything down to like milliseconds of usage...
We can't implement a basic cost limiter policy.
I think we all know why.
> I think we all know why.
There's no need to imply that, it's not illegal to criticise AWS. They do not want anybody to be able to set a limit on spend as that would probably hurt the business model.
It's extra frustrating I think on the Azure side because they absolutely have cost limited accounts for MSDN subscribers but won't extend that functionality to general users. Just let me set a cap on the cost I'm willing to pay per month and let me deal with the consequences of the resource being shut down unexpectedly. You can work around these things if you instrument the right metrics and create the right alerts so you can take action in time. But those are often hard learned lessons and not the happy path to using the cloud.
It's entirely possible to build cloud first solutions that scale better and are cheaper than your standard reliable colo solutions. But you've got to understand the tradeoffs and when to limit scaling otherwise things can run away from you. I still reach for "cloud first" tools when building my own projects because I know how to run them extremely cheaply without the risk of expenses blowing up because some random thing I've built lands on HN or the equivalent. Many hobby projects or even small businesses can leverage free tiers of cloud services almost indefinitely. But you've got to architect your solutions differently to leverage the advantages and avoid the weaknesses of the cloud. Actually understand the strengths and limitations of the various cloud "functions as a service" offerings and understand where your needs could be solved from those tools and how to work within those cost constraints. Repeatedly I see people trying to use the cloud as if it's just another colo or datacenter and build things in the same way they did before and only think about things in terms of virtual machines tend to have a more difficult time adopting the cloud and they end up spending far more than the companies who can tear down and spin up entire environments through IaC and leverage incremental pricing to your benefit.
They have billing limits
https://docs.aws.amazon.com/cost-management/latest/userguide...
These aren’t limits though, they are just budget notifications.
What would be helpful, would be if when you set up your account there was a default limit – as in an actual limit, where all projects stop working once you go over it - of some sane amount like $5 or $50 or even $500.
I have a handful of toy projects on AWS and Google cloud. On both I have budgets set up at $1 and $10, with notifications at 10% 50% and 90%. It’s great, but it’s not a limit. I can still get screwed if somehow, my projects become targets, and I don’t see the emails immediately or aren’t able to act on them immediately.
It blows my mind there’s no way I can just say, “there’s no conceivable outcome where I would want to spend more than $10 or more than $100 or whatever so please just cut me off as soon as I get anywhere close to that.”
The only conclusion I can come to is that these services are simply not made for small experimental projects, yet I also don’t know any other way to learn the services except by setting up toy projects, and thus exposing yourself to ruinous liability.
I’ve accidentally hit myself with a bigger than expected AWS bill (just $500 but as a student I didn’t really want to spend that much). So I get being annoyed with the pricing model.
But, I don’t think the idea of just stopping charging works. For example, I had some of their machine image thingies (AMI) on my account. They charged me less than a dollar a month, totally reasonable. The only reasonable interpretation of “emergency stop on all charges completely” would be to delete those images (as well as shutting down my $500 nodes). This would have been really annoying, I mean putting the images together took a couple hours.
And that’s just for me. With accounts that have multiple users—do you really delete all the disk images on a business’s account, because one of their employees used compute to hit their spend limit? No, I think cloud billing is just inherently complicated.
> The only reasonable interpretation of “emergency stop on all charges completely” would be to delete those images
I disagree; a reasonable but customer-friendly interpretation would be to move these into a read-only "recycle bin" storage for e.g. a month, and only afterwards delete them if you don't provide additional budget.
Which they already do for account closure, supposedly, I have never tested it.
https://docs.aws.amazon.com/accounts/latest/reference/manage...
There is no reason that cloud providers shouldn't be able to set up the same kind of billing options that advertisers have had access to for years. In Google and Meta ads I can set up multiple campaigns and give each campaign a budget. When that budget gets hit, those ads stop showing. Why would it be unreasonable to expect the same from AWS?
Because usually marketing works next to administration and legal, while dev, devops or infra is 2-3 layers of management below.
Cloud providers charge for holding data, for ingress/egress, and for compute (among other things). If I hit my budget by using too much compute, then keeping my data will cause the budget to be exceeded.
The difference is that cloud providers charge you for the “at rest” configuration, doing nothing isn’t free.
Great so they can give you an option to kill all charges except basic storage. Or let you reserve part of your budget for storage. Or let you choose to have everything hard deleted.
Surely these billion and trillion dollar companies can figure out something so basic.
How many small charges get written off though? If you make a $20 mistake, maybe you let it go and just pay.
Is that worth the support to refund the 10k and 100k charges? Maybe it is.
> But, I don’t think the idea of just stopping charging works.
You don't stop CHARGING. You stop providing the service that is accumulating charges in excess of what limit I set. And you give some short period of time to settle the bill, modify the service, etc. You can keep charging me, but provide a way to stop the unlimited accrual of charges beyond limits I want to set.
> No, I think cloud billing is just inherently complicated.
You're making it more complicated than it needs to be.
> The only reasonable interpretation of “emergency stop on all charges completely” would be to delete those images.
It's by far certainly not the 'only reasonable interpretation'.
"Stop all charges" is a red herring. No one is asking for a stop on charges. They want an option to stop/limit/cap the stuff that causes the charges.
So, are you looking for some “rate of charges” cap? Like, allow the charges to accumulate indefinitely, but keep track of how much $/sec is being accumulated, and don’t start up new services if it would cause the rate of charges to pass that threshold?
Might work. I do think that part of the appeal of these types of services is that you might briefly want to have a very high $/sec. But the idea makes sense, at least.
A theme of many of the horror stories is something like "I set up something personal, costing a few dollars a month, and I was DDOSed or (in earlier terms) slashdotted out of the blue, and I now have a bill for $17k accumulated over 4 hours".
As someone else pointed out, some(?) services prevent unlimited autoscaling, but even without unlimited, you may still hit a much larger limit.
Being able to say 'if my bill goes above $400, shut off all compute resources' or something like that. Account is still on, and you have X days (3? 1? 14?) to re-enable services, pay the bill, or proceed as you wish.
Yes, you might still want some period of high $/sec, but nearly every horror story in this vein ends with an issue with the final bill. Whether I burn $300 in 5 minutes or 26 days, I want some assurance that the services that are contributing most to that - likely/often EC2 or lambda in the AWS world - will be paused to stop the bleeding.
If you could pipe "billing notification" SNS message to something that could simply shut off public network access to certain resources, perhaps that would suffice. I imagine there's enough internal plumbing there to facilitate that, but even then, that's just AWS - how other cloud providers might handle that would be different. Having it be a core feature would be useful.
I was on a team that had our github CI pipeline routinely shutdown multiple times over a few weeks because some rogue processes were eating up a lot of minutes. We may have typically used $50/$100 per month - suddenly it was $100 in a day. Then... $200. Github just stopped the ability to run, because the credits used were over the limits. They probably could run their business where they would have just moved to charging us hundreds per day, perhaps with an email to an admin, and then set the invoice at $4500 for the month. But they shut down functionality a bit after the credits were exhausted.
You can do that today. Billing alerts can trigger workflows.
Sounds like this should be a standard workflow that's a very simple and visible option.
I don't understand how this is hard to grasp.
Compute and API access to storage is usually the thing that bites people with cloud costs.
I want an option that says if I go over $20 on my lambda costs for lambda X, shut it off. If I go over $10 on s3 reads, shut it off.
The disconnect comes from the difference between 'shut it off' and 'clear the account'. If I read an earlier poster correctly, the claim is "the only reasonable interpretation is to immediately delete the contents of the entire account". But to you point, yes, this seems like it would be pretty easy to grasp. Stop incoming access, don't delete the entire account 5 seconds after I go 3 cents over a threshold.
I missed a water bill payment years ago. They shut off the water. They didn't also come in and rip out all my plumbing and take every drop of water from the house.
It's fairly straightforward for compute, as you allude to; it's not straightforward for storage, as GP describes.
"They want an option to stop/limit/cap the stuff that causes the charges."
Most (aws) services support limits which prevents unlimited autoscaling (and thus unlimited billing)
Yeah I get it. It just irks that it's something I'd like to spend more time with and learn, but at every corner I feel like I'm exposing myself. For what I have done w/AWS & GCP so far with personal accounts, complete deletion of all resources & images would be annoying to be sure, but still preferable to unlimited liability. Ofc most companies using it won't be in that boat so IDK.
> But, I don’t think the idea of just stopping charging works.
I'm sorry but this is complete bullshit. they can set a default limit of 1 trillion dollars and give us the option to drop it to $5. there's a good reason they won't do it, but it's not this bullshit claim that's always bandied about.
How would you resolve the situation where ongoing storage costs cause the limit (whatever it is) to be exceeded?
Like every other service. They warn, freeze access and give you a period of time to pay.
What happens to your files when you refuse to pay the bill?
I wouldn't. I've explained what I want.
There isn’t an option to not resolve “you’ve reached your billing limit and now storage charges are exceeding it.” You can resolve it by unceremoniously dumping the user data. You can resolve it by… continuing to charge the user, and holding their files hostage until they pay the back storage charges, and then the egress fees (so, it isn’t really a limit at all). Or you can resolve it by just giving the user free storage by some other name.
Just saying that there should be a limit is not an explanation.
I said you could put the limit at 1 trillion dollars if that's your concern. there's no limit for you!
(for my hobby projects, I'm happy for the limit to be $5 and delete everything when the limit is reached. that's what my backups are for.)
I hate how every time this issue mentioned everyone's response is that it would hurt the companies. Literally just make it an option. It's not that difficult for some of the smartest engineers in the world to implement it.
I feel that the likely answer here is that instrumenting real-time spending limit monitoring and cut-off at GCP/AWS scale is Complicated/Expensive to do, so they choose to not do it.
I suppose you could bake the limits into each service at deploy time, but that's still a lot of code to write to provide a good experience to a customer who is trying to not pay you money.
Not saying this is a good thing, but this feels about right to me.
I don't care if it is expensive for them. I'm not running their business, I'm their customer - it is inconvenient for me.
And frankly any pay-as-you-go scheme should be regulated to have maximum spending limit setting. Not only in IT.
Its not expensive for them, its expensive for their customers. If you went over your spending limit and they deleted all your shit, people would be absolutely apoplectic. Instead they make you file a relatively painless ticket and explain why you accidentally went over what you wanted to spend. This is an engineering trade-off they made to make things less painful for their customers.
There is a huge difference between deleting data and stopping running services.
You're right in that there's a few services that expose this complexity directly, the ones where you're paying for actual storage, but this is just complex, not impossible.
For one thing, storage costs are almost always static for the period, they don't scale to infinite in the same way.
If it’s a web server, sure. But if you drop data because you’re no longer processing it, or you need to do an expensive backfill on an ETL, then turning off compute is effectively the same as deleting data
If I ask Amazon to turn my lambdas off if I breach $500, and they turn my lambdas off when I breach $500, I won't be mad at them, I promise.
Why would I apoplectic at Amazon if I set “turn my shit off after it has accrued $10 in charges” to TRUE and they actually followed what I asked them to do?
Is it a serious question? Because then I could have you shutdown just by posting a call to ddos with a link to your search form on an anime image board.
OK? Good! That's what I want to happen! I want that. I do not care if some weirdos on an anime image board can't access some image. I don't want my credit card maxed out.
Is that not a serious request? I play around in the same big-boy cloud as some SaaS company, but I'm on the free tier and I explicitly do not want it to scale up forever, and I explicitly do not want to destroy my credit or even think about having to call Amazon over a $100,000 bill because I set my shit up wrong or whatever. I want it to shut off my EC2 instance once it has used up whatever amount of resources is equal to $X.
Obviously any world with this feature would also feature customizable restrictions, options, decision trees, etc etc. I don't think anyone is or was suggesting that someone's SaaS app just gets turned off without their permission.
You can freeze the account vs you can give someone a hundred thousand dollar bill?
If you did this to a vps the instance might be unavailable until the traffic slows down but up after.
Trust me you would rather they freeze your account.
They could add it as an optional limit. If it's on and is exceeded, stop everything. Surely the geniuses at Amazon (no they really are, I'm not joking) can handle it.
What about the space you're using? Do they delete it? Remove all your configurations? Prevent you from doing anything with your account until you up your limit or wait until your month resets?
If you're worried about getting a big bill, and you don't care if it gets shut off when you're not using it, why don't you shut it down yourself?
AWS made the tradeoff to keep the lights on for customers and if there is a huge bill run up unintentionally and you contact them with it they refund it. I've never experienced them not doing this when I've run up five figure bills because of a misconfiguration I didn't understand. I don't think I've ever even heard of them not refunding someone who asked them for a refund in good faith.
What do they do when you don't pay your bill.. they freeze, notify and delete after period of time.
If you try adding files that will result in a larger bill than your limits over the billing period you warn and refuse.
So simple.
How many times has AWS refunded you a five figure bill? I've heard stories from people who got refunded but were told that it would be the first and last time they would get a refund.
I agree that that’s the likely explanation. It just feels infuriating that the services are sold as easy to get started and risk free with generous free tiers, inviting people and companies to try out small projects, yet each small experiment contains an element of unlimited risk with no mitigation tools.
Pass a law requiring cloud compute providers to accept a maximum user budget and be unable to charge more than that, and see how quickly the big cloud providers figure it out.
So tell me again who we need a law? Can you cite one instance where any of the cloud providers refused to give a refund to someone?
The person who signs up for the free tier and is charged.
https://medium.com/%40akshay.kannan.email/amazon-is-refusing...
There is no such thing as “signing up for a free tier” at least there wasn’t before July of this year. Some services have free tiers for a certain amount of time and others have an unlimited free tier that resets every month.
Google App Engine had this when it first started.
You can attach an action to that budget overage that applies a "Deny" to an IAM and limits costly actions (that's for small accounts not in an Org. Accounts with an Org attached also have the option of applying an SCP which can be more restrictive than an IAM "Deny")
This isn't a great answer to the overall issue (which I agree is a ridiculous dark pattern), but I've used Privacy.com cards for personal projects to hard spend at a card level so it just declines if it passes some threshold on a daily/weekly/monthly/lifetime basis. At work, I do the same thing with corporate cards to ensure the same controls are in place.
Now, as to why they're applying the dark pattern - cynically, I wonder if that's the dark side of usage/volume based pricing. Once revenue gets big enough, any hit to usage (even if it's usage that would be terminated if the user could figure out how) ends up being a metric that is optimized against at a corporate level.
> The only conclusion I can come to is that these services are simply not made for small experimental projects, yet I also don’t know any other way to learn the services except by setting up toy projects
Yeah, I'm sure this is it. There is no way that feature is worth the investment when it only helps them sell to... broke individuals? (no offense. Most individuals are broke compared to AWS's target customer).
Those are not in fact limits:
> There can be a delay between when you incur a charge and when you receive a notification from AWS Budgets for the charge. This is due to a delay between when an AWS resource is used and when that resource usage is billed. You might incur additional costs or usage that exceed your budget notification threshold before AWS Budgets can notify you, and your actual costs or usage may continue to increase or decrease after you receive the notification.
As far as I know, neither Google, Amazon or Azure have a budget limit, only alerts.
This is a reason why I am not only clueless of anything related to cloud infrastructure unless it's stuff I am doing on the job, nor I am willing to build anything on these stacks.
And while I guess I have less than 10 products build with these techs, I am appeal by the overall reliability of the services.
Oh lastly, for Azure, in different European regions you can't instance resources, you need to go through your account representative who asks authorization from the US. So much for now having to deal with infrastructure pain. It's just a joke.
I've used Azure with spending limits. They do work, they shut down things, and the lights go off. [1], Only some external resources you are unlikely to use don't follow spending limits, but when you create such resources, they are clearly marked as external.
That's one positive side of Azure.
[1]: https://learn.microsoft.com/en-us/azure/cost-management-
These limits are only for subscriptions with a credit amount e.g. $200 trials, Visual Studio subscriptions etc. As soon as you are on a pay as you go, you only have access to budget limit.
As others have said these are not limits, just notifications. You can’t actually create a limit unless you self create one using another AWS service (surprise) like lambda to read in the reports and shut things down.
And as others have also mentioned, the reports have a delay. In many cases it’s several hours. But worst case, your CURs (Cost usage reports) don’t really reflect reality for up to 24 hours after the fact.
I work in this space regularly. There can be a delay of 2-3 days from the event to charge. Seems some services report faster than others. But this means by the time you get a billing alert it has been ongoing for hours if not days.
"Limits" like this are how I woke up one Sunday morning in my college dorm with a $7k bill from dreamhost.
To all of those who say "this is not limit, only notifications": yes, notifications that can trigger whatever you want, including a shutdown of whatever you have
Is this a perfect solution: no Is this still a solution: yes
The notifications are not real time. You can rack up a significant bill before they are triggered.
To paraphrase Rainer Wolfcastle - the budgets do nothing!
You get a warning. There's no service cutoffs or hard limits on spending.
If you sign up for electrical service for your house, and your shithead neighbor taps your line to power his array of grow lamps and crypto mining rigs, the power company will happily charge you thousands of dollars, and you will need a police report and traverse many layers of customer service hell to get a refund. If you sign up for water service and a tree root cracks your pipe, the water company will happily charge you thousands of dollars for the leaked water, and will then proceed to mandate that you to fix the broken pipe at your own expense for a couple tens of thousands more; and yes, that may well bankrupt you, water company don't care. So why do you expect different treatment from a computing utility provider?
> If you sign up for electrical service for your house, and your shithead neighbor taps your line to power his array of grow lamps and crypto mining rigs, the power company will happily charge you thousands of dollars
Unlike cloud services, your electrical service has a literal circuit breaker. Got a regular three-phase 230V 25A hookup? You are limited to 17.25kW, no way around that. If that shithead neighbor tries to draw 50kW, the breaker will trip.
If it were the cloud, the power company would conveniently come by to upgrade your service instead. A residential home needing a dedicated 175MW high-voltage substation hookup? Sure, why not!
Water leaks, on the other hand, tend to be very noticeable. If a pipe bursts in the attic you'll end up with water literally dripping from the ceiling. It is very rare to end up with a water leak large enough to be expensive, yet small enough to go unnoticed. On the other hand, the cloud will happily let your usage skyrocket - without even bothering to send you an email.
There are plenty of compute service providers working with a fixed cap, a pre-pay system, or usage alerts. The fact that the big cloud providers don't is a deliberate choice: the goal is to make the user pay more than they wanted to.
You're right, but even if I cut the water pipe right after the meter and run it for a month I might get a few thousand dollar charge.
You can ring up tens of thousands+ overnight with AWS. The scale of potential damages is nowhere even close.
In addition to everything that's already been mentioned, another obvious difference is that energy and water are finite resources that are already provided at relatively low margins. Cloud services are provided at obscene gross margins. The numbers are all made-up and don't reflect the actual costs in providing those services.
I don't know in US, but having limits on how much electricity a house is able to take from the gride is absolutely something in some countries out there.
Definitely in the US too, I'm not resident either, but your 100A or whatever supply is a hard limit on what it can cost you per time period.
At least in my country the metering is done _in_ the house so my neighbour has to break and enter to tap the line behind the meter. I would probably notice well before bills would pile up. If he taps it outside, probably no one would ever notice if done right. The grid looses energy all the time. Not every kWh that goes into the network is billed in the end.
As always, it just doesn’t make an awful lot of sense to compare physical and virtual worlds. As in leaving your front door unlocked in rural areas vs not securing your remote shell access.
The difference is this actually happens, a lot, unlike your straw man. It happens enough that there's a website dedicated to it.
For your scenarios, I have the police, the public service commission, utility regulators, my elected officials and homeowners insurance to potentially help. Not that it always works, not that it's easy, quick or without pain, but there are options.
For the cloud, I have the good will of the cloud provider and appealing to social media. Not the same thing.
The first instance is difficult to fix as crime can often involve substantial losses to people and often there's no route to getting a refund.
The broken water pipe should be covered by buildings insurance, but I can imagine it not being covered by some policies. Luckily a broken water pipe is likely not as expensive as not having e.g. third party liability protection if part of your roof falls off and hits someone.
Amazon refunded you and you hate them for it?
I think one of the reasons I appreciate AWS so much is that any time there has been snafu that led to a huge bill like this they've made it pretty painless to get a refund- just like you experienced.
If it is a "free tier", Amazon should halt the application when it exceeds quota. Moving the account to a paid tier and charging $100k is not the right thing to do.
Yes. They said it was free then they surprise charge you $100k.
That’s an insane amount of both money and stress. You’re at Amazon’s mercy if they will or will not refund it. And while this is in process you’re wondering if your entire financial future is ruined.
I have never in 8 years of being in the AWS ecosystem and reading forums and Reddits on the internet had anyone report that AWS wouldn’t refund their money.
If you go over your budget with AWS, what should AWS do automatically? Delete your objects from S3? Terminate your databases and EC2 instances? Besides, billing data collection doesn’t happen anywhere near realtime, consider it a fire hose of streaming data that is captured asynchronously.
> If you go over your budget with AWS, what should AWS do automatically? Delete your objects from S3? Terminate your databases and EC2 instances?
Why not simply take the service offline once it reaches the free tier limit??
The reason why is that AWS is greedy, and would rather force you to become a paid customer…
How do you take your S3 service offline when they charge for storage or your EBS volumes? Your databases?
Block access to the service until the next billing period starts, or the user upgrades to a paid tier.
Provide the user the tools to make these choices. Give the option to explicitly choose how durable to extreme traffic you want to be. Have the free tier default to "not very durable"
Bam, you said. They’d do it if they cared, but they don’t and prefer the status quo. 100k surprise bill is the type of thing people kill themselves over. Horrific
You mean like having a billing alert send an event that allows you to trigger custom actions to turn things off? That already exists. It has for years.
So why isn't it the default yet? Why isn't unlimited scaling something you have to turn on?
I agree, but I could also see how someone would complain about that: “Our e-commerce site was taken down by Amazon right on our biggest day of the year. They should have just moved us up to the next tier.”
Then let that be the non default option.
The default option is always going to be the one that makes the majority of Amazon's paying customers happy.
Maybe offer 'Sales day rush auto-scale' as a setting.
Seems like the most flexible option is to put a spending limit in place by default and make it obvious that it can affect availability of the service if the limit is reached.
My credit cards have credit limits, so it makes sense that a variable cost service should easily be able to support a spending limit too.
That would get caught during the pre-peak stress testing.
You do do stress testing ahead of peak season, right?
Good news! This is exactly how the free tier works now.
You're misunderstanding the offering. (Maybe that's their fault for using intentionally misleading language... but using that language in this way is pretty common nowadays, so this is important to understand.)
For a postpaid service with usage-based billing, there are no separate "free" and "paid" plans (= what you're clearly thinking of when you're saying "tiers" here.)
The "free tier" of these services, is a set of per-usage-SKU monthly usage credit bonuses, that are set up in such a way that if you are using reasonable "just testing" amounts of resources, your bill for the month will be credited down to $0.
And yes, this does mean that even when you're paying for some AWS services, you're still benefitting from the "free tier" for any service whose usage isn't exceeding those free-tier limits. That's why it's a [per-SKU usage] tier, rather than a "plan."
If you're familiar with electricity providers telling you that you're about to hit a "step-up rate" for your electricity usage for the month — that's exactly the same type of usage tier system. Except theirs goes [cheap usage] -> [expensive usage], whereas IaaS providers' tiers go [free usage] -> [costed usage].
> Amazon should halt the application when it exceeds quota.
There is no easy way to do this in a distributed system (which is why IaaS services don't even try; and why their billing dashboards are always these weird detached things that surface billing only in monthly statements and coarse-grained charts, with no visibility into the raw usage numbers.)
There's a lot of inherent complexity of converting "usage" into "billable usage." It involves not just muxing usage credit-spend together, but also classifying spend from each system into a SKU [where the appropriate bucket for the same usage can change over time]; and then a lot of lookups into various control-plane systems to figure out whether any bounded or continuous discounts and credits should be applied to each SKU.
And that means that this conversion process can't happen in the services themselves. It needs to be a separate process pushed out to some specific billing system.
Usually, this means that the services that generate billable usage are just asynchronously pushing out "usage-credit spend events" into something like a log or message queue; and then a billing system is, asynchronously, sucking these up and crunching through them to emit/checkpoint "SKU billing events" against an invoice object tied to a billing account.
Due to all of the extra steps involved in this pipeline, the cumulative usage that an IaaS knows about for a given billing account (i.e. can fire a webhook when one of those billing events hits an MQ topic) might be something like 5 minutes out-of-date of the actual incoming usage-credit-spend.
Which means that, by the time any "trigger" to shut down your application because it exceeded a "quota" went through, your application would have already spent 5 minutes more of credits.
And again, for a large, heavily-loaded application — the kind these services are designed around — that extra five minutes of usage could correspond to millions of dollars of extra spend.
Which is, obviously, unacceptable from a customer perspective. No customer would accept a "quota system" that says you're in a free plan, yet charges you, because you accrued an extra 5 minutes of usage beyond the free plan's limits before the quota could "kick in."
But nor would the IaaS itself just be willing to eat that bill for the actual underlying costs of serving that extra 5 minutes of traffic, because that traffic could very well have an underlying cost of "millions of dollars."
So instead they just say "no, we won't implement a data-plane billable-usage-quota feature; if you want it, you can either implement it yourself [since your L7 app can observe its usage 'live' much better than our infra can] or, more idiomatically to our infra, you can ensure that any development project is configured with appropriate sandboxing + other protections to never get into a situation where any resource could exceed its the free-tier-credited usage in the first place."
Oracle can do it.
stop putting stuff on the internet you don't understand.
You wake up to a bill of one hundred thousand dollars and now it's up to you to dispute it. I think hating them for it is very fair.
Its just numbers on a screen.
If you woke up to the auto-withdrew $100k from your bank account and now you need to get it back that's another story.
Really? You’re not “disputing it”. You were charged fair and square. You send an email to their customer support and they say “no problem” and help you prevent it in the future.
And what if they don't say "no problem"? Like the Netlify case where they at first offered a reduced bill (which was still a lot) before the post got viral and the CEO stepped in.
Then don’t use a service that has a reputation for poor customer service?
Like they use to say about IBM…
“No one ever got fired for choosing AWS (or Azure)”
Amazon is currently permissive which splits opposition, this won’t always be the case, they will tighten the screws eventually as they have done in the past in other areas. Amazon because it’s so broadly used undermines the utility of chargebacks, you can do it but it’ll be a real hassle to not be able to use Amazon for shopping. A lot of people will just eat the costs, is Amazon knows this they will force the situation more often because it’ll make them more money.
AWS has been very liberal about refunds and credits because of mistakes since 2006.
Every company was for 10 or 15 years... until they weren't.
Particularly the other side of Amazon.
That was also true for returned goods - though admittedly scammers did ruin that for everyone.
I'd rather they didn't have the option, I'd rather not stake my financial future on the good graces of Amazon.
Amazon is irresponsible when they let people sign up for a unlimited credit.
At minimum they should provide hard billing caps.
putting stuff on the internet is dangerous. if you're not prepared to secure public endpoints stop creating them.
Putting stuff on the internet is dangerous, but the absence of hard caps is a choice and it just looks like another massive tech company optimizing for their own benefit. Another example of this is smartphone games for children, it's easier for a child to spend $2,000 then it is for a parent to enforce a $20/month spending limit.
Are you really comparing a software developer provisioning an online service to a child buying tokens for loot boxes?
More the "dark pattern" of empowering unlimited spending and then what keeps on unfolding from it.
No it isn't. There are many ways to put stuff on internet with guaranteed max costs.
blaming the victim? stay classy.
intentionally allowing huge billing by default is scummy, period.
Yes, you as a developer should know something about how the service works before you use it. I first opened the AWS console in 2016 and by then I had read about the possible gotchas.
Well, people get informed by reading these stories. So let's keep informing people to avoid AWS.
Yes I’m sure large corporations and even startups are going to leave AWS because a few junior devs didn’t do their research.
You do know that large corporations and startups employ junior devs as well, right?
All else being equal, would you rather choose the platform where a junior dev can accidentally incur a $1M bill (which would already bankrupt early startups), or the platform where that same junior dev get a "usage limits exceeded - click here to upgrade" email?
As the saying goes, when you owe the bank $100 you've got a problem, when you owe the bank $100k the bank has a problem...
On serverless, I can enter numbers in a calculator and guess that running my little toy demo app on AWS will cost between $1 and $100. Getting hit with a huge $1000 bill and a refusal to refund the charges (and revocation of my Prime account and a lifetime ban from AWS and cancellation of any other services I might otherwise run there) would be totally possible, but I have zero control over that. Expecting to go on social media begging for a refund is not a plan, it's evidence of a broken system - kinda like those "heartwarming" posts about poor people starting a GoFundMe so their child can afford cancer treatment. No, that's awful, can we just be sensible instead?
If a server would have cost me $20 at a VPS provider to keep a machine online 24/7 that was at 1% utilization most of the time and was terribly laggy or crashed when it went viral, that's what $20 buys you.
But, you say, analysis of acttual traffic says that serverless would only cost me $10 including scaling for the spike, in which case that's a fantastic deal. Half price! Or maybe it would be $100, 5x the price. I have no way of knowing in advance.
It's just not worth the risk.
> (and revocation of my Prime account and a lifetime ban from AWS and cancellation of any other services I might otherwise run there)
Also a vital lesson from the big tech companies that sell a wide variety of services: don't get your cloud hosting from a company that you also use other services from.
I had to disable photo syncing because Google photos eats up my Gmail space. Having Amazon's cloud billing fuckup threaten your TV access is another level.
We clearly need to keep the option open to burn those bridges.
In any case, if I ever host anything, I'm going to host it from my home.
You haven’t been able to use your Amazon retail account to open an AWS account for years. You don’t “beg”. You just send them an email and they say “yes”.
They are economic realists about this. They say "yes" if they can't realistically get $100k from you for your error.
I have never in 9 years working with AWS - four at product companies as the architect, 3.5 at AWS itself working in the Professional Services department and the last two working at 3rd party companies - ever heard or read about anyone either on a personal project or a large organization not be able to get a refund or in the case of a large org, sometimes a credit from AWS when they made a mistake that was costly to them.
I haven’t even seen reports on subreddits
From your bragging one could tell that you have seen _a lot_ of charging mistakes and "happy" refund stories from AWS. It's scary that a single human can do extensive statistics on personal experience about these monetary horror stories, don't you think?
So can you find any anecdotes even on Reddit where a student or hobbyist asked for a refund and was refused?
I assume you have seen many casual instances of cost overrun in that time. I'm sure you've also seen instances where an extra $10k flies out the door to AWS and people think "no big deal, that one was on us." This world doesn't have to exist. Even if AWS has a policy of always refunding people for a big oopsie, the fact that you have seen so many big ones suggests that you have also seen a lot of little ones.
By the way, there is nothing stopping AWS from reversing their trend of issuing refunds for big mistakes. "It hasn't happened in the past" isn't a great way to argue "it won't happen in the future."
Yes and I’ve also seen bad on prem build outs, bad hires, bad initiatives, proof of concepts that didn’t go anywhere, etc
Sure. The issues with AWS could all be solved with decent billing software, though. 15 years in there isn't a good excuse for this state of the world except that it's profitable.
You can set up billing alerts to trigger actions that stop things when they trigger. The easiest way is to take permissions away from the roles you create.
They give you the tools. It’s up to you to use them. If that’s too difficult, use the AWS LightSail services where you are charged a fixed price and you don’t have to worry about overages or the new free tier
https://aws.amazon.com/free/
Because despite what everyone here is saying, before July of this year, there was no such thing as a free tier of AWS, there was a free tier of some of their services
Instead of lightsail, I'll use digital ocean. It's a cheaper way to get undifferentiated VMs.
If it is so easy to set up these automations, Amazon can easily set up this automation for you. Ask yourself why they don't.
$1 cheaper per month for the lowest performance VM on both?
https://medium.com/%40akshay.kannan.email/amazon-is-refusing...
https://www.reddit.com/r/aws/comments/rm8t2j/any_experience_...
Both of these are about an account compromise, which is a really fascinating story about incentives. An accidental overrun on something you designed on AWS indicates you are hooked on their drugs, so obviously the dealer is happy to give you another free hit after you had a bad trip. That's good marketing. An account compromise has no intention, so giving you a refund is just a waste.
Must be really nice people there which don't want any money. Really warms my heart.
ofc. When things go viral they say "yes". But i would really love to get some number how many students and hobbiests got a 1k-2k bill and just paid it for the problem to go away.
Amazon is a publicly traded company. If they wave fees every time something goes wrong, investors would tell them something.
AWS and all of the other cloud providers gives millions in credits each year for migrations and professional services for both their inside professional services department and third party vendors. The reputational risk for them to go after some poor student isn’t worth it to them. The same is true for Azure and GCP.
Have you read one even anecdotal case where AWS didn’t immediately give a refund to a student who made a mistake just by them asking?
I think it is more about having sane limits. If someone signs up for a free account, they probably aren't expecting to be on the hook for $100K+.
You are a bit naive. They are making a ton of money with this dark pattern. As others have said Free-to-100K is not in the most generous realm of expectations. Its also why they have been doing the refunds as long as AWS has been a thing. They know it will not hold up in court. Not a month goes by without some HN story about something like this post.
They do this and make it easy to get a refund because for every demo account that does it some bigger account accidentally gets billed 10K and they have to pay for it. They have skin in the game and cannot risk their account being down for any time period.
I have had a “larger account” when I was at startup and was able to ask for a refund for a business.
As I asked before, if what is causing overages is not web requests but storage should they just delete everything?
Not sure what you define as larger account.
Counter real world example. I was doing some consult work for a place that had a 9k unexpected charge on AWS. They asked about it and I explained about how they could dispute it. They said ugh never-mind and just paid it. FYI it was a charity which I've since learned its common for charities to be wasteful like this with money since they are understaffed and OPM(Other peoples money)
So how is that a counter example? The client never asked for a credit. Since the startup I worked for, I have been working in AWS consulting - first directly at AWS (Professional Services) and now a third party consulting company.
While I have no love for AWS as an employer, I can tell you that everyone up and down the chain makes damn sure that customers don’t waste money on AWS. We were always incentivized to find the lowest cost option that met their needs. AWS definitely doesn’t want customers to have a bad experience or to get the reputation that it’s hard to get a refund when you make a mistake.
Of course AWS wanted more workloads on their system.
Semi related, AWS will gladly throw credits at customers to pay for the cost of migrations and for both inside and outside professional services firms to do proof of concepts.
You, but shorter: It can't be done perfectly in 100.0% of all possible circumstances, so better to do absolutely nothing at all. On an unrelated note, this strongly aligns with their economic interests.
For storage specifically, in that circumstance, if you weren't hellbent on claiming otherwise: it's easy to figure out what to do. For storage: block writes and reach out to the customer. Also, people are extremely unlikely to accidentally upload eg 250tb which is how you'd get to, say, $200/day. Whereas similar bills are extremely easy to accidentally create with other services.
It's totally reasonable to want spend limits firmer than AWS' discretion, which they can revoke at any point in time for any reason.
If it's a free tier there should never have been a charge in the first place...
What would the would “tier” mean here? There is a US tax bracket (tier) where no tax is due on long-term capital gains. That doesn’t mean it’s wrong when I pay long-term capital gains.
There's an expectation when it comes to consumer goods, and even protection in most jurisdictions, that you can't simply charge someone for something they don't want. It's like dropping a Mercedes at someone's house then charging them for it when they never wanted or asked for it. Allowing a "free" tier to run up so much traffic that it becomes a $100k bill is ridiculous and probably illegal.
Taxes are different because they never exceed the amount the person paying the taxes receives.
Once I've been kidnapped by a guy who also happen to run a security business. After a bit of discussion, I was about to convince some of his sbire to release me without paying the ransom. I'm so glad they did accept that, and I never fail to use and recommend the services of the security business now.
Since this seems to be getting some comments. Yes, it is in fact easy to shut down an instance if it goes over a spending limit. As in you monitor traffic tied directly to the billing system and you set up an if statement and if it goes over the limit you shut down the server and dump the service to a static drive.
It's the easiest thing in the world - they just don't want to because they figured that they could use their scale to screw over their customers. And now you have the same guys who screwed everyone over with cloud compute wanting you to pay for AI by using their monopoly position to charge you economic rents. Because now things like edge compute is easy because everyone overspent on hard drives because of crypto. And so you have jerks who just move on to the next thing to use their power to abuse the market rather than build credibility because the market incentivizes building bubbles and bad behavior.
Smart evil people who tell others "no you're just too dumb to 'get it' (oh by the way give me more money before this market collapses)" are the absolute bane of the industry.
It's weird that you have people in here defending the practice as if it's a difficult thing to do. Taxi cabs somehow manage not to charge you thousands of dollars for places you don't drive to but you can't set up an if statement on a server? So you're saying Amazon is run by people that are dumber than a taxi cab company?
Ok, well you might have a point. And this is how Waymo was started. I may or may not be kidding.
Not defending the practice of the cloud providers but you’re oversimplifying the difficulty involved.
I've got a $25k bill right now because I had enabled data-plane audit logging on an sqs queue that about a year ago I had wired to receive a real-time feed of audit events. So for every net-new audit event there would be an infinite loop of write events to follow. My average daily bill is about $2 on that account and has been for nearly ten years. It suddenly ballooned to $3k/day and zero warning or intervention from AWS.
> I had them refund the bill (as in how am I going to pay it?) but to this day I've hated Amazon with a passion
They refunded you $100k with few questions asked, and you hate them for it?
I’ve made a few expensive mistakes on AWS that were entirely my fault, and AWS has always refunded me for them.
I imagine if Amazon did implement “shut every down when I exceed my budget” there’d be a bunch of horror stories like “I got DDOSed and AWS shutdown all my EC2s and destroyed the data I accidentally wrote to ephemeral storage.”
> They refunded you $100k with few questions asked, and you hate them for it?
They exposed him to 100K of liability without any way to avoid it (other than to avoid AWS entirely), and then happened to blink, in this case, with no guarantee that it would happen again. If you don't happen to have a few hundred thousand liquid, suddenly getting a bill for 100K might well be a life-ruiningly stressful event.
Yes, he could have set up a billing alert that triggered an action to shut everything down. Easy way is to take away privileges from the IAM roles attached to the processes.
Bad design if that isn't in place for a new free-tier experiment.
This is the problem right here. I moved from AWS and specifically Beanstalk because I don't want to be some "certified AWS goblin". I just wanted to host something sensibly.
Other hosting companies don't have this problem and while I cannot complain about AWS as a service, this can be improved if there would be the will to do so. I believe there are other incentives at work here and that isn't a service to the customer.
He hit you in the face? But girl, he apologized! Best boyfriend ever.
You don't get it! He normally isn't like that!
Given how complicated configuring AWS is, surely there could be some middle ground between stop all running services and delete every byte of data. The former is surely what the typical low spend account would desire.
Well, shutting down is the obvious choice if you are getting DDOSed. The alternative is infinite potential debt. That's the real horror.
In what world is that not the preferable solution? Want to know if your shit is actually robust just set your cap and ddos yourself as the first test of you architecture.
Yes, a sign of resilient architecture is to shut down when it encounters some stress.
Consider it like a crumple zone in a car.
Oops you hit a pothole now your car won't go.
Terrible analogy.
I’ve never trusted AWS with personal work for exactly this reason. If I want to spend $20 on a personal project I should be able to put a cap on that directly, not wake up to a $100k bill and go through the stress of hoping it might be forgiven.
> When I was learning to program through a bootcamp I spun up an elastic beanstalk instance that was free but required a credit card to prove your identity. No problem that makes sense
It does on the surface, but what doesn't make sense is to register with a credit card and not read the terms very carefully: both for the cloud service and for the bank service.
In this aspect cash is so much better because you have only one contract to worry about...
I use AWS out of expedience but I hate the no-hard-cap experience and this is my primary reason for shifting (WIP) to self hosting. Plus self hosting is cheaper for me anyway. In general I would like a legally forced liability limit on unbounded subscription services, perhaps a list maintained at the credit card level. If the supplier doesn’t like the limit they can stop supplying. The surprise $100K liabilities are pure insanity.
I would never use a cloud service that doesn't let me set a hard cap for any service. Not just an alert. A hard cap.
Which cloud service does this?
I have no idea. I know Azure does for the student/msdn/and similar accounts which are the only cloud services I use for personal projects. So I Azure doesn’t even have my credit card.
Cloud Run lets you cap the number of instances when you create a service. So you can just set max_instances to 1 and you never have to worry about a spambot or hug of death from blowing up your budget. I run all my personal sites like this and pay (generally) nothing.
> When I was learning to program through a bootcamp I spun up an elastic beanstalk instance
Didn't the bootcamp told you to, at least, setup a budget alert?
I'm not trying to reduce AWS' responsibility here, but if a teaching program tells you to use AWS but doesn't teach you how to use it correctly, you should question both AWS and the program's methods.
It’s interesting because on the posted site there’s only 2 AWS posts on the main page and they’re rather mild compared to the other posts using google, vercel, cloudformation, etc.
> When I was learning to program through a bootcamp I spun up an elastic beanstalk instance that was free but required a credit card to prove your identity.
Is it just me or is this just a cheap excuse to grab a payment method from unsuspecting free-tier users?
AWS services aren't designed for people just learning to program. Beanstalk and other services have billing limits you can set, but those aren't hard limits because they are measured async to keep performance up.
With that said, AWS is notoriously opaque in terms of "how much will I pay for this service" because they bill so many variable facets of things, and I've never really trusted the free tier myself unless I made sure it wasn't serving the public.
As does Lightsail…
Not that Amazon needs any defending, but for anyone running a bootcamp: this is a good reason to start with services like Heroku. They make this type of mistake much harder to make. They're very beginner friendly compared to raw AWS.
> required a credit card to prove your identity
Given the relative accessibility of stolen credit card info, isn't the CC-as-ID requirement easy for a criminal to bypass?
It's easy yes, but better than nothing. The verification requirements are a balance between desired conversion rate, probability of loss (how many bad guys want to exploit your system without paying) and the actual costs of said loss (in this case it's all bullshit "bandwidth" charges, so no actual loss to AWS).
Something similar happened to me, but not at the outrageous scale. I wanted to try some AI example on Bedrock. So the tutorial said I needed to set up some OpenSearch option. Voila. A few days later I had a bill for $120. The scale is not as horrible, but the principle is the same.
That’s why I prefer prepaid cards or those I can easily freeze to prevent any booking.
Freezing a card doesn’t mean the debt is erased. They can still take you to collections.
But it’s a difference to object their claims while you still have your money instead of being in debt while you try to get your money back
"your honor, they provided the credit card to prove their identity for the free plan and now we want to collect 100k"
And then they pull out the invoice where they prove without any doubt that you actually used pay-per-use services and ran up a 100k bill because you failed to do any sort of configuration.
I didn't use them, some bots did. Sort it out with them.
> I didn't use them, some bots did. Sort it out with them.
For you to put together this sort of argument with a straight face, you need to have little to no understanding of how the internet in general, services and web apps work. Good luck arguing your case with a judge.
I’ve not read the fine print but I’d be worried that there would be something in there that allows this.
There are light-years between what a company thinks their ToS “allow” and what a court would actually determine is allowed. Plenty of ToS clauses in major contracts are unenforceable by law.
In this situation if it were to actually go to court I’d expect the actual bill to get significantly reduced at the very least. But because it’s a completely bullshit charge based on bandwidth usage (which costs them nothing) it won’t go anywhere near that and will just get written off anyway.
Courts can be rather capricious, I’d rather avoid them as best as possible, even if you are likely to win having to fight something like this in court is punishing.
If your card is declined and they don't feel like forgiving the bill, won't they just send debt collectors after you instead?
Yes, but it's better they need to get their money than you need to get your money back. 100.000 easily can put you in ruining debt. It's the better position to still have your money even if you have to pay.
> Amazon then charged me one hundred thousand dollars as the server was hit by bot spam.
That would make you one of the most successful websites on the internet, or the target of a DDoS -- which was it? I assume you're not saying that "bots" would randomly hit a single, brand-new "hello world" site enough to generate that kind of bill.
Many of the people who have this problem on toy websites end up offering what amounts to free storage or something similar. They are then surprised when "bots" come to "DDoS" them. These bills are as much a product economics problem as a technical one.
> cloud computing I'd use anyone else for that very reason. The entire service with it's horrifically complicated click through dashboard just to confuse the customer into losing money.
I feel like this brand of sentiment is everywhere. Folks want things simple. We often figure out what we need to do to get by.
Over time we learn the reason for a handful of the options we initially defaulted through, find cause to use the options. Some intrepid explorers have enough broader context and interest to figure much more out but mostly we just set and forget, remembering only the sting of facing our own ignorance & begrudging the options.
This is why k8s and systemd are have such a loud anti-following.
> Amazon then charged me one hundred thousand dollars
> as in how am I going to pay it?
Really?
Amazon charged your card for $100,000 and your bank allowed it?
You're filthy rich by most people's standard, and you were able to pay it.
Amazon was operating in such a good faith that they ate the computational cost you spent. And you hate them for this to this day...
Let’s be real: OP needed that money way more than Amazon does.
You had a credit card with not only a $100k+ limit, but allowed a single $100k transaction on it?
I call bullshit
allways set cost alarm and max spending. AWS has great tools to controll costs. You could have blocked this with good config but I understand its confusing and not super apparent. IMHO there should be a pop up or sth asking " you want to stop the instance the moment it costs anything?"
its so easy to get billed a ridicules amount if money
Did you do any training before launching the elastic beanstalk instance, or you just though a F-16 should be pretty easy to fly, at least according to most pilots?
An F-16 doesn't have a prominently-featured "getting started" tutorial, which has a bunch of step-by-step instructions getting a complete novice 40.000ft into the air at mach 2.
AWS also provides training and education on how to use their services. If launching a "hello world" Elastic Beanstalk instance is so dangerous, why doesn't the tutorial require you to first provide proof that you are an AWS Certified Cloud Practitioner?
c’mon mate, be real. aws is absolute shit for beginers, this is such a bad comment
You are attributing to greed what can easily be explained by just not giving an f. They don't care that much about small customers.
> The entire service with it's horrifically complicated click through dashboard (but you can get a certification! It's so complicated they invented a fake degree for it!) just to confuse the customer into losing money.
By that logic, any technology that you can get certified in is too complicated?
Most systems are now distributed and presenting a holistic view of how it was designed to work can be useful to prevent simple mistakes.
Traffic requires a certification (license) too. Must be a fake degree as well because they made it too complicated
> By that logic, any technology that you can get certified in is too complicated?
That is a common view in UX, yes. It's a bit of an extreme view, but it's a useful gut reaction
> Traffic requires a certification (license) too. Must be a fake degree as well because they made it too complicated
In the US roads are designed so that you need as close to no knowledge as possible. You need to know some basic rules like the side of the road you drive on or that red means stop, but there is literal text on common road signs so people don't have to learn road signs. And the driving license is a bit of a joke, especially compared to other Western countries
There is something to be said about interfaces that are more useful for power users and achieve that by being less intuitive for the uninitiated. But especially in enterprise software the more prevalent effect is that spending less time and money on UX directly translates into generating more revenue from training, courses, paid support and certification programs
> By that logic, any technology that you can get certified in is too complicated?
In IT, I am inclined to agree with that. In real engineering, it's sometimes necessary, especially dangerous technology and technology that people trust with their life
> dangerous technology and technology that people trust with their life
Software runs on so many things we depend on IMO it also in many cases falls in the "dangerous technology" category.
Non-hobby OSes, non-hobby web browsers, device drivers, software that runs critical infrastructure, software that runs on network equipment, software that handles personal data, --IMHO it would not be unreasonable to require formal qualifications for developers working on any of those.
If I go buy a TIG welder, use it without any training, leave it on and go get coffee, do I get to complain that I have to pay for a new house?
Sorry, I do not understand. What is your point?
The history of making things complicated often involves "unintended" use by malicious actors.
But infact, it is intended side effects. Things like Jaywalking or "no spitting" laws let police officers harass more people _At their whim_. And they're fullying designed that way but left as "unintended" for the broader public scrutiny.
So, just like, learn that "logic" is not some magic thing you can sprinkle on everything and find some super moral or ethic reality. You have to actually integrate the impact through multiple levels of interaction to see the real problem with "it's just logic bro" response you got here.
The problem with the AWS certificate is that the entity issuing the certificate and the entity honoring the certificate have opposing priorities. When a company wants to use AWS, preferably they'd want to avoid needlessly expensive solutions and vendor lock-in, while Amazon wants to teach people how to choose needlessly expensive solutions with vendor lock-in.
It is a fake degree.
Not really. I think he's saying complicated for a cloud server. I don't think you can get degrees in digitalocean set up.
These "refund after overcharge" things are not without benefit to the corporations.
They get a nice tax write-off.
It's couch-cushion change for them, but it adds up. They have whole armies of beancounters, dreaming this stuff up.
It's also the game behind those "coupons" you get, for outrageously-priced meds that aren't covered by insurance.
If they charge $1,000 for the medication, but give you a "special discount" for $90%, they get to write off $900.
I’m fairly certain that’s incorrect.
Businesses are only taxed on actual revenue earned.
What you decide to charge—whether $100, $50, or even giving it away for free—is purely a business decision, not a tax one.
—
This is different from a nonprofit donation scenario though. For example, if your service normally costs $X but you choose to provide it for free (or at a discount) as a donation to a non-profit, you can typically write off the difference.
You may be right (this is not my forte), but the invoice is real. So is the forgiveness. I don’t see how the IRS can legitimately deny a write-off.
I’ve heard stories like this, many times, from businesses people.
They certainly believe in the pattern.
> Businesses are only taxed on actual revenue earned.
I don't want to go too far down the rabbit hole of hn speculation, but if another entity owes you 100k, and they go bankrupt, there absolutely are tax implications.
Agreed … but that is a different situation.
That is a lack of payment situation.
Revenue was still earned (and charged) … and since you never collected revenue then you don’t pay taxes.
Actually, I described two different things.
The second one (the prescription one) may well be wrong. It’s total speculation, on my part.
The first one, though, I’m pretty sure is right.
I love the traditional debate tactic, where we find one part of an argument to be wrong, and use that, to say the other part is, too.
It’s no matter. I’m done here, anyway, and yield to you. The fox ain’t worth the chase.
Would the tax implications not just be for whatever it costs on their end, regardless of what the customer was charged?
Smells like fraud.
A lot of things are "fraud" when an individual or small business does it but perfectly normal and considered merely good business acumen when done by a big corporation. Even more so now that the US government is openly for sale (it was always for sale, but before at least they had the decency to pretend it wasn't).
Yeah man the whole industry is like that. OpenAI gets to say they raised X billion dollars and update their valuation but they don't mention that it's all cloud compute credits from a gigantic Corp that owns a huge amount of the business. They claim to be a non-profit to do the research then when they've looted the commons, they switch to for profit to pay out the investors. There's shit like this throughout the industry.
I took a workshop class and was told to setup a track saw. The course didn't bother explaining how to utilize it properly or protect yourself. I ended up losing a finger. I truly hate Stanley Tools with a passion and if I ever need to use another track saw, I'll use someone else.
This analogy would make sense if the saw lacked a basic and obvious safety feature (billing limits) because Stanley profited immensely from cutting your finger off.
What seems like a basic feature to you is a hindrance to me. I don’t want to have to disable “safeguards” all over the place just because of loud and rare complaints.
It's as easy as having a single option:
> Do you want safeguards to be enabled by default, so you have to disable manually those you want to resign from?
OR
> Do you want safeguards to be disabled by default, so you have to enable manually those you want to be in place?
To then rebel against it and say "I lose seconds of my life reading this, I don't want to have to!" would be ridiculous.
Protect yourself how? Most cloud providers don't support any way to immediately abort spending if things get out of hand, and when running a public-facing service there are always variables you can't control.
Even if you rig up your own spending watchdog which polls the clouds billing APIs, you're still at the mercy of however long it takes for the cloud to reconcile your spending, which often takes hours or even days.
Yes, they do. You create resources and you delete resources and if you care about cost you creat alarms and tie them to scripts that automatically delete resources.
It’s basic stuff.
Those alarms can take hours to cut in, because AWS does not report costs in real time
It's true that they can but mostly they don't (particularly with "serverless" services).
> I ended up losing a finger
You forgot to mention Stanley Tools paid for the hospital bill.
No. Stanley Tools owns the hospital and would profit from the operation, but when you said you don't have the money they decided to let you go. Perhaps because legally they would have to anyway, or otherwise they would suffer various legal and reputational consequences.
This is a good analogy. When you use a tool you are responsible for what it does.
I'm a safety inspector. Of course this is much more nuanced than this. One crucial aspect of a tool safety is proper documentation. It's also important who the tool is targeted for. There are different safety standards based on user's competence. Some "tools" will be toys for children, some will be for disabled people including people with intellectual disabilities, some will be for general populace, and only some for trained experts.
If a tool is designed for experts, but you as the manufacturer or distributor know the tool is used by general populace, you know it's being misused every now and then, you know it harms the user AND YOU KNOW YOU BENEFIT FROM THIS HARM, AND YOU COULD EASILY AVOID IT - that sounds like something you could go to jail for.
I think if Amazon was a Polish company, it would be forced by UOKiK (Office of Competition and Consumer Protection) to send money to every client harmed this way. I actually got ~$150 this way once. I know in USA the law is much less protective, it surprises me Americans aren't much more careful as a result when it comes to e.g. reading the terms of service.
I thought this would be about the horrors of hosting/developing/debugging on “Serverless” but it’s about pricing over-runs. I scrolled aimlessly through the site ignoring most posts (bandwidth usage bills aren’t super interesting) but I did see this one:
https://medium.com/@maciej.pocwierz/how-an-empty-s3-bucket-c...
About how you make unauth’d API calls to an s3 bucket you don’t own to run up the costs. That was a new one for me.
> I thought this would be about the horrors of hosting/developing/debugging on “Serverless” but it’s about pricing over-runs.
Agreed about that. I was hired onto a team that inherited a large AWS Lambda backend and the opacity of the underlying platform (which is the value proposition of serverless!) has made it very painful when the going gets tough and you find bugs in your system down close to that layer (in our case, intermittent socket hangups trying to connect to the secrets extension). And since your local testing rig looks almost nothing like the deployed environment...
I have some toy stuff at home running on Google Cloud Functions and it works fine (and scale-to-zero is pretty handy for hiding in the free tier). But I struggle to imagine a scenario in a professional setting where I wouldn't prefer to just put an HTTP server/queue consumer in a container on ECS.
I've had similar experiences with Azures services. Black boxes impossible to troubleshoot. Very unexpected behavior people aren't necessarily aware of when they initially spin these things up. For anything important I just accept the pain of deploying to kubernetes. Developers actually wind up preferring it in most cases with flux and devsoace.
I recently had customer who had smart idea to protect Container Registry with firewall... Breaking pretty much everything in process. Now it kinda works after days of punching enough holes in... But I still have no idea where does something like Container registry pull stuff from, or App Service...
And does some of their suggested solutions actually work or not...
Convince them to add IPv6 and you’ll be set for life
They did!
But they network address translate (NAT) IPv6, entirely defeating the only purpose of this protocol.
It's just so, so painful that I have no words with which I can adequately express my disdain for this miserable excuse for "software engineering".
Every time I've done a cost benefit analysis of AWS Lambda vs running a tiny machine 24/7 to handle things, the math has come out in favor of just paying to keep a machine on all the time and spinning up more instances as load increase.
There are some workloads that are suitable for lambda but they are very rare compared to the # of people who just shove REST APIs on lambda "in case they need to scale."
Is that what people do is test/develop primarily with local mocks of the services? I assumed it was more like you deploy mini copies of the app to individual instances namespaced to developer or feature branch, so everyone is working on something that actually fairly closely approximates prod just without the loading characteristics and btw you have to be online so no working on an airplane.
SST has the best dev experience but requires you be online. They deploy all the real services (namespaced to you) and then instead of your function code they deploy little proxy lambdas that pass the request/response down to your local machine.
It’s still not perfect because the code is running locally but it allows “instant” updates after you make local changes and it’s the best I’ve found.
There are many paths. Worst case, I've witnessed developers editing Lambda code in the AWS console because they had no way to recreate the environment locally.
If you can't run locally, productivity drops like a rock. Each "cloud deploy" wastes tons of time.
Mocks usually don’t line up with how things run in prod. Most teams just make small branch or dev environments, or test in staging. Once you hit odd bugs, serverless stops feeling simple and just turns into a headache.
Yeah, I’ve never worked at one of those shops but it’s always sounded like a nightmare. I get very anxious when I don’t have a local representative environment where I can get detailed logs, attach a debugger, run strace, whatever.
I believe they changed that shortly after that blog post went viral: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3...
I raised that exact same issue to AWS in ~2015 and even though we had an Enterprise support plan, AWS response was basically: well, you problem.
We then ended up deleting the S3 bucket entirely, as that appeared to be the only way to get rid of the charges, only for AWS to come back to use a few weeks later telling us there are charges for an S3 bucket we previously owned. After explaining to them (again) that this way our only option to get rid of the charges, we never heard back.
You have to wonder how many people quietly got burned by that in the 18 years between S3 launching and that viral post finally prompting a response.
Seems an interesting oversight. I can just imagine the roundtable, uhh guys who do we charge for 403? Who can we charge? But what if people hit random buckets as an attack? Great!
> Seems an interesting oversight. I can just imagine the roundtable, uhh guys who do we charge for 403? Who can we charge? But what if people hit random buckets as an attack? Great!
It is amazing, isn't it? Something starts as an oversight but by the time it reaches down to customer support, it becomes an edict from above as it is "expected behavior".
> AWS was kind enough to cancel my S3 bill. However, they emphasized that this was done as an exception.
The stench of this bovine excrement is so strong that it transcends space time somehow.
Even pooper is upset about the stench. Tech is fuckin dumb in the corps, the only logical explanation to me is kickbacks to the CTO or similar.
Pooping at the job is one thing but pooping at the job and trying to sell it as a favor to the customer is a whole different game.
you don't need kickbacks at this level. They're all judged by their 6-month outlook on revenue and their market shares.
This is just obfuscating grift justified by the "well, you own the severless functions!"
> I can just imagine the roundtable
That's the best part!
The devs probably never thought of it, the support people who were complained to were probably either unable to reach the devs, or time crunched enough to not be able to, and who as a project manager would want to say they told their Devs to fix an issue that will lose the company money!
> I reported my findings to the maintainers of the vulnerable open-source tool. They quickly fixed the default configuration, although they can’t fix the existing deployments.
Anyone wanna guess which open source tool this was? I'm curious to know why they never detected this themselves. I'd like to avoid this software if possible as the developers seem very incompetent.
How to destroy your competition. Love it. Also why i dislike AWS. Zero interest to protect their SMB customers from surprise bills. Azure isn't much better but at least they got a few more protections in place.
Same, I was hoping for tales of woe and cloud lock-in, of being forced to use Lambda and Dynamo for something that could easily run on a $20/month VPS with sqlite.
The webflow one at the top has an interesting detail about them not allowing you to offload images to a cheaper service. Which you can probably work around by using a different domain.
> Imagine you create an empty, private AWS S3 bucket in a region of your preference. [...] As it turns out, one of the popular open-source tools had a default configuration to store their backups in S3. And, as a placeholder for a bucket name, they used… the same name that I used for my bucket.
What are the odds?
(Not a rhetorical question. I don't know how the choice of names works.)
The assignment of blame for misconfigured cloud infra or DOS attacks is so interesting to me. There don't seem to be many principles at play, it's all fluid and contingent.
Customers demand frictionless tools for automatically spinning up a bunch of real-world hardware. If you put this in the hands of inexperienced people, they will mess up and end up with huge bills, and you take a reputational hit for demanding thousands of dollars from the little guy. If you decide to vet potential customers ahead of time to make sure they're not so incompetent, then you get a reputation as a gatekeeper with no respect for the little guy who's just trying to hustle and build.
I always enjoy playing at the boundaries in these thought experiments. If I run up a surprise $10k bill, how do we determine what I "really should owe" in some cosmic sense? Does it matter if I misconfigured something? What if my code was really bad, and I could have accomplished the same things with 10% of the spend?
Does it matter who the provider is, or should that not matter to the customer in terms of making things right? For example, do you get to demand payment on my $10k surprise bill because you are a small team selling me a PDF generation API, even if you would ask AWS to waive your own $10k mistake?
How about spending caps / circuit breakers? Doesn't seem an unsolveable problem to me.
Then you’re the person who took down their small business when they were doing well.
At AWS I’d consistently have customers who’d architected horrendously who wanted us to cover their 7/8 figure “losses” when something worked entirely as advertised.
Small businesses often don’t know what they want, other than not being responsible for their mistakes.
Everyone who makes this argument always assumes that every website on the internet is a for-profit business when in reality the vast majority of websites are not trying to make any profit at all, they are not businesses. In those cases yes absolutely they want them to be brought down.
Or instead of an outage, simply have a bandwidth cap or request rate cap, same as in the good old days when we had a wire coming out of the back of the server with a fixed maximum bandwidth and predictable pricing.
There are plenty of options on the market with fixed bandwidth and predictable pricing. But for various reasons, these businesses prefer the highly scalable cloud services. They signed up for this
Every business has a bill they are unprepared to pay without evaluating and approving budget, even under successful conditions and even if that approval step is a 10 second process. It's obvious that Amazon does not add this because of substantial profit over any other concern.
The solution is simple: budget caps.
Yes and no. 100% accurate billing is not available in realtime, so it's entirely possible that you have reached and exceeded your cap by the time it has been detected.
Having said that, within AWS there are the concepts of "budget" and "budget action" whereby you can modify an IAM role to deny costly actions. When I was doing AWS consulting, I had a customer who was concerned about Bedrock costs, and it was trivial to set this up with Terraform. The biggest PITA is that it takes like 48-72 hours for all the prerequisites to be available (cost data, cost allocation tags, and an actual budget each can take 24 hours)
The circuit breaker doesn’t need to be 100% accurate. The detection just needs to be quick enough that the excess operating cost incurred by the delay is negligible for Amazon. That shouldn’t really be rocket science.
We're talking about a $2.5T company. Literally every example in this thread is already negligible to Amazon already without circuit breakers.
Implementing that functionality across AWS would cost orders of magnitude more than just simply refunding random $100k charges.
The point is that by not implementing such configurable caps, they are not being customer friendly, and the argument that it couldn’t be made 100% accurate is just a very poor excuse.
Sure, not providing that customer-friendly feature bestows them higher profits, but that’s exactly the criticism.
They also refuse refunds. Because it is profitable, even if the customer is unhappy to pay it.
If it were highly profitable for them to implement some form of budget cap cutoffs, they would! It's obvious it's not a game they are interested in.
What about 90% accurate?
Is it simple? So what happens when you hit the cap, does AWS delete the resources that are incurring the cost and destroy your app?
Imagine the horror stories on Hacker News that would generate.
Stop accepting requests like has been the case since the beginning of time?
Or simply returns 503? Why would you go directly to destroying things??
Suppose you’re going over the billing cap based on your storage consumption, how would AWS stop the continued consumption without deleting storage?
Why would they need to delete storage, they could just not accept past the cap.
Storage billing is partly time-based.
EBS is billed by the second (with a one minute minimum, I think).
Once a customer hits their billing cap, either AWS has to give away that storage, have the bill continue to increase, or destroy user data.
I think most of the "horror stories" aren't related to cases like this. So we can at least agree most such stories could be easily avoided, before we looked at solutions to these more nuanced problems (one of which would be clearly communicating the mechanism of a limit and what would be the daily cost of maintaining the maxed storage - and for a free account the settings could be adjusted for these "costs" to be within free quota)
Not everything on AWS is a Web app
Everything on AWS can deny a request no matter what the API happens to be
TCP session close? Don't reply back the UDP response? Stop scheduling time on the satellite transceiver for that account?
Interesting that you mention UDP, because I'm in the process of adding hard-limits to my service that handles UDP. It's not trivial, but it is possible and while I'm unsympathetic to folks casting shade on AWS for not having it, I decided a while back it was worth adding to my service. My market is experimenters and early stage projects though, which is different than AWS (most revenue from huge users) so I can see why they are more on the "buyer beware" side.
While I can imagine having budget overload from storage, most (all?) of the "horrors" on the page are from compute or access.
Set it up so that machines are deleted, but EBS volumes remain. S3 bucket is locked-out but data is safe.
Yes, that’s exactly the expected behavior. It can alert if it’s closed to threshold. Very straightforward from my point of view.
I mean, would you rather have a $10k build or have your server forcefully shut down after you hit $1k in three days?
One of those things is more important to different types of business. In some situations, any downtime at all is worth thousands per hour. In others, the service staying online is only worth hundreds of dollars a week.
So yes, the solution is as simple as giving the user hard spend caps that they can configure. I'd also set the default limits low for new accounts with a giant, obnoxious, flashing red popover that you cannot dismiss until you configure your limits.
However, this would generate less profit for Amazon et al. They have certainly run this calculation and decided they'd earn more money from careless businesses than they'd gain in goodwill. And we all know that goodwill has zero value to companies at FAANG scale. There's absolutely no chance that they haven't considered this. It's partially implemented and an incredibly obvious solution that everyone has been begging for since cloud computing became a thing. The only reason they haven't implemented this is purely greed and malice.
If you want hard caps, you can already do it. It’s not a checkbox in the UX, but the capability is there.
> Is it simple? So what happens when you hit the cap, does AWS delete the resources that are incurring the cost and destroy your app?
Sounds like you're saying "there aren't caps because it's hard".
> If you want hard caps, you can already do it. ... the capability is there.
What technique are you thinking of?
There are several satisfactory solutions available. Every other solution they offer was made with tradeoffs and ambiguous requirements they had to make a call on. It is obviously misaligned incentive rather than an impossibility. If they could make more money from it, they would be offering something. Product offering gaps are not merely technical impossibilities.
Surely that's the fault of the purchaser setting the cap too low.
Maybe rather than completely stopping the service, it'd be better to rate limit the service when approaching/reaching the cap.
Using that logic, isn’t it the fault of the user to set up an app without rate limiting?
It's misleading to promote a free tier that can then incur huge charges without being able to specify a charge cap.
Maybe, but its a huge reason to use real servers instead of serverless.
I mean real servers get hit with things like bandwidth fees so it's not a 100% solution.
Not even remotely the same scale of problem. Like at all.
If your business suddenly starts generating Tbs of traffic (that is not a ddos), you'd be thrilled to pay overage fees because your business just took off.
You don't usually get $10k bandwidth fees because your misconfigured service consumes too much CPU.
And besides that, for most of these cases, a small business can host on-prem with zero bandwidth fees of any type, ever. If you can get by with a gigabit uplink, you have nothing to worry about. And if you're at the scale where AWS overages are a real problem, you almost certainly don't need more than you can get with a surplus server and a regular business grade fiber link.
This is very much not an all-or-nothing situation. There is a vast segment of industry that absolutely does not need anything more than a server in a closet wired to the internet connection your office already has. My last job paid $100/mo for an AWS instance to host a GitLab server for a team of 20. We could have gotten by with a junk laptop shoved in a corner and got the exact same performance and experience. It once borked itself after an update and railed the CPU for a week, which cost us a bunch of money. Would never have been an issue on-prem. Even if we got DDoSed or somehow stuck saturating the uplink, our added cost would be zero. Hell, the building was even solar powered, so we wouldn't have even paid for the extra 40W of power or the air conditioning.
Depends where you order your server. If you order from the same scammers that sell you "serverless" then sure. If you order from a more legitimate operator (such as literally any hosting company out there) you get unmetered bandwidth with at worst a nasty email and a request to lower your usage after hitting hundreds of TBs transferred.
The real serverless horror isn't the occasional mistake that leads to a single huge bill, it's the monthly creep. It's so easy to spin up a resource and leave it running. It's just a few bucks, right?
I worked for a small venture-funded "cloud-first" company and our AWS bill was a sawtooth waveform. Every month the bill would creep up by a thousand bucks or so, until it hit $20k at which point the COO would notice and then it would be all hands on deck until we got the bill under $10k or so. Rinse and repeat but over a few years I'm sure we wasted more money than many of the examples on serverlesshorrors.com, just a few $k at a time instead of one lump.
this is really the AWS business model - you can call it the "planet fitness" model if you prefer. Really easy to sign up and spend money, hard to conveniently stop paying the money.
Sounds like your organization isn’t learning from these periods of high bill. What lead to the bill creeping up, and what mechanisms could be put in place to prevent them in the first place?
At only 20k a month, the work put into reducing the bill back down probably costs more in man hours than the saving, time which would presumably be better spent building profitable features that more than make up for the incremental cloud cost. Assuming of course the low hanging fruit of things like oversized instances, unconstrained cloudwatch logs and unterminated volumes have all been taken care of.
> what mechanisms could be put in place to prevent them in the first place?
Those mechanisms would lead to a large reduction in their "engineering" staff and the loss of potential future bragging rights in how modern and "cloud-native" their infrastructure is, so nobody wants to implement them.
You don't think this happens on prem? Servers running an application that is no longer used?
Sure they're probably VMs but their cost isn't 0 either
With that model, your cost doesn't change, though. When/if you find you need more resources, you can (if you haven't been doing so) audit existing applications to clear out cruft before you purchase more hardware.
The cost of going through that list often outweighs the cost of the hardware, by a lot.
And in a lot of cases it's hard to find out if a production application can be switched off. Since the cost is typically small for an unused application, I don't know if there are many people willing to risk being wrong
That's the equivalent of saying "just audit your cloud usage and remove stuff that's no longer used".
> I had cloudflare in front of my stuff. Hacker found an uncached object and hit it 100M+ times. I stopped that and then they found my origin bucket and hit that directly.
Pardon my ignorance, but isn’t that something that can happen to anyone? Uncached objects are not something as serious as leaving port 22 open with a weak password (or is it?). Also, aren’t S3 resources (like images) public so that anyone can hit them any times they want?
No. Your buckets should be private, with a security rule that they can only be accessed by your CDN provider, precisely to force the CDN to be used.
Why isn't that the default?
I'm glad I use a Hetzner VPS. I pay about EUR 5 monthly, and never have to worry about unexpected bills.
> I'm glad I use a Hetzner VPS. I pay about EUR 5 monthly, and never have to worry about unexpected bills.
The trade-off being that your site falls over with some amount of traffic. That's not a criticism, that may be what you want to happen – I'd rather my personal site on a £5 VPS fell over than charged me £££.
But that's not what many businesses will want, it would be very bad to lose traffic right at your peak. This was a driver for a migration to cloud hosting at my last company, we had a few instances of doing a marketing push and then having the site slow down because we couldn't scale up new machines quickly enough (1-12 month commitment depending on spec, 2 working day lead time). We could quantify the lost revenue and it was worth paying twice the price for cloud to have that quick scaling.
Don't they charge for every TB exceeding the included limit? (website says "For each additional TB, we charge € 1.19 in the EU and US, and € 8.81 in Singapore.")
They do, but the risk of having to pay $1.44/TB after the first 20TB is easier to swallow than say, CloudFront's ~$100/TB after 1TB.
> CloudFront's ~$100/TB after 1TB.
I had to double-check because that sounds hilariously wrong. I can't find it anywhere in the pricing. It's at most 0.08/TB.
Am I missing something?
You're missing the unit, it's $0.085 per GB, not TB, and that's only for NA/EU traffic. I rounded up a bit from that number because other regions cost more, plus you get billed a flat amount for each request as well.
They do offer progressively cheaper rates as you use more bandwidth each month, but that doesn't have much impact until you're already spending eye watering amounts of money.
Oh, yeah, egg on my face. They only put the unit of measurement at the top, and then talk about TB, so it's a bit deceptive. In retrospect, I was stupid to imagine 0.085/TB made any sense.
0.085/TB makes a lot of sense if you sell just with a 50 to 100% markup. But they rather sell at tens of thousands of markup to the real cost
Because not all uses for buckets fit that.
Buckets are used for backups, user uploads, and lots of things other than distributing files publicly.
I would say its probably not a good idea to make a bucket directly publicly accessible, but people do not do that.
A lot of the point of serverless is convenience and less admin and things like adding a layer in front of the bucket that could authenticate, rate limit etc. is not convenient and requires more admin.
Because just using a cdn without proper caching headers is just another service you're paying for without any savings.
The real question is if they considered caching and thus configured it appropriately. If you don't, you're telling everyone you want every request to go to origin
Buckets are private by default.
And it's getting harder and harder to make them public because of people misconfiguring them and then going public against AWS when they discover the bill.
This story is giving "I leave OWASP top 10 vulns in my code because hacker mindset".
It's not that hard to configure access controls, they're probably cutting corners on other areas as well. I wouldn't trust anything this person is responsible for.
It's about rate limiting, not access controls. Without implementing limits your spend can go above what your budget is. Without cloud you hit natural rate limits of the hardware you are using to host.
> It's about rate limiting, not access controls.
You just shouldn't be using S3 to serve files directly. You can run most public and many private uses through CloudFront. Which gives you additional protections and reduces things like per object fetch costs.
> you hit natural rate limits
Seen by your customers or the public as a "denial of service." Which may actually be fine for the people who truly do want to limit their spending to less than $100/month.
That might be the more general solution but in this context it is absolutely also an access control issue.
with "classic" hosting, your server goes down from being overloaded to the hoster shutting it off.
with AWS, you wake up to a 6 figures bill.
No, s3 objects should always be private and then have a cloudfront proxy in front of them at the least. You should always have people hitting a cache for things like images.
I don't understand why it should be called "serverless" when using cloud infrastructure. Fundamentally you're still creating software following a client-server model, and expecting a server to run somewhere so that your users' clients work.
To me, "serverless" is when the end user downloads the software, and thereafter does not require an Internet connection to use it. Or at the very least, if the software uses an Internet connection, it's not to send data to a specific place, under the developer's control, for the purpose of making the software system function as advertised.
That's generally called "local." Serverless is poorly named but describes how certain backends are deployed, not applications without a backend.
Serverless is easier to say than "load controlled ephemeral server management." Which is the real point. As my load increases the number of allocated resources, like servers, increases, and as it decreases so do the allocations and costs.
This is great if you are willing to completely change your client-server code to work efficiently in this environment. It is a strain over a standard design and you should only be using it when you truly need what "serverless" provides.
It's like a company with no employees. There are still people performing services, but on temporary contracts.
A "Server" is typically a single machine that has a specific OS and runs layers of various software that allows your business logic to be accessed by other computers (by your users). For a "Server" you typically have to choose an OS to run, install all the support software (server monitoring, etc), update the software, and if the server fails you have to fix it or rebuild it.
With "Serverless", your code is in a "function as a service" model where all you have to worry about is the business logic (your code). You don't have to set up the server, you don't have to install the server OS, or any basic server software that is needed to support the business logic code (http server, etc). You don't have to update the server or the underlying server software. You don't have to perform any maintenance to keep the server running smoothly. You never (typically) have to worry about your server going down. All you have to do is upload your business logic function "somewhere" and then your code runs when called. Essentially you do not have to deal with any of the hassle that comes with setting up and maintaining your own "server", all you have to do is write the code that is your business logic.
That's why it's called "Serverless" because you don't have to deal with any of the hassle that comes with running an actual "server".
I understand the underlying reasoning. I just don't like the terminology. Hence, "I don't understand... should be", rather than "... is". I think it's wrong that people end up using words like that. Like, almost on a moral level.
More generally, I don't like that a term ending with "-less" marks an increase in system complexity.
> Essentially you do not have to deal with any of the hassle that comes with setting up and maintaining your own "server", all you have to do is write the code that is your business logic.
Also known as "shared hosting". It's been done since the 90's (your folder full of PHP files is an NFS mount on multiple Apache servers), just that the techbros managed to rebrand it and make it trendy.
Think half an abstraction layer higher. You're on the right track with multiple PHP virtual runtimes on a single VM - that could conceptually be viewed as a sort of precursor to function runtimes.
The serverless function has higher-order features included as part of the package: you get an automatic runtime (just as with PHP but in this case it can be golang or dotnet), the function gets a unique endpoint URL, it can be triggered by events in other cloud services, you get execution logging (and basic alerting), multiple functions can be chained together (either with events or as a state machine), the function's compute can be automatically scaled up depending on the traffic, etc.
Think of it as: What do I have to do, in order to scale up the conpute of this URL? For hardware it's a call to DELL to order parts, for VMs or containers it's a matter of scaling up that runtime, or adding more instances - neither of those processes are simple to automate. One key characteristic of the function is that it will scale horizontally basically however much you want (not fully true, aws has a limit of 1500 instances/second iirc, but that's pretty massive), and it will do it automatically and without the request sources ever noticing.
Functions are also dirt cheap for low/burst traffic, and deployment is almost as easy as in the PHP FTP example. Personally I also think they are easier to test than traditional apps, due to their stateless nature and limited logical size (one endpoint). The main downsides are cost for sustained load, and latency for cold starts.
With that said, they are not "endgame". Just a tool - a great one for the right job.
Hetzner, 16TBx2 HDD, 1TBx2 SDD, 64GB RAM, 20TB free bandwidth, $70/month.
I used 1TB of traffic on a micro instance and it cost me $150 (iirc). Doesn't have to be this way.
"Serverless" is a an Orwellian name for a server-based system!
Bit of a nit pick but this is a pet peeve of mine.
Creating a new word for a more specific category is never Orwellian. The project in 1984 was to create a language which was less expressive. They were destroying words describing fine distinctions and replacing them with words that elided those distinctions. Creating a new word to highlight a distinction is the opposite.
There's definitely criticisms to be made of the term serverless and how it obscures the role of servers, but Orwellian is not the correct category. Maybe we could say such services run on servelets to describe how they're "lighter" in some sense but still servers.
Yea, I agree after more thought. I think the key is what you said; the term is useful for dividing within a specific domain. People outside that domain see the word and think "those guys are calling this Category-A thing "not-category-A", that makes no sense! Inside the Category A world, there is much more nuance.
I doubleplus appreciate the thought.
"Serverless" means you don't have to configure the servers, or know what servers, where, are running your code.
"Here's some code, make sure it runs once an hour, I don't care where."
"There's no cloud; it's just someone else's computer"
It's just the tech-bro version of "shared hosting", now with a 10000% markup and per-request billing.
But your so called "no-code" system runs on code. Checkmate atheists.
There becomes a point where being mad that the specific flavor of PaaS termed serverless achtually has severs is just finding a thing to be mad at.
In the "no-code" system, the end user does not write code. In the "serverless" system, the end user does connect to a server.
It doesn't just "have" servers; they aren't a hidden implementation detail. Connecting to a website is an instrumental part of using the software.
"Serverless" refers to the demarcation point in the shared responsibility model. It means there aren't any servers about as much as "cloud hosting" means the data centers are flying.
This is where is becomes confusing to me: Here are a few types of software/infrastructure. Embedded devices. Operating systems. PC software. Mobile device software. Web frontends. GPU kernels. These all truly don't use servers. When I hear "serverless", I would think it is something like that. Yet, they're talking about web servers. So it feels like a deception, or something poorly named.
If you are in the niche of IT, servers, HTTP operations etc, I can see why the name would make sense, because in that domain, you are always working with servers, so the name describes an abstraction where their technical details are hidden.
and your wireless modem has wires
Thats true!
Putting any sort of pay per use product onto the open internet has always struck me as insane. Especially with scaling enabled.
At least stick a rate limited product in front of it to control the bleed. (And check whether the rate limit product is in itself pay per use...GCP looking at you)
I tried AWS serverless, figured out that it is impossible to test anything locally while you are forced to use AWS IAM role for serverless run which has access to everything.
That's just a problem waiting to happen while you are always running tests on production...
I worked on a serverless project for several years and the lack of ability to run much of anything locally was a huge cost. Debugging cycle times were absolutely terrible. There are some tools that claim to address this but as of a few years ago they were all useless for a real project.
I use my AWS security key to run local tests. It works perfectly fine. You just need a ~/.aws/credentials file appropriately configured.
I have a makefile system which controls lambda deployments. One step of the deployment is to gather the security requirements and to build a custom IAM role for each individual lambda. Then I can just write my security requirements in a JSON file and they're automatically set and managed for me.
The real joy of AWS is that everything works through the same API system. So it's easy to programmatically create things like IAM roles like this.
1. Put your stuff in a stack. Deploy it to your isolated developer account. Basically free staging environment.
2. Use the officially supported docker runtime for local testing.
3. Treat it like any other code and make unit tests
4. Use one of the tools like localstack to emulate your staging env on your machine.
There are so many options that I don’t know how you could walk away with your opinion.
Or you could just write conventional software. But I get it, you don't get resume points nor invites to cloud-provider conferences for that.
> Basically free staging environment. [emphasis mine]
Not really. Sure, the cost would usually be peanuts... until you have an infinite loop that recursively calls more lambdas. Then you have a huge bill (but hey that pays for your invites to their conferences, so maybe it's a blessing in disguise?). And yes, you will pretty much always get it refunded, but it's still a hassle and something that is absolutely not necessary.
Snark aside, having an opaque dev environment always constrained by bandwidth and latency that can’t be trivially backed up/duplicated is a terrible idea and why I always recommend against “serverless”, even besides the cost concerns.
Serverless is OK for small, fully self contained pieces or code that are fire and forget. But for anything complex that’s likely to require maintenance, no thanks.
Eh, I worked on a large serverless project that worked hard to follow best practices but it was still very clunky to run and test code locally. The local serverless tools simply didn't work for our project and they had so many limitations I'm skeptical they work for most non-prottypes.
Deploying a stack to your own developer environment works fine and is well worth doing, but the turnaround time is still painful compared to running a normal web framework project locally. Deploying a stack takes much much longer than restarting a local server.
Serverless isn't all bad, it has some nice advantages for scaling a project, but running and debugging a project locally is a definite weak spot.
This is nowhere near being true.
This is some good marketing for Coolify, which the author makes as an open source platform as a service. I prefer Dokploy these days though, since it seems to be less buggy, as Coolify seems to have such bugs due to being on PHP.
https://coolify.io/
https://dokploy.com/
CapRover is another good alternative, and also much more lightweight than Coolify, easily runs on even a 512MB server: https://caprover.com/
It would help to round to the cent. With 3 digits to the right of the dot it's ambiguous whether it's a decimal point or a thousands separator, and the font and underline makes the comma vs dot distinction a bit unclear.
A number of the titles appear to have 69 or 420 cents added to the amount that appears in the story.
These guys charge $550 for a measly terabyte of bandwidth?
If you get a dedi on a 10Gb/s guaranteed port and it works out to more than $3 / TB, you're probably getting scammed. How does "serverless" justify 150x that? Are people hosting some silly projects really dense enough to fall for that kind of pricing?
Just get a $10 VPS somewhere or throw stuff on GH pages. Your video game wiki/technical documentation/blog will be fine on there and - with some competent setup - still be ready for 10k concurrent users you'll never have.
After a quick check on Vercel stories, it seems all payments were discarded or mistakes in the first place.
Does it really happen to really have to pay such a bill? Do you need to tweet about it to be reimbursed?
> Do you need to tweet about it to be reimbursed?
This is what scares me, is social media the only way to get things sorted out nowadays? What if I don't have a large following nor an account in the first place, do I have to stomach the bill?
This is exactly what happened to me during Covid... I had a flight that got cancelled at the beginning of the pandemic since the country closed the orders (essentially). A year after, still on lock downs and et al, I wanted to enquire about a refund, for months I got not answer, until I caught wind that people using Twitter were actually getting results. Now, I don’t use social media at all, so I had to create a Twitter account, twit about my case et voila! 30 mins after I got a response and they send me a PM with a case number... Not even going to mention the airline, but it is infuriating...
I can't imagine them sending it to collections. What kind of recourse would a company like Vercel have if you don't pay it?
Someone at a community group I'm in messed up playing with Azure through their free for non-profits offering^. We were out about 1.2k€. Not huge but huge for us.
Encouraged by comments on HN over the years I had them ask support to kindly to wave it. After repeating the request a few times they eventually reduced their bill to <100€ but refused to wave it entirely.
So even without shaming on social media, But it probably depends. It's worth at least asking.
^The deal changed about six months ago.
It's waive, not wave
Relying on the mercy of a support agent that may be having a bad day is a poor strategy
No, at least in enterprise consulting for these kind of hosting, usually there is a contact person on the support team that one can reach directly.
However these projects are measured in ways that make Oracle licenses rounding errors.
Which naturally creates market segmentation on who gets tier 1 treatment and everyone else.
Once you're in a contract + TAM territory, pricing works very differently. Also, temporary experiments and usage overruns become an interesting experience where the company may just forget to bill you a few thousands $ just because nobody looked at the setup recently. Very different situation to a retail user getting unexpected extra usage.
I mean if developer got charged with 100k, more often than not the bank would decline that first maybe if you didn't have that high credit limit
but what happen if this happen to corporate account and somewhere resource get leaked???
multi billions dollar company probably just shrug it off as opex and call it a day
For everyone complaining about no free tier that blocks you from being charged
https://aws.amazon.com/free/
Experience AWS for up to 6 months without cost or commitment
Receive up to $200 USD in credits
Includes free usage of select services
No charges incurred unless you switch to the Paid Plan
Workloads scale beyond credit thresholds
Access to all AWS services and features
I remember at the beginning of the serverless hype how they said it was great because it automatically scaled as big as you need it. Given how sudden and massive these "scaling spikes" can be, I would much rather deal with a death-hugged VPS than a $100k bill.
Plus the VPS is just so much faster in most cases.
I once found an official Microsoft example repo to deploy an LLM gateway on Azure with ALB. Glad I did the tedious work of estimating the costs before I hit the deploy button (had to go though many Biceps manifests for that). The setup would have cost me about 10k/month.
This is why when I contract for an early stage startup, I pose the question:
"What if your app went viral and you woke to a $20k cloud bill? $50k? $80k?"
If the answer is anything less than "Hell yeah, we'll throw it on a credit card and hit up investors with a growth chart" then I suggest a basic vps setup with a fixed cost that simply stops responding instead.
There is such a thing as getting killed by success and while it's possible to negotiate with AWS or Google to reduce a surprise bill, there's no guarantee and it's a lot to throw on a startup's already overwhelming plate.
The cloud made scaling easier in ways, but a simple vps is so wildly overpowered compared to 15 years ago, a lot of startups can go far with a handful of digitalocean droplets.
Yeah I also left my website hosted on Google Cloud because costs popped from everywhere, and there is basically no built-in functionality to limit them. So I didn't really slept relaxed (I actually slept great, but I hope you get the point) knowing that a bug could cost me... who knows how much. Actually, as the website of OP says, for spending control you have budget notifications and with that you can disable the billing for all the project altogether through some API call or something, I don't remember exactly, that is all there is. But still it looks like this functionality is just not there.
You can write Google cloud functions to disable your credit card when certain thresholds are met pretty easily, but it's unethical that this isn't just a toggle somewhere in settings.
Does that actually stop the spend immediately? If not, you're still on the hook for the bill. I suppose you can walk away and let them try to come after you, but that wouldn't work for a company.
Don’t most of these services have config options to protect against doing this? I haven’t used most of these services but it running up a bill during traffic spikes but not going down seems like it’s working as intended?
Nope, basically none of these services have a way to set a hard budget. They let you configure budget warnings, but it’s generally up to you to login and actually shut down everything to prevent from being billed for overages (or you have to build your own automation - but the billing alerts may not be reliable)
I know AWS in particular does not because they do not increment the bill for every request. I don't know exactly how they calculate billing, but based on what I do know about it, I imagine it as a MapReduce job that runs on Lambda logs every so often to calculate what to bill each user for the preceding time interval.
That billing strategy makes it impossible to prevent cost overruns because by the time the system knows your account exceeded the budget you set, the system has already given out $20k worth of gigabyte-seconds of RAM to serve requests.
I think most other serverless providers work the same way. In practice, you would prevent such high traffic spikes with rate limiting in your AWS API Gateway or equivalent to limit the amount of cost you could accumulate in the time it takes you to receive a notification and decide on a course of action.
Related. Others?
Single day Firebase bill for $100k - https://news.ycombinator.com/item?id=43884892 - May 2025 (14 comments)
Serverless Horrors - https://news.ycombinator.com/item?id=39532754 - Feb 2024 (169 comments)
I was also too careless with AWS when I was a beginner with no deployment experience and I am very lucky that I did not push a wrong button.
All these stories of bill forgiveness reminds me of survivorship bias. Does this happens to everyone that reaches out to support or just the ones that get enough traction on social media? I am pretty sure there is no official policy from AWS, GCP or Azure.
This site is a bit dated. I remember in response to this Vercel added a way to pause your projects when hitting a spend limit. I enabled it for my account.
Still, it made me question why I'm not using a VPS.
Vercel used to be called Zeit. They had a server product called Now that gave you 10 1CPU/1GPU instances for $10/month (or $20 I forgot). It was the best deal.
When Vercel switched everything to serverless, it all became pretty terrible. You need 3rd party services for simple things like DB connection pooling, websockets, cron jobs, simple queue, etc because those things aren’t compatible with serverless. Not to mention cold starts. Just a few weeks ago, I tried to build an API on Next.js+Vercel and get random timeouts due to cold start issues.
Vercel made it easier to build and deploy static websites. But really, why are you using Next.js for static websites? Wordpress works fine. Anything works fine. Serverless makes it drastically harder to build a full app with a back end.
Serverless is the most common deployment on MACH projects.
Because when everything is a bunch of SaaS Lego bricks, serverless is all one needs for integration logic, and some backend like logic.
Add to it that many SaaS vendors in CMS and ecommerce space have special partner deals with Vercel and Nelify.
https://macharchitecture.com/
Couple years ago I was charged in USD 4K on Google Cloud after trying recursive cloud functions.
I told them that was a mistake and they forgot the debit, they just asked to no do again.
Troy Hunt and HIBP is a good example in the other direction but Hunt has also been burned plenty of times by serverless.
https://www.troyhunt.com/closer-to-the-edge-hyperscaling-hav...
Andras (author of Serverless Horrors) knows what he’s talking about.
The amount of brainwashing that big cloud providers have done, is insane.
We are building bare metal for our workloads… I don’t care if cloud is supposed to be cheaper because it never is. You can get a decent small business firewall to handle 10gbit fiber for $600 from unifi these days. Just another reason I’m glad I moved out of the Bay Area and nyc to a midwestern town for my company. I have a basement and can do rad things in my house to grow my business.
bUt wuT aBowT deV OpS?!
Seem likes there are mistakes that were made on behalf of the users. The attackers found these mistakes and took advantage of them. i don't think "severless" is the problem.
Serverless is the problem in that most serverless services don't let you hard-cap spend.
This issue is serverless-specific. If I pay $20/month on VPN the most frightening thing that can happen is the client calling you about your website being down, not a $100k bill.
In my experience: Fuck serverless.
If we're building anything bigger than a random script that does a small unit of work, never go for serverless. A company I recently worked for went with Serverless claiming that it would be less maintenance and overhead.
It absolutely was the worst thing I've ever seen at work. Our application state belonged at different places, we had to deal with many workarounds for simple things like error monitoring, logging, caching etc. Since there was no specific instance running our production code there was no visibility into our actual app configuration in production as well. Small and trivial things that you do in a minute in a platform like Ruby on Rails or Django would take hours if not days to achieve within this so-called blistering serverless setup.
On top of it, we had to go with DB providers like NeonDb and suffer from a massive latency. Add cold starts on top of this and the entire thing was a massive shitshow. Our idiot of a PM kept insisting that we keep serverless despite having all these problems. It was so painful and stupid overall.
Why was your PM making tech decisions?
Looks like you need the "quiet part" said out loud:
Chances are, the company was fishing for (or at least wouldn't mind) VC investment, which requires things being built a certain (complex and expensive) way like the top "startups" that recently got lots of VC funding.
Chances are, the company wanted an invite to a cloud provider's conference so they could brag about their (self-inflicted) problems and attract visibility (potentially translates to investment - see previous point).
Chances are, a lot of their engineering staff wanted certain resume points to potentially be able to work at such startups in the future.
Chances are, the company wanted some stories about how they're modern and "cloud-native" and how they're solving complex (self-inflicted) problems so they can post it on their engineering blog to attract talent (see previous point).
And so on.
I keep telling customers: "The cloud will scale to the size of your wallet."
They don't understand what I mean by that. That's okay, they'll learn!
Anyway, this kind of thing comes up regularly on Hacker News, so let's just short-circuit some of the conversations:
"You can set a budget!" -- that's just a warning.
"You should watch the billing data more closely!" -- it is delayed up to 48 hours or even longer on most cloud services. It is especially slow on the ones that tend to be hit the hardest during a DDoS, like CDN services.
"You can set up a lambda/function/trigger to stop your services" -- sure, for each individual service, separately, because the "stop" APIs are different, if they exist at all. Did I mention the 48 hour delay?
"You can get a refund!" -- sometimes, with no hard and fast rules about when this applies except for out of the goodness of some anonymous support person's heart.
"Lots of business services can have unlimited bills" -- not like this where buying what you thought was "an icecream cone" can turn into a firehouse of gelato costing $1,000 per minute because your kid cried and said he wanted more.
"It would be impossible for <cloud company> to put guardrails like that on their services!" -- they do exactly that, but only when it's their money at risk. When they could have unlimited expenses with no upside, then suddenly, magically, they find a way. E.g.: See the Azure Visual Studio Subscriber accounts, which have actual hard limits.
"Why would you want your cloud provider to stop your business? What if you suddenly go viral! That's the last thing you'd want!" -- who said anything about a business? What if it's just training? What if your website is just marketing with a no "profit per view" in any direct sense?
Title should really be "Cloud Hosting Horrors", not serverless per se.
Hahaha, this is awesome!
I guess I'm missing something, why is this 'serverless' horrors? If anything it seems to specifically be serverful horrors.
"Serverless" is just marketing-speak for "somebody else's server".
How are all these cases of exorbitant surprise bills not prosecuted as fraud?
look mom & dad, I am famous!
An alternative title might be "Failure to read the documentation horrors."
If you didn't sit down with the documentation, the pricing guide, and a calculator before you decided to build something then you share a significant portion of the fault.
This is a weird take on an incredibly useful paradigm (serverless). One the one side, there are obviously precautions that all of these users could have taken to avoid these charges on the other hand its totally common to spin up a thing and forget about it or not do your due diligence. I totally feel for the people who have been hit with these chargers.
At the end of the day though the whole think feels like a carpenter shooting themselves in the foot with a nail gun then insisting that hammers are the only way to do things.
At one time, I considered using Firebase as a backend, but then, I kept reading stories like these, and decided to roll my own. I'm fortunate to be able to do that.
It's kind of amazing, though. I keep getting pressure from the non-techs in my organization to "Migrate to the Cloud." When I ask "Why?" -crickets.
Industry jargon has a lot of power. Seems to suck the juice right out of people's brains (and the money right out of their wallets).
Are there any protections these days at the cloud provider level?
Like setting a maximum budget for a certain service (EC2, Aurora?) because downtime is preferable to this?
That's why I like VPS setups. You hit the monthly maximum, and it just stops working.
I host demos, not running a business, so it's less of an issue to get interrupted. Better an interruption than a $50,000 bill for forgetting to shut off a test database from last Wednesday.
Unless a startup has five+ nines service contracts with their customers already, a little bit of downtime once in a while is not the end of the world the cloud services want us to believe.
That's not comparable. With a VPS there is no monthly maximum, just a max load on a second by second basis. You can be hit with traffic of which 90% bounces because your server is down, get nowhere near your intended monthly maximum, and then the rest of the month is quiet.
You seem to be describing this as a bad thing instead of the objectively good thing that it is.
The ideal is obviously smoothed limits, such that you can absorb a big traffic spike if it still fits within your budget. Nobody seems to offer that.
How would you predict the smooth curve ahead of time?
Not _really_. AWS has a budget tool, but it doesn’t natively support shutting down services. Of course, you can ingest the alerts it sends any way you want, including feeding them into pipelines that disable services. There’s plenty of blueprints you can copy for this. More seriously - and this is a legitimate technical limitation - of course AWS doesn’t check each S3 request or Lambda invocation against your budget, instead, it consolidates periodically via background reporting processes. That means there’s some lag, and you are responsible for any costs incurred that go over budget between such reporting runs.
> of course AWS doesn’t check each S3 request or Lambda invocation against your budget
If it can bill them per-invocation, why can't it also check against a budget? I don't expect it to be synchronous, but a lag of minutes to respond is still better than nothing. Can you even opt-in to shutting down services from the budget tool, or is that still something you have to script by hand from Cloudwatch alarms?
You script it by hand.
I think figuring out how to do this faster is less trivial than it might sound. I agree that synchronous checks aren’t reasonable. But let’s take Lambdas. They can run for 15 minutes, and if you consolidate within five minutes after a resource has been billed, that gives you a twenty minute lag.
I’m not trying to make apologies for Amazon, mind you. Just saying that this isn’t exactly easy at scale, either. Sure, they bill by invocation, but that’s far from synchronous, too. In fact, getting alerts might very well be happening at the frequency of billing reconciliation, which might be an entirely reasonable thing to do. You could then argue that that process should happen more frequently, at Amazon’s cost.
> but it doesn’t natively support shutting down services [...] of course AWS doesn’t check each S3 request or Lambda invocation against your budget, instead, it consolidates periodically via background reporting processes
So, in other words, the vendor has provided substandard tooling with the explicit intent of forcing you to spend more money.
Just set alerts that are not really timely and homeroll your own kill scripts its easy. It doesn't really work but its not really any harder than just fucking self hosting.
Maintaining your own containers or VMs is hard considering how much risk appetite you have for the issues at infra level. So, yeah, when you complain about the costs of serverless, you are just paying for your low risk appetite low cost of your IT management.
last employer asked for an estimate to migrate to cloud.
it would be 2x more expensive and halve developer speed. also we would lose some internal metric systems honed over 20yr.
ceo told to go ahead anyway (turn out company was being sold to Apollo)
first thing we did was a way to bootstrap accounts into aws so we could have spend limits from day one.
can't imagine how companies miss that step.
I read a lot of the posts at the little blog here and, uh, every single one sounds like a complete amateur making a cloud configuration mistake. I haven't found one that is the provider's fault or the fault of "serverless"
I would be embarrassed to put my name on these posts admitting I can't handle my configs while blaming everyone but myself.
Serverless isn't a horror, serverlesshorrors poster. You are the horror. You suck at architecting efficient & secure systems using this technology, you suck at handling cloud spend, and you suck at taking responsibility when your "bug" causes a 10,000x discrepancy between your expected cost and your actual bill.
Just because you don't understand it doesn't mean it sucks
You're not wrong about cloud configuration mistakes, but a tool that lets you increase costs 10000x (without even letting you set a safety) is a hell of a chainsaw.
I'm more worried about the overconfident SRE that doesn't stay up at night worrying about these.
Consider this analogy: Instead of using a root command shell, it is wise to use an account with appropriately restricted capabilities, to limit the downsides of mistakes. Cloud services support the notion of access control, but not the notion of network resource usage limits. It's an architectural flaw.
Or do you always log in as root, like a real man, relying purely on your experience and competence to avoid fat-finger mistakes?
That being said, the cloud providers could do a better job explaining to new/naive users that great power comes with great responsibility and there is no hand holding. Someone might be more hesitant to willy nilly spin up something if a wizard estimates that the maximum cost could be $X per month.
> every single one sounds like a complete amateur making a cloud configuration mistake
Golly if only the configuration wasn't made this way on purpose exactly to cause this exact problem.
truth nuke
I can't imagine hosting a small-time project on rented infrastructure without some kind of mechanism to terminate it once costs exceed a reasonable threshold.
I've had this twice. Once with oracle, once with azure. They both charged me $2000-$5000 for simply opening and closing a database instance (used only for a single day to test a friend's open source project)
To be fair, support was excellent both times and they waived the bills after I explained the situation.
How did you run up a $5000 bill for just testing a project? What kind of project was it that could put so much load on the DB?
There should also be a general category for "cloud horrors" for things that cost $50k/month to host that would be $1500/month on a bare metal provider like Datapacket or Hetzner.
I'm old enough to remember when cloud was pitched as a big cost saving move. I knew it was bullshit then. Told you so.
even $1500/mo on hetzner is a seriously large app. You could get 300 cpus and 1.5TB of RAM for that price.
there should be some kind of insurance for bugs that introduce unusually expensive usage
I believe any such policy would need its premiums based on the services used (and likely the qualifications of the staff) since, unlike rebuilding a house, the financial risk is almost unlimited with out of control cloud spend
It reminds me of the Citi(?) employee who typed the wrong decimal place in a trade: computers make everything so easy!
I have a feeling I will be downvoted for this, but...
Have the people posting these horror stories never heard of billing alerts?
Many of the stories on the site are from people who have billing alerts.
If you have bot spam, how do you actually think their billing alerts work? The alert is updated every 100ms and shuts off your server immediately? That isn't how billing alerts can or should work.
Yes, actually, if continuing to run the service is going to exceed my available budget then I do want the service turned off! If I can't pay for it, and I know I can't pay for it, what other possible choice do I have?
Do any of you people have budgets, or do you all rely on the unending flow of VC money?
That isn't how this can work. If you are running a service and then find out that AWS is spamming you every 100ms to find out what your CPU is doing (or calling out every 100ms) then people would be quite unhappy.
The majority of these massive bills are due to traffic, there is pretty much no way that AWS could stop your server in time...if they had the choice, which they don't.
I think my original point was unclear: I am pointing out that if you just think about how this stuff can possibly work, billing alerts can not work in the way you expect. The alert is updated async, the horse has bolted and you are trying to shut the gate.
I don't use AWS for personal stuff because I know their billing alerts won't stop me spending a lot. Don't use them if that is a concern.
I do use AWS at work, we are a relatively big customer and it is still very expensive for what it is. The actual hardware is wildly overpriced, their services aren't particularly scalable (for us), and you are basically paying all that overage for network...which isn't completely faultless either. Imo, using them in a personal capacity is a poor idea.
There is also WAF.
looking forward to the "LLM token horrors" version
I thought there was an OWASP for "denial of wallet" vulnerabilities but this link was the closest one I found https://www.prompt.security/vulnerabilities/denial-of-wallet... (although the link makes it sound like they're offering denials)