I think everything has a growing problem with LLM/AI generated content. Emails, blog posts, news articles, research papers, grant applications, business proposals, music, art, pretty much everything you can think of.
There’s already more human produced content in the world than anyone could ever hope to consume, we don’t need more from AI.
In some ways I'm starting to enjoy this. Do you remember the 419 scams? 'Hi, I'm a Nigerian prince named Michael Jordan. Give me $50 so I can buy some chemicals to clean a bunch of money I secretly stowed away and I'll send you $5000.' People actually used to fall for that. Of course some people probably still would (and a lot more certainly gets blocked by spam blockers) but I think overall society grew substantially less gullible over time.
But in general I think most people still remain excessively gullible and naive. Social media image crafting is one of the best examples of this. People create completely fake and idealized lives that naive individuals think are real. Now with AI enabling one to create compelling 'proof' of whatever lie you want, I think more people are becoming more suspicious of things that were, in fact, fake all along.
---
Going back to ancient times, many don't know that Socrates literally wrote nothing down. Basically everything we know of him is thanks to other people, his student Plato in particular, writing down what he said. The reason for this was not a lack of literacy - rather he felt that writing was harmful because words cannot defend themselves, and can be spun into misrepresentations or falsehoods. Basically - the argumentative fallacies that indeed make up most 'internet debates', for instance. Yet now most people are aware of this issue, and quotes themselves are rarely taken at face value, unless they confirm one's biases. People became less naive as writing became ubiquitous, and I think this is probably a recurring theme in technologies that transform our abilities to transfer information in some format or another.
> but I think overall society grew substantially less gullible over time.
Absolutely not. All that happened is most people became aware that "Nigerians offering you money are scammers." But they still fall for other get rich quick schemes so long as those diverge a little bit from that known pattern, and they'll confidently walk into the scam despite being warned, saying, "It's not a scam, dumbass. It's not like that Nigerian prince stuff." If anything, people seem to be becoming more confident that they're in on some secret knowledge and everyone else is being scammed.
I call it the antibody effect. My favorite example is clickbait headlines like, "Five things you MUST do if you're doing this thing. You'd never guess #3!" It used to be everywhere and now it's nowhere.
AI is starting to show this effect - people stay away from em-dashes. There's that yellowish tinge and that composition which people avoid on art. Some of this is bad, but we can probably live without it.
I'm inclined to disagree just because photoshop has had a measurable effect on the population being skeptical of photos which at one point were practically treated as the gold standard of evidence. It's still easy to find people who have fallen for photoshopped images, but it's also easy to find people expressing doubts and insisting they can "tell by the pixels". Sometimes even legitimate photos get accused of being photoshopped which seems healthy.
The other side of this is that AI tools are being treated like magic to the point that people are denying well documented events happened at all, such as the shooting of Charlie Kirk - conspiracies abound!
Also, bizarrely, a subsection of the population seems to be really into blatantly AI generated images, just hop onto Facebook and see for yourself. I wonder if it has something to do with whatever monkey brain thing makes people download apps with a thumbnail of a guy shouting or watch videos that have a thumbnail of a face with its mouth open, since AI generated photos seem very centered around a single face making a strange expression.
People of a profound initial bias will, in general, believe anything that supports that bias, and reject anything that challenges it, in both cases without any real consideration or thought whatsoever. So I don't think examples of individuals being "misled" by e.g. AI generated images or video, to extremes, is entirely realistic. Rather they were already at those extremes and will just eat up anything that appeals to those extremes.
To take a less politically charged example, imagine there is fake content 'proving' that the Moon landing is faked. Is that going to meaningfully sway people who don't have a major opinion one way or the other? Probably not, certainly not in meaningful numbers. And in general I think the truth does come out on most things. And when people find they have been misled, particularly if it was somebody they thought they could trust, it tends to result in a major rubber-banding in the opposite direction.
If something "floods the zone with shit," it needs S amount of shit to cause a flood. But too much will eventually make the scam ineffectual. Widespread public distrust for the scam is (S+X)/time, where X is the extra amount of shit beyond the minimum needed. Time is a global variable constrained by the rate at which people get burned or otherwise catch on to all other scams of the same variety. If we imagine that time-to-distrust shrinks with each new iteration of shit, then X, the amount of excess shit needed to trigger distrust, should decrease over time.
The longer term problem is the externality where nothing is trusted, and the whole zone is destroyed. When that zone was "what someone wrote down that Socrates might have said," or "Protocols of the Elders of Zion," or "emails from unknown senders," that was one thing. A new baseline could be set for 'S'. When it's all writing, all art, all music and all commentary on those things, it seems catastrophic. The whole cave is flooded with shit.
You seem to assume people who are the victims of scams are people who are more naive. But that’s not how it works. Scams try to catch people at their weakest. It’s not if but when.
> Video or photo evidence of a crime become useless the better AI gets
This is probably a good thing because photoshop and CGI have existed for a very long time and people shouldn't have the ability to frame an innocent for a crime or even get away with one just because they pirate some software and put in a few hours watching tutorials on youtube.
The sooner jurors understand that unverified video/photo evidence is worthless the better.
The old quote of, 'You can fool some of the people all of the time, all of the people some of the time'.
There is the hope that in dumping so much slop so rapidly that it will break the bottom of the bucket. But there is the alternative that the bottom of the bucket will never break, it will just get bigger.
I personally know people who look down upon people who use LLMs to write code. There is a lot of hate among some of the senior developers that I talk to. I don't know if this growing tendency to be suspicious of AI usage is good or bad.
For example, towards the final semester of my bachelor's degree, my algorithms class started reporting students for academic misconduct because the TAs started assuming that all the optimal solutions to assignment problems were written by LLMs. In fact, several classmates started purposely writing sub-optimal solutions so that the TAs would at least grade them without any prejudice.
I worry that because LLM slop also tends to be so well presented, it might compel software developers to start writing shabby code and documentation on purpose to make it appear human.
At the moment it is the other way around. LLMs rarely write good code if not instructed by someone that knows what they are doing.
And even then the code is rarely good.
> People actually used to fall for that. Of course some people probably still would (and a lot more certainly gets blocked by spam blockers) but I think overall society grew substantially less gullible over time.
It is still a serious problem just want that to be abundantly clear. Several thousand people (in the US alone) fall for it every year. I used to browse 419 eater regularly and up until a few years ago (when I last really followed this issue) these scams were raking in billions a year. Could be more or less now but doubt it’s shifted a ton.
> There’s already more human produced content in the world than anyone could ever hope to consume, we don’t need more from AI.
Even if you think the harms of AI/machine generated content outweigh the good, this is not a winning argument.
People don’t just consume arbitrary content for the sake of consuming any existing content. That’s rarely the point of it. People look for all kinds of things that don’t exist yet — quite a lot of it referring to things that are only now known or relevant in the given moment or to the given niche audience requesting it. Much of it could likely never exist if it weren’t possible to produce it on demand and which would not be valuable if you had to wait for a human to make it.
The generated content is wasting the time of maintainers. How would you solve that?
For your winning argument, what would you use to prevent slop filling up your feed when there is more AI generated content, any sort of protocol that you have?
Just happened to me yesterday. I realized I'd really like to watch a long-form video on Huygens Clock. Something between a short (god damnit) and Quicksilver. Thought there must be something detailed. Nope, at least Youtube let me down. Or the search algo, who knows. Since they removed the "20+ minutes" button it has become basically useless to search on YT.
They're pushing Shorts so fucking hard it makes me sick.
Even worse is that we're banning TikTok because it's bad for the kids (short form algorithmic content), Snapchat (similar thing + strangers creeping) and Instagram Stories (algorithm again).
BUT there is NO WAY for a parent to allow their kid to use Youtube AND block Shorts. (yes there are browser plugins etc, but how do you enforce them on a child?)
And from what I've seen the AI slop on Shorts is so fucking bad that it seems we just collectively forgot about Elsagate...
> AI slop is digital content made with generative artificial intelligence, specifically when perceived to show a lack of effort, quality or deeper meaning, and an overwhelming volume of production.
I'm creating an email/messaging protocol to solve spam, which can also be used to solve LLM generated spam pull-requests, because pull-requests are also a form of messaging and have the same characteristics.
Check this profile for the email if you wanna ask for more info or get updates.
Edit: To the people downvoting this, what have you done to help solve the spam problem that will be made worse by LLMs?
I'm personally afraid of something closer to the opposite: that AI will leave us with too little content rather than too much. I just notice with myself how much less I’m using Wikipedia or forums, and get my answers from OpenAI or Anthropic. From what I remember it’s an over 20% drop in monthly Wikipedia traffic, comparing 2022 vs 2025.
Primary sources will always exist as the content needs to come from somewhere. Personally, I shy away from ChatGPT answers for the same reason I avoid Wikipedia: you've got to find authoritative information to get reliable answers. I wonder if people who use LLMs don't care about accuracy, or don't know about sourcing?
Large Language Models cannot think but they can self correct. You can think but clearly have some serious problems self-correcting your wrong assumptions. Who is smarter then?
The goal is "Taiwan" -> "Taiwan, Province of China" but via the premise of updating to UN ISO standards, which of course does not allow Taiwan.
The comment after was interesting with how reasonable it sounds:
"This is the technical specification of the ISO 3166-1 international standard, just like we follow other ISO standards. As an open-source project, it follows international technical standards to ensure data interoperability and professionalism."
The politics of the intent of the PR was masked. Luckily, it was still a bit hamfisted. The PR incorrectly changed many things and the user stated their political intention in the original PR (the above is from a later comment).
One of these days I need to make a bot that scans FOSS repos for this kind of little pink nonsense behavior.
The insecurity of wanting to call a place "country name, province of different country name" should alone be mocked. Imagine, "Ukraine, province of Russia," or "India, colony of The United Kingdom." Absurd on its face.
I think AI generated issues and pull requests may lead to the death of GitHub as the place for open-source projects. As Microsoft is hell-bent on shoving easy AI buttons in every interface including GitHub, and as more people try to pad their GitHub profile for their resume, the burden for open-source projects to filter the low-quality spam may grow immense and it may become worth it to just switch to Codeberg or other platforms. Some LLM use will inevitably leak into these other platforms but the relatively high barrier (no built-in AI buttons), effective moderation banning obvious violators, and reduced usefulness for resume padding would make it negligible I think.
Since the cat is out of the bag (i.e. there's no stopping LLM-generated code / PRs / issues now that vibe-coding is fairly universally accessible), the only thing in our (especially those who are custodians of software systems and products) control is to invest a lot more in testing and evals. Just plain opposing LLM-generated content as a policy is a losing battle at this point.
I don't think this will work. The same arguments could have been made to mitigate junior devs' work or the technical debt of short timeline/high stakes projects, and rarely did anyone listen.
I disagree. I think there’s a near future where this is true, but we’re still at the point where you can cut down on a lot of spam/slop by rejecting the class of LLM-generated content as a policy.
Sidenote, but I love that in a GitHub issue discussing banning the use of LLMs, the GitHub interface asks if there's anything I'd like to fix with Copilot.
Prompt injection must be a really lucrative activity. I wonder how many hidden prompts will be detected on GitHub in the coming years and how many devs dare point LLMs at GitHub issues
Policy is a decent filter for consensual candidates, but mostly ineffective against incentives and automation.
The issue is exported costs: whether submitters make reviewers work too hard for the contribution value.
The policy/practice should focus first on making reviewer/developer's work easier and better, and second on refining submitter skills to become developers. The same is true for Senior/Junior relations internally.
So the AI company that solves how to pare AI slop down to clean PRs would meet a real and growing need, and probably also help with senior/junior relations.
Then you could meet automation with automation, and the incentives are aligned around improving quality of code and of work experience. People would feel they're using AI instead of competing with it.
Maybe the solution is to counteract LLM pull requests with LLM first pass reviews. Obviously not a perfect solution, but LLMs should be good enough at detecting spam. This only needs to apply to first time contributors who haven't yet proven themselves.
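To make that concrete, here is a rough sketch of the gating logic only, assuming the project keeps a count of merged PRs per author. The names `route_pull_request`, `merged_pr_counts`, `spam_score` and the 0.8 threshold are all made up for illustration, and `stand_in_score` is a trivial placeholder where a real LLM call would go, not an actual model.

```python
from typing import Callable, Mapping


def route_pull_request(
    author: str,
    merged_pr_counts: Mapping[str, int],
    description: str,
    spam_score: Callable[[str], float],
    threshold: float = 0.8,  # arbitrary cut-off, would need tuning
) -> str:
    """Decide where a new PR goes before a maintainer spends time on it."""
    # Contributors who already had something merged skip the first pass.
    if merged_pr_counts.get(author, 0) > 0:
        return "human-review"
    # First-time contributors get scored by the (hypothetical) LLM reviewer.
    if spam_score(description) >= threshold:
        return "flag-likely-spam"
    return "human-review"


def stand_in_score(text: str) -> float:
    """Placeholder scorer; a real setup would ask an LLM for a spam score."""
    return 1.0 if "as an ai language model" in text.lower() else 0.0


counts = {"alice": 3}
print(route_pull_request("alice", counts, "Fix typo", stand_in_score))
print(route_pull_request("bob", counts, "As an AI language model, I ...", stand_in_score))
```

Note that a flagged PR is only routed differently, not auto-closed, so a wrong score costs a new contributor a delay rather than a rejection.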
In stuff I maintain I hardly get anything at all, whether using LLMs or not, so it does not affect me much. Furthermore, I exclusively use the API, so the UI will not try to make you use Copilot and those things either (that is not the reason I use the API, although it is a side effect).
I do not use Copilot, Claude, etc, although I partially agree with one of the comments there, that using LLM for minor auto-completion is probably OK, as long as you can actually see that the completion is not incorrect (although that should apply to other uses of auto-completion too, even if LLM is not used; but it is even more important to check more carefully if LLM is used). I think it would be better to not accept any LLM generated stuff otherwise (although the author might use LLM to assist before submitting it if desired (I don't, but it might help some programmers), e.g. in case the LLM finds problems with it, that they will then have to review themself to check if it is correct, before correcting and submitting it; i.e. don't trust the results of the LLM).
It links to https://github.com/lxc/incus/commit/54c3f05ee438b962e8ac4592... (add .patch on the end of the URL if it is not displayed), and I think the policy described there is good. (However, for my own projects, nobody else can directly modify it anyways; they will have to make their own copy and modify that instead, and then I will review it by myself and can include the changes (possibly with differences from how they did it) or not.)
It will be more interesting to see if and when AI becomes better than humans at coding, which seems to be coming close to reality, some projects may start accepting only AI contributions :D.
Unfortunately, LLMs empower "contributors" who can't be bothered to put in any effort and who don't care about the negative impact of their actions on the maintainers.
The open-source community, generally speaking, is a high-trust society and I'm afraid that LLM abuse may turn it into a low-trust society. The end result will be worse than the status quo for everyone involved.
Everything is collapsing toward a low-trust default. At the end of this trajectory, we rediscover that the analog world becomes valuable precisely because it can't be infinitely replicated.
Authenticity becomes the foundational currency.
But everyone must master AI tools to stay relevant. The brilliant engineer who refuses AI-generated PR by principle will get replaced. Every 18-24 months, as capabilities double, required skills shift. Specialization diminishes. Learning velocity becomes the only durable advantage. These people cannot learn new tricks.
Those who cannot question their assumptions cannot self-correct and will be replaced. The future belongs to the humble, the fluid, and the resilient. 60% of HN users are headed for a very tough time, and I am being very charitable with this assumption.
Curious to know if others are seeing a similar uptick in AI slop in issues or PRs for projects they are maintaining. If yes, how are you dealing with this?
Some of the software that I maintain is critical to the container ecosystem and I'm an extremely paranoid developer who starts investigating any GitHub issue within a few minutes of it opening. Now, some of these AI slop GitHub issues have a way to "gaslight" me into thinking that some code paths are problematic when they actually are not. And lately AI slop in issues and PRs has been taking up a lot of my time.
I haven’t seen anything obvious, even including the other repos where I look through issues a lot.
Maybe it’s only the really popular and buzzword-y repos that are targets?
In my experience, the people trying to leverage LLMs for career advancement are drawn to the most high profile projects and buzzwords, where they think making PRs and getting commits will give them maximum career boost value. I don’t think they spend time playing in the boring repos that aren’t hot projects.
I have a simple rule: whenever I receive an issue or PR from an unfamiliar account, I skim the text and/or code and post 1-2 quick questions. If the submitter responds intelligently, then the issue warrants a more in-depth investigation. If not, the submitter clearly doesn't think that the issue is worth following up on, so it's not worth my time, either.
The more code a PR contains, the more thorough knowledge of it the submitter must demonstrate in order to be accepted as a serious contributor.
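For anyone who wants that rule in a more mechanical form, a minimal sketch follows. Everything here is an assumption for illustration: the `Submission` fields, the "one extra question per 200 changed lines" scaling, and the cap of 5 are arbitrary numbers, not something the rule above prescribes.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    author_is_known: bool  # has this account contributed before?
    lines_changed: int     # size of the diff; 0 for a plain issue
    good_answers: int      # follow-up questions answered intelligently so far
    went_silent: bool      # submitter stopped responding


def questions_required(sub: Submission) -> int:
    """How many probing questions to ask before investing review time."""
    if sub.author_is_known:
        return 0
    # One question as a baseline, one more per 200 changed lines, capped at 5.
    return min(5, 1 + sub.lines_changed // 200)


def triage(sub: Submission) -> str:
    """Return 'investigate', 'ask', or 'close' for a submission."""
    if sub.good_answers >= questions_required(sub):
        return "investigate"
    if sub.went_silent:
        return "close"  # not worth their follow-up, so not worth mine
    return "ask"


print(triage(Submission(author_is_known=False, lines_changed=450,
                        good_answers=1, went_silent=False)))
```

The function never asks whether a human or an LLM wrote the text; it only tracks whether the answers were good enough for the size of the change.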
It doesn't matter if it was written by a human or an LLM. What matters is whether we can have a productive discussion about some potential problem. If an LLM passes the test, well it's a good doggy, no need to kick it out. I'd rather talk with an intelligent LLM than a clueless human who is trying to use github as a support forum.
> I have a simple rule: whenever I receive an issue or PR from an unfamiliar account, I skim the text and/or code and post 1-2 quick questions. If the submitter responds intelligently, then the issue warrants a more in-depth investigation. If not, the submitter clearly doesn't think that the issue is worth following up on, so it's not worth my time, either.
And if the submitter responds with "Great question! ...", what then? :D
That's a reasonable first step, but a human can just as well feed your questions to the LLM, and post the responses. So you really have to be vigilant throughout your interactions with the author, or, if the PR is unsalvageable, save yourself some time and effort and reject it from the start.
The goal is not to filter out LLMs. The goal is to find and fix legitimate issues with my open-source software. I don't care if a human is feeding my questions to an LLM or an undead dog as long as they produce a decent signal.
If the noise increases to an uncomfortable level, of course, I may have to change my strategies. The point is that humans produce noise, too, sometimes even more than LLMs do.
As an open source maintainer, I don't have an issue with AI; I have an issue with low quality slop whether it comes from a machine or from a human.
The responsibility then is for an open source project to not be shy on calling out low quality/low effort work, have good integ tests and linters, and have guidance like AGENTS.md files that tell coding robots how to be successful in the repo.
Some years ago I read the Neal Stephenson book Anathem. SPOILERS: it has a version of the Internet called the Reticulum and one thing I remember is that it was filled with garbage. True information subtly changed multiple times until it was garbage. And there were agents to see through the garbage. I imagined this to be a neverending arms race.
Honestly, this is kind of where I see LLM generated content going where you'll have to pay for ChatGPT 9 to get information because all the other bots have vandalized all the primary sources.
What's really fascinating is you need GPUs for LLMs. And most LLM output is, well, garbage. What did you previously need GPUs for? Mining crypto and that is, at least in the case of Bitcoin, pointless work for the sake of pointless work ie garbage.
I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
> I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
If you squint your eyes right at the shelves at Target or in the Amazon delivery trucks, or honestly just look around you most anywhere, you may not have to wait for the future to see it.
Considering how energy intensive generative content is, there is a good chance that it is already becoming a sizable share of energy use for the internet.
I think OP wanted to have a proof that the submitter can demonstrate understanding of both the system and the code in the PR. So how would a guidance solve that? Also OP didn't want to ban them I believe
>>> the entire issue description contains so much unneeded (and probably incorrect) information that it'd be better if they just provided their LLM prompt as an issue instead
When it's put this way, it seems a lot like the problem of people walking into doctors' offices with certainty that they know their own diagnosis after reading stuff on Reddit and WebMD.
What this post actually amounts to, indirectly, is a plea to trust human expertise in a particular domain instead of assuming that a layperson armed with random web pickings has the same chance as an expert at accurately diagnosing the problem. This wastes the expert's time and just increases mistrust.
The exceptions where Reddit solves something that a doctor failed to solve are what infuse the idea of lay online folk wisdom with merit, for people desperately looking for answers and cures. Makes it impossible to impose a blanket rule that we should trust experts, who are fallible as well.
The problem is societal. It's that if you erode trust in learned expertise long enough, you end up with a chaos of misinformation that makes it impossible to find a real answer.
A friend of mine who died of lung cancer recently, in his last days became convinced that he'd gotten it because of the covid vaccine (despite being a lifelong smoker, whose father had died of it at 41). And in every individual case you say, well, I don't want to disabuse someone of the fantasy they've landed on.
This is a devastatingly bad way to raise a generation, though. Short-circuiting one's own logic and handing it over to non-deterministic machines, or randos online... how do we expect this to end?
I assume any LLM contributions are uncopyrightable at best or copyright infringement at worst. Either way, I don't own the copyright.
In order to avoid a potential future where I lose the copyright due to being unable to show a substantial portion is human authored, I try to keep track of what is AI authored and what is human authored.
From a copyright perspective, right now accepting LLM contributions feels like playing with fire, at least for closed source projects.
Copyright in the context of LLMs is kind of weird. An LLM doesn't explicitly copy others' code but it does lean heavily on the structure of it. Is it evidence of copying prior work or is it fair use? Alas, we don't really have any legal precedent that can handle this yet.
If it is copying prior work, then you are right that there would be a lot of cross licensing bleed through. The opposite is also true in that it could take proprietary code structure and liberate it into GPL 3 for instance. Again what is the legal standing on this?
Years back there was a source code leak of Microsoft Office. Immediately the LibreOffice team put up restrictions to ensure that contributors didn't even look at it for fear that it would end up in their project and become a leverage point against the whole project. Now with LLMs it can be difficult to know where anything comes from.
We don’t know if it is weird yet, right? It is just a big question mark.
I guess at some point there will be a massive lawsuit. But so much of the economy is wrapped up in this stuff nowadays that the folks paying for Justice System Premium Edition probably prefer not to have anything solid yet.
I completely agree, but it seems that our industry has decided to turn a blind eye to it. They might even get away with it -- the recent rulings around fair use with regard to Facebook and Anthropic's unrepentant copyright violations[1] were particularly galling to me.
Almost all of the projects I work on require you to sign the Developer Certificate of Origin[2] (which attempts to protect projects from people submitting code that they know cannot be licensed under the project's license), and in my view LLM code you submit does not fulfill the requirements of the DCO. Unfortunately, it seems nobody actually cares about this either.
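For context on what projects can actually check mechanically: the DCO is usually recorded as a `Signed-off-by: Name <email>` trailer in the commit message (what `git commit -s` adds). Below is a minimal sketch of such a check, assuming the conventional capitalization of the trailer; the function name `has_sign_off` is made up. It only confirms that the attestation line is present and cannot say anything about whether the code was written by an LLM, which is exactly the gap described above.

```python
import re

# Matches a line like "Signed-off-by: Jane Doe <jane@example.com>".
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """True if the commit message carries a DCO-style sign-off line."""
    return SIGN_OFF.search(commit_message) is not None


message = "Fix race in container teardown\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
print(has_sign_off(message))
```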
I've seen an uptick in LLM generated bug reports from coworkers. An employee of my company (but not someone I work with regularly) used one of the CLI LLMs to search through logs for errors, and then automatically cut hundreds (!) of bugs to (sometimes) the correct teams. Turns out it was the result of some manager's mandate to "try integrating AI into our workflow". The resulting email was probably the least professional communication I've ever sent, but the message was received.
The only solution I can see is a hard-no policy. If I think this bug is AI, either by content or by reputation, I close without any investigation. If you want it re-opened, you'll need to IRL prove it's genuine in an educated, good-faith approach that involves independent efforts to debug.
> "If you put your name on AI slop once, I'll assume anything with your name on it is (ignorable) slop, so consider if that is professionally advantageous".
Alas this is an issue of LLM generated code. Increased productivity but lessened understanding. If you can increase productivity while also increasing understanding then that will be a decent middle ground.
The issue expresses doubt about a policy specific to LLMs being accepted. I think the way to go might be to accept that the bar for outside contributions should unfortunately be higher. It doesn't take an LLM to have a glut of low quality contributions, it just takes an incentive and some attention, as we've seen with Hacktoberfest: https://news.ycombinator.com/item?id=31628342
This is a poor take. We need to stop slop and low effort issues/PRs. Stopping AI generated code is a lost battle because detecting that in high quality work is impossible.
If people can't make a good (PR, story, or whatever) without AI, they certainly can't do it with AI, because using the AI is strictly more difficult - it requires all the original skill, plus the skill to work around the AI's quirks.
This is arguably isomorphic to Kernighan's Law:
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
Are you sure about that? Can you make the opposite case and steel-man it? Until you can, a stupid LLM will be smarter than you are.
I can write a PR now. I can code now. Probably not as good as you can, but before LLMs I couldn't. I tried for decades learning to code. My brain is a Top-down network, I can see the big picture very quickly, but I cannot maintain focus to build bottom-up. Now I don't have to. I use LLMs to set the goal, to examine all corner cases, to define the milestones, to predict the wrong turns, to write a human-readable spec, to break it down to units of test code, to write the blue-prints of units of code. I can test them, and debug them with LLMs. The end result can be sub-optimal, but it runs, it does what I want, is well documented, and is maintainable. Before LLMs I couldn't do any of that. In doing all this, I get better at the bottom-up thing, just by trying.
We are a spectrum of people. Do not assume the world is like you.
This is a minor issue. The big issue comes when we start complaining about code that's not generated by AI. When hand coded stuff becomes buggier than AI stuff.
You're being downvoted but I think I get what you're getting at.
When I was doing my masters a few months ago, I would get my assignments rejected whenever I didn't run them through Grammarly first.
I have nothing against Grammarly, it's a useful tool, but I find that it has the tendency to reject things that (as far as I can tell) are still technically correct but don't have the "AI vibe" to them. I suspect that the graders are running things through Grammarly themselves and rejecting anything that it rejects. This is probably going to become increasingly common as time goes on.
It's hardly the worst thing in the world, but I do think it will lead to the only "accepted" writing being extremely plain and formulaic.
I make a lot of drive-by contributions, and I use AI coding tools. I submitted my first PR that is a cross between those two recently. It's somewhere between "vibe-coded" and "vibe-engineered", where I definitely read the resulting code, had the agent make multiple revisions, and deployed the result on my own infrastructure before submitting a PR. In the PR I clearly stated that it was done by a coding agent.
I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?
[append]
Getting a lot of hate for this, which I guess is a pretty clear answer. I guess the reason I'm not receiving the "fuck off" clearly is because when I see these threads of people complaining about AI content, it's really clearly low-quality crap that (for example) doesn't even compile, and wastes everyone's time.
I feel different from those cases because I did spend my time to resolve the issue for myself, did review the code, did test it, and do stand by what I'm putting under my name. Hmm.
I think it's disrespectful to throw bad code on somebody else, regardless of its provenance. And (as a professional who uses AI coding agents) I understand that LLM code is often bad code. But the OP isn't complaining about the deluge of bad code being submitted as PRs, they're complaining about LLM code being submitted as PRs.
I've also been on the other side of this, receiving some spammy LLM-generated irrelevant "security vulnerabilities", so I also get the desire for some filtering. I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
Well, we don't receive that many low-quality PRs in general (I opened this issue to discuss solutions before it becomes a real problem). Speaking personally, when it does happen I try to help mentor the person to improve their code or (in the case where the person isn't responsive) I sit down and make the improvements I would've made and explain why they were made as a comment in the PR.
When it comes to LLM-generated code, I am now going to be going back-and-forth with someone who is probably just going to copy-paste my comments into an LLM (probably not even bothering to read them). It just feels disrespectful.
> I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
Well, this is a two-way street -- all of the LLM-generated PRs and issues I've seen so far do not say that they are LLM-generated, in a way that I am tempted to describe as "dishonest". If every LLM-generated PR was tagged as such, I might have a different outlook on the situation (and might instead be willing to review these issues but with lower priority).
Thanks for the response here. I also read your follow-up on the Github issue. I definitely agree with you on the mentorship point. The "CPU quota allocation mismatch" issue felt particularly agitating to read from the point of view of a maintainer. A human being acting as a proxy for ChatGPT. Have you considered putting an "LLM usage disclosure" question on the issue/PR templates? Something like "To what extent were AI tools used to create this issue/PR? What validation steps did you perform to ensure accuracy?"
The "hard-line policy" would then shift from being "used LLM tools" to "lied on the LLM usage disclosure", and it feels a lot less like selective enforcement (from my perspective). Obviously it won't stop these spammy issues/PRs, but neither will a hard-line policy against all AI.
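To make the disclosure idea concrete, here is a minimal sketch of how a project could check whether a PR description actually filled in such a question. It is only an illustration: the heading names ("AI usage disclosure", "Validation steps") and the function names are hypothetical, and nothing here is a GitHub feature or any real project's template.

```python
import re

# Hypothetical section headings a PR template might ask for (assumed names).
REQUIRED_SECTIONS = ("AI usage disclosure", "Validation steps")


def section_text(body: str, heading: str) -> str:
    """Return the text under a '## heading' line, up to the next '## ' heading."""
    pattern = rf"^##[ \t]*{re.escape(heading)}[ \t]*$(.*?)(?=^##\s|\Z)"
    match = re.search(pattern, body, flags=re.MULTILINE | re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else ""


def missing_disclosures(body: str) -> list[str]:
    """List the required sections that are absent or left empty."""
    return [h for h in REQUIRED_SECTIONS if not section_text(body, h)]
```

A check like this can only tell whether the question was answered, not whether the answer is true, so it supports the "lied on the disclosure" policy rather than replacing review.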
I would potentially accept it (although you should not lie about it, and if someone says in their policy that they specifically do not want this then you should not send it). For my own projects, there are reasons (having to do with the way the version control is set up) that I cannot accept direct contributions anyway, and I usually make my own changes to any contributions (but anyone else can make their own copy with their own changes however they want, whether or not I accept it). Since it is honest, and you reviewed and tested it yourself before submitting it, and I would review it again anyway, it would be acceptable. I would still much prefer to receive contributions that do not use an LLM, but the way you do it is much better than the other LLM-produced stuff that does not meet the threshold of being acceptable.
If I were the maintainer, and you specified up front that your PR was largely written by an LLM, I would appreciate it. I may prioritize it lower than other PRs, perhaps, but that's about it.
I think it's also important to disclose how rigorously you tested your changes, too. I would hate to spend my time looking at a change that was never even tested by a human.
It sounds like you do both of these. Judging by the other replies, it seems that other reviewers may take a harsher stance, given the heavily polarized nature of LLMs. Still, if you made the changes and you're up front about your methodology, why not? In the worst case, your PR gets closed and everybody moves on.
> I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?
If a project has a stated policy that code written with an LLM-based aid is not accepted, then it shouldn't be submitted, same as with anything else that might be prohibited. If you attempt to circumvent this by hiding it and it is revealed that you knowingly did so in violation of the policy, then it would be unsurprising for you to receive a harsh reply and/or ban, as well as a revert if the PR was committed. This would be the same as any other prohibition, such as submitting code copied from another project with an incompatible license.
You could argue that such a blanket ban is unwarranted, and you might be right. But the project maintainers have a right to set the submission rules for their project, even if it rules out high-quality LLM assisted submissions. The right way to deal with this is to ask the project maintainers if they would be willing to adjust the policy, not to try to slip such code into the project anyway.
"should I stop doing this thing that people are explicitly saying they don't want me to do or should I keep doing this thing that people are explicitly saying they don't want me to do???"
As with everything, there's always nuance. If everyone followed a similar mindset to the comment you were replying to, LLM-generated PRs and issues likely wouldn't be as much of a problem and we wouldn't even be here discussing it.
I think everything has a growing problem with LLM/AI generated content. Emails, blog posts, news articles, research papers, grant applications, business proposals, music, art, pretty much everything you can think of.
There’s already more human produced content in the world than anyone could ever hope to consume, we don’t need more from AI.
In some ways I'm starting to enjoy this. Do you remember the 419 scams? 'Hi, I'm a Nigerian prince named Michael Jordan. Give me $50 so I can buy some chemicals to clean a bunch of money I secretly stowed away and I'll send you $5000.' People actually used to fall for that. Of course some people probably still would (and a lot more certainly gets blocked by spam blockers) but I think overall society grew less substantially less gullible over time.
But in general I think most people still remain excessively gullible and naive. Social media image crafting is one of the best examples of this. People create completely fake and idealized lives that naive individuals think are real. Now with AI enabling one to create compelling 'proof' of whatever lie you want, I think more people are becoming more suspicious of things that were, in fact, fake all along.
---
Going back to ancient times many don't know that Socrates literally wrote nothing down. Basically everything we know of him is thanks to other people, his student Plato in particular, instead writing down what he said. The reason for this was not a lack of literacy - rather he felt that writing was harmful because words cannot defend themselves, and can be spun into misrepresentations or falsehoods. Basically - the argumentative fallacies that indeed make up most 'internet debates', for instance. Yet now few people are not aware of this issue, and quotes themselves are rarely taken at face value, unless they confirm one's biases. People became less naive as writing became ubiquitous, and I think this is probably a recurring theme in technologies that transform our abilities to transfer information in some format or another.
> but I think overall society grew less substantially less gullible over time.
Absolutely not. All that happened is most people became aware that "Nigerians offering you money are scammers." But they still fall for other get rich quick schemes so long as it diverges a little bit from that known pattern, and they'll confidently walk into the scam despite being warned and saying, "It's not a scam, dumbass. It's not like that Nigerian prince stuff." If anything, people seem to be becoming more confident that they're in on some secret knowledge and everyone else is being scammed.
I call it the antibody effect. My favorite example is clickbait headlines like, "Five things you MUST do if you're doing this thing. You'd never guess #3!" It used to be everywhere and now it's nowhere.
AI is starting to show this effect - people stay away from em-dashes. There's that yellowish tinge and that composition which people avoid on art. Some of this is bad, but we can probably live without it.
I'm inclined to disagree just because photoshop has had a measurable effect on the population being skeptical of photos which at one point were practically treated as the gold standard of evidence. It's still easy to find people who have fallen for photoshopped images, but it's also easy to find people expressing doubts and insisting they can "tell by the pixels". Sometimes even legitimate photos get accused of being photoshopped which seems healthy.
The other side of this is that AI tools are being treated like magic to the point that people are denying well documented events happened at all, such as the shooting of Charlie Kirk - conspiracies abound!
Also bizarrely a subsection of the population seems to be really into blatantly ai generated images, just hop onto Facebook and see for yourself. I wonder if it has something to do with whatever monkey brain things makes people download apps with a thumbnail of a guy shouting or watch videos that have a thumbnail of a face with its mouth open, since ai generated photos seem very centralized around a single face making a strange expression.
People of a profound initial bias will, in general, believe anything that supports that bias, and reject anything that challenges it, in both cases without any real consideration or thought whatsoever. So I don't think examples of individuals being "misled" by e.g. AI generated images or video, to extremes, is entirely realistic. Rather they were already at those extremes and will just eat up anything that appeals to those extremes.
To take a less politically charged example, imagine there is fake content 'proving' that the Moon landing is faked. Is that going to meaningfully sway people who don't have a major opinion one way or the other? Probably not, certainly not in meaningful numbers. And in general I think the truth does come out on most things. And when people find they have been misled, particularly if it was somebody they thought they could trust, it tends to result in a major rubber-banding in the opposite direction.
Let's think about this algorithmically.
If something "floods the zone with shit," it needs S amount of shit to cause a flood. But too much will eventually make the scam ineffectual. Widespread public distrust for the scam is (S+X)/time where X is the extra amount of shit beyond the minimum needed. Time is a global variable constrained by the rate at which people get burned or otherwise catch on to all other scams of the same variety. If we imagine that time-to-distrust shrinks with each new iteration of shit, then X the amount of excessive shit needed to trigger distrust should decrease over time.
The longer term problem is the externality where nothing is trusted, and the whole zone is destroyed. When that zone was "what someone wrote down that Socrates might have said," or "Protocols of the Elders of Zion," or "emails from unknown senders," that was one thing. A new baseline could be set for 'S'. When it's all writing, all art, all music and all commentary on those things, it seems catastrophic. The whole cave is flooded with shit.
You seem to assume people who are the victims of scams are people who are more naive. But that’s not how it works. Scams try to catch people at their weakest. It’s not if but when.
> I think more people are becoming more suspicious of things that were, in fact, fake all along.
Sadly they also become suspicious of things that are, in fact, facts all along.
Video or photo evidence of a crime become useless the better AI gets
> Video or photo evidence of a crime become useless the better AI gets
This is probably a good thing because photoshop and CGI have existed for a very long time and people shouldn't have the ability to frame an innocent for a crime or even get away with one just because they pirate some software and put in a few hours watching tutorials on youtube.
The sooner jurors understand that unverified video/photo evidence is worthless the better.
The old quote of, 'You can fool some of the people all of the time, all of the people some of the time'.
There is the hope that in dumping so much slop so rapidly that it will break the bottom of the bucket. But there is the alternative that the bottom of the bucket will never break, it will just get bigger.
I personally know people who look down upon people who use LLMs to write code. There is a lot of hate in some of the senior developers that I talk to. I don't know if this growing tendency to be suspicious of AI usage is good or bad. For example, towards the final semester of my bachelor's degree, my algorithms class started reporting students for academic misconduct because the TAs started assuming that all the optimal solutions to assignment problems were written by LLMs. In fact, several classmates started purposely writing sub-optimal solutions so that the TAs would at least grade them without any prejudice.
I worry that because LLM slop also tends to be so well presented, it might compel software developers to start writing shabby code and documentation on purpose to make it appear human.
At the moment it is the other way around. LLMs rarely write good code if not instructed by someone that knows what they are doing. And even then the code is rarely good.
> People actually used to fall for that. Of course some people probably still would (and a lot more certainly gets blocked by spam blockers) but I think overall society grew less substantially less gullible over time.
It is still a serious problem; I just want that to be abundantly clear. Several thousand people (in the US alone) fall for it every year. I used to browse 419 eater regularly and up until a few years ago (when I last really followed this issue) these scams were raking in billions a year. Could be more or less now but I doubt it's shifted a ton.
> There’s already more human produced content in the world than anyone could ever hope to consume, we don’t need more from AI.
Even if you think the harms of AI/machine generated content outweigh the good, this is not a winning argument.
People don’t just consume arbitrary content for the sake of consuming any existing content. That’s rarely the point of it. People look for all kinds of things that don’t exist yet — quite a lot of it referring to things that are only now known or relevant in the given moment or to the given niche audience requesting it. Much of it could likely never exist if it weren’t possible to produce it on demand and which would not be valuable if you had to wait for a human to make it.
The generated content is wasting the time of maintainers. How would you solve that?
For your winning argument, what would you use to prevent slop filling up your feed when there is more AI generated content, any sort of protocol that you have?
Just happened to me yesterday. I realized I'd really like to watch a long-form video on Huygens Clock. Something between a short (god damnit) and Quicksilver. Thought there must be something detailed. Nope, at least Youtube let me down. Or the search algo, who knows. Since they removed the "20+ minutes" button it has become basically useless to search on YT.
> Since they removed the "20+ minutes" button it has become basically useless to search on YT.
A few days ago I definitely got into an A/B test where the search results were:
- 5 shorts one under another
- new section with one or two videos and one or two shorts
- new section with five or more shorts in a horizontal layout
- new section with videos of which 20-30% were shorts
It's insane
They're pushing Shorts so fucking hard it makes me sick.
Even worse is that we're banning TikTok because it's bad for the kids (short form algorithmic content), Snapchat (similar thing + strangers creeping) and Instagram Stories (algorithm again).
BUT there is NO WAY for a parent to allow their kid to use Youtube AND block Shorts. (yes there are browser plugins etc, but how do you enforce them on a child?)
And from what I've seen the AI slop on Shorts is so fucking bad that it seems we just collectively forgot about Elsagate...
I think we should start calling LLM-generated data 'dis-content'.
How bout "botshit"?
We already have a definition, "AI slop":
> AI slop is digital content made with generative artificial intelligence, specifically when perceived to show a lack of effort, quality or deeper meaning, and an overwhelming volume of production.
https://en.wikipedia.org/wiki/AI_slop
Creating a email/messaging protocol to solve spam, which can also be used to solve LLM generated spam pull-requests because pull-requests are also a form of messaging and have the same characteristics.
Check this profile for the email if you wanna ask for more info or get updates.
Edit: To the people downvoting this, what have you done to help solve the spam problem that will be made worse by LLMs?
Personally, I'm less afraid that AI will leave us with too much content than that it will leave us with too little. I notice how much less I'm using Wikipedia or forums myself, getting my answers from OpenAI or Anthropic instead. From what I remember, monthly Wikipedia traffic dropped over 20% from 2022 to 2025.
Primary sources will always exist as the content needs to come from somewhere. Personally, I shy away from ChatGPT answers for the same reason I avoid Wikipedia: you've got to find authoritative information to get reliable answers. I wonder if people who use LLMs don't care about accuracy, or don't know about sourcing?
I don't think that's fully attributable to LLMs.
Attention spans for long-form content are at all-time lows, judging by the metrics I've seen from various platforms across different media types.
Large Language Models cannot think but they can self correct. You can think but clearly have some serious problems self-correcting your wrong assumptions. Who is smarter then?
I've seen an interesting politically motivated one. It didn't appear to be a bot, just a user from China:
https://github.com/umami-software/umami/pull/3678
The goal is "Taiwan" -> "Taiwan, Province of China" but via the premise of updating to UN ISO standards, which of course does not allow Taiwan.
The comment after was interesting with how reasonable it sounds: "This is the technical specification of the ISO 3166-1 international standard, just like we follow other ISO standards. As an open-source project, it follows international technical standards to ensure data interoperability and professionalism."
The politics of the intent of the PR was masked. Luckily, it was still a bit hamfisted. The PR incorrectly changed many things and the user stated their political intention in the original PR (the above is from a later comment).
Doubly interesting (and relevant to this discussion) is that it was an AI code review tool that detected the issues with the PR
One of these days I need to make a bot that scans FOSS repos for this kind of little pink nonsense behavior.
The insecurity of wanting to call a place "country name, province of different country name" should alone be mocked. Imagine, "Ukraine, province of Russia," or "India, colony of The United Kingdom." Absurd on its face.
It's just information warfare, sign of the times.
Every little thing counts, even if it's just changing names in an open source app like that.
I think AI generated issues and pull requests may lead to the death of GitHub as the place for open-source projects. As Microsoft is hell-bent on shoving easy AI buttons in every interface including GitHub, and as more people try to pad their GitHub profile for their resume, the burden for open-source projects to filter the low-quality spam may grow immense and it may become worth it to just switch to Codeberg or other platforms. Some LLM use will inevitably leak into these other platforms but the relatively high barrier (no built-in AI buttons), effective moderation banning obvious violators, and reduced usefulness for resume padding would make it negligible I think.
Since the cat is out of the bag (i.e. there's no stopping of LLM-generated code / PR / issues now that vibe-coding is fairly universally accessible), the only thing in our (especially those who are custodians of software systems and products) control is to invest a lot more in testing and evals. Just plain opposing LLM-generated content as a policy is a losing battle at this point..
> to invest a lot more in testing and evals
I don't think this will work. The same arguments could have been said to mitigate junior devs' work or short timeline/high stakes project technical debt and rarely ever anyone listened
I disagree. I think there’s a near future where this is true, but we’re still at the point where you can cut down on a lot of spam/slop by rejecting the class of LLM-generated content as a policy.
Sidenote, but I love that in a GitHub issue discussing banning the use of LLMs, the GitHub interface asks if there's anything I'd like to fix with Copilot.
That might be a contributing factor to this problem. the means to produce these bogus reports are integrated directly into so many dev tools nowadays.
Prompt injection must be a really lucrative activity. I wonder how many hidden prompts will be detected on GitHub in the coming years and how many devs dare point LLMs at GitHub issues
Policy is a decent filter for consensual candidates, but mostly ineffective against incentives and automation.
The issue is exported costs: whether submitters make reviewers work too hard for the contribution value.
The policy/practice should focus first on making reviewer/developer's work easier and better, and second on refining submitter skills to become developers. The same is true for Senior/Junior relations internally.
So the AI company that solves how to pare AI slop down to clean PR's would meet a real and growing need, and probably also help with senior/junior relations as well.
Then you could meet automation with automation, and the incentives are aligned around improving quality of code and of work experience. People would feel they're using AI instead of competing with it.
Maybe the solution is to counteract LLM pull requests with LLM first pass reviews. Obviously not a perfect solution, but LLMs should be good enough at detecting spam. This only needs to apply to first time contributors who haven't yet proven themselves.
There seems to be a bot that routinely creates huge LLM generated Issues on runc github: https://github.com/containerd/containerd/issues/12496
And honestly, it's becoming annoying
Is there a way to ban specific users in your GitHub project?
(I prefer GitLab, I'm sure if it had projects that are as popular it would be similarly inundated.)
IIRC if one of the maintainers of a project blocks a user that prevents them from participating in issues and PRs.
For bigger projects with many maintainers that can also lead to problems if people use the block function as liberally as on Twitter.
In stuff I maintain I hardly get anything at all, whether using LLMs or not, so it does not affect me much. Furthermore, I exclusively use the API, so the UI will not try to make you use Copilot and those things either (that is not the reason I use the API, although it is a side effect).
I do not use Copilot, Claude, etc, although I partially agree with one of the comments there, that using LLM for minor auto-completion is probably OK, as long as you can actually see that the completion is not incorrect (although that should apply to other uses of auto-completion too, even if LLM is not used; but it is even more important to check more carefully if LLM is used). I think it would be better to not accept any LLM generated stuff otherwise (although the author might use LLM to assist before submitting it if desired (I don't, but it might help some programmers), e.g. in case the LLM finds problems with it, that they will then have to review themself to check if it is correct, before correcting and submitting it; i.e. don't trust the results of the LLM).
It links to https://github.com/lxc/incus/commit/54c3f05ee438b962e8ac4592... (add .patch on the end of the URL if it is not displayed), and I think the policy described there is good. (However, for my own projects, nobody else can directly modify it anyways; they will have to make their own copy and modify that instead, and then I will review it by myself and can include the changes (possibly with differences from how they did it) or not.)
It will be more interesting to see if and when AI becomes better than humans at coding, which seems to be coming close to reality, some projects may start accepting only AI contributions :D.
Unfortunately, LLMs empower "contributors" who can't be bothered to put in any effort and who don't care about the negative impact of their actions on the maintainers.
The open-source community, generally speaking, is a high-trust society and I'm afraid that LLM abuse may turn it into a low-trust society. The end result will be worse than the status quo for everyone involved.
Everything is collapsing toward a low-trust default. At the end of this trajectory, we rediscover that the analog world becomes valuable precisely because it can't be infinitely replicated.
Authenticity becomes the foundational currency.
But everyone must master AI tools to stay relevant. The brilliant engineer who refuses AI-generated PR by principle will get replaced. Every 18-24 months, as capabilities double, required skills shift. Specialization diminishes. Learning velocity becomes the only durable advantage. These people cannot learn new tricks.
Those who cannot question their assumptions cannot self-correct and will be replaced. The future belongs to the humble, the fluid, and the resilient. 60% of HN users are heading toward a very tough time, and I am being very charitable with this assumption.
AI seems like a rerun of the Eternal September problem that eventually killed internet news groups and a lot of email discussion lists.
Curious to know if others are seeing a similar uptick in AI slop in issues or PRs for projects they are maintaining. If yes, how are you dealing with this?
Some of the software that I maintain is critical to the container ecosystem and I'm an extremely paranoid developer who starts investigating any github issue within a few minutes of it opening. Now, some of these AI slop github issues have a way to "gaslight" me into thinking that some code paths are problematic when they actually are not. And lately AI slop in issues and PRs has been taking up a lot of my time.
I haven’t seen anything obvious, even including the other repos where I look through issues a lot.
Maybe it’s only the really popular and buzzword-y repos that are targets?
In my experience, the people trying to leverage LLMs for career advancement are drawn to the most high profile projects and buzzwords, where they think making PRs and getting commits will give them maximum career boost value. I don’t think they spend time playing in the boring repos that aren’t hot projects.
At first, I thought "wow, this project has been inactive for some time and this PR is quite large". The use of emojis should have tipped me off :)
https://github.com/photo/frontend/pull/1609
AI slop attacks on the cURL project : https://www.youtube.com/watch?v=6n2eDcRjSsk
I have a simple rule: whenever I receive an issue or PR from an unfamiliar account, I skim the text and/or code and post 1-2 quick questions. If the submitter responds intelligently, then the issue warrants a more in-depth investigation. If not, the submitter clearly doesn't think that the issue is worth following up on, so it's not worth my time, either.
The more code a PR contains, the more thorough knowledge of it the submitter must demonstrate in order to be accepted as a serious contributor.
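For what it's worth, the rule above could be sketched roughly like this. This is only one guess at how it might be encoded: the thresholds (one question baseline, one more per 200 changed lines, capped at 5) and all names are assumptions made up for illustration, and judging whether an answer is "intelligent" is still left to the maintainer.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    author_is_known: bool   # maintainer has dealt with this account before
    lines_changed: int      # size of the PR (0 for a plain issue)
    questions_asked: int    # quick questions the maintainer has posted so far
    good_answers: int       # answers the maintainer judged to be sensible


def questions_required(lines_changed: int) -> int:
    """Assumed scaling: bigger PRs need more demonstrated understanding."""
    return min(5, 1 + lines_changed // 200)


def triage(sub: Submission) -> str:
    """Return 'investigate', 'ask', or 'set aside' for a submission."""
    if sub.author_is_known:
        return "investigate"
    needed = questions_required(sub.lines_changed)
    if sub.good_answers >= needed:
        return "investigate"
    if sub.questions_asked < needed:
        return "ask"
    # Questions were asked but not answered well enough: not worth more time.
    return "set aside"
```

Note that nothing in the sketch asks whether the submitter is a human or an LLM; only the quality of the answers counts.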
It doesn't matter if it was written by a human or an LLM. What matters is whether we can have a productive discussion about some potential problem. If an LLM passes the test, well it's a good doggy, no need to kick it out. I'd rather talk with an intelligent LLM than a clueless human who is trying to use github as a support forum.
> I have a simple rule: whenever I receive an issue or PR from an unfamiliar account, I skim the text and/or code and post 1-2 quick questions. If the submitter responds intelligently, then the issue warrants a more in-depth investigation. If not, the submitter clearly doesn't think that the issue is worth following up on, so it's not worth my time, either.
And if the submitter responds with "Great question! ...", what then? :D
That's a reasonable first step, but a human can just as well feed your questions to the LLM, and post the responses. So you really have to be vigilant throughout your interactions with the author, or, if the PR is unsalvageable, save yourself some time and effort and reject it from the start.
> I'd rather talk with an intelligent LLM
There's no such thing.
The goal is not to filter out LLMs. The goal is to find and fix legitimate issues with my open-source software. I don't care if a human is feeding my questions to an LLM or an undead dog as long as they produce a decent signal.
If the noise increases to an uncomfortable level, of course, I may have to change my strategies. The point is that humans produce noise, too, sometimes even more than LLMs do.
As an open source maintainer, I don't have an issue with AI; I have an issue with low quality slop whether it comes from a machine or from a human.
The responsibility then is for an open source project to not be shy on calling out low quality/low effort work, have good integ tests and linters, and have guidance like AGENTS.md files that tell coding robots how to be successful in the repo.
What good does having integration tests and linters do when triaging issues? When trying to figure out if a bug report is hallucinated or not?
Some years ago I read the Neal Stephenson book Anathem. SPOILERS: it has a version of the Internet called the Reticulum and one thing I remember is that it was filled with garbage. True information subtly changed multiple times until it was garbage. And there were agents to see through the garbage. I imagined this to be a neverending arms race.
Honestly, this is kind of where I see LLM generated content going where you'll have to pay for ChatGPT 9 to get information because all the other bots have vandalized all the primary sources.
What's really fascinating is you need GPUs for LLMs. And most LLM output is, well, garbage. What did you previously need GPUs for? Mining crypto and that is, at least in the case of Bitcoin, pointless work for the sake of pointless work ie garbage.
I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
Seems about right.
> I can see a future in our lifetimes where a significant amount of our capital expenditure and energy consumption is used, quite simply, to produce garbage.
If you squint your eyes right at the shelves at Target or in the Amazon delivery trucks, or honestly just look around you most anywhere, you may not have to wait for the future to see it.
If I could find a way to turn LLM training into some sort of proof of work crypto mining I could cut the energy use in half!
Considering how energy intensive generative content is, there is a good chance that it is already becoming a sizable share of energy use for the internet.
Humans are far more effective at producing garbage than LLMs will ever be :)
They're really not, though. Creating garbage is the one ability at which LLMs are unarguably superhuman.
Wouldn't you provide LLM friendly guidance material in your repo instead of spending effort on a pointless endeavor to ban them?
I think OP wanted proof that the submitter can demonstrate understanding of both the system and the code in the PR. So how would guidance solve that? Also, OP didn't want to ban them, I believe.
> I think OP wanted proof that the submitter can demonstrate understanding of both the system and the code in the PR.
LLMs are really good at writing these. If they think this will prove the author is human, they're mistaken.
If you plan to accept LLM contributions then a lot more effort will have to be spent on reviewing them.
>>> the entire issue description contains so much unneeded (and probably incorrect) information that it'd be better if they just provided their LLM prompt as an issue instead
When it's put this way, it seems a lot like the problem of people walking into doctors' offices with certainty that they know their own diagnosis after reading stuff on Reddit and WebMD.
What this post actually amounts to, indirectly, is a plea to trust human expertise in a particular domain instead of assuming that a layperson armed with random web pickings has the same chance as an expert at accurately diagnosing the problem. This wastes the expert's time and just increases mistrust.
The exceptions where Reddit solves something that a doctor failed to solve are what infuse the idea of lay online folk wisdom with merit, for people desperately looking for answers and cures. Makes it impossible to impose a blanket rule that we should trust experts, who are fallible as well.
The problem is societal. It's that if you erode trust in learned expertise long enough, you end up with a chaos of misinformation that makes it impossible to find a real answer.
A friend of mine who died of lung cancer recently, in his last days became convinced that he'd gotten it because of the covid vaccine (despite being a lifelong smoker, whose father had died of it at 41). And in every individual case you say, well, I don't want to disabuse someone of the fantasy they've landed on.
This is a devastatingly bad way to raise a generation, though. Short-circuiting one's own logic and handing it over to non-deterministic machines, or randos online... how do we expect this to end?
A much bigger problem is that when an AI/LLM coughs up code, you have absolutely no idea what the copyright or license is.
I assume any LLM contributions are uncopyrightable at best or copyright infringement at worst. Either way, I don't own the copyright.
In order to avoid a potential future where I lose the copyright due to being unable to show a substantial portion is human authored, I try to keep track of what is AI authored and what is human authored.
From a copyright perspective, right now accepting LLM contributions feels like playing with fire, at least for closed source projects.
Copyright in the context of LLMs is kind of weird. An LLM doesn't explicitly copy others' code, but it does lean heavily on the structure of it. Is that evidence of copying prior work, or is it fair use? Alas, we don't really have any legal precedent that can handle this yet.
If it is copying prior work, then you are right that there would be a lot of cross-licensing bleed-through. The opposite is also true, in that it could take proprietary code structure and liberate it into GPL 3, for instance. Again, what is the legal standing on this?
Years back there was a source code leak of Microsoft Office. Immediately the LibreOffice team put up restrictions to ensure that contributors didn't even look at it, for fear that it would end up in their project and become a leverage point against the whole project. Now with LLMs it can be difficult to know where anything comes from.
We don’t know if it is weird yet, right? It is just a big question mark.
I guess at some point there will be a massive lawsuit. But, so much of the economy is wrapped up in this stuff nowadays, the folks paying for Justice System Premium Edition probably prefer not to have anything solid yet.
I completely agree, but it seems that our industry has decided to turn a blind eye to it. They might even get away with it -- the recent rulings around fair use with regard to Facebook and Anthropic's unrepentant copyright violations[1] were particularly galling to me.
Almost all of the projects I work on require you to sign the Developer Certificate of Origin[2] (which attempts to protect projects from people submitting code that they know cannot be licensed under the project's license), and in my view LLM code you submit does not fulfill the requirements of the DCO. Unfortunately, it seems nobody actually cares about this either.
[1]: https://www.debevoise.com/insights/publications/2025/06/anth... [2]: https://developercertificate.org/
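For anyone unfamiliar with how the DCO is usually enforced: in practice it comes down to a `Signed-off-by:` trailer in each commit message (what `git commit -s` adds). A minimal sketch of that kind of check, with the function name and regex being my own illustration rather than any project's actual tooling:

```python
import re

# Matches a trailer line such as "Signed-off-by: Jane Doe <jane@example.com>".
# The pattern is deliberately loose; it is an illustrative assumption, not a spec.
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """Return True if the commit message carries a sign-off trailer."""
    return SIGNOFF.search(commit_message) is not None


if __name__ == "__main__":
    msg = "Fix cgroup path handling\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    print(has_signoff(msg))
```

Note what this does and does not verify: it only confirms that the trailer is present. Whether the submitter can honestly certify the origin of LLM-produced code is exactly the part no script can check.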
copyrighting software to begin with was always laughable.
I've seen an uptick in LLM generated bug reports from coworkers. An employee of my company (but not someone I work with regularly) used one of the CLI LLMs to search through logs for errors, and then automatically cut hundreds (!) of bugs to (sometimes) the correct teams. Turns out it was the result of some manager's mandate to "try integrating AI into our workflow". The resulting email was probably the least professional communication I've ever sent, but the message was received.
The only solution I can see is a hard-no policy. If I think this bug is AI, either by content or by reputation, I close without any investigation. If you want it re-opened, you'll need to IRL prove it's genuine in an educated, good-faith approach that involves independent efforts to debug.
> "If you put your name on AI slop once, I'll assume anything with your name on it is (ignorable) slop, so consider if that is professionally advantageous".
Alas this is an issue of LLM generated code. Increased productivity but lessened understanding. If you can increase productivity while also increasing understanding then that will be a decent middle ground.
All comes down to accountability.
The issue expresses doubt about a policy specific to LLMs being accepted. I think the way to go might be to accept that the bar for outside contributions should unfortunately be higher. It doesn't take an LLM to have a glut of low quality contributions, it just takes an incentive and some attention, as we've seen with Hacktoberfest: https://news.ycombinator.com/item?id=31628342
This is a poor take. We need to stop slop and low effort issues/PRs. Stopping AI generated code is a lost battle because detecting that in high quality work is impossible.
If people can't make a good (PR, story, or whatever) without AI, they certainly can't do it with AI, because using the AI is strictly more difficult - it requires all the original skill, plus the skill to work around the AI's quirks.
This is arguably isomorphic to Kernighan's Law:
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
Are you sure about that? Can you make the opposite case and steel-man it? Until you can, a stupid LLM will be smarter than you are.
I can write a PR now. I can code now. Probably not as good as you can, but before LLMs I couldn't. I tried for decades learning to code. My brain is a Top-down network, I can see the big picture very quickly, but I cannot maintain focus to build bottom-up. Now I don't have to. I use LLMs to set the goal, to examine all corner cases, to define the milestones, to predict the wrong turns, to write a human-readable spec, to break it down to units of test code, to write the blue-prints of units of code. I can test them, and debug them with LLMs. The end result can be sub-optimal, but it runs, it does what I want, is well documented, and is maintainable. Before LLMs I couldn't do any of that. In doing all this, I get better at the bottom-up thing, just by trying.
We are a spectrum of people. Do not assume the world is like you.
It seems either your LLM is 10x better than anything I've ever used or your standards are 10x lower.
This is a minor issue. The big issue comes when we start complaining about code that's not generated by AI. When hand coded stuff becomes buggier than AI stuff.
You're being downvoted but I think I get what you're getting at.
When I was doing my masters a few months ago, I would get my assignments rejected whenever I didn't run them through Grammarly first.
I have nothing against Grammarly, it's a useful tool, but I find that it has the tendency to reject things that (as far as I can tell) are still technically correct but don't have the "AI vibe" to them. I suspect that the graders are running things through Grammarly themselves and rejecting anything that it rejects. This is probably going to become increasingly common as time goes on.
It's hardly the worst thing in the world, but I do think it will lead to the only "accepted" writing being extremely plain and formulaic.
I make a lot of drive-by contributions, and I use AI coding tools. I submitted my first PR that is a cross between those two recently. It's somewhere between "vibe-coded" and "vibe-engineered", where I definitely read the resulting code, had the agent make multiple revisions, and deployed the result on my own infrastructure before submitting a PR. In the PR I clearly stated that it was done by a coding agent.
I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?
[append] Getting a lot of hate for this, which I guess is a pretty clear answer. I guess the reason I'm not receiving the "fuck off" clearly is because when I see these threads of people complaining about AI content, it's really clearly low-quality crap that (for example) doesn't even compile, and wastes everyone's time.
I feel different from those cases because I did spend my time to resolve the issue for myself, did review the code, did test it, and do stand by what I'm putting under my name. Hmm.
I think it’s disrespectful of others to throw generated code their way. They become responsible for it and often donate their time.
I think it's disrespectful to throw bad code on somebody else, regardless of its provenance. And (as a professional who uses AI coding agents) I understand that LLM code is often bad code. But the OP isn't complaining about the deluge of bad code being submitted as PRs, they're complaining about LLM code being submitted as PRs.
I've also been on the other side of this, receiving some spammy LLM-generated irrelevant "security vulnerabilities", so I also get the desire for some filtering. I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
(OP here.)
Well, we don't receive that many low-quality PRs in general (I opened this issue to discuss solutions before it becomes a real problem). Speaking personally, when it does happen I try to help mentor the person to improve their code or (in the case where the person isn't responsive) I sit down and make the improvements I would've made and explain why they were made as a comment in the PR.
When it comes to LLM-generated code, I am now going to be going back-and-forth with someone who is probably just going to copy-paste my comments into an LLM (probably not even bothering to read them). It just feels disrespectful.
> I hope projects don't adopt blanket hard-line "no AI" policies, which will only be selectively enforced against new contributors where the code "smells like" LLM code, but that's what I'm afraid will happen.
Well, this is a two-way street -- all of the LLM-generated PRs and issues I've seen so far do not say that they are LLM-generated, in a way that I am tempted to describe as "dishonest". If every LLM-generated PR was tagged as such, I might have a different outlook on the situation (and might instead be willing to review these issues but with lower priority).
Thanks for the response here. I also read your follow-up on the GitHub issue. I definitely agree with you on the mentorship point. The "CPU quota allocation mismatch" issue felt particularly agitating to read from the point of view of a maintainer. Human being acting as a proxy for ChatGPT. Have you considered putting an "LLM usage disclosure" question on the issue/PR templates? Something like "To what extent were AI tools used to create this issue/PR? What validation steps did you perform to ensure accuracy?"
The "hard-line policy" would then shift from being "used LLM tools" to "lied on the LLM usage disclosure", and it feels a lot less like selective enforcement (from my perspective). Obviously it won't stop these spammy issues/PRs, but neither will a hard-line policy against all AI.
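To make that concrete, here is a rough sketch of how a bot could flag submissions that left the disclosure section blank. The heading text and function names are hypothetical; a real project would match whatever wording its own template uses.

```python
import re

# Hypothetical template heading; adjust to the project's actual template.
DISCLOSURE_HEADING = "## AI usage disclosure"


def disclosure_text(body: str) -> str:
    """Return the text written under the disclosure heading, or "" if none."""
    collected = []
    in_section = False
    for line in body.splitlines():
        if line.strip() == DISCLOSURE_HEADING:
            in_section = True
            continue
        if in_section and line.startswith("## "):
            break  # the next template section begins here
        if in_section:
            collected.append(line)
    # Templates often carry instructions in HTML comments; ignore those.
    text = re.sub(r"<!--.*?-->", "", "\n".join(collected), flags=re.DOTALL)
    return text.strip()


def needs_disclosure(body: str) -> bool:
    """True when the disclosure section is missing or was left empty."""
    return disclosure_text(body) == ""
```

This only checks that something was written, not that it is true, which fits the point above: the enforceable rule becomes "don't lie on the disclosure", and the honesty part still has to be judged by a human.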
I would potentially accept it, although you should not lie about it, and if a project's policy says it specifically does not want this then you should not send it. For my own projects, there are reasons (having to do with the way the version control is set up) that I cannot accept direct contributions anyway, and I usually make my own changes to any contributions (anyone else can make their own copy with their own changes however they want, whether or not I accept it). Since you were honest, reviewed and tested it yourself before submitting, and I would review it again anyway, it would be acceptable. I would still much prefer to receive contributions that do not use an LLM, but the way you did it is much better than the other LLM-produced stuff that does not meet the threshold of being acceptable.
If I were the maintainer, and you specified up front that your PR was largely written by an LLM, I would appreciate it. I may prioritize it lower than other PRs, perhaps, but that's about it.
I think it's also important to disclose how rigorously you tested your changes, too. I would hate to spend my time looking at a change that was never even tested by a human.
It sounds like you do both of these. Judging by the other replies, it seems that other reviewers may take a harsher stance, given the heavily polarized nature of LLMs. Still, if you made the changes and you're up front about your methodology, why not? In the worst case, your PR gets closed and everybody moves on.
They don't want your contribution, so don't disrespect them by trying to make it.
> I can't imagine that any policy against LLM code would allow this sort of thing, but I also imagine that if I don't say "this was made by a coding agent", that no one would ever know. So, should I just stop contributing, or start lying?
If a project has a stated policy that code written with an LLM-based aid is not accepted, then it shouldn't be submitted, same as with anything else that might be prohibited. If you attempt to circumvent this by hiding it and it is revealed that you knowingly did so in violation of the policy, then it would be unsurprising for you to receive a harsh reply and/or ban, as well as a revert if the PR was committed. This would be the same as any other prohibition, such as submitting code copied from another project with an incompatible license.
You could argue that such a blanket ban is unwarranted, and you might be right. But the project maintainers have a right to set the submission rules for their project, even if it rules out high-quality LLM assisted submissions. The right way to deal with this is to ask the project maintainers if they would be willing to adjust the policy, not to try to slip such code into the project anyway.
"should I stop doing this thing that people are explicitly saying they don't want me to do or should I keep doing this thing that people are explicitly saying they don't want me to do???"
As with everything, there's always nuance. If everyone followed a similar mindset to the comment you were replying to, LLM-generated PRs and issues likely wouldn't be as much of a problem and we wouldn't even be here discussing it.