237 comments

  • jhaile 2 hours ago ago

    One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or has to compensate all copyright holders – other countries (e.g. China) will not follow suit. This will mean that US LLM companies will either fall behind or be too expensive. Which means China and other countries will probably surge ahead in AI, at least in terms of how useful the AI is.

    That is not to say that we shouldn't do the right thing regardless, but I do think there is a feeling of "who is going to rule the world in the future?" that underlies governmental decision-making on how much to regulate AI.

    • oooyay an hour ago ago

      Well hell, by that logic average citizens should be able to launder corporate intellectual property because China will never follow suit in adhering to intellectual property law. I'm game if you are.

      • rollcat 16 minutes ago ago

        Well I always felt rebellious about the contemporary face of "rules for thee but not for me", specifically regarding copyright.

        Musicians remain subject to abuse by the recording industry; they're making pennies on each dollar you spend on buying CDs^W^W streaming services. I used to say, don't buy that; go to a concert, buy beer, buy merch, support directly. Nowadays live shows are being swallowed whole through exclusivity deals (both for artists and venues). I used to say, support your favourite artist on Bandcamp, Patreon, etc. But most of these new middlemen are ready for their turn to squeeze.

        And now on top of all that, these artists' work is being swallowed whole by yet another machine, disregarding what was left of their rights.

        What else do you do? Go busking?

      • jowea an hour ago ago

        Isn't that sort of logic precisely why China doesn't adhere to IP law?

        • oooyay an hour ago ago

          Yes, I was being a bit facetious. It was snark intended to point out that corporations don't get to have their cake and eat it too. Either everything is free and there are no boundaries or we live by our own principles.

          • gruez 26 minutes ago ago

            >It was snark intended to point out that corporations don't get to have their cake and eat it too.

            "have their cake and eat it too" allegations only work if you're talking about the same entity. The copyright maximalist corporations (ie. publishers) aren't the same as the permissive ones (ie. AI companies). Making such characterizations make as much sense as saying "citizens don't get to eat their cake and eat it too", when referring to the fact that citizens are anti-AI, but freely pirate movies.

            • _aavaa_ 23 minutes ago ago

              Yes they are. Look at what happened when deepseek came out. Altman started crying and alleging that deepseek was trained on OpenAI model outputs without an inkling of irony.

              • gruez 15 minutes ago ago

                >Altman started crying and alleging that deepseek was trained on OpenAI model outputs without an inkling of irony

                Can you link to the exact comments he made? My impression was that he was upset that they broke the T&C of OpenAI, and that deepseek's claim of being much cheaper to train didn't factor in that it required OpenAI's model to bootstrap the training process. Neither of those directly contradicts the claim that training is copyright infringement.

          • r053bud 43 minutes ago ago

            It’s barely facetious though. What is stopping me from “starting an AI company” (LLC, sure), torrenting all ebooks (which Facebook did), and as long as I don’t seed, I’m golden?

            • gruez 25 minutes ago ago

              >What is stopping me from “starting an AI company” (LLC, sure), torrenting all ebooks (which Facebook did), and as long as I don’t seed, I’m golden?

              Nothing. You don't even need the LLC. I don't think anyone got prosecuted for only downloading; all prosecutions were for distribution. Note that if you're torrenting, even if you stop the moment the download finishes (and thus never formally "seed"), you're still uploading, which would count as distribution for the purposes of copyright law.

          • snozolli 41 minutes ago ago

            Either everything is free and there are no boundaries or we live by our own principles.

            Or C) large corporations (and the wealthy) do whatever they want while you still get extortion letters because your kid torrented a movie.

            They really do get to have their cake and eat it too, and I don't see any end to it.

    • Bjorkbat an hour ago ago

      I broadly agree in that, sure, unfettered access to copyrighted material will make AI more capable, but more capable of what exactly?

      For national security reasons I'm perfectly fine with giving LLMs unfettered access to various academic publications, scientific and technical information, that sort of thing. I'm a little more on the fence about proprietary code, but I have a hard time believing there isn't enough code out there already for LLMs to ingest.

      Otherwise though, what is an LLM with unfettered access to copyrighted material better at vs one that merely has unfettered access to scientific / technical information + licensed copyrighted material? I would suppose that besides maybe being a more creative writer, the former is far more capable of reproducing copyrighted works.

      In effect, the former is a more capable plagiarism machine than the latter, not necessarily a more intelligent one, and it otherwise doesn't really add any more value. What do we have to gain from condoning it?

      I think the argument I'm making is a little easier to see in the case of image and video models. The model that has unfettered access to copyrighted material is more capable, sure, but more capable of what? Capable of making images? Capable of reproducing Mario and Luigi in an infinite number of funny scenarios? What do we have to gain from that? What reason do we have for not banning such models outright? Not like we're really missing out on any critical security or economic advantages here.

      • Teever 22 minutes ago ago

        If common culture is an effective substrate for communicating ideas, in the sense that we can use shared pop culture references as metaphors to explain complex ideas, then the common culture that large companies have ensnared in excessively long copyrights and trademarks to generate massive profits is a useful thing for an LLM that is designed to convey ideas to have embedded in it.

        If I'm learning about kinematics maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.

        This is especially relevant when we're talking about science-fiction which has served as the inspiration for many of the leading edge technologies that we use including stuff like LLMs and AI.

    • bigbuppo an hour ago ago

      The real problem here is that AI companies aren't even willing to follow the norms of big business and get the laws changed to meet their needs.

    • bgwalter an hour ago ago

      The same president that is putting 145% tariffs on China could put 1000% tariffs on Internet chat bots located in China. Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).

      I'm not sure at all what China will do. I find it likely that they'll forbid AI at least for minors so that they do not become less intelligent.

      Military applications are another matter that are not really related to these copyright issues.

      • pc86 an hour ago ago

        How exactly does one add a tariff to a foreign-based chat bot?

        • bilbo0s an hour ago ago

          You know that 20 bucks a month a lot of people pay for chatgpt?

          Yeah..

          you tax it if the "chatgpt" is foreign.

      • gruez 23 minutes ago ago

        >Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).

        what if they route through third countries?

    • therouwboat an hour ago ago

      If AI is so important, maybe it should be owned by the government and free to use for all citizens.

      • pc86 an hour ago ago

        Name two non-military things that the government owns and aren't complete dumpster fires that barely do the thing they're supposed to do.

        Even (especially?) the military is a dumpster fire but it's at least very good at doing what it exists to do.

        • pergadad an hour ago ago

          The government doesn't make tanks, it just shells out gigantic amounts to companies to make them.

          That said, there are plenty of successful government actions across the world, where Europe or Japan probably have a good advantage with solid public services. Think streets, healthcare, energy infrastructure, water infrastructure, rail, ...

        • azemetre an hour ago ago

          Medicaid, Medicare, and Social Security are three programs that all have massive approval from US citizens.

          Even saying the military is a dumpster fire isn't accurate. The military has led trillions of dollars worth of extraction for the wealthy and elite across the globe.

          In no sane world can you call the ability to protect GLOBAL shipping lanes a failure. That one service alone has probably paid for itself thousands of times.

          We aren't even talking about things like public education (high school education used to be privatized and something only the elites enjoyed 100 years ago; yes, public high school education isn't even 100 years old) or libraries or public parks.

          ---

          I really don't understand this "gobermint iz bad" meme you see in tech circles.

          I get so much more out of my taxes compared to equivalent corporate bills that it's laughable.

          Government is comprised of people, and for the last 50 years the government has mostly been giving money to, and establishing programs for, the small cohorts that have been hoarding all the wealth. Somehow this is never an issue with the government, however.

          I also never understand the arguments from these types, because if you think the government is bad then you should want it to be better. Better mostly means having more money to redistribute and more personnel to run programs, but it's never about those things. It's always attacking the government to make it worse at the expense of the people.

        • Buttons840 an hour ago ago

          Weather Forecasting

        • bongodongobob 18 minutes ago ago

          National Weather Service

          Library of Congress

          National Park Service

          U.S. Geological Survey (USGS)

          NASA

          Smithsonian Institution

          Centers for Disease Control and Prevention (CDC)

          Social Security Administration (SSA)

          Federal Aviation Administration (FAA) air traffic control

          U.S. Postal Service (USPS)

        • nilamo an hour ago ago

          1) art museums, specifically the Smithsonian, but nearly every major city has a decent one.

          2) state parks are pretty rad.

          • standardUser 36 minutes ago ago

            The US federal government doesn't run most museums, but it does run the massive parks system with 20k employees (pre-Musk) and that system enjoys extremely high ratings from guests.

        • sklargh an hour ago ago

          Hi. Assuming the US here. Depends on scope of analysis and dumpster fire definition.

          1. The National Weather Service. Crown jewel and very effective at predicting the weather and forecasting life threatening events.

          2. IRS, generally very good at collecting revenue.

          3. National Interagency Fire Service / US Forest Service tactical fire suppression

          4. NTSB/US Chemicals Safety Board - Both highly regarded.

          5. Medicare - Basically clung to with talons by seniors, revealed preference is that they love it.

          6. DOE National Labs

          7. NIH (spicy pick)

          8. Highway System

          There are valid critiques of all of these but I don’t think any of them could be universally categorized as a complete dumpster fire.

        • zem an hour ago ago

          post office and USDA (pre trump regime slash-and-burn of course)

        • lappet an hour ago ago

          Highways

        • bilbo0s an hour ago ago

          That's a trick question.

          I mean, name 2 things anyone owns that aren't dumpster fires?

          Long time ago industrial engineers used to say, "Even Toyota has recalls."

          Something being a dumpster fire is so common nowadays that you really need a better reason to argue in support of a given entity's ownership. (Or even non-ownership for that matter.)

    • asddubs an hour ago ago

      you could apply that same logic to any IP breaches though, not just AI

  • mattxxx 3 hours ago ago

    Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

    1. Criticizes a highly useful technology

    2. Matches a potentially-outdated, strict interpretation of copyright law

    My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

    Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use-case, and then we should change them.

    • madeofpalk 3 hours ago ago

      > Humans can read a book, get inspiration, and write a new book and not be litigated against

      Humans get litigated against for this all the time. There is such a thing as, charitably, being too inspired.

      https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...

      • jrajav 3 hours ago ago

        If you follow these cases more closely over time you'll find that they're less an example of humans stealing work from others and more an example of typical human greed and pride. Old, well-established musicians arguing that younger musicians stole from them for using a chord progression that appeared in dozens of songs before their "original", or a melody on the pentatonic scale that sounds like many melodies on the pentatonic scale do. It gets ridiculous.

        Plus, all art is derivative in some sense, it's almost always just a matter of degree.

    • zelphirkalt 2 hours ago ago

      The law covers these cases pretty well; it's just that the law has very powerful, extremely rich adversaries, whose greed has gotten the better of them again and again. They could use work released long enough ago to be legally available, or take work released as Creative Commons, or run a lookup to make sure they never output verbatim copies of inputs, or outputs within a certain string editing distance (depending on output length), or they could have paid people to reach out to everyone whose work they are infringing upon. But they didn't do any of that, of course, because they think they are above the law.
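
      The edit-distance lookup mentioned above can be sketched in a few lines. This is a minimal illustration, not anything any lab is known to run; `levenshtein`, `too_close`, the `ratio` threshold, and the `sources` corpus are all hypothetical names:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def too_close(output: str, sources: list[str], ratio: float = 0.1) -> bool:
    """Flag generated output whose edit distance to any known source
    passage is below a threshold proportional to the output's length."""
    limit = int(len(output) * ratio)
    return any(levenshtein(output, s) <= limit for s in sources)
```

      In practice one would compare against indexed chunks of the training set rather than whole documents, since edit distance is only meaningful between passages of comparable length.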

      • nadermx 2 hours ago ago

        I'm confused, so you're saying it's illegal? Because last I checked it's still in the process of going through the courts. And lest we forget, copyright's purpose is to advance the arts and sciences. Fair use is codified into law, which states that each case is seen on a use-by-use basis, hence the litigation to determine if it is, in fact, legal.

        • mdhb 2 hours ago ago

          It’s so fucking obviously illegal when you think about it rationally for more than a few seconds. We aren’t even talking about “fair use” we are talking about how it works in practice which was Meta torrenting pirated books, never paying anyone a cent and straight up stealing the content at scale.

          • nadermx an hour ago ago

            The fact that you are even using the word stealing is telling of your lack of knowledge in this field. Copyright infringement is not stealing[0]. The propaganda of the copyright cartel has gotten to you.

            [0] https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985...

          • Intralexical an hour ago ago

            A test to apply here: If you or I did this, would it be illegal? Would we even be having this conversation?

            The law is supposed to be impartial. So if the answer is different, then it's not really a law problem we're talking about.

      • ashoeafoot 2 hours ago ago

        Obviously a revenue-tracking weight should be trained in, allowing the tracking and collection of all value generated from derivative works.

    • palmotea 3 hours ago ago

      > My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

      The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

      • ulbu 3 hours ago ago

        these comparisons of llms with human artists copying are just ridiculous. it’s saying “well humans are allowed to break twigs and damage the planet in various ways, so why not allow building a fucking DEATH STAR”.

        abstracting llms from their operators and owners and possible (and probable) ends and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at the heart to give any credence to such arguments

        • temporalparts 2 hours ago ago

          The problem isn't that people aren't aware that the scale and magnitude differences are large and significant.

          It's that the space of intellectual property LAW does not handle the robust capabilities of LLMs. Legislators NEED to pass laws to reflect the new realities or else all prior case law relies on human analogies which fail in the obvious ways you alluded to.

          If there was no law governing the use of death stars and mass murder, and the only legal analogy is to environmental damage, then the only crime the legal system can ascribe is mass environmental damage.

          • Intralexical 2 hours ago ago

            Why do you think the obvious analogy is LLM=Human, and not LLM=JPEG or LLM=database?

            I think you're overstating the legal uniqueness of LLMs. They're covered just fine by the existing legal precedents around copyrighted and derived works, just as building a death star would be covered by existing rules around outer space use and WMDs. Pretending they should be treated differently is IMO the entire lie told by the "AI" companies about copyright.

            • sdenton4 an hour ago ago

              LLMs are certainly not a jpeg or a database...

              The google news snippets case is, in my non-lawyer opinion, the most obvious touch point. And in that case, it was decided that providing large numbers of snippets in search results was non-infringing, despite being a case of copying text from other people at-scale... And the reasons this was decided are worth reading and internalizing.

              There is not an obvious right answer here. Copyright rules are, in fact, Calvinball, and we're deep in uncharted territory.

              • Intralexical an hour ago ago

                > LLMs are certainly not a jpeg or a database...

                Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials.

                The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.

                • SilasX 40 minutes ago ago

                  The problem is, you can say all of that for human learning-from-copyrighted-works, so that point isn't definitive.

        • staticman2 an hour ago ago

          > these comparisons of llms with human artists copying are just ridiculous.

          I've come to think of this as the "Performatively failing to recognize the difference between an organism and a machine" rhetorical device that people employ here and elsewhere.

          The person making the argument is capable of distinguishing the two things, they just performatively choose not to do so.

        • Intralexical 2 hours ago ago

          It's a very consistently Silicon Valley mindset. Seems like almost every company that makes it big in tech, be it Facebook and Google monetizing our personal data, or Uber and Amazon trampling workers' rights, makes money by reducing people to objects that can be bought and sold, more than almost any other industry. No matter the company, all claimed prosocial intentions are just window dressing to convince us to be on board with our own commodification.

          That's also why I'm really not worried about the "AI singularity" folks. The hype is IMO blatantly unsubstantiated by the actual capabilities, but gets pushed anyway only because it speaks to this deep-seated faith held across the industry. "AI" is the culmination of an innate belief that people should be replaceable, fungible, perfectly obedient objects, and such a psychosis blinds decision-makers to its actual limits. Only trouble is whether they have the political power to try to force it anyway.

      • gruez 30 minutes ago ago

        >The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

        That might be true but I don't see how it's relevant. There's no provision in copyright law that gives a free pass to humans vs machines, or makes a distinction between them.

        • moralestapia 2 minutes ago ago

          In the case of Copyright law, no provision means it will fall in "forbidden" land, not in "allowed" land.

          Also in general, grey areas don't mean those things are legal.

      • jobigoud 3 hours ago ago

        We are talking about the rights of the humans training the models and the humans using the models to create new things.

        Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

        • spacemadness 2 hours ago ago

          Sounds like we’re talking about the right of AI company founders and people on HN to acquire wealth from creative works due to some weak argument concerning similarity to the human mind and creation of art. Since we’ve now veered into armchair philosophy territory, I think one could argue that the way human memory works and creates, both physically and mentally, from inspiration is vastly different from how AI works. So saying they’re the same and that’s it is both lazy and takes interesting questions off the table to squash debate.

        • MyOutfitIsVague 3 hours ago ago

          It's not only publication, otherwise people wouldn't be able to be successfully sued for downloading and consuming copyrighted content, it would only be the uploaders who get into trouble.

          • HappMacDonald 2 hours ago ago

            Do you have any links to cases where people were sued for downloading and consuming content without also uploading (eg, bittorent), hosting, sharing the copyrighted works, etc?

        • bgwalter 2 hours ago ago

          Does the distinction matter? If humans build a machine that uses so much oxygen that the oxygen levels on earth drop by half, can they say:

          "Humans are allowed to breathe, so our machine is too, because it is operated by humans!"

          • TeMPOraL 2 hours ago ago

            Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

            Point being, laws aren't some God-ordained rules, beautiful in their fractal recursive abstraction, perfectly covering everything that will ever happen in the universe. No, laws are more or less crude hacks that deal with here and now. Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI. This is a new situation, and laws need to be updated to cover it.

            • palmotea an hour ago ago

              > Yes, and then the response would be, "what have you done, we now need to pass laws about oxygen consumption where before we didn't".

              Except in this case, we already have the equivalent of "laws about oxygen consumption": copyright.

              > Intellectual property rights were questionable from the start and only got worse; they've been barely keeping up with digital media in the past couple decades, and they're entirely ill-equipped to deal with generative AI.

              The laws are not "entirely ill-equipped to deal with generative AI," unless your interests lie in breaking them. All the hand-waving about the laws being "questionable" and "entirely ill-equipped" is just noise.

              Under current law OpenAI, Google, etc. have no right to cheap training data, because someone made that data and may have the reasonable interest in getting paid for their efforts. Like all businesses, those companies would ideally like the law to be unfairly biased towards them: to protect them when they charge as much as they can, but not protect anyone else so they can pay as little as possible.

        • palmotea an hour ago ago

          >>> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

          >> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

          > We are talking about the rights of the humans training the models and the humans using the models to create new things.

          Then that's even easier, because that prevents appeals to things humans do, like learning, from muddying the waters.

          If "training the models" entails loading up copyrighted works into your system (e.g. encoded them during training), you've just copied them into a retrieval system and violated copyright based on established precedent. And people have prompted verbatim copyrighted text out of well-known LLMs, which makes it even clearer.

          And then to defend LLM training you're left with BS akin to claiming an ASCII-encoded copy of a book is not a copyright violation, because the book is paper and ASCII is numbers.

      • Intralexical 2 hours ago ago

        > The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

        The direction we're going, it seems more likely it'll be recycling to murder a human.

    • timdiggerm 3 hours ago ago

      Or we could acknowledge that something could be a bad idea, despite its utility

    • ActionHank 3 hours ago ago

      Assuming this means copyright is dead, companies will be very upset and patents will likely follow.

      The hold US companies have on the world will be dead too.

      I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.

    • ceejayoz 3 hours ago ago

      > Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

      You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".

      • WesolyKubeczek 3 hours ago ago

        No, but you are most likely allowed to commercially publish "Hairy Potter and the Philosophizer's Rock", a story about a prehistoric community. The hero is literally a hairy potter who steals a rock from a lazy deadbeat dude who is pestering the rest of the group with his weird ideas.

        • zelphirkalt an hour ago ago

          Not sure what you are getting at?

    • franczesko 2 hours ago ago

      > Piracy refers to the illegal act of copying, distributing, or using copyrighted material without authorization. It can occur in various forms

      Processing IP without a license AND offering it as a model for money doesn't seem like an unknown use-case to me

    • regularjack 3 hours ago ago

      Then they need to be changed for everyone and not just AI companies, but we all know that ain't happening.

    • SilasX an hour ago ago

      >My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

      Huh? If you agree that "learning from copyrighted works to make new ones" has traditionally not been considered infringement, then can you elaborate on why you think it fundamentally changes when you do it with bots? That would, if anything, seem to be a reversal of classic copyright jurisprudence. Up until 2022, pretty much everyone agreed that "learning from copyrighted works to make new ones" is exactly how it's supposed to work, and would be horrified at the idea of having to separately license that.

      Sure, some fundamental dynamic might change when you do it with bots, but you need to make that case in an enforceable, operationalized way.

    • bitfilped an hour ago ago

      Sorry but AI isn't that useful and I don't see it becoming any more useful in the near term. It's taken since ~1950 to get LLMs working well enough to become popular and they still don't work well.

    • jeroenhd 3 hours ago ago

      Pirating movies is also useful, because I can watch movies without paying on devices that apps and accounts don't work on.

      That doesn't make piracy legal, even though I get a lot of use out of it.

      Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.

      • Workaccount2 3 hours ago ago

        It's only complete nonsense if you understand how humans learn. Which we don't.

        What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.

        Compare this to something like a search indexed database, where the recall of information given to it is perfect.

        • zelphirkalt an hour ago ago

          Well, you don't get to pick and choose in which situations an LLM is considered similar to a human being and in which not. If you argue that it similarly to a human is lossy, well let's go ahead and get most output checked by organizations and courts for violations of the law and licenses, just like human work is. Oh wait, I forgot, LLMs are run by companies with too much cash to successfully sue them. I guess we just have to live with it then, what a pity.

          • philipkglass 12 minutes ago ago

            There are a couple of ways that models could theoretically prevent copyright violations in output. For closed models that aren't distributed as weights, companies could index perceptual hashes of all the training data at a granular level (like individual paragraphs of text) and check/retry output so that no duplicates or near-duplicates of training data ever get served as a response to customers.

            Another way would be to train an internal model directly on published works, use that model to generate a corpus of sanitary rewritten/reformatted data about the works still under copyright, then use the sanitized corpus to train a final model. For example, the sanitized corpus might describe the Harry Potter books in minute detail but not contain a single sentence taken from the originals. Models trained that way wouldn't be able to reproduce excerpts from Harry Potter books even if the models were distributed as open weights.
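A minimal sketch of the first idea, with hypothetical data and a toy normalization standing in for a real perceptual hash (a production system would use locality-sensitive hashing to catch near-duplicates, not exact hashes): index a fingerprint of every training paragraph, then refuse to serve any candidate output containing a matching paragraph.

```python
import hashlib
import re

def fingerprint(paragraph: str) -> str:
    """Toy 'perceptual' hash: strip case, punctuation, and extra whitespace,
    then hash. Real systems would use a locality-sensitive scheme instead."""
    normalized = re.sub(r"[^a-z0-9 ]", "", paragraph.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode()).hexdigest()

# Index built once over the (hypothetical) training corpus, paragraph by paragraph.
training_paragraphs = [
    "It was a bright cold day in April, and the clocks were striking thirteen.",
    "Call me Ishmael.",
]
index = {fingerprint(p) for p in training_paragraphs}

def ok_to_serve(candidate_output: str) -> bool:
    """Serve the output only if no paragraph collides with the training index."""
    return all(fingerprint(p) not in index
               for p in candidate_output.split("\n\n"))

print(ok_to_serve("An entirely new sentence."))  # True
print(ok_to_serve("It was a bright cold day in April, and the clocks were striking thirteen!"))  # False
```

Note that even the trivial normalization above catches the second example despite the changed punctuation; the check/retry loop the comment describes would regenerate until `ok_to_serve` passes.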

      • datavirtue an hour ago ago

        And everyone here is downloading every show and movie in existence without even a hint of guilt.

    • stevenAthompson 3 hours ago ago

      Doing a cover song requires permission, and doing it without that permission can be illegal. Being inspired by a song to write your own is very legal.

      AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem.

      Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.

      • mjburgess 3 hours ago ago

I fear the lack of our ability to measure your mind might render you without many of the legal or moral protections you imagine you have. But go ahead, tear down the law to whatever inanity can be described by the trivial machines of the world's current popular charlatans. Presumably you weren't using society's presumption of your agency anyway.

      • toast0 3 hours ago ago

        > Doing a cover song requires permission, and doing it without that permission can be illegal.

        I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.

      • datavirtue an hour ago ago

        "If it breaks and starts spitting out other peoples work verbatim (or nearly verbatim) there is a problem."

        Why is that? Seems all logic gets thrown out the window when invoking AI around here. References are given. If the user publishes the output without attribution, NOW you have a problem. People are being so rabid and unreasonable here. Totally bat shit.

    • apercu 2 hours ago ago

      >Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

      Corporations are not humans. (It's ridiculous that they have some legal protections in the US like humans, but that's a different issue). AI is also not human. AI is also not a chipmunk.

      Why the comparison?

    • vessenes 3 hours ago ago

      Thank you - a voice of sanity on this important topic.

I understand people who create IP of any sort being upset that software might be able to recreate their IP or stuff adjacent to it without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the fact of reading. Or even downloading in bulk. Copyright controls, and has always controlled, the creation and distribution of a work. Even the notice itself embeds the assumption that the work will be read.

      Reading and summarizing have only ever been controlled in western countries via State's secrets type acts, or alternately, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws to cover AI training ingesting information. Not only do we not, such rules would seem insane if you substitute the word human for "AI" in most of these conversations.

      "People should not be allowed to read the book I distributed online if I don't want them to."

      "People should not be allowed to write Harry Potter fanfic in my writing style."

      "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

We just will not get to a sensible societal place if the dialogue around these issues sets such a low bar for understanding the mechanics and the societal tradeoffs we've made so far, and cannot discuss where we might want to go and what would be best.

      • caconym_ an hour ago ago

        If it was as obvious as you claim, the legal issues would already be settled, and your characterization of what LLMs are doing as "reading and summarizing" is hilariously disingenuous and ignores essentially the entire substance of the debate (which is happening not just on internet forums but in real courts, where real legal professionals and scholars are grappling with how to fit AI into our framework of existing copyright law, e.g.^[1]).

        Of course, if you start your thought by dismissing anybody who doesn't share your position as not sane, it's easy to see how you could fail to capture any of that.

        ^[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-a...

      • datavirtue 2 hours ago ago

        Exactly, it is an immense privilege to have your works preserved and promulgated through the ages for instant recall and automated publishing. It's literally what everyone wants. The creators and the consumers. The AI companies are not robbing your money or IP. Period.

      • jasonlotito 2 hours ago ago

        > But I don't understand how people jump to "Copyright Violation" for the fact of reading.

The article specifically talks about the creation and distribution of a work. Creation and distribution of a work alone is not a copyright violation. However, if you take in input from something you don't own, and genAI outputs something, it could be considered a copyright violation.

        Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

        > "People should not be allowed to read the book I distributed online if I don't want them to."

        This is already done. It's been done for decades. See any case where content is locked behind an account. Only select people can view the content. The license to use the site limits who or what can use things.

        So it's odd you would use "insane" to describe this.

        > "People should not be allowed to write Harry Potter fanfic in my writing style."

        Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it. Most cases of fan fiction are allowed because the author allows it. But no, generally, fan fiction is illegal. This is well known in the fan fiction community. Obviously, if you don't distribute it, that's fine. But we aren't talking about non-distribution cases here.

        > "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

        Same with fan fiction. If you replicate a copyrighted piece of art, if you distribute it, that's illegal. If you simply do it for practice, that's fine. But no, if you go around replicating a painting and distribute it, that's illegal.

        Of course, technically speaking, none of this is what gen AI models are doing.

        > We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics

I agree. Personifying gen AI is useless. We should stick to the technical aspects of what it's doing, rather than trying to pretend it's doing human things when it's 100% not doing that in any capacity. I mean, that's fine for the layman, but anyone with an ounce of technical skill knows that's not true.

        • Aerroon 2 hours ago ago

          >Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it.

          Which is a clear failure of the copyright system. Millions of people are expanding our cultural artifacts with their own additions, but all of it is illegal, because they haven't waited another 100 years.

          People are interested in these pieces of culture, but they're not going to remain interested in them forever. At least not interested enough to make their own contributions.

        • vessenes an hour ago ago

          > Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

My proposal is that it's a Luddite knee-jerk reaction to things people don't understand and don't like. They sense and fear change. For instance, here you say it's an issue when AI uses something as a source that you don't have copyright to. Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue". What you said just isn't true. The copyright refers to the right to copy a work.

Distribution: Sure. License your content however you want. That said, in the US a license prohibiting you from READING something just wouldn't be possible. You can limit distribution, copying, etc. This is how journalists can write about sneak previews or leaked information or misfiled court documents released when they should be under seal. The leak (the distribution) might violate a contract or a license, but the reading thereof is really not a thing that US law or common law thinks it has a right to control, except in the case of the state classifying secrets. As well, here we have people saying "my song in 1983 that I put out on the radio, I don't want AI listening to that song." Did your license in 1983 prohibit computers from processing your song? Does that mean digital radio can't send it out? Essentially that ship has sailed, full stop, without new legislation.

          On my last points, I think you're missing my point, Fan fiction is legal if you're not trying to profit from it. It is almost impossible to perfectly copy a painting, although some people are pretty good at it. I think it's perfectly legal to paint a super close copy of say Starry Night, and sell it as "Starry night by Jason Lotito." In any event, the discourse right now claims its wrong for AI to look at and learn from paintings and photographs.

          • jasonlotito an hour ago ago

            > My proposal is that it's a luddish kneejerk reaction to things people don't understand and don't like.

            Your proposal is moving goal posts.

            > Allow me to update your sentence: "Every paper every scientist or academic wrote that references any copyrighted work becomes an issue".

            No, I never said that. Fair Use exists.

            > Fan fiction is legal if you're not trying to profit from it.

            No, it's not.[1] You can make arguments that it should be, but, no.

            [1] https://jipel.law.nyu.edu/is-fanfiction-legal/

            > I think you're missing my point

            I think you got called out, and you are now trying to reframe your original comment so it comes across as having accounted for the things you were called out on.

You think you know what you are talking about, but you don't, and you're relying on that misplaced confidence.

        • datavirtue 2 hours ago ago

          "However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to."

          Absolute horse shit. I can start a 1-900 answer line and use any reference I want to answer your question.

          • jasonlotito an hour ago ago

            > Absolute horse shit.

            I agree, what followed was.

            > I can start a 1-900 answer line and use any reference I want to answer your question

            Yeah, that's not what we are talking about. If you think it was, you should probably do some more research on the topic.

  • wnevets an hour ago ago

    > Minnesota woman to pay $220,000 fine for 24 illegally downloaded songs [1]

    https://www.theguardian.com/technology/2012/sep/11/minnesota... [1]

    • gruez 37 minutes ago ago

      How is this relevant?

      >The RIAA accused her of downloading and distributing more than 1,700 music files on file-sharing site KaZaA

      Emphasis mine. I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.

      • jofla_net 24 minutes ago ago

Who knew all she needed was to change the tempo, pitch, timbre, add/remove lyrics, add/subtract a few notes, rearrange the harmony, put it behind a web portal with a fancy name, claim it had an inspirational muse (or assume all mortal beings are without one in the first place, so it doesn't matter), and proceed to make millions off of said process methodically rather than giving it away for free, and she'd be right as rain.

        • glimshe 14 minutes ago ago

          You just described pop music making. Change tempo, pitch, add/remove lyrics, etc from prior art.

      • wnevets 33 minutes ago ago

        > I think most people would agree that whatever AI companies are doing with training AI models is different than sending verbatim copies to random people on the internet.

I think most artists who had their works "trained on" by AI without compensation would disagree with you.

        • EMIRELADERO 21 minutes ago ago

          The question is: would that disagreement have the same basis as the news above? I don't think so. Artists that are against GenAI take that stance out of a perceived abstract unfairness of the situation, where the AI companies aren't copy-pasting the works per-se with each generation, but rather "taking" the "sweat of the brow" of those artists. You can agree or not about this being an actual problem, but that's where the main claim is.

          • wnevets 11 minutes ago ago

            > would that disagreement have the same basis as the news above?

Yes. An artist's style can be, and sometimes is, their IP.

        • gruez 17 minutes ago ago

Studio Ghibli[1] might object both to people pirating their films and to AI companies allowing their art style to be duplicated, but that's not the same as saying those two things are the same. Sharing a movie rip on BitTorrent is obviously different from training an AI model that can reproduce the Studio Ghibli style, even to diehard AI opponents.

          [1] used purely as an example

  • prvc 5 hours ago ago

    The released draft report seems merely to be a litany of copyright holder complaints repeated verbatim, with little depth of reasoning to support the conclusions it makes.

    • bgwalter 4 hours ago ago

      The required reasoning is not very deep though: If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.

      If a savant has perfect recall, remembers text perfectly and rearranges that text to create a marginally new text, he'd be sued for breach of copyright.

      Only large corporations get away with it.

      • scraptor 4 hours ago ago

        Plagiarism is not an issue of copyright law, it's an entirely separate system of rules maintained by academia. The US Copyright Office has no business having opinions about it. If a AI^W human reads 100 papers and then churns out a new one this is usually called research.

        • palmotea 3 hours ago ago

          > Plagiarism is not an issue of copyright law, it's an entirely separate system of rules maintained by academia. The US Copyright Office has no business having opinions about it. If a AI^W human reads 100 papers and then churns out a new one this is usually called research.

          If you draw a Venn Diagram of plagiarism and copyright violations, there's a big intersection. For example: if I take your paper, scratch off your name, make some minor tweaks, and submit it; I'm guilty of both plagiarism and copyright violation.

        • dfxm12 4 hours ago ago

          Please argue in good faith. A new research paper is obviously materially different from "rearranging that text to create a marginally new text".

          • int_19h 4 hours ago ago

            "Rearranging text" is not what modern LLMs do though, unless you specifically ask them to.

            • dfxm12 2 hours ago ago

              I didn't make this claim. Feel free to bring a cogent argument to a commenter who did.

              • gruez 12 minutes ago ago

                >I didn't make this claim

                ???

                Did you not literally comment the following?

                >A new research paper is obviously materially different from "rearranging that text to create a marginally new text".

                What did you mean by that, if that's not your claim?

          • shkkmo 4 hours ago ago

            The comment is responding to this line:

            > If an AI reads 100 scientific papers and churns out a new one, it is plagiarism.

            That is a specific claim that is being directly addressed and pretty clearly qualifies as "good faith".

        • biophysboy 2 hours ago ago

          Having actually done research and published scientific papers, the key limitation is experimentation. Review papers are useful, and AI is useful, but creating new knowledge is more useful. I haven't had much luck using LLMs to extrapolate well beyond their knowledge domain.

        • ta1243 4 hours ago ago

          Only when those papers are referenced

      • glial 4 hours ago ago

        It reminds me of the old joke.

        "To steal ideas from one person is plagiarism; to steal from many is research."

      • wizee 2 hours ago ago

Is reading and memorizing a copyrighted text a breach of copyright? I.e., is creating a copy of the text in your mind a breach of copyright, or fair use? Is it a breach of copyright if a digital "mind" similarly memorizes copyrighted text? Or is it only a breach of copyright to output and publish that memorized text?

        What about loosely memorizing the gist of a copyrighted text. Is that a breach or fair use? What if a machine does something similar?

        This falls under a rather murky area of the law that is not well defined.

        • aeonik an hour ago ago

          "Filthy eidetics. Their freeloading had become too much for our society to bear. Something had to be done. We found the mutation in their hippocampus and released a new CRISPR-mRNA-based gene suppression system.

          Those who were immune were put under the scalpel."

      • satanfirst 4 hours ago ago

        That's not logical. If the savant has perfect recall and makes minor edits they are like a digital copy and aren't really like a human, neural network or by extension any other ML model that isn't over-fitted.

      • tantalor 4 hours ago ago

        If AI really could "churn out a new scientific paper" we would all be ecstatically rejoicing in the dawning of an age of AGI. We are nowhere near that.

      • JKCalhoun 3 hours ago ago

        My understanding — LLMs are nothing at all like a "savant with perfect recall".

        More like a speed-reader who retains a schema-level grasp of what they’ve read.

      • Maxatar 4 hours ago ago

        Plagiarism isn't illegal, has nothing to do with the law.

        • shkkmo 4 hours ago ago

Plagiarism is often illegal. If you use plagiarism to obtain a financial or other benefit, that can be fraud.

          • jobigoud 3 hours ago ago

            That further drives the point that the issue is not what the AI is doing but what people using it are doing.

      • mr_toad 3 hours ago ago

        > If a savant has perfect recall

        AI don’t have perfect recall.

      • shkkmo 4 hours ago ago

        > If a savant has perfect recall, remembers text perfectly and rearranges that text to create a marginally new text, he'd be sued for breach of copyright.

        Any suits would be based on the degree the marginally new copy was fair use. You wouldn't be able to sue the savant for reading and remembering the text.

Using AI to create marginally new copies of copyrighted work is ALREADY a violation. We don't need a dramatic expansion of copyright law that says that just giving the savant the book to read is a copyright violation.

Plagiarism and copyright are two entirely different things. Plagiarism is about citations and intellectual integrity. Copyright is about protecting economic interests, has nothing to do with intellectual integrity, and isn't resolved by citing the original work. In fact, most of the contexts where you would be accused of plagiarism (reporting, criticism, education, research) make fair use arguments much easier.

    • nadermx 4 hours ago ago

      Not only does it read like a litany[0]. It seems like the copyright holders are not happy with how the meta case is working through court and are trying to sidestep fair use entirely.

      https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

      • mr_toad 3 hours ago ago

Copyright holders have always hated fair use, and often like to pretend it doesn't exist.

The average copyright holder would like you to think that the law only allows use of their works in ways that they specifically permit, i.e. that which is not explicitly permitted is forbidden.

        But the law is largely the reverse; it only denies use of copyright works in certain ways. That which is not specifically forbidden is permitted.

        • ls612 2 hours ago ago

That used to be how it worked. Then the DMCA section 1201 provisions arrived, and now anything not expressly permitted by the enumerated exceptions is forbidden. Even publishing information about how to defeat DRM can be punished (upheld in Universal v. Corley in 2001, where the Second Circuit held that the DMCA's anti-circumvention provisions could constitutionally restrict such information).

    • raverbashing 5 hours ago ago

      I don't have much spare sympathy here honestly

  • stevetron 2 hours ago ago

    It's amazing the amount of bad deeds coming out of the current administration in support of special interests.

  • Workaccount2 2 hours ago ago

I have yet to see someone explain in detail how transformer model training works (showing they understand the technical nitty-gritty and the overall architecture of transformers) and also lay out a case for why it is clearly a violation of copyright.

    You can find lots of people talking about training, and you can find lots (way more) of people talking about AI training being a violation of copyright, but you can't find anyone talking about both.

    Edit: Let me just clarify that I am talking about training, not inference (output).

    • jfengel 2 hours ago ago

      I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

      It's less clear whether taking vast amounts of copyrighted material and using it to generate other things rises to the level of copyright violation or not. It's the kind of thing that people would have prevented if it had occurred to them, by writing terms of use that explicitly forbid it. (Which probably means that the Web becomes a much smaller place.)

      Your comment seems to suggest that writers and artists have absolutely no conceivable stake in products derived from their work, and that it's purely a misunderstanding on their part. But I'm both a computer scientist and an artist and I don't see how you could reach that conclusion. If my work is not relevant then leave it out.

      • gruez 2 hours ago ago

        >I'm not sure I understand your question. It's reasonably clear that transformers get caught reproducing material that they have no right to. The kind of thing that would potentially result in a lawsuit if you did it by hand.

        Is that a problem with the tool, or the person using it? A photocopier can copy an entire book verbatim. Should that be illegal? Or is it the problem that the "training" process can produce a model that has the ability to reproduce copyrighted work? If so, what implication does that hold for human learning? Many people can recite an entire song's lyrics from scratch, and reproducing an entire song's lyrics verbatim is probably enough to be considered copyright infringement. Does that mean the process of a human listening to music counts as copyright infringement?

        • empath75 2 hours ago ago

          Let's start with I think a case that everyone agrees with.

          If I were to take an image, and compress it or encrypt it, and then show you data file, you would not be able to see the original copyrighted material anywhere in the data.

          But if you had the right computer program, you could use it to regenerate the original image flawlessly.

          I think most people would easily agree that distributing the encrypted file without permission is still a distribution of a copyrighted work and against the law.

What if you used _lossy_ compression, and can merely reproduce a poor-quality JPEG of the original image? I think that's still copyright infringement, right?

          Would it matter if you distributed it with an executable that only rendered the image non-deterministically? Maybe one out of 10 times? Or if the command to reproduce it was undocumented?

          Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_ with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.

          I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.

          I think, legally, it's pretty clear that it is illegally distributing copyrighted material without permission. I think calling it an "ai" just needlessly anthropomorphizes everything. It's a computer program that distributes copyrighted work without permission. It doesn't matter if it's the primary purpose or not.

          I think probably there needs to be some kind of new law to fix this situation, but under the current law as it exists, it seems to me to be clearly illegal.
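The lossless version of this argument can be shown in a few lines (stdlib only; the sentence below is just a stand-in for any copyrighted text): the compressed bytes are unreadable binary with no visible trace of the original, yet one call regenerates the work flawlessly.

```python
import zlib

original = (b"It is a truth universally acknowledged, that a single man in "
            b"possession of a good fortune, must be in want of a wife.")

# The compressed blob is opaque binary: none of the original sentences
# are visible by inspection.
compressed = zlib.compress(original, level=9)
print(compressed[:16])

# But with the right program, the work is reproduced exactly.
restored = zlib.decompress(compressed)
assert restored == original
print(len(original), "->", len(compressed), "bytes")
```

The comment's harder cases (lossy reconstruction, non-deterministic output) just degrade the fidelity of `restored`; the structure of the argument, that an opaque blob plus a program can constitute distribution of the work, is unchanged.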

          • halkony 15 minutes ago ago

            > I do not think it is meaningfully different from the simpler example, just with a lot of extra steps.

            Those extra steps are meaningfully different. In your description, a casual observer could compare the two JPEGs and recognize the inferior copy. However, AI has become so advanced that such detection is becoming impossible. It is clearly voodoo.

          • Workaccount2 44 minutes ago ago

            The crux of the debate is a motte and bailey.

            AI is capable of reproducing copyright (motte) therefore training on copyright is illegal (bailey).

          • gruez an hour ago ago

            >Okay, so now we have AI. We can ignore the algorithm entirely and how it works, because it's not relevant. There is a large amount of data that it operates on, the weights of the model and so on. You _can_ with the correct prompts, sometimes generate a copy of a copyrighted work, to some degree of fidelity or another.

            Suppose we accept all of the above. What does that hold for human learning?

            • empath75 27 minutes ago ago

              If a human were to reproduce, from memory, a copyrighted work, that would be illegal as well, and multiple people have been sued over it, even doing it unintentionally.

              I'm not talking about learning. I'm talking about the complete reproduction of a copyrighted work. It doesn't matter how it happens.

      • tensor 2 hours ago ago

        If I write a math book, and you read it, then tell someone about the math within it. You are not violating copyright. In fact, you could write your OWN math book, or history book, or whatever, and as long as you're not copying my actual text, you are not violating copyright.

        However, when an LLM does the same, people now what it to be illegal. It seems pretty straightforward to apply existing copyright law to LLMs in the same way we apply them to humans. If the actual text they generate is substantially similar to a source material that it would constitute a copyright violation if a human were to have done it, then it should be illegal. Otherwise it should not.

edit: and in fact it's not even whether an LLM reproduces text, it's whether someone subsequently publishes that text. The person publishing that text should be the one taking the legal hit.

        • rrook 34 minutes ago ago

          That mathematical formulas already cannot be copyrighted makes this a kinda nonsense example?

      • Workaccount2 2 hours ago ago

        My comment is about training models, not model inference.

Most artists can readily violate copyright; that doesn't mean we block them from seeing copyrighted works.

        • gitremote 2 hours ago ago

          The judgement was about model inference, not training.

          • Workaccount2 an hour ago ago

            >"But making commercial use of vast troves of copyrighted works to produce expressive content"

            This can only be referring to training, the models themselves are a rounding error in size compared to their training sets.

    • gitremote 2 hours ago ago

      They never said model training is a violation of copyright. The ruling says model training on copyrighted material for analysis and research is NOT copyright infringement, but the commercial use of the resulting model is:

      "When a model is deployed for purposes such as analysis or research… the outputs are unlikely to substitute for expressive works used in training. But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

      • Workaccount2 an hour ago ago

        The vast trove of copyright work has to refer to training. ChatGPT is likely on the order of 5-10TB in size. (Yes, Terabyte).

        There are college kids with bigger "copyright collections" than that...

        • gitremote 18 minutes ago ago

          No. The paragraph as a whole refers to the "outputs" of vast troves of copyrighted work.

          Disk size is irrelevant. If you lossy-compress a copyrighted bitmap image to small JPEG image and then sell the JPEG image, it's still copyright infringement.

    • moralestapia 4 minutes ago ago

Because it's a machine that reproduces other people's work, which is copyrighted. Copyright protects original work even after it turns into derivative work.

      Some try to make the argument "but that's what humans do and it's allowed", but that's not a real argument, as it has not been proven, nor is it easy to prove, that machine learning === human reasoning.

    • belorn 2 hours ago ago

      I would also like to see such explanation, especially one that explains how it differ from regular transformers found in video codecs. Why is a lossy compression a clear violation of copyright, but not a generative AI?

    • autobodie 2 hours ago ago

      I have yet to see someone explain in detail how writing the same words as another person works (showing they understand the technical nitty-gritty and the overall architecture of the human mind) and also lay out a case for why it is clearly a violation of copyright. You can find lots of people talking about reading, and you can find lots (way more) of people talking about plagiarism being a violation of copyright, but you can't find anyone talking about both.

      • xhkkffbf 2 hours ago ago

        A big part of copyright law is protecting the market for the original creator. Not guaranteeing them anything. Just preventing someone else from coming along and copying someone else's work in a way that hurts their sales.

        While AIs don't reproduce things verbatim like pirates, I can see how they really undermine the market, especially for non-fiction books. If people can get the facts without buying the original book, there's much less incentive for the original author to do the hard research and writing.

    • jsiepkes 2 hours ago ago

      This isn't about training AI on a book, but AI companies never paying for the book at all. As in: They "downloaded the e-book from a warez site" and then used it for training.

      • xhkkffbf 2 hours ago ago

        This is what's most offensive about it. At least buy one friggin copy.

    • kranke155 2 hours ago ago

      It doesn’t matter how they work, it only matters what they do.

    • dmoy 2 hours ago ago

      Not a ton of expert programmer + copyright lawyers, but I bet they're out there

      You can probably find a good number of expert programmer + patent lawyers. And presumably some of those osmose enough copyright knowledge from their coworkers to give a knowledgeable answer.

      At the end of the day though, the intersection of both doesn't matter. The lawyers win, so what really matters is who has the pulse on how the Fed Circuit will rule on this

      Also in this specific case from the article, it's irrelevant?

    • nickpsecurity 2 hours ago ago

      I did here, with proofs of infringement:

      https://gethisword.com/tech/exploringai/

    • anhner 2 hours ago ago

      because people who understand how training works also understand that it's not a violation of copyright...

  • elif 2 hours ago ago

    Intellectual property law is quickly becoming an institution of hegemonic corporate litigation of the spreading of ideas.

    If it's illegal to know the entire contents of a book it is arbitrary to what degree you are able to codify that knowing itself into symbols.

    If judges are permitted to rule here it is not about reproduction of commercial goods but about control of humanity's collective understanding.

  • throw0101c 4 hours ago ago

    See "Copyright and Artificial Intelligence Part 3: Generative AI Training" (PDF):

    * https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

  • Molitor5901 an hour ago ago

    Representative Joe Morelle (D-NY), wrote the termination was “…surely no coincidence he acted less than a day after she refused to rubber-stamp Elon Musk’s efforts to mine troves of copyrighted works to train AI models.”

    Interesting, but everyone is mining copyrighted works to train AI models.

  • ChrisArchitect 3 hours ago ago
  • renewiltord 3 hours ago ago

    I wonder when general internet sentiment moved from pro-piracy to IP maximalism. Fascinating shift.

    • wvenable an hour ago ago

      There's now an entire generation that believes "Intellectual Property" is a real thing.

      Instead of the understanding that copyrights and patents are temporary state-granted monopolies meant to benefit society they are instead framed as real perpetual property rights. This framing fuels support for draconian laws and obscures the real purpose of these laws: to promote innovation and knowledge sharing and not to create eternal corporate fiefdoms.

    • vharuck 2 hours ago ago

      Personally, I'd support an alternative to copyright for letting creators earn living expenses while working or in reward for good works. But it's a terrible thing to offer them the copyright system and then ignore it to use the works they hoped could earn money. And to further use those works to make something that will replace a lot of creative positions they've relied on because copyright only pays off after the work's been done.

      Maybe the government should set up a fund to pay all the copyright holders whose works were used to train the AI models. And if it's a pain to track down the rights holders, I'll play a tiny violin.

    • ronsor 3 hours ago ago

      AI has made people lose their minds and principles. It's fascinating to observe.

      In the meantime, I will continue to dislike copyright regardless of the parties involved.

    • Ekaros an hour ago ago

      Not having massively overfunded corporations exploit artists is not IP minimalism. A private person stealing something is seen as a tiny evil. A big corporation exploiting everyone else is an entirely different thing.

    • Ukv 2 hours ago ago

      No hard data to back this up, but anecdotally I'd place the AI/copyright sentiment shift around mid-late 2022. DALL-E 2 experimentation (e.g: [0]) in early-mid 2022 seemed to just about sneak by unaffected, receiving similar positive/curious reception to previous trends (TalkToTransformer, ArtBreeder, GPT-3/AI Dungeon, etc.), but then Stable Diffusion bore the full brunt of "machine learning is theft" arguments.

      [0]: https://x.com/xkcd/status/1552279517477183488

      • renewiltord 2 hours ago ago

        Hmm, "when it got good" then. I think what you're saying makes sense to me.

    • bgwalter 2 hours ago ago

      That is fairly easy to answer: When the infringement shifted from small people taking from Walt Disney to Silicon Valley taking from everyone, including open source authors and small YouTube channels.

      I find the shift of some right wing politicians and companies from "TPB and megaupload are criminals and its owners must be extradited from foreign countries!" to "Information wants to be free!" much more illuminating.

    • throwaway1854 2 hours ago ago

      Apples and oranges - and also I don't know if anyone is really supporting IP maximalism.

      IP maximalism is requiring DRM tech in every computer and media-capable device that won't play anything without checking into a central server and also making it illegal to reverse or break that DRM. IP maximalism is extending the current bonkers time interval of copyright (over 100 years) to forever. If AI concerns manage to get this down to a reasonable, modern timeframe it'll be awesome.

      Record companies in the 90s tied the noose around their own necks, which is just as well because they're very useless now except for supporting geriatric bands. They should have started selling mp3s for 99 cents in 1997 and maybe they would have made a couple of dollars before their slide into irrelevance.

      The specific thing people don't want, which a few weirdos keep pushing, is AI-generated stuff passed off as new creative material. It's fine for fun and games, but no one wants a streaming service of AI-generated music, even if you can't tell it's AI generated. And the minute you think you have that cracked - that an AI can create music/art as good as a human and that humans can't tell, the humans will start making bad music/art in rebellion, and it'll be the cool new thing, and the armies of 10Kw GPUs will be wasting their energy on stuff an 1Mhz 8-bit machine could do in the 80s.

  • jagermo 2 hours ago ago

    man, if we just had some Napster fanboy in the oval office back then. Lots of laws would not exist.

  • brador 4 hours ago ago

    Lifetime for human copyright, 20 years for corporate copyright. That’s the golden zone.

    • Zambyte 4 hours ago ago

      Zero (0) years for corporate copyright, zero (0) years for human copyright is the golden zone in my book.

      • umanwizard 4 hours ago ago

        Why?

        • Zambyte 4 hours ago ago

          It took me a while to be convinced that copyright is strictly a bad idea, but these two articles were very convincing to me.

          https://drewdevault.com/2020/08/24/Alice-in-Wonderland.html

          https://drewdevault.com/2021/12/23/Sustainable-creativity-po...

          • dmonitor 3 hours ago ago

            You need some mechanism in place to prevent any joe schmoe from spinning up FreeSteam and rehosting the whole thing.

            • zelphirkalt 8 minutes ago ago

              Just to challenge that idea: Why?

            • pitaj 2 hours ago ago

              There can be many incentives for people to use official sources: early access, easy updates, live events, etc

              • Zambyte an hour ago ago

                "Early access" doesn't work in this context, but yes for the other means.

          • SketchySeaBeast 3 hours ago ago

            The first article is saying that "Copyright is bad because of corporations", and I can kind of get behind that, especially the very long term copyrights that have lost the intent, but the second article says that artists will be happier without copyright if we just solve capitalism first. I don't know about you, but that reads to me like "If you wish to make an apple pie from scratch you must first invent the universe".

            If an artist produces a work they should have the rights to that work. If I self-publish a novel and then penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing my ever putting the work out. That's a bad thing.

            • int_19h 2 hours ago ago

              The problem of "how do artists earn enough money to eat?" is legitimate, but I don't think it's a good idea to solve it by making things that inherently don't work like real property to work like it, just so that we can shove them into the same framework. And this is exactly what copyright does - it takes information, which can be copied essentially for free by its very fundamental nature, and tries to make it scarce through legal means solely so that it can be sold as if it were a real good.

              There are two reasons why it's a problem. The first reason is that any such abstraction is leaky, and those leaks are ripe for abuse. For example, in case of copyright on information, we made it behave like physical property for the consumers, but not for the producers (who still only need to expend resources to create a single work from scratch, and then duplicate it for free while still selling each copy for $$$). This means that selling information is much more lucrative than selling physical things, which is a big reason why our economy is so distorted towards the former now - just look at what the most profitable corporations on the market do.

              The second reason is that it artificially entrenches capitalism by enmeshing large parts of the economy into those mechanics, even if they aren't naturally a good fit. This then gets used as an argument to prop up the whole arrangement - "we can't change this, it would break too much!".

            • jasonjayr 3 hours ago ago

              But in this idealized copyright-free world, those self-publishing companies could just as easily take Penguin's top sellers and reproduce those.

              The thing that'd set apart these companies are the services + quality of their work.

              • SketchySeaBeast 3 hours ago ago

                Is the content of the book not part of the quality of the work? What are these companies putting within the pages? We've taken the greatest and longest part of the effort and made it meaningless.

            • Zambyte 3 hours ago ago

              > If an artist produces a work they should have the rights to that work.

              That would indeed be nice, but as the article says, that's usually not the case. The rights holder and the author are almost never the same entity in commercial artistic endeavors. I know I'm not the rights holder for my erroneously-considered-art work (software).

              > If I self-publish a novel and then penguin decides that novel is really good and they want to publish it, without copyright they'd just do that, totally swamping me with their clout and punishing my ever putting the work out. That's a bad thing.

              Why? You created influential art and its influence was spread. Is that not the point of (good) art?

              • noirscape 3 hours ago ago

                It may surprise you, but artists need to buy things like food and water, and pay for basic necessities like electricity, rent and taxes. Otherwise they die or go bankrupt.

                In our current society, that means they need some sort of means to make money from their work. Copyright, at least in theory, exists to incentivize the creation of art by protecting an artist's ability to monetize it.

                If you abolish copyright today, under our current economic framework, what will happen is that people create less art because it goes from a (semi-)viable career to just being completely worthless to pursue. It's simply not a feasible option unless you fundamentally restructure society (which is a different argument entirely.)

                • Zambyte an hour ago ago

                  Amazing. Have you considered reading the articles I linked? They aren't even that long.

              • SketchySeaBeast 3 hours ago ago

                > The rights holder and the author are almost never the same entity in commercial artistic endeavors.

                There's definitely problems with corporatization of ownership of these things, I won't disagree.

                > Why? You created influential art and its influence was spread. Is that not the point of (good) art?

                Why do we expect artists to be selfless? Do you think Stephen King is still writing only because he loves the art? You don't simply make software because you love it, right? Should people not be able to make money off their effort?

                • Zambyte an hour ago ago

                  > You don't simply make software because you love it, right?

                  I can't speak for Stephen but I absolutely do. I program for fun all the time.

                  > Should people not be able to make money off their effort?

                  Is anyone arguing otherwise?

        • whamlastxmas 3 hours ago ago

          Because the concept of owning an idea is really gross. Copyright means I can’t write about whatever I want in my own home even if I never distribute it or no one ever sees it. I’m breaking the law by privately writing Harry Potter fanfic in my journal or whatever. Copyright is supposed to be about encouraging intangibles, and the reality is that it only massively stifles it

          • redwall_hp 3 hours ago ago

            Whole genres of music are based entirely on sampling, and they got screwed by copyright law as it evolved over the 90s and 2000s. Now only people with a sufficiently sized business backing them can truly participate, or they're stuck licensing things on Splice.

            And that's not even touching the spurious lawsuits about musical similarity. That's what musicians call a genre...

            It makes some sense for a very short term literal right to reproduction of a singular work, but any time the concept of derivative works comes into play, it's just a bizarrely dystopian suppression of art, under the supposition that art is commercial activity rather than an innate part of humanity.

          • otterley 3 hours ago ago

            Copyright doesn’t protect ideas. It protects expression of those ideas.

            Consider how many books exist on how to care for trees. Each one of them has similar ideas, but the way those ideas are expressed differ. Copyright protects the content of the book; it doesn’t protect the ideas of how to care for trees.

            • 93po an hour ago ago

              Disney has a copyright over Moana. I would argue Moana is an idea in the sense that most people think of as ideas. Moana isn't tangible; it's not a physical good. It's not a plate on my table. It only exists in our heads. If I made a Moana comic book, with an entirely original storyline and original art, all drawn in my own style and not using 3D assets similar to their movies, that would be violating copyright. Moana is an idea, and there are a million ways to express the established character Moana, and Moana itself is an idea built on a million things that Disney doesn't have any rights to - history, culture, tropes, etc.

              I understand what you're saying but the way you're framing it isn't what I really have a problem with. I still don't agree with the idea that I can't make my own physical copies of Harry Potters books, identical word for word. I think people can choose to buy the physical books from the original publisher because they want to support them or like the idea that it's the "true" physical copy. And I'm going to push back on that a million times less than the concept of things like Moana comic books. But still, it's infringing copyright for me to make Moana comic books in my own home, in private, and never showing them to anyone. And that's ridiculous.

          • flats 3 hours ago ago

            I don’t believe this is true? I’m pretty sure that you’re prohibited from making money from that fan fiction, not from writing it at all. So I don’t understand the claim that copyright “massively stifles” creativity. There are of course examples of people not being able to make money on specific “ideas” because of copyright laws, but that doesn’t seem to me to be “massively stifling” creativity itself, especially given that it also protects and supports many people generating these ideas. And if we got rid of copyright law, wouldn’t we be in that exact place, where people wouldn’t be allowed to make money off of creative endeavors?

            I mean, owning an idea is kinda gross, I agree. I also personally think that owning land is kinda gross. But we live in a capitalist society right now. If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs. Sam Altman, Elon Musk, and all the other tech CEOs will benefit in place of all of the artists I love and admire.

            That, to me, sucks.

            • Zambyte 3 hours ago ago

              > And if we got rid of copyright law, wouldn’t we be in that exact place, where people wouldn’t be allowed to make money off of creative endeavors?

              This is addressed in the second article I linked.

            • 93po an hour ago ago

              I will also add: there are tons of examples of companies taking down not-for-profit fanfiction or fan creations. Nintendo is very aggressive about this. The publisher of Harry Potter has also aggressively taken down not-for-profit fanfiction.

              > If we allow AI companies to train LLMs on copyrighted works without paying for that access, we are choosing to reward these companies instead of the humans who created the data upon which these companies are utterly reliant for said LLMs.

              It's interesting how much parallel there is here to the idea that company owners reap the rewards of their employee's labor when doing no additional work themselves. The fruits of labors should go to the individuals who labor, I 100% agree.

            • 93po an hour ago ago

              Copyright isn't about distribution, it's about creation. In reality the chances of getting in trouble are basically zero if you don't distribute it - who would know? But technically any creation, even in private, is violating copyright. It doesn't matter if you make money or put it on the internet.

              There is fair use, but fair use is an affirmative defense to infringing copyright. By claiming fair use you are simultaneously admitting infringement. The idea that you have to defend your own private expression of ideas based on other ideas is still wrong in my view.

              • Zambyte 28 minutes ago ago

                > Copyright isn't about distribution, it's about creation

                This is exactly wrong. You can copy all of Harry Potter into your journal as many times as you want legally (creating copies) so long as you do not distribute it.

      • achierius 3 hours ago ago

        Well what we're getting is lifetime for corporate, and zero (0) for human. Hope you're happy.

        • Zambyte 3 hours ago ago

          I'm not, because that's not what I asked for.

    • GuB-42 3 hours ago ago

      The issue with lifetime (vs something like lifetime + X years) is that of inheritance.

      Assuming you agree with the idea of inheritance, which is another topic, then it is unfair to deny inheritance of intellectual property. For example, if your father built a house, it will be yours when he dies; it won't become a public house. So why would a book your father wrote just before he died become public domain the moment he dies? It is unfair to those who are doing intellectual work, especially older people.

      If you want short copyright, it would make more sense to make it 20 years, human or corporate, like patents.

      • Ekaros an hour ago ago

        20 or 25 years from publication. Enough for anyone inheriting it to exploit if they are children. No need for more. It's not as if a house builder keeps getting paid after the house has been built.

      • dghlsakjg 3 hours ago ago

        Then make it the greater of 20 years or the lifetime for humans.

        Comparing intellectual property to real or physical property makes no sense. Intellectual property is different because it is non-exclusive. If you are living in your father's house, no one else can be living there. If I am reading your father's book, that has nothing to do with whether anyone else can read the book.

        • GuB-42 an hour ago ago

          That intellectual property is non exclusive doesn't change the inheritance problem.

          If you consider it right to get value from the work of your family, and you consider intellectual work (such as writing a book) to be valuable, then as an inheritor, you should get value from it. And since the way we give value to intellectual work is through copyright, inheritors should inherit copyright.

          If you think that copyright should not exceed lifetime, then the logical consequences would be one of:

          - inheritance should be abolished

          - intellectual work is less valuable than other forms of work

          - intellectual property / copyright is not how intellectual work should be rewarded

          There are arguments for abolishing inheritance, it is after all one of the greatest sources of inequality. Essentially, it means 100% inheritance tax in addition to all the work going into the public domain. Problematic in practice.

          For the value of intellectual work, well, hard to argue against it on Hacker News without being a massive hypocrite.

          And there are alternatives to copyright (i.e. artificial scarcity) for compensating intellectual work like there are alternatives to capitalism. Unfortunately, it often turns out poorly in practice. One suggestion is to have some kind of tax that is fairly distributed between authors in exchange for having their work in the public domain. Problem is: define "fairly".

          Note that I am not saying that copyright should last long, you can make copyright 20 years, humans or corporate, inheritable. Simple, gets in the public domain sooner, fairer to older authors, already works for patents. Why insist on "lifetime"?

          • dghlsakjg 37 minutes ago ago

            Agreed. I think it should be the greater of 20 years or the lifetime of the original authors.

      • MyOutfitIsVague 3 hours ago ago

        The issue with that is that inheritance only makes sense for tangible, scarce resources. Having copyright isn't easily analogous to ownership of a physical object, because an object is something you have and if somebody else has it, you can not have and use it.

        Copyright is about control. If you know a song and you sing it to yourself, somebody overhears it and starts humming it, they have not deprived you of the ability to still know and sing that song. You can make economic arguments, of deprived profit and financial incentives, and that's fine; I'm not arguing against copyright here (I am not a fan of copyright, it's just not my point at the moment), I'm just saying that inheritance does not naturally apply to copyright, because data and ideas are not scarce, finite goods. They are goods that feasibly everybody in the world can inherit rapidly without lessening the amount that any individual person gets.

        If real goods could be freely and easily copied the way data can, we might be having some very interesting debates about the logic and morality of inheriting your parents' house and depriving other people of having a copy.

  • codr7 4 hours ago ago

    As if it wasn't already obvious to anyone paying attention that we're going to eat this shit voluntarily or kicking and screaming.

  • seper8 5 hours ago ago
  • aurizon 3 hours ago ago

    Ned Ludd's heirs at last win - High Court rules the spinning jenny IS ILLEGAL! All machine-made cloth and machines must be destroyed. This is the end of the road for all mechanical ways to make cloth. Get naked, boys 'n girls = this will be fun!

  • thomastjeffery 4 hours ago ago

    > The remarks about Musk may refer to the billionaire’s recent endorsement of Twitter founder Jack Dorsey’s desire to “Delete all IP law"...

    Yes please.

    Delete it for everyone, not just these ridiculous autocrats. It's only helping them in the first place!

  • imafish 5 hours ago ago

    news flash - the billionaires don't care about you.

  • hatenberg 2 hours ago ago

    Big Tech: We shouldn’t pay, each individual piece of content is worth basically nothing.

    Also Big Tech: We added 300,000,000 users' worth of GTM because we trained on the 10 specific anime movies of Studio Ghibli and are selling their style.

    • Aerroon 2 hours ago ago

      The funny thing is that style is not copyrightable.

      • _trampeltier 2 hours ago ago

        Except it's a rectangle with 4 rounded corners.

    • nickpsecurity 2 hours ago ago

      "Pretraining data is worth basically nothing."

      (Raises $10 billion based on estimated worth of the resulting models.)

      "We can't share the GPT4 pretraining data or weights because they're trade secrets that generate over a billion in revenue for us."

      I'll believe they're worth nothing when (a) nobody is buying AI models or (b) AI companies stop using the copyrighted works to train models they sell. So far, it looks like they're lying about the worth of the training data.

  • achrono 4 hours ago ago

    If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

    And for the future, here's one heuristic: if there is a profound violation of the law anywhere that (relatively speaking) is ignored or severely downplayed, it is likely that interested parties have arrived at an understanding. Or in other words, a conspiracy.

    [1] There are tons of legal arguments on both sides, but for me it is enough to ask: if this is not illegal and is totally fair use (maybe even because, oh no look at what China's doing, etc.), why did they have to resort to & foster piracy in order to obtain this?

    • NitpickLawyer 4 hours ago ago

      > If anyone was skeptical of the US government being deeply entrenched with these companies in letting this blatant violation of the spirit of the law [1] continue, this should hopefully secure the conclusion.

      European here, but why do you think this is so clear cut? There are other jurisdictions where training on copyrighted data has already been allowed by law/caselaw (Germany and Japan). Why do you need a conspiracy in the US?

      AFAICT the US copyright law deals with direct reproductions of a copyrighted piece of content (and also carves out some leeway with direct reproduction, like fair use). I think we can all agree by now that LLMs don't fully reproduce "letter perfect" content, right? What then is the "spirit" of the law that you think was broken here? Isn't this the definition of "transformative work"?

      Of note is also the other big case involving books - the one where google was allowed to process mountains of books, they were sued and allowed to continue. How is scanning & indexing tons of books different than scanning & "training" an LLM?

      • AlotOfReading 3 hours ago ago

        Google asserted fair use in that case, which is an admission of (allowed) copyright infringement. They didn't turn books into a "new form", they provided limited excerpts that couldn't replace the original usage and directly incentivized purchases through normal sales channels while also providing new functionality.

        Contrast that with AI companies:

        They don't necessarily want to assert fair use, the results aren't necessarily publicly accessible, the work used isn't cited, users aren't directed to typical sales channels, and many common usages do meaningfully reduce the market for the original content (e.g. AI summaries for paywalled pages).

        It's not obvious to me as a non-lawyer that these situations are analogous, even if there's some superficial similarity.

    • whycome 4 hours ago ago

      What’s your reading of the spirit of the law?

  • internet_rand0 3 hours ago ago

    copyright is long overdue for a total rework

    the internet demands it.

    the people demand free Megaupload for everybody. why? because we can (we seem to NOT want to, but that should be a politically solvable problem)

  • tempeler 4 hours ago ago

    I think a new chapter is about to begin. It seems that in the future, many IPs will become democratized — in other words, they will become public assets.

    • SketchySeaBeast 4 hours ago ago

      "Democratized" as in large corporations are free to ingest the IPs and then reinterpret and censor them before they feed their version back to us, with us never having free access to the original source?

      • rurban 3 hours ago ago

        "Democratized" in the meaning of fascistoized, right? Laws do not apply to the cartels, military, executive and secret services.

        • tempeler 2 hours ago ago

          To defend yourself against those who don't play by the rules, it has to be democratized. The world isn't a fair place.

    • kmeisthax 3 hours ago ago

      They aren't going to legalize, say, publishing Mario fangames or whatever. They're just going to make copyright allow AI training, because AI is what the owner class wants. That's not democratizing IP, that's just prejudicial (dis)enforcement against the creative class.

      • jobigoud 3 hours ago ago

        Millions of pages of fan fic based on existing IP have been written. There is a point where it doesn't really make sense trying to go after individuals especially if they make no money out of it.

        If we enter a world where anyone can create a new Mario game and thousands of them are released on the public web, it would be impossible for the rights holders to do anything, and it would be a bad PR move to go after individuals doing it for fun.

        • int_19h 3 hours ago ago

          Imagine a world where all models capable of creating a new Mario game from scratch are only available through cloud providers which must implement mandatory filters such that asking "write me a Mario clone" (or anything functionally equivalent) gets you a lecture on don't-copy-that-floppy.

          Bad PR? The entire copyright enforcement industry has had bad PR pretty much since easy copying enabled grassroots piracy - i.e. since before computers even. It never stopped them. What are you going to do about it? Vote? But all the mainstream parties are onboard with the copyright lobby.

    • AlexandrB 4 hours ago ago

      Public assets as long as you pay your monthly ChatGPT bill.

    • Hoasi 2 hours ago ago

      “We used publicly available data” has worked well enough so far. And yet OpenAI just accused China of stealing its content.

    • numpad0 4 hours ago ago

      Oh yeah. It's the Cultural Revolution all over again.

    • ahmeni 4 hours ago ago

      If only there was some sort of term for fake democracy where you're actually just there to plunder resources.

      • tempeler 4 hours ago ago

        This idea does not belong to me. If lawmakers and regulators allow companies to use these IPs, how can you keep ordinary people away from them? Something created by AI is regarded as if it was created from scratch by human hands. That's reality.

  • evanjrowley 3 hours ago ago

    If AI companies in the US are penalized for this, the harm to copyright holders will only be delayed until foreign AI companies overtake them. In those cases, legal recourse will be much slower and significantly more limited.

    • mitthrowaway2 3 hours ago ago

      Access to copyrighted materials might make for slightly better-trained models the way that access to more powerful GPUs does. But I don't think it will accelerate foundational advances in the underlying technology. If anything, maybe having to compete under tight constraints means AI companies will have to innovate more, rather than merely push scale.

      • int_19h 2 hours ago ago

        The problem is that regardless of any innovations, scale still matters. If you figure out the technique to, say, make a model that is significantly better given N parameters - where N is just large enough to be the perfect fit for the amount of training data that you have access to - then someone else with access to more data will use the same technique to make a model with >N parameters, and it will be better than yours.

  • andy99 5 hours ago ago

    These are two different issues that, while apparently related, need separate consideration. Re the copyright finding: does the US Copyright Office have standing to make such a determination? Presumably not, since various claims about AI and copyright are before the courts. Why did they write this finding?

    • kklisura 5 hours ago ago

      > The Office is releasing this pre-publication version of Part 3 in response to congressional inquiries and expressions of interest from stakeholders

      They acknowledge the issue is before courts:

      > These issues are the subject of intense debate. Dozens of lawsuits are pending in the United States, focusing on the application of copyright’s fair use doctrine. Legislators around the world have proposed or enacted laws regarding the use of copyrighted works in AI training, whether to remove barriers or impose restrictions

      Why did they write the finding: I assume it's because it's their responsibility:

      > Pursuant to the Register of Copyrights’ statutory responsibility to “[c]onduct studies” and “[a]dvise Congress on national and international issues relating to copyright,”...

      All excerpts are from https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

    • _heimdall 5 hours ago ago

      Given that the issue at hand is related to potential misuse of copyright protected material, it seems totally reasonable for the copyright office to investigate and potentially act to reconcile the issue.

      Sure the courts may find its out of their jurisdiction, but they should act as they see fit and let the courts settle that later.

    • bgwalter 4 hours ago ago

      The US Supreme Court has complained on multiple occasions that it is forced to do the work of the legislature.

      Why couldn't the Copyright Office advise Congress to enact a law that forbids copyrighted material from being used in AI training? This is literally the politicians' job.

      • 9283409232 4 hours ago ago

        Part of Congress's power is to defer to the agencies it has created, such as the US Copyright Office.

  • sophrocyne 3 hours ago ago

    The USCO report was flawed, biased, and hypocritical. A pre-publication of this sort is also extremely unusual.

    https://chatgptiseatingtheworld.com/2025/05/12/opinion-why-t...