25 comments

  • WaltPurvis a day ago

    http://archive.today/ELsbZ

    (Note: The word "local" in the headline means "in Australia")

  • crowcroft a day ago

    A serious company would not consider this research. Zero evidence of anything is presented. No one in the organisation or on the advisory board has any expertise in building modern LLMs, and most are self-styled AI execs who moved into the space all of about two years ago...

    https://sovereign-au.ai/preserving-australias-digital-voice-...

    • msy a day ago

      This is entirely on brand for the Australian startup scene, which is absolutely lousy with hangers-on, cosplayers, and washed-out bankers & consultants who have never built a thing. There are a lot of seriously talented engineers trying to build real tech, but they get drowned out by this kind of rubbish.

  • apparent a day ago

    They're starting out allocating $10MM AUD for copyright payments, when Anthropic has just paid $2.3BB AUD to settle their lawsuit. While I give them credit for realizing that $10MM is just the starting point, I don't understand how they can possibly build a competitive model while spending less than $100MM when others are spending 20x that amount just on copyright.

    • gpm a day ago

      Anthropic paid $3,000 USD per work because they pirated works, and US copyright law comes with statutory damages completely unrelated to the amount it would have cost to acquire the same thing legally.

      The same thing, legally, according to the judge in that lawsuit, would have been a purchased (potentially used) copy of the book, scanned - i.e. what Anthropic also did after pirating works. It'd be surprising if that cost even $30 USD per work, two orders of magnitude less.

      $10 million AUD doesn't seem sufficient for a competitive set (and, as you say, they aren't claiming it is), but if you told me $50 million AUD was enough to build a legal (according to Judge Alsup's interpretation of US law) repository of training data, I would not be surprised.
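
      A rough back-of-the-envelope check of those figures (a sketch only: the work count is implied by the numbers in this thread, and the $30/work acquisition cost is the assumption above, not an official figure):

          # Sketch: statutory-damages cost vs. the cost of buying the works legally,
          # using only figures cited in this thread (not official numbers).
          settlement_usd = 1.5e9            # reported Anthropic class settlement
          per_work_settlement_usd = 3_000   # roughly what that works out to per work
          works = settlement_usd / per_work_settlement_usd   # ~500,000 works implied

          per_work_legal_usd = 30           # assumed cost to buy (a used copy of) each work
          legal_cost_usd = works * per_work_legal_usd         # ~US$15 million

          print(f"implied works covered: {works:,.0f}")
          print(f"cost to acquire legally: ${legal_cost_usd / 1e6:.0f}M USD")
          print(f"per-work ratio: {per_work_settlement_usd // per_work_legal_usd}x")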

      • apparent a day ago

        If they spend half of their budget on copyright, does that leave enough for hardware, energy, salary, etc.?

    • myhf a day ago

      LLM training is not fair use. It would cost trillions to genuinely secure the rights to use any data set that could include any excerpts of copyrighted work.

      The millions and billions you hear about in copyright "settlements" are just the amount it takes to bribe a local court, so $10MM is reasonable for Australia.

      • rpdillon a day ago

        The earlier Anthropic case found that training was fair use. Anthropic got slammed for how they obtained the digital copies, not how they used them.

        > June 24 (Reuters) - A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model.

        https://www.reuters.com/legal/litigation/anthropic-wins-key-...

        • nutjob2 a day ago

          That hardly settles the fair use question. Other cases will address it and be appealed to higher courts, where it will eventually be settled unless overridden by legislation.

          • rpdillon a day ago

            The law lives. The current ruling is fair use. Yes, it could change, but it's not like no courts have looked at or ruled on this.

      • danielbln a day ago

        I don't think it has been settled yet whether training is fair use or not. Also, how is a settlement a bribed court? The other party has to accept a settlement, not the judge.

        • gpm a day ago

          > The other party has to accept a settlement, not the judge.

          Not to defend the absurd statement about a bribe, but with regard to the $1.5 billion settlement this isn't quite true. Because it's a class action, the judge will also have to approve the settlement - finding that it is fair and was agreed to without collusion. This is done because the incentive structures set up a bit of a conflict of interest between class action lawyers and members of the class... Of course none of the settlement goes to the judge or court; there's no bribing going on. But judges do reject class action settlements sometimes.

          • apparent a day ago

            You're right to point out that a judge's sign-off is also required, but it's a necessary-but-not-sufficient condition. The point GP was presumably trying to make was that bribing a judge doesn't get the job done. The first task is to convince the other side.

    • crowcroft a day ago

      Do they even have $10mm? What exactly is being allocated?

      • apparent a day ago

        I think the allocation is purely a theoretical exercise. They have apparently put in a million dollars or something, and are now looking to raise more.

  • loa_in_ a day ago

    Looking for generous donors with this headline, I'm sure.

    • Maxious a day ago

      > Our AI future is being built overseas. We can’t afford that

      > Unless we develop our own sovereign AI capability from the ground up, organisations will forever be looking over their shoulder, dogged by fear of ending up on the front page of the papers for all the wrong reasons.

      > Michelle Ananda-Rajah; Senator for Victoria

      https://www.afr.com/technology/our-ai-future-is-being-built-...

      The grift that keeps giving

  • gizajob a day ago

    The first thing ChatAUD says: "hawzitgahn?"

  • nurettin a day ago

    Spoilers: They did not.

  • wtbdbrrr a day ago

    Now ... of course.

    And do they mention anything about how much of the work is going to be outsourced and where to? Or are they gonna import workers to do the job and send them back home when their local AI can replace most of the easy and tedious stuff? Or are they gonna use local models to do all that right away?

    The site is loading ...

  • yahoozoo a day ago

    Yes, it is much cheaper when you just train off ChatGPT and Claude responses.

    • zerotolerance a day ago

      Seems like a feature. This was always going to be the case, just as it was cheaper to train those models on billions of prior works than to generate, or pay to generate, all those works in-house.

      • throwawayoldie a day ago

        > Seems like a feature

        If by "feature" you mean "pathway to model collapse" meaning "disappearing up one's own asshole" then yes. And the sooner the better.

    • daveguy a day ago

      > "One test for potential investors will be their willingness to support Sovereign Australia AI’s decision to earmark $10 million of its future funding to compensate copyright owners for the data used to train its model. This includes working with news services under a paid model and buying books and music where needed."

      > “We don’t want the adversarial relationship of most other AI builders around the world who chose not to take that proactive approach to copyright.”

      > "Sovereign Australia AI said it would not scrape the pages of publishers who have added “robot.txt” files to their web pages. This is a line of code that tells bots not to scrape the information, but it is frequently ignored. The company will add a meta tag to every piece of data it acquires, recording where it came from and how it was sourced."

      > To build its model, Sovereign Australia AI says it has placed Australia’s largest-ever order for sovereign AI capacity: 256 of the latest Nvidia Blackwell B200 GPUs which will be hosted inside one of NextDC’s Melbourne data centres

      So... almost the exact opposite. Please read the article before commenting next time.
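
      On the robots.txt point quoted above: honouring it is straightforward to implement. A minimal sketch using Python's standard-library urllib.robotparser (the bot name and URLs are made up for illustration, not taken from the article):

          # Sketch: check a site's robots.txt before fetching a page.
          from urllib import robotparser

          rp = robotparser.RobotFileParser()
          rp.set_url("https://example.com.au/robots.txt")
          rp.read()  # download and parse the site's robots.txt

          page = "https://example.com.au/news/some-article"
          if rp.can_fetch("SovereignAUBot", page):
              print("allowed to crawl:", page)
          else:
              print("disallowed by robots.txt:", page)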

      • kadushka a day ago

        > 256 of the latest Nvidia Blackwell B200 GPUs

        Did they forget to add "k" to that number? OpenAI plans to have one million GPUs by the EOY.