1. this is apparently MiniMax's "launch week" - they did M1 on Monday and Hailuo 2 on Tuesday (https://news.smol.ai/issues/25-06-16-chinese-models). remains to be seen if they can keep up the pace of model releases for the rest of this week - these 2 were big ones, they aren't yet known for much else beyond llm and video models. just watch https://x.com/MiniMax__AI for announcements.
2. minimax m1's tech report is worthwhile: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M... while they may not be the SOTA open weights model, they do make some very big/notable claims on lightning attention and their GRPO variant (CISPO).
(I'm unaffiliated, just sharing what I've learned so far since no comments have been made here yet.)
They are also known for audio models, having the best TTS on some leaderboards (and my personal favorite) https://artificialanalysis.ai/text-to-speech/arena?tab=leade...
> they did M1 on Monday and Hailuo 2 on Tuesday
It would've been fun to see them name their models like Apple chips: M1, M1 Pro, M1 Ultra.
Yeah MiniMax M1 certainly directed my thoughts to Mac mini M1. :)
In case you're wondering what it takes to run it, the answer is 8x H200 141GB [1] which costs $250k [2].
1. https://github.com/MiniMax-AI/MiniMax-M1/issues/2#issuecomme...
2. https://www.ebay.com/itm/335830302628
Can’t you run it on a Mac Studio with 512GB? That’s about $8,500.
It's also 1/20 of the speed. So not very useable.
That's at full precision. If you run Q4 or Q8 you can run this on <$10,000 equipment.
My experience with heavily quantized models is they do better than a similar sized unquantized model but don't really perform anywhere near the pre-quantized model.
>My experience with heavily quantized models is they do better than a similar sized unquantized model but don't really perform anywhere near the pre-quantized model.
People have tested it. Q8 has essentially no drop in quality, Q4 is measurable but still not realistically a problem. If this impacts you, just pay for the commercial saas option.
This assumes that the benchmarks are representative of real usage scenarios. I'm not saying that there is bad faith, but that benchmarking is really hard in the context of LLMs.
>This assumes that the benchmarks are representative of real usage scenarios. I'm not saying that there is bad faith, but that benchmarking is really hard in the context of LLMs.
It's a fair point, but the conclusion is 'i dont know'
I could assume that it gets better because it'll keep to simpler code.
No point in running anything but full precision.
Quantization doesn't work? Really?
And if you add in heavy sparsification it should fit and run on a raspberry pi.
So in around 6 months, we will see that the person who bought this H200 in the listing just got scammed for $250k and will realize that you just needed specific quantizations to the model and a few optimizations to run locally.
Unless they want to train their own model, buying this for inference for $250k is unnecessary and still isn't enough for a full production deployment.
It's already sparsified from the 150T parameter model..
It took me several hours to realize that 150 trillion parameters is a reference to the number of synapses in a human brain.
How many parameters does this model have?
456 bn, about 46 bn active at a time (it's moe)
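For anyone who wants to sanity-check the hardware claims above, here is a rough weights-only estimate. It is a sketch under loud assumptions: Q8 and Q4 are treated as exactly 8 and 4 bits per weight (real quantization formats carry some overhead), and KV cache, activations, and runtime overhead are ignored. The parameter count and the two memory budgets are the ones quoted in this thread.

```python
# Weights-only memory estimate; ignores KV cache, activations, and runtime overhead.
TOTAL_PARAMS = 456e9  # total parameter count quoted in the thread


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Gigabytes (10^9 bytes) needed to store the weights alone."""
    return params * bits_per_weight / 8 / 1e9


# Assumed bit widths: 16 for unquantized, exactly 8 and 4 for Q8 and Q4.
footprints = {
    name: weight_gb(TOTAL_PARAMS, bits)
    for name, bits in {"16-bit": 16, "Q8": 8, "Q4": 4}.items()
}

# Memory budgets mentioned in the thread.
budgets_gb = {"8x H200 141GB": 8 * 141, "Mac Studio 512GB": 512}

for name, gb in footprints.items():
    fits = [hw for hw, cap in budgets_gb.items() if gb <= cap]
    print(f"{name}: {gb:.0f} GB of weights, fits in: {fits or 'neither'}")
```

Under these assumptions the 16-bit weights alone come to 912 GB, which is why a single 512 GB machine only becomes an option after quantization.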
"We publicly release MiniMax-M1 at this https url" in the arxiv paper, and it isn't a link to an empty repo!
I like these people already.
A few thoughts:
* A Singapore based company, according to LinkedIn. There doesn't seem to be much of a barrier to entry to building a very good LLM.
* Open weight models + the development of Strix Halo / Ryzen AI Max makes me optimistic that running great LLMs locally will be relatively cheap in a few years.
They are a Chinese company based out of the city of Shanghai, not Singapore.
They're also planning to IPO at HKEX in Hong Kong soon
https://www.scmp.com/tech/tech-trends/article/3314819/deepse...
ask it what country taiwan is part of...
I'll keep an eye out on that ipo
It seems more and more like an inevitability we will run models locally. Exciting and concerning implications.
If anyone has any suggestions of people thinking about this space they respect, I'd love to listen to more ideas and thoughts on the developments.
I think the main limitation, right now, is hardware. For GPUs the main limit is the VRAM available on consumer models. CPUs have plenty of memory but don't have the bandwidth or vector compute power for LLMs. This is why I think the Strix Halo is so exciting: it has bandwidth + compute power plus a lot of memory. It's not quite where it needs to be to replace a dedicated GPU, but in a few iterations it could be.
I'm interested in other opinions. I'm no expert on this stuff.
How does the shared memory model for GPUs on Apple Silicon factor into this? These are technically consumer grade and not very expensive, but they can offer a huge amount of memory since all the memory is shared between CPU and GPU. Even a mid-tier machine can easily have 100 GB of GPU memory.
If you squint the M4 is the same as the Strix Halo. The M4 has roughly
* double the bandwidth;
* half the compute; and
* double the price for comparable memory (128GB)
compared to the Strix Halo.
I'm more interested in the AMD chips because of cost. Plus, while I have an Apple laptop, I do most of my work on a Linux desktop, so a killer AMD chip works better for me. If you don't mind paying the Apple tax then a Mac is a viable option. I'm not sure about the software side of LLMs on Apple Silicon, but I can't imagine it's unusable.
An example of desktop with the Strix Halo is the Framework desktop (AI Max+ 395 is the marketing name for the Strix Halo chip with the most juice): https://frame.work/gb/en/products/desktop-diy-amd-aimax300/c...
I am also very interested in AMD's Strix Halo for running LLMs locally. For that I have a Framework Desktop on order (batch 1!).
Alex Ziskind on YouTube does videos comparing Strix Halo, M4 Mac mini and MacBook Pro, Nvidia 5090, etc., including power consumption. The only downside is that one has to pull the numbers out of the videos; there are no tables or anything. Here is a recent video testing Strix Halo and a Mac mini: https://www.youtube.com/watch?v=B7GDr-VFuEo
Apple has machines with 2x and about 3x the Strix Halo bandwidth by doubling up the memory buses. These get expensive though.
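To put the bandwidth comparison in concrete terms, here is a back-of-the-envelope sketch of why bandwidth matters so much for local inference. It assumes decoding is purely memory-bandwidth bound, so each generated token streams every active weight from memory once. The 256 GB/s base figure is my own assumption for Strix Halo and not something stated in this thread; the 2x and 3x multiples and the 46bn active parameters are taken from the comments here.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             active_params: float,
                             bits_per_weight: float) -> float:
    """Optimistic upper bound on decode speed when memory bandwidth is the only limit."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BASE_BANDWIDTH_GB_S = 256.0  # assumed Strix Halo figure, not from the thread
ACTIVE_PARAMS = 46e9         # active parameters per token, as quoted in the thread

for multiple in (1, 2, 3):
    tps = decode_tokens_per_second(BASE_BANDWIDTH_GB_S * multiple, ACTIVE_PARAMS, 4)
    print(f"{multiple}x bandwidth: at most ~{tps:.0f} tokens/s at 4 bits per weight")
```

This only bounds speed. The whole model still has to fit in memory, and compute, KV cache reads, and expert routing will all pull real numbers below the bound.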
Honest question: what is the concerning aspect to it?
I don't know, what's worse about people running LLMs locally compared to running any software locally?
There is nothing fundamentally new about having freedom at the edges of society. Yes, it can lead to horrible situations, like someone killing their neighbors with a shiny new tool that one person can handle alone and that is available to all. But that's far less of a concern than having the powerful new tool stay under the fully concentrated control of the greediest humans out there, who will gladly escalate any hindrance to genocide whenever something doesn't fit their perspective.
> A Singapore based company, according to LinkedIn
Nah, this is a Shanghai-based company.
[flagged]
https://www.minimaxi.com is their website for the Chinese parent company 上海稀宇科技有限公司, https://minimax.io is their international website for the Singapore based company Nanonoble Pte Ltd that handles operations outside of China.
What source do you want? I have a few friends who work for them and they all live in either Shanghai (most) or Beijing. And I've never seen anyone who claimed they are based in Singapore or anywhere else before. Does this work?
https://en.wikipedia.org/wiki/MiniMax_(company)
Wikipedia in itself is no source, and after reading the parent's message I went there to check too and, surprise surprise, neither of the statements has a source attached to it. None of the linked articles has any information about where their headquarters is either.
If someone knows of a trustworthy article that states it outright, please feel free to share.
I'm the OP who claimed it was Singaporean, after checking LinkedIn. I then found the Wikipedia page, which I posted above. Amongst the comments here there is also a link to a Bloomberg article about a potential IPO. I don't have a dog in the race. Just passing on what I found.
According to a linked Twitter post[0], they trained this for $500k-ish. I wonder how?
> RL at unmatched efficiency: trained with just $534,700
0: https://x.com/MiniMax__AI/status/1934637031193514237
This is stated nowhere on the official pages, but it's a Chinese company.
https://en.wikipedia.org/wiki/MiniMax_(company)
Many people know that Minimax is Chinese because their video generator has a super obviously Chinese name (Hailuo), and that's what they've been known for so far
It's best when they state it themselves, and we can also verify it through third parties. Not stating it, or outright obscuring it, is also information. The Hailuo name is definitely an indicator, but it could be Taiwanese or Singaporean as well, or just foreign branding, like Häagen-Dazs.
Why would you expect them to mention that on their project's page?
1. It's conventional to do so.
2. It's a legal requirement in some jurisdictions (e.g. https://www.gov.uk/running-a-limited-company/signs-stationer...)
3. It's useful for people who may be interested in applying for jobs
> 1. It's conventional to do so.
I can't say I remember any model/weights release including the nation where the authors happen to live or where the company is registered. Usually they include some details about what languages they've trained on, and disclose some of their relationships, from which you could infer that.
But is it really a convention to include the nation the company happens to be registered in, or where the authors live, in submitted papers? I think that'd stick out more to me than a paper missing such a detail.
OP said "official pages", which I took to mean the company website: https://www.minimax.io/ not the repo or the paper.
Ok, let's change the argument to "It's conventional for companies to publish what country they're located in on their project's page". Which companies are doing this? Not even OpenAI or Anthropic are doing this as far as I can tell.
If you mean a Github page, like text generated from the `README.md`, then I do not expect any mention of country there.
> If you mean a Github page
I'm trying to figure out what you mean here. Where do you expect the country to be mentioned?
> It's conventional to do so
Where do you see that? e.g. I just checked https://openai.com/about/ and it doesn't say where they are based. I have no associations either way, but I usually have to work hard to find out where startups are based.
it's right there in their terms of use: https://openai.com/policies/terms-of-use/ linked at the bottom of each of their pages
> If you believe that your intellectual property rights have been infringed, please send notice to the address below or fill out this form. We may delete or disable content that we believe violates these Terms or is alleged to be infringing and will terminate accounts of repeat infringers where appropriate.
OpenAI, L.L.C.
1455 3rd Street
San Francisco, CA 94158
Attn: General Counsel / Copyright Agent
Is this what you are talking about?
1. No it's not. Top GitHub repository from Google as an example: https://github.com/google/material-design-icons I think you'd actually be hard pressed to find a single repository where the company that owns it lists where they are registered.
2. This is a requirement for companies registered in the UK. You should also read your own link, it doesn't say anything about the company's presence on 3rd party websites.
3. This is such a remote reason it's laughable, there are plenty more things that are more relevant to potential job applications, such as whether they are hiring at all or not.
You just want them to mention it because it's a Chinese company. If they were American, Mexican, German or Zimbabwean you wouldn't give the slightest fuck.
Your link's parent page (https://github.com/google) states that they are in the United States of America, on the top, so it's not a good example.
I don't know about your OP, but even as a layperson, I personally like to check where my things come from. And yes, I am mostly curious about which wide geopolitical region the thing is from.
In case of IT projects, it matters when I want to include them in a project.
OP said "official pages", which I took to mean the company website: https://www.minimax.io/
Also, thanks for putting words in my mouth. If they were Mexican or Zimbabwean I would find it very interesting to see a roughly SOtA model coming from that country.
I compared its website with OpenAI's and they're not much different: neither says whether it is an American or a Chinese company. Most of the time a company only states this when it is targeting its domestic market. That's very popular for food products, where some will even print the country name on the package.
Forget the project page, I couldn't find definitive information on any of the official pages.
They state HQ in Singapore on LinkedIn, and San Francisco elsewhere. Compared to this, it's outright disingenuous that they don't mention that they are a Chinese company.
As a layman, I'm mostly indifferent to this information.
If I were a project manager, this would be vital information. And the people running projects know this. So it begs the question: why not disclose, and why obscure it?
Please come up with better names for these models. This sounds like the processor in my Mac Studio.
https://en.wikipedia.org/wiki/Minimax
They named themselves after a classic ai algorithm.
As best I can tell from a gloss-over read, it doesn't use anything like the Minimax algorithm. Astute readers are aware that one of the first applications of Minimax was in an AI chess program designed by Claude Shannon.
https://en.wikipedia.org/wiki/Claude_Shannon#Shannon's_compu...
The company supplies contemporary AI solutions, like LLM and video generation. The name is just a reference, like in the case of Tesla, or like how there is a kaliapparat in the American Chemical Society logo.
Does facebook use llamas in their model? it's a name, it doesn't have to be 100% true to its meaning.
it's the name of the company
But then there's the "M1" part.
Your Mac is made by 'Apple' and literally named after an apple cultivar
Is that like a pineapple that doesn’t grow on pine trees?
... but the cultivar is named for a person. :)
Also sounds like my long lost dog whose name was Max but he was tiny. Absolutely horrible name, borderline criminal I say.
> "In our attention design, a transformer block with softmax attention follows every seven transnormer blocks (Qin et al., 2022a) with lightning attention."
Alright, so it's 87.5% linear attention + 12.5% full attention.
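The 7-to-1 split can be checked with a tiny sketch of the block schedule the quote describes. The depth of 80 blocks is a hypothetical value picked only because it is a multiple of 8, and `block_schedule` is an illustrative helper, not anything from the MiniMax code.

```python
def block_schedule(num_blocks: int, period: int = 8) -> list[str]:
    """Every `period`-th block uses softmax attention; the rest use lightning attention."""
    return ["softmax" if (i + 1) % period == 0 else "lightning"
            for i in range(num_blocks)]


schedule = block_schedule(80)  # 80 is a hypothetical depth, a multiple of 8
softmax_share = schedule.count("softmax") / len(schedule)

print(schedule[:8])
print(f"softmax: {softmax_share:.1%}, lightning: {1 - softmax_share:.1%}")
```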
TBH I find the terminology around "linear attention" rather confusing.
"Softmax attention" is an information routing mechanism: when token `k` is being computed, it can receive information from tokens 1..k, but it has to be crammed through a channel of a fixed size.
"Linear attention", on the other hand, is just a 'register bank' of a fixed size available to each layer. It's not real attention, it's attention only in the sense it's compatible with layer-at-once computation.
They are apparently building buzz for an IPO
https://www.bloomberg.com/news/articles/2025-06-18/alibaba-b...
if they trained this scale without western cloud infra, i'd want to know what their token throughput setup looks like
They trained on 512 H800 GPUs for three weeks, equivalent to around half a million dollars. https://xcancel.com/MiniMax__AI
That is for the reinforcement learning part. The base model was likely trained on more GPUs for significantly longer.
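As a quick check on how those figures relate, here is the arithmetic, assuming "three weeks" means 21 full days of all 512 GPUs and using the $534,700 figure from the tweet quoted earlier in the thread. Per the reply above, this would cover the RL phase only.

```python
GPUS = 512                   # H800 count from the comment above
DAYS = 21                    # "three weeks", assumed to mean 21 full days
REPORTED_COST_USD = 534_700  # figure from the quoted tweet

gpu_hours = GPUS * DAYS * 24
implied_rate = REPORTED_COST_USD / gpu_hours

print(f"{gpu_hours:,} GPU-hours, about ${implied_rate:.2f} per GPU-hour")
```

So the reported cost implies a rental-style price of roughly two dollars per GPU-hour, which is at least self-consistent with the "half a million dollars" summary.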
Sneakernet