Transmeta made a technology bet that dynamic compilation could beat out-of-order superscalar CPUs on SPEC.
The bet turned out to be wrong, but it was genuinely controversial among experts at the time.
I'm glad they tried it even though it turned out to be wrong. Many of the lessons learned are documented in systems conferences and incorporated into modern designs, e.g. GPUs.
To me, Transmeta is a great example of a venture investment. If it had beaten Intel on SPEC by a meaningful margin, it would have dominated the market. Sometimes the only way to get to the bottom of a complex system is to build it.
The same could be said of scaling laws and LLMs. It was theory before Dario, Ilya, OpenAI, et al. trained it.
I think of it more as the timing being wrong - betting on software in an era of exponential hardware growth was unwise (software performance can't scale that way). The problem is that you need to marry it to a significantly better CPU/architecture, because the JIT is about not losing performance while retaining backward compatibility.
However, if you add it on top of a better CPU it's a fine technique to bet on - case in point: Apple's move away from Intel onto homegrown CPUs.
They were also the first to produce an x86 CPU with an integrated northbridge; they could have pitched it more at embedded and industrial markets, where SPEC scores matter less.
Aren't modern CPUs essentially dynamic translators from the x86_64 instruction set into internal RISC-like instruction sets?
Not to the same level. Crusoe was, in many ways, more classic CISC than x86 - except that its microcode was actually doing dynamic translation to an internal ISA instead of operating like an interpreter as in old CISCs.
The x86 ISA had the funny advantage of being way closer to RISC than the "beloved" CISC architectures of old like m68k or VAX. Many common instructions translate to a single "RISCy" instruction for the internal microarchitecture (something AMD noted, IIRC, for the original K5 with its AMD 29050-derived core: "most instructions translate to 1 internal microinstruction, some between 2 to 4"). x86 prefixes are also way simpler than the complicated decode logic of m68k or VAX; an instruction with multiple prefixes will quite probably still decode to a single microinstruction.
That said, there's a funny footnote: Transmeta's technology survived quite a long time, to the point that there were Android tablets - including flagship Google ones like the Nexus 9 - whose CPU was based on it, because Nvidia's "Denver" architecture used the same technique (AFAIK licensed from Transmeta, but don't cite me on this).
> Many common [x86] instructions translate to a single "RISCy" instruction for the internal microarchitecture
And then there are read-modify-write instructions, which on modern CPUs need two address-generation μops in addition to the load one, the store one, and the ALU one. So the underlying load-store architecture is very visible.
There’s also the part where we’ve trained ourselves out of using the more CISCy parts of x86 like ENTER, BOUND, or even LOOP, because they’ve been slow for ages, and thus they stay slow.
Even many of the more complex instructions can often translate into surprisingly short sequences - all sorts of loop structures now get various optimizations, including instruction fusion, that probably wouldn't be necessary if we hadn't stopped using the higher-level LOOP constructs ;-)
But REP MOVS, for example, now gets executed as the equivalent of SSE loads and stores (16 bytes at a time) or even AVX-512 loads and stores (64 bytes at a time).
And of course the address math that LEA exposes - the base + index*scale + displacement encoded in the ModRM/SIB bytes - is pretty much free, since AFAIK it's handled in its own pipeline stage.
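If you want to poke at both of these from C, here's a minimal sketch (my own toy code, GCC/Clang on x86-64 only, function names made up for illustration): the first function issues REP MOVSB directly via inline asm with the standard RDI/RSI/RCX operands, and the second is ordinary arithmetic that compilers typically emit as a single LEA.

    #include <stddef.h>
    #include <stdio.h>

    /* Sketch: issue REP MOVSB directly (GCC/Clang inline asm, x86-64). On CPUs
     * with the ERMSB/FSRM features, the string-move microcode does the wide
     * copies described above; this is not how you'd normally write a memcpy. */
    static void rep_movsb_copy(void *dst, const void *src, size_t n)
    {
        __asm__ volatile ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (n)
                          :
                          : "memory");
    }

    /* Base + index*scale + displacement, the same math a ModRM/SIB memory
     * operand encodes; compilers typically emit this as a single LEA, e.g.
     *     lea rax, [rdi + rsi*4 + 8]                                       */
    static long addr_math(long base, long idx)
    {
        return base + idx * 4 + 8;
    }

    int main(void)
    {
        char src[] = "hello, world";
        char dst[sizeof src];

        rep_movsb_copy(dst, src, sizeof src);
        printf("%s / %ld\n", dst, addr_math(100, 3));  /* hello, world / 120 */
        return 0;
    }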
One aspect of Transmeta not mentioned by this article is their "Code Morphing" technique used by the Crusoe and Efficeon processors. This was a low level piece of software similar to a JIT compiler that translated x86 instructions to the processor's native VLIW instruction set.
Similar technology was developed later by Nvidia, which had licensed Transmeta's IP, for the Denver CPU cores used in the HTC Nexus 9 and the Carmel CPU cores in the Magic Leap One. Denver was originally intended to target both ARM and x86 but they had to abandon the x86 support due to patent issues.
https://en.wikipedia.org/wiki/Project_Denver
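For anyone who hasn't seen how this kind of system hangs together, here's a deliberately tiny C sketch - a made-up two-opcode guest "ISA" and a translation cache, nothing like Transmeta's actual Code Morphing Software or real x86/VLIW - just to show the basic translate-once, cache, and reuse loop:

    #include <stdio.h>

    enum { OP_ADD, OP_SUB, OP_HALT };               /* toy guest opcodes */

    typedef long (*xlated_fn)(long acc, long arg);  /* "native code" for one guest op */

    static long do_add(long a, long b) { return a + b; }
    static long do_sub(long a, long b) { return a - b; }

    static xlated_fn tcache[16];                    /* translation cache, keyed by guest PC */

    static xlated_fn translate(int opcode)          /* slow path: runs once per guest PC */
    {
        printf("translating opcode %d\n", opcode);
        return opcode == OP_ADD ? do_add : do_sub;
    }

    int main(void)
    {
        const int  prog[] = { OP_ADD, OP_ADD, OP_SUB, OP_HALT };
        const long args[] = { 5, 7, 3, 0 };
        long acc = 0;

        for (int pc = 0; prog[pc] != OP_HALT; pc++) {
            if (!tcache[pc])                        /* miss: translate this guest location */
                tcache[pc] = translate(prog[pc]);
            acc = tcache[pc](acc, args[pc]);        /* hit: run the cached translation */
        }
        printf("result = %ld\n", acc);              /* 5 + 7 - 3 = 9 */
        return 0;
    }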
Code morphing was fascinating. I had no idea nVidia tried anything similar.
I always felt Transmeta could have carved out a small but sustained niche by offering even less-efficient "morphing" for other architectures, especially discontinued ones. 680x0, SPARC, MIPS, Alpha, PA-RISC... anything the vendors stopped developing hardware (or competitive hardware) for.
So glad someone else also knew about this connection :) Details about Denver are pretty minimal, but this talk at Stanford is one of the most detailed I’ve been able to find for those interested. It’s fascinating stuff with lots of similarities to how Transmeta operated: https://youtu.be/oEuXA0_9feM?si=WXuBDzCXMM4_5YhA
There was a Hot Chips presentation by them that also gave some good details. Unlike the original Transmeta design they first ran code natively and only recompiled the hot spots.
A very similar approach is used in MCST Elbrus CPUs: https://en.wikipedia.org/wiki/Elbrus-8S#Supported_operating_...
All I know about Transmeta is that Linus Torvalds moved over from Finland to the USA to work at this startup.
Other than that, it seems to have sunk without a trace.
Didn't Transmeta's technology end up in Apple's PowerPC emulator Rosetta, following the switch to Intel?
IIRC Transmeta's technology came out of HP (?) research into dynamic inlining of compiled code, giving performance comparable to profile-guided optimization without the upfront work. It worked similarly to an inlining JIT compiler, except it was working with already compiled code. Very interesting approach and one I think could be generally useful. Imagine if, say, your machine's bootup process was optimized for the hardware you actually have. I'm going off decades old memories here, so the details might be incorrect.
No, you are confusing Transmeta with Transitive. https://en.wikipedia.org/wiki/QuickTransit
I remember it being in one of Sony's VAIO product lines, the PictureBook, known for its small form factor and swivel webcam.
That was the first laptop I owned ;-) As a frequent traveler, it was a very useful device.
Dynamo <https://www.cse.iitm.ac.in/~krishna/courses/2022/odd-cs6013/...>?
In the early 1990s, HP had a product called “SoftPC” that was used to emulate x86 on PA-RISC. IIRC, however, this was an OEM product written externally. My recollection of how it worked was similar to what is described in the Dynamo paper. I’m wondering if HP bought the technology and whether Dynamo was a later iteration of it? Essentially, it was a tracing JIT. Regardless, all these ideas ended up morphing into Rosetta (versions 1 and 2), though as I understand it, Rosetta also uses a couple hardware hooks to speed up some cases that would be slow if just performed in software.
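Since the details above are admittedly fuzzy, here's a toy C sketch of the general tracing-JIT shape being described (my own illustration, not Dynamo's or Rosetta's actual machinery, with a fake integer "program counter" instead of real machine code): interpret until the target of a backward branch gets hot, record the dynamic path from there as a trace, compile it once, and reuse it afterwards.

    #include <stdio.h>

    #define HOT_THRESHOLD 50
    #define MAX_TRACE     64

    static int exec_count[4096];      /* hotness counters for backward-branch targets */
    static int have_trace[4096];      /* nonzero once a compiled trace starts at this PC */
    static int trace_buf[MAX_TRACE];  /* PCs collected while recording a trace */

    static void interpret_one(int pc) { (void)pc; }   /* stand-in for the interpreter */

    static void compile_trace(int start_pc, int len)
    {
        /* A real system would optimize the recorded path (inlining across calls
         * and taken branches) and emit native code into a fragment cache. */
        printf("compiled %d-op trace starting at pc=%d\n", len, start_pc);
        have_trace[start_pc] = 1;
    }

    int main(void)
    {
        int pc = 0;
        for (int step = 0; step < 20000; step++) {
            if (pc == 0) {                             /* loop head = backward-branch target */
                if (have_trace[pc]) {
                    /* fast path: jump into the fragment cache (elided in this sketch) */
                } else if (++exec_count[pc] == HOT_THRESHOLD) {
                    int len = 0;                       /* record one trip around the loop */
                    for (int p = 0; p < 100 && len < MAX_TRACE; p++)
                        trace_buf[len++] = p;
                    compile_trace(pc, len);
                }
            }
            interpret_one(pc);
            pc = (pc + 1) % 100;                       /* toy control flow: a 100-op loop */
        }
        return 0;
    }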
A lot ended up in HotSpot for the JVM. I know a number of extremely good engineers whose career path went TransMeta -> Sun -> Google.
I had a pretty slick Toshiba Libretto L1 from Japan at the time - twice as wide as long, with a 1280x600 display.
Its 600 MHz Transmeta Crusoe CPU was pretty slow, unfortunately. Like a 333 MHz Celeron, IIRC.
I used a Fujitsu Lifebook P-2046 laptop at university. It had an 800 MHz Crusoe chip. IIRC it shipped with 256 MB of RAM, which I eventually upgraded to 384.
Somehow I managed to tolerate running Gentoo on it. Compiling X, OpenOffice, or Firefox were multi-day affairs. One thing that annoyed me was I could never get the graphics card (an ATI Rage 128 with 4 MB RAM, IIRC) working with acceleration under Linux, and that was when compositing window managers were gaining prevalence; I kept trying to get it working in the hope that it would take a bit of the load off of the struggling CPU.
Despite the bad performance, it worked really well for a college student: it was great for taking notes, and the batteries (extended main and optical drive bay) would easily last a full day of classes. It wouldn't run Eclipse very well, but most of my CS assignments were done using a text editor, anyways.
> But they were still a technology company, and if their plans had gone well, they would have sold their product to dotcoms
I'm not sure that that's really correct; they were very desktop-oriented.
Well, they ended up being mobile-oriented, but even that didn’t work. They were definitely not server-oriented and they really couldn’t compete at desktop. Honestly, while the tech was interesting, it wasn’t really solving a problem that anyone was struggling with.
> it wasn’t really solving a problem that anyone was struggling with
They did push the envelope on efficiency. My Crusoe-equipped laptop could go six hours on the stock battery (12+ on the extended batteries) back when most laptops struggled to get three.
I liked the Transmeta web page from before they launched. It was just bare HTML with no styling. It said:
This page is not here yet.
The product hype and the lack of knowledge about what it was meant that nobody knew what to expect. With those hyped expectations, and with Torvalds on board, everyone expected that everything would be different. But it wasn't.
A similar product launch was the Segway, where we went from this incredible vision of everyone on Segways to nobody wanting one.
The hype was part of the problem with Transmeta. Even in its delivered form it could have found a niche. For example, the network computer was in vogue at the time, thanks to Oracle. A different type of device, like a Chromebook, might have worked.
With Torvalds connected to Transmeta and the stealthy development, we never did get to hear about who was really behind Transmeta and why.
> A similar product launch was the Segway, where we went from this incredible vision of everyone on Segways to nobody wanting one.
The problem with the Segway in Germany was rather the certification for road traffic. Because of the insane red tape involved, its introduction was delayed, and for the same reason nobody ended up wanting one.
> I liked the Transmeta web page from before they launched. It was just bare HTML with no styling. It said:
>
> This page is not here yet.
I remember that fondly.
If you did view source there was a comment that said something like:
No, there are no hidden messages in the source code, either.
https://web.archive.org/web/19970710102251/http://www.transm...
Then, https://web.archive.org/web/20000229173916/http://www.transm... , when content appeared around Feb 2000.
Product launch PDF from Jan 19, 2000: https://web.archive.org/web/20000815231116/http://www.transm...
Thanks for that - I was almost right: "This web page is not here yet."
I still use this as important placeholder text, not that anyone outside HN would get the reference.