Nah, HP made bank on their superdome computers even though they had very few clients. People paid through the nose for those. I worked on IA-64 stuff in 2011, long after I thought it was dead :D.
The real thing that killed the division was Oracle announcing that they would no longer support IA-64. It just so happened that like 90% of the clients using Itanium were using it for Oracle DBs.
But by that point HP was already trying to get people to transition to more traditional x86 servers that they were selling.
Nitpick: The author states that the removal of 16-bit support in 64-bit Windows was a design decision and not a technical one. That’s not quite true.
When AMD64 is in one of the 64-bit modes, long mode (true 64-bit) or compatibility mode (64-bit with 32-bit compatibility), you cannot execute 16-bit code. There are tricks to make it happen, but they all require switching the CPU mode, which is insecure and can cause problems in complex execution environments (such as an OS).
If Microsoft (or Linux, Apple, etc) wanted to support 16-bit code in their 64-bit OSes, they would have had to create an emulator+VM (such as OTVDM/WineVDM) or make costly hacks to the OS.
Microsoft has just such an emulator. Using leaked Windows source code, the NTVDM (Virtual DOS Machine) from 32-bit Windows versions has been built for 64-bit Windows targets[0].
I don't understand why Microsoft chose to kill it. That's not in their character re: backwards compatibility.
NTVDM requires Virtual 8086 mode in the processor. This doesn't exist in the 64-bit modes, requiring a software emulator. That is why OTVDM/WineVDM exist.
You can see all of this explained in the README for the very project you linked:
```
How does it work?
=================
I never thought that it would be possible at all, as NTVDM on Win32 uses V86
mode of the CPU for fast code execution which isn't available in x64 long
mode.
However I stumbled upon the leaked Windows NT 4 sourcecode and the guys from
OpenNT not only released the source but also patched it and included all
required build tools so that it can be compiled without installing anything
but their installation package.
The code was a pure goldmine and I was curious how the NTVDM works.
It seems that Microsoft bought the SoftPC solution from Insignia, a company
that specialised in DOS-Emulators for UNIX-Systems. I found out that it also
existed on MIPS, PPC and ALPHA Builds of Windows NT 4 which obviously don't
have a V86 mode available like Intel x86 has. It turned out that Insignia
shipped SoftPC with a complete emulated C-CPU which also got used by Microsoft
for MIPS, PPC and ALPHA-Builds.
```
As to why they didn't continue with that solution: because they didn't want to rely on SoftPC anymore, or take on development themselves, for a minuscule portion of users who would probably just use 32-bit Windows anyway.
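Part of what a software CPU like the one in that README has to do for real-mode DOS code is the segment:offset arithmetic that V86 mode used to handle in hardware. A minimal C sketch of just that one piece, assuming 8086-style 1 MiB wraparound (A20 disabled); the function name is hypothetical and this is not code from NTVDM or SoftPC:

```c
#include <stdint.h>

/* Hypothetical helper: real-mode address translation as a software
   emulator would compute it. linear = segment * 16 + offset, truncated
   to 20 bits so addresses wrap at 1 MiB like on an 8086. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu; /* 8086-style wraparound */
}
```

For example, `real_mode_linear(0xB800, 0)` gives the classic text-mode video address 0xB8000, and different segment:offset pairs such as 0x0040:0x0000 and 0x0000:0x0400 alias the same byte, which real-mode software relies on.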
Yeah. Like I said, Microsoft had the emulator. NTVDM on x64 is handled just like MIPS or Alpha, by using the SoftPC emulator. It's just a new CPU architecture.
They had a proven and tested emulator yet they chose not to build it for the new x64 CPU architecture. It turns out that it wasn't too hard to build for the new architecture either. That's the crux of my confusion.
It's not like SoftPC was new and unproven code. It doesn't feel like it would have been a major endeavor to keep supporting it.
Obviously, I don't know what Microsoft's telemetry told them re: the number of 16-bit application users. I know it impacted a number of my Customers (some of whom are running DOSBox today to keep old fit-for-purpose software working), and I don't support a ton of offices or people.
It seems out of character for Microsoft to make their Customers throw away software.
It's not so much running 16 bit code, but running something that wants to run on bare metal, i.e. DOS programs that access hardware directly. Maintaining the DOS virtualization box well into the 21st century probably wasn't worth it.
> The 64-bit builds of Windows weren’t available immediately.
There was a year or so between the release of AMD-64 and the first shipping Microsoft OS that supported it.[1] It was rumored that Intel didn't want Microsoft to support AMD-64 until Intel had compatible hardware. Anyone know?
Meanwhile, Linux for AMD-64 was shipping, which meant Linux was getting more market share in data centers.[1]
I've written code to call 16-bit code from 64-bit code that works on Linux (because that's the only OS where I know the syscall to modify the LDT).
It's actually no harder to call 16-bit code from 64-bit code than it is to call 32-bit code from 64-bit code... you just need to do a far return (the reverse direction is harder because of stack alignment issues). The main difference between 32-bit and 16-bit is that OSes support 32-bit code by having a GDT entry for 32-bit code, whereas you have to go and support an LDT to do 16-bit code, and from what I can tell, Windows decided to drop support for LDTs with the move to 64-bit.
The other difficulty (if I've got my details correct) is that returning from an interrupt into 16-bit code is extremely difficult to do correctly and atomically, in a way that isn't a problem for 32-bit or 64-bit code.
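The 16-bit vs 32-bit difference in those descriptors comes down to the D/B flag: clear for a 16-bit code segment, set for a 32-bit one, with the L flag clear in both cases so the CPU runs the code in compatibility mode rather than 64-bit mode. A minimal C sketch that only encodes such a descriptor and the LDT selector that would point at it; the function names are hypothetical, the granularity bit is left clear, and nothing is actually installed (on Linux that step would go through the `modify_ldt` syscall, which this sketch does not call):

```c
#include <stdint.h>

/* Hypothetical helper: encode a legacy 8-byte code segment descriptor.
   is_32bit selects the D/B flag; G, L and AVL are left clear. */
static uint64_t make_code_descriptor(uint32_t base, uint32_t limit,
                                     int dpl, int is_32bit)
{
    uint8_t access = 0x80                          /* P: present */
                   | (uint8_t)((dpl & 3) << 5)     /* DPL: privilege level */
                   | 0x10                          /* S: code/data segment */
                   | 0x08                          /* type: executable */
                   | 0x02;                         /* type: readable */
    uint8_t flags = is_32bit ? 0x4 : 0x0;          /* D/B is bit 2 of the nibble */
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFF);               /* limit[15:0] */
    d |= (uint64_t)(base & 0xFFFF) << 16;          /* base[15:0]  */
    d |= (uint64_t)((base >> 16) & 0xFF) << 32;    /* base[23:16] */
    d |= (uint64_t)access << 40;                   /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;    /* limit[19:16] */
    d |= (uint64_t)flags << 52;                    /* AVL, L, D/B, G */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;    /* base[31:24] */
    return d;
}

/* Selector layout: index (13 bits) | TI (1 = LDT) | RPL (2 bits). */
static uint16_t ldt_selector(uint16_t index, int rpl)
{
    return (uint16_t)((index << 3) | 0x4 | (rpl & 3));
}
```

A 64 KiB user-mode 16-bit segment at base 0 encodes as 0x0000FA000000FFFF; setting D/B instead yields the 32-bit variant. Once such a descriptor sits in the LDT, a far return to `ldt_selector(index, 3)` is roughly the trick described above; the part that needs OS support is getting the descriptor into the LDT at all.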
Executing 16-bit code in Compatibility Mode (not Long Mode) is possible, that's not the problem. The problem is lack of V86 allowing legacy code to run. So Real Mode code is out wholesale (a sizable chunk of legacy software) and segmented memory is out in Protected Mode (nearly the totality of remaining 16-bit code).
So yes, you can write/run 16-bit code in 64-bit Compatibility Mode. You can't execute existing 16-bit software in 64-bit Compatibility Mode. The former is a neat trick, the latter is what people actually expect "16-bit compatibility" to mean.
> segmented memory is out in Protected Mode (nearly the totality of remaining 16-bit code).
No, segmented memory is exactly what you can get working. You set up the segments via the LDT, which is still supported even in 64-bit mode; this is how Wine is able to execute Win16 code on 64-bit Linux. (Reading Wine code is how I figured out how to execute 16-bit code from 64-bit code in the first place!)
What doesn't work, if my memory serves me correctly, is all the call gate and task gate stuff. Those are effectively building blocks for an OS kernel that everyone tossed out in the early 90s, going instead with kernel mode and user mode with syscalls (first software interrupts and then the actual syscall instruction in x86-64). You don't need any of that stuff to run most 16-bit code, you just need to emulate the standard Windows DLLs like kernel, ntdll, and user.
- Intel quietly introduced their implementation of amd64 under the name "EM64T". It was only later that they used the name "Intel64".
- Early Itanium processors included hardware features, microcode and software that implemented an IA‑32 Execution Layer (dynamic binary translation plus microcode assists) to run 32‑bit x86 code; while the EL often ran faster than direct software emulation, it typically lagged native x86 performance and could be worse than highly‑optimised emulators for some workloads or early processor steppings.
You go to a small shop recommended by a friend, he convinces you to get AMD despite Intel still being the reigning default.
You get it home, doing a little research you realize the CPU is the best performance per price in the recent CPUs.
Now you know you trusted the right person.
> In 2004, Intel wrote off the Itanium and cloned AMD64.
AMD introduced x86-64 in 2003. You don't just clone an ISA (even if based on AMD documents), design it, fab it etc. in a year or two. Intel must have been working on this well before AMD introduced the Athlon64.
I was one of those weird users who used the 64-bit version of Windows XP, with what I'm pretty sure was an Athlon 64 X2, both the first 64-bit chip and first dual-core one that I had.
Core memories for me were my pc builds for the Athlon Thunderbird and later the Athlon 64 FX-60. What an experience it was to fire those machines up and feel the absolutely gigantic performance improvements.
At least with Itanium Intel was trying something fresh. In comparison, the Pentium 4 arch was extra bad because it had a very long pipeline to achieve high core frequencies. Branch mispredictions were thus very costly. And it was soon obvious that the process wouldn't scale much above 3 GHz without wasting humongous amounts of power, defeating the long pipeline's purpose.
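The misprediction cost can be put in the usual back-of-the-envelope form: effective CPI = base CPI + branch fraction × misprediction rate × penalty, where the penalty grows with pipeline depth. A small C sketch; the 20% branch mix, 5% misprediction rate and 10- vs 20-cycle penalties are illustrative assumptions, not Pentium 4 measurements:

```c
/* Textbook model: each mispredicted branch flushes the pipeline and
   costs roughly penalty_cycles. All inputs are illustrative. */
static double effective_cpi(double base_cpi, double branch_fraction,
                            double mispredict_rate, double penalty_cycles)
{
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles;
}

/* Average time per instruction in nanoseconds at a clock of freq_ghz. */
static double ns_per_instruction(double cpi, double freq_ghz)
{
    return cpi / freq_ghz;
}
```

With those assumed numbers, `effective_cpi(1.0, 0.2, 0.05, 10.0)` is 1.1 and the 20-cycle version is 1.2, so the deeper pipeline needs about a 9% higher clock (1.2 / 1.1) just to break even on `ns_per_instruction`, before counting the power that higher clock costs.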
They have also made a lot of successful products and comebacks. While the Pentium 4 lost out to the Athlons and their market share dropped, they then released the Core series of CPUs, and the Core 2 Duo was a huge hit and marked the beginning of the dark ages for AMD until they released Ryzen.
As a company they have had long periods of dominance punctuated by big losses to AMD on the CPU front, which they always claw back. This time they seem to be taken out by their inability to get their fabs online more than by anything their competitor is doing.
AMD was beating them on performance before Athlon and Athlon 64 made it clear to everybody.
Intel spent literally 8 years and many, many billions and billions of $ to do everything possible to prevent AMD from getting volume.
They had so much production capacity and AMD so little that they basically had the ability to pay every single large OEM not to use AMD. If you as a company used AMD, you would instantly lose billions of $, you would be the last Intel customer served, you wouldn't get the new chips early on, and potentially much more. OEMs were terrified of Intel. Because Intel and Microsoft were so dominant, OEMs made terrible margins, and Intel could basically crush them. Intel used to joke that OEMs were their distributors, nothing more.
This was to the point where AMD offered free chips to people and they refused it.
AMD had a long period of time where they had the better product, but they couldn't sustain investing in better products while fighting so many legal battles. And the regulators around the world took too long and were too soft on Intel.
Intel in the 80s invested big in memory, and got crushed by Japan. They invested big into the iAPX 432 and got crushed; it was a horrible product. Luckily they were saved by the PC and were then able to have exclusivity on the back of the i386.
By the late 90s AMD was better than them, and that persisted for almost 10 years. Then they took the lead for about 8 years and then lost it. And I don't think they lost it because of the fabs. When they lost on the fabs they just fell further behind.
It's really the gigantic PC boom of the late 80s and 90s that gave them the crazy manufacturing and market lead that AMD was not able to overcome in the 10 years after that.
Good article. I remember being very skeptical of Athlon because the K6 I owned before was subjectively much less stable than any Intel I had used until then. So I felt it was only a question of time until IA64 would establish itself. Since, after all, Intel had the power to buy itself into a leader position.
That feeling that AMD isn't quite as stable never really left until a few years ago, when, with Spectre, I started to think that Intel was now playing catch-up with mobile-phone-like tactics rather than being design-superior.
Now again, Intel had a great opportunity with Xe but it feels like they just can't get their horsepower transferred onto the road. Not bad by any means, but something's just lacking.
Meanwhile, Qualcomm is announcing its Snapdragon X2 .. if only they could bring themselves to ensure proper Linux support ..
Youngsters today don't remember it; x86 was fucking dead according to the press; it really wasn't until Athlon 64 came out (which gave a huge bump to Linux as it was one of the first OSes to fully support it - one of the reasons I went to Gentoo early on was to get that sweet 64 bit compilation!) that everyone started to admit the Itanium was a turd.
The key to the whole thing was that it was a great 32 bit processor; the 64 bit stuff was gravy for many, later.
Apple did something similar with its CPU changes - now three - they only swap when the old software runs better on the new chip even if emulated than it did on the old.
AMD64 was also well thought out; it wasn't just a simple "have two more bytes" slapped on 32 bit. Doubling the number of general purpose registers was noticeable - you took a performance hit going to 64 bit early on because all the memory addresses were wider, but the extra registers usually more than made up for it.
100% -- the conventional wisdom was that the x86 architecture was too riddled with legacy and complexity to improve its performance, and was a dead end.
Itanium never met an exotic computer architecture journal article that it didn't try and incorporate. Initially this was viewed as "wow such amazing VLIW magic will obviously dominate" and subsequently as "this complexity makes it hard to write a good compiler for, and the performance benefit just doesn't justify it."
Intel had to respond to AMD with their "x86-64" copy, though it really didn't want to.
Eventually it became obvious that the amd64/x64/x86-64 chips were going to exceed Itanium in performance, and with the massive momentum of legacy on their side, Itanium was toast.
Back in that era I went to an EE380 talk at Stanford where the people from HP trying to do a compiler for Itanium spoke. The project wasn't going well at all. Itanium is an explicit-parallelism superscalar machine. The compiler has to figure out what operations to do in parallel. Most superscalar machines do that during execution.
Instruction ordering and packing turned out to be a hard numerical optimization problem.
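To make "ordering and packing" concrete: with explicit parallelism, the compiler decides at build time which independent operations share an issue group, limited by data dependencies and by the machine's width. The sketch below is a textbook greedy list scheduler in C, not Intel's or HP's algorithm, and it ignores Itanium specifics such as bundle templates, unit types, variable latencies and predication; real compilers use heuristics like this because finding optimal schedules is expensive in general.

```c
#include <stdbool.h>

#define MAX_OPS 16

/* Greedy list scheduler. dep[i][j] != 0 means op j uses op i's result,
   which becomes available one cycle after op i issues. Each cycle issues
   at most `width` ready ops in index order; cycle_of[j] records when op j
   issues. Returns the schedule length, or -1 on bad input or a cycle. */
static int list_schedule(int n, int dep[MAX_OPS][MAX_OPS],
                         int width, int cycle_of[MAX_OPS])
{
    if (n < 0 || n > MAX_OPS || width < 1) return -1;
    bool done[MAX_OPS] = {false};
    int scheduled = 0, cycle = 0;

    while (scheduled < n) {
        bool issued_now[MAX_OPS] = {false};
        int used = 0;
        for (int j = 0; j < n && used < width; j++) {
            if (done[j]) continue;
            bool ready = true;
            for (int i = 0; i < n; i++) {
                /* each predecessor must have issued in an earlier cycle */
                if (dep[i][j] && (!done[i] || issued_now[i])) {
                    ready = false;
                    break;
                }
            }
            if (ready) {
                cycle_of[j] = cycle;
                done[j] = true;
                issued_now[j] = true;
                used++;
                scheduled++;
            }
        }
        if (used == 0) return -1; /* nothing ready: dependency cycle */
        cycle++;
    }
    return cycle;
}
```

For five ops where op 2 needs ops 0 and 1 and op 4 needs ops 2 and 3, width 2 packs the work into 3 cycles while width 1 needs 5. Changing `width` changes the schedule, which is one way to see why code tuned for one implementation's execution resources can fit another poorly.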
The compiler developers sounded very discouraged.
It's amazing that retirement units, the part of a superscalar CPU that puts everything back together as the parallel operations finish, not only work but don't slow things down. The Pentium Pro head designer had about 3,000 engineers working at peak, which indicates how hard this is. But it all worked, and that became the architecture of the future.
This was around the time that RISC was a big thing. Simplify the CPU, let the compiler do the heavy lifting, have lots of registers, make all instructions the same size, and do one instruction per clock.
That's pure RISC. Sun's SPARC is an expression of that approach. (So is a CRAY-1, which is a large but simple supercomputer with 64 of everything.) RISC, or something like it, seemed the way to go faster. Hence Itanium. Plus, it had lots of new patented technology, so Intel could finally avoid being cloned.
Superscalars can get more than one instruction per clock, at the cost of insane CPU complexity.
Superscalar RISC machines are possible, but they lose the simplicity of RISC. Making all instructions the same size increases the memory bandwidth the CPU needs. That's where RISC lost out over x86 extensions. x86 is a terse notation.
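The density point is easy to see with a tiny, admittedly cherry-picked example: a minimal x86-64 function body whose instructions take 1 to 3 bytes each, compared with a fixed 4-byte encoding. The byte lengths below are the standard encodings of those instructions; the comparison assumes the same instruction count on both sides, which a real RISC compilation would not necessarily have, and x86 instructions can also run up to 15 bytes.

```c
#include <stddef.h>

/* Encoded lengths of a small x86-64 sequence:
   push rbp       -> 55
   mov  rbp, rsp  -> 48 89 E5
   xor  eax, eax  -> 31 C0
   pop  rbp       -> 5D
   ret            -> C3 */
static const unsigned char x86_lengths[] = {1, 3, 2, 1, 1};

/* Sum of variable-length encodings. */
static size_t total_bytes(const unsigned char *lengths, size_t n)
{
    size_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += lengths[i];
    return sum;
}

/* A fixed-width ISA spends the same number of bytes on every instruction. */
static size_t fixed_width_bytes(size_t n, size_t bytes_per_insn)
{
    return n * bytes_per_insn;
}
```

Here the five x86 instructions take 8 bytes versus 20 at 4 bytes each; across a whole program that difference shows up as instruction-fetch bandwidth and cache footprint.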
So we ended up with most of the world still running on an instruction set based on the one Harry Pyle designed when he was an undergrad at Case in 1969.
If I am remembering correctly, this was also a good time to be in Linux. Since the Linux world operated on source code rather than binary blobs, it was easier to convert software to run 64-bit native. Non-trivial in an age of C, but still much easier than the commercial world. I had a much more native 64-bit system running a couple of years before it was practical in the Windows world.
It also helps that Linux had much better 32-bit compatibility than Windows did. Not sure why, but it probably has something to do with the legacy support Windows shed moving to 64 bits.
It absolutely was. It was possible, hypothetically, to write a chunk of code that ran very fast. There were any number of very small bits of high-profile code which did this. However, it was impossible to make general-purpose, not-manually-tuned code run fast on it. Itanium placed demands on compiler technology that simply didn't exist, and probably still don't.
Basically, you could write some tuned assembly that would run fast on one specific Itanium CPU release by optimizing for its exact number of execution units, etc. It was not possible to run `./configure && make && make install` for anything not designed with that level of care and end up with a binary that didn't run like frozen molasses.
I had to manage one of these pigs in a build farm. On paper, it should've been one of the more powerful servers we owned. In practice, the Athlon servers were several times faster at any general purpose workloads.
Itanium was compatible with x86. In fact, it booted into x86 mode. Merced, the first implementation, had a part of the chip called the IVE, Intel Value Engine, that implemented x86 very slowly.
You would boot in x86 mode and run some code to switch to ia64 mode.
HP saw the end of the road for their solo efforts on PA-RISC and Intel eyed the higher end market against SPARC, MIPS, POWER, and Alpha (hehe. all those caps) so they banded together to tackle the higher end.
But as AMD proved, you could win by scaling up instead of dropping an all-new architecture.
* worked at HP during the HP-Intel Highly Confidential project.
I used it for numerical simulations and it was very fast there. But on my workstation many common programs like "grep" were slower than on my cheap Athlon machine. (Both were running Red Hat Linux at the time.) I don't know how much of that was a compiler problem and how much was an architecture problem; the Itanium numerical simulation code was built with Intel's own compiler but all the system utilities were built with GNU compilers.
Wasn't the only compiler that produced code worth anything for Itanium the paid one from Intel? I seem to recall complaining about it on the GCC lists.
NOTHING produced good code for the original Itanium which is why they switched gears REALLY early on.
Intel first publicly mentioned Poulson all the way back in 2005 just FOUR years after the original chip was launched. Poulson was basically a traditional out-of-order CPU core that even had hyperthreading[0]. They knew really early on that the designs just weren't that good. This shouldn't have been a surprise to Intel as they'd already made a VLIW CPU in the 90s (i860) that failed spectacularly.
Even the i860 found more usage as a specialized CPU than the Itanium. The original NeXTcube had an optional video card that used an i860 dedicated to graphics.
I lost track of it but HP, as co-architects, had its own compiler team working on it. I think SGI also had efforts to target ia64 as well.
But the EPIC (Explicitly Parallel Instruction Computing) didn't really catch on.
VLIW would need recompilation on each new chip but EPIC promised it would still run.
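The difference can be sketched like this: a classic VLIW bakes one machine-width bundle per cycle into the binary, while EPIC marks where dependency-free instruction groups end (Itanium called these "stops") and lets each chip issue a group as fast as its own width allows. A simplified C model, assuming one issue slot per instruction per cycle and ignoring bundle templates, unit types and latencies; the function name is hypothetical:

```c
#include <stdbool.h>

/* stop_after[i] marks the end of an instruction group: the compiler
   promises no dependencies inside a group, so any width is safe.
   Returns cycles needed on a machine issuing `width` per cycle. */
static int cycles_for_width(const bool *stop_after, int n, int width)
{
    int cycles = 0, group = 0;
    for (int i = 0; i < n; i++) {
        group++;
        if (stop_after[i] || i == n - 1) {
            cycles += (group + width - 1) / width; /* ceil(group / width) */
            group = 0;
        }
    }
    return cycles;
}
```

With groups of 4 and 2 instructions, the same stream takes 6 cycles at width 1, 3 at width 2, and 2 at width 4: it runs unchanged on each, only faster or slower, although code scheduled with a narrow machine in mind can still leave a wider one idle.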
In the compiler world, these HP compiler folks are leading compiler teams/orgs at ~all the tech companies now, while almost none of the Intel compiler people seem to be around.
The stripped down ARM 8/9 for AArch64 is good for a lot of use-cases, but most of the vendor specific ASIC advanced features were never enabled for reliability reasons.
ARM is certainly better than before, but could have been much better. =3
The Itanium had some interesting ideas executed poorly. It was a bloated design by committee.
It should have been iterated on a bit before it was released to the world, but Intel was stressed by there being several 64-bit RISC-processors on the market already.
I acquired a copy of the Itanium manuals, and in flicking through it, you can barely get through a page before going "you did WHAT?" over some feature.
IIRC, wasn't part of the issue that compile-time instruction scheduling was a poor match with speculative execution and/or hardware-based branch prediction?
I.e., the compiler had no access to information that's only revealed at runtime?
Itanium was pointless when Alpha existed already and was already getting market penetration in the high end market. Intel played disgusting corporate politics to kill it and then push the ugly failed Itanium to market, only to have to panic back to x86_64 later.
I have no idea how/why Intel got a second life after that, but they did. Which is a shame. A sane market would have punished them and we all would have moved on.
> I have no idea how/why Intel got a second life after that, but they did.
For the same reason the line "No one ever got fired for buying IBM." exists. Buying AMD at large companies was seen as a gamble that deciders weren't willing to make. Even now, if you just call up your account managers at Dell, HP, or Lenovo asking for servers or PCs, they are going to quote you Intel builds unless you specifically ask. I don't think I've ever been asked by my sales reps if I wanted an Intel or AMD CPU. Just how many slots/cores, etc.
Historically, when Intel is on their game, they have great products, and better than most support for OEMs and integrators. They're also very effective at marketing and arm twisting.
The arm twisting gets them through rough times like Itanium and Pentium 4 + Rambus, etc. I still think they can recover from the 10nm fab problems, even though they're taking their sweet time.
Gordon Moore tried to link up with Intel when he was at DEC. Alpha would have become Intel's 64-bit architecture. This of course didn't happen, and Intel instead linked up with DEC's biggest competitor, HP, and adopted their much, much worse VLIW architecture.
Imagine a future where Intel and Apple both adopt DEC and Alpha instead of Intel HP and Apple IBM.
Fun fact: Bob Colwell (chief architect of the Pentium Pro through Pentium 4) recently revealed that the Pentium 4 had its own 64-bit extension to x86 that would have beaten AMD64 to market by several years, but management forced him to disable it because they were worried that it would cannibalize IA64 sales.
> Intel’s Pentium 4 had our own internal version of x86–64. But you could not use it: we were forced to “fuse it off”, meaning that even though the functionality was in there, it could not be exercised by a user. This was a marketing decision by Intel — they believed, probably rightly, that bringing out a new 64-bit feature in the x86 would be perceived as betting against their own native-64-bit Itanium, and might well severely damage Itanium’s chances. I was told, not once, but twice, that if I “didn’t stop yammering about the need to go 64-bits in x86 I’d be fired on the spot” and was directly ordered to take out that 64-bit stuff.
https://www.quora.com/How-was-AMD-able-to-beat-Intel-in-deli...
"If you don't cannibalize yourself, someone else will."
Intel has a strong history of completely mis-reading the market.
Andy Grove, "Only the paranoid survive":-
Quote: Business success contains the seeds of its own destruction. Success breeds complacency. Complacency breeds failure. Only the paranoid survive.
- Andy Grove, former CEO of Intel
From wikipedia: https://en.wikipedia.org/wiki/Andrew_Grove#Only_the_Paranoid...
Takeaway: Be paranoid about MBAs running your business.
> Takeaway: Be paranoid about MBAs running your business.
Except Andy is talking about himself and Noyce, the engineers, getting it wrong (watch a few minutes of this to get the gist of where they were vs Japan): https://www.youtube.com/watch?v=At3256ASxlA&t=465s
Intel has a long history of sucking, and other people stepping in to force them to get better. Their success has been accident and intervention over and over.
And this isn't just an Intel thing, this is kind of an American problem (and maybe a business/capitalism problem). See this take on steel: https://www.construction-physics.com/p/no-inventions-no-inno... that sounds an awful lot like what is happening to Intel now.
It wasn't recent; Yamhill has been known since 2002. A detailed article about this topic just came out: https://computerparkitecture.substack.com/p/the-long-mode-ch...
The story I heard (which I can't corroborate) was that it was Microsoft that nixed Intel's alternative 64-bit x86 ISA, telling it to implement AMD's version instead.
Microsoft did port some versions of Windows to Itanium, so they did not reject it at first.
With poor market demand and AMD's success with amd64, Microsoft did not support Itanium in Vista and later desktop versions, which signaled the end of Intel's Itanium.
Microsoft also ships/shipped a commercial compiler with tons of users, and so they were probably in a position to realize early that the hypothetical "sufficiently smart compiler" which Itanium needed to reach its potential wasn't actually possible.
I wanted to mention that the Pentium 4 (Prescott) that was marketed as the Centrino in laptops had 64-bit capabilities, but it was described as 32-bit extended mode. I remember buying a laptop in 2005(?) which I first ran with 32-bit XP, and then I downloaded the wrong Ubuntu image (64-bit Dapper Drake), and the 64-bit kernel was running... and I was super confused about it.
Also, for a long while, Intel rebranded the Pentium 4 as Intel Atom, which then usually got an iGPU on top and slightly higher clock rates. No idea if this is still the case (post-Haswell changes), but I was astonished to buy a CPU 10 years later and find the same kind of oldskool cores in it, just with some modifications, and actually with worse L3 cache than the Centrino variants.
Core 2 Duo and Core 2 Quad were peak coreboot hacking for me, because at the time the Intel ucode blob was still fairly simple and didn't contain all the quirks and errata fixes that more modern CPU generations have.
Speaking of marketing, that era of Intel was very weird for consumers. In the 1990s, they had iconic ads and words like Pentium or MMX became powerful branding for Intel. In the 2000s I think it got very confused. Centrino? Ultrabook? Atom? Then for some time there was Core. But it became hard to know what to care about and what was bizarre corporate speak. That was a failure of marketing. But maybe it was also an indication of a cultural problem at Intel.
Centrino was Intel's brand for their wireless networking and laptops that had their wireless chipsets, the CPUs of which were all P6-derived (Pentium M, Core Duo).
Possibly you meant Celeron?
Also the Pentium 4 uarch (Netburst) is nothing like any of the Atoms (big for the time out-of-order core vs. a small in-order core).
Pentium 4 was never marketed as Centrino - that came in with the Pentium M, which was very definitely not 64-bit capable (and didn't even officially have PAE support to begin with). Atom was its own microarchitecture aimed at low power use cases, which Pentium 4 was definitely not.
Are you referring to PAE? [1]
[1] https://en.wikipedia.org/wiki/Physical_Address_Extension
I remember at the time thinking it was really silly for Intel to release a 64-bit processor that broke compatibility, and was very glad AMD kept it. Years later I learned about kernel writing, and I now get why Intel tried to break with the old - the compatibility hacks piled up on x86 are truly awful. But ultimately, customers don't care about that, they just want their stuff to run.
Intel might have been successful with the transition if they hadn't decided to go with such a radically different and real-world-untested architecture for Itanium.
Well, that, and Itanium was eye-wateringly expensive, while a standard PC was much cheaper for similar or faster speeds.
It is worth noting that at the turn of the century x86 wasn't yet so utterly dominant. Alphas, PowerPC, MIPS, SPARC and whatnot were still very much a thing. That is part of why running x86 software was not as high a priority, and maybe compatibility with PA-RISC would even have been a higher priority.
The writing was on the wall once Linux was a thing. I did a lot of solution design in that period. The only times there were good business cases in my world for non-x86 were scenarios where DBAs and some vertical software required Sun, and occasionally AIX or HP-UX for license optimization or some weird mainframe finance scheme.
The cost structure was just bonkers. I replaced a big file server environment that was like $2M of Sun gear with like $600k of HP Proliant.
Is that true in 2000, especially as consumer PCs ramped up?
Well, according to some IA-64 was a planned flop with the whole purpose of undermining HP's supercomputer division.
Nah, HP made bank on their superdome computers even though they had very few clients. People paid through the nose for those. I worked on IA-64 stuff in 2011, long after I thought it was dead :D.
The real thing that killed the division was Oracle announcing that they would no longer support IA-64. It just so happened that something like 90% of the clients using Itanium were using it for Oracle DBs.
But by that point HP was already trying to get people to transition to more traditional x86 servers that they were selling.
Nitpick: The author states that removal of 16-bit in Windows 64 was a design decision and not a technical one. That’s not quite true.
When AMD64 is in one of the 64-bit modes, long mode (true 64-bit) or compatibility mode (64-bit with 32-bit compatibility), you cannot execute 16-bit code. There are tricks to make it happen, but they all require switching the CPU mode, which is insecure and can cause problems in complex execution environments (such as an OS).
If Microsoft (or Linux, Apple, etc) wanted to support 16-bit code in their 64-bit OSes, they would have had to create an emulator+VM (such as OTVDM/WineVDM) or make costly hacks to the OS.
Microsoft has just such an emulator. Via Windows source code leaks the NTVDM (Virtual DOS Machine) from 32-bit Windows versions has been built for 64-bit Windows targets[0].
I don't understand why Microsoft chose to kill it. That's not in their character re: backwards compatibility.
[0] https://github.com/leecher1337/ntvdmx64
Edit: Some nice discussion about the NTVDMx64 when it was released: https://www.vogons.org/viewtopic.php?t=48443
NTVDM requires Virtual 8086 mode in the processor. This doesn't exist in the 64-bit modes, requiring a software emulator. That is why OTVDM/WineVDM exist.
You can see all of this explained in the README for the very project you linked:
```
How does it work?
=================
I never thought that it would be possible at all, as NTVDM on Win32 uses V86 mode of the CPU for fast code execution which isn't available in x64 long mode. However I stumbled upon the leaked Windows NT 4 sourcecode and the guys from OpenNT not only released the source but also patched it and included all required build tools so that it can be compiled without installing anything but their installation package. The code was a pure goldmine and I was curious how the NTVDM works.
It seems that Microsoft bought the SoftPC solution from Insignia, a company that specialised in DOS-Emulators for UNIX-Systems. I found out that it also existed on MIPS, PPC and ALPHA Builds of Windows NT 4 which obviously don't have a V86 mode available like Intel x86 has. It turned out that Insignia shipped SoftPC with a complete emulated C-CPU which also got used by Microsoft for MIPS, PPC and ALPHA-Builds.
```
As for why they didn't continue with that solution: they didn't want to rely on SoftPC anymore, or to take on development themselves, for a minuscule portion of users who would probably just use 32-bit Windows anyway.
Yeah. Like I said, Microsoft had the emulator. NTVDM on x64 is handled just like MIPS or Alpha, by using the SoftPC emulator. It's just a new CPU architecture.
They had a proven and tested emulator yet they chose not to build it for the new x64 CPU architecture. It turns out that it wasn't too hard to build for the new architecture either. That's the crux of my confusion.
It's not like SoftPC was new and unproven code. It doesn't feel like it would have been a major endeavor to keep supporting it.
Obviously, I don't know what Microsoft's telemetry told them about the number of 16-bit application users. I know it impacted a number of my Customers (some of whom are running DOSBox today to keep old fit-for-purpose software working), and I don't support a ton of offices or people.
It seems out of character for Microsoft to make their Customers throw away software.
It's not so much running 16 bit code, but running something that wants to run on bare metal, i.e. DOS programs that access hardware directly. Maintaining the DOS virtualization box well into the 21st century probably wasn't worth it.
> The 64-bit builds of Windows weren’t available immediately.
There was a year or so between the release of AMD-64 and the first shipping Microsoft OS that supported it.[1] It was rumored that Intel didn't want Microsoft to support AMD-64 until Intel had compatible hardware. Anyone know? Meanwhile, Linux for AMD-64 was shipping, which meant Linux was getting more market share in data centers.[1]
I've written code to call 16-bit code from 64-bit code that works on Linux (because that's the only OS where I know the syscall to modify the LDT).
It's actually no harder to call 16-bit code from 64-bit code than it is to call 32-bit code from 64-bit code... you just need to do a far return (the reverse direction is harder because of stack alignment issues). The main difference between 32-bit and 16-bit is that OSes support 32-bit code by having a GDT entry for 32-bit code, whereas you have to go and support an LDT to do 16-bit code, and from what I can tell, Windows decided to drop support for LDTs with the move to 64-bit.
The other difficulty (if I've got my details correct) is that returning from an interrupt into 16-bit code is extremely difficult to do correctly and atomically, in a way that isn't a problem for 32-bit or 64-bit code.
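For readers who haven't poked at segment descriptors: under the standard layout in the Intel and AMD manuals, a 16-bit code segment differs from a 32-bit or 64-bit one mainly in two flag bits, D/B and L. The sketch below only packs the 8-byte descriptor value; the function name `encode_descriptor` is made up for illustration, and actually installing the result in an LDT needs an OS interface such as Linux's `modify_ldt(2)`, which this code does not touch. The access byte 0xFA used in the examples is an assumption meaning present, DPL 3, execute/read code.

```python
def encode_descriptor(base: int, limit: int, access: int, *,
                      granularity: bool, default_32: bool, long_mode: bool) -> int:
    """Pack an x86 segment descriptor into a 64-bit integer (hypothetical helper).

    Bit ranges of the returned value:
      0-15  limit[15:0]     16-39 base[23:0]
      40-47 access byte     48-51 limit[19:16]
      52    AVL (left 0)    53    L   (64-bit code segment)
      54    D/B (32-bit)    55    G   (limit counted in 4 KiB pages)
      56-63 base[31:24]
    """
    if long_mode and default_32:
        raise ValueError("L and D/B must not both be set for a code segment")
    desc = limit & 0xFFFF
    desc |= (base & 0xFFFFFF) << 16
    desc |= (access & 0xFF) << 40
    desc |= ((limit >> 16) & 0xF) << 48
    desc |= int(long_mode) << 53
    desc |= int(default_32) << 54
    desc |= int(granularity) << 55
    desc |= ((base >> 24) & 0xFF) << 56
    return desc


# 16-bit code: D/B = 0 and L = 0, byte-granular 64 KiB limit.
SIXTEEN_BIT_CS = encode_descriptor(0, 0xFFFF, 0xFA, granularity=False,
                                   default_32=False, long_mode=False)
# Flat 32-bit code: D/B = 1, 4 GiB limit via page granularity.
FLAT_32_CS = encode_descriptor(0, 0xFFFFF, 0xFA, granularity=True,
                               default_32=True, long_mode=False)
# 64-bit code: L = 1 (D/B must be 0).
LONG_64_CS = encode_descriptor(0, 0xFFFFF, 0xFA, granularity=True,
                               default_32=False, long_mode=True)

for name, d in [("16-bit", SIXTEEN_BIT_CS), ("32-bit", FLAT_32_CS),
                ("64-bit", LONG_64_CS)]:
    print(f"{name}: {d:#018x}")
```

The three values differ only in byte 6 (flags plus the high limit nibble) and, for the 16-bit case, in the limit; everything else, including the access byte, is identical.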
Executing 16-bit code in Compatibility Mode (not Long Mode) is possible, that's not the problem. The problem is lack of V86 allowing legacy code to run. So Real Mode code is out wholesale (a sizable chunk of legacy software) and segmented memory is out in Protected Mode (nearly the totality of remaining 16-bit code).
So yes, you can write/run 16-bit code in 64-bit Compatibility Mode. You can't execute existing 16-bit software in 64-bit Compatibility Mode. The former is a neat trick, the latter is what people actually expect "16-bit compatibility" to mean.
> segmented memory is out in Protected Mode (nearly the totality of remaining 16-bit code).
No, segmented memory is exactly what you can get working. You set up the segments via the LDT, which is still supported even in 64-bit mode; this is how Wine is able to execute Win16 code on 64-bit Linux. (Reading Wine code is how I figured out how to execute 16-bit code from 64-bit code in the first place!)
What doesn't work, if my memory serves me correctly, is all the call gate and task gate stuff. Which is effectively building blocks for an OS kernel that everyone tossed out in the early 90s and instead went with kernel-mode and user-mode with the syscalls (first software interrupts and then the actual syscall instruction in x86-64). You don't need any of that stuff to run most 16-bit code, you just need to emulate the standard Windows DLLs like kernel, ntdll, and user.
A couple of details missing from the article:
- Intel quietly introduced their implementation of amd64 under the name "EM64T". It was only later that they used the name "Intel64".
- Early Itanium processors included hardware features, microcode and software that implemented an IA‑32 Execution Layer (dynamic binary translation plus microcode assists) to run 32‑bit x86 code; while the EL often ran faster than direct software emulation, it typically lagged native x86 performance and could be worse than highly‑optimised emulators for some workloads or early processor steppings.
You go to a small shop recommended by a friend, and he convinces you to get AMD despite Intel still being the reigning default. You get it home, and after a little research you realize the CPU offers the best performance per price among recent CPUs. Now you know you trusted the right person.
About this part:
> In 2004, Intel wrote off the Itanium and cloned AMD64.
AMD introduced x86-64 in 2003. You don't just clone an ISA (even if based on AMD documents), design it, fab it etc. in a year or two. Intel must have been working on this well before AMD introduced the Athlon64.
The ISA was published in 2000, there was plenty of time to start working on an implementation before AMD shipped actual product.
I was one of those weird users who used the 64-bit version of Windows XP, with what I'm pretty sure was an Athlon 64 X2, both the first 64-bit chip and first dual-core one that I had.
XP64 shared a lot with Windows Server 2003. Perhaps the best Windows ever released.
I remember my Athlon 64 machine.
The last one to run Windows XP.
Core memories for me were my pc builds for the Athlon Thunderbird and later the Athlon 64 FX-60. What an experience it was to fire those machines up and feel the absolutely gigantic performance improvements.
I had a Soltek socket 754 build with chrome OCZ memory and a 9800 pro that was flashed to XT. I loved that the motherboard was black/purple.
Makes me want to play Need for Speed: Underground and drink some Bawls energy drink.
How AMD turned the tables on Intel? It always felt more like a tale of how Intel turned their back on x86.
At least with Itanium Intel was trying something fresh. In comparison, the Pentium 4 arch was extra bad because it had a very long pipeline to achieve high core frequencies. Branch mispredictions were thus very costly. And it soon became obvious that the process wouldn't scale much above 3 GHz without wasting humongous amounts of power, defeating the long pipeline's purpose.
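A back-of-the-envelope way to see why the deep pipeline hurt: if every mispredicted branch throws away roughly a pipeline's worth of cycles, the average cycles per instruction (CPI) grows with pipeline depth. The numbers below (20% branches, 5% mispredicted, 10 vs 30 cycle penalties, 2 vs 3 GHz) are illustrative assumptions, not measurements of any real P6 or NetBurst part, and the model ignores everything except branch penalties.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    # Each mispredicted branch adds penalty_cycles of wasted work on average.
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def instructions_per_ns(clock_ghz, cpi):
    # clock_ghz is cycles per nanosecond; divide by cycles per instruction.
    return clock_ghz / cpi


short_cpi = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=10)  # shallower pipeline
long_cpi = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=30)   # much deeper pipeline

short_perf = instructions_per_ns(2.0, short_cpi)  # lower clock
long_perf = instructions_per_ns(3.0, long_cpi)    # 50% higher clock

print(f"CPI: {short_cpi:.2f} vs {long_cpi:.2f}")
print(f"speedup from 50% more clock: {long_perf / short_perf:.2f}x")
```

In this toy model the 50% clock advantage shrinks to roughly a 27% real speedup, and every extra cycle of misprediction penalty or extra point of mispredict rate eats further into it.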
How many feet does Intel actually have? It seems as if they have shot themselves in 4 or 5 - is it any wonder they can hardly walk?
They have also made a lot of successful products and comebacks. While the Pentium 4 lost out to the Athlons and their market share dropped, they then released the Core series of CPUs, and the Core 2 Duo was a huge hit that marked the beginning of the dark ages for AMD until they released Ryzen.
As a company they have had long periods of dominance dotted with big losses to AMD on the CPU front, which they always claw back. This time they seem to be taken out by their inability to get their fabs online more than by anything their competitor is doing.
Chiplets were a great move that kept yields up on aggressive process shrinks and prices low.
AMD was beating them on performance before Athlon, and Athlon 64 simply made it clear to everybody.
Intel spent literally 8 years and many, many billions of dollars doing everything possible to prevent AMD from getting volume.
They had so much production capacity and AMD so little that they basically had the ability to pay every single large OEM not to use AMD. If you as a company used AMD, you would instantly lose billions of dollars, you would be the last Intel customer served, you wouldn't get the new chips early, and potentially much more. OEMs were terrified of Intel. Because Intel and Microsoft were so dominant, OEMs made terrible margins, and Intel could basically crush them. Intel used to joke that OEMs were nothing more than their distributors.
This was to the point where AMD offered free chips to people and they refused it.
AMD had a long period of time where they had the better product, but they couldn't sustain investing in better products while fighting so many legal battles. And the regulators around the world took too long and were too soft on Intel.
Intel in the 80s invested big in memory and got crushed by Japan. They invested big in the iAPX 432 and got crushed; it was a horrible product. Luckily they were saved by the PC and were then able to have exclusivity on the back of the i386.
By the late 90s AMD was better than them, and that persisted for almost 10 years. Then Intel took the lead for about 8 years and then lost it. And I don't think they lost it because of the fabs; when they lost on the fabs, they just fell further behind.
It's really the gigantic PC boom of the late 80s and 90s that gave them the crazy manufacturing and market lead that AMD was not able to overcome in the 10 years after that.
When you have 90% market share you can afford to make a lot of mistakes.
Good article. I remember being very skeptical of Athlon because the K6 I owned before was subjectively much less stable than any Intel I had used until then. So I felt it was only a question of time until IA-64 would establish itself, since, after all, Intel had the power to buy itself into a leadership position. That feeling that AMD isn't quite as stable never really left until a few years ago, when, with Spectre, I came to think that Intel was now playing catch-up with mobile-phone-like tactics rather than being design-superior.
Now again, Intel had a great opportunity with Xe but it feels like they just can't get their horsepower transferred onto the road. Not bad by any means, but something's just lacking.
Meanwhile, Qualcomm is announcing its Snapdragon X2... if only they could bring themselves to ensure proper Linux support...
Youngsters today don't remember it; x86 was fucking dead according to the press; it really wasn't until Athlon 64 came out (which gave a huge bump to Linux as it was one of the first OSes to fully support it - one of the reasons I went to Gentoo early on was to get that sweet 64 bit compilation!) that everyone started to admit the Itanium was a turd.
The key to the whole thing was that it was a great 32 bit processor; the 64 bit stuff was gravy for many, later.
Apple did something similar with its CPU changes - now three - they only swap when the old software runs better on the new chip even if emulated than it did on the old.
AMD64 was also well thought out; it wasn't just a simple "have two more bytes" slapped on 32 bit. Doubling the number of general purpose registers was noticeable - you took a performance hit going to 64 bit early on because all the memory addresses were wider, but the extra registers usually more than made up for it.
This is also where the NX bit entered.
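To put a number on the "memory addresses were wider" cost mentioned above: a doubly linked list node holding two pointers and a 32-bit key doubles in size going from a 32-bit (ILP32) to a 64-bit (LP64) ABI. The helper below is a simplified sketch, not a real ABI implementation: it assumes natural alignment (each field aligned to its own size), which is how the common ILP32 and LP64 ABIs treat pointers and 32-bit ints, and it ignores bitfields, packed structs and other corner cases.

```python
def struct_size(field_sizes):
    """Size of a C-like struct with fields laid out in order under natural
    alignment: each field is aligned to its own size, and the total is padded
    to a multiple of the largest field alignment."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


# struct node { struct node *next; struct node *prev; int32_t key; };
node_ilp32 = struct_size([4, 4, 4])  # 4-byte pointers
node_lp64 = struct_size([8, 8, 4])   # 8-byte pointers, plus 4 bytes tail padding

print(node_ilp32, node_lp64)
```

Pointer-heavy data structures pay this cost in cache and memory bandwidth, which is the early 64-bit performance hit the comment describes; the extra registers help on the other side of the ledger by cutting spills to the stack.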
100% -- the conventional wisdom was that the x86 architecture was too riddled with legacy and complexity to improve its performance, and was a dead end.
Itanium never met an exotic computer architecture journal article that it didn't try and incorporate. Initially this was viewed as "wow such amazing VLIW magic will obviously dominate" and subsequently as "this complexity makes it hard to write a good compiler for, and the performance benefit just doesn't justify it."
Intel had to respond to AMD with their "x86-64" copy, though it really didn't want to.
Eventually it became obvious that the amd64/x64/x86-64 chips were going to exceed Itanium in performance, and with the massive momentum of legacy on their side, Itanium was toast.
Back in that era I went to an EE380 talk at Stanford where the people from HP trying to do a compiler for Itanium spoke. The project wasn't going well at all. Itanium is an explicit-parallelism superscalar machine: the compiler has to figure out which operations to do in parallel, whereas most superscalar machines do that during execution. Instruction ordering and packing turned out to be a hard numerical optimization problem. The compiler developers sounded very discouraged.
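A toy model of the packing problem described above, where the compiler rather than the hardware decides which operations issue together. The sketch uses greedy list scheduling, a textbook heuristic; it is not HP's or Intel's actual algorithm, and real IA-64 code generation also had to deal with bundle templates, stop bits, latencies, predication and speculation, all ignored here. The operation names and the `width` parameter are hypothetical.

```python
def pack_bundles(ops, deps, width):
    """Greedy list scheduling into issue groups of at most `width` ops.

    ops:   operation names in program order
    deps:  dict mapping an op to the ops whose results it reads
    width: issue slots per group (a hypothetical machine parameter)

    An op may join a group only if everything it depends on was issued in an
    earlier group, so ops inside one group are mutually independent.
    """
    done = set()
    groups = []
    remaining = list(ops)
    while remaining:
        group = []
        for op in remaining:
            if len(group) == width:
                break
            if all(d in done for d in deps.get(op, ())):
                group.append(op)
        if not group:
            raise ValueError("dependency cycle or unknown dependency")
        done.update(group)
        remaining = [op for op in remaining if op not in group]
        groups.append(group)
    return groups


# a = load x; b = load y; c = a + b; d = load z; e = c * d
ops = ["a", "b", "c", "d", "e"]
deps = {"c": ["a", "b"], "e": ["c", "d"]}

print(pack_bundles(ops, deps, width=2))
print(pack_bundles(ops, deps, width=3))
```

Both widths end up needing three groups here, but the groups themselves differ, so a schedule fixed at compile time for one issue width is not automatically the right one for a chip with a different number of execution units.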
It's amazing that retirement units, the part of a superscalar CPU that puts everything back together as the parallel operations finish, not only work but don't slow things down. The Pentium Pro head designer had about 3,000 engineers working at peak, which indicates how hard this is. But it all worked, and that became the architecture of the future.
This was around the time that RISC was a big thing. Simplify the CPU, let the compiler do the heavy lifting, have lots of registers, make all instructions the same size, and do one instruction per clock. That's pure RISC. Sun's SPARC is an expression of that approach. (So is a CRAY-1, which is a large but simple supercomputer with 64 of everything.) RISC, or something like it, seemed the way to go faster. Hence Itanium. Plus, it had lots of new patented technology, so Intel could finally avoid being cloned.
Superscalars can get more than one instruction per clock, at the cost of insane CPU complexity. Superscalar RISC machines are possible, but they lose the simplicity of RISC. Making all instructions the same size increases the memory bandwidth the CPU needs. That's where RISC lost out over x86 extensions. x86 is a terse notation.
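A small illustration of the density point, assuming a classic fixed-width RISC encoding of 4 bytes per instruction and, for simplicity, that the RISC version needs the same number of instructions (a real RISC prologue would look different). The x86-64 byte encodings below are real, but the snippet is cherry-picked and x86 instructions can also run up to 15 bytes, so treat it as an illustration rather than a benchmark.

```python
# Real x86-64 encodings for a tiny function prologue/body/epilogue.
X86_64 = {
    "push rbp":     bytes([0x55]),
    "mov rbp, rsp": bytes([0x48, 0x89, 0xE5]),  # REX.W + MOV r/m64, r64
    "xor eax, eax": bytes([0x31, 0xC0]),
    "pop rbp":      bytes([0x5D]),
    "ret":          bytes([0xC3]),
}
FIXED_WIDTH = 4  # assumption: bytes per instruction on a classic fixed-width RISC


def code_bytes(encodings):
    # Total instruction-stream size in bytes.
    return sum(len(e) for e in encodings.values())


x86_size = code_bytes(X86_64)
risc_size = FIXED_WIDTH * len(X86_64)

print(x86_size, risc_size)
```

Later RISC designs attacked exactly this cost with compressed encodings such as Arm Thumb and the RISC-V C extension.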
So we ended up with most of the world still running on an instruction set based on the one Harry Pyle designed when he was an undergrad at Case in 1969.
If I am remembering correctly, this was also a good time to be in Linux. Since the Linux world operated on source code rather than binary blobs, it was easier to convert software to run 64-bit native. Non-trivial in an age of C, but still much easier than the commercial world. I had a much more native 64-bit system running a couple of years before it was practical in the Windows world.
Linux for Alpha probably deserves some credit for getting everything 64-bit-ready years before x86-64 came out.
It also helps that linux had a much better 32-bit compatibility than windows did. Not sure why but it probably has something to do with legacy support windows shed moving to 64-bits.
Up until Athlon your best bet for a 64 bit system was a DEC Alpha running RedHat. Amazing levels of performance for a manageable amount of money.
Itanium wasn’t a turd. It was just not compatible with x86. And that was enough to sink it.
It absolutely was. It was possible, hypothetically, to write a chunk of code that ran very fast. There were any number of very small bits of high-profile code which did this. However, it was impossible to make general-purpose, not-manually-tuned code run fast on it. Itanium placed demands on compiler technology that simply didn't exist, and probably still don't.
Basically, you could write some tuned assembly that would run fast on one specific Itanium CPU release by optimizing for its exact number of execution units, etc. It was not possible to run `./configure && make && make install` for anything not designed with that level of care and end up with a binary that didn't run like frozen molasses.
I had to manage one of these pigs in a build farm. On paper, it should've been one of the more powerful servers we owned. In practice, the Athlon servers were several times faster at any general purpose workloads.
Itanium was compatible with x86. In fact, it booted into x86 mode. Merced, the first implementation, had a part of the chip called the IVE, Intel Value Engine, that implemented x86 very slowly.
You would boot in x86 mode and run some code to switch to ia64 mode.
HP saw the end of the road for their solo efforts on PA-RISC and Intel eyed the higher end market against SPARC, MIPS, POWER, and Alpha (hehe. all those caps) so they banded together to tackle the higher end.
But as AMD proved, you could win by scaling up instead of dropping an all-new architecture.
* worked at HP during the HP-Intel Highly Confidential project.
I used it for numerical simulations and it was very fast there. But on my workstation many common programs like "grep" were slower than on my cheap Athlon machine. (Both were running Red Hat Linux at the time.) I don't know how much of that was a compiler problem and how much was an architecture problem; the Itanium numerical simulation code was built with Intel's own compiler but all the system utilities were built with GNU compilers.
>Itanium wasn’t a turd
It required immense multi-year efforts from compiler teams to get passable performance with Itanium. And passable wasn't good enough.
The IA-64 architecture had too much granularity of control dropped into software. Thus, reliable compiler designs were much more difficult to build.
It wasn't a bad chip, but like Cell or modern Dojo tiles most people couldn't run it without understanding parallelism and core metastability.
amd64 wasn't initially perfect either, but was accessible for mere mortals. =3
Wasn't the only compiler that produced code worth anything for Itanium the paid one from Intel? I seem to recall complaining about it on the GCC lists.
NOTHING produced good code for the original Itanium which is why they switched gears REALLY early on.
Intel first publicly mentioned Poulson all the way back in 2005 just FOUR years after the original chip was launched. Poulson was basically a traditional out-of-order CPU core that even had hyperthreading[0]. They knew really early on that the designs just weren't that good. This shouldn't have been a surprise to Intel as they'd already made a VLIW CPU in the 90s (i860) that failed spectacularly.
[0]https://www.realworldtech.com/poulson/
Even the i860 found more usage as a specialized CPU than the Itanium. The original NeXTcube had an optional video card that used an i860 dedicated to graphics.
I lost track of it but HP, as co-architects, had its own compiler team working on it. I think SGI also had efforts to target ia64 as well. But the EPIC (Explicitly Parallel Instruction Computing) didn't really catch on. VLIW would need recompilation on each new chip but EPIC promised it would still run.
https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...
In the compiler world, these HP compiler folks are leading compiler teams/orgs at ~all the tech companies now, while almost none of the Intel compiler people seem to be around.
I have worked next to an Itanium machine. It sounds like a helicopter - barely able to meet the performance requirements.
We have come a long way from that to arm64 and amd64 as the default.
The stripped down ARM 8/9 for AArch64 is good for a lot of use-cases, but most of the vendor specific ASIC advanced features were never enabled for reliability reasons.
ARM is certainly better than before, but could have been much better. =3
The Itanium had some interesting ideas executed poorly. It was a bloated design by committee.
It should have been iterated on a bit before it was released to the world, but Intel was stressed by there being several 64-bit RISC-processors on the market already.
IIRC it didn't even do great against POWER and other bespoke OS/Chip combos, though it did way better there than generic x86.
I acquired a copy of the Itanium manuals, and in flicking through it, you can barely get through a page before going "you did WHAT?" over some feature.
Itanium was mostly a turd because it pushed so many optimization issues onto the compiler.
IIRC, wasn't part of the issue that compile-time instruction scheduling was a poor match with speculative execution and/or hardware-based branch prediction?
I.e., the compiler had no access to information that's only revealed at runtime?
Itanium was pointless when Alpha existed already and was already getting market penetration in the high end market. Intel played disgusting corporate politics to kill it and then push the ugly failed Itanium to market, only to have to panic back to x86_64 later.
I have no idea how/why Intel got a second life after that, but they did. Which is a shame. A sane market would have punished them and we all would have moved on.
> I have no idea how/why Intel got a second life after that, but they did.
For the same reason the line "No one ever got fired for buying IBM." exists. Buying AMD at large companies was seen as a gamble that deciders weren't willing to make. Even now, if you just call up your account managers at Dell, HP, or Lenovo asking for servers or PCs, they are going to quote you Intel builds unless you specifically ask. I don't think I've ever been asked by my sales reps if I wanted an Intel or AMD CPU. Just how many slots/cores, etc.
The Intel chipsets were phenomenally stable; the AMD ones were always plagued by weird issues.
Historically, when Intel is on their game, they have great products, and better than most support for OEMs and integrators. They're also very effective at marketing and arm twisting.
The arm twisting gets them through rough times like itanium and pentium4 + rambus, etc. I still think they can recover from the 10nm fab problems, even though they're taking their sweet time.
Gordon Moore tried to link up with Intel when he was at DEC. Alpha would have become Intel's 64-bit architecture. This of course didn't happen, and Intel instead linked up with DEC's biggest competitor, HP, and adopted their much, much worse VLIW architecture.
Imagine a future where Intel and Apple both adopt DEC and Alpha instead of Intel HP and Apple IBM.
“Sane market” sounds like an oxymoron, technology markets have multiple failed attempts at doing the sane thing.
Intel and trying to kill their most successful product name a better duo.
When amd64 came out, Sun should have started to migrate out of SPARC.
Ironically, it is Itanium that killed off most of the RISC competition, but it's the Athlon that actually delivered the killing blow.
Will there be a 128-bit revolution coming soon?
Not until we have technology for exabyte-scale memories (read: not any time soon).
Having a 128-bit address bus is useless, no question there!
But what about pointer provenance, tagging and capabilities? Having more bits would be useful to implement something like CHERI.
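For scale: a 64-bit address space is already 2^64 bytes = 16 EiB, which is why today's CPUs leave the upper bits of a pointer unused and why those spare bits get reused for tags. The sketch below assumes 48-bit canonical user-space addresses (common on x86-64 with 4-level paging) and packs a 16-bit tag into the top of the word; the helper names are hypothetical. Real hardware support differs (Arm's Top Byte Ignore uses only the top 8 bits, for example), and CHERI takes the other route the comment hints at, using 128-bit capabilities plus a hidden validity tag, since address, bounds and permissions do not fit in 64 bits.

```python
ADDRESS_BITS = 48                     # assumption: 48-bit canonical user addresses
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
TAG_BITS = 64 - ADDRESS_BITS          # 16 spare bits in a 64-bit word


def tag_pointer(addr: int, tag: int) -> int:
    """Store `tag` in the unused high bits of a user-space address.

    The result is NOT a valid pointer on hardware that checks canonical
    form; the tag must be stripped (see untag_pointer) before dereferencing.
    """
    if addr & ~ADDRESS_MASK:
        raise ValueError("address does not fit in 48 bits")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 16 bits")
    return (tag << ADDRESS_BITS) | addr


def untag_pointer(word: int) -> tuple[int, int]:
    """Split a tagged 64-bit word back into (address, tag)."""
    return word & ADDRESS_MASK, word >> ADDRESS_BITS


word = tag_pointer(0x7FFF12345678, 0xBEEF)
print(hex(word))
print(untag_pointer(word))
print(2**64 // 2**60, "EiB addressable with 64 bits")
```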
Why would there be?