Binaries

(fzakaria.com)

24 points | by todsacerdoti 3 hours ago

11 comments

  • stncls a few seconds ago

    > The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute JMP.

    Makes sense, but in the assembly output just after, there is not a single JMP instruction. Instead, CALL <immediate> is replaced with putting the address in a 64-bit register, then CALL <register>, which makes even more sense. But why mention the JMP thing then? Is it a mistake or am I missing something? (I know some calls are replaced by JMP, but that's done regardless of -mcmodel=large)
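
    For reference, this is roughly the codegen difference being described (a minimal sketch for x86-64 gcc/clang, Intel syntax from memory; the exact register choice is compiler-dependent):

      // small_vs_large.cpp -- illustrative only
      void callee();

      void caller() {
          callee();
          // Default (-mcmodel=small): a single rel32 call, so callee must
          // live within +/-2 GiB of the call site:
          //     call    callee
          //
          // With -mcmodel=large: the absolute 64-bit address is loaded into
          // a register first, then called indirectly (no JMP involved):
          //     movabs  rax, OFFSET callee
          //     call    rax
      }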

  • gerikson 2 hours ago

    The HN de-sensationalize algo for submission titles needs tweaking. Original title is simply "Huge Binaries".

  • yjftsjthsd-h 38 minutes ago

    > I had observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.

    I am very sympathetic to wanting nice static binaries that can be shipped around as a single artifact[0], but... surely at some point we have to ask if it's worth it? If nothing else, that feels like a little bit of a code smell; surely if your actual executable code doesn't even fit in 2GB it's time to ask if that's really one binary's worth of code or if you're actually staring at like... a dozen applications that deserve to be separate? Or get over it the other way and accept that sometimes the single artifact you ship is a tarball / OCI image / EROFS image for systemd[1] to mount+run / self-extracting archive[2] / ...

    [0] Seriously, one of my background projects right now is trying to figure out if it's really that hard to make fat ELF binaries.

    [1] https://systemd.io/PORTABLE_SERVICES/

    [2] https://justine.lol/ape.html > "PKZIP Executables Make Pretty Good Containers"

    • jmmv 10 minutes ago

      This is something that always bothered me while I was working at Google too: we had an amazing compute and storage infrastructure that kept getting crazier and crazier over the years (in terms of performance, scalability and redundancy) but everything in operations felt slow because of the massive size of binaries. Running a command line binary? Slow. Building a binary for deployment? Slow. Deploying a binary? Slow.

      The answer to ever-increasing binary sizes was always "let's make the infrastructure scale up!" instead of "let's... not do this crazy thing maybe?". By the time I left, there were some new initiatives towards the latter and a feeling that "maybe we should have put limits much earlier", but retrofitting limits into the existing bloat was going to be exceedingly difficult.

  • doubletwoyou an hour ago

    25 GiB for a single binary sounds horrifying

    at some point surely some dynamic linking is warranted

    • nneonneo an hour ago

      To be fair, this is with debug symbols. Debug builds of Chrome were in the 5GB range several years ago; no doubt that’s increased since then. I can remember my poor laptop literally running out of RAM during the linking phase due to the sheer size of the object files being linked.

      Why are debug symbols so big? For C++, they’ll include detailed type information for every instantiation of every type everywhere in your program, including the types of every field (recursively), method signatures, etc. etc., along with the types and locations of local variables in every method (updated on every spill and move), line number data, etc. etc. for every specialization of every function. This produces a lot of data even for “moderate”-sized projects.

      Worse: for C++, you don’t win much through dynamic linking because dynamically linking C++ libraries sucks so hard. Templates defined in header files can’t easily be put in shared libraries; ABI variations mean that dynamic libraries generally have to be updated in sync; and duplication across modules is bound to happen (thanks to inlined functions and templates). A single “stuck” or outdated .so might completely break a deployment too, which is a much worse situation than deploying a single binary (either you get a new version or an old one, not a broken service).
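
      To make the first point concrete, here's a toy sketch (hypothetical file and tool invocation; the details are from memory) of how quickly the instantiations pile up:

        // dwarf_blowup.cpp -- toy example. Every distinct instantiation below
        // gets its own complete DWARF type description when built with -g:
        // every field, every method signature, recursively.
        #include <map>
        #include <string>
        #include <vector>

        std::vector<int>                        a;
        std::vector<std::string>                b;
        std::map<std::string, std::vector<int>> c;

        int main() {
            return static_cast<int>(a.size() + b.size() + c.size());
        }

        // Compare the stripped vs. unstripped binary, or run something like
        //   llvm-dwarfdump --statistics ./a.out
        // to see how much of the file is type and line-number information.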

      • tempay an hour ago

        I’ve seen LLVM-dependent builds hit well over 30GB. At that point it started breaking several package managers.

      • yjftsjthsd-h an hour ago

        Can't debug symbols be shipped as separate files?

      • 01HNNWZ0MV43FF an hour ago

        I've hit the same thing in Rust, probably for the same reasons.

        Isn't the simple solution to use detached debug files?

        I think Windows and Linux both support them. That's how Android and iOS phones get useful crash reports out of small binaries: they just upload the stack trace, and a service like Sentry translates it back into source line numbers. (It's easy to do manually too.)

        I'm surprised the author didn't mention it first. A 25 GB exe might be 1 GB of code and 24 GB of debug crud.
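
        On Linux, the workflow I have in mind is roughly the GNU binutils split-debug dance. A sketch (file names made up; assumes an ELF toolchain):

          // app.cpp -- trivial program; the interesting part is the workflow
          // in the comments below.
          //
          //   g++ -g -o app app.cpp
          //   objcopy --only-keep-debug app app.debug    # keep full symbols aside
          //   strip --strip-debug --strip-unneeded app   # ship this small binary
          //   objcopy --add-gnu-debuglink=app.debug app  # record where symbols live
          //
          // gdb and crash symbolizers then pull in app.debug on demand, so the
          // deployed binary stays small while complete debug info is archived.
          int main() { return 0; }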

    • 0xbadcafebee an hour ago

      To be fair, they worked at Google; their engineering decisions are not normal. They might just decide that 25 GiB binaries are worth a 0.25% speedup at start time, potentially resulting in tens of millions of dollars' worth of difference. Nobody should do things the way Google does, but it's interesting to think about.

  • a_t48 31 minutes ago

    I've seen terrible, terrible binary sizes with Eigen + debug symbols, due to how Eigen lazy evaluation works (I think). Every math expression ends up as a new template instantiation.
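
    Roughly what that looks like (a toy sketch, assuming Eigen 3 is available; the type names in the comment are approximate):

      // eigen_expr.cpp -- illustrative only
      #include <Eigen/Dense>

      int main() {
          Eigen::MatrixXd a(100, 100), b(100, 100), c(100, 100);
          a.setRandom(); b.setRandom(); c.setRandom();

          // Lazy evaluation: this is not a MatrixXd but a nested
          // expression-template type, roughly
          //   CwiseBinaryOp<scalar_sum_op<...>, MatrixXd,
          //                 Product<MatrixXd, MatrixXd>>
          // Every distinct expression shape is a fresh instantiation, and with
          // -g each one drags its full recursive type description into the
          // debug info.
          auto expr = a + b * c;

          Eigen::MatrixXd result = expr;  // evaluation actually happens here
          return result.size() > 0 ? 0 : 1;
      }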