SDS: Simple Dynamic Strings library for C

(github.com)

107 points | by klaussilveira 3 days ago

38 comments

  • xenotux 9 hours ago

    I'm surprised this is aliased to char*, not const char*. The benefit of the aliasing is convenience, but the main risk is absent-mindedly passing it to a libc function that modifies the string without updating the SDS metadata. Const would result in a compiler warning while letting the intended use cases (e.g., the printf example) work fine.

    • mid-kid 6 hours ago

      The only thing the SDS metadata holds is the string's length. Just like how you'd have to realloc() a regular string before using strcat(), you have to sdsgrowzero() an sds string before using strcat(). Basically, standard libc functions that tamper with the string have the same constraints as malloc()ed strings in terms of safety, only you might want to call sdsupdatelen() after truncating a string.
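The point about metadata going stale can be sketched with a toy length-prefixed string. This is not the SDS implementation: the `dstr_*` names and the header layout are hypothetical, chosen only to show why a direct libc write needs a follow-up length refresh.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
struct hdr {
    size_t len;   /* bytes in use, excluding the terminating NUL */
    size_t cap;   /* bytes available for characters, excluding the NUL */
};

typedef char *dstr;   /* callers see a plain char * */

/* Step back from the character data to the header in front of it. */
static struct hdr *dstr_hdr(dstr s) {
    return (struct hdr *)(s - sizeof(struct hdr));
}

/* Allocate a string with room for at least cap characters, copied from init. */
static dstr dstr_new(const char *init, size_t cap) {
    size_t len = strlen(init);
    if (cap < len) cap = len;
    struct hdr *h = malloc(sizeof *h + cap + 1);
    if (h == NULL) return NULL;
    h->len = len;
    h->cap = cap;
    dstr s = (char *)(h + 1);
    memcpy(s, init, len + 1);
    return s;
}

/* O(1) length lookup from the header. */
static size_t dstr_len(dstr s) { return dstr_hdr(s)->len; }

/* Resynchronise the cached length after libc wrote to the buffer directly. */
static void dstr_updatelen(dstr s) { dstr_hdr(s)->len = strlen(s); }

/* Free the whole allocation, which starts at the header. */
static void dstr_free(dstr s) { if (s != NULL) free(dstr_hdr(s)); }
```

With a string created as `dstr_new("hello", 16)`, calling `strcat(s, " world")` stays within the reserved capacity, but `dstr_len(s)` keeps reporting 5 until `dstr_updatelen(s)` is called. A `const char *` alias would turn that `strcat` call into a compiler diagnostic.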

  • antirez 8 hours ago

    Hi! The Redis tree contains more advanced versions of this library. Most of the development continued there, eventually.

    • jacquesm 8 hours ago

      It might be worth extracting it back out, it seems pretty useful.

      • antirez 7 hours ago

        Indeed, but in some ways the Redis version is a bit too Redis-ish: memory-saving concerns are taken to the extreme instead of striking a more balanced trade-off with simplicity. In my YouTube channel's C course, I'm showing something similar to SDS in the latest lessons, and I may use SDS again in a later course to show how to integrate back the useful features that diverged. An SDS3 may be a middle ground between the Redis version, fixes for some API errors that should be corrected (but not in Redis: not worth it), and other improvements.

    • underdeserver 8 hours ago

      Hi! Are there no performance penalties that come from alignment issues? Or are your prefix structs aligned to cache line sizes?

    • klaussilveira 7 hours ago

      Thank you for creating sds, btw. Very useful to have it on the toolbelt.

      Oh, and redis. That too. :)

  • aidenn0 3 hours ago

    Was

      s = sdscat(s,"Some more data");

    chosen over

      sdscat(&s, "Some more data");

    for performance reasons, or something else?

    • rwbt 2 hours ago

      I think it makes it obvious that the string 's' will be mutated.
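The two call shapes being compared can be sketched on plain malloc()ed strings. The `buf_*` helpers below are hypothetical and are not part of SDS. Both styles do the same reallocation, so the difference is mostly in how visible the pointer update is at the call site.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Helper: heap-allocated copy of a C string, or NULL on failure. */
static char *buf_new(const char *init) {
    size_t n = strlen(init);
    char *p = malloc(n + 1);
    if (p != NULL) memcpy(p, init, n + 1);
    return p;
}

/* Style 1: return the (possibly moved) pointer; the caller must reassign.
 * Returns NULL on allocation failure, in which case s is still valid. */
static char *buf_cat(char *s, const char *t) {
    size_t n = strlen(s), m = strlen(t);
    char *p = realloc(s, n + m + 1);
    if (p == NULL) return NULL;
    memcpy(p + n, t, m + 1);   /* copy t including its NUL */
    return p;
}

/* Style 2: take the address of the caller's pointer and update it in place.
 * Returns 0 on success, -1 on allocation failure (*sp is left untouched). */
static int buf_cat_inplace(char **sp, const char *t) {
    char *p = buf_cat(*sp, t);
    if (p == NULL) return -1;
    *sp = p;
    return 0;
}
```

In the first style, `s = buf_cat(s, "def");` shows the reassignment on the line itself, and forgetting it leaves a dangling pointer whenever realloc() moves the block. In the second, `buf_cat_inplace(&s, "!")` cannot forget the update, and the return value stays free for an error code.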

  • dang 8 hours ago

    Related:

    Simple Dynamic Strings library for C, compatible with null-terminated strings - https://news.ycombinator.com/item?id=21692400 - Dec 2019 (83 comments)

    Simple Dynamic Strings library for C - https://news.ycombinator.com/item?id=7190664 - Feb 2014 (127 comments)

  • jgarzik 2 hours ago

    enums > defines
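One way to read that preference, as a small sketch with made-up constant names (none are taken from SDS): an enum gives the constants a named type, which many compilers can use to warn about a switch that misses a case, while #define constants are bare integer literals after preprocessing.

```c
#include <assert.h>
#include <string.h>

/* Preprocessor style: plain integer literals once the preprocessor has run. */
#define KIND_SMALL_DEF 0
#define KIND_LARGE_DEF 1

/* Enum style: the constants belong to a named type. */
enum kind { KIND_SMALL, KIND_LARGE };

/* Map an enumerator to a printable name. */
static const char *kind_name(enum kind k) {
    switch (k) {   /* a missing enumerator here can trigger a compiler warning */
    case KIND_SMALL: return "small";
    case KIND_LARGE: return "large";
    }
    return "unknown";
}
```

Both forms produce the same integer values here, so switching from one to the other does not change the generated constants.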

  • zoddie 10 hours ago

    Why not just use C++ strings and string_views? So weird to see this masochistic obsession some people have with doing everything in plain C.

    It's 2026, there are better, more memory safe, more efficient solutions out there.

    • MontyCarloHall 9 hours ago

      To actually answer your question (beyond the snark/appeal to authority replies you’ve already gotten), there are a couple good reasons:

      — You're working in embedded development (but somehow need a full-fledged dynamic string library).

      — While it's true that C++ is (almost) a strict superset of C, and “you don’t pay for what you don’t use” is a good rule of thumb, it can be very hard to restrict a team of developers to eschew all that complexity you dearly pay for and treat C++ as “C with classes and the STL.” Without very strict coding standards (and a means of enforcing them), letting a team of developers use C++ is often opening a Pandora's Box of complex, obscure language features. Restricting a project to plain old C heads that off at the pass.

      • LegionMammal978 9 hours ago

        > You're working in embedded development (but somehow need a full-fledged dynamic string library).

        The situation isn't all that implausible: e.g., many ESP32-based devices want to work with strings to interface with HTTP servers, and they do have C++ support, but the size limit is small enough that you can easily bump your head into it if you aren't careful.

        • mikepurvis 9 hours ago

          Or anything processing JSON— it's nice to be able to get string views directly into the original payload without having to copy them into fixed size buffers elsewhere.
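The "view into the original payload" idea can be sketched in C as a pointer plus a length. The `strview` type and helpers below are hypothetical, and `first_quoted` is deliberately not a JSON parser (escapes are ignored). It only shows that no bytes are copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-owning view: points into someone else's buffer and never frees it. */
struct strview {
    const char *ptr;
    size_t len;
};

/* Return a view of the text between the first pair of double quotes found
 * at or after `from`. Returns {NULL, 0} if no such pair exists. */
static struct strview first_quoted(const char *from) {
    struct strview v = { NULL, 0 };
    const char *open = strchr(from, '"');
    if (open == NULL) return v;
    const char *close = strchr(open + 1, '"');
    if (close == NULL) return v;
    v.ptr = open + 1;
    v.len = (size_t)(close - (open + 1));
    return v;
}

/* Compare a view against a NUL-terminated literal. */
static int strview_eq(struct strview v, const char *lit) {
    return v.ptr != NULL && strlen(lit) == v.len &&
           memcmp(v.ptr, lit, v.len) == 0;
}
```

Because a view is not NUL-terminated, printing it needs an explicit length, for example `printf("%.*s", (int)v.len, v.ptr)`. The view is only valid for as long as the payload buffer it points into.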

      • aidenn0 3 hours ago

        As soon as you move to "C with classes and the STL" you've now also bought into exceptions, as the STL is not even remotely ergonomic with exceptions disabled.

      • duped 8 hours ago

        > and the STL

        Even this has a lot of "payment" for what you don't use. Even some C++ libraries forbid it just because of the size of debug symbols.

    • tbrockman 9 hours ago

      > SDS was a C string I developed in the past for my everyday C programming needs, later it was moved into Redis where it is used extensively and where it was modified in order to be suitable for high performance operations. Now it was extracted from Redis and forked as a stand alone project.

    • Zambyte 9 hours ago

      Suggesting C++ as a solution in the face of "masochist obsession" is... an interesting choice :-D

      • rossant 9 hours ago

        Totally. As a C developer, I really suffer whenever I need to touch anything C++.

    • simonebrunozzi 9 hours ago

      This is the guy that created Redis. I would look at his repos in a different way.

    • derefr 9 hours ago

      If your code is plain C, then anyone can extend it with, or embed it inside, code of literally any other language; and in so doing, they will have full access/exposure to everything in your codebase — all the same stuff that they would if they were writing their host/extension code in C.

      This is not true of C++ (or most other languages):

      • C++ has a runtime (however minimal); and so, by including any C++ code in a codebase, you're making it much more difficult to link/embed the resulting code — you now have to also dynamically link the C++ runtime, and ensure that your host code spins it up "early", before any of the linked C++ code gets to run. (This may even be impossible in some host languages!)

      • Also, even if there was no associated runtime to deal with, C++ isn't wholly C-FFI-clean. All the stuff that people like about C++ — all the reasons you'd want to use C++ — result in codebases that aren't cleanly C-FFI exposable, due to name mangling, functions taking parameters with non-C-exportable types, methods + closures not being C-FFI thunkable [and functions returning those], etc.

      • And even if you bite that bullet, and write your library in C++ but carefully wrap its API to give it C-FFI-clean linkage (usually via a hybrid C / C++ project), this still introduces a layer of FFI runtime overhead. When another non-C language consumes your code, it's then getting double FFI overhead — a call from its code to yours has to convert from its abstractions, to C's abstractions, to C++'s abstractions, and back. (This is why you don't tend to see e.g. non-C++ projects embedding LLVM, or LLVM being extended with non-C++ passes, despite LLVM being designed in this "C wrapper around a C++ core" style.)

      C is one of the only languages with a zero-impedance-mismatch, zero-overhead default or forced binding of external symbols to the C FFI (i.e. the C set of platform ABIs + C symbol naming standard.)

      The others that do this are: C3 (https://c3-lang.org/); Zig, unless you do weird things on purpose, and... that's really it. Everything else has the same two problems as C++ outlined above.

      Even Rust, even Odin, etc. only provide C-FFI linkage as an opt-in feature; and they do nothing to incentivize use of it; and so, of course, due to their useful non-C-FFI-clean features, developers are disincentivized from ever enabling it before they "need" it. So in practice, most libraries in those languages are not consumable from C [or other C-FFI-compatible languages], and most software in those languages is not extendable in C [or another C-FFI-compatible language], without extra effort on the upstream's part to add explicit support for doing that. And most upstreams don't bother.

      Writing software in C itself is essentially a way for a project to "tie itself to the mast" and commit to its ABI always being C-FFI clean, so that it can be consumed not only from C but also from any other language a project might use that supports importing C-FFI libraries. (Which is most languages.)
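A "C-FFI clean" interface, in the sense used above, usually means plain functions over C-compatible types. The sketch below uses a hypothetical `counter` library: only ints and an opaque pointer cross the boundary, and the library frees what it allocates, so callers never depend on its struct layout or its allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- public interface: what a header file would expose ---- */
#ifdef __cplusplus
extern "C" {   /* keeps the symbol names unmangled for C++ callers */
#endif

typedef struct counter counter;   /* opaque: layout hidden from callers */

counter *counter_create(void);            /* returns NULL on failure */
void     counter_add(counter *c, int n);
int      counter_value(const counter *c);
void     counter_destroy(counter *c);     /* frees with the library's allocator */

#ifdef __cplusplus
}
#endif

/* ---- implementation: would live in a separate .c file ---- */
struct counter { int total; };

counter *counter_create(void) { return calloc(1, sizeof(counter)); }
void counter_add(counter *c, int n) { c->total += n; }
int counter_value(const counter *c) { return c->total; }
void counter_destroy(counter *c) { free(c); }
```

A host language with a C FFI only needs to declare these four functions and treat `counter *` as an untyped pointer.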

      • aidenn0 3 hours ago

        C also has a (granted very small on Unix) runtime[1]. On Windows the C runtime is a bit larger, since Windows processes get a single string for their arguments, which must be parsed into argc/argv.

        As far as "you have to also dynamically link the C++ runtime", try calling malloc from two different libc implementations in the same process and see "interesting" things happen. Even more interesting is calling free() on a pointer that was malloc'd from a different C library.

        1: E.g. for musl https://git.musl-libc.org/cgit/musl/tree/crt

      • anitil 5 hours ago

        > C++ has a runtime (however minimal)

        I'm not familiar with this, are you able to explain it? Do you mean something analogous to _start?

      • CyberDildonics 7 hours ago

        > C++ has a runtime (however minimal);

        No it doesn't.

        > Also, even if there was no associated runtime to deal with, C++ isn't wholly C-FFI-clean

        Yes it is, you just extern "C" whatever you want.

        > All the stuff that people like about C++ — all the reasons you'd want to use C++ — result in codebases that aren't cleanly C-FFI exposable

        Not true at all, the biggest two things, destructors and move semantics you still have everywhere except for the boundaries with C.

        > And even if you bite that bullet, and write your library in C++ but carefully wrap its API to give it C-FFI-clean linkage (usually via a hybrid C / C++ project), this still introduces a layer of FFI runtime overhead

        There is no overhead here, it is not different from C.

        I don't know where all this comes from, but I doubt it comes from heavy experience with modern C++.

        • cyber1 4 hours ago

          No, it does. "The only two features in the language that do not follow the zero-overhead principle are runtime type identification and exceptions, and are why most compilers include a switch to turn them off." - https://en.cppreference.com/w/cpp/language/Zero-overhead_pri...

          • CyberDildonics 4 hours ago

            So saying 'it has a runtime' doesn't really make sense; it has a runtime only if you want it, for two features that aren't necessary.

    • bryanlarsen 8 hours ago

      Redis started in 2009, and this library was started there. string_view didn't appear until C++17.

    • uecker 9 hours ago

      I switched to C because I could not stand the pain of using C++ anymore. I find C refreshingly simple.

      (Also, as a comment to other responses: C++ is not a superset of C, it is a fork from 95 with divergent language evolution since then).

    • MintPaw 9 hours ago

      There are real downsides to even #including C++ headers. And there are certainly downsides to introducing a templated string type. It's not hard to imagine why people would want another solution.

    • gkbrk 9 hours ago

      How am I supposed to use C++ strings and string_views in C?

    • hiccuphippo 9 hours ago

      Someone had to create C++ strings so C++ developers could use them. What's wrong with someone doing the same for C so C developers can use it?

    • spookie 9 hours ago

      I guess those stuck on MSVC might have this perception, but newer C standards have added plenty of niceties. I'm unsure whether the claim that C++ is safer is correct.

    • ethin 10 hours ago

      But if you're in C...

    • jimbob45 9 hours ago

      Seems like everyone wants to believe they’re as skilled and hardcore as the kernel devs. In reality, I agree - C++ is basically a superset of C and the whole point of “you don’t pay for what you don’t use” is to be able to avoid ridiculous situations like these.

      • anitil 5 hours ago

        > "you don’t pay for what you don’t use"

        In my experience (mostly embedded development) including C++ in a C project adds a lot of build complexity and build time, whereas C99 or C89 is trivial to install in pretty much all situations