Das Problem mit German Strings

(polarsignals.com)

44 points | by asubiotto 11 hours ago

16 comments

  • kazinator 5 hours ago

    > Because it is difficult to assume what the best encoding will be for any given workload, database systems should dynamically choose encodings based on storage and workload characteristics.

    It would be better just to take the storage requirement on the chin and not add a gratuitous variation in encoding, which will somehow bite you (or someone else) on the ass.

    As much as possible, pick one way of doing one thing. Your stuff already has thousands of things to do. Each time you do something in two or more ways, you add combinations between that and surrounding things being done in two or more ways.

  • thayne 7 hours ago

    So... why are they called German strings?

    • mathieuh 7 hours ago

      https://datafusion.apache.org/blog/2024/09/13/string-view-ge...

      > The concept of inlined strings with prefixes (called “German Strings” by Andy Pavlo, in homage to TUM, where the Umbra paper that describes them originated) has been used in many recent database systems (Velox, Polars, DuckDB, CedarDB, etc.) and was introduced to Arrow as a new StringViewArray type. Arrow’s original StringArray is very memory efficient but less effective for certain operations. StringViewArray accelerates string-intensive operations via prefix inlining and a more flexible and compact string representation.

      Seems to be nothing more than that they were invented at a German university. I spent quite some time thinking it had something to do with German’s sometimes-SOV word order.
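
The "inlined strings with prefixes" idea in the quote above can be sketched roughly as follows. This is a minimal illustration, not the layout of any particular system: it assumes a 16-byte view (a 4-byte length, then either up to 12 inlined bytes, or a 4-byte prefix plus an 8-byte offset into a separate buffer), and the names `make_view`, `load`, `views_equal`, and `heap` are hypothetical. Real implementations differ in the details, for example in how the second half of a long-string view is encoded.

```python
import struct

INLINE_MAX = 12  # assumption: 16-byte view = 4-byte length + 12 bytes of payload


def make_view(data: bytes, heap: bytearray) -> bytes:
    """Pack `data` into a 16-byte view, spilling long strings into `heap`."""
    n = len(data)
    if n <= INLINE_MAX:
        # Short string: length followed by the bytes themselves, zero-padded.
        return struct.pack("<I12s", n, data)
    # Long string: length, 4-byte prefix, then an 8-byte offset into `heap`.
    offset = len(heap)
    heap.extend(data)
    return struct.pack("<I4sQ", n, data[:4], offset)


def load(view: bytes, heap: bytearray) -> bytes:
    """Recover the full string from a view."""
    n = struct.unpack_from("<I", view)[0]
    if n <= INLINE_MAX:
        return view[4:4 + n]
    offset = struct.unpack_from("<Q", view, 8)[0]
    return bytes(heap[offset:offset + n])


def views_equal(a: bytes, b: bytes, heap: bytearray) -> bool:
    """Compare two views, touching `heap` only when length and prefix match."""
    # In both layouts the first 8 bytes hold the length and the first
    # four bytes of the string, so most mismatches are rejected here.
    if a[:8] != b[:8]:
        return False
    return load(a, heap) == load(b, heap)


heap = bytearray()
short = make_view(b"hello", heap)
long_a = make_view(b"a considerably longer string", heap)
long_b = make_view(b"b considerably longer string", heap)
```

Here `views_equal(long_a, long_b, heap)` returns `False` after inspecting only the first 8 bytes of each view, which is the kind of shortcut the prefix is meant to enable.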

      • andai 5 hours ago

        Here is the paper in question:

        Umbra: A Disk-Based System with In-Memory Performance

        https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf

        Section 3.1 covers string handling.

        This article (also linked from tfa) explains German strings in more detail.

        https://cedardb.com/blog/german_strings

      • aleph_minus_one 6 hours ago

        > I spent quite some time thinking it had something to do with German’s sometimes-SOV word order.

        If you are referring to subordinate clauses in German: there the rule is rather "the finite verb is at the end of the subordinate clause".

        • yorwba 3 hours ago

          It also applies to infinitives and participles and the verb in nominalized noun-verb compounds. So the rule is closer to "the verb is at the end of its grammatical unit, except for the finite verb in a main clause, which appears in second position." https://en.wikipedia.org/wiki/V2_word_order

        • kaladin-jasnah 3 hours ago

          I think this is also called V2 word order.

          • aleph_minus_one 12 minutes ago

            V2 word order (finite verb comes second) is what is used in main clauses.

      • jandrewrogers 6 hours ago

        This general string format style has been invented many times over the decades. Unfortunately, we seem to need to relearn the tradeoffs each time.

    • on_the_train 5 hours ago

      They aren't. They're called German-style strings. People just like to clickbait and prey on the curiosity of techies.

  • thebharathpost 2 hours ago

    This is true

  • dekhn 11 hours ago

    Did the Hacker News title editor change the "mit" to "MIT"?

    • asubiotto 10 hours ago

      Seems like it. Changed it back!

      • dang 9 hours ago

        Oops, sorry.

        • Tadpole9181 8 hours ago

          Haha, is that automated or was someone trying to be helpful?

          • dang 6 hours ago

            It's automated. And of course it's usually right, but the wrong cases stand out like sore thumbs.