Shouldn't the query planner catch things like this? Sounds like a performance bug if this happens in Postgres.
Yeah, this is pretty fucking basic stuff. Any competent optimization engine should be doing this. "Push down indexes as much as possible" is literally the first thing a query planner should be trying to do.
I had to dig through to see what database was really in play here, and sure enough, it's a wrapper around a key-value store (RocksDB). While I'll confess I know little about RocksDB, it does sound an awful lot like they threw out a mature relational database engine with built-in optimization and are now paying the price by manually optimizing each query (against a key-value store, no less, which probably fundamentally limits what optimizations can be done in any general way).
Would be curious if any RocksDB-knowledgeable people have a different analysis.
> against a key-value store no less, which probably fundamentally limits what optimizations can be done in any general way
I would disagree with this assumption for two reasons. First, theoretically, a file system is a key-value store, and basically all databases run on file systems, so it stands to reason that any optimization Postgres does can be achieved as an abstraction over a key-value store with a good API, because Postgres already did exactly that.
Second, less theoretically, this has already been done by CockroachDB, which stores data in Pebble in its current iteration and previously used RocksDB (Pebble is CockroachDB's Go rewrite of RocksDB), and by TiDB, which stores its data in TiKV.
A thin wrapper over a KV store will only be able to use optimizations provided by the KV store, but if your wrapper is thick enough to include abstractions like adding multiple tables or inserting values into multiple cells in multiple tables atomically, then you can build arbitrary indices into the abstraction.
I wouldn’t tend to call a KV store a bad database engine because I don’t think of it as a database engine at all. It might technically be one under the academic definition of a database engine, but I mostly see it being used as a building block in a more complicated database engine.
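A minimal sketch of the "thick wrapper" point above, with a plain dict standing in for the KV store and key prefixes carving it into a table plus a secondary index. This is an illustrative layout only, not how any real engine (RocksDB, Pebble, TiKV) actually encodes keys:

```python
# Minimal sketch (not any real engine's layout): a dict stands in for the
# KV store, and key prefixes carve it into a table plus a secondary index.
import json

kv = {}  # pretend this is the underlying key-value store

def put_user(user_id: int, email: str, name: str) -> None:
    # Primary copy of the row, keyed by table + primary key.
    kv[f"users/pk/{user_id}"] = json.dumps({"email": email, "name": name})
    # Secondary index entry: maps the indexed value back to the primary key.
    # A real engine would write both keys in one atomic batch.
    kv[f"users/idx_email/{email}"] = str(user_id)

def get_user_by_email(email: str) -> dict | None:
    pk = kv.get(f"users/idx_email/{email}")
    if pk is None:
        return None
    return json.loads(kv[f"users/pk/{pk}"])

put_user(1, "a@example.com", "Alice")
print(get_user_by_email("a@example.com"))  # {'email': 'a@example.com', 'name': 'Alice'}
```

The point is only that once the wrapper controls the key encoding and can write multiple keys atomically, arbitrary indices are just more key prefixes.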
Yes.
But here they are deciding between pushing down o.status = 'shipped' and u.email = <address> in parallel and then joining (which they already did), versus first doing u.email = <address> and then pushing down u.id = o.user_id into orders.
This is a judgment call. Their planner is pretty dumb to not know which one is better, but “push down as much as possible” doesn't cut it: you need to actually decide what to push down and why.
No, it is not a judgement call. The query planner should be storing the distributions of the values in every index, which makes it obvious which pushdown to do here. Again, basic stuff. You're right, though, that it's not quite as simple as "push down as much as possible"; it's one step past that.
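A toy illustration of what "storing the distributions" buys you, using made-up numbers for the shipped-orders/email example: estimate how many rows each candidate pushdown would touch, then drive the join from the cheaper side.

```python
# Toy illustration (made-up numbers): estimate rows touched by each candidate
# pushdown from stored value distributions, then drive the join from the
# cheaper side. The stats and plan names are hypothetical.

# Hypothetical stats a planner might keep per indexed column.
orders_status_counts = {"shipped": 4_000_000, "pending": 50_000}  # value -> rows
users_rowcount = 10_000_000
users_email_ndistinct = 10_000_000  # emails are essentially unique

def est_rows_orders_status(value: str) -> float:
    return orders_status_counts.get(value, 0)

def est_rows_users_email() -> float:
    # Roughly rowcount / number of distinct values.
    return users_rowcount / users_email_ndistinct

candidates = {
    "push down o.status = 'shipped', then join": est_rows_orders_status("shipped"),
    "look up u.email = ?, then push u.id into orders": est_rows_users_email(),
}
for plan, rows in candidates.items():
    print(f"{plan}: ~{rows:,.0f} rows")
print("choose:", min(candidates, key=candidates.get))  # the email lookup wins by orders of magnitude
```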
Agreed. Isn't this precisely why key statistics (table statistics) are maintained in many DB systems? Essentially, always "push down" the predicate with the worst statistics and always execute (early) the predicates with high selectivity.
I'd be very surprised if virtually every RDBMS doesn't do this already.
Without storing the joint distribution of the values corresponding to the conditions that span multiple tables, it can be hard to know what's a win.
Postgres by default computes univariate stats for each column and uses those. If this is producing bad query plans, you can manually extend the statistics to be multivariate for select groups of columns. But to avoid combinatorial growth of stats-related storage and work, you have to pick the column groups by hand.
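For anyone who hasn't used them, the opt-in multivariate statistics mentioned above look roughly like this. A sketch only: the table and column names are invented, the psycopg2 connection string is an assumption, and the statistics kinds shown require Postgres 10+ (a third kind, mcv, exists in 12+):

```python
# Sketch of opting into multivariate stats in Postgres for a hand-picked
# column group. Table/column names are invented; connection details assumed.
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumption: local DB named 'app'
with conn, conn.cursor() as cur:
    # Default behaviour: per-column stats only, refreshed by ANALYZE.
    cur.execute("ANALYZE orders;")
    # Opt-in multivariate stats for columns the planner keeps mis-estimating
    # together, e.g. correlated status/region predicates.
    cur.execute("""
        CREATE STATISTICS IF NOT EXISTS orders_status_region_stats
            (dependencies, ndistinct)
            ON status, region FROM orders;
    """)
    cur.execute("ANALYZE orders;")  # populates the new statistics object
```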
Do the DB guys at your company help you optimize queries and table setup at all? Ours basically don't. Their job is apparently to maintain the DB, and us devs are left to handle this, which seems wrong. I've been partitioning tables and creating indexes the past few weeks trying to speed up a view, running EXPLAIN ANALYZE and throwing the results in Gemini, and my queries are still slow af. I had one SQL class in college; it's not my thing. It seems like if the DBAs would spend a few minutes with me asking about the data and what we are trying to do, they could get this guy's results relatively easily. Am I wrong?
We didn't use the DBAs for this, but on my last few teams we got good at DBs, performance, etc. DBAs were too general and kept the lights on; for real performance you should get one or two people who know what they're doing for your applications. Or learn. I took on juniors who are now fantastic.
For the first decade I wanted nothing to do with DBs aside from as places to store data. One day I saw a few things that made a massive difference and then went wild learning how to speed things up. It's fantastic, and because few devs know this stuff well, it becomes a superpower. You wouldn't believe what you can squeeze out of modern SQL DBs and hardware without touching any kind of optimised solutions. Which I love too, but that's a different post.
Maybe ask the DBAs a few questions and see if that triggers any interest for you. Look at query plans and how many rows are processed for a query. How many columns. What is being locked. Can you remove locks when you're just running a query, and how much does that speed things up? There are queries for all sorts of metrics, e.g. which indexes are huge but never used (example query after this comment). The DB can often suggest indexes, but don't just add the suggestions; use them as a starting point to reason about your own. Try to get down to low-millisecond queries for really frequent stuff, because it'll make them fast and means less time locking the DB, less RAM, and less temp table storage.
All my other skills have aged. Fundamental database knowledge lasts.
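As a concrete instance of the "huge but never used" check mentioned above, here's a sketch against Postgres's standard pg_stat_user_indexes view. The connection string is an assumption; adapt it to your setup:

```python
# One example of the "huge but never used" index check, using Postgres's
# pg_stat_user_indexes view. Connection details are assumed.
# Note: unused unique/PK indexes may still be enforcing constraints.
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumption: local DB named 'app'
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname       AS table_name,
               indexrelname  AS index_name,
               pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
               idx_scan      AS scans_since_stats_reset
        FROM pg_stat_user_indexes
        WHERE idx_scan = 0
        ORDER BY pg_relation_size(indexrelid) DESC
        LIMIT 20;
    """)
    for row in cur.fetchall():
        print(row)
```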
If you make it your thing and keep on being good at your other thing, you’re gonna be 90% more valuable than most of your coworkers.
I totally lose respect for sr engineers who can’t write sql to find even simple answers to questions.
It’s never bad to have another arrow in your quiver
Performance engineering is a modestly specialized subdomain - I think that applies to databases just as it does to code.
I always see these fancy DB engines and data lake blog posts and I am curious… why?
At every place I've worked, this is a solved problem: Hive + Spark, just keep everything sharded across a ton of machines.
It's cheaper to pay for a Hive cluster that does dumb queries than to pay for expensive DB licenses, data engineers building arbitrary indices, etc. Just throw compute at the problem, who cares. 1TB of RAM/flash is so cheap these days.
Even working on the world's "biggest platforms", a daily partition of user data is like 2TB.
You’re telling me a F500 can’t buy a 5 machine/40TB cluster for like $40k and basically be set?
Nope. If they didn't actively work against us, we would thank our lucky stars.
We call these pushdown joins in RonDB. They only support an equality condition for the index condition. "Joins with index condition pushdown" is a bit of a mouthful.
We also went from like 6 seconds to 50ms. Huge speedup.
Reference
https://docs.rondb.com/rondb_parallel_query/#pushdown-joins
rondb code: https://github.com/logicalclocks/rondb
I love this type of practical optimization for DB queries. I've always liked how [rom-rb](https://rom-rb.org/learn/core/5.2/combines/) made the combine pattern easy to use when joins are slow. Nice to see this implemented at the DB layer.
I read their website landing page but it's still kinda confusing: what exactly is ReadySet? It all sounds like it's a cache you can set up in front of MySQL/Postgres. But then this article is talking about implementing joins, which is what the database itself would do, not a cache. But then the blurbs talk about it like it's a "CDN for your database" that brings your data to the edge. What the heck is it?!
ReadySet is basically "incremental view maintenance" applied to arbitrary SQL queries. It acts like a caching proxy for your database, but it simultaneously ingests the database's replication log so it can see changes as they happen. It then uses that information to perform "incremental" updates of the data it has cached, so that if you re-query something, it is much faster.
Naive example: say you have a query that does a table scan to compute the average age of all users in the users table. If you insert a new row into the users table and rerun the query, you'd expect another table scan, so the cost grows as the table grows. In a traditional setup, you might cache this query and only update it "every once in a while." Instead, ReadySet can decompose this query into an "incremental program", run it, cache the result, and then, when it sees the insert into the table, incrementally update the underlying cached data (toy sketch after this comment). That means the second run is actually fast, and the cost to update the cache is proportional only to the underlying change, not the size of the table.
ReadySet is derived from research on Noria, whose paper is here: https://www.usenix.org/conference/osdi18/presentation/gjengs...
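A toy sketch of the incremental idea in the running-average example above: keep (sum, count) as state and apply each replicated insert or delete as a delta instead of rescanning the table. This is only the shape of incremental view maintenance, not ReadySet's actual dataflow:

```python
# Toy version of the incremental idea: keep (sum, count) for AVG(age) and
# apply replication events as deltas instead of rescanning the table.
# This is the shape of incremental view maintenance, not ReadySet code.

class AvgAgeView:
    def __init__(self):
        self.total = 0
        self.count = 0

    def on_insert(self, age: int) -> None:   # replicated INSERT into users
        self.total += age
        self.count += 1

    def on_delete(self, age: int) -> None:   # replicated DELETE from users
        self.total -= age
        self.count -= 1

    def read(self) -> float | None:          # cached query result, O(1)
        return self.total / self.count if self.count else None

view = AvgAgeView()
for age in (30, 40, 50):       # initial scan to seed the state
    view.on_insert(age)
print(view.read())             # 40.0
view.on_insert(20)             # new row arrives via the replication log
print(view.read())             # 35.0, without touching the other rows
```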
It seems to be some sort of read-only reimplementation of MySQL/Postgres that can ingest their replication streams and materialize views (for caching). Complete with a really primitive optimizer, if the article is to be believed.
What database engine is this in? You reference your product, but I assume this is in MySQL/MariaDB?
https://dev.mysql.com/doc/refman/9.4/en/index-condition-push...
This isn't really the same as MySQL's ICP; it seems more like what MySQL would call a “ref” or “eq_ref” lookup, i.e. a simple lookup on an indexed value on the right side of a nested-loop join. It's bread and butter for basically any database optimizer.
ICP in MySQL (which can be built on top of ref/eq_ref, but isn't part of the normal index lookup per se) is a fairly weird concept where the storage engine is told to evaluate certain predicates on its own, without returning the row to the server layer. This is done (a) to reduce the number of round-trips (function calls) from the executor down into the storage engine, and (b) because InnoDB's secondary indexes need an extra storage round-trip to return the row (secondary indexes don't point at the row you want; they contain the primary key, and you then have to look the actual row up by that primary key), so if you can reject the row early, you can skip the main row lookup.
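A rough model of that distinction, with made-up data: the secondary index hands back primary keys, each of which costs a second lookup, and the pushed-down predicate runs on index columns before that lookup, so rejected rows never pay for it.

```python
# Rough model of the ICP point above: the pushed-down predicate is checked on
# index columns inside the "storage engine", so rows it rejects never pay the
# secondary-index -> primary-key round trip. All data/names are illustrative.

primary = {                      # primary key -> full row
    1: {"id": 1, "status": "shipped", "total": 10},
    2: {"id": 2, "status": "pending", "total": 99},
    3: {"id": 3, "status": "shipped", "total": 7},
}
# Secondary index on (user_id, status): entries carry the PK, not the row.
idx_user_status = {(42, "shipped"): [1, 3], (42, "pending"): [2]}

def lookup(user_id, status_pred=None):
    pk_lookups = 0
    out = []
    for (uid, status), pks in idx_user_status.items():
        if uid != user_id:
            continue
        # With ICP the predicate on the indexed column runs here, before the
        # PK lookup; without it, every PK would be fetched and filtered later.
        if status_pred and not status_pred(status):
            continue
        for pk in pks:
            pk_lookups += 1
            out.append(primary[pk])
    return out, pk_lookups

rows, cost = lookup(42, status_pred=lambda s: s == "shipped")
print(len(rows), "rows,", cost, "PK lookups")   # 2 rows, 2 PK lookups (not 3)
```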
Seems like they are caching MySQL with their own layer built on RocksDB.
Maybe it's not obvious initially, but in retrospect, this handling of joins feels like the obvious way to handle it.
Push down filters to read the least data possible.
Or, know your data and be able to tell the query engine which kind of join strategy you would like (hash vs push down)
Decades ago we used to provide hints in queries based on "knowing the data", but modern optimizers have much better statistics on indexes, and the need to tell the query optimizer what to do should be rare.
Yes, but the problem is that optimizers will sometimes change their join plans without warning in production.
There is a real need to be able to take key queries and say, "don't change the way you run this query." Most databases offer this; unfortunately, PostgreSQL doesn't. There are ways to force the join (e.g. using a series of queries with explicit temporary tables), but they all add overhead. The result is that a PostgreSQL-backed website will sometimes swap a good query plan for a bad one, and then have problems. Just because it is Tuesday.
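One shape of the "explicit temporary tables" workaround mentioned above, sketched with invented table/column names and an assumed psycopg2 connection. It trades planner freedom for predictability and carries the overhead described:

```python
# Sketch of forcing the execution order by materializing the driving filter
# into a temp table first, so the planner can't re-plan the join around it.
# Table/column names are invented; connection details assumed.
import psycopg2

conn = psycopg2.connect("dbname=app")
with conn, conn.cursor() as cur:
    # Step 1: pin down the small side explicitly.
    cur.execute("""
        CREATE TEMP TABLE tmp_target_users ON COMMIT DROP AS
        SELECT id FROM users WHERE email = %s;
    """, ("someone@example.com",))
    # Step 2: join against the already-materialized temp table.
    cur.execute("""
        SELECT o.*
        FROM orders o
        JOIN tmp_target_users u ON o.user_id = u.id
        WHERE o.status = 'shipped';
    """)
    rows = cur.fetchall()
```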
> There is a real need to be able to take key queries and say, "don't change the way you run this query".
We've hit this with MSSQL too. Suddenly production is down because, for whatever reason, MSSQL decided to forget its good plan, table scan instead, and then keep reusing that cached table-scanning plan.
For one specific query that MSSQL likes to do this with at a certain customer, we've so far just added the minutes since the start of the year as a dummy column while we work on more pressing issues. Very blunt, yet it works.
How is this different from a nested loop join? Or is this just a different way of describing it?
Another example of row-based DBs somehow being insanely slow compared to column-based ones.
Just an endless sequence of misbehavior, and we wave it off as "rows work well for specific lookups, columns for aggregations", yet here is all the other stuff that is unreasonably slow.
It's an example. But not of that.
It's an example of old things being new again maybe. Or reinventing the wheel because the wheel wasn't known to them.
Yes, I know nobody wants to pay that tax or make that guy richer, but databases like Oracle have had JPPD (join predicate pushdown) for a long time. It's just something the database does, and the optimizer chooses whether to do it or not depending on whether it's the best thing to do.
Exactly. This is a basic optimization technique and all the dinosaur-era databases should have it. But if you build a new database product, you have to implement these techniques from scratch; there is no way to shortcut that. It reminds me of CockroachDB building a query optimizer [1]: they started with a rule-based one and then switched to cost-based, a feature that older databases already had.
[1] https://www.cockroachlabs.com/blog/building-cost-based-sql-o...
I feel like this is more an example of:
“We filtered first instead of reading an entire table from disk and performing a lookup”
Where both OLAP and OLTP DBMSs would benefit.
To your point, it's clear certain workloads lend themselves to OLAP and columnar storage much better, but "an endless sequence of misbehavior" seems a bit harsh.
It's not harsh.
Recent example: I have 1GB of data in total across tables. The query needs 20 minutes. Obviously quadratic/cubic-or-even-worse behavior.
I disable nested loop joins and it's 4 seconds. Still slow, but I don't want to spend time figuring out why it's slower than reading 1GB of data and pipelining the computation so that it takes just 1 second, or even less given the beefy NVMe the files are stored on (ignoring that I actually have good indices and the data the query actually touches is probably 10MB, not 1GB).
How can the strategy be slower than downloading 1GB of data and gluing it together in Python?
Something is just off with the level of abstraction, with the query planner relying on weird stats. The whole system, outside of its transactional guarantees, just sucks.
Another example: materializing a CTE reduces execution time from 2 seconds to 50ms, because then you somehow hint to the query planner that the result of that CTE is small (sketch after this comment).
So even PostgreSQL is filled with these endless riddles of misbehavior, even though PhDs boast about who-knows-what in the query optimizer and will make an effort to belittle my criticism by repeating "1 row" vs "agg all rows" as if I'm in elementary school and don't know how to use both OLTP and OLAP systems.
Unlike column DBs, where I know it's some nice fused group-by/map/reduce behavior, I avoid joins like the plague, and there's no query planner, stats maintenance, indices, or other mumbo-jumbo that does nothing at all most of the time.
Most of my workloads are extremely tiny and I am familiar with how to structure schemas for OLTP and OLAP and I just dislike how most relational databases work.
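For reference, the CTE-materialization trick mentioned a couple of paragraphs up looks like this in Postgres 12+, where AS MATERIALIZED fences the CTE off so the planner can't inline it into the outer query. Table and column names are invented; the psycopg2 connection is an assumption:

```python
# The "materialize the CTE" hint in Postgres 12+: AS MATERIALIZED keeps the
# CTE from being inlined, so the outer join is planned against its (small)
# result. Table/column names are invented; connection details assumed.
import psycopg2

QUERY = """
WITH shipped AS MATERIALIZED (
    SELECT user_id, count(*) AS n
    FROM orders
    WHERE status = 'shipped'
    GROUP BY user_id
)
SELECT u.email, s.n
FROM shipped s
JOIN users u ON u.id = s.user_id;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    print(cur.fetchall())
```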
I think part of the problem is that the people working on Postgres for the most part aren't PhDs, and Postgres isn't very state of the art.
Postgres implements the ancient Volcano model from the 1980s, but there's been a ton of query optimization research since then, especially from the database groups at TUM Munich, University of Washington, and Carnegie Mellon. Systems like HyPer and Umbra (both at TUM) are state of the art query planners that would eat Postgres' lunch. Lots of work on making planners smarter about rearranging joins to be more optimal, improving cache locality and buffer management, and so on.
Unfortunately, changing an old Volcano planner and applying newer techniques would probably be a huge endeavor.
> I disable nested loop join and it's 4 seconds.
I feel your pain. I've been through all stages of grief with `enable_nestloop`. I've arrived at acceptance. Sometimes you just need to redo your query approach. Usually by the time I get the planner to behave, I've ended up with something that's expressed more simply to boot.
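If you want the `enable_nestloop` experiment scoped to a single statement rather than the whole session, SET LOCAL inside a transaction is the usual way. A sketch with an assumed connection string and invented table names, for diagnosis rather than as a permanent fix:

```python
# Scope the enable_nestloop experiment to one transaction with SET LOCAL so
# the rest of the session keeps the default planner settings.
# Connection string and table names are assumptions.
import psycopg2

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    # psycopg2 opens a transaction on first execute, so SET LOCAL applies
    # to everything until the commit at the end of this block.
    cur.execute("SET LOCAL enable_nestloop = off;")
    cur.execute("""
        EXPLAIN ANALYZE
        SELECT count(*)
        FROM orders o
        JOIN users u ON u.id = o.user_id;
    """)
    for (line,) in cur.fetchall():   # EXPLAIN returns one text column per row
        print(line)
```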
Straddled joins were still a bottleneck in Readyset even after switching to hash joins. By integrating Index Condition Pushdown into the execution path, we eliminated the inefficiency and achieved up to 450× speedups.
Why downvote?
Reads like an ad written by an LLM, is my guess.
It could just be that they translated from their original language to English and got that as a byproduct. Many such cases.
It also does not add anything interesting to the discussion. Like, why add a bland summary of the article?
So you don't have to read the article to figure out if you want to read the article?
I for one appreciate such comments, given the guidelines to avoid submission summaries.
It's literally the author of the article.
It is completely disingenuous and unfair to claim that something, especially a small blurb, is written by an LLM. And so what if it actually was written by an LLM. If you want to criticize something, do so on the merits or demerits of the points in it. You don't get a free pass by claiming it's LLM output, irrespective of whether it is or not.
I'm puzzled by this reply. It's perfectly fine for me to hypothesize on the reason for downvotes in response to someone else asking why it has been downvoted.
You're free to opine on the reason for downvotes too. This metacomment, however, is more noise than signal.
What you had claimed is not even a potential reason in the universe of reasons. It is a demonstration of bias, an excuse to refrain from reason.
One line summaries of comprehensible articles can get downvoted because they don't add value beyond what's already very clear from the article.
It is objectively a potential reason in the universe of reasons, but you're 100% free to believe whatever you want, even if it's wrong.
And the fact that multiple people upvoted my comment at a minimum suggests others also believe it to be a possible explanation.
I have no idea why you've chosen this particular hill to die on, when neither of us stands to profit from this protracted exchange.