-- "Improve the efficiency of planning queries accessing many partitions (Ashutosh Bapat, Yuya Watari, David Rowley)"
..."The actual performance increases here are highly dependent on the number
of partitions and the query being planned. Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions. The speedups become much more visible
with a few dozen to hundreds of partitions. With some tested queries
using 56 partitions, the planner was around 3x faster than before. For
use cases with thousands of partitions, these are likely to become
significantly faster. Some testing has shown planner speedups of 60x or
more with 8192 partitions."
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d69d45a5a
Note: PostgreSQL 18 includes many optimisations related to partitions; because of this, the improvement may be smaller. ( https://www.postgresql.org/docs/18/release-18.html )
You can do the same thing in MySQL, though its hash partition selection algorithm is far simpler: MOD(expr, N), where expr is your chosen expression on a given column, e.g. TO_DAYS(created_at), and N is the number of partitions you specified for the table.
So you can precompute the partitions in the app, and then explicitly specify the partitions in the query. Though there isn’t a ton of value in this for any large range of dates, since you’ll end up hitting all partitions anyway.
For something like a user id, it might make sense. If you’re using something alphanumeric as a user id, you can pass it through CRC32() first, or just use KEY partitioning on the column directly.
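A rough sketch of the MySQL idea above (the table and values are illustrative, not from the thread): with HASH partitioning the chosen partition is MOD(expr, N), so the app can precompute the index and name the partition explicitly.

-- Illustrative MySQL table, 16 hash partitions named p0..p15 by default
CREATE TABLE events_mysql (
  user_id bigint NOT NULL,
  payload text,
  created_at datetime NOT NULL
)
PARTITION BY HASH (user_id)
PARTITIONS 16;

-- user_id 123 lands in MOD(123, 16) = 11, i.e. partition p11
SELECT * FROM events_mysql PARTITION (p11) WHERE user_id = 123;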
I’d be interested to see a benchmark of actually selecting rows by manually specifying the partition vs not.
This benchmark seems to be pure computation of the hash value, which I don’t think is helpful to test the hypothesis. A lot can happen at actual query time that this benchmark does not account for.
That's fair. I ran an EXPLAIN ANALYZE, which shows the time it takes for planning and execution, and sending the query directly to the table outperforms that.
More to come soon.
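For reference, a hedged sketch of that comparison in PostgreSQL (table names follow the example later in the thread; which partition a value hashes to is an assumption):

-- Query via the parent: planning must consider the partition hierarchy
EXPLAIN ANALYZE SELECT * FROM events WHERE user_id = 12345;
-- Query the partition directly (assuming user_id 12345 hashes to remainder 0),
-- then compare the reported Planning Time and Execution Time
EXPLAIN ANALYZE SELECT * FROM events_0 WHERE user_id = 12345;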
Without a comparison against letting Postgres calculate the partition, this is just useless.
And who in their right mind would calculate a hash using a static SQL query that isn't even using the pg catalog hashing routine but a reimplementation?
I'm baffled.
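For what it's worth, a sketch of leaning on the server's own hashing instead of a reimplementation, assuming a parent table events hash-partitioned into 16 partitions on a bigint user_id:

-- Ask PostgreSQL which remainder a given key hashes to, using the same routine
-- it uses in hash partition constraints
SELECT r AS remainder
FROM generate_series(0, 15) AS r
WHERE satisfies_hash_partition('events'::regclass, 16, r, 12345::bigint);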
Tangential: is "without requiring knowledge of data patterns" a frequently useful requirement? I.e. isn't knowledge of data patterns basically required for any performance optimization?
This sounds really interesting. The query planner acquires a lock on every table partition and its indexes. There are only 16 fast-path lock slots per backend; exceed that number and locking becomes significantly more expensive, slowing down queries. The workaround I have used so far was to set plan_cache_mode=force_generic_plan, which prevents re-planning queries when parameters change.
This approach might be a better option, but sadly the app needs to be modified to make use of it.
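A minimal sketch of that workaround (the prepared statement and table are illustrative): with a generic plan the statement is planned once, rather than being re-planned for every new parameter value.

SET plan_cache_mode = force_generic_plan;
PREPARE recent_events (bigint) AS
SELECT * FROM events WHERE user_id = $1 ORDER BY created_at DESC LIMIT 10;
-- Reuses the cached generic plan; partition pruning happens at run time rather
-- than at each re-plan
EXECUTE recent_events (12345);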
Is this correct?
-- Parent table
CREATE TABLE events (
  id bigint,
  user_id bigint,
  event_type integer,
  payload text,
  created_at timestamp
) PARTITION BY HASH (user_id);

-- First level: 16 partitions by user_id
CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (modulus 16, remainder 0)
PARTITION BY HASH (event_type);
shouldn't that be by user_id for the first 16 tables?
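For clarity: the FOR VALUES WITH clause on events_0 is governed by the parent's HASH (user_id) key, so the first 16 tables are indeed split by user_id; the trailing PARTITION BY HASH (event_type) only declares how events_0 is itself sub-partitioned. A sketch of the second level (the modulus of 4 is an assumption, not from the thread):

-- Illustrative second level: events_0 already holds the user_id remainder 0
-- slice; its sub-partitions split that slice by event_type
CREATE TABLE events_0_0 PARTITION OF events_0
FOR VALUES WITH (modulus 4, remainder 0);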
This looks like a maintenance nightmare going forward, but I could be wrong.
If you are stuck on a specific PG version for a while, maybe it's worth it.
The hash won't change on a PG version upgrade, because that would force a massive reshuffling of data by pg_upgrade: something PG has managed to avoid so far.
Nice to see Ruby getting ahead.