Build durable workflows with Postgres

(dbos.dev)

65 points | by KraftyOne 4 hours ago

29 comments

cmdtab 2 hours ago
Recently moved some of our background jobs from Graphile Worker to DBOS. Really recommend it for the simplicity. Took me half an hour.
I evaluated Temporal, Trigger, Cloudflare Workflows (would not recommend), etc., and this was the easiest to implement incrementally. Didn't need to change our infrastructure at all. Just plugged the worker in where I had Graphile Worker.
The hosted service's UX and frontend could use a lot of work, though you don't need them to use DBOS. OTEL support was there.
- diarrhea 2 hours ago
  Interesting!
  What made you opt for DBOS over Temporal?
  - cmdtab an hour ago
    Temporal required re-architecting some things. Their TypeScript SDK and sandbox are a bit unintuitive to use, so they would have been one more thing for the team to grok, plus additional infrastructure to maintain. There was a latency trade-off too, which mattered in our case.
    Didn't face any issues, though. Temporal's observability and UI were better than DBOS's; it's just harder to do an incremental migration in an existing codebase.
- LudwigNagasena 2 hours ago
  What was the reason for the transition?
  - cmdtab 2 hours ago
    Needed checkpoints in some of our jobs that wrap the AI agent, so we could reduce cost and increase reliability (the workflow resumes from the step where it left off instead of restarting from scratch).
    We were already checkpointing the agent, but then figured it's better to have a generic abstraction for the other stuff we do.
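A minimal sketch of the step checkpointing described above, not DBOS's actual API: each step's output is saved under a (workflow ID, step name) key, and rerunning a workflow after a crash returns the saved outputs instead of re-executing completed steps. SQLite (Python stdlib) stands in for Postgres, and the `checkpoints` table, `run_step` helper, and step names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical checkpoint table; SQLite stands in for Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkpoints ("
    " workflow_id TEXT, step TEXT, output TEXT,"
    " PRIMARY KEY (workflow_id, step))"
)

calls = []  # records which step bodies actually executed


def run_step(workflow_id, step, fn):
    """Return the saved output for this step if present; otherwise run fn and save it."""
    row = conn.execute(
        "SELECT output FROM checkpoints WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # checkpoint hit: skip re-execution
    result = fn()
    calls.append(step)
    with conn:  # commit the checkpoint once the step has finished
        conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (workflow_id, step, json.dumps(result)),
        )
    return result


def agent_workflow(workflow_id, crash_after_plan=False):
    plan = run_step(workflow_id, "plan", lambda: ["search", "summarize"])
    if crash_after_plan:
        raise RuntimeError("simulated crash")
    return run_step(workflow_id, "answer", lambda: f"did {len(plan)} things")


try:
    agent_workflow("wf-1", crash_after_plan=True)
except RuntimeError:
    pass

# Recovery: rerunning the same workflow ID skips the completed "plan" step.
print(agent_workflow("wf-1"))  # did 2 things
print(calls)  # ['plan', 'answer']
```

Because outputs are looked up by workflow ID and step name, the workflow function has to be deterministic, issuing the same named steps on every run; the sketch also ignores concurrent executors.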
jumploops 28 minutes ago
I've been looking at migrating to Temporal, but this looks interesting.
For context, we have a simple (read: home-built) "durable" worker setup that uses BullMQ for scheduling/queueing, but all of the actual jobs are Postgres-based.
Due to the cron-like nature of the many disparate jobs (bespoke AI-native workflows), we have workers that scale up and down basically on the hour, every hour.
Temporal is the obvious solution, but it will take some rearchitecting to get our jobs to fit their structure. We're also concerned with some of their limits (payload size, language restrictions, etc.).
Looking at DBOS, it's unclear from the docs how to scale the workers:
> DBOS is just a library for your program to import, so it can run with any Python/Node program.
In our ideal case, we can add DBOS to our main application for scheduling jobs, and then have a simple worker app that scales independently.
How "easy" would it be to migrate our current system to DBOS?
- KraftyOne 22 minutes ago
  I'd love to learn more about what you're building--just reach out at peter.kraft@dbos.dev.
  One option is that you have DBOS workflows that schedule and submit jobs to an external worker app. Another option is that your workers use DBOS queues (https://docs.dbos.dev/python/tutorials/queue-tutorial). I'd have to better understand your use case to figure out what would be the best fit.
rlili 2 hours ago
Some other lightweight solutions around:
https://github.com/iopsystems/durable
https://github.com/maxcountryman/underway
cpursley 3 hours ago
I've been using https://www.pgflow.dev for workflows, which is built on pgmq, and I'm really impressed so far. Most of the logic lives in the database, so I'm considering building an Elixir adapter DSL.
- mmcclure 17 minutes ago
  Just curious, if you’re already in Elixir and using Postgres, why not use Oban[1]? It’s my absolute favorite background job library, and the thing I often miss most when working in other ecosystems.
  [1] https://github.com/oban-bg/oban
- ishita_julep 3 hours ago
  what are you using the DSL for?
  - cpursley 2 hours ago
    It’s used to generate the database migration that defines the flows. More syntax sugar than anything.
darkteflon an hour ago
Often wondered whether it would be possible / advisable to combine DBOS with, e.g., Dagster if you have complex data orchestration requirements. They seem to deal with orthogonal concerns but complement nicely. Is integration with orchestration frameworks something the DBOS team has any thoughts on?
- KraftyOne an hour ago
  Would love to learn more about what you're building--what problems or parts of your system would you solve with Dagster vs DBOS?
atombender 30 minutes ago
While DBOS looks like a nice system, I was really disappointed to learn that Conductor, which is the DBOS equivalent of the Temporal server, is not open source.
Without it, you get no centralized coordination of workflow recovery. On Kubernetes, for example, my understanding is that you would need a StatefulSet to assign stable executor IDs, which isn't necessary with Conductor.
I suppose that's their business model, to provide a simplistic foundation where you have to pay money to get the grown up stuff.
alpb 3 hours ago
I've been following DBOS for a while, and I think the model isn't too different from Azure Durable Functions (which uses Azure Queues/Tables under the covers to maintain state). https://learn.microsoft.com/en-us/azure/azure-functions/dura...
Perhaps the only difference is that Azure Durable Functions has more syntactic sugar in C# (whereas DBOS chose Python) for preserving call results in persistent storage? Where else do they differ? In the end, all of them seem to be doing what Temporal does (which has its own shortcomings, and it's also possible to get it wrong if you call a function directly instead of invoking it via an Activity, etc.)?
- KraftyOne 3 hours ago
  Both do durable workflows with similar guarantees. The big difference is that DBOS is an open-source library you can add to your existing code and run anywhere, whereas Durable Functions is a cloud offering for orchestrating serverless functions on Azure.
  - alpb 2 hours ago
    As far as I know, Azure Durable Functions doesn't have a proprietary server-side component; the framework and clients are fully open source as well. So it's not really a cloud offering per se. You can see the full implementations at:
    * https://github.com/Azure/durabletask
    * https://github.com/microsoft/durabletask-go
agambrahma an hour ago
Curious how this compares to Cloudflare, which is the other provider really going for simplified workflows.
at0mic22 3 hours ago
Every few years someone discovers FOR UPDATE SKIP LOCKED and re-presents it. I remember this cycle going on for at least 15 years.
- atombender 39 minutes ago
  The "someone" in this case happens to be Michael Stonebraker, the creator of Postgres and CTO of DBOS.
- qianli_cs 3 hours ago
  Yup, some features are timeless and deserve a re-intro every now and then. SKIP LOCKED is definitely one of them.
  - skrtskrt 2 hours ago
    with a nice NOWAIT when appropriate
abtinf 3 hours ago
Why not just use Temporal?
- KraftyOne 3 hours ago
  We wanted to make workflows more lightweight--we're building a Postgres-backed library you can add to your existing application instead of an external orchestrator that requires you to rearchitect your system around it. This post goes into more detail: https://www.dbos.dev/blog/durable-execution-coding-compariso...
tonyhb 3 hours ago
Anything that guarantees exactly once is selling snake oil. Side effects happen inside any transaction, and only when it commits (checkpoints) are the side effects safe.
Want to send an email, but the app crashes before committing? Now you're at-least-once.
You can compress the window that causes at-least-once semantics, but it's always there. For this reason, this blog post oversells the capabilities of these types of systems as a whole. DBOS (and Inngest, see the disclaimer below) try to get as close to exactly once as possible, but the risk always exists, which is why you should always try to use idempotency in external API requests if they support it. Defense in layers.
Disclaimer: I built the original `step.run` APIs at https://www.inngest.com, which offers similar things on any platform... without being tied to DB transactions.
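The idempotency advice above can be sketched as follows, assuming the external API accepts an idempotency key (many payment and email providers do, but check yours). The key is derived deterministically from the workflow ID and step name, so a step that reruns after a crash sends the same key and the provider drops the duplicate. `FakeEmailAPI` and `idempotency_key` are hypothetical stand-ins, not any real provider's SDK.

```python
import hashlib


class FakeEmailAPI:
    """Hypothetical provider that deduplicates requests by idempotency key."""

    def __init__(self):
        self.sent = {}  # idempotency key -> message id

    def send(self, to, body, idempotency_key):
        if idempotency_key in self.sent:
            return self.sent[idempotency_key]  # duplicate request: no second email
        message_id = f"msg-{len(self.sent) + 1}"
        self.sent[idempotency_key] = message_id
        return message_id


def idempotency_key(workflow_id, step):
    # Deterministic: a retry of the same step in the same workflow reuses the key.
    return hashlib.sha256(f"{workflow_id}:{step}".encode()).hexdigest()


api = FakeEmailAPI()
key = idempotency_key("wf-42", "send_welcome_email")

first = api.send("user@example.com", "Welcome!", key)
# Simulate: the app crashed before checkpointing, so the step runs again on recovery.
retry = api.send("user@example.com", "Welcome!", key)

print(first, retry, len(api.sent))  # msg-1 msg-1 1
```

The step itself still runs at least once; the deduplication happens on the provider's side, which is why this only works when the external API supports it.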
- KraftyOne 3 hours ago
  As the post says, the exactly-once guarantee is ONLY for steps performing database operations. For those, you actually can get an exactly-once guarantee by running the database operations in the same Postgres transaction as your durable checkpoint. That's a pretty cool benefit of building workflows on Postgres! Of course, if there are side effects outside the database, those happen at-least-once.
  - tonyhb 3 hours ago
    You can totally leverage Postgres transactions to give someone... Postgres transactions!
    I just figured that if exactly-once semantics are worth discussing, it's worth being clear that they don't cover external side effects (which are what orchestration is for). That's a big caveat.
- jedberg 3 hours ago
  > Anything that guarantees exactly once is selling snake oil.
  That's a pretty spicy take. I'll agree that exactly-once is hard, but it's not impossible. Obviously there are caveats, but the beauty of DBOS using Postgres as the method of coordination instead of an external server (like Temporal or Inngest) is that the exactly-once guarantees of Postgres can carry over to the application. Especially so if you're using that same Postgres to store your application data.
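The claim that database steps can be exactly-once when the write and the checkpoint share one transaction can be sketched like this. SQLite stands in for Postgres, the `orders` and `checkpoints` tables and `insert_order_step` function are hypothetical, and a raised exception simulates a crash before commit. It illustrates the idea only and is not DBOS's implementation.

```python
import sqlite3

# SQLite stands in for Postgres; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")
conn.execute(
    "CREATE TABLE checkpoints (workflow_id TEXT, step TEXT,"
    " PRIMARY KEY (workflow_id, step))"
)


def insert_order_step(workflow_id, order_id, amount, crash_before_commit=False):
    """Apply the step's database write and its checkpoint atomically."""
    done = conn.execute(
        "SELECT 1 FROM checkpoints WHERE workflow_id = ? AND step = 'insert_order'",
        (workflow_id,),
    ).fetchone()
    if done:
        return "skipped"  # the write was committed together with this checkpoint
    try:
        with conn:  # one transaction: commits both rows or neither
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
            conn.execute(
                "INSERT INTO checkpoints VALUES (?, 'insert_order')", (workflow_id,)
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash")  # triggers rollback of both
    except RuntimeError:
        return "rolled back"
    return "committed"


print(insert_order_step("wf-7", "order-1", 100, crash_before_commit=True))  # rolled back
print(insert_order_step("wf-7", "order-1", 100))  # committed
print(insert_order_step("wf-7", "order-1", 100))  # skipped
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

In this sketch the primary keys also mean that a duplicate attempt slipping past the check would hit a constraint violation and roll back its whole transaction rather than write twice. None of this extends to side effects outside the database.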