I wrote Sidekiq, which Oban is based on. Congratulations to Shannon and Parker on shipping this!
I had to make this same decision years ago: do I focus on Ruby or do I bring Sidekiq to other languages? What I realized is that I couldn't be an expert in every language (Sidekiq.js, Sidekiq.py, etc.). I decided to go a different direction and built Faktory[0] instead, which flips the architecture: a central server implements the queue lifecycle internally. The language-specific clients become much simpler and can be maintained by the open source community for each language, e.g. faktory-rs[1]. The drawback is that Faktory is not focused on any one community, and it's hard for me to provide idiomatic examples in a given language.
It's a different direction, but by focusing on a single community you may have better outcomes. Time will tell!
[0]: https://github.com/contribsys/faktory [1]: https://github.com/jonhoo/faktory-rs
Thanks Mike! You are an inspiration. Parker and I have different strengths both in life and language. We're committed to what this interop brings to both Python and Elixir.
Isn't it more accurate to say that they are both based on Resque?
Resque was the main inspiration; Sidekiq still provides compatibility with some of its APIs to this day.
https://github.com/sidekiq/sidekiq/blob/ba8b8fc8d81ac8f57a55...
Sidekiq credits BackgrounDRb, Delayed::Job, and Resque as inspiration here: https://www.mikeperham.com/2022/01/17/happy-10th-birthday-si...
The API is very close, but architecturally it's different.
Additionally, delayed_job came before Resque.
"based on" is sorta a stretch here. Sidekiq is pretty bare bones compared to what Oban supports with workflows, crons, partitioning, dependent jobs, failure handling, and so forth.
By “based on” I don’t mean a shared codebase or features but rather Parker and I exchanged emails a decade ago to discuss business models and open source funding. He initially copied my Sidekiq OSS + Sidekiq Pro business model, with my blessing.
This is absolutely true (except we went OSS + Web initially, Pro came later). You were an inspiration, always helpful in discussion, and definitely paved the way for this business model.
Thank you for the clarification!
You got the beer. We got the pen. ;)
Maybe you didn’t intend it this way, but your comment comes across as an attempt to co-opt the discussion to pitch your own thing. This is generally looked down upon here.
Knowing Mike and his work over the years, that is not the case. He is a man of integrity who owns a cornerstone product in the Ruby world. He is specifically the type of person I want to hear from when folks release new software having to do with background jobs, since he has 15 years of experience building this exact thing.
It was an off-the-cuff comment and probably not worded ideally but the intent was to discuss how Oban is branching off into a new direction for their business based on language-specific products while I went a different direction with Faktory. Since I came to the exact same fork in the road in 2017, I thought it was relevant and an interesting topic on evolving software products.
> Oban allows you to insert and process jobs using only your database. You can insert the job to send a confirmation email in the same database transaction where you create the user. If one thing fails, everything is rolled back.
This is such a key feature. Lots of people will tell you that you shouldn't use a relational database as a worker queue, but they inevitably miss out on how important transactions are for this - it's really useful to be able to say "queue this work if the transaction commits, don't queue it if it fails".
Brandur Leach wrote a fantastic piece on this a few years ago: https://brandur.org/job-drain - describing how, even if you have a separate queue system, you should still feed it by logging queue tasks to a temporary database table that can be updated as part of those transactions.
This is called the "transactional outbox pattern"!
Good name! Looks like SeatGeek use that naming convention here: https://chairnerd.seatgeek.com/transactional-outbox-pattern/
This looks like a good definition too: https://www.milanjovanovic.tech/blog/outbox-pattern-for-reli...
Excellent point. Never thought of transactions in this way.
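The transactional guarantee being described can be sketched in a few lines of Python with sqlite3 (table and function names are invented for illustration; this is not Oban's actual schema):

```python
import sqlite3

# Transactional enqueueing: the job row is written in the same transaction
# as the business data, so a rollback removes both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT, arg TEXT)")

def create_user_with_confirmation(conn, email):
    # `with conn` opens a transaction: commit on success, rollback on error.
    with conn:
        conn.execute("INSERT INTO jobs (kind, arg) VALUES (?, ?)",
                     ("send_confirmation_email", email))
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))

create_user_with_confirmation(conn, "a@example.com")

# Re-inserting the same email violates the UNIQUE constraint, and the job
# enqueued earlier in the same transaction is rolled back along with it.
try:
    create_user_with_confirmation(conn, "a@example.com")
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])  # 1
```

Because the job row rides along in the business transaction, there is no window where an email job exists for a user who was never created.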
The Oban folks have done amazing, well-engineered work for years now - it's really the only option for Elixir. That said, I'm very confused at locking the process pool behind a pro subscription - this is basic functionality given CPython's architecture, not a nice-to-have.
For $135/month on Oban Pro, they advertise:
- All Open Source Features
- Multi-Process Execution
- Workflows
- Global and Rate Limiting
- Unique Jobs
- Bulk Operations
- Encrypted Source (30/90-day refresh)
- 1 Application
- Dedicated Support
I'm going to toot my own horn here, because it's what I know, but take my 100% free Chancy for example - https://github.com/tktech/chancy. Out of the box the same workers can mix-and-match asyncio, processes, threads, and sub-interpreters. It supports workflows, rate limiting, unique jobs, bulk operations, transactional enqueuing, etc. Why not move these things to the OSS version to be competitive with existing options, and focus on dedicated support and more traditional "enterprise" features, which absolutely are worth $135/month? (The Oban devs provide world-class support for issues.) There are many more options available in the Python ecosystem than Elixir, so you're competing against Temporal, Trigger, Prefect, Dagster, Airflow, etc etc.
> It supports workflows, rate limiting, unique jobs, bulk operations, transactional enqueuing, etc. Why not move these things to the OSS version to be competitive with existing options, and focus on dedicated support and more traditional "enterprise" features, which absolutely are worth $135/month? (The Oban devs provide world-class support for issues.)
We may well move some of those things to the OSS version, depending on interest, usage, etc. It's much easier to make things free than the other way around. Some Pro-only features in Elixir have moved to OSS previously, and as a result of this project some additional functionality will also be moved.
Support-only options aren't going to cut it in our experience, but maybe that'll be different with Python.
> There are many more options available in the Python ecosystem than Elixir, so you're competing against Temporal, Trigger, Prefect, Dagster, Airflow, etc etc.
There's a lot more of everything available in the Python ecosystem =)
> Support-only options aren't going to cut it in our experience, but maybe that'll be different with Python.
That's totally fair, and I can only speak from the sidelines. I haven't had a chance to review the architecture - would it make sense to swap the two, making the process pool the free feature and async the pro feature? This would help with adoption from other OSS projects, if that's a goal, since the transition from Celery would then be moving from a process pool to a process pool (for most users). The vast, vast majority of Python libraries are not async-friendly and most still rely on the GIL. On the other hand, Celery has absolutely no asyncio support at all, which sets the pro feature apart.
On the other hand, it's already released, and as you said, it's much harder to take a free feature and make it paid.
Thanks again for Oban - I used it for a project in Elixir and it was painless. Missing Oban was why I made Chancy in the first place.
> The vast, vast majority of Python libraries are not async-friendly and most still rely on the GIL. On the other hand, Celery has absolutely no asyncio support at all, which sets the pro feature apart.
That's great advice. Wish we'd been in contact before =)
Looks like a nice API. We have used a similar pattern for years, but with SQLAlchemy and the same kind of SQL statement for getting the next available job. I think it's easier to handle worker queues with just PostgreSQL than to keep some other queue system supported and updated for security fixes, etc.
This is something my company has been considering for a while. We've been using Celery and it's not great. It gets the job done, but it has its issues.
I'd never heard of Oban until now; the one we'd considered was Temporal, but that feels like much more than we need. I like how light Oban is.
Does anyone have experience with both and is able to give a quick comparison? Thanks!
Very, very different tools, though they cover similar areas.
Temporal - if you have strict workflow requirements, want _guarantees_ that things complete, and are willing to take on extra complexity to achieve that. If you're a bank or something, probably a great choice.
Oban - DB-backed worker queue, which processes tasks off-thread. It does not give you the guarantees that Temporal can, because it has not abstracted every push/pull into a first-class citizen. While it offers some similar features with workflows, to get multiple 9's of reliability you will be hardening that yourself (based on my experience with Celery + Sidekiq).
Based on my heavy experience with both, I'd be happy to have both available to me in a system I'm working on. At my current job we are forced to use Temporal for all background processing, which for small tasks is just a lot of boilerplate.
I'm just coming back to web/API development in Python after 7-8 years working on distributed systems in Go. I just built a Django+Celery MVP with what I knew from 2017, but I see a lot of "hate" towards Celery online these days. What issues have you run into with Celery? Has it gotten less reliable? Harder to work with?
Celery + RabbitMQ is hard to beat in the Python ecosystem for scaling. But the vast, vast majority of projects don't need anywhere near that kind of scale and instead just want basic features out of the box - unique tasks, rate limiting, asyncio, future scheduling that doesn't cause massive problems (Celery schedules future tasks in-memory on workers), etc. These things are incredibly annoying to implement on top of Celery.
Yeah, that list right there. That's exactly it.
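For what it's worth, one reason features like unique tasks come naturally to a DB-backed queue is that they mostly fall out of database machinery. A hedged sketch of unique jobs via a partial unique index, using sqlite3 (the schema is invented for illustration, not any real library's):

```python
import sqlite3

# "Unique jobs" via a database unique index rather than app-level locking.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        args TEXT NOT NULL,
        state TEXT NOT NULL DEFAULT 'available'
    )
""")
# One pending job per (kind, args): the partial index only covers jobs that
# haven't finished, so the same job can be enqueued again later.
conn.execute("""
    CREATE UNIQUE INDEX uniq_pending
    ON jobs (kind, args) WHERE state IN ('available', 'executing')
""")

def enqueue_unique(conn, kind, args):
    """Insert the job unless an identical pending one already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (kind, args) VALUES (?, ?)", (kind, args))
    return cur.rowcount == 1  # True only if a row was actually inserted

print(enqueue_unique(conn, "send_email", '{"to": "a@example.com"}'))  # True
print(enqueue_unique(conn, "send_email", '{"to": "a@example.com"}'))  # False
```

Doing the same thing over Celery means distributed locks in Redis or similar, which is exactly the kind of bolt-on the comment above is complaining about.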
We don't hate Celery at all. It's just a bit harder to get it to do certain things, and requires a bit more coding and understanding of Celery than we want to invest time and effort in.
Again, no hate towards Celery. It's not bad. We just want to see if there are better options out there.
I like Celery, but I started to try other things when I had projects doing work from languages in addition to Python. Also, I prefer the code to work without having to think about queues as much as possible. In my case that was Argo Workflows (not to be confused with Argo CD).
OSS Oban has a few limitations, which are automatically lifted in the Pro version:
Single-threaded asyncio execution - concurrent but not truly parallel, so CPU-bound jobs block the event loop.
This makes it not even worth trying. Celery's interface kind of sucks, but I'm used to it already, and I can scale out infinitely, vertically and horizontally, for as long as I can afford the resources.
I also don't particularly like asyncio, and if I'm using a job queue I wouldn't expect to need it.
Edit: I looked into it a bit more, and it seems we can launch multiple worker nodes, which doesn't seem as bad as what I originally thought
Ooof. I don't mind the OSS/pro feature gate for the most part, but I really don't love that "Pro version uses smarter heartbeats to track producer liveness."
There's a difference between QoL features and reliability functions; to me, at least, that means that I can't justify trying to adopt it in my OSS projects. It's too bad, too, because this looks otherwise fantastic.
With a typical Redis or RabbitMQ backed durable queue you’re not guaranteed to get the job back at all after an unexpected shutdown. That quote is also a little incorrect—producer liveness is tracked the same way, it’s purely how “orphaned” jobs are rescued that is different.
"jobs that are long-running might get rescued even if the producer is still alive" indicates otherwise. It suggests that jobs that are in progress may be double-scheduled. That's a feature that I think shouldn't be gated behind a monthly pro subscription; my unpaid OSS projects don't justify it.
Agreed. I try to avoid using anything that has this freemium model of open source, but I let it slide for products that provide enterprise features at a cost.
This feels like core functionality is locked away, and the open source part is nothing more than shareware, or a demo/learning version.
Thanks for sharing, interesting project! One thing that stood out to me is that some fairly core features are gated behind a Pro tier. For context, there are prior projects in this space that implement similar ideas fully in OSS, especially around Postgres-backed durable execution:
1. DBOS built durable workflows and queues on top of Postgres (disclaimer: I'm a co-founder of DBOS), with some recent discussions here: https://news.ycombinator.com/item?id=44840693
2. Absurd explores a related design as well: https://news.ycombinator.com/item?id=45797228
Overall, it's encouraging to see more people converging on a database-centric approach to durable workflows instead of external orchestrators. There's still a lot of open design space around determinism, recovery semantics, and DX. I'm happy to learn from others experimenting here.
There are other projects that implement the ideas in OSS, but that's the same in Elixir. Not that we necessarily invented DAGs/workflows, but our durable implementation on the Elixir side predates DBOS by several years. We've considered it an add-on to what Oban offers, rather than the entire product.
Having an entirely open source offering and selling support would be an absolute dream. Maybe we'll get there too.
That's fair, the idea itself isn't new. Workflows/durable execution have been around forever (same story in Elixir).
The differences are in the implementation and DX: the programming abstraction, how easy recovery/debugging is, and how it behaves once you're running a production cluster.
One thing that bit us early was versioning. In practice, you always end up with different workers running different code versions (rolling deploys, hotfixes, etc.). We spent a lot of time there and now support both workflow versioning and patching, so old executions can replay deterministically while still letting you evolve the code.
Curious how Oban handles versioning today?
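As an illustration of the version-pinning idea (all names invented; this is not DBOS's or Oban's actual mechanism), each run records the code version it started under, and later steps branch on that recorded value rather than on whatever code is newest:

```python
# Version-pinning for in-flight workflows: a run started before a deploy
# keeps replaying the old code path; new runs take the new one.

CURRENT_VERSION = 2

def start_run(store, run_id):
    # Pin the run to the version of the code that started it.
    store[run_id] = {"version": CURRENT_VERSION, "step": 0}

def charge_step(store, run_id):
    run = store[run_id]
    if run["version"] >= 2:
        return "charge-via-new-provider"   # behavior introduced in v2
    return "charge-via-legacy-provider"    # old runs keep the old path

store = {"old-run": {"version": 1, "step": 3}}  # started before the deploy
start_run(store, "new-run")

print(charge_step(store, "old-run"))  # charge-via-legacy-provider
print(charge_step(store, "new-run"))  # charge-via-new-provider
```

The hard parts in production are everything around this sketch: persisting the pin durably, deciding when old versions can be retired, and patching a step for runs already past it.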
While this is a Cool Thing To See, I do wish things would go the other way - and have all the BI/ML/DS pipelines and workflows folks are building in Python come to the BEAM (and, as would follow, Elixir). I get where the momentum is, but having something functional, fault-tolerant, and concurrent underpinning work that's naturally highly concurrent and error-prone feels like a _much_ more natural fit.
> Inaccurate rescues - jobs that are long-running might get rescued even if the producer is still alive. Pro version uses smarter heartbeats to track producer liveness.
So the non-paid version really can't be used for production unless you know for sure you'll have very short jobs?
You can have jobs that run as long as you like. The difference is purely in how quickly they are restored after a crash or a shutdown that doesn’t wait long enough.
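The difference between the two rescue strategies being discussed can be sketched abstractly (all names and thresholds invented; neither function is Oban's actual code):

```python
# Two ways to find "orphaned" jobs after a worker dies:
# - timeout rescue: reclaim any job executing longer than a fixed window,
#   which can double-run a legitimately slow job;
# - heartbeat rescue: reclaim only jobs whose worker stopped checking in.

RESCUE_AFTER = 60        # seconds: naive fixed rescue window
HEARTBEAT_STALE = 15     # seconds of silence before a worker is presumed dead

def timeout_orphans(jobs, now):
    return [j["id"] for j in jobs if now - j["started_at"] > RESCUE_AFTER]

def heartbeat_orphans(jobs, workers, now):
    return [j["id"] for j in jobs
            if now - workers[j["worker"]]["last_beat"] > HEARTBEAT_STALE]

now = 1000.0
jobs = [
    {"id": 1, "worker": "w1", "started_at": now - 300},  # slow but alive
    {"id": 2, "worker": "w2", "started_at": now - 300},  # worker crashed
]
workers = {
    "w1": {"last_beat": now - 5},    # still heartbeating
    "w2": {"last_beat": now - 120},  # silent: presumed dead
}

print(timeout_orphans(jobs, now))             # [1, 2]: job 1 rescued wrongly
print(heartbeat_orphans(jobs, workers, now))  # [2]: only the crashed worker's job
```

This is why the gated feature matters for long jobs: with only a fixed window, a still-running job past the window gets a second execution.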
I have fixed many broken systems that used redis for small tasks. It is much better to put the jobs in the database we already have. This makes the code easier to manage and we have fewer things to worry about. I hope more teams start doing this to save time.
Traditional DBs are a poor fit for high-throughput job systems in my experience. The transactions alone around fetching/updating jobs are non-trivial and can dwarf regular data activity in your system. Especially for monoliths, which Python and Ruby apps by and large still are.
Personally I've migrated 3 apps _from_ DB-backed job queues _to_ Redis/other-backed systems with great success.
Transactions around fetching/updating aren't trivial, that's true. However, the work that you're doing _is_ regular activity because it's part of your application logic. That's data about the state of your overall system and it is extremely helpful for it to stay with the app (not to mention how nice it makes testing).
Regarding overall throughput, we've written about running one million jobs a minute [1] on a single queue, and there are numerous companies running hundreds of millions of jobs a day with oban/postgres.
The way that Oban for Elixir and GoodJob for Ruby leverage PostgreSQL allows for very high throughput. It's not something that easily ports to other DBs.
A combination of LISTEN/NOTIFY for instantaneous reactivity (letting you get away with just periodic polling as a fallback) and FOR UPDATE ... SKIP LOCKED, which makes it efficient and safe for parallel workers to grab tasks without coordination. It's actually covered near the bottom of the article.
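A hedged sketch of that claim pattern, with the Postgres form shown in a comment and the single-connection behavior demonstrated on sqlite (schema invented for illustration):

```python
import sqlite3

# On Postgres the idiomatic claim query is roughly:
#   UPDATE jobs SET state = 'executing'
#   WHERE id = (SELECT id FROM jobs WHERE state = 'available'
#               ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
#   RETURNING id;
# FOR UPDATE SKIP LOCKED lets parallel workers skip rows another worker has
# locked instead of blocking on them. sqlite has no SKIP LOCKED, but an
# atomic select-then-update in one transaction shows the same
# "claim exactly once" behavior on a single connection.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO jobs (state) VALUES (?)", [("available",)] * 3)
conn.commit()

def claim_next(conn):
    """Move the oldest available job to 'executing'; return its id or None."""
    with conn:  # one transaction per claim
        row = conn.execute(
            "SELECT id FROM jobs WHERE state = 'available' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET state = 'executing' WHERE id = ?", (row[0],))
        return row[0]

print([claim_next(conn) for _ in range(4)])  # [1, 2, 3, None]
```

The SKIP LOCKED clause is what makes the Postgres version safe under real concurrency: two workers running the query at the same instant claim different rows rather than fighting over the same one.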
In Rails at least, aside from being used for background processing, Redis gives you more goodies. You can store temporary state for tasks that require coordination between multiple nodes without race conditions, cache things to take some load off your DB, etc.
Besides, a DB has a higher likelihood of failing you once you reach certain throughputs.
I don't know how I feel about a free open source version plus a commercial version that locks features. Something inside me prevents me from even trying such software. Logically I support the model, because open source needs to be sustainable and we need good quality developer tools, but when it comes to adoption I find myself reaching for purely open source projects. I think it has to do with features being locked behind a paywall. I'd be far more open to trying products where the commercial version offered enterprise-level features - compliance reports, FIPS support, professional support, etc. - but didn't lock features.
For most of its history the main locked feature was just a premium web interface (there were a few more, but that was the main draw), and that's included in the free version now. I think the locked features are now primarily the more specialised job-ordering engines - things that, if you're on the free tier, you almost certainly don't need. Oban has been very good about deciding which features to lock away.
(I've paid for it for years despite not needing any of the pro features)
Python dudes are in for a treat; Oban is one of the most beautiful, elegant parts of working with Elixir/Phoenix. They have saved me so much heartache and tears over the years of working with them.
The vast majority of tasks you use a job processing framework for are related to io bound side effects: sending emails, interacting with a database, making http calls, etc. Those are hardly impacted by the fact that it's a single thread. It works really well embedded in a small service.
You can also easily spawn as many processes running the CLI as you like to get multi-core parallelism. It's just a smidge more overhead than the process pool backend in Pro.
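The io-bound point above can be demonstrated directly: blocking work starves a single-threaded event loop unless it is pushed off-loop. A small asyncio sketch (under the GIL a process pool is what buys real CPU parallelism; `asyncio.to_thread` merely keeps the loop responsive):

```python
import asyncio
import time

def blocking_job():
    time.sleep(0.2)   # stand-in for blocking or CPU-ish work
    return "done"

async def ticker(ticks):
    # Other jobs keep making progress while the blocking job runs off-loop.
    for _ in range(5):
        await asyncio.sleep(0.03)
        ticks.append(time.monotonic())

async def main():
    ticks = []
    t = asyncio.create_task(ticker(ticks))
    # Running blocking_job() inline here would freeze the ticker for 0.2s;
    # to_thread hands it to a worker thread so the loop stays responsive.
    result = await asyncio.to_thread(blocking_job)
    await t
    return result, len(ticks)

result, tick_count = asyncio.run(main())
print(result, tick_count)  # done 5
```

For truly CPU-bound work, the same `await` shape works with `loop.run_in_executor` and a `concurrent.futures.ProcessPoolExecutor`, which is essentially what a multi-process backend buys you.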
I use celery when I need to launch thousands of similar jobs in a batch across any number of available machines, each running multiple processes with multiple threads.
I also use Celery when a user kicks off a process by clicking a button and watches the progress bar in the GUI. One process might have 50 tasks, or one really long task.
Oban is cool, but I really like the idea of pgflow.dev, which is based on the pgmq (Rust) Postgres extension doing the heavy lifting. That makes it language agnostic - all the important parts live in Postgres. I've started an Elixir adapter, which really is just a DSL and a poller; you could do the same in Python, etc.
Thanks for sharing, interesting project! One thing that stood out to me is that some fairly core features are gated behind a Pro tier. For context, there are prior projects in this space that implement similar ideas fully in OSS, especially around Postgres-backed durable execution:
1. DBOS built durable workflows and queues on top of Postgres (disclaimer: I'm a co-founder of DBOS), with some recent discussions here: https://news.ycombinator.com/item?id=44840693
2. Absurd explores a related design as well: https://news.ycombinator.com/item?id=45797228
Overall, it's encouraging to see more people converging on a database-centric approach to durable workflows instead of external orchestrators. There's still a lot of open design space around determinism, recovery semantics, and DX. I'm happy to learn from others experimenting here.
There are other projects that implement the ideas in OSS, but that's the same in Elixir. Not that we necessarily invented DAGs/workflows, but our durable implementation on the Elixir side predates DBOS by several years. We've considered it an add-on to what Oban offers, rather than the entire product.
Having an entirely open source offering and selling support would be an absolute dream. Maybe we'll get there too.
That's fair, the idea itself isn't new. Workflows/durable execution have been around forever (same story in Elixir).
The differences are in the implementation and DX: the programming abstraction, how easy recovery/debugging is, and how it behaves once you're running a production cluster.
One thing that bit us early was versioning. In practice, you always end up with different workers running different code versions (rolling deploys, hotfixes, etc.). We spent a lot of time there and now support both workflow versioning and patching, so old executions can replay deterministically while still letting you evolve the code.
Curious how Oban handles versioning today?
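To make the versioning idea concrete, here's a generic sketch (not DBOS's or Oban's actual API; all names are hypothetical) of registering workflow handlers per version so that in-flight executions replay against the code they started with:

```python
# Hypothetical workflow registry keyed by (name, version). An in-flight
# execution pins the version it started with, so deploying v2 does not
# change how a recovering v1 execution replays.

WORKFLOWS: dict[tuple[str, int], object] = {}

def workflow(name: str, version: int):
    """Decorator that registers a handler under (name, version)."""
    def register(fn):
        WORKFLOWS[(name, version)] = fn
        return fn
    return register

@workflow("bill_customer", 1)
def bill_v1(amount):
    return amount  # original logic

@workflow("bill_customer", 2)
def bill_v2(amount):
    return round(amount * 1.1, 2)  # new logic shipped in a later deploy

def replay(name: str, version: int, *args):
    # A recovering execution looks up exactly the version it was started with.
    return WORKFLOWS[(name, version)](*args)
```

Patching (as opposed to versioning) would instead branch inside one handler on a recorded marker, but the pinning principle is the same.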
While this is a Cool Thing To See, I do wish things would go the other way: take all the BI/ML/DS pipelines and workflows folks are building in Python and have them come to Elixir (and, as would follow, the BEAM). I get where the momentum is, but having something functional, fault-tolerant, and concurrent underpinning work that’s naturally highly concurrent and error-prone feels like a _much_ more natural fit.
> Inaccurate rescues - jobs that are long-running might get rescued even if the producer is still alive. Pro version uses smarter heartbeats to track producer liveness.
So the non-paid version really can't be used for production unless you know for sure you'll have very short jobs?
You can have jobs that run as long as you like. The difference is purely in how quickly they are restored after a crash or a shutdown that doesn’t wait long enough.
I have fixed many broken systems that used Redis for small tasks. It is much better to put the jobs in the database you already have: the code is easier to manage and there are fewer things to worry about. I hope more teams start doing this.
Traditional DBs are a poor fit for high-throughput job systems in my experience. The transactions alone around fetching/updating jobs are non-trivial and can dwarf regular data activity in your system, especially for monoliths, which Python and Ruby apps by and large still are.
Personally I've migrated 3 apps _from_ DB-backed job queues _to_ Redis/other-backed systems with great success.
Transactions around fetching/updating aren't trivial, that's true. However, the work that you're doing _is_ regular activity because it's part of your application logic. That's data about the state of your overall system and it is extremely helpful for it to stay with the app (not to mention how nice it makes testing).
Regarding overall throughput, we've written about running one million jobs a minute [1] on a single queue, and there are numerous companies running hundreds of millions of jobs a day with Oban/Postgres.
[1]: https://oban.pro/articles/one-million-jobs-a-minute-with-oba...
The way that Oban for Elixir and GoodJob for Ruby leverage PostgreSQL allows for very high throughput. It's not something that easily ports to other DBs.
Interesting. Any docs that explain what/how they do this?
A combination of LISTEN/NOTIFY for near-instant reactivity, which lets you get away with only periodic polling as a fallback, and FOR UPDATE...SKIP LOCKED, which makes it efficient and safe for parallel workers to grab tasks without coordination. It's actually covered near the bottom of the article.
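To make the SKIP LOCKED half concrete, here's a rough sketch of the kind of claim query this enables. The table and column names are hypothetical, not Oban's actual schema; the point is that concurrent workers skip rows another worker has already locked instead of blocking on them:

```python
# Illustrative only: the style of SQL a Postgres-backed job worker issues
# to atomically claim a batch of jobs. Rows locked by other workers are
# skipped (not waited on), so many workers can poll the same queue safely.

CLAIM_JOBS_SQL = """
WITH claimed AS (
    SELECT id
    FROM jobs
    WHERE state = 'available' AND queue = %(queue)s
    ORDER BY priority, scheduled_at
    LIMIT %(demand)s
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET state = 'executing', attempted_at = now()
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.*;
"""

def claim_jobs(cursor, queue: str, demand: int):
    """Move up to `demand` available jobs to 'executing' for this worker."""
    cursor.execute(CLAIM_JOBS_SQL, {"queue": queue, "demand": demand})
    return cursor.fetchall()
```

LISTEN/NOTIFY then covers the other half: a worker LISTENs on a channel and runs this claim query when a NOTIFY arrives, rather than tight-polling the table.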
Thank you
GoodJob is a strong attempt. I believe it's based around advisory locks though.
https://github.com/bensheldon/good_job
In Rails at least, aside from being used for background processing, Redis gives you more goodies. You can store temporary state for tasks that require coordination between multiple nodes without race conditions, cache things to take some load off your DB, etc.
Besides, a DB has a higher likelihood of failing you once you reach certain throughputs.
Oban is incredible and this type of software will continue to grow in importance. Kudos!
Is there a web UI to view jobs, statuses, queue length, etc.?
I don't know how I feel about free open source version and then a commercial version that locks features. Something inside me prevents me from even trying such software. Logically I'd say I support the model because open source needs to be sustainable and we need good quality developer tools and software but when it comes to adoption, I find myself reaching for purely open source projects. I think it has to do with features locked behind a paywall. I think I'd be far more open to trying out products where the commercial version offered some enterprise level features like compliance reports, FIPS support, professional support etc but didn't lock features.
For most of its history the main locked feature was just a premium web interface (there were a few more, but that was the main draw), and that's included in the free version now. I think the locked features are now primarily the most specialised job-ordering engines: things you almost certainly don't need if you're on the free version. Oban has been very good about deciding which features to lock away.
(I've paid for it for years despite not needing any of the pro features)
How is this different than Celery and the like?
Python dudes are in for a treat: Oban is one of the most beautiful, elegant parts of working with Elixir/Phoenix. They have saved me so much heartache and tears over the years.
I can't imagine why you would want a job processing framework limited to a single thread, which makes this seem like a paid-version-only product.
What does it have over Celery?
The vast majority of tasks you use a job processing framework for are related to io bound side effects: sending emails, interacting with a database, making http calls, etc. Those are hardly impacted by the fact that it's a single thread. It works really well embedded in a small service.
You can also easily spawn as many processes running the CLI as you like to get multi-core parallelism. It's just a smidge more overhead than the process pool backend in Pro.
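As a generic illustration of why process-based parallelism sidesteps the single-thread limitation (this is not Oban's actual CLI or API; `handle_job` is a stand-in for real job logic):

```python
# Each pool worker is a separate OS process with its own interpreter,
# so jobs run on separate cores without contending for a single GIL.
from multiprocessing import Pool

def handle_job(job: str) -> str:
    # Stand-in for real work: sending an email, an HTTP call, etc.
    return f"done:{job}"

def run_jobs(jobs, processes=4):
    """Fan a batch of jobs out across `processes` worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(handle_job, jobs)
```

Spawning several copies of a worker CLI under a supervisor achieves the same effect at the deployment level instead of in code.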
Also, not an expert on Celery.
I use celery when I need to launch thousands of similar jobs in a batch across any number of available machines, each running multiple processes with multiple threads.
I also use celery when I have a process a user kicked off by clicking a button and they're watching the progress bar in the gui. One process might have 50 tasks, or one really long task.
Edit: I looked into it a bit more, and it seems we can launch multiple worker nodes, which doesn't seem as bad as what I originally thought
Oban is cool, but I really like the idea of pgflow.dev, which is built on the pgmq Postgres extension (written in Rust) doing the heavy lifting. That makes it language agnostic, since all the important parts live in Postgres. I've started an Elixir adapter, which really is just a DSL and a poller; you could do the same in Python, etc.
https://github.com/agoodway/pgflow