[Feature Request, Nodejs 10.5+] Execute Workers inside worker threads #253

edy · 2018-06-26T07:28:30Z

Node.js 10.5 has a new experimental feature: worker threads. It would be cool if node-resque could run its jobs inside worker threads.

One huge benefit I can think of is that you could kill stuck jobs.
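The "kill stuck jobs" idea can be sketched with Node's built-in `worker_threads` module. This is a minimal sketch, not node-resque code: `runJobWithTimeout` is a hypothetical helper, and it assumes the job can be shipped to the thread as source text, which keeps the example in one file via the `eval: true` option. `worker.terminate()` stops JavaScript execution in the thread even when it is stuck in a synchronous loop, which a plain `setTimeout` in the main thread cannot do.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: run `jobSource` (a function expression as a string)
// in its own thread and terminate that thread if it does not answer within
// `timeoutMs` (the timer includes thread startup time).
function runJobWithTimeout(jobSource, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { parentPort, workerData } = require('worker_threads');
      const job = ${jobSource};
      Promise.resolve(job(...workerData.args)).then((result) => parentPort.postMessage(result));
      `,
      { eval: true, workerData: { args } }
    );

    const timer = setTimeout(() => {
      // terminate() interrupts the thread even inside an infinite loop
      worker.terminate();
      reject(new Error(`job timed out after ${timeoutMs}ms`));
    }, timeoutMs);

    worker.once('message', (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });
    // A job that throws ends its own thread; the parent sees an 'error' event
    worker.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
```

For example, `runJobWithTimeout('(a, b) => a + b', [2, 3], 1000)` resolves to `5`, while a job such as `() => { while (true) {} }` is terminated and rejects once its timeout expires.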

evantahler · 2018-07-16T23:41:36Z

Closing for now... until that API becomes a little more stable...

naz · 2020-11-12T00:21:08Z

@evantahler worker threads are stable as of Node v12 and can be polyfilled for older versions of node using a lib like https://github.com/chjj/bthreads.

evantahler · 2020-11-12T00:42:34Z

@naz cool!

Can you share some of the benefits you'd like to see with a threads implementation? Of course, moving CPU-bound jobs to another thread is a good idea. I'm a little worried about the need to really re-instantiate the whole process to get a worker (const worker = new Worker(__filename); from https://nodejs.org/api/worker_threads.html). Do you know of any good resources talking about sharing memory or other resources between workers and the main thread?
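On the question of sharing memory: a `SharedArrayBuffer` passed to a worker (here through `workerData`) is shared rather than copied, and `Atomics` makes concurrent updates safe. The sketch below is illustrative only (`startIncrementer` is a hypothetical name). Note that only raw memory and a few transferable handles (such as `MessagePort`) can be shared this way; live objects such as a database connection cannot, so each thread would still open its own.

```javascript
const { Worker } = require('worker_threads');

// One 32-bit integer of memory visible to the main thread and all workers
const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
const counter = new Int32Array(shared);

// Start a thread that increments the shared counter `times` times
function startIncrementer(times) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { workerData } = require('worker_threads');
      const counter = new Int32Array(workerData.shared);
      for (let i = 0; i < workerData.times; i++) Atomics.add(counter, 0, 1);
      `,
      { eval: true, workerData: { shared, times } }
    );
    worker.once('error', reject);
    worker.once('exit', resolve);
  });
}

async function main() {
  // Two threads bump the same counter; Atomics keeps the updates race-free
  await Promise.all([startIncrementer(1000), startIncrementer(1000)]);
  return Atomics.load(counter, 0);
}
```

Calling `main()` resolves to `2000`: both threads wrote into the same buffer that the main thread reads.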

Either way, I think the place to try this out would be inside the multiWorker - with each worker (node-resque) being a new worker (node.js). I see some grammar issues in our future!

glensc · 2020-11-12T09:12:10Z

Perhaps re-open this issue to keep it visible.

naz · 2020-11-16T00:54:12Z

Hey @evantahler! For now, my main use case has been offloading CPU-intensive work from the main thread/event loop. The cost of creating worker instances is a real concern, and I haven't found a good approach to it just yet. The best way to reduce that cost is a thread pool; an example implementation is documented in the Node docs.
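The thread pool technique amounts to starting a fixed number of workers once and handing them tasks as they free up, so startup cost is paid per worker instead of per task. Below is a deliberately simplified sketch of that idea (the `WorkerPool` class here is hypothetical and much smaller than the Node docs example; for one, a worker that crashes is not replaced), using a slow Fibonacci function as the CPU-bound task.

```javascript
const { Worker } = require('worker_threads');

// Worker code: a deliberately slow, CPU-bound Fibonacci, one task per message
const fibSource = `
  const { parentPort } = require('worker_threads');
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.on('message', (n) => parentPort.postMessage(fib(n)));
`;

// Minimal fixed-size pool: workers are started once and reused
class WorkerPool {
  constructor(source, size) {
    this.idle = [];
    this.queue = [];
    this.workers = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(source, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue a task; resolves with whatever the worker posts back
  run(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers, one task per worker at a time
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve, reject } = this.queue.shift();
      const cleanup = () => {
        worker.off('message', onMessage);
        worker.off('error', onError);
      };
      const onMessage = (result) => {
        cleanup();
        this.idle.push(worker); // worker is free again
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        cleanup(); // the crashed worker is simply dropped in this sketch
        reject(err);
      };
      worker.on('message', onMessage);
      worker.on('error', onError);
      worker.postMessage(task);
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}
```

With `new WorkerPool(fibSource, 2)`, running the tasks `10`, `20` and `25` yields `[55, 6765, 75025]` using only two threads; the third task waits until one of them is free.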

To be completely clear, I am not actively using node-resque. For my use case, all queuing/scheduling has to be done in memory. I am experimenting with bree at the moment, and it uses bthreads under the hood to polyfill worker threads. bthreads has a worker pool implemented (I haven't looked under the hood yet), and from the looks of it its main purpose is parallelization rather than saving on worker creation cost.

I was researching node-resque's codebase to see how/why things are done a certain way 😅 I didn't see worker threads utilized here and thought pinging would spark up a conversation. I'd be happy to use this issue as a discussion ground for best approaches and knowledge sharing in the context of background job processing!

evantahler · 2020-11-16T22:12:52Z

@naz yeah, let's chat!

My world-view is roughly that these are the types of background task systems that can exist (from https://blog.evantahler.com/background-tasks-in-node-js-a-survey-with-redis-971d3575d9d2)

... and when I talk about background tasks, I generally mean those that are:

distributable across multiple processes / computers
idempotent (at least internally) meaning you can run the task with only the state information included within the task's params, and look up the rest from somewhere else, like a database, API, or file

So with that worldview, node-resque really zooms in on the use case of an API deployed across multiple servers. I think in your case, you are working on what I called local messages above - one process or thread is in charge, and sends out work to other threads/processes. In the node-resque use case, I'm curious what it looks like for each "worker" to "fork" (terribly imprecise terms) and reconnect to all the other resources it might need - persistent database connections, tmp file use, etc. It's certainly possible (Rails has been doing this for years... and if Ruby can do it, we can ;) but what does the developer API look like: on('workerNewThread', connectToPostgres) or similar?

naz · 2020-11-17T02:01:08Z

We are on the same page about the world-view, and you are spot on about the case I'm trying to solve right now. In the future there will be a need for a hybrid solution, where the processing of foreground/parallel/local messages is all done by the same "job manager", with the option of giving the manager a way to persist its work queue. In other words, the job manager will be able to change its task strategy from local to remote depending on the environment, which is a story for a completely different project :)

What I think this project might gain from using a Worker (from worker_threads) or a forked process (from child_process) is utility (communication is something to solve, but it doesn't have to be solved immediately imo). The utility of a separate worker thread or forked process would be "sandboxing" workers from the parent event loop, allowing them to fail or leak memory without crashing the parent process, to run multiple CPU-intensive jobs in parallel without blocking, and to be terminated when they get stuck.
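The "fail without crashing the parent" part can be checked with a small sketch (`runIsolated` is a hypothetical helper, and jobs are again passed as source text for brevity). An uncaught exception inside a worker thread ends only that thread and arrives in the parent as an `'error'` event on the `Worker` object. For the memory-leak case, newer Node versions also accept a `resourceLimits` option on `new Worker()` to cap a thread's heap; it is not used here.

```javascript
const { Worker } = require('worker_threads');

// Run a job in a throwaway thread and report how it ended, instead of
// letting a failure propagate into the parent process.
function runIsolated(jobSource) {
  return new Promise((resolve) => {
    const worker = new Worker(
      `
      const { parentPort } = require('worker_threads');
      const job = ${jobSource};
      parentPort.postMessage(job());
      `,
      { eval: true }
    );
    worker.once('message', (result) => resolve({ ok: true, result }));
    // Without an 'error' listener, the error event itself would throw in
    // the parent; with one, the parent just records the failure.
    worker.once('error', (err) => resolve({ ok: false, error: err.message }));
  });
}
```

A job like `() => { throw new Error('boom'); }` resolves to `{ ok: false, error: 'boom' }`, and the parent can go on to run `() => 6 * 7`, which resolves to `{ ok: true, result: 42 }`.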

With the above in mind, I don't think much of an API change would be needed on node-resque's side, apart from allowing a new (modified) type of Worker that "forks" into a thread or child process. Because background tasks are idempotent, a worker definition should ideally be self-contained: it should be able to connect to resources without any additional inputs except a few task-specific parameters (which comes with the overhead of recreating all the connections).

Maybe I'm way off with this thinking, but hopefully it helps :)

evantahler · 2020-11-17T02:43:46Z

I think that makes a lot of sense, and is a good idea! I guess my concerns can all be met by making the use of worker_threads optional and opt-in... defaulting to false to stay backwards-compatible.

Implementation questions:
For your use case, which would be more useful?

A pool of workers that boots up when you start your app and stays running (like multiWorker), ready to be passed jobs. Pro: once they are up and running, there is no exec cost per job. Con: they run one file forever, and you can't change it.
Each job gets a special __filename argument and calls new Worker(filename) as its first command. Pro: flexible. Con: each job may be slow to start up.

In either case, we would pass the name of the job and JSON.stringify the args over as messages.
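The message shape described above (job name plus JSON-stringified args) could look like this for the long-lived-thread option. This is a sketch under assumptions: the `id` field for matching replies to requests, the job names `add` and `greet`, and `createJobThread` are all hypothetical, not node-resque APIs.

```javascript
const { Worker } = require('worker_threads');

// Worker side: a registry of (hypothetical) jobs keyed by name. Each message
// carries a job name plus JSON-encoded args.
const workerSource = `
  const { parentPort } = require('worker_threads');
  const jobs = {
    add: async (a, b) => a + b,
    greet: async (name) => 'hello ' + name,
  };
  parentPort.on('message', async ({ id, jobName, args }) => {
    try {
      const result = await jobs[jobName](...JSON.parse(args));
      parentPort.postMessage({ id, result });
    } catch (err) {
      parentPort.postMessage({ id, error: String(err) });
    }
  });
`;

// Main side: one long-lived thread; each call gets an id so that replies
// can be matched to the job that produced them.
function createJobThread() {
  const worker = new Worker(workerSource, { eval: true });
  const pending = new Map();
  let nextId = 0;

  worker.on('message', ({ id, result, error }) => {
    const { resolve, reject } = pending.get(id);
    pending.delete(id);
    if (error) reject(new Error(error));
    else resolve(result);
  });
  // If the thread itself dies, fail everything still in flight
  worker.on('error', (err) => {
    for (const { reject } of pending.values()) reject(err);
    pending.clear();
  });

  return {
    perform(jobName, args) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, jobName, args: JSON.stringify(args) });
      });
    },
    close: () => worker.terminate(),
  };
}
```

`perform('add', [2, 3])` resolves to `5`; an unknown job name rejects instead of crashing the thread. Stringifying the args mirrors what a Redis-backed queue already stores, so anything a job receives this way would also survive the round trip through Redis.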

I think we really need to be clear about the limitations and isolation for using worker_threads. For example, a really common resque job is for one task to enqueue another when it's done. If you run your job in a thread, you can't access worker.queue and all the related methods. I don't necessarily agree that each worker can be truly isolated (but yes, it should be idempotent). Consider this typical job:

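```js
const jobs = {
  sendEmail: {
    // `User` and `emailThing` come from your app; assumes User is already connected
    perform: async (userId) => {
      const user = await User.findOne(userId);
      await emailThing(user).send();
    },
  },
};
```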

This way of writing the job assumes you have already connected your User model to the database, done something like await User.connect(), etc. In the thread, you would need to do all of that again as part of the job.
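One way a job could "do all of that again" without reconnecting on every run is a lazily created, module-level connection. Module state is per thread (each worker thread loads its own copy of every module), so each thread would connect once and then reuse that connection. The sketch below is hypothetical throughout: `connectToDatabase` stands in for something like `await User.connect()`, and `findUser` and the email addresses are made up.

```javascript
// Hypothetical stand-in for opening a real database connection
let connectCount = 0;
async function connectToDatabase() {
  connectCount++;
  return { findUser: async (id) => ({ id, email: `user${id}@example.com` }) };
}

// Memoize the connection promise: the first job in a thread connects,
// and later jobs in the same thread reuse it.
let connectionPromise = null;
function getConnection() {
  if (!connectionPromise) connectionPromise = connectToDatabase();
  return connectionPromise;
}

const jobs = {
  sendEmail: {
    perform: async (userId) => {
      const db = await getConnection();
      const user = await db.findUser(userId);
      // hypothetical: hand `user` to your mailer here
      return user.email;
    },
  },
};
```

Running `sendEmail` twice in the same thread opens one connection, not two. This is roughly the per-thread setup hook discussed earlier, expressed inside the job module itself rather than as a worker event.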

I'll try to get an example going soon!

evantahler · 2020-11-17T02:51:32Z

Lol - to test this out I decided to calculate Fibonacci numbers in background tasks while on laptop battery... that was not a smart idea.

naz · 2020-11-17T05:27:10Z

Just to keep some references around - breejs/bree#45, this is an issue in an alternative job manager lib. It will hopefully contain specific performance implications of running worker threads or forking processes (or might borrow data from here 😅).

For my current use case, I have decided to stick with bree for now, as it's much more lightweight and easier to adjust to my current in-memory queuing needs. I will be lurking around here for sure!

evantahler closed this as completed Jul 16, 2018

evantahler reopened this Nov 12, 2020

evantahler added the wishlist label Feb 1, 2021
