
I've used Snakemake my whole life, can someone experienced with both systems share whether jumping to nextflow is worth it?


I have pipelines written in both frameworks. Nextflow (despite the questionable selection of groovy as the language of choice) is more powerful and enables greater flexibility in terms of information flow.

For example, snakemake makes it very difficult, if not impossible, to create pipelines that deviate from a DAG architecture. In cases where you need loops, conditionals and so on, Nextflow is a better option.

One thing that I didn't like about nextflow is that all processes must run under either apptainer or docker; you can't mix and match docker/apptainer per rule like you do in snakemake.
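For reference, per-rule container selection in snakemake looks roughly like this (a hedged sketch; the image tags and rule names are invented, and it assumes you run with container support enabled, e.g. --use-singularity):

```python
# Snakemake sketch: each rule picks its own container image.
rule align:
    container: "docker://biocontainers/bwa:v0.7.17"
    input: "reads.fq"
    output: "aligned.bam"
    shell: "bwa mem ref.fa {input} > {output}"

rule stats:
    container: "docker://biocontainers/samtools:v1.9"
    input: "aligned.bam"
    output: "stats.txt"
    shell: "samtools flagstat {input} > {output}"
```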


Can you describe a scenario that would be impossible to code for in a snakemake paradigm? For example, at least with conditionals, I imagine you could bake some flags into the output filename and have different jobs parse that. I’m not sure exactly what you mean by loop, but if it's iterating over something, that can probably be handled with the expand or lambda functions.
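The filename-flag idea sketched above might look like this in snakemake (a hedged sketch; all sample names, file paths and the `process` command are invented):

```python
# Snakemake sketch: expand() enumerates outputs up front, a lambda input
# function resolves inputs per wildcard, and a "flag" (mode) is baked
# into the output filename so downstream rules can pattern-match on it.
SAMPLES = ["s1", "s2"]
MODE = {"s1": "strict", "s2": "lenient"}

rule all:
    input:
        expand("results/{sample}.{mode}.txt",
               zip, sample=SAMPLES, mode=[MODE[s] for s in SAMPLES])

rule process:
    input:
        lambda wc: f"data/{wc.sample}.fa"
    output:
        "results/{sample}.{mode}.txt"
    shell:
        "process --mode {wildcards.mode} {input} > {output}"
```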


Here is a scenario which is relatively trivial in Nextflow and difficult to write in snakemake:

1. A process that "generates" protein sequences

2. A collection of processes that perform computationally intensive downstream computations

3. A filter that decides, based on some calculation and a threshold, whether the output from process (1) should move to process (2).

Furthermore, assume you'd like process (1) to keep generating new candidates continuously and independently until N candidates pass the filter for downstream processing.

That's not something that you can do easily with snakemake, since it generates the DAG before computation starts. Sure, you can create some hack or use checkpoints that force snakemake to re-evaluate the DAG, and maybe --keep-going=true so that one failure doesn't stop the other processes, but with nextflow you just set up a few channels as queues and connect them to processes, which is much easier.
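A hedged DSL2 sketch of that pattern (GENERATE, SCORE and HEAVY are invented process names, and params.threshold / params.n are assumed to be set by the caller):

```nextflow
// Nextflow DSL2 sketch of generate -> score -> filter -> heavy work.
workflow {
    seeds      = Channel.of(1..10000)            // keeps GENERATE producing
    candidates = GENERATE(seeds)                 // emits protein sequences
    scored     = SCORE(candidates)               // emits [sequence, score]
    passed     = scored
                    .filter { seq, s -> s > params.threshold }
                    .take(params.n)              // stop once N candidates pass
    HEAVY(passed)                                // expensive downstream work
}
```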


Just have your check for N candidates generate an empty file once N is reached, and use that as input for the next job. For the threshold example you can do the same thing, or even bake the metric into a filename.


As I said, you can probably hack your way through snakemake using DAG re-evaluation and tricks with filenames, but Nextflow allows it in a much more straightforward manner that's easier to follow, understand and debug.
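For completeness, the checkpoint-based workaround mentioned above looks roughly like this (a hedged sketch; `generate.py` and the file patterns are invented):

```python
# Snakemake checkpoint sketch: the DAG is re-evaluated after 'generate'
# finishes, so downstream inputs can depend on what was actually produced.
checkpoint generate:
    output:
        directory("candidates")
    shell:
        "python generate.py {output}"

def passing(wildcards):
    import glob, os
    # .get() defers DAG evaluation until the checkpoint has run
    outdir = checkpoints.generate.get(**wildcards).output[0]
    names = [os.path.splitext(os.path.basename(p))[0]
             for p in glob.glob(os.path.join(outdir, "*.fa"))]
    return expand("heavy/{name}.result", name=names)

rule all:
    input:
        passing
```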


"you can mix and match"

you meant "CAN'T", right?


yep :)


I’ve used both. I would say nextflow is a more production-oriented tool. Check out seqera platform to see if any of the features there seem useful. It can also be useful to get out of the wildcards/files mindset for certain workflows. Nextflow chucks the results of a step into a hashed folder, so you don’t have to worry about unique output names.

That said, I do find snakemake easier to prototype with. And it also has plenty of production features (containers, cloud, etc). For many use cases, they're functionally equivalent.


NF Tower / Seqera would be the selling points. They offer a nice UX for managing pipelines and abstract over AWS.

Technically snakemake can do it all. But in practice NF seems to scale up a bit better.

That said, if you don’t need the UI for scientists, I’d stick to snakemake.




