
If Julia had Go-style concurrency primitives (channels and lightweight threads) and M:N thread multiplexing, it would be the perfect language in which to implement my upcoming dataflow-based scientific workflow language (http://github.com/samuell/scipipe).

Now I'm instead stuck with Go, which lacks a REPL and leaves a lot to be desired in terms of the metaprogramming capabilities needed to create a nice programmatic API.

The lack of the mentioned features is my biggest concern with Julia.




Threading is already an experimental feature and Go-style concurrency will be a standard feature in the future. Well, technically it may be more like Cilk or TBB but it will be very similar to Go.


Cilk-style concurrency would be a really nice addition to Julia.


Interesting.


I implemented a test of a micro-benchmark posted on HN several weeks back. It took some extra tries, but eventually the performance using the existing 'Task' feature was acceptable. Not Go level, but decent, and given the rate of development on Julia it'd probably be worthwhile to experiment.


That is interesting to know. But the Tasks (which are implemented with lightweight threads, IIUC) will all run in the same OS process/thread unless you manually create new ones?


I'm not familiar with the "Common Workflow Language", but I am not sure Go's concurrency is really that much of a selling point for composition, particularly composition of evaluation, since I would imagine that how Go does concurrency would bleed through into your implementation code (probably not what you want... or maybe it is... or maybe it doesn't for your library?). That is, I'm not sure Go is really good at composition compared to functional programming languages.

For example, one could use a monadic-style data structure such as .NET's reactive Observable (which has an analog in many different languages), which lets you compose a stream-like definition independently of how it runs. You can then feed this definition (Observable) to an evaluator, which could run it in Go using channels, or run it across a cluster of machines with ZeroMQ.
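
Roughly, the separation I mean looks like this in Go terms (purely illustrative types, not any real library's API):

    package main

    import (
        "fmt"
        "strings"
    )

    // A pipeline *definition* that knows nothing about how it will be run.
    type Stage struct {
        Name string
        Run  func(in []string) []string
    }

    type Pipeline []Stage

    // Evaluators decide how to run the same definition: in-process,
    // over channels, or shipped out to a cluster.
    type Evaluator interface {
        Evaluate(p Pipeline, input []string) []string
    }

    // LocalEvaluator runs stages sequentially in-process; a cluster
    // evaluator could take the very same Pipeline and run it elsewhere.
    type LocalEvaluator struct{}

    func (LocalEvaluator) Evaluate(p Pipeline, input []string) []string {
        data := input
        for _, s := range p {
            data = s.Run(data)
        }
        return data
    }

    func main() {
        p := Pipeline{
            {Name: "upper", Run: func(in []string) []string {
                out := make([]string, len(in))
                for i, s := range in {
                    out[i] = strings.ToUpper(s)
                }
                return out
            }},
        }
        fmt.Println(LocalEvaluator{}.Evaluate(p, []string{"hello"}))
    }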

I think the selling point of your library is that it compiles to native code, but my question is: without changing code, can I switch to running it on a cluster instead of locally (something I could do in a language with an abstract form of concurrency using monads or Observables, or even just the Actor model)? It looks like I can, right?


The idea for cluster support has so far been to implement connectors for resource managers like SLURM, and basically keep scipipe as an orchestrator.

That is, for multi-node jobs. As long as you can stay within the 16-32 or so cores of a typical HPC node (in our cluster at least), scipipe should be great. I think some means of simple resource management (so as not to overstress those 16-32 cores) is needed, but that can be done in a simple way, e.g. with a central goroutine that lends out "the right to use a CPU core" on demand.
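
A minimal sketch of that idea, using a buffered channel as a counting semaphore (names are illustrative, not scipipe API):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        // One token per core; workers must hold a token to run.
        cores := make(chan struct{}, runtime.NumCPU())
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                cores <- struct{}{}        // borrow a core
                defer func() { <-cores }() // return it
                fmt.Println("running task", id)
            }(i)
        }
        wg.Wait()
    }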

Thanks for the interesting feedback. I will think about this!


You don't need language-specific primitives when a language is feature-rich enough that those features can be done in a library.


Yep, but this adds complexity and fear of maintenance problems, depending on how "close to the core" the library is.


For example, I looked into implementing this with Python 3.5's coroutines and async/await syntax, but that seems to add an enormous amount of complexity: you need specialized versions of many of the standard library methods just to make them usable in an async setting.

In either case I couldn't get my head around how to implement this.

In Go, the implementation is conceptually extremely simple (although the code might not always be the most readable).
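
For example, the bare channel/goroutine pattern I have in mind looks roughly like this (a sketch, not actual scipipe code):

    package main

    import "fmt"

    // Each stage is a goroutine reading from an in-channel and
    // writing to an out-channel.
    func produce(out chan<- string) {
        for _, s := range []string{"a.txt", "b.txt"} {
            out <- s
        }
        close(out)
    }

    func process(in <-chan string, out chan<- string) {
        for f := range in {
            out <- f + ".processed"
        }
        close(out)
    }

    func main() {
        c1 := make(chan string)
        c2 := make(chan string)
        go produce(c1)
        go process(c1, c2)
        for f := range c2 {
            fmt.Println(f)
        }
    }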


Have you tried Dask? It already has dataflow programming built in for NumPy arrays, out-of-core computation, and custom structures.

http://dask.pydata.org/en/latest/


Interesting. Do you have a link to a description of their "dataflow" implementation or API?

From what I can see in the docs, I'm afraid Dask makes the same mistake as so many other recent tools: allowing only task dependencies, while what is needed in general scientific workflows is data dependencies (connecting the out-ports and in-ports of processes). I have explained this difference in an earlier blog post: http://bionics.it/posts/workflows-dataflow-not-task-deps

(UPDATE: in all fairness, they seem to be doing something in between, a little like Snakemake, in that they allow specifying data inputs based on a naming scheme. What we want is a declarative way of explicitly connecting one output to one input, totally independent of any naming scheme, as that is the most generic and pluggable way you could do it.)

If they allow true data dependencies though, that would be very interesting.


Very interesting, I didn't realize the difference.

How about this? https://github.com/shashi/ComputeFramework.jl

But I suspect it has the same problem.

Edit: There is also this https://github.com/JuliaDB/DataStreams.jl


Okay, check this out: DataFlow programming for Julia https://github.com/MikeInnes/Flow.jl


Sounds like Haskell may be appropriate for your use case. It has green threads, a great REPL, and extremely powerful polymorphism (which provides a safe and clean alternative to e.g. macro/template-based metaprogramming, although Haskell has that too if you need it).


Interesting, I didn't know about green threads in Haskell (I'm not too familiar with it overall).


You may be interested to read this section on Julia's parallel computing facilities.

http://docs.julialang.org/en/release-0.4/manual/parallel-com...

Which, to me as a Limbo programmer, looks like channels, even across machines.


Go is really great for "workflow," but I find it completely lacking for "scientific." Have you attempted this latter part of the project yet? I don't see many examples in your project that attempt scientific or numerical operations.


The thinking is to use external tools for the main scientific parts - hence the big focus on shell support.

This is common practice in bioinformatics already, because of the sheer plethora of existing tools, which would be too much to rewrite in any particular language.

Then, Go will probably be OK for more mundane tasks such as data pre-processing, filtering, etc etc.

Otherwise, there are in fact some scientific Go libraries already, including BioGo [1] and GoChem [2].

[1] https://github.com/biogo/biogo

[2] http://gochem.org


What kind of API do you need that requires metaprogramming?


For example, I'd be happy if I could generate structs dynamically, based on string input.

This would mean that we could automatically create true struct-based components with channel fields from the shell-like syntax used in the examples in the README (like "echo > {o:foo}" to write to an out-port named "foo"), so that connecting an out-port to an in-port would look like:

    Process2.InFoo = Process1.OutFoo

This is not possible with Go's reflection though, so right now, based on these shell-like patterns, we can only populate maps (InPorts and OutPorts) on the process, such that the above code example becomes:

    Process2.InPorts["foo"] = Process1.OutPorts["foo"]
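
For illustration, the map-based wiring looks roughly like this (illustrative types, not the exact scipipe structs):

    package main

    import "fmt"

    // Processes with string-keyed port maps, wired together by sharing
    // a channel. A typo in the "foo" key would still compile, unlike a
    // misspelled struct field name.
    type Process struct {
        InPorts  map[string]chan string
        OutPorts map[string]chan string
    }

    func NewProcess() *Process {
        return &Process{
            InPorts:  make(map[string]chan string),
            OutPorts: make(map[string]chan string),
        }
    }

    func main() {
        p1 := NewProcess()
        p2 := NewProcess()
        p1.OutPorts["foo"] = make(chan string)
        p2.InPorts["foo"] = p1.OutPorts["foo"]
        fmt.Println(len(p2.InPorts))
    }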


> For example I'd be happy if I could generate structs dynamically, based on string input.

I don't understand why you'd want to do that. That sounds like an architecture you are excited about, not a problem you are trying to solve. Can you give me some context?


The reason is a practical one: struct fields will show up in auto-completion. In our experience this is surprisingly important when doing iterative workflow development, to reduce the number of silly typo errors, which can waste a lot of cluster compute hours, etc.


Generating structs at runtime might land in Go 1.7: https://go-review.googlesource.com/#/c/9251/4
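
If it lands roughly as proposed (as reflect.StructOf), usage might look something like this sketch:

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        // Build a struct type at runtime, with one channel field per
        // parsed out-port name ("foo" here).
        t := reflect.StructOf([]reflect.StructField{
            {Name: "OutFoo", Type: reflect.TypeOf(make(chan string))},
        })
        v := reflect.New(t).Elem()
        fmt.Println(v.Type()) // struct { OutFoo chan string }
    }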


That would be awesome.



