If Julia had Go-style concurrency primitives (channels and lightweight threads) and M:N thread multiplexing, it would be the perfect language in which to implement my upcoming dataflow-based scientific workflow language (http://github.com/samuell/scipipe).
Now I'm instead stuck with Go, which lacks a REPL and leaves a lot to be desired in terms of the metaprogramming capabilities needed to create a nice programmatic API.
The lack of these concurrency features is my biggest concern with Julia.
Threading is already available in Julia as an experimental feature, and Go-style concurrency will be a standard feature in the future. Well, technically it may be more like Cilk or TBB, but it will be very similar to Go.
I implemented a micro-benchmark that was posted on HN several weeks back. It took a few extra tries, but eventually the performance using the existing 'Task' feature was acceptable: not Go-level, but decent, and given the rate of development on Julia it would probably be worthwhile to experiment.
That is interesting to know. But so Tasks (which are implemented with lightweight threads, IIUC) will all run in the same OS process/thread unless you manually create new ones?
I'm not familiar with the "Common Workflow Language", but I am not sure Go's concurrency is really that much of a selling point for composition, particularly composition of evaluation, since I would imagine the way Go does concurrency would bleed through into your implementation code (probably not what you want... or maybe it is, or maybe it doesn't for your library?). That is, I'm not sure Go is really good at composition compared to functional programming languages.
For example, one could use a monadic-style data structure such as .NET's reactive Observable (which has analogs in many languages), which lets you compose a stream-like definition independently of how it runs. You can then feed this definition (the Observable) to an evaluator, which could run it in Go using channels, or on a cluster of machines with ZeroMQ.
I think the selling point of your library is that it compiles to native code, but my question is: without changing code, can I switch to running it on a cluster instead of locally (something I could do in a language with an abstract form of concurrency, using monads or Observables or even just the Actor model)? It looks like I can, right?
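To make the question concrete, here is a minimal Go sketch of what I mean by separating the definition from the evaluation (all names here are made up for illustration, not scipipe's actual API): the pipeline is plain data, and only the evaluator decides to run it locally with channels; a ZeroMQ-based cluster evaluator could consume the same definition.

    package main

    import "fmt"

    // Stage is a pure definition: a name plus a transform.
    type Stage struct {
        Name string
        Fn   func(string) string
    }

    // Pipeline is just data; it says nothing about how it runs.
    type Pipeline []Stage

    // RunLocal is one possible evaluator, using goroutines and
    // channels. A cluster evaluator could take the same Pipeline.
    func RunLocal(p Pipeline, in <-chan string) <-chan string {
        ch := in
        for _, s := range p {
            stage, prev, next := s, ch, make(chan string)
            go func() {
                defer close(next)
                for v := range prev {
                    next <- stage.Fn(v)
                }
            }()
            ch = next
        }
        return ch
    }

    func main() {
        in := make(chan string, 2)
        in <- "a"
        in <- "b"
        close(in)
        p := Pipeline{{Name: "shout", Fn: func(s string) string { return s + "!" }}}
        for v := range RunLocal(p, in) {
            fmt.Println(v)
        }
    }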
The idea for cluster support has so far been to implement connectors for resource managers like SLURM, keeping scipipe itself as an orchestrator.
That is, for multi-node jobs. As long as you can stay within the 16-32 or so cores of a typical HPC node (in our cluster, at least), scipipe should be great. I think some means of simple resource management (so as not to overstress those 16-32 cores) is needed, but that can be done in a simple way, e.g. with a central goroutine that lends out "the right to use a CPU core" on demand.
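A rough sketch of that core-lending idea, assuming a plain buffered channel used as a counting semaphore (this is not actual scipipe code):

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        // A buffered channel as a counting semaphore: each token
        // represents "the right to use one CPU core". Capacity 16
        // matches the smaller typical HPC node mentioned above.
        cores := make(chan struct{}, 16)

        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                cores <- struct{}{}        // acquire a core token (blocks if all are taken)
                defer func() { <-cores }() // release it when the job is done
                fmt.Println("running job", id)
            }(i)
        }
        wg.Wait()
    }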
Thanks for the interesting feedback. I will think about this!
For example, I looked into implementing this with Python 3.5's coroutines and async/await syntax, but that seems to add an enormous amount of complexity. For instance, you need specialized versions of many standard-library functions just to make them usable in the async setting.
Either way, I couldn't get my head around how to implement this.
In Go, the implementation is conceptually extremely simple (although the code might not always be the most readable).
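For instance, wiring an upstream process to a downstream one is just a goroutine and a channel; a minimal sketch (not scipipe's actual code):

    package main

    import "fmt"

    func main() {
        // The upstream process writes to its out-port, which is just a channel.
        out := make(chan string)
        go func() {
            defer close(out)
            for _, line := range []string{"a", "b", "c"} {
                out <- line
            }
        }()

        // The downstream process reads from the same channel as its in-port.
        for line := range out {
            fmt.Println("got:", line)
        }
    }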
Interesting. Do you have a link to a description of their "dataflow" implementation or API?
From what I can see in the docs, I'm afraid Dask makes the same mistake as so many other recent tools: allowing only task dependencies, while what is needed in general scientific workflows is data dependencies (connecting the out-ports and in-ports of processes). I have explained this difference in an earlier blog post: http://bionics.it/posts/workflows-dataflow-not-task-deps
(UPDATE: In all fairness, they seem to be doing something in between, a little like Snakemake, in that they allow specifying data inputs based on a naming scheme. What we want is a totally naming-scheme-independent, declarative way of explicitly connecting one output to one input, as that is the most generic and pluggable way to do it.)
If they allow true data dependencies though, that would be very interesting.
Sounds like Haskell may be appropriate for your use case. It has green threads, a great REPL, and extremely powerful polymorphism (which provides a safe and clean alternative to e.g. macro/template-based metaprogramming, although Haskell has that too if you need it).
Go is really great for the "workflow" part, but I find it completely lacking for the "scientific" part. Have you attempted the latter part of the project yet? I don't see many examples in your project that attempt scientific or numerical operations.
The thinking is to use external tools for the main scientific parts - hence the big focus on shell support.
This is already common practice in bioinformatics, because of the plethora of existing tools, which would be too much to rewrite in any particular language.
Then, Go will probably be fine for more mundane tasks such as data pre-processing, filtering, etc.
Otherwise, there are in fact some scientific Go libraries already, including BioGo [1] and GoChem [2].
For example I'd be happy if I could generate structs dynamically, based on string input.
This would mean that we could automatically create true struct-based components with channel fields from the shell-like syntax used in the examples in the README (like "echo > {o:foo}" to write to an out-port named "foo"), so that connecting an out-port to an in-port would go like:
Process2.InFoo = Process1.OutFoo
This is not possible with Go's reflection, though, so right now, based on these shell-like patterns, we can only populate maps (InPorts and OutPorts) on the process, such that the above code example becomes:
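Process2.InPorts["foo"] = Process1.OutPorts["foo"]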
> For example I'd be happy if I could generate structs dynamically, based on string input.
I don't understand why you'd want to do that. That sounds like an architecture you are excited about, not a problem you are trying to solve. Can you give me some context?
The reason is a practical one: struct fields show up in auto-completion. In our experience, this is surprisingly important when doing iterative workflow development, to reduce the number of silly typos, which can waste a lot of cluster compute hours.