
If Julia had Go-style concurrency primitives (channels and lightweight threads) and M:N thread multiplexing, it would be the perfect language in which to implement my upcoming dataflow-based scientific workflow language (http://github.com/samuell/scipipe).

Now I'm instead stuck with Go, which lacks a REPL and leaves a lot to be desired in terms of the metaprogramming capabilities needed to create a nice programmatic API.

The lack of the mentioned features is my biggest concern with Julia.




Threading is already an experimental feature and Go-style concurrency will be a standard feature in the future. Well, technically it may be more like Cilk or TBB but it will be very similar to Go.


Cilk-style concurrency would be a really nice addition to Julia.


Interesting.


I implemented a test of a micro-benchmark posted on HN several weeks back. It took some extra tries, but eventually the performance using the existing 'Task' feature was acceptable. Not Go level, but decent, and given the rate of development on Julia it'd probably be worthwhile to experiment.


That is interesting to know. But the Tasks (which are implemented with lightweight threads, IIUC) will all run in the same OS process/thread unless you manually create new ones?


I'm not familiar with the "Common Workflow Language", but I am not sure Go's concurrency is really that much of a selling point for composition, particularly composition of evaluation, since I would imagine that how Go does concurrency would bleed through into your implementation code (probably not what you want... or maybe it is... or maybe it doesn't for your library?). That is, I'm not sure Go is really good at composition compared to functional programming languages.

For example, one could use a monadic-style data structure such as .NET's reactive Observable (which has an analog in many different languages), which lets you compose a stream-like definition independently of how it runs. You can then feed this definition (Observable) to an evaluator, which could run it in Go using channels, or run it across a cluster of machines with ZeroMQ.
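
Roughly, the separation I mean looks like this in Go terms (purely illustrative types, not any real library's API):

    package main

    import (
        "fmt"
        "strings"
    )

    // A pipeline *definition* that knows nothing about how it will be run.
    type Stage struct {
        Name string
        Run  func(in []string) []string
    }

    type Pipeline []Stage

    // Evaluators decide how to run the same definition: in-process,
    // over channels, or shipped out to a cluster.
    type Evaluator interface {
        Evaluate(p Pipeline, input []string) []string
    }

    // LocalEvaluator runs stages sequentially in-process; a cluster
    // evaluator could take the very same Pipeline and run it elsewhere.
    type LocalEvaluator struct{}

    func (LocalEvaluator) Evaluate(p Pipeline, input []string) []string {
        data := input
        for _, s := range p {
            data = s.Run(data)
        }
        return data
    }

    func main() {
        p := Pipeline{
            {Name: "upper", Run: func(in []string) []string {
                out := make([]string, len(in))
                for i, s := range in {
                    out[i] = strings.ToUpper(s)
                }
                return out
            }},
        }
        fmt.Println(LocalEvaluator{}.Evaluate(p, []string{"hello"}))
    }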

I think the selling point of your library is that it compiles to native code, but my question is: without changing code, can I switch to running it on a cluster instead of locally (something I could do in a language with an abstract form of concurrency using monads or Observables, or even just the Actor model)? It looks like I can, right?


The idea for cluster support has so far been to implement connectors for resource managers like SLURM, and basically keep scipipe as an orchestrator.

That is, for multi-node jobs. As long as you can stay within the 16-32 or so cores of a typical HPC node (in our cluster at least), scipipe should be great. I think some means of simple resource management (so as not to overstress those 16-32 cores) is needed, but that can be done in a simple way, e.g. with a central goroutine that lends out "the right to use a CPU core" on demand.
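
A minimal sketch of that idea, using a buffered channel as a counting semaphore (names are illustrative, not scipipe API):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        // One token per core; workers must hold a token to run.
        cores := make(chan struct{}, runtime.NumCPU())
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                cores <- struct{}{}        // borrow a core
                defer func() { <-cores }() // return it
                fmt.Println("running task", id)
            }(i)
        }
        wg.Wait()
    }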

Thanks for the interesting feedback. I will think about this!


You don't need language-specific primitives when a language is feature-rich enough that those features can be done in a library.


Yep, but this adds complexity and fear of maintenance problems, depending on how "close to the core" the library is.


For example, I looked into implementing this with Python 3.5's coroutines and async/await syntax, but that seems to add an enormous amount of complexity: you need specialized versions of many of the standard library methods just to make them usable in an async setting.

In either case I couldn't get my head around how to implement this.

In Go, the implementation is conceptually extremely simple (although the code might not always be the most readable).
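
For example, the bare channel/goroutine pattern I have in mind looks roughly like this (a sketch, not actual scipipe code):

    package main

    import "fmt"

    // Each stage is a goroutine reading from an in-channel and
    // writing to an out-channel.
    func produce(out chan<- string) {
        for _, s := range []string{"a.txt", "b.txt"} {
            out <- s
        }
        close(out)
    }

    func process(in <-chan string, out chan<- string) {
        for f := range in {
            out <- f + ".processed"
        }
        close(out)
    }

    func main() {
        c1 := make(chan string)
        c2 := make(chan string)
        go produce(c1)
        go process(c1, c2)
        for f := range c2 {
            fmt.Println(f)
        }
    }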


Have you tried Dask? It already has dataflow programming built in for NumPy arrays, out-of-core computation, and custom structures.

http://dask.pydata.org/en/latest/


Interesting. Do you have a link to a description of their "dataflow" implementation or API?

From what I can see in the docs, I'm afraid Dask makes the same mistake as so many other recent tools: allowing only task dependencies, while what is needed in general scientific workflows is data dependencies (connecting the out-ports and in-ports of processes). I have explained this difference in an earlier blog post: http://bionics.it/posts/workflows-dataflow-not-task-deps

(UPDATE: in all fairness, they seem to be doing something in between, a little like Snakemake, in that they allow specifying data inputs based on a naming scheme. What we want is a declarative way of explicitly connecting one output to one input, totally independent of any naming scheme, as that is the most generic and pluggable way you could do it.)

If they allow true data dependencies though, that would be very interesting.


Very interesting, I didn't realize the difference.

How about this? https://github.com/shashi/ComputeFramework.jl

But I suspect it has the same problem.

Edit: There is also this https://github.com/JuliaDB/DataStreams.jl


Okay, check this out: DataFlow programming for Julia https://github.com/MikeInnes/Flow.jl


Sounds like Haskell may be appropriate for your use case. It has green threads, a great REPL, and extremely powerful polymorphism (which provides a safe and clean alternative to e.g. macro/template-based metaprogramming, although Haskell has that too if you need it).


Interesting, I didn't know about green threads in Haskell (I'm not too familiar with it overall).


You may be interested to read this section on Julia's parallel computing facilities.

http://docs.julialang.org/en/release-0.4/manual/parallel-com...

Which, to me as a Limbo programmer, looks like channels, even across machines.


Go is really great for "workflow," but I find it completely lacking for "scientific." Have you attempted this latter part of the project yet? I don't see many examples in your project that attempt scientific or numerical operations.


The thinking is to use external tools for the main scientific parts - hence the big focus on shell support.

This is common practice in bioinformatics already, because of the sheer plethora of existing tools, which would be too much to rewrite in any particular language.

Then, Go will probably be OK for more mundane tasks such as data pre-processing, filtering, etc etc.

Otherwise, there are in fact some scientific Go libraries already, including BioGo [1] and GoChem [2].

[1] https://github.com/biogo/biogo

[2] http://gochem.org


What kind of API do you need that requires metaprogramming?


For example, I'd be happy if I could generate structs dynamically, based on string input.

This would mean that we could automatically create true struct-based components with channel fields from the shell-like syntax used in the examples in the README (like "echo > {o:foo}" to write to an out-port named "foo"), so that connecting an out-port to an in-port would look like:

    Process2.InFoo = Process1.OutFoo

This is not possible with Go's reflection though, so right now, based on these shell-like patterns, we can only populate maps (InPorts and OutPorts) on the process, such that the above code example becomes:

    Process2.InPorts["foo"] = Process1.OutPorts["foo"]
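
For illustration, the map-based wiring looks roughly like this (illustrative types, not the exact scipipe structs):

    package main

    import "fmt"

    // Processes with string-keyed port maps, wired together by sharing
    // a channel. A typo in the "foo" key would still compile, unlike a
    // misspelled struct field name.
    type Process struct {
        InPorts  map[string]chan string
        OutPorts map[string]chan string
    }

    func NewProcess() *Process {
        return &Process{
            InPorts:  make(map[string]chan string),
            OutPorts: make(map[string]chan string),
        }
    }

    func main() {
        p1 := NewProcess()
        p2 := NewProcess()
        p1.OutPorts["foo"] = make(chan string)
        p2.InPorts["foo"] = p1.OutPorts["foo"]
        fmt.Println(len(p2.InPorts))
    }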


> For example I'd be happy if I could generate structs dynamically, based on string input.

I don't understand why you'd want to do that. That sounds like an architecture you are excited about, not a problem you are trying to solve. Can you give me some context?


The reason is a practical one: struct fields will show up in auto-completion. In our experience this is surprisingly important when doing iterative workflow development, to reduce the number of silly typo errors, which can waste a lot of cluster compute hours, etc.


Generating structs at runtime might land in Go 1.7: https://go-review.googlesource.com/#/c/9251/4
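
If it lands roughly as proposed (as reflect.StructOf), usage might look something like this sketch:

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        // Build a struct type at runtime, with one channel field per
        // parsed out-port name ("foo" here).
        t := reflect.StructOf([]reflect.StructField{
            {Name: "OutFoo", Type: reflect.TypeOf(make(chan string))},
        })
        v := reflect.New(t).Elem()
        fmt.Println(v.Type()) // struct { OutFoo chan string }
    }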


That would be awesome.



