Easy cluster parallelization with ZeroMQ (mdup.fr)
64 points by mdup on May 27, 2015 | 18 comments



Worth mentioning that the cluster tools in IPython implement basically the same system as he ends up with, plus lots of other functionality, and they use ZMQ as the backend as well. If you've got a small- to medium-sized task (I'd say fewer than 500 nodes is a good number) to run on your cluster, ipcluster is pretty great. Even when I have a proper queuing system like SLURM or PBS in place, I still often use ipcluster on top.

http://ipython.org/ipython-doc/dev/parallel/
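For reference, here's roughly what using it looks like once an ipcluster is up (a minimal sketch; in newer versions the package is ipyparallel, older IPython uses IPython.parallel, and the function here is just a stand-in):

    # Minimal sketch: farm a function out over an already-running ipcluster.
    from ipyparallel import Client   # older IPython: from IPython.parallel import Client

    rc = Client()                    # connect to the controller
    view = rc.load_balanced_view()   # tasks go to whichever engine is free

    def slow_square(x):
        import time
        time.sleep(1)
        return x * x

    results = view.map_sync(slow_square, range(32))
    print(results)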


Thanks for the link. That might come in handy if I do any more clustering.


I'm curious how you use ipcluster on top of PBS.


If you mean the practicalities of how to use it, they describe it a bit here http://ipython.org/ipython-doc/dev/parallel/parallel_process... .

If you mean how have I used it specifically, I've used it to do a variety of things. Ranging from recreating what was described in the original post to using it as an ad hoc MPI replacement. I find it's handy for managing jobs that have some coupling (and so aren't easily suited to running via the features of the batch system), but aren't tightly coupled enough to warrant the suffering of having to write MPI-based code.
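To give a flavour of the "ad hoc MPI replacement" side of it, here's a rough sketch using a DirectView (not my actual code, just the shape of it):

    # Sketch: scatter data across the engines, run a step on each,
    # then pull the per-engine results back and combine them.
    from ipyparallel import Client

    rc = Client()
    dview = rc[:]                    # one view over all engines
    dview.block = True               # make calls synchronous for clarity

    dview.scatter('chunk', list(range(100)))   # each engine gets a slice
    dview.execute('partial = sum(chunk)')      # runs on every engine
    partials = dview['partial']                # list of per-engine results
    print(sum(partials))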


ZeroMQ is amazing. A few years ago I built a prototype project for a client that was basically fail2ban, but scalable. It monitored nginx logs and broadcast some information, and workers did the rest. Most of the data was kept in memory and passed around, and communication was done via ZeroMQ. It was done this way so that we could split the heavy-load components off the server and into workers, and allow the server to simply do its job: tail nginx logs and act upon ban requests from workers. It was amazing; sadly I never completed it or deployed it in a production environment, but from initial tests it outperformed fail2ban by a lot.
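The shape of it, roughly, was plain PUB/SUB plus a return channel (a toy pyzmq sketch, not the original code; addresses and message format are made up):

    # Toy sketch: the log tailer PUBlishes suspicious lines, workers
    # SUBscribe and do the heavy analysis, then request bans over a
    # separate channel (omitted here).
    import zmq

    ctx = zmq.Context()

    # --- on the server tailing nginx logs ---
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5556")
    pub.send_string("203.0.113.7 GET /wp-login.php 401")

    # --- on a worker ---
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://logserver:5556")
    sub.setsockopt_string(zmq.SUBSCRIBE, "")   # subscribe to everything
    line = sub.recv_string()
    # ...decide whether this IP crossed a threshold, then send a ban request...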


I had fun reading - thanks. I found the solution of having workers request work interesting. My first reflex to the round-robin argument was a better load balancer, but the requesting worker made that totally unnecessary! I wonder what downsides such architectures may have?


Nice article. I hope to see a benchmark of it on a less parallel problem with low-latency network hardware. Maybe a comparison of that against vanilla MPI or another parallel framework. Then we'd get to see what its real overhead is.


It's a bit hard to compare to MPI, which is considerably lower level and is oriented towards tightly coupled compute clusters. MPI is more about passing around data structures and not so much about dispatch and execution of independent tasks, which is more of a distributed computing type endeavor. You can build something like this on top of MPI (and I did once as an experiment), but it is tough to compare them directly since they are designed on different assumptions.
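For contrast, the MPI style in a nutshell (a toy mpi4py sketch: every rank runs the same program under mpirun and the ranks pass data among themselves, rather than pulling independent tasks off a queue):

    # Every rank computes a partial sum over its own stride, then
    # rank 0 combines them with a reduction.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    local = sum(range(rank, 1000, size))          # each rank takes a stride
    total = comm.reduce(local, op=MPI.SUM, root=0)

    if rank == 0:
        print("total:", total)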


Fair point. I've been out of cluster computing for years so what do you think would be comparable?


Somewhat tangential, but you may be interested in this essay on "HPC is dying and MPI is killing it" http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/ . There were many comments here on HN, at https://news.ycombinator.com/item?id=9335441 . The essay also has a couple of followups. It really is an excellent essay.

The author suggests GASNet, Charm++, and several other things that would be comparable to MPI.


That's an excellent essay. I agree with about every point. I worked with MPI over ten years ago. It seemed like an assembler language for cluster computing: something for raw efficiency, or to hold us over until we got a real language. I was playing with Cilk and other parallel tools then. Much better, but little adoption or optimization.

The examples given in Spark and Chapel show how much better it can be. Such methods deserve much more investment. The bad thing is that the DOE labs are clinging so hard to their existing tools when they're in the best position to invest in new ones via the huge grants they have. They built supercomputing before anyone heard of the cloud. Their wisdom combined with that of today's datacenter engineers could result in anything from great leaps in efficiency to something revolutionary. Yet, they act elitist and value backward compatibility over everything else. Starting to look like those mainframe programmers squeezing every ounce of productivity they can out of COBOL.

I think the change will have to come from outside. A group that works both sides with funding to do new projects. They either (a) improve MPI usage for new workloads or (b) get the alternatives up to MPI performance for HPC workloads. Then, they open the tool up to all kinds of new projects in both HPC and cloud-centered research (esp cloud compute clusters). Maybe the two sides will accidentally start sharing with each other when contributing to the same projects. Not much hope past that.


"They built supercomputing before anyone heard of the cloud."

Minor historical pontification on my side: The term 'cloud' is only the current term for an old idea. Before cloud there was 'grid'. Before that was the Amoeba OS and other distributed OSes. Long before that, a 1960s hope for Multics was that it would be:

> ... a utility like a power company or water company. This view is independent of the existence or non-existence of remote consoles. The implications of such a view are several. A utility must be dependable, more dependable than existing hardware of commercial general-purpose computers. A utility, by its nature, must provide service on demand, without advance notice in most cases. A utility must provide small amounts of service to small users, large amounts to large users, within very wide limits. ... http://www.multicians.org/fjcc3.html

That's shortly after the CDC 6600, and before the DOE really got into the supercomputing business.

In any case, I don't know the politics of supercomputing, and my experience in that field is even older than yours. When I look at current cloud systems and try them out, I throw my hands up, because they don't handle the types of architectures I use in my research. I do compute-heavy work on medium-sized data, with algorithms that work best with stateful nodes in a boss/worker system. Luckily for me, I don't need low latency, so building something on top of ZeroMQ is good enough for what I want, and easy to do.
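Concretely, the pattern is close to the one in the article: stateful workers that ask the boss for more work whenever they're idle. A toy pyzmq sketch of the worker side (addresses and message format are made up):

    # The worker keeps its state in memory and requests the next task
    # whenever it is free; the boss is a REP socket that replies in turn.
    import zmq

    ctx = zmq.Context()
    req = ctx.socket(zmq.REQ)
    req.connect("tcp://boss:5557")

    state = {}                          # whatever this node accumulates
    while True:
        req.send_json({"want": "work"})
        task = req.recv_json()
        if task.get("done"):
            break
        # ...update `state` using the task, maybe send results later...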


Exciting to see someone who remembers all those good old systems! I've been studying them in recent years to solve modern problems that... they already solved. Yeah, some of the techniques go way back. To be honest, I've been thinking of re-implementing a version of the Globe toolkit with modern security engineering techniques to deal with our Internet app issues.

What do you think of that? Worthwhile to leverage such an architecture?


> Exciting to see someone that remembers all those good, old systems! I've been studying them in recent years to solve modern problems that... they already solved

I saw a presentation from one of the VPs of Cray at ORNL. He spoke on the current wave of GPGPU and how similar it is to the old massively vectorized architectures in supercomputers (in particular Cray ones, since he was from Cray) from the late 70s through the early 90s. Apparently they've been able to leverage a lot of their old code in the Cray compilers for their newer GPGPU features.


Bam! Another example of people applying lessons of the past to make the present so much easier. Glad for them. Unsurprising with Cray given they're quite innovative and led the way in SIMD processing.


I don't have the experience to answer that, nor the specific interest to research the topic, sorry.


All good.


Great writeup. Perhaps slightly less clean, but you could cut the messages in half by having the "available" message report the result as well (empty in the initial case).

Is the worker idling between replying and receiving new work? In other words, does ZMQ enable overlapping, say always having two outstanding requests per worker?
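Not from the article, but as far as plain ZeroMQ goes: a REQ socket enforces strict send/recv alternation, so the worker does sit idle for one round trip. Switching the worker to a DEALER socket removes that lockstep, which also makes it natural to fold the result into the "available" message. A rough sketch (boss address and message format are made up):

    # The worker primes the pipeline with two "available" messages, so a
    # new task is usually already queued when it finishes the current one.
    # DEALER->REP framing needs an empty delimiter frame on each message.
    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.DEALER)
    sock.connect("tcp://boss:5557")

    for _ in range(2):                        # two outstanding requests
        sock.send_multipart([b"", b"available"])

    while True:
        _, task = sock.recv_multipart()
        if task == b"done":
            break
        result = task.upper()                 # stand-in for the real work
        # report the result and ask for the next task in one message
        sock.send_multipart([b"", result])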



