
Using R is basically a requirement for my work (biology) unless I felt like reimplementing all the packages I depend on, which I don't want to do since I want to graduate sometime this decade.

The multicore strategy calls mcparallel under the hood, which uses forking (hence why it doesn't work on Windows). I don't recall exactly where in the source code I figured this out (again, this was a year ago), but it suffers from the same problem: there is a limited number of communication channels (what I generically call "ports") for all interprocess communication in R, and that includes fork-based parallelism. The problem is that R does not let you tell it which port to use when communicating with forked processes. The only advantage of the fork method in R, as far as I can tell, is that the processes share memory, so you don't have to copy everything around.
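
For reference, a minimal sketch of that fork-based API (mcparallel/mccollect from the parallel package; POSIX only, and the workload here is just a placeholder):

  library("parallel")

  x <- rnorm(1e6)
  ## mcparallel() forks the current R process; the child sees 'x'
  ## through copy-on-write memory rather than an explicit copy.
  job <- mcparallel(sum(x))
  mccollect(job)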

My topology was a set of 3 analyses (independent of each other), each of which had to be run in 1000 replicates (also independent of each other), and each task had two independent subtasks whose results were combined into a final result. I was running these on a server with 48 cores.
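
For concreteness, that topology could be sketched roughly like this with the future.apply package (the subtask functions below are placeholders, not the actual analysis code):

  library("future.apply")
  plan(multisession, workers = 48)   ## multisession also works on Windows

  ## Placeholder subtasks standing in for the real analyses.
  subtask_one <- function(analysis, rep) rnorm(10)
  subtask_two <- function(analysis, rep) rnorm(10)

  ## 3 independent analyses x 1000 independent replicates,
  ## each replicate combining two independent subtasks.
  grid <- expand.grid(analysis = 1:3, rep = 1:1000)
  results <- future_lapply(seq_len(nrow(grid)), function(i) {
    a <- subtask_one(grid$analysis[i], grid$rep[i])
    b <- subtask_two(grid$analysis[i], grid$rep[i])
    mean(c(a, b))   ## combine the two subtasks into one result
  })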

I'm sure there are other more heavyweight things that I could do to fix it, like using custom packages or setting up external software, but this was for a one-off analysis for a paper. It had to be reproducibly run (hence the desire for Windows compatibility) but as long as I could run it, it didn't matter if it was the best architecture.




I'm also working in biology (at the UCSF Cancer Center), and one of the reasons the future package exists in the first place is that we needed a way to process a large number of microarrays, RNA/DNA sequencing samples and Hi-C data (and I'd like to do everything from the R prompt). We have a large compute cluster available for this (sorry @apathy, we're still on TORQUE but hope to move to Slurm soon).

Now our R script for sequence alignment basically looks like:

  ## Use nested futures where first layer is resolved
  ## via the scheduler and the second using multiple
  ## cores / processes on each of the compute node.
  library("future.BatchJobs")
  plan(list(batchjobs_torque, multiprocess))

  fastq <- dir(pattern = "[.]fq$")
  bam <- listenv()
  for (ii in seq_along(fastq)) {
     fq <- fastq[ii]
     bam[[ii]] %<-% {
       bam_ii <- listenv()
       for (chr in 1:24) {
         bam_ii[[chr]] %<-% DNAseq::align(fq, chromosome = chr)
       }
       as.list(bam_ii)
     }
  }
The future.BatchJobs package (https://cran.r-project.org/package=future.BatchJobs), which enhances the future package, uses the BatchJobs package framework as its backend.


You ran into IPC process limits. The usual way around this is just to run the tasks across a bunch of nodes, using mclapply or OpenMP (if running C library calls) on each.
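
Within a single node, that might look like the following (mclapply forks, so this is POSIX only; the task function is just a placeholder):

  library("parallel")

  task <- function(i) sqrt(i)   ## placeholder unit of work
  res <- mclapply(1:1000, task, mc.cores = detectCores())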

R is not beautiful, but it can be coerced into getting things done if need be. Any program spawning processes in user space will hit IPC limits that exist to prevent fork bombs. Either you live with it or you write threaded (ugh) or OpenMP (yay) hooks (typically via Rcpp) to sidestep this.
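
A minimal sketch of the Rcpp/OpenMP route, assuming an OpenMP-capable compiler toolchain; the threads live inside a single C++ call, so no extra R processes or IPC channels are involved:

  library("Rcpp")

  sourceCpp(code = '
    // [[Rcpp::plugins(openmp)]]
    #include <Rcpp.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    // [[Rcpp::export]]
    Rcpp::NumericVector par_sqrt(Rcpp::NumericVector x) {
      int n = x.size();
      Rcpp::NumericVector out(n);
      // OpenMP threads, not forked R processes, do the work here.
      #pragma omp parallel for
      for (int i = 0; i < n; i++) out[i] = std::sqrt(x[i]);
      return out;
    }
  ')

  par_sqrt(as.numeric(1:10))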

The reason I asked about topology and machines is that I'm one of the people who pestered Ripley to include parallel support on Windows at all. My graduate adviser wanted to run one of my analyses on Windows and I wanted to run several million of them on the cluster. So I bitched until BDR fixed it. This does not usually work...


Well done - and thanks for pushing for this (and to BDR for implementing this and many, many other things)! I didn't know this history. R users and developers have so many people to thank, alongside a lot of heroic work by a few.



