
This feels a little like using a hammer for cutting down a tree. You could do it, but there really are better tools for that particular job.



I think there are perfect use cases for this style of code. For instance, I sometimes find myself writing code something like this:

    some_value <- compute_some_value(dataset1)
    other_value <- compute_other_value(dataset2)
    final_answer <- some_value + other_value

Here compute_some_value() and compute_other_value() are independent and each takes a long time to run, so they would benefit from running in parallel. However, actually running them in parallel is tricky, because most parallel interfaces in R are modelled after lapply, which runs a single function over the elements of a list, and this doesn't fit that mold. You could parallelize it manually using primitives such as parallel::mcparallel and delayedAssign, but you don't get proper error handling or propagation, and your code gets cluttered with the implementation details of your parallelization strategy. And if you do parallelize it and someone else then calls your code in parallel, you get too many parallel processes and risk running out of memory and ending up in swapping hell.
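
To make the "manual" route concrete, here is a rough sketch using the fork-based primitives from the base parallel package (Unix-alikes only). The two compute_* functions and the datasets are hypothetical stand-ins, not real workloads, and this is only one way to wire it up:

```r
library(parallel)

# Hypothetical stand-ins for the two slow, independent computations.
compute_some_value <- function(x) { Sys.sleep(0.5); sum(x) }
compute_other_value <- function(x) { Sys.sleep(0.5); mean(x) }

dataset1 <- 1:10
dataset2 <- c(2, 4, 6)

# Fork one child process per computation (not available on Windows).
job1 <- mcparallel(compute_some_value(dataset1))
job2 <- mcparallel(compute_other_value(dataset2))

# Block until both children finish; results come back in job order.
results <- mccollect(list(job1, job2))

# A failure in a child comes back as a "try-error" object rather than
# a signalled error, so it has to be checked and re-raised by hand.
failed <- vapply(results, inherits, logical(1), what = "try-error")
if (any(failed)) stop("a parallel computation failed")

final_answer <- results[[1]] + results[[2]]
```

Three lines of logic turn into job handles, a collection step and hand-rolled error checks, which is the kind of clutter described above.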

The bottom line is that code such as the above generally just doesn't get parallelized, because the only way of doing so (as far as I know) requires pointing several guns at your foot. So this package looks very interesting and useful to me, and I also think it provides a good set of primitives with which to implement yet another "multi-backend parallel lapply" package with advantages over the others, such as doing its best to ensure consistent behavior across the different "backends".
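
For comparison, a minimal sketch of the same three lines written with future assignments, assuming the package under discussion is future and that a local multisession backend is acceptable. The compute_* functions and datasets are again hypothetical placeholders:

```r
library(future)

# Assumption: run futures in background R sessions on this machine.
plan(multisession)

# Hypothetical stand-ins for the two slow, independent computations.
compute_some_value <- function(x) { Sys.sleep(0.5); sum(x) }
compute_other_value <- function(x) { Sys.sleep(0.5); mean(x) }

dataset1 <- 1:10
dataset2 <- c(2, 4, 6)

# Each %<-% starts evaluating its right-hand side as a future and
# returns immediately; the variable resolves when it is first read.
some_value %<-% compute_some_value(dataset1)
other_value %<-% compute_other_value(dataset2)

# Reading the values blocks until both futures have finished. An error
# raised inside a future is re-signalled here, at the point of use.
final_answer <- some_value + other_value

# Return to sequential evaluation and shut down the background workers.
plan(sequential)
```

The code keeps the shape of the sequential version; the backend choice lives in the single plan() call, so whoever calls the code can swap it out.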

(Edit: Also see jonchang's comment along similar lines.)



Thank you for explaining this; I was trying to see how this would be useful. Could it be used to load data in parallel, for example reading from a CSV file and a database at the same time?


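Yes, you could draw from multiple data sources at the same time.

  library(future)      # provides %<-%
  library(dplyr)       # provides %>% and left_join()
  plan(multisession)   # without a parallel plan, futures run sequentially

  csv_data %<-% read.csv("myfile.csv")
  tsv_data %<-% read.delim("otherfile.tsv")   # base R reader for tab-separated files

  csv_data %>% left_join(tsv_data, by = "id_user")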


Could you please explain in more detail?



