
I'm excited about SparkR, even though R is shunned in the field of big data. Between that and dplyr (which inspired the SparkR syntax) for data manipulation and cleaning, it should be much easier to write sane, reproducible code and visualizations for big data analysis. (The Python/Scala tutorials for Spark gave me a headache.)

SparkR appears to have strong integration with RStudio, which is big news: http://blog.rstudio.org/2015/05/28/sparkr-preview-by-vincent...




R is absolutely not shunned in big data. It is very popular.

There is a reason Microsoft acquired Revolution Analytics.


R on Spark is great, but the biggest issue in my view is R's runtime licensing. Isn't it GPL? Am I worried for nothing?


I too have been mystified by R's licensing. I actually don't see how anyone can ship a commercial product using R in its current form. At the very least you're in a legal gray area; at worst you are involuntarily open sourcing your product. Not that there's anything wrong with open sourcing a product, but I think there's an enormous potential issue that could foul up a lot of people down the track. Even the best discussion I have seen about this pretty much ends in uninformed speculation. For now, my policy is "explore and prototype in R, build the real system in something else". Fortunately the flaws and limitations of R as a language make this a sensible choice for a host of other reasons as well.


I'm not sure R is _shunned_ in big data so much as there are better solutions once you get to a certain level of big.


It will be interesting to see how all the R libraries play with Spark. There are bound to be some hiccups there.


My interpretation is that it will convert Spark DataFrames to normal R data.frames when necessary. Unfortunately, that gives up the performance benefits of Spark.

Since SparkR currently only supports aggregation, its usability is somewhat limited. Future versions will apparently have MLlib support, which should alleviate that.



