I'm excited about SparkR, even though R is shunned in the field of big data. Bet...

threeseed · on June 12, 2015

R is absolutely not shunned in big data. It is very popular.

There is a reason Microsoft acquired Revolution Analytics.

eranation · on June 11, 2015

R on Spark is great, but the biggest issue in my view is R's runtime licensing. Isn't it GPL? Am I worried for nothing?

zmmmmm · on June 11, 2015

I too have been mystified by R's licensing. I actually don't see how anyone can ship a commercial product using R in its current form. At the very least you're in a legal gray area; at worst you are involuntarily open sourcing your product. Not that there's anything wrong with open sourcing a product, but I think there's an enormous potential issue that could foul up a lot of people down the track. The best discussion I have seen about this pretty much ends up with uninformed speculation. For now, I take the policy of "explore and prototype in R, build the real system in something else". Fortunately the flaws and limitations of R as a language make this a sensible choice for a host of other reasons as well.

mwexler · on June 11, 2015

Not sure that R is _shunned_ in big data, so much as there are better solutions once you get to a certain level of big.

IndianAstronaut · on June 11, 2015

It will be interesting to see how all the R libraries play with Spark. There are bound to be some hiccups there.

minimaxir · on June 11, 2015

My interpretation is that it will convert DataFrames to normal data.frames when necessary. Unfortunately, this gives up Spark's performance advantages.

Since SparkR currently supports only aggregation, its usability is somewhat limited. Future versions will apparently have MLlib support, which should alleviate that.