PyThor - Python meets R (nipunbatra.github.io)
113 points by nipun_batra on Jan 7, 2016 | 55 comments



Beaker Notebook provides a different model for working in Python and R. Instead of wrapping one language in the other, you get a notebook where cells can be either Python or R, fully native, and they can communicate through shared objects. The result is much simpler and more natural IMO: https://pub.beakernotebook.com/#/publications/56648fcc-2e8e-...

Beaker has a feature called "autotranslation" that converts data between R, Python, JavaScript, Julia, Scala, Clojure, and many other languages, completely automatically. Learn more at http://BeakerNotebook.com


Jupyter Notebooks (formerly IPython Notebooks, http://jupyter.org/) are kernel-agnostic and work with all of those other languages as well. I use Jupyter for R and Python myself. (For data work I still prefer RStudio or Python's Rodeo.)

Beaker uses the same kernel as IPython and the same backend as Jupyter.


That's right, but Jupyter notebooks have just one language each; they don't facilitate polyglot programming.


There are some "cell magics" for running different languages in individual blocks in a Jupyter notebook. I gave it a try in passing with JavaScript a while back, and I remember there was a way to share variables across as well.

You can see the available magics here: https://ipython.org/ipython-doc/3/interactive/magics.html#ce...

Unfortunately it looks like there is none for R or Julia, but perhaps they are installed by those specific packages, or there is some other way I don't know of.


Cell magics are exactly what I was looking for. Much easier to do some data cleanup stuff in Python and move stats heavy stuff to R.


RMagic for R and Python in Jupyter https://www.youtube.com/watch?v=StX_F_kq_C0
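
For anyone who wants to try it without watching the whole video, here's a rough sketch of the usual pattern (assuming rpy2 is installed; the data frame and the toy model below are made up purely for illustration):

    # Cell 1 (Python): load rpy2's notebook extension
    %load_ext rpy2.ipython

    import pandas as pd
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                       "y": [2.1, 3.9, 6.2, 8.1]})

    # Cell 2 (R, via the %%R cell magic): -i pushes df in, -o pulls coefs out
    %%R -i df -o coefs
    fit <- lm(y ~ x, data = df)
    coefs <- coef(fit)

    # Cell 3 (Python): coefs is now available back on the Python side
    print(coefs)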


Good video. One thing I am looking to do is call Python from an R notebook, but that doesn't seem to be possible with the magic functions.


Jupyter notebooks do support multiple languages. However, sharing objects among the languages is another matter entirely.


The object-sharing thing seems like it would be fraught with errors and mismatches. R is structured fundamentally differently from Python, from factors to numerics to vectors.
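
To make that concrete, here's rpy2 (nothing to do with Beaker's implementation, just the kind of mismatch I mean):

    # Sketch with rpy2: a lone R number is still a length-1 vector, and a
    # factor comes across as its integer codes unless you ask for the levels.
    import rpy2.robjects as ro

    x = ro.r('3.14')
    print(len(x), x[0])     # 1 3.14 -- R has no true scalars

    f = ro.r('factor(c("a", "b", "a"))')
    print(list(f))          # [1, 2, 1] -- the underlying codes
    print(list(f.levels))   # ['a', 'b']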


R's "everything is a vector" approach does introduce a wrinkle, but it's not hard to work with once you expect it: https://pub.beakernotebook.com/#/publications/560b5722-f287-...

If you can find an error in our implementation I would be happy to hear about it (ideally file an issue on github).

I am not an expert in R and I don't expect autotranslation to cover every case, but I think what we have is useful, and I look forward to continuing to improve it based on community feedback.

Thanks, -Scott


I'll give this another look. I was hoping for some things in the Jupyter notebook that didn't materialize.


Org-mode has a similar layout with its org-babel stuff in Emacs. So glad to see something similar outside the Emacs ecosystem; I would love to get an org-mode exporter/importer for Beaker notebooks so I can collaborate with coworkers on higher-order transformations.


Very interesting! Thanks for the comment! I do see the advantages of Beaker. I also see the advantages the other way around :)


I thought this was interesting but the resulting code was so ugly and awkward that I would refrain from using it. I would just prototype in R and convert to Python rather than using both in this sort of Frankenstein way.


Just as a point of interest: why would you convert an R program into a Python program? I can't think of any benefit in terms of speed or reporting.


Speed and scale (and reporting too, I guess) are very important considerations for data analysis.


But I wouldn't see Python as being speedier or better at scale, and certainly not at reporting. What are the speed and scale advantages of Python over R? (Seriously interested.)

Going from R to Clojure or Julia seems like a gain, but Python?


Sorry for the delay. I think it's just better interoperability with many other systems. R's interoperability feels very hacked-on in my opinion, but I am not an expert.


This assumes you're willing to convert someone's cutting-edge statistical research package in R to Python just because the code is 'ugly and awkward'.


I just can't wrap my head around R. I'd love it if there were an automatic translator from R to Python. Even though the Python code would be more cluttered, it would probably be easier to read :)


R has a lot of choices. I find the Hadley universe of libraries (ggplot2, ggvis, dplyr, tidyr, stringr, readr and others), together with %>% piping, to be the easiest code to read and the easiest way to get things done. R has been transformed in the last 5 years but still gets a bad rap, which I don't feel it deserves.

R is really more of a functional language, which most people don't recognize, and when they learn the standard core of R, with its Lisp-inspired ideas, it really throws them for a loop. I ended up learning Racket (after trying to teach myself Haskell 3 times) and I can say I now "get it," but using the new libraries has really made that point moot.

I love R and am excited to see all the support from Microsoft and other large companies jumping on board.

This does look like a decent way to move to Python, though. If someone were to do that, I hope they use the Rodeo IDE.


You mention that R doesn't deserve the bad rap it had 5 years ago. Less than two years ago I learned the hard way that R's reference counting had only three possible counter values: 0, 1, and 2+. So if you took a second reference to something and then deleted it, that object's reference count would stay at 2+ forever. Then if you modified it, copy-on-write would kick in, even though there was only one live reference. This was the source of crippling inefficiency in production code. I asked about it back then and the reply was: yeah, R does that, hopefully it won't soon. Broken copy-on-write was not acceptable to me in 2014.


I'll second that R is really awesome! As a computer scientist, I find it opens up a lot of statistical packages for me.


R is the only real game in town for free software for scientific computing these days.

From a political perspective R was in the right place at the right time. It was a decent high level language that could handle matrix processing gracefully. Scipy/Numpy weren't ready for production yet. The others were Matlab, SAS, and Stata, all of which R makes look like APL.

I'm glad the field now has a healthier open source landscape, with worthy competitors in the form of Julia and Numpy/Scipy.

R's biggest sin is failing to force people to use functional paradigms: it provides juuuust enough imperative sugar to make the average Joe programmer feel at home. R is a functional language, and that's the principle under which it should be taught.

That said, R also has a number of major technical strengths:

1. Fast basic statistics within the REPL so you can test hunches quickly.

2. Cutting edge algorithms that often aren't implemented anywhere else.

3. Hugely strong engineering packages for civil, environmental, defense, aerospace and basically any IRL engineering field you can think of.

R comes free, with these advantages and many more, for the average lab tech.


> R is the only real game in town for free software for scientific computing these days.

You have a very very narrow definition of scientific computing, excluding a huge part of the field, i.e. anything written in C, C++ or Fortran. And I've never seen people in aerospace or civil engineering use R, it seems to be mostly popular in statistics heavy, "softer" fields such as biology or economics.

Edit: to give some examples of what I mean: show me a (molecular dynamics|computational fluid dynamics|finite element method|Poisson solver|magnetohydrodynamics solver|electrodynamics solver|general relativity code|quantum many-body solver|lattice field theory code) written in R. I haven't seen any.


Just wanted to find a few. R has such a strong Fortran code base that I knew these had to be in R somewhere.

Molecular dynamics - https://cran.r-project.org/web/packages/bio3d/index.html

computational fluid dynamics - http://search.r-project.org/library/rjacobi/html/xinterp.htm...

finite element method - https://cran.r-project.org/web/packages/RTriangle/RTriangle....

Poisson - https://cran.r-project.org/web/packages/isotone/isotone.pdf


None of those are actually simulation codes. The first and third are pre- and post-processing tools. The second is an interpolation tool. The fourth uses Poisson distributions, which is very different from solving the Poisson equation.


We just have a mismatch in the term "scientific programming" based on our perspectives. You seem like you're in the harder sciences in academia, while I'm in data science in industry.

I'll certainly cede the point that there's a great deal of important scientific code in many languages that can't be accessed from R.


>We just have a mismatch in the term "scientific programming" based on our perspectives.

Yes, exactly this.


The thing is, R interoperates very well with C, C++ and Fortran. So when someone who uses R needs to solve one of those problems, they'll generally just write it in C, C++ or Fortran, then call the function/program from R, get the results, chart/analyse them, etc...

And of course, R makes data exploration, statistics, and all those easier problems incredibly simple.


Don't get me wrong, I'm a frequent R user, and it is definitely useful for analysing simulation results. My point was just that there are a lot of things outside of R's capabilities. Even for analysis there are areas where R is of no use, e.g. when plotting data from 3D turbulence simulations, like Q-criterion isosurfaces:

https://www.nas.nasa.gov/SC12/assets/images/content/Chaderji...


Yeah, for legacy code or speed, you might need to drop down to lower-level libraries. But R makes that pretty easy.


> R's biggest sin is failing to force people to use functional paradigms: it provides juuuust enough imperative sugar to make the average Joe programmer feel at home. R is a functional language, and that's the principle under which it should be taught.

100% in agreement. That states exactly what I discovered after learning functional programming.

The sad thing is people don't realize that it is functional PAST first-class functions. http://link.springer.com/chapter/10.1007%2F978-3-642-40447-4...


And lists + dataframes as first-class citizens. Everything (including dataframes) is pretty much a list. Most R packages nowadays put dataframes at the centre of their implementation.

Compare that to other languages, where lots of code is needed to convert data from one format to another because there's no common underlying data structure (Python is much better than the rest in that regard, but still not as good as R).


"From a political perspective R was in the right place at the right time. It was a decent high level language that could handle matrix processing gracefully. Scipy/Numpy weren't ready for production yet. The others were Matlab, SAS, and Stata".

This is forgetting xlispstat, which was frankly the best of the lot. Free, functional (based on a subset of Common Lisp), and with dynamic graphics capabilities that R is now only beginning to match. But the problem I think was that it was maybe too early. In the 1990s, Lisp and functional programming seemed to be on the way out and object-orientation was the big thing (It's telling that the book on XlispStat is called "LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics", hyping the objects rather than the functional aspects). If it had come out now, with the current interest in FP/Lisp thanks to Clojure and Racket, it probably would have been more successful.


"Matlab, SAS, and Stata, all of which R makes look like APL" -- I am a little confused, what do you mean by that?

(and tbh, I would not call either SAS's or Stata's matrix programming particularly graceful -- it's pretty hard to beat Matlab at that, though)


> R is really more of a functional language, which most people don't recognize, and when they learn the standard core of R, with its Lisp-inspired ideas, it really throws them for a loop.

Just to clarify, R was initially a dialect of Scheme. On the other hand, with its frustrating silent type changes, you can see its S roots in the same place that gave us C and C++. That's quite a combination in terms of learning curve.


There is Lisp stuff in R that doesn't come from Scheme, like an object system inspired by CLOS, FEXPRs from very old Lisps, ...


R is all about data structures. Everything is built from vectors and lists. Arrays are vectors with a 'dimension' attribute. Data-frames are lists of vectors of the same length. And so on. And factors, which again are a kind of vector, are the primary tool for partitioning the data in groups, so you can have 'ragged' arrays. When you understand how all these work together you get the hang of R.
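
You can even see the layering from Python via rpy2; a quick sketch (toy objects only, rpy2 assumed installed):

    # Inspecting R's data model from Python with rpy2
    import rpy2.robjects as ro

    a = ro.r('array(1:6, dim = c(2, 3))')   # a vector plus a 'dim' attribute
    print(list(ro.r('dim')(a)))             # [2, 3]

    df = ro.r('data.frame(x = 1:3, y = c("a", "b", "c"))')
    print(ro.r('is.list')(df)[0])           # True: a data.frame really is a
                                            # list of equal-length vectors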


Almost all of what you said applies equally to Python when using NumPy, Pandas, and SciPy.

In R, a Factor is also the bizarre result you get if you load a flat file incorrectly. Lots of things in R proceed without stopping on errors, and you end up with weird data that isn't really usable but still lets your program continue.


Language wars again :) I think both languages have their own strengths. I come from a programming background and took to Python. However, I often end up in situations where there are R implementations of some advanced statistical routines and none exist for Python. I'm sure the reverse is also true. So, this is an attempt from that angle :)


> Lots of things in R proceed without stopping on errors

I'm not sure what that means, because a default R installation will always stop on an error.

> a Factor is also the bizarre result you get if you load a flat file incorrectly

That's user error. No programming language can catch that - it can only detect errors that are defined as errors.


I am curious to know your opinion about this discussion, then: http://r.789695.n4.nabble.com/Stopping-all-code-execution-wh...

People using default R installations report that special commands are needed to make it stop on all errors. Are they delusional? What about the R experts who give them solutions, are they doling out placebos?


The poster in that thread was doing the equivalent of stepping line by line through a block of code manually, ignoring all of the errors and manually executing the next line anyway.


That's true; however, NumPy, Pandas, and SciPy are external libraries. I think that's a slight disadvantage. In R all these fancy data structures are built into the language and are used everywhere in a natural way, whereas NumPy feels a little bit like an appendage, or like a language inside a language. That said, I don't dislike Python/NumPy and I think it definitely has its uses.


> Lots of things in R proceed without stopping on errors

This alone would make me run away very fast from R.


The problem is it's not actually an error. The quoted bit is using "error" to mean "R did something I wish it hadn't."


R is still great. Just don't try to write anything production-worthy in R.


The biggest concern I have heard from friends who work in stats and mathematics academia is that it is much, MUCH easier to publish a package on CRAN than on PyPI.

I think most users of Python are just that... "users". But R is used by creators of algorithms, who find it very easy to create and distribute code.

In fact a lot of them talk about how difficult it is to install packages in Python, while any R script has "install.packages" right in it.

I'm willing to bet if Pandas/Numpy/Jupyter built a CRAN-PY and made the packaging system similar to R, the adoption would be much different.


I do agree that stats people love R. This is one of the reasons I wrote this article: so that I can leverage all the good work they've done. It is way better to write a wrapper than to reimplement everything from scratch. I'm coming from academia, where I do a lot of systems work, so I tend to be a Python-first user. But I do want to use all the packages that are exclusively available in R.
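
For what it's worth, a hand-written wrapper can be pretty small. A minimal sketch of the idea with rpy2 (just one way to do it, not necessarily exactly what the post does; the function name here is only an example):

    # Wrap R's lowess() so callers only ever see plain Python lists
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    stats = importr('stats')

    def fit_lowess(x, y, frac=0.2):
        res = stats.lowess(ro.FloatVector(x), ro.FloatVector(y), f=frac)
        # lowess() returns an R list with components 'x' and 'y'
        return list(res.rx2('x')), list(res.rx2('y'))

    xs, ys = fit_lowess([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])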

Packaging is improving in Python afaik. People like Anaconda!


It's not packaging per se - conda and gem are brilliant, but they're intended for software creators and not math academics. install.packages may not be a brilliant piece of software, but it's perfect for people who develop stats algorithms and just publish to CRAN.

This is the package submission page for CRAN - https://cran.r-project.org/submit.html

This is the equivalent for PyPI - https://pypi.python.org/pypi?%3Aaction=submit_form


Submitting a package is the least important part of distributing it. For CRAN you can use the submit form you link to, for Python it's `python setup.py sdist upload` and you're done. CRAN in fact has much stricter standards for packages than Python -- you need a vignette, you need tests etc. -- which is why new R packages are usually on GitHub long before they make it to CRAN, whereas PyPI is happy to accept alpha and beta releases without documentation and without tests.


That's actually interesting to know - it would be illuminating to find out what the mental roadblock really is there, because this is something I have heard from multiple academics.


I wonder if a fully automatic translator would ever be feasible. The current approach of handwriting the wrapper code for the packages I'm interested in seems to work well. I guess with a bit of community support, the recipe set can become richer.


It is probably easier to just wrap R functions as a subprocess in Python. You would have to write some R code to do this, which might involve writing to a flat file. I do this all the time in R, wrapping up something like rgf and calling it from within R. I have wrapped up Python and called it from R using system(). The same idea works for Python as well. You could probably write some kind of general wrapper that would work for most of R's ML-type functions.
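
Something like this, roughly (the file names and the R one-liner are made up for illustration):

    # Python side: write a CSV, run an R one-liner with Rscript, read it back
    import subprocess
    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.1, 3.9, 6.2, 8.1]})
    df.to_csv("in.csv", index=False)

    r_code = ("d <- read.csv('in.csv'); "
              "write.csv(coef(lm(y ~ x, data = d)), 'out.csv')")
    subprocess.check_call(["Rscript", "-e", r_code])

    print(pd.read_csv("out.csv"))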



