
Not a lot of people realize that Pandas was inspired by R, and in particular the Tidyverse model of handling rectangular data frames, created originally by Hadley Wickham. These days R is primarily used by data scientists in academia and certain niche industries like pharma, but its impact goes way beyond its core user base.



> and in particular the Tidyverse model

This isn't really true.

R took data frames from S, which was using the concept at least as early as 1991.

Pandas itself predates most if not all of the tidyverse. Pandas' original release occurred in 2008, whereas the first release of dplyr (one of the original packages of the tidyverse) didn't come until 2014.


It's definitely true that R's base data frame (a rectangular structure with columns of mixed types, which R in turn inherited from S) was the inspiration for Pandas. The concept of verbs operating on those structures IMO was inspired by plyr (the antecedent to dplyr, first published in 2008, which introduced composability for those verbs). data.table was also an inspiration, as another commenter points out.
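For anyone unfamiliar with the idea, here's a toy Python sketch of "a rectangular structure with columns of mixed types" plus a couple of verbs that compose. It's illustrative only: `filter_rows`, `mutate` and `pipe` are hypothetical helpers loosely echoing dplyr's verb names, not the API of pandas, plyr or any other library.

```python
from typing import Any, Callable

# Toy rectangular table: column name -> list of values (hypothetical, not a library type).
# All columns have the same length, but each column can hold a different type.
Table = dict[str, list[Any]]
Row = dict[str, Any]


def rows(t: Table) -> list[Row]:
    """View the table row by row, each row as a dict of column -> value."""
    names = list(t)
    n = len(t[names[0]]) if names else 0
    return [{c: t[c][i] for c in names} for i in range(n)]


def from_rows(rs: list[Row], names: list[str]) -> Table:
    """Rebuild a column-oriented table from row dicts, keeping column order."""
    return {c: [r[c] for r in rs] for c in names}


def filter_rows(pred: Callable[[Row], bool]) -> Callable[[Table], Table]:
    """Verb: keep only the rows for which pred(row) is true."""
    def step(t: Table) -> Table:
        return from_rows([r for r in rows(t) if pred(r)], list(t))
    return step


def mutate(name: str, fn: Callable[[Row], Any]) -> Callable[[Table], Table]:
    """Verb: add (or overwrite) a column computed from each row."""
    def step(t: Table) -> Table:
        out = dict(t)
        out[name] = [fn(r) for r in rows(t)]
        return out
    return step


def pipe(t: Table, *steps: Callable[[Table], Table]) -> Table:
    """Compose verbs by applying them left to right."""
    for step in steps:
        t = step(t)
    return t


# Usage: a string column, a float column and a bool column in one table.
df: Table = {"name": ["a", "b", "c"], "score": [1.5, 3.0, 4.5], "ok": [True, False, True]}
result = pipe(
    df,
    filter_rows(lambda r: r["ok"]),
    mutate("double", lambda r: r["score"] * 2),
)
print(result)
```

Each verb takes a table and returns a new one, which is what lets them chain. Real libraries store typed columns far more efficiently; the row-dict round trip here is just for readability.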


> The concept of verbs operating on those structures IMO was inspired by plyr

Was it?

(I have no idea. But R already had verbs operating on data frames.)


As nonfamous puts it, the concept of 'a rectangular structure with columns of mixed types' as a programming language concept goes back at least to SAS (1976). Probably older.

In terms of that organization for persistent storage, it certainly goes back to the earliest computers and even pre-computer punch card sorting systems.


> R took data frames from S, which was using the concept at least as early as 1991.

And, according to its author, Pandas took data frames from R - where data frames had been present from at least as early as 1997. (That part at least was true.)


But they were inherited from S, which was first released in 1973.


S was first released in 1973, but data frames were added almost two decades later, as mentioned in another comment. They were reimplemented a few years after that in R and, roughly a decade later, served as inspiration for pandas.


Yeah, I know. Apologies, I was on mobile and didn't want to look up which S book data.frames were in.


In fact I seem to recall that the Tidyverse paper references Pandas.


R also has the data.table package, possibly the fastest dataframe package in any language.


I wonder if it's faster than Julia's DataFrames.jl.


Polars is apparently faster, though I haven't tested it myself.


The original Polars launch blog was proud of coming a distant second to data.table, but I see the latest benchmarks against the Python port of data.table give Polars a tiny edge.


Both polars and data.table should be proud!

Fast software only ever helps.


> Not a lot of people realize that Pandas was inspired by R, and in particular the Tidyverse model of handling rectangular data frames, created originally by Hadley Wickham.

Can you give a citation for this, preferably from Wes himself?


Dplyr API design is leagues ahead of Pandas, though. In part this is because Python doesn't have first-class symbols or lazy evaluation, but it's also because the dplyr authors have a better feel for ergonomics.

Sadly this 2.0 release makes clear that Pandas is unlikely to evolve its API much further.
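To make the symbols/lazy-evaluation point concrete, here's a rough sketch of one filter-then-mutate pipeline written two ways in pandas. The data and column names are made up; `DataFrame.loc` with a callable, `assign`, `query` and `eval` are standard pandas methods. The dplyr line in the comment is only for comparison.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4]})

# dplyr: df %>% filter(x > 1) %>% mutate(y = x * 2)
# R can accept the bare name `x` because arguments are captured unevaluated.
# Python evaluates arguments eagerly, so pandas needs a workaround.

# Workaround 1: pass callables that receive the intermediate DataFrame.
via_lambda = (
    df.loc[lambda d: d["x"] > 1]
      .assign(y=lambda d: d["x"] * 2)
)

# Workaround 2: pass strings that pandas parses itself.
# eval() with an assignment returns a new DataFrame (inplace=False by default).
via_query = df.query("x > 1").eval("y = x * 2")

print(via_lambda)
print(via_query)
```

Both give the same result, but neither is as terse as dplyr: the lambda version repeats `d["x"]`, and the string version loses editor completion and static checking.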



