As far as I know, the term "data frame" comes from R and its predecessors like S, where a data frame is the core data structure. Logically, it is indeed like an SQL table -- a column has a single type whereas a row has heterogeneous types.
AFAIK, all implementations are column-oriented, which admits certain kinds of implementation and optimization. SQL databases are mostly row-oriented, probably since updating a row at a time is a common operation.
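To make the layout difference concrete, here is a minimal plain-Python sketch, not how any particular system is implemented. The table contents and the names `rows` and `columns` are made up for illustration. It stores the same three records row-wise and column-wise: aggregating one column reads a single list in the columnar layout, while updating one record touches a single object in the row layout.

```python
# Row-oriented: one record per entry, each record mixes types.
rows = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 27},
    {"name": "cy", "age": 41},
]

# Column-oriented: one homogeneous list per column.
columns = {
    "name": ["ann", "bob", "cy"],
    "age": [34, 27, 41],
}

# Column scan: the columnar layout reads exactly one list.
mean_age_cols = sum(columns["age"]) / len(columns["age"])
# The row layout has to visit every record to pull out one field.
mean_age_rows = sum(r["age"] for r in rows) / len(rows)

# Row update: the row layout replaces one record in place...
rows[1] = {"name": "bob", "age": 28}
# ...while the columnar layout must write into every column.
for col, value in (("name", "bob"), ("age", 28)):
    columns[col][1] = value
```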
I would think of it as a table, but embedded in a programming language rather than a database (so you don't use SQL), with more operations, and which is very often used in a read-only fashion.
The syntax in R is nicer than SQL in my opinion. It's more algebraic and composable. Instead of "SELECT name, address FROM foo WHERE age > 30", you can write foo[foo$age > 30, c('name', 'address')].
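For comparison, the same query can be sketched by hand over a column-oriented table in plain Python. The table `foo` and its contents are invented for illustration, and `select` is a hypothetical helper, not part of any library. It only mirrors the shape of the R expression: a boolean mask for the rows, a list of names for the columns.

```python
# Hypothetical table: one list per column, all the same length.
foo = {
    "name": ["ann", "bob", "cy"],
    "address": ["1 Elm St", "2 Oak Ave", "3 Pine Rd"],
    "age": [34, 27, 41],
}


def select(table, mask, names):
    """Keep the rows where mask is True, and only the named columns."""
    return {
        name: [v for v, keep in zip(table[name], mask) if keep]
        for name in names
    }


# Like foo$age > 30: an elementwise comparison yielding a boolean vector.
mask = [age > 30 for age in foo["age"]]

# Like foo[foo$age > 30, c('name', 'address')].
result = select(foo, mask, ["name", "address"])
print(result)
```

Because the mask is an ordinary value, it can be stored, combined with other masks, and reused, which is roughly what makes the R form composable.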
> d=data.frame(a=c(1,2,3),b=c(4,5,6))
> e=list(a=c(1,2,3),b=c(4,5,6))
> class(d)
[1] "data.frame"
> class(e)
[1] "list"
> d[c(TRUE,FALSE),]
  a b
1 1 4
3 3 6
> e[c(TRUE,FALSE),]
Error in e[c(TRUE, FALSE), ] : incorrect number of dimensions
They are represented similarly in R, but they are distinct data types. The data frame is the core data structure in the sense that many functions in R operate on data frames (but not lists of vectors).
1. By convention, assignments in R use `<-` rather than `=`. Both assign at the top level, but inside a function call `=` binds a named argument instead of creating a variable.
2. You shouldn't use `class()` here but `mode()` to check the actual underlying data structure.
> mode(d)
[1] "list"
> mode(e)
[1] "list"
3. The reason `[` works differently is that it is an S3 generic, which invokes different functions for lists and data.frames depending on the class -- that's why class(d) doesn't return "list". See `methods("[")`.
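A loose analogy in Python, offered only as a sketch: two objects with the same underlying storage but different classes, where indexing behaves differently because it is looked up on the class. `Frame` is a hypothetical class written for this example. Unlike R, the mask here is not recycled, so it is given at full length.

```python
class Frame(dict):
    """Same underlying storage as a dict of columns, but a different class."""

    def __getitem__(self, key):
        # A (row_mask, column_names) pair selects rows and columns.
        if isinstance(key, tuple):
            mask, names = key
            return Frame(
                (n, [v for v, keep in zip(dict.__getitem__(self, n), mask) if keep])
                for n in names
            )
        # Anything else is an ordinary column lookup.
        return dict.__getitem__(self, key)


e = {"a": [1, 2, 3], "b": [4, 5, 6]}  # plain dict of columns
d = Frame(e)                           # same data, different class

# Both are dicts underneath (compare mode() in R)...
same_storage = isinstance(d, dict) and isinstance(e, dict)

# ...but indexing depends on the class (compare class() in R).
picked = d[[True, False, True], ["a", "b"]]

# The plain dict treats the pair as a single key and fails to find it.
try:
    e[(True, False, True), ("a", "b")]
    plain_dict_supports_it = True
except KeyError:
    plain_dict_supports_it = False
```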
Same in pandas, although it's closer to a dictionary of column names to singly-typed vectors. Most simple columnar databases are structured that way as well (a more complicated design involves chunking the columns into pages).
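A rough standard-library sketch of that shape, assuming nothing about how pandas is actually implemented: a dictionary from column names to typed vectors, here `array.array` objects, which reject values of the wrong type. The column names and values are invented for illustration.

```python
from array import array

# Each column is a singly-typed vector: 'q' = signed 64-bit int, 'd' = double.
frame = {
    "a": array("q", [1, 2, 3]),
    "b": array("d", [4.0, 5.0, 6.0]),
}

# A column is homogeneous: appending a string to the integer column fails.
try:
    frame["a"].append("four")
    rejected = False
except TypeError:
    rejected = True

# A row is heterogeneous: it is assembled by indexing every column.
row = {name: col[0] for name, col in frame.items()}
```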
Some links:
http://www.r-bloggers.com/select-operations-on-r-data-frames...
Pandas is a data frame library for Python, modeled on R's data frames:
http://pandas.pydata.org/pandas-docs/stable/basics.html
This article explains the relevance of the relational model to data analysis / statistics (rows are observations, columns are variables):
https://scholar.google.com/scholar?cluster=77966238326629329...