They absolutely are faster, because there are optimizations available on DataFrames that are impossible on RDDs.

Pushdown is the most obvious one. To push a projection down, I need to know what data store is underlying your RDD, what your schema is, and which columns you're projecting. If I'm missing any of those, pushdown is impossible. With an RDD I can't know any of them, because all I know when you call map is that you're converting from type A to type B.
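To make the RDD side concrete, here's a toy model in plain Python. It is not Spark, and `ColumnStore`, `read`, and `run_map` are hypothetical names. The engine is handed an opaque function, so the only safe thing it can do is read every column before calling it.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class ColumnStore:
    """Hypothetical columnar store that records which columns were read."""

    def __init__(self, columns: Dict[str, List[object]]):
        self.columns = columns
        self.columns_read: List[str] = []

    def read(self, wanted: List[str]) -> List[Row]:
        # Track what was actually fetched from storage.
        self.columns_read.extend(wanted)
        n = len(next(iter(self.columns.values())))
        return [{c: self.columns[c][i] for c in wanted} for i in range(n)]


def run_map(store: ColumnStore, fn: Callable[[Row], object]) -> List[object]:
    # RDD-style: fn is an arbitrary A -> B function. The engine can't see
    # which fields it touches, so it has to materialize every column.
    rows = store.read(list(store.columns))
    return [fn(r) for r in rows]


store = ColumnStore({"id": [1, 2, 3], "name": ["a", "b", "c"], "blob": [b"x", b"y", b"z"]})
names = run_map(store, lambda r: r["name"])
print(names)               # ['a', 'b', 'c']
print(store.columns_read)  # ['id', 'name', 'blob']
```

The lambda only needs `name`, but `blob` gets read anyway, because nothing about a callable tells the engine otherwise.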

DataFrames make that class of optimization possible because they carry more information (your schema, the underlying store) and expose more limited operations (selecting a column rather than running an arbitrary map).
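Here's the DataFrame side of the same toy model, again plain Python rather than Spark's API, with `ColumnStore`, `ToyFrame`, `select`, and `collect` as hypothetical names. Because `select` names its columns, the engine can hand exactly that list to the store.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class ColumnStore:
    """Hypothetical columnar store that records which columns were read."""

    def __init__(self, columns: Dict[str, List[object]]):
        self.columns = columns
        self.columns_read: List[str] = []

    def read(self, wanted: List[str]) -> List[Row]:
        # Track what was actually fetched from storage.
        self.columns_read.extend(wanted)
        n = len(next(iter(self.columns.values())))
        return [{c: self.columns[c][i] for c in wanted} for i in range(n)]


class ToyFrame:
    """Hypothetical DataFrame-like handle: operations build a plan; nothing runs until collect()."""

    def __init__(self, store: ColumnStore, projection: Optional[List[str]] = None):
        self.store = store
        # The schema (column names) is known up front.
        self.projection = projection if projection is not None else list(store.columns)

    def select(self, *cols: str) -> "ToyFrame":
        # A named-column projection is declarative: the engine sees exactly
        # which columns are needed and can validate them against the schema.
        missing = [c for c in cols if c not in self.projection]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        return ToyFrame(self.store, list(cols))

    def collect(self) -> List[Row]:
        # Push the projection down: only the selected columns leave storage.
        return self.store.read(self.projection)


store = ColumnStore({"id": [1, 2, 3], "name": ["a", "b", "c"], "blob": [b"x", b"y", b"z"]})
rows = ToyFrame(store).select("name").collect()
print(rows)                # [{'name': 'a'}, {'name': 'b'}, {'name': 'c'}]
print(store.columns_read)  # ['name']
```

In real Spark the optimizer does this for you; with a Parquet source, `df.select("name").explain()` shows the pruned `ReadSchema` in the physical plan.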