The Data Engineering Ecosystem: An Interactive Map

jkestelyn · on March 6, 2015

Very nice effort. A couple of things are missing, though, to make it truly up to date:

1. In "Ingest", where's Flume?
2. Where's "Interactive SQL" (e.g., Impala and Presto)?
3. Where's "Search" (Solr, Elasticsearch)?

ddrum001 · on March 6, 2015

Thanks. If you mouse over the Batch and Database boxes, you'll see more categories. Presto is under Batch SQL and Elasticsearch is under Document Oriented.

We see Flume, Solr, and Impala a lot but decided to omit them to strike a balance between including more technologies and overwhelming people who are new to the field. Inevitably, we had to leave off many of our favorite technologies.

bzz01 · on March 6, 2015

While all maps like this tend to make little practical sense, since they inevitably over-generalize and over-simplify things, I'd still like to point out that they got the "columnar" category quite wrong: neither HBase nor Cassandra is a columnar store in the way the term is commonly understood.

HBase and Cassandra still store data in rows; however, rows can be partitioned into column families, which may be stored separately. Columnar databases are usually also relational (Vertica and Redshift, for example) and support SQL or a SQL-like query language.

Anyway, I think that regardless of how you define columnar, HBase and Redshift shouldn't end up in the same category, as they are quite different in how they work, in throughput/latency, in read/write balance, and in use cases.

ddrum001 · on March 6, 2015

You're correct that it's very difficult to balance between giving an overview and over-simplifying. We intended this to be a starting point, knowing that we'd have to make a trade-off between including more details and giving a concise overview.

For NoSQL specifically, it's difficult to fit every database into only a few categories. We discussed that Cassandra and HBase are "maps of maps" in the details, and we definitely didn't want to imply that they have the same model or use cases as Redshift. Perhaps our next iteration will explain the "column family" concept more and include a separate category for the databases that were inspired by the Bigtable data model.
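
As a rough illustration of the distinction discussed above, here is a minimal Python sketch contrasting a "map of maps" layout (row key, then column family, then column) with a columnar layout (one array per column). It is only a toy model, not how HBase, Cassandra, or Redshift are actually implemented; the table contents and function names are hypothetical, and details such as cell timestamps, compression, and on-disk formats are ignored.

```python
# Hypothetical sample data; all names and values are illustrative only.

# "Map of maps" layout: row key -> column family -> column -> value.
# Rows may be sparse: "user:2" has no "city" column at all.
wide_column_table = {
    "user:1": {"profile": {"name": "Ada", "city": "London"}, "stats": {"logins": 12}},
    "user:2": {"profile": {"name": "Lin"}, "stats": {"logins": 7}},
}

# Columnar layout: one array per column, aligned by position.
# Missing values must be represented explicitly (here as None).
columnar_table = {
    "user_id": [1, 2],
    "name": ["Ada", "Lin"],
    "city": ["London", None],
    "logins": [12, 7],
}


def get_row(table, row_key):
    """Point lookup by row key: a single access in the map-of-maps layout."""
    return table[row_key]


def total_logins_wide(table):
    """Aggregating one column means visiting every row in the map-of-maps layout."""
    return sum(row.get("stats", {}).get("logins", 0) for row in table.values())


def total_logins_columnar(table):
    """The same aggregate reads a single contiguous column in the columnar layout."""
    return sum(table["logins"])


if __name__ == "__main__":
    print(get_row(wide_column_table, "user:1"))
    print(total_logins_wide(wide_column_table))
    print(total_logins_columnar(columnar_table))
```

In this toy model, the first layout favors lookups and writes by row key, while the second favors scans and aggregates over a few columns, which is roughly the difference in use cases raised in this thread.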

iblaine · on March 6, 2015

No real-time MPP databases like Redshift, Netezza, Aster... otherwise, good graph.

ddrum001 · on March 6, 2015

Thanks. If you mouse over the Databases box, you'll see Redshift under Columnar. We do see a lot of Netezza and Aster as well.

sampathweb · on March 6, 2015

Love the text overlay. It would be nice to also have project links in the text.

ddrum001 · on March 6, 2015

Thanks, we'll definitely include links in our next iteration.