The Data Engineering Ecosystem: An Interactive Map (insightdataengineering.com)
47 points by jecs321 on March 6, 2015 | 8 comments



Very nice effort. A couple of things are missing, though, to make it truly up to date:

1. In "Ingest", where's Flume? 2. Where's "Interactive SQL" (e.g. Impala and Presto)? 3. Where's "Search" (Solr, Elasticsearch)?


Thanks. If you mouse over the Batch and Database boxes, you'll see more categories. Presto is under Batch SQL and Elasticsearch is under Document Oriented.

We see Flume, Solr, and Impala a lot but decided to omit them to strike a balance between including more technologies and overwhelming people who are new to the field. Inevitably, we had to leave off many of our favorite technologies.


While all maps like this tend to make little practical sense, since they inevitably over-generalize and over-simplify things, I'd still like to point out that they got the "columnar" category quite wrong: neither HBase nor Cassandra is a columnar store in the way this term is commonly understood.

HBase and Cassandra still store data in rows; however, rows can be partitioned into column families, which may be stored separately. Columnar databases are usually also relational (e.g. Vertica and Redshift) and support SQL or a SQL-like query language.

Anyway, I think that regardless of how you define columnar, HBase and Redshift shouldn't end up in the same category, as they differ considerably in how they work, in throughput/latency, in read/write balance, and in use cases.
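
To make the row vs. column distinction concrete, here is a toy Python sketch. It is only an illustration under made-up assumptions (the sample data and function names are hypothetical) and does not reflect how HBase, Cassandra, Vertica, or Redshift actually lay out data on disk. It only shows why an aggregate over one field touches less data in a column-oriented layout.

```python
# Hypothetical toy data: three records with three fields each.
rows = [
    {"user": "a", "country": "US", "clicks": 3},
    {"user": "b", "country": "DE", "clicks": 5},
    {"user": "c", "country": "US", "clicks": 7},
]

# Row-oriented layout: all fields of one record are kept together.
row_store = [(r["user"], r["country"], r["clicks"]) for r in rows]

# Column-oriented layout: all values of one field are kept together.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def sum_clicks_row_store(store):
    # Has to visit every record and pick one field out of each.
    return sum(record[2] for record in store)


def sum_clicks_column_store(store):
    # Reads only the "clicks" column and never touches the others.
    return sum(store["clicks"])


total_from_rows = sum_clicks_row_store(row_store)
total_from_columns = sum_clicks_column_store(column_store)
```

Both functions return the same total; the difference is how much unrelated data each one has to walk past to get there.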


You're correct that it's very difficult to balance between giving an overview and over-simplifying. We intended this to be a starting point, knowing that we'd have to make a trade-off between including more details and giving a concise overview.

For NoSQL specifically, it's difficult to fit every database into only a few categories. We discussed in the details that Cassandra and HBase are "maps of maps", and we definitely didn't want to imply that they have the same model or use cases as Redshift. Perhaps our next iteration will explain the "column family" more and include a separate category for the databases that were inspired by the Bigtable data model.
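
As a rough picture of what "maps of maps" means, here is a minimal Python sketch of a nested-map table: row key, then column family, then column, then value. The `put` and `get_row` helpers and the sample keys are hypothetical names for illustration only; this is not the HBase or Cassandra API, and it ignores versioning, sort order, and on-disk layout.

```python
from collections import defaultdict

# row key -> column family -> column -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    # Rows are sparse: each row only holds the columns written to it.
    table[row_key][family][column] = value


def get_row(row_key, family=None):
    # .get() avoids creating empty entries for missing keys.
    row = table.get(row_key, {})
    if family is None:
        return {fam: dict(cols) for fam, cols in row.items()}
    return dict(row.get(family, {}))


put("user#1", "profile", "name", "Ada")
put("user#1", "profile", "city", "London")
put("user#1", "stats", "logins", 42)
put("user#2", "profile", "name", "Grace")

profile_1 = get_row("user#1", "profile")
everything_2 = get_row("user#2")
```

Note that `user#2` has no `stats` family and no `city` column at all, which is the kind of per-row flexibility a fixed relational schema does not give you.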


No real time MPP databases like Redshift, Netezza, Aster...otherwise good graph.


Thanks. If you mouse over the databases box, you'll see Redshift under Columnar. We do see a lot of Netezza and Aster as well.


Love the text overlay. It would be nice to also have project links in the text.


Thanks, we'll definitely include links in our next iteration.



