Databases in 2021: A Year in Review (ottertune.com)
323 points by jameslao on Dec 30, 2021 | 130 comments


Postgres's dominance is well deserved, of course. My only concerns with it, both of which are being actively worked on, are bloat management (significant for update-heavy workloads and for programmers used to the MySQL model of rollback segments) and concurrency scaling (going over 500 connections). The bloat work (zheap) was taken over by Cybertec [1] after stalling for a bit and is funded (yay), while concurrency was also improved by work out of Microsoft [2]. All in all, an excellent future for our beloved Postgres.

[1] https://github.com/cybertec-postgresql/zheap [2] https://techcommunity.microsoft.com/t5/azure-database-for-po...


Another concern: no temporal tables. Don't businesses demand this feature?


In Postgres land, I think most businesses work around the lack of temporal tables with audit tables, using triggers to dump jsonb or hstore. I wrote up how I used table inheritance for this here [1].
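The core of that approach looks roughly like this; a minimal sketch, with the table and column names made up for illustration rather than taken from the write-up:

    -- Hypothetical audited table: orders. The audit table stores each change as jsonb.
    CREATE TABLE orders_audit (
        audit_id   bigserial   PRIMARY KEY,
        op         text        NOT NULL,              -- 'INSERT' / 'UPDATE' / 'DELETE'
        changed_at timestamptz NOT NULL DEFAULT now(),
        old_row    jsonb,
        new_row    jsonb
    );

    CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO orders_audit (op, new_row) VALUES (TG_OP, to_jsonb(NEW));
        ELSIF TG_OP = 'UPDATE' THEN
            INSERT INTO orders_audit (op, old_row, new_row) VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        ELSE
            INSERT INTO orders_audit (op, old_row) VALUES (TG_OP, to_jsonb(OLD));
        END IF;
        RETURN NULL;  -- AFTER trigger: return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_audit_trg
        AFTER INSERT OR UPDATE OR DELETE ON orders
        FOR EACH ROW EXECUTE FUNCTION orders_audit_fn();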

I agree with your point. Postgres is starting to stick out compared to alternatives:

- MS SQL supports uni-temporal tables using system time.

- Snowflake has time travel which acts like temporal tables but with a limited retention window. Seems more like a restore mechanism.

- MariaDB has system-versioned tables (doesn't look like it's in MySQL).

- CockroachDB has uni-temporal support with system time, but it's limited to the garbage collection period. The docs indicate you don't want a long garbage collection period since all versions are stored in a single range.

- Oracle seems to have the best temporal support with their flashback tech. But it's hard to read between the lines to figure out what it actually does.

[1]: https://news.ycombinator.com/item?id=29010446


I've never seen a business actually use them, large or small. Any auditing requirements are usually fed from other sources, like Kafka event streams, files on S3, or an OLAP data warehouse.


How do you set up and feed the warehouse? Temporal-ish tables have been an obvious, simple, and mostly foolproof solution for many of our historical analytics and reporting needs.

Bitemporal stuff (enabling edited versions of history) is where things get hairy and I definitely question the utility outside of a dedicated use case.


Most databases have built-in CDC (change data capture) that can be exported. Otherwise, the WAL can be read with other tooling.

Debezium is a great open-source product for streaming changes from many relational databases: https://debezium.io
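On the Postgres side, logical-decoding CDC mostly comes down to a couple of settings; a rough sketch (the publication and slot names are just examples, not anything Debezium requires):

    -- Enable logical decoding (requires a server restart):
    ALTER SYSTEM SET wal_level = 'logical';

    -- A publication a CDC connector (e.g. Debezium's pgoutput plugin) can subscribe to:
    CREATE PUBLICATION cdc_pub FOR ALL TABLES;

    -- Changes can also be inspected by hand through a logical replication slot:
    SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'test_decoding');
    SELECT * FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);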


Would love to see wider support for temporal tables, but application-level approaches like https://github.com/jazzband/django-simple-history have worked for the business problems I've had.


Interestingly enough, Postgres used to have time travel in tables before MVCC transactions were added. Apparently it wasn't a widely used feature.


Although temporal tables are a really good idea, it is possible to get away without them as a first-class feature. They aren't hard to mimic if you can give up the guarantee of catching every detail. In an ideal world (ha ha, silly thought) the tables would be designed to be append-only anyway, or the amount of data would be significant, both of which make temporal tables somewhat moot.


They are really easy to mimic in PostgreSQL with range types (tstzrange) and an exclusion constraint, so no overlapping values are allowed. I guess they will not add it to core if a developer can add support for them so easily.
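A minimal sketch of that pattern (the table is hypothetical; btree_gist is needed for the equality part of the constraint):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    -- Each product has price rows with non-overlapping validity periods.
    CREATE TABLE product_price (
        product_id int       NOT NULL,
        price      numeric   NOT NULL,
        valid      tstzrange NOT NULL,
        EXCLUDE USING gist (product_id WITH =, valid WITH &&)
    );

    -- "As of" query for a point in time:
    SELECT price
    FROM product_price
    WHERE product_id = 42
      AND valid @> '2021-06-01 00:00+00'::timestamptz;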


I use https://github.com/xocolatl/periods for this to some success.


What do temporal tables do that good queries don't?


Auditing of changes. We have to have a second table that stores history for any table that may need to be audited in the future.


I've very rarely found that using a full temporal table is the right choice for online analysis; a dedicated schema serves you better in the long run and helps you design your indexes, etc. appropriately. For compliance, PIT backups via WAL shipping should suffice, no?
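For reference, the WAL-shipping side of that is only a couple of settings; a rough sketch (the archive path is just an example, and archive_mode needs a server restart to take effect):

    ALTER SYSTEM SET archive_mode = 'on';
    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';
    -- On restore, a base backup plus restore_command and recovery_target_time
    -- replays the archived WAL up to the desired point in time.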


I wish Postgres were more SQL-standards compliant. Stuff like using `nextval()` instead of the standard `NEXT VALUE FOR` for sequences is a pain.
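To illustrate the divergence (the sequence name is hypothetical):

    CREATE SEQUENCE order_seq;

    -- SQL standard (and e.g. SQL Server / DB2):
    --   SELECT NEXT VALUE FOR order_seq;
    -- PostgreSQL uses a function instead:
    SELECT nextval('order_seq');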


Is zheap definitely still an active project? Last commit seems to be Oct 2020



Clustered indexes?


The author is a professor at CMU who specializes in databases: https://www.cs.cmu.edu/~pavlo/

Not completely related, but his lectures on databases on YouTube are really good. Much better than the DB class I had at college.


Said lectures on YouTube: https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_...

A great way to learn more about the inner workings of databases, and entertaining too.

Another choice quote (from one of his lectures):

“There’s only two things I care about in life:

1. My wife

2. Databases

I don’t give a f#ck about anything else”


The author is hilarious! Quote from his article: “I even broke up with a girlfriend once because of sloppy benchmark results.”


I'm really excited by all the database love in the last few years. I moved to PG from MySQL in 2014 and haven't regretted it since.

TimescaleDB looks very exciting, as it's "just" a PG extension, and their compression work looks great. [0]
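For the curious, enabling that compression on a hypertable looks roughly like this (a sketch against the TimescaleDB 2.x API; the table and column names are made up):

    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'ts DESC'
    );

    -- Automatically compress chunks older than a week:
    SELECT add_compression_policy('metrics', INTERVAL '7 days');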

I'm also really loving ClickHouse, but I haven't deployed it to production yet (haven't had the need yet; I almost did for an Apache Arrow reading thing, but didn't end up using Arrow). They do some amazing things there, and the work they do is crazy impressive and fast. Reading their changelog, they power through things.

[0] https://docs.timescale.com/timescaledb/latest/how-to-guides/...


So a company that sells PostgreSQL services thinks PostgreSQL is dominating. Brilliant.

The reality is that nothing is dominating. In 2021 there were more databases than ever, each addressing a different use case. Companies don't have just one EDW; they have dozens, even hundreds, of siloed data stores. Startups will start with one for everything, then split out auth, user analytics, telemetry, etc.

There is no evidence of any consolidation in the market. And definitely not some mass trend towards PostgreSQL.


A couple of points:

1. OtterTune doesn't sell PostgreSQL services; they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL).

2. PostgreSQL is definitely gaining market share, and fast; see the db-engines graph [1]. You can compare it to the Oracle trend if you are not convinced [2].

[1] https://db-engines.com/en/ranking_trend/system/PostgreSQL

[2] https://db-engines.com/en/ranking_trend/system/Oracle


Is this ranking by # of orgs using Postgres, or relative total company value using Postgres, or some even more ambiguous effectiveness metric?

Answer: https://db-engines.com/en/ranking_definition


A DB system that works for professionals and doesn't require any public ecosystem of training materials won't be mentioned much in public.


That's likely why they're also including mentions in job postings in their metric:

> Number of job offers, in which the system is mentioned

It's not a silver bullet, but I do think it's at least somewhat representative of popularity.


> they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL)

An ML program that automatically tunes your production database in real time. What could possibly go wrong?


We are very careful to make sure that we don't allow the tuning algorithms to make changes that could be detrimental to the correctness or availability of the database. This blog article describes some of the safeguards that we employ:

https://ottertune.com/blog/prevent-machine-learning-from-wre...

We also advise our customers to not point OtterTune at a production database right away.


You can't just compare graphs like that without factoring in the cloud. PostgreSQL is a first-class, cloud-managed, supported database in the top three cloud providers, whereas Oracle is not. That's a massive impediment to adoption and is in no way a reflection of the database itself.

Either way, there's nothing to suggest that PostgreSQL is in any way dominating.


PostgreSQL is also well supported as a managed offering on PaaS platforms.

Heroku has https://www.heroku.com/postgres

Fly.io - https://fly.io/docs/reference/postgres/

and lots more of these pretty small players that still drive adoption.

Then it's very well supported on AWS / GCP / Azure.

So PostgreSQL is just crushing it in terms of adoption.

I honestly have not seen a major Oracle offering in a while.

Looking at what tech companies are building on, is Oracle even a major player these days? They used to be pretty much THE only player; those days feel gone now.


Your own comment suggests that Postgres is dominating over Oracle, simply by saying that it’s been adopted as a major offering by the top 3 cloud providers. How is that not a reflection of the database?


All you need is Postgres (OLTP), and if you have large datasets where Postgres falls behind for analytical work, then you reach for ClickHouse (OLAP) for those features (while Postgres remains your primary operational database and source of truth).


Agreed. I have a good bit of experience in SaaS and analytics and that's exactly what I landed on for building Luabase[0]. Postgres (specifically Supabase) for the app database, Clickhouse to run the analytics (which is the product).

0 - https://luabase.com/


This is the way for me also.


It's weird to put Postgres in the same bucket as Elasticsearch, as they are often used for different things.

No matter how much you tune / denormalize Postgres, you'll never get the full-text search performance Elasticsearch offers. Our best efforts on a 5 million row table yielded 600ms query times vs 30-60ms.

Similarly with Snowflake: you'd never expect Postgres to perform analytical queries at that scale.

I know graph databases and time-series DBs have similar performance tradeoffs.

I think the most interesting and challenging area is how to architect a system that uses many of these databases and keeps them eventually consistent within some bound.


Not affiliated, but for anyone looking to do searches on data stored primarily in Postgres via Elastic, ZomboDB is pretty slick.

ZomboDB is a Postgres extension that enables efficient full-text searching via the use of indexes backed by Elasticsearch. https://github.com/zombodb/zombodb#readme


The author is talking about a different class of RDBMS. I believe his intention was not to compare PostgreSQL to Elasticsearch or ClickHouse, which solve a completely different problem.

But for small to medium datasets his advice to just stick with PostgreSQL is good: start with an easy solution that will give you everything you need (often by simply installing a plugin). If you need more specialized software, THEN use it, but don't start with an overcomplicated stack just because Elasticsearch and ClickHouse may be the state-of-the-art open-source solutions to a specific problem.


Have you tried a GIN trigram index (https://www.postgresql.org/docs/14/pgtrgm.html, e.g. `CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);`) and a GIN full-text search index (`CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);`)? As far as I know, after applying those indexes to full-text search columns you can search about as fast as in Elastic, because those indexes are built the same way as in Elastic.
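One way to set both up on a hypothetical documents table (the generated tsvector column needs PostgreSQL 12+; everything here is illustrative, not a benchmark claim):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Trigram index for fuzzy / LIKE-style matching:
    CREATE INDEX documents_title_trgm_idx
        ON documents USING GIN (title gin_trgm_ops);

    -- Precomputed tsvector column plus GIN index for full-text search:
    ALTER TABLE documents ADD COLUMN body_tsv tsvector
        GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
    CREATE INDEX documents_body_tsv_idx
        ON documents USING GIN (body_tsv);

    SELECT id
    FROM documents
    WHERE body_tsv @@ plainto_tsquery('english', 'postgres indexes');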


How large are your text areas? What types of indexes are you using?


What are the distributed options for Postgres? What mechanisms are available to make it highly available i.e. with a distributed consensus protocol for strict serializability when failing over the primary? How do people typically deploy Postgres as a cluster?

1. Async replication tolerating data loss from slightly stale backup after a failover?

2. Sync replication tolerating downtime during manual failover?

3. Distributed consensus protocol for automated failover, high availability and no data loss, e.g. Viewstamped Replication, Paxos or Raft?

It seems like most managed-service versions of databases, such as Aurora, Timescale, etc., are doing option 3, but the open-source alternatives are otherwise still at options 1 and 2?
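For reference, option 2 with stock streaming replication is roughly the following sketch (the standby names are hypothetical); option 3 is usually layered on top with external tooling such as Patroni rather than built into core Postgres:

    -- On the primary: commits wait until at least one listed standby has applied them.
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby_a, standby_b)';
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();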


I think you'd still need to change the core of the database to avoid stale reads when an old primary and client are partitioned away from the new primary, or force all client communication through a proxy smart enough to contact a quorum of replicas to ensure the current primary is still the primary during transaction begin and commit.


Ah yes, good point!

I was assuming in both cases of manual failover that the operator would have to have some way of physically shutting down the old primary, then starting it again only as a backup that doesn't reply to clients. Alternatively, the cluster would need to remain unavailable if any node is partitioned.

But none of this is really very practical when compared to a consensus protocol (or R/W quorums) and distributed database. I'm genuinely curious how people solve this with something like Postgres. Or is it perhaps something that isn't much worried about?


I can't see how #3 scales under any write load unless you have no joins.

Well, unless each node has a complete copy of the data?


Databases are the best all-around scratch-every-CS-geek-itch domain there is, with the possible exception of operating systems.

The critical importance of extensibility as a primary concern of successful DB products needs to be highlighted. Realities of the domain dictate that product X matures a few years after inception, at which point the application patterns may have shifted. (Remember MapReduce?) If you pay attention, for example, you'll note that the darlings du jour are scrambling to claim fitness for ML (a subset of big data), and the newcomers are claiming to be "designed for ML".

Smart VC money should be on extensible players.


I genuinely couldn't tell if the author was being sarcastic when he said Larry Ellison was down on his luck because he dropped from 5th richest to 10th richest (and the whole thing about pulling himself out of the gutters by clawing up to 5th richest again).


I was not being sarcastic. Larry is a good man.


Larry Ellison is seriously underestimated as a database leader. I worked at Sybase. Oracle beat us fair and square. The Oracle DBMS team is outstanding.


Didn't Microsoft take Sybase's customers?


Not really--MS SQL Server was Windows only and took a while to grow. Oracle ran everywhere and was simply a better database by the mid-90s. (Previously Sybase was quite far ahead.) Oracle had row-level locking and MVCC at a time when Sybase was still stuck with a cumbersome page locking model. Oracle was also more reliable at least in my experience. I used to hit page corruption pretty regularly on Sybase but almost never on Oracle.

Disclaimer: I worked at Sybase. It was an outstanding company in the early days.


It obviously was sarcastic.


Seems like he's being serious and based on the linked tweet, I think he reveres Larry Ellison.


that's what I felt too. especially after the word "gutters". :)


I've been intrigued by Dgraph (https://dgraph.io) and used it to good effect in a (toy) project, where it felt easy to create and evolve its data model given changing requirements.

Dgraph uses GraphQL as its native query language.

Does anyone here have some experience to share on it? Asking since it isn't mentioned in the article.


My DB discovery and the game changer of 2021 was EdgeDB.


Thanks for pointing it out; after a quick glance it actually looks like something I want to learn more about. It takes the niceties of prisma.io's schema tooling and brings them closer to Postgres.


Oh wow. That's what I've been looking for, for years at this point. Thanks for the shout-out, I know what I'm playing with for my next project!


What did you like about it?


I can tell you what I liked about it when I looked: it seems to let you easily encapsulate your intent as a programmer creating a record to store data, as well as to query it that way. I imagine it obviates some of the reasons to reach for ORMs for certain programmatic database needs.


I agree with Andy that it’s just super fun to work on databases. You get to work on consensus, networking, compute, storage, etc. The workloads are always changing, you can try to optimize across the entire stack. Applications and workloads come and go, but databases will always be around.


Most interesting CS topics show up at least somewhere in databases.


Wow, I kind of feel like I'm reading about JavaScript frameworks. I don't recognize any of the DBs or companies/projects. Didn't realize the DB world was so busy.


If you want to be even more overwhelmed, see my encyclopedia of database systems:

https://dbdb.io/


Which database do you use to store your database of databases?



Not even PostgreSQL?


OK, I thought a couple of them were obvious ones that are commonly known (Postgres, Redis, MySQL, Oracle, Mongo, Cassandra).


> Databases Are the Most Important Thing in My Life After My Family

> I even broke up with a girlfriend once because of sloppy benchmark results.

I can't say I can relate, but I do appreciate being this passionate about things!


I really gotta go OT here and ask how this happened. Too funny.

Professional lives should be separate from personal but please, indulge us with a story!


I am so confused. https://vitess.io/ I would check this page out and view its "Who uses Vitess" section. Postgres is awesome if you are running a standalone server with 300 users or creating the next "Uber for cats". But at scale, MySQL has all the solutions. DBs are not JS frameworks.


I think PostgreSQL is an excellent general-purpose solution, especially for OLTP use cases, but where it lags behind is that it's hard to scale horizontally (sharding). There are solutions for this, of course, like Citus, but I haven't experimented with them. I have, however, tried MySQL with Vitess, which almost seems like dark wizardry. I hope one day Vitess works with PostgreSQL.


From the article:

> Rockset joined in, saying its performance was better for real-time analytics than the other two.

So I went and read the linked Rockset comparison blog post, and while I get that it’s a marketing piece, it’s also so transparently desperate for any advantage over Druid and ClickHouse that their criteria is bizarre at best, and bordering on wildly incorrect at worst.

I’ve been burnt by commercial databases before, and I have a hard time justifying ever using one, especially considering the advent of open source databases that have feature and performance parity (if not outright superiority) and can be self-hosted on K8s, or managed-hosting can be easily purchased.


Altinity is doing a good job of this with ClickHouse. They offer some decent open-source guides for self-hosting [0] and offer a hosted option. The hosted option isn't as self-serve as I'd like (you have to get "approved").

0 - https://github.com/Altinity/clickhouse-operator and


Yeah I’ve been paying attention to the Altinity stuff for a while, they’ve got some good stuff.

I think we'll get even more hosting options now that ClickHouse has its own backing company.


Thank you all for the very kind words about Altinity. We have always assumed that the ClickHouse market would be "crowded." By my count there are at least 7 cloud services based on ClickHouse. It's 8 if you include Firebolt, which embeds ClickHouse. There are even more hosting options on the way for ClickHouse, including clickhouse.com but also others. This is clearly going to be a competitive market with many outstanding alternatives for users.

We have a bunch of ideas at Altinity about how to make ClickHouse even more pervasive. Stay tuned in 2022.

Disclaimer: I am CEO of Altinity.


I expected more mentions of Vitess, which honestly looks like some kind of alien black magic from what I saw while consulting for a client this year.

But I guess not much else happened to it other than PlanetScale.


Which part most impressed you and which part seems like magic? Their devs / contributors are active on here...


edit: not sure why the downvotes (-3 so far) since I just stated my experience on the project. There must be something blatantly wrong in what I wrote and I would appreciate criticism.

    --------
An architect demoed the failure of a shard and the automatic promotion of its backup shard to main, in production. They actually test their failure models.

As I see it, sharding is not very hard. HA is not very hard given a reasonable SLA. But sharding with HA on a large setup that actually works is pretty hard.

Another thing that stuck in my mind was their high throughput-per-provisioned-hardware ratio. With not much hardware they were pulling 80k queries per second with room to spare.

Although I have to say, that's not much compared to GitHub which pulls 1.2 million queries/sec on Vitess [0].

[0] https://github.blog/2021-09-27-partitioning-githubs-relation...


I think it's more about the HN community not wanting to hear that anything other than Postgres works.


I would love to use Vitess, but it doesn’t support Postgres at the moment. And that’s a non-starter unfortunately.


All major companies are moving to Vitess. The battle is over. No one at scale uses Postgres.


Any reference for this?



Does anybody have experience with XTDB (https://xtdb.com/index.html)? We would like to use it in our Clojure application, perhaps with PostgreSQL as the backend (JDBC), to make it easier to implement a history feature.

Looking forward instead of backward, it would be great for databases to have some kind of live-patch/live-update feature so that one does not need any downtime at all if some rules are obeyed (with an automatic check that this is the case). The same goes for operating systems, where we have parts of the technology and even some limited deployment, but none of it is the default as far as I know. This situation makes it quite a bit harder to develop and maintain systems without introducing extreme complexity. It does not look like we will have fewer bugs or patches any time soon, so we should make updating as easy as possible to drastically reduce the need for maintenance windows, without resorting to building clusters for everything.


I'm genuinely happy with Redshift for data warehousing purposes. By this I mean a non-transactional data store; I don't want to use the terms OLTP or OLAP, as they put it in a purist's camp. Sometimes I store 3NF normalized data, many times a flattened, denormalized, very large fact table, and oftentimes a model similar to a star schema. I don't have to worry about building indexes anymore, which was a real chore with row-store databases like Oracle, MySQL, SQL Server, or PostgreSQL. MPP column-store databases have really been a game-changer for the enterprise. We're talking billions of rows of data easily handled in the query plan.
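A rough sketch of what that looks like in Redshift (the fact table and key choices are hypothetical): instead of secondary indexes, you pick distribution and sort keys and let the planner do the rest.

    CREATE TABLE fact_sales (
        sale_id     BIGINT,
        customer_id BIGINT,
        sale_date   DATE,
        amount      DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)   -- co-locate rows for joins on customer_id
    SORTKEY (sale_date);    -- prune blocks for date-range scans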


The SQL version of Redshift is lagging so much behind that it makes it borderline unusable in my opinion.


I have always been a huge fan of Redshift, which extends to Anurag Gupta and the team that delivered it. Redshift has always struck me as one of the real breakthrough products in the history of analytic databases. It collapsed deploying data warehouses from months to about 20 minutes.

It's great to see the current team is on the move again, as the original ParAccel architecture did not scale very well. There was an excellent talk on Redshift in Andy Pavlo's Vaccination Database Tech Talks, 2nd Dose. [0] It's by Ippokratis Pandis and worth a view. It covers a lot of the recent improvements, which are likely to disappoint the many critics who have counted Redshift out. (Prematurely in my opinion.)

[0] https://db.cs.cmu.edu/seminar2021-dose2/


Excited to see Dgraph in the top 10 mentions and climbing above Neo4j.


We are experimenting with Neo4j and found that Cypher, albeit foreign-looking in the beginning, feels quite natural to read when you think about graphs. How has your experience with Dgraph been, any thoughts? I hadn't really heard about it before reading this post, hence the curiosity!


Databases in 2030: SQL DB finally succumbs to Graph DB as #1

Does anyone else feel like a caveman when modeling a many-to-many relationship in a normalized schema, and then querying it via SQL?

I'm surprised graph DBs aren't more popular for this reason alone. Maybe it's a far-fetched dream, but perhaps a graph frontend could be slapped onto the Postgres backend.
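For anyone who hasn't felt the caveman-ness firsthand, the relational version of a many-to-many is the classic junction table plus joins (the schema here is made up, just to illustrate the pattern being discussed):

    CREATE TABLE person  (id bigint PRIMARY KEY, name text);
    CREATE TABLE project (id bigint PRIMARY KEY, name text);
    CREATE TABLE person_project (
        person_id  bigint REFERENCES person(id),
        project_id bigint REFERENCES project(id),
        PRIMARY KEY (person_id, project_id)
    );

    -- "Which projects does Alice work on?"
    SELECT pr.name
    FROM person p
    JOIN person_project pp ON pp.person_id = p.id
    JOIN project pr        ON pr.id = pp.project_id
    WHERE p.name = 'Alice';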


> Databases in 2030: SQL DB finally succumbs to Graph DB as #1

Graph databases will not overtake relational databases in 2030 by marketshare.

Bookmark this comment. Reach out to me in 2030. If I'm wrong, I will replace my official CMU photo with one of me wearing a shirt that says "Graph Databases Are #1". I will use that photo until I retire, get fired, or a former student stabs me.


Count me in on Andy's side of the bet. The most useful features of graph databases will likely be subsumed into RDBMS just as features from JSON stores and object stores were before them.

For example, one of the hits against RDBMSs is that the structure is supposedly "rigid." That's simply not the case in many RDBMSs, such as those using column storage. Adding columns in databases like ClickHouse is a trivial metadata operation. This means that many problems Neo4j solves can be addressed in a more general-purpose RDBMS, because you can add columns easily to track relationships. It's pretty easy to envision other improvements to access methods to make searches more efficient.
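To make the "trivial metadata operation" point concrete, in ClickHouse adding a column is roughly this one statement on a hypothetical events table; existing data parts aren't rewritten up front:

    ALTER TABLE events ADD COLUMN referrer String DEFAULT '';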

I don't mean to undercut in any way the innovation of graph databases. It's just that the relational model is (a) extremely general and (b) can be extended.


Not a fan of graph dbs? Surprised the $325m round for Neo4j didn't make your funding paragraph.

https://techcrunch.com/2021/06/17/neo4j-series-f/


Have you looked at Hasura for the second question (graph frontend + relational backend)? That's basically GraphQL on top of Postgres.

As for the first question - I've tried using Neo4j and ArangoDB for relatively large-scale graph querying (1-2TB of data) and both couldn't hold a candle to Postgres or MySQL in terms of query performance for cost. Neo requires you to store most of your data in memory and Arango isn't great for cross-shard querying.

Unless there's some major new graph DB that comes out in the next few years I would still bet on relational being dominant in 2030.


Have you tried TigerGraph?

They say that they scale well. I have not tried any graph DB for prod work yet.


Nonsense. Graph databases pre-date SQL. The relational model was created to overcome the limitations of graph databases.


Relational data schemas are a graph


And exactly for that reason, graph DBs can be more intuitive to work with: relational DBMSs generally don’t support any kind of graph operations or traversal queries.


You can always use an ORM, which provides better usability for developers. At the end of the day, the RDBMS model is suited to a wide variety of workloads, and there are several other factors in play when choosing a good DB, including ecosystem, cloud vendor support, migrations, performance, etc.


Which db is that?

Either way, that’s not happening.


I’ll take Hasura for 500


Nice collection of open source databases: https://codeberg.org/yarmo/delightful-databases


There's a very nice and comprehensive database of databases https://dbdb.io/ started and maintained by Andy Pavlo and the CMU-DB group.


Andy forgot about the ugliest spat around benchmarks: Yugabyte vs. Cockroach.


I wish there were some API that abstracted the DB and all the technical details; you could connect nodes to it that are specific databases with specific capabilities, and it would delegate as necessary.


The query language and data modeling for a DB depend heavily on whether it is relational, graph, time-series, denormalized, or KV. I don't think this would be possible beyond what's already available in the form of ORMs. Even getting SQL dialects to agree is a challenge sometimes.


You're basically talking about Airflow/Airbyte/custom ETL + database expertise. The only way to get efficient performance is expertise, expertise is expensive, ETLs are a given when you have expertise... Just hire a DB consultant or two and you're all set.


You might want to look into debezium. We use it to extract the change log from a generic OLTP database into Materialize, a view maintenance engine. Combining that data with event streams in Kafka is very powerful for us.


FDWs (foreign data wrappers) may be a step in that direction.
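For example, postgres_fdw lets one Postgres query tables living on another server as if they were local; a sketch (the server name, host, schema, and credentials are placeholders):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER analytics_srv FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'analytics.internal', dbname 'warehouse');

    CREATE USER MAPPING FOR CURRENT_USER SERVER analytics_srv
        OPTIONS (user 'reporting', password 'secret');

    -- Expose the remote tables locally under the "remote" schema:
    CREATE SCHEMA remote;
    IMPORT FOREIGN SCHEMA public FROM SERVER analytics_srv INTO remote;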


ODBC, JDBC.


Not a word about Couchbase, which IPO'd and is currently worth $1B.


There are so many unicorn DB companies now. It's hard to mention all of them.


And no mention of Exasol which is faster than most, if not all, of these databases for analytics.


ELI5, why do people still choose to use mongo?


The continuous availability and horizontal scalability of the distributed system, coupled with the developer experience (document model, secondary indexes, all of that), certainly has captivated a large and growing developer community. You could boil it down to a confluence of ease of use with the advanced capabilities that you may need if you are successful. Still, they would have petered out if it weren't for Atlas, which makes all of the above that much more accessible.


Obviously, it's because MongoDB is web scale.


storing files


Postgres is amazing; however, I work with SAP HANA every day and I gotta say this thing is completely insane.


Can you share some information about SAP HANA? Why is it insane? Insanely good or bad? I have no experience with it.


Well it is hard to beat keeping everything in RAM...


The world moved away from Hadoop and MapReduce… onto what?


Cloud SQL data warehouses like Snowflake and BigQuery.


I just wish AWS had something as good as BigQuery.


There are companies that use BigQuery for analytics even though their infra is in AWS. BigQuery has support for external tables on S3 now. The BigQuery transfer service can also move data from AWS pretty easily. I agree though, BigQuery is astonishingly good and makes both Snowflake and Redshift look like dinosaurs imo.


Throw a dart at the Apache project list :-)

My god, do we need an atlas of database-related Apache projects.

It's almost as bad as Java web frameworks about ten years ago.

Everyone can do everything and it's hard to know what is better for what.


Add to this “Apache Streaming projects”.

I get that projects can be donated to Apache from disparate sources, but my god it’s still a disaster.


I'm just surprised that in 2021 BigQuery isn't more popular. I thought it would be top 10 by now. I moved to GCP because of it, but I feel like I'm the only one.


Because BQ is great for the ETL/data warehouse/BI use case but is terrible for online applications. I tried using BQ as the backing store for an online analytics application back in late 2019, and it was so much worse than using Clickhouse/Druid/Pinot for the same use case. IDK how much that has changed since, but I'm not too terribly surprised that it isn't higher.


I'm not on GCP but the one I would like to try is Spanner or Cloud Spanner.

I think more scalable systems will continue to gain market share. It will be interesting to see if PlanetScale, CockroachDB, or something else actually becomes a big player.


BigQuery seems like a tool for very large but static datasets. I also had a hard time figuring out the pricing, so other than some test queries I moved on to other solutions.



