Databases in 2021: A Year in Review (ottertune.com)
323 points by jameslao on Dec 30, 2021 | 130 comments


Postgres's dominance is well deserved, of course. My only concerns with it, both of which are being actively worked on, are bloat management (significant for update-heavy workloads and for programmers used to the MySQL model of rollback segments) and concurrency scaling (going over 500 connections). The bloat work (zheap) was taken over by Cybertec [1] after stalling for a bit and is funded (yay), while concurrency was also improved by work out of Microsoft [2]. All in all, an excellent future for our beloved Postgres.

[1] https://github.com/cybertec-postgresql/zheap [2] https://techcommunity.microsoft.com/t5/azure-database-for-po...


Another concern: no temporal tables. Don't businesses demand this feature?


In Postgres land, I think most businesses work around the lack of temporal tables with audit tables, using triggers to dump jsonb or hstore. I wrote up how I used table inheritance for this here [1].
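The core of that approach looks roughly like this; a minimal sketch, with the table and column names made up for illustration rather than taken from the write-up:

    -- Hypothetical audited table: orders. The audit table stores each change as jsonb.
    CREATE TABLE orders_audit (
        audit_id   bigserial   PRIMARY KEY,
        op         text        NOT NULL,              -- 'INSERT' / 'UPDATE' / 'DELETE'
        changed_at timestamptz NOT NULL DEFAULT now(),
        old_row    jsonb,
        new_row    jsonb
    );

    CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO orders_audit (op, new_row) VALUES (TG_OP, to_jsonb(NEW));
        ELSIF TG_OP = 'UPDATE' THEN
            INSERT INTO orders_audit (op, old_row, new_row) VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        ELSE
            INSERT INTO orders_audit (op, old_row) VALUES (TG_OP, to_jsonb(OLD));
        END IF;
        RETURN NULL;  -- AFTER trigger: return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_audit_trg
        AFTER INSERT OR UPDATE OR DELETE ON orders
        FOR EACH ROW EXECUTE FUNCTION orders_audit_fn();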

I agree with your point. Postgres is starting to stick out compared to alternatives:

- MS SQL supports uni-temporal tables using system time.

- Snowflake has time travel which acts like temporal tables but with a limited retention window. Seems more like a restore mechanism.

- MariaDB has system-versioned tables (doesn't look like it's in MySQL).

- CockroachDB has uni-temporal support with system time, but it's limited to the garbage collection period. The docs indicate you don't want a long garbage collection period since all versions are stored in a single range.

- Oracle seems to have the best temporal support with their flashback tech. But it's hard to read between the lines to figure out what it actually does.

[1]: https://news.ycombinator.com/item?id=29010446


I've never seen a business actually use them, large or small. Any auditing requirements are usually fed from other sources, like Kafka event streams, files on S3, or an OLAP data warehouse.


How do you set up and feed the warehouse? Temporal-ish tables have been an obvious, simple, and mostly foolproof solution for many of our historical analytics and reporting needs.

Bitemporal stuff (enabling edited versions of history) is where things get hairy and I definitely question the utility outside of a dedicated use case.


Most databases have built-in CDC (change data capture) that can be exported. Otherwise, the WAL can be read with other tooling.

Debezium is a great open-source product for streaming changes from many relational databases: https://debezium.io
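On the Postgres side, logical-decoding CDC mostly comes down to a couple of settings; a rough sketch (the publication and slot names are just examples, not anything Debezium requires):

    -- Enable logical decoding (requires a server restart):
    ALTER SYSTEM SET wal_level = 'logical';

    -- A publication a CDC connector (e.g. Debezium's pgoutput plugin) can subscribe to:
    CREATE PUBLICATION cdc_pub FOR ALL TABLES;

    -- Changes can also be inspected by hand through a logical replication slot:
    SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'test_decoding');
    SELECT * FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);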


Would love to see wider support for temporal tables, but application-level approaches like https://github.com/jazzband/django-simple-history have worked for the business problems I've had.


Interestingly enough, Postgres used to have time travel in tables before MVCC transactions were added. Apparently it wasn't a widely used feature.


Although temporal tables are a really good idea, it is possible to get away without them as a first-class feature. They aren't hard to mimic if you can give up the guarantee of catching every detail. In an ideal world (ha ha, silly thought) the tables would be designed to be append-only anyway, or the amount of data would be significant, both of which make temporal tables somewhat moot.


They are really easy to mimic in PostgreSQL with range types (tstzrange) and an exclusion constraint, so no overlapping values are allowed. I guess they will not add it to core if a developer can add support for them so easily.
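A minimal sketch of that pattern (the table is hypothetical; btree_gist is needed for the equality part of the constraint):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    -- Each product has price rows with non-overlapping validity periods.
    CREATE TABLE product_price (
        product_id int       NOT NULL,
        price      numeric   NOT NULL,
        valid      tstzrange NOT NULL,
        EXCLUDE USING gist (product_id WITH =, valid WITH &&)
    );

    -- "As of" query for a point in time:
    SELECT price
    FROM product_price
    WHERE product_id = 42
      AND valid @> '2021-06-01 00:00+00'::timestamptz;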


I use https://github.com/xocolatl/periods for this to some success.


What do temporal tables do that good queries don't?


Auditing of changes. We have to have a second table that stores history for any table that may need to be audited in the future.


I've very rarely found that using a full temporal table is the right choice for online analysis; a dedicated schema serves you better in the long run and helps you design your indexes, etc. appropriately. For compliance, PIT backups via WAL shipping should suffice, no?
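For reference, the WAL-shipping side of that is only a couple of settings; a rough sketch (the archive path is just an example, and archive_mode needs a server restart to take effect):

    ALTER SYSTEM SET archive_mode = 'on';
    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';
    -- On restore, a base backup plus restore_command and recovery_target_time
    -- replays the archived WAL up to the desired point in time.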


I wish Postgres were more SQL-standards compliant. Stuff like using `nextval()` instead of the standard `NEXT VALUE FOR` for sequences is a pain.
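To illustrate the divergence (the sequence name is hypothetical):

    CREATE SEQUENCE order_seq;

    -- SQL standard (and e.g. SQL Server / DB2):
    --   SELECT NEXT VALUE FOR order_seq;
    -- PostgreSQL uses a function instead:
    SELECT nextval('order_seq');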


Is zheap definitely still an active project? Last commit seems to be Oct 2020



Clustered indexes?


The author is a professor at CMU who specializes in databases: https://www.cs.cmu.edu/~pavlo/

Not completely related, but his lectures on databases on YouTube are really good. Much better than the DB class I had at college.


Said lectures on YouTube: https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_...

A great way to learn more about the inner workings of databases, and entertaining too.

Another choice quote (from one of his lectures):

“There’s only two things I care about in life:

1. My wife

2. Databases

I don’t give a f#ck about anything else”


The author is hilarious! Quote from his article: “I even broke up with a girlfriend once because of sloppy benchmark results.”


I'm really excited by all the database love in the last few years. I moved to PG from MySQL in 2014 and haven't regretted it since.

TimescaleDB looks very exciting, as it's "just" a PG extension, and their compression work looks great. [0]
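For the curious, enabling that compression on a hypertable looks roughly like this (a sketch against the TimescaleDB 2.x API; the table and column names are made up):

    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'ts DESC'
    );

    -- Automatically compress chunks older than a week:
    SELECT add_compression_policy('metrics', INTERVAL '7 days');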

I'm also really loving ClickHouse, but I haven't deployed it to production yet (haven't had the need yet; I almost did for an Apache Arrow reading thing, but didn't end up using Arrow). They do some amazing things there, and the work they do is crazy impressive and fast. Reading their changelog, they power through things.

[0] https://docs.timescale.com/timescaledb/latest/how-to-guides/...


So a company that sells PostgreSQL services thinks PostgreSQL is dominating. Brilliant.

The reality is that nothing is dominating. In 2021 there were more databases than ever, each addressing a different use case. Companies don't have just one EDW; they have dozens, even hundreds, of siloed data stores. Startups will start with one for everything, then split out auth, user analytics, telemetry, etc.

There is no evidence of any consolidation in the market. And definitely not some mass trend towards PostgreSQL.


A couple of points:

1. OtterTune doesn't sell PostgreSQL services; they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL).

2. PostgreSQL is definitely gaining market share, and fast; see the db-engines graph [1]. You can compare it to the Oracle trend if you are not convinced [2].

[1] https://db-engines.com/en/ranking_trend/system/PostgreSQL

[2] https://db-engines.com/en/ranking_trend/system/Oracle


Is this ranking by # of orgs using Postgres, or relative total company value using Postgres, or some even more ambiguous effectiveness metric?

Answer: https://db-engines.com/en/ranking_definition


A DB system that works for professionals and doesn't require any public ecosystem of training materials won't be mentioned much in public.


That's likely why they're also including mentions in job postings in their metric:

> Number of job offers, in which the system is mentioned

It's not a silver bullet, but I do think it's at least somewhat representative of popularity.


> they sell a database optimization service that happens to support PostgreSQL (and other databases like MySQL)

An ML program that automatically tunes your production database in real time. What could possibly go wrong?


We are very careful to make sure that we don't allow the tuning algorithms to make changes that could be detrimental to the correctness or availability of the database. This blog article describes some of the safeguards that we employ:

https://ottertune.com/blog/prevent-machine-learning-from-wre...

We also advise our customers to not point OtterTune at a production database right away.


You can't just compare graphs like that without factoring in the cloud. PostgreSQL is a first-class, cloud-managed, supported database in the top three cloud providers, whereas Oracle is not. That's a massive impediment to adoption and is in no way a reflection of the database itself.

Either way, there's nothing to suggest that PostgreSQL is in any way dominating.


PostgreSQL is also well supported as a managed offering on PaaS platforms.

Heroku has https://www.heroku.com/postgres

Fly.io - https://fly.io/docs/reference/postgres/

and lots more of these pretty small players that still drive adoption.

Then it's very well supported on AWS / GCP / Azure.

So PostgreSQL is just crushing it in terms of adoption.

I honestly have not seen a major Oracle offering in a while.

Looking at what tech companies are building on, is Oracle even a major player these days? They used to be pretty much THE only player; those days feel gone now.


Your own comment suggests that Postgres is dominating over Oracle, simply by saying that it’s been adopted as a major offering by the top 3 cloud providers. How is that not a reflection of the database?


All you need is Postgres (OLTP), and if you have large datasets where Postgres falls behind for analytical work, then you reach for ClickHouse (OLAP) for those features (while Postgres remains your primary operational database and source of truth).


Agreed. I have a good bit of experience in SaaS and analytics and that's exactly what I landed on for building Luabase[0]. Postgres (specifically Supabase) for the app database, Clickhouse to run the analytics (which is the product).

0 - https://luabase.com/


This is the way for me also.


It's weird to put Postgres in the same bucket as Elasticsearch, as they are often used for different things.

No matter how much you tune / denormalize Postgres, you'll never get the full-text search performance Elasticsearch offers. Our best efforts on a 5 million row table yielded 600ms query times vs 30-60ms.

Similarly with Snowflake: you'd never expect Postgres to perform analytical queries at that scale.

I know graph databases and time-series DBs have similar performance tradeoffs.

I think the most interesting and challenging area is how to architect a system that uses many of these databases and keeps them eventually consistent within some bound.


Not affiliated, but for anyone looking to do searches on data stored primarily in Postgres via Elastic, ZomboDB is pretty slick.

ZomboDB is a Postgres extension that enables efficient full-text searching via the use of indexes backed by Elasticsearch. https://github.com/zombodb/zombodb#readme


The author is talking about a different class of RDBMS. I believe his intention was not to compare PostgreSQL to Elasticsearch or ClickHouse, which solve a completely different problem.

But for small to medium datasets his advice to just stick with PostgreSQL is good: start with an easy solution that will give you everything you need (often by simply installing a plugin). If you need more specialized software, THEN use it, but don't start with an overcomplicated stack just because Elasticsearch and ClickHouse may be the state-of-the-art open-source solutions to a specific problem.


Have you tried a GIN trigram index (https://www.postgresql.org/docs/14/pgtrgm.html, e.g. `CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);`) and a GIN full-text search index (`CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);`)? As far as I know, after applying those indexes to full-text search columns you can search about as fast as in Elastic, because those indexes are built the same way as in Elastic.
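One way to set both up on a hypothetical documents table (the generated tsvector column needs PostgreSQL 12+; everything here is illustrative, not a benchmark claim):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Trigram index for fuzzy / LIKE-style matching:
    CREATE INDEX documents_title_trgm_idx
        ON documents USING GIN (title gin_trgm_ops);

    -- Precomputed tsvector column plus GIN index for full-text search:
    ALTER TABLE documents ADD COLUMN body_tsv tsvector
        GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
    CREATE INDEX documents_body_tsv_idx
        ON documents USING GIN (body_tsv);

    SELECT id
    FROM documents
    WHERE body_tsv @@ plainto_tsquery('english', 'postgres indexes');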


How large are your text areas? What types of indexes are you using?


What are the distributed options for Postgres? What mechanisms are available to make it highly available i.e. with a distributed consensus protocol for strict serializability when failing over the primary? How do people typically deploy Postgres as a cluster?

1. Async replication tolerating data loss from slightly stale backup after a failover?

2. Sync replication tolerating downtime during manual failover?

3. Distributed consensus protocol for automated failover, high availability and no data loss, e.g. Viewstamped Replication, Paxos or Raft?

It seems like most managed-service versions of databases, such as Aurora, Timescale, etc., are doing option 3, but the open-source alternatives are otherwise still at options 1 and 2?
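For reference, option 2 with stock streaming replication is roughly the following sketch (the standby names are hypothetical); option 3 is usually layered on top with external tooling such as Patroni rather than built into core Postgres:

    -- On the primary: commits wait until at least one listed standby has applied them.
    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby_a, standby_b)';
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();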


I think you'd still need to change the core of the database to avoid stale reads when an old primary and client are partitioned away from the new primary, or force all client communication through a proxy smart enough to contact a quorum of replicas to ensure the current primary is still the primary during transaction begin and commit.


Ah yes, good point!

I was assuming in both cases of manual failover that the operator would have to have some way of physically shutting down the old primary, then starting it again only as a backup that doesn't reply to clients. Alternatively, the cluster would need to remain unavailable if any node is partitioned.

But none of this is really very practical when compared to a consensus protocol (or R/W quorums) and distributed database. I'm genuinely curious how people solve this with something like Postgres. Or is it perhaps something that isn't much worried about?


I can't see how #3 scales under any write load unless you have no joins.

Well, unless each node has a complete copy of the data?


Databases are the best all-around scratch-every-CS-geek-itch domain there is, with the possible exception of operating systems.

The critical importance of extensibility as a primary concern of successful DB products needs to be highlighted. Realities of the domain dictate that product X matures a few years after inception, at which point the application patterns may have shifted. (Remember MapReduce?) If you pay attention, for example, you'll note that the darlings du jour are scrambling to claim fitness for ML (a subset of big data), and the newcomers are claiming to be "designed for ML".

Smart VC money should be on extensible players.


I genuinely couldn't tell if the author was being sarcastic when he said Larry Ellison was down on his luck because he dropped from 5th richest to 10th richest (and the whole thing about pulling himself out of the gutters by clawing up to 5th richest again).


I was not being sarcastic. Larry is a good man.


Larry Ellison is seriously underestimated as a database leader. I worked at Sybase. Oracle beat us fair and square. The Oracle DBMS team is outstanding.


Didn't Microsoft take Sybase's customers?


Not really--MS SQL Server was Windows only and took a while to grow. Oracle ran everywhere and was simply a better database by the mid-90s. (Previously Sybase was quite far ahead.) Oracle had row-level locking and MVCC at a time when Sybase was still stuck with a cumbersome page locking model. Oracle was also more reliable at least in my experience. I used to hit page corruption pretty regularly on Sybase but almost never on Oracle.

Disclaimer: I worked at Sybase. It was an outstanding company in the early days.


It obviously was sarcastic.


Seems like he's being serious and based on the linked tweet, I think he reveres Larry Ellison.


that's what I felt too. especially after the word "gutters". :)


I've been intrigued by Dgraph (https://dgraph.io) and used it to good effect in a (toy) project, where it felt easy to create and evolve its data model given changing requirements.

Dgraph uses GraphQL as its native query language.

Does anyone here have some experience to share on it? Asking since it isn't mentioned in the article.


My DB discovery and the game changer of 2021 was EdgeDB.


Thanks for pointing it out; after a quick glance it actually looks like something I want to learn more about. It takes the niceties of prisma.io's schema tooling and brings them closer to Postgres.


Oh wow. That's what I've been looking for, for years at this point. Thanks for the shout-out, I know what I'm playing with for my next project!


What did you like about it?


I can tell you what I liked about it when I looked: it seems to let you easily encapsulate your intent as a programmer creating a record to store data, as well as to query it that way. I imagine it obviates some of the reasons to reach for ORMs for certain programmatic database needs.


I agree with Andy that it’s just super fun to work on databases. You get to work on consensus, networking, compute, storage, etc. The workloads are always changing, you can try to optimize across the entire stack. Applications and workloads come and go, but databases will always be around.


Most interesting CS topics show up at least somewhere in databases.


Wow, I kind of feel like I'm reading about JavaScript frameworks. I don't recognize any of the DBs or companies/projects. Didn't realize the DB world was so busy.


If you want to be even more overwhelmed, see my encyclopedia of database systems:

https://dbdb.io/


Which database do you use to store your database of databases?



Not even PostgreSQL?


OK, I thought a couple of them were obvious ones that are commonly known (Postgres, Redis, MySQL, Oracle, Mongo, Cassandra).


> Databases Are the Most Important Thing in My Life After My Family

> I even broke up with a girlfriend once because of sloppy benchmark results.

I can't say I can relate, but I do appreciate being this passionate about things!


I really gotta go OT here and ask how this happened. Too funny.

Professional lives should be separate from personal but please, indulge us with a story!


I am so confused. https://vitess.io/ I would check this page out and view its "Who uses Vitess" section. Postgres is awesome if you are running a standalone server with 300 users or creating the next "Uber for cats". But at scale, MySQL has all the solutions. DBs are not JS frameworks.


I think PostgreSQL is an excellent general-purpose solution, especially for OLTP use cases, but where it lags behind is that it's hard to scale horizontally (sharding). There are solutions for this, of course, like Citus, but I haven't experimented with them. I have, however, tried MySQL with Vitess, which almost seems like dark wizardry. I hope one day Vitess works with PostgreSQL.


From the article:

> Rockset joined in, saying its performance was better for real-time analytics than the other two.

So I went and read the linked Rockset comparison blog post, and while I get that it’s a marketing piece, it’s also so transparently desperate for any advantage over Druid and ClickHouse that their criteria is bizarre at best, and bordering on wildly incorrect at worst.

I’ve been burnt by commercial databases before, and I have a hard time justifying ever using one, especially considering the advent of open source databases that have feature and performance parity (if not outright superiority) and can be self-hosted on K8s, or managed-hosting can be easily purchased.


Altinity is doing a good job of this with ClickHouse. They offer some decent open-source guides for self-hosting [0] and offer a hosted option. The hosted option isn't as self-serve as I'd like (you have to get "approved").

0 - https://github.com/Altinity/clickhouse-operator and


Yeah I’ve been paying attention to the Altinity stuff for a while, they’ve got some good stuff.

I think we'll get even more hosting options now that ClickHouse has its own backing company.


Thank you all for the very kind words about Altinity. We have always assumed that the ClickHouse market would be "crowded." By my count there are at least 7 cloud services based on ClickHouse. It's 8 if you include Firebolt, which embeds ClickHouse. There are even more hosting options on the way for ClickHouse, including clickhouse.com but also others. This is clearly going to be a competitive market with many outstanding alternatives for users.

We have a bunch of ideas at Altinity about how to make ClickHouse even more pervasive. Stay tuned in 2022.

Disclaimer: I am CEO of Altinity.


I expected more mentions of Vitess, which honestly looks like some kind of alien black magic from what I saw while consulting for a client this year.

But I guess not much else happened to it other than PlanetScale.


Which part most impressed you and which part seems like magic? Their devs / contributors are active on here...


edit: not sure why the downvotes (-3 so far) since I just stated my experience on the project. There must be something blatantly wrong in what I wrote and I would appreciate criticism.

    --------
An architect demoed the failure of a shard and the automatic promotion of its backup shard to main, in production. They actually test their failure models.

As I see it, sharding is not very hard. HA is not very hard given a reasonable SLA. But sharding with HA on a large setup that actually works is pretty hard.

Another thing that stuck in my mind was their high throughput-per-provisioned-hardware ratio. With not much hardware they were pulling 80k queries per second with room to spare.

Although I have to say, that's not much compared to GitHub which pulls 1.2 million queries/sec on Vitess [0].

[0] https://github.blog/2021-09-27-partitioning-githubs-relation...


I think it's more about the HN community not wanting to hear that anything other than Postgres works.


I would love to use Vitess, but it doesn’t support Postgres at the moment. And that’s a non-starter unfortunately.


All major companies are moving to Vitess. The battle is over. No one at scale uses Postgres.


Any reference for this?



Does anybody have experience with XTDB (https://xtdb.com/index.html)? We would like to use it in our Clojure application, perhaps with PostgreSQL as the backend (JDBC), to make it easier to implement a history feature.

Looking forward instead of backward, it would be great for databases to have some kind of live-patch/live-update feature so that one does not need any downtime at all if some rules are obeyed (with an automatic check that this is the case). The same goes for operating systems, where we have parts of the technology and even some limited deployment, but none of it is the default as far as I know. This situation makes it quite a bit harder to develop and maintain systems without introducing extreme complexity. It does not look like we will have fewer bugs or patches any time soon, so we should make updating as easy as possible to drastically reduce the need for maintenance windows, without resorting to building clusters for everything.


I'm genuinely happy with Redshift for data warehousing purposes. By this I mean a non-transactional data store; I don't want to use the terms OLTP or OLAP, as they put it in a purist's camp. Sometimes I store 3NF normalized data, many times a flattened, denormalized, very large fact table, and oftentimes a model similar to a star schema. I don't have to worry about building indexes anymore, which was a real chore with row-store databases like Oracle, MySQL, SQL Server, or PostgreSQL. MPP column-store databases have really been a game-changer for the enterprise. We're talking billions of rows of data easily handled in the query plan.
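A rough sketch of what that looks like in Redshift (the fact table and key choices are hypothetical): instead of secondary indexes, you pick distribution and sort keys and let the planner do the rest.

    CREATE TABLE fact_sales (
        sale_id     BIGINT,
        customer_id BIGINT,
        sale_date   DATE,
        amount      DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)   -- co-locate rows for joins on customer_id
    SORTKEY (sale_date);    -- prune blocks for date-range scans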


The SQL version of Redshift is lagging so much behind that it makes it borderline unusable in my opinion.


I have always been a huge fan of Redshift, which extends to Anurag Gupta and the team that delivered it. Redshift has always struck me as one of the real breakthrough products in the history of analytic databases. It collapsed deploying data warehouses from months to about 20 minutes.

It's great to see the current team is on the move again, as the original ParAccel architecture did not scale very well. There was an excellent talk on Redshift in Andy Pavlo's Vaccination Database Tech Talks, 2nd Dose. [0] It's by Ippokratis Pandis and worth a view. It covers a lot of the recent improvements, which are likely to disappoint the many critics who have counted Redshift out. (Prematurely in my opinion.)

[0] https://db.cs.cmu.edu/seminar2021-dose2/


Excited to see Dgraph in the top 10 mentions and climbing above Neo4j.


We are experimenting with Neo4j and found that Cypher, albeit foreign-looking in the beginning, feels quite natural to read when you think about graphs. How has your experience with Dgraph been, any thoughts? I hadn't really heard about it before reading this post, hence the curiosity!


Databases in 2030: SQL DB finally succumbs to Graph DB as #1

Does anyone else feel like a caveman when modeling a many-to-many relationship in a normalized schema, and then querying it via SQL?

I'm surprised graph DBs aren't more popular for this reason alone. Maybe it's a far-fetched dream, but perhaps a graph frontend could be slapped onto the Postgres backend.
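For anyone who hasn't felt the caveman-ness firsthand, the relational version of a many-to-many is the classic junction table plus joins (the schema here is made up, just to illustrate the pattern being discussed):

    CREATE TABLE person  (id bigint PRIMARY KEY, name text);
    CREATE TABLE project (id bigint PRIMARY KEY, name text);
    CREATE TABLE person_project (
        person_id  bigint REFERENCES person(id),
        project_id bigint REFERENCES project(id),
        PRIMARY KEY (person_id, project_id)
    );

    -- "Which projects does Alice work on?"
    SELECT pr.name
    FROM person p
    JOIN person_project pp ON pp.person_id = p.id
    JOIN project pr        ON pr.id = pp.project_id
    WHERE p.name = 'Alice';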


> Databases in 2030: SQL DB finally succumbs to Graph DB as #1

Graph databases will not overtake relational databases in 2030 by marketshare.

Bookmark this comment. Reach out to me in 2030. If I'm wrong, I will replace my official CMU photo with one of me wearing a shirt that says "Graph Databases Are #1". I will use that photo until I retire, get fired, or a former student stabs me.


Count me in on Andy's side of the bet. The most useful features of graph databases will likely be subsumed into RDBMS just as features from JSON stores and object stores were before them.

For example, one of the hits against RDBMSs is that the structure is supposedly "rigid." That's simply not the case in many RDBMSs, such as those using column storage. Adding columns in databases like ClickHouse is a trivial metadata operation. This means that many problems Neo4j solves can be addressed in a more general-purpose RDBMS, because you can add columns easily to track relationships. It's pretty easy to envision other improvements to access methods to make searches more efficient.
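To make the "trivial metadata operation" point concrete, in ClickHouse adding a column is roughly this one statement on a hypothetical events table; existing data parts aren't rewritten up front:

    ALTER TABLE events ADD COLUMN referrer String DEFAULT '';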

I don't mean to undercut in any way the innovation of graph databases. It's just that the relational model is (a) extremely general and (b) can be extended.


Not a fan of graph dbs? Surprised the $325m round for Neo4j didn't make your funding paragraph.

https://techcrunch.com/2021/06/17/neo4j-series-f/


Have you looked at Hasura for the second question (graph frontend + relational backend)? That's basically GraphQL on top of Postgres.

As for the first question - I've tried using Neo4j and ArangoDB for relatively large-scale graph querying (1-2TB of data) and both couldn't hold a candle to Postgres or MySQL in terms of query performance for cost. Neo requires you to store most of your data in memory and Arango isn't great for cross-shard querying.

Unless there's some major new graph DB that comes out in the next few years I would still bet on relational being dominant in 2030.


Have you tried TigerGraph?

They say that they scale well. I have not tried any graph DB for prod work yet.


Nonsense. Graph databases pre-date SQL. The relational model was created to overcome the limitations of graph databases.


Relational data schemas are a graph


And exactly for that reason, graph DBs can be more intuitive to work with: relational DBMSs generally don’t support any kind of graph operations or traversal queries.


You can always use an ORM, which provides better usability for developers. At the end of the day, the RDBMS model is suited to a wide variety of workloads, and there are several other factors in play when choosing a good DB, including ecosystem, cloud vendor support, migrations, performance, etc.


Which db is that?

Either way, that’s not happening.


I’ll take Hasura for 500


Nice collection of open source databases: https://codeberg.org/yarmo/delightful-databases


There's a very nice and comprehensive database of databases https://dbdb.io/ started and maintained by Andy Pavlo and the CMU-DB group.


Andy forgot about the ugliest spat around benchmarks: Yugabyte vs. Cockroach.


I wish there were some API that abstracted the DB and all the technical details; you could connect nodes to it that are specific databases with specific capabilities, and it would delegate as necessary.


The query language and data modeling for a DB depend heavily on whether it is relational, graph, time-series, denormalized, or KV. I don't think this would be possible beyond what's already available in the form of ORMs. Even getting SQL dialects to agree is a challenge sometimes.


You're basically talking about Airflow/Airbyte/custom ETL + database expertise. The only way to get efficient performance is expertise, expertise is expensive, ETLs are a given when you have expertise... Just hire a DB consultant or two and you're all set.


You might want to look into debezium. We use it to extract the change log from a generic OLTP database into Materialize, a view maintenance engine. Combining that data with event streams in Kafka is very powerful for us.


FDWs (foreign data wrappers) may be a step in that direction.
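For example, postgres_fdw lets one Postgres query tables living on another server as if they were local; a sketch (the server name, host, schema, and credentials are placeholders):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER analytics_srv FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'analytics.internal', dbname 'warehouse');

    CREATE USER MAPPING FOR CURRENT_USER SERVER analytics_srv
        OPTIONS (user 'reporting', password 'secret');

    -- Expose the remote tables locally under the "remote" schema:
    CREATE SCHEMA remote;
    IMPORT FOREIGN SCHEMA public FROM SERVER analytics_srv INTO remote;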


ODBC, JDBC.


Not a word about Couchbase, which IPO'd and is currently worth $1B.


There are so many unicorn DB companies now. It's hard to mention all of them.


And no mention of Exasol which is faster than most, if not all, of these databases for analytics.


ELI5, why do people still choose to use mongo?


The continuous availability and horizontal scalability of the distributed system, coupled with the developer experience (document model, secondary indexes, all of that), certainly has captivated a large and growing developer community. You could boil it down to a confluence of ease of use with the advanced capabilities that you may need if you are successful. Still, they would have petered out if it weren't for Atlas, which makes all of the above that much more accessible.


Obviously, it's because MongoDB is web scale.


storing files


Postgres is amazing; however, I work with SAP HANA every day and I gotta say this thing is completely insane.


Can you share some information about SAP HANA? Why is it insane? Insanely good or bad? I have no experience with it.


Well it is hard to beat keeping everything in RAM...


The world moved away from Hadoop and MapReduce… onto what?


Cloud SQL data warehouses like Snowflake and BigQuery.


I just wish AWS had something as good as BigQuery.


There are companies that use BigQuery for analytics even though their infra is in AWS. BigQuery has support for external tables on S3 now. The BigQuery transfer service can also move data from AWS pretty easily. I agree though, BigQuery is astonishingly good and makes both Snowflake and Redshift look like dinosaurs imo.


Throw a dart at the Apache project list :-)

My god, do we need an atlas of database-related Apache projects.

It's almost as bad as Java web frameworks about ten years ago.

Everyone can do everything and it's hard to know what is better for what.


Add to this “Apache Streaming projects”.

I get that projects can be donated to Apache from disparate sources, but my god it’s still a disaster.


I'm just surprised that in 2021 BigQuery isn't more popular. I thought it would be top 10 by now. I moved to GCP because of it, but I feel like I'm the only one.


Because BQ is great for the ETL/data warehouse/BI use case but is terrible for online applications. I tried using BQ as the backing store for an online analytics application back in late 2019, and it was so much worse than using Clickhouse/Druid/Pinot for the same use case. IDK how much that has changed since, but I'm not too terribly surprised that it isn't higher.


I'm not on GCP but the one I would like to try is Spanner or Cloud Spanner.

I think more scalable systems will continue to gain market share. It will be interesting to see if PlanetScale, CockroachDB, or something else actually becomes a big player.


BigQuery seems like a tool for very large but static datasets. I also had a hard time figuring out the pricing, so other than some test queries I moved on to other solutions.



