[flagged] Never write a database, even if you want to, even if you think you should (twitter.com/mipsytipsy)
48 points by mooreds on Sept 30, 2023 | 52 comments



Don’t take Twitter advice… this post does nothing but discourage play and experimentation.


The tweet is part of a conversation about building a database for your business; it has nothing to do with play and experimentation for the sake of learning.


Except it's a discussion where the author is in reality bragging about how their product is only viable because they wrote their own database rather than using an off the shelf one, and then keeps repeating this "don't write a database" bit in every other tweet as a lame joke.


Link is not working for me. Title uses word `never`.


Michał Niczyporuk @mihn

So when talk "Never write a database" is coming? ;)

Charity Majors @mipsytipsy

That's it. That's the talk.

"Never write a database. Even if you want to, even if you think you should. Resist. Never write a database. Unless you have to write a database. But you don't."

I will present this talk at any conference of your choosing.


It's just a twitter take, overcompressed and lacking nuance. By all means write a toy database for learning purposes, but in 99+% of cases you should probably then turn around and use the ones with tens of thousands of engineering hours poured into their correctness properties when you want to put something into production.


I think it's pretty obvious that they're discouraging the impulse to implement a database for your job, not for just playing around and learning.


The key to making breakthroughs in knowledge - either personally or societally - is to choose your battles to a degree.

There is simply too much to master, it’s a very good skill to be able to say “this is fascinating but too far afield of what I need to do”.

Alternately, choosing when to say “after dipping my toe in the water this clicks for me, I’m going to take a risk and go further” is an even harder skill to master.

This is why many people have trouble finishing a PhD dissertation. Play is a necessary step, but play without focus is a ticket to mediocrity.

Databases in particular are something one will run across in nearly any task and 99:100 times one likely shouldn’t dwell too much on what is inside the magic box.


Important to know the context is that at Honeycomb, a custom database was indeed written: https://twitter.com/mipsytipsy/status/1706725568094650797


Every database you've ever benefited from was started by someone who didn't heed this advice!

That said, it's a hard job to do right and there are now databases of every different size made by people who are just much better than me.


"Always write a database. Even if you don't want to, even if you think you shouldn't."

Sometimes, naivety is a good helper when it comes to learning - too much knowledge of the task ahead can be demotivating. No one was born a professional developer either, and only by encountering and considering problems as they occur can one truly understand why certain decisions were made in the past and stuck around. Also, many great projects started as hobby projects. And not all projects have to become fully functional, or correct, or ready for production.


I don't think most people would disagree with that. If you want to build a database, for curiosity, or to solve a particular problem, or just because it seems fun, go ahead. But know that making a real production database tends to be harder than it looks, and so tends to be all-consuming, and a team that depends on a custom database tends to become a database team rather than an anything else team. That could be the right thing - after all building databases as an explicit goal has led to a lot of very good businesses - but you need to go into it with your eyes open. Building a production-quality database is a long road.

The same is true of some other areas, like durable storage systems, filesystems, and compute isolation/virtualization systems. In all these cases going from zero to "undergrad quality demo" is doable in a couple days. Going from demo to running something in production is vastly harder and more expensive and requires a completely different mind set. Building the "undergrad quality demo" is a great exercise I'd recommend to anybody curious about these areas. The road to production, however, is a tougher one.

Source: I build and maintain databases for a living at AWS. One of my previous side projects (https://www.usenix.org/conference/nsdi20/presentation/brooke...) turned out to be bigger than expected and has a whole team dedicated to it. That one (Physalia) is even quite far from being a general database (for example it doesn't support SQL or cross-shard transactions).


> But know that making a real production database tends to be harder than it looks, and so tends to be all-consuming, and a team that depends on a custom database tends to become a database team rather than an anything else team.

The original statement does not include any caveat or constraint, and is an absolute statement that does not depend on specifics. The statement is not "Never write a custom database as a subproject of any other project". The original statement is "Never write a database." Like in any conceivable scenario, including a project whose main goal is to write a database.

I also feel that this sort of clickbait advice contrasts heavily with the history of SQLite.


It is a statement directed at a readership of — and therefore assuming it will only be read by — application developers, rather than systems programmers; and so the context of "Never write a custom database as a subproject of any other project" is implicit.

Also, you're reacting to an incomplete quote. The tweet is:

> "Never write a database. Even if you want to, even if you think you should. Resist. Never write a database. Unless you have to write a database. But you don't."

There are precisely two cases where you "have to write a database":

1. when you want to learn how databases are architected, and educational material on the ground is wholly inadequate (which it is!); and

2. when you discover, under a production workload, that no existing DBMS sits in exactly the right part of DBMS configuration-space to meet your scaling needs — and nor can any existing DBMS even be modified to reach that point in DBMS configuration-space — and nor can you modify the design of your application to scale in a different way; such that the only possible way to address your scaling challenges is to write an entirely novel DBMS with an entirely novel architecture.

I can only think of two times #2 has happened in recent memory: Amazon with Dynamo (the core of S3 and DynamoDB); and Google with Dremel (the core of BigQuery.) Both projects have resulted in many research papers, and later copycat DBMSes (Dynamo → Cassandra, Riak, BigTable, etc; Dremel → Redshift, Snowflake, etc.) If you don't think your DB architecture is novel enough to result in that kind of response, then it's probably not something that requires "writing a database" — it's probably instead something you can do as a hack on top of some existing database. (Think: the way Citus/Timescale/Greenplum build on Postgres.)


>Unless you have to write a database. But you don't.

The quote you quoted says you don't need to write a database, so if you want to agree, you also have to agree that Amazon and Google didn't need to write databases.

And learning would seem to be covered by the "want to" or "should"


Again, pedantry: Amazon and Google aren't people; they aren't reading advice on the Internet. So the "you" in this statement will never apply to them. It only applies to individuals reading the statement. And, indeed, "you" do not need to write a database. Some company might very well need to write a database; and therefore coerce one or more of its employees into writing a database. But, like the statement says, the employees should resist. Very likely the company is wrong about needing to write a database, and resistance will help it discover that.

However, every once in a while, even after everyone resists for months/years, it will turn out that the company really does need to write a database, and so — still mostly against their will — the individuals employed there will set out to write a database. (And once they've "written a database", and it turned out to actually be the best course of action for the company, then the advice no longer applies — because they're no longer considering "writing a database" at that point, but rather maintaining an existing database, that they happened to be the creators of.)

> And learning would seem to be covered by the "want to" or "should"

Instrumental vs. terminal goals. If I want to do X, then I need to do Y to achieve X. If I want to learn how databases work, then I need to write a database, because there's no other good way.


>If I want to do X, then I need to do Y to achieve X.

And you accuse me of pedantry? This then comes under need then?!?

>Amazon and Google aren't people

No, but the people within them make decisions, ie decisions to write a database. So the 'you' would be directed at the person in the organisation who decided to write the database.


> And you accuse me of pedantry?

...no? I was saying that my own response is pedantry, rather than being an argument that tries to address the spirit of your argument.

> This then comes under need then?!?

Yes. The definition of "need" is "something that some higher-level goal cannot be achieved without." Ultimately, at the terminal level, everything is a preference; you want to continue living, companies want to not go bankrupt, etc. Everything required to achieve those terminal preferences, are needs. You need to eat if you want to live. A company needs to make money if it wants to avoid bankruptcy. Etc.

> No, but the people within them make decisions, ie decisions to write a database. So the 'you' would be directed at the person in the organisation who decided to write the database.

In a bigcorp, ultimately, a manager or business analyst decided that someone else should write them a database. Nobody decided to, themselves, write a database.


The tweet doesn’t but the context of the tweet is a conversation about building a custom database to power a business.


Well, the company I worked at did, and it worked out just fine. Not even that many people involved, like 3-4, the main architect and coder had never done a DB before (but is a genius anyway). For the scope and within that domain (not relational, heavy writes), the performance smokes anything out there. In production for over a decade now.


Yes, the only way to truly learn that you should never write a database is to write a database. In fact, I bet that is what the author of the tweet did to learn that they should never write one.

I wrote a blog post about this: https://jacobgw.com/blog/observation/2023/07/08/shoot-yourse...


Good ear. She is cofounder of a company that built their own “distributed column store”: https://youtu.be/tr2KcekX2kk?si=ID0qB-O2ucXF4GXC


This is like the classic advice on optimization.

    The First Rule of Program Optimization: Don't do it.
    The Second Rule of Program Optimization (for experts only!): Don't do it yet.
It is because I know what a database actually does for me, and because I've had to write parts of that for myself, that I deeply understand why I don't want to write one. And why I want to push those problems to a database if I can.

Of course you should ignore my advice if you're named Jeff Dean. Look through https://research.google/people/jeff/. Among his other projects, look at BigTable, LevelDB and Spanner. All are databases, implemented for good reasons.


Or if you're Rich Hickey (Datomic).


From an engineering perspective you should not think you can solve whatever issue you have by writing your own database. Writing one is one of the harder tasks in computer science.

You should write a database system if you want to, for learning, fun, to try out a lot of different approaches, and see how and why things are solved in other DBMS systems.

You can even try using it on small fun projects.

Don't ever think of putting it into production for something important. It will come back and haunt you in the most undesirable ways.


Back in the 1990s I worked at a financial services company; all our back-office code used key/value (index file) storage. I don't remember ever doing any kind of explicit record locking for updates; it was always just "last write wins" and we never seemed to have much of an issue with it. It was early in my career so there was a big component of not knowing what I didn't know at that time.
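
For readers who have not run into it, here is a minimal sketch of what "last write wins" without record locking can do to a read-modify-write update. It assumes a plain in-memory dict standing in for the index file; the names `store`, `read_record`, `write_record`, and `interleaved_deposits` are hypothetical and not taken from the system described above.

```python
# Hypothetical key/value store standing in for an index file; no locking at all.
store = {"acct-1": 100}


def read_record(key):
    return store[key]


def write_record(key, value):
    # Last write wins: whatever is written last simply overwrites the record.
    store[key] = value


def interleaved_deposits(key, first, second):
    """Two writers each read the record before either one writes it back."""
    seen_by_a = read_record(key)
    seen_by_b = read_record(key)            # B reads the same, soon stale, value
    write_record(key, seen_by_a + first)    # A writes 100 + first
    write_record(key, seen_by_b + second)   # B overwrites A's update
    return read_record(key)


final = interleaved_deposits("acct-1", 50, 25)
print(final)  # 125, not 175: the deposit of 50 was silently lost
```

Whether this ever bites depends on how often two writers touch the same record at the same time, which may be why a low-contention back office never noticed it.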


... unless you're a developer. Then go right ahead -- you'll learn a lot doing it, and become a better developer in the process.


It's like the saying about game engines. You should either set out to develop a database or you should develop an app that needs a database. If you try and do both at once you will never finish or you will end up with a half baked database.


My hope is that managed data layers become more available and the problem space of "custom data structures and indexes" can be decoupled from the even harder parts of writing a database.

Projects like pgrx (postgres extensions in rust) are starting to make it much more approachable to do so.

A little lower level: build on FoundationDB. (Projects like mvsqlite are super interesting here).

In both cases, we don't yet have fully managed providers. RDS seems to be flirting with opening up the extension ecosystem with trusted languages. I haven't seen any managed foundationdb offerings though.


It is remarkable how approximately 50% of the things Charity Majors says resonate so well with me … and how the other 50% irritate the ever-loving far out of me.

I think what she means here is “nobody ELSE should write a database,” because they certainly found it necessary and beneficial to do so.

(In all seriousness, usually you shouldn’t write a database. But somebody has to, you know? There’s no need to gate-keep like that.)


I don't know the history / timeline of Honeycomb, but was FoundationDB available open source when they decided to write their own DB?

If I ever would find myself walking into building-my-own-database-territory, I would probably first try to reach for FoundationDB and see if that could work.

I've just played around with it, but it seems like a great tool for building highly performant and robust distributed databases on top of.


According to GitHub, no, but it may have been available before then. Regardless, the design of Honeycomb's database (store?) aligns more with an analytical workload, which FoundationDB lists as a non-goal. It may not have been a good fit then or even now.


Of course, in production, it is really hard to find a use case that isn’t served ‘well enough’ by existing database technology.

But: almost never should you say never.

What better way to learn indexing tradeoffs than to write your own database?

What existed before Redis?

What existed before LevelDB?

What existed before Git?

Sure, write a database if you want. But be humble about the effort involved and reliability you’ll get.


BerkeleyDB existed before Redis, LevelDB, and Git and was widely used as a database. SVN originally used BDB as its storage layer (but switched when people complained it was too hard to build BDB). https://en.wikipedia.org/wiki/Berkeley_DB contains people-decades of hard-won experience on how to build a simple database system. I would consider SQLite, itself 20+ years old now, the intellectual heir of BDB.

It was as true then as it is today: very few people should write databases for production.


Yes, thanks: the history is fun and instructive.

My questions were rhetorical, BTW: emphasizing that progress happens when what exists now doesn't stop someone from making something different and maybe better.

We're probably not disagreeing. To summarize:

- practically, given real-world constraints and risk tolerances, yes, very few people should write databases for production. (Except villains. They should expend tremendous energy and effort writing bespoke artisanal databases, resulting in less time to be evil.)

- educationally, there is a lot to be gained by writing your own, even if only a small part [1], such as something around indexing, a write-ahead-log, a query planner, or even a query parser [2]

- if everyone thought e.g. 'writing a new database isn't worth it', collectively we'd be in a worse place
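
As an illustration of the "small part" exercise mentioned in the list above, here is a toy write-ahead log in Python. It is only a sketch under stated assumptions: the class name `TinyKV`, the JSON-lines log format, and the file name `toy.wal` are all invented for this example, and it leaves out checksums, checkpoints, log truncation, and concurrency, which is where much of the real difficulty lives.

```python
import json
import os
import tempfile


class TinyKV:
    """Toy key/value store: append each write to a log before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from the log after a restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn record from a crash mid-write: stop replaying
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to persist the record first
        self.data[key] = value      # only then apply the change in memory

    def get(self, key, default=None):
        return self.data.get(key, default)


log_path = os.path.join(tempfile.mkdtemp(), "toy.wal")
db = TinyKV(log_path)
db.put("a", 1)
db.put("a", 2)

reopened = TinyKV(log_path)  # simulate a restart by replaying the log
print(reopened.get("a"))     # 2
```

Even this much raises the kind of questions the thread is about: what happens when the log grows without bound, when a record is half-written, or when two processes append at once.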

FWIW, I have a soft place in my heart for all the great projects that come out of Berkeley.

Note 1: Let me propose a game for a group of people. One person names some narrow aspect of a database, perhaps in the hopes that it is boring, trivial, or "solved". (Perhaps "autoincrementing indexes" for example.) Then the other people talk about all the ways that the thing is actually quite hard, interesting, non-obvious, and maybe (?) even a good research area.

Note 2: It is nice that SQL is declarative. AND there is room for improvement, starting with composability. See https://news.ycombinator.com/item?id=24730713


BerkeleyDB also improved on things like dbm¹, ndbm, qdbm and cdb.

> I would consider SQLite, itself 20+ years old now, the intellectual heir of BDB.

Indeed. I've worked on projects that only use the btree part of SQLite. If all you need is a btree it's a good place to start.

1- https://en.wikipedia.org/wiki/DBM_(computing)


"When working with large amounts of data, steal as many ideas from databases as you can"

They have a lot of really simple, good ideas that can make your life a lot easier. Attempting to work on data intensive applications without stealing ideas from databases can make your life much harder.


Once you understand a consensus algorithm like Raft, you kind of have no choice but to write a database.


I tried this once at work, because it seems like such a trivial task. Spoiler: it wasn’t.

Having said that, I learned a lot from it. So I wouldn’t say don’t do it. You’ll probably learn from it. Just don’t expect to make the next MongoDB or MySQL :)


Always take commands about what to work on / not work on from random Xitterers.


I’ve often wanted a ‘database construction set’ in that I want a particular set of features and data structures and would be willing to throw out everything I don’t need for performance.


Writing my own database (which ended in a wreck) was a great learning experience. It helped me understand the problem space better.


The Xeet seemed to be against writing an engine, not an application.

Are we not really integrators of prior art, for the most part?


Why would you? Just use Excel as a database. Unless you have big data, then maybe PostgreSQL.


what does "writing a database" mean?


Writing software to store data, with retrieval and querying APIs, not using any existing database system.


Building/developing/creating a https://en.wikipedia.org/wiki/Database [management system].


of course. but i would differentiate between "writing a new database engine" and "designing a database to be implemented using an existing engine (e.g. oracle)"


I updated the reply.


Does throwing a bunch of CSVs in a data store and using them as a persistent store of knowledge count? If so, I am very guilty.


I've always wanted to write a database


Do maps count?



