Hacker News new | past | comments | ask | show | jobs | submit login
The Untold Story of SQLite (corecursive.com)
548 points by signa11 on July 3, 2021 | hide | past | favorite | 95 comments



Host Here. Thanks for sharing.

Richard is a great story teller and adventures that SQLite has taken him on pretty interesting. I think he in on hn as well ( @SQLite).

One interesting thing I found about how SQLite is developed is the amount of testing that is done. Between unit tests and parameterized tests and fuzzer tests they are doing billions of tests per release and spending days running them before each release. I know I don't like to wait for a long CI build before I deploy a service but something like SQLite needs whole different category of resilience.


That part really made me laugh. A few months ago, i developed a wrapper in swift for sqlite on a project. A coworker of mine had a bug on what sqlite returned, and i couldn't find anything wrong. He told me "it has to be a bug on sqlite", and i said "no, sqlite has no bugs… but i don't really understand what's going on". And so we posted a bug report on the sqlite forums. I told my friend "if you found a bug on sqlite, on your first try, i'll buy you a bottle of champaign".

Of course, it was a bug with my wrapper (included a \0 char at the end of strings).

I just forwarded him the link to your podcast, saying "we really had no chance"


SQLite is a great way to find bugs in your fuzzer: if your fuzzer finds bug, it’s time to debug your fuzzer!


I enjoyed this episode. I knew about all the tests before, but what came clear in this interview is that Sqlite inc is using their tests as a unique way to keep contracts on Sqlite development flowing. Since they can test it so much better than anyone else, it would be crazy not to hire them, rather than hiring someone else to develop a patch.

While they're keeping those tests proprietary, and relatedly have a closed development model, this strikes me as still better than typical open core strategies for monetizing free software, or at least pointing in better directions.

Agbell, since you're here, could I ask you add a standard link to the RSS feed for your podcast?

<link rel="alternate" title="Corecursive" type="application/rss+xml" href="https://link.chtbl.com/corecursive?platform=rss" />


> Sqlite inc

Minor nitpick: technically, the primary company that supports SQLite is an incorporated partnership named Hipp [ed note: yes, that's Mr. Richard Hipp], Wyrick [ed note: Hipp's wife] & Company, Inc. dba Hwaci (https://www.hwaci.com/).


Will do!


Done, joeyh.


Thanks, that really is a great conversation.

I'm going to have to dig to see if he describes somewhere why Fossil is such a great fit for SQLite.


Couple things:

* immutable

* built in first-class ticketing system

Happens to use SQLite as its storage engine too (SQLInception?), which brings at least a couple (off the top of my head) features:

* SQL is used to query things, which is very powerful

* the repository is a single SQLite file. Easy to store, copy, and…

* one can checkout multiple working copies into various directories. Or, in other words, I can have three discrete, independent workflows happening simultaneously, but only one copy of the repository backing them all.

It’s also got a built-in server that you can use locally or publish for remote people to access via their web browser that’s essentially GitHub-in-a-box.

Also… (kidding. I’m just going to stop here).

Edit: formatting


I've migrated to git for my personal projects, mainly because magit (emacs frontend for git) is awesome.

But fossil is a pleasure to work with, especially on Windows, because a) it comes as a single executable that one can just copy to another machine, and b) it does not require ssh or anything else to be installed.

The web interface is amazing, especially considering the size of the executable file.


Fossil have much better command line IMHO. I even started git wrapper to get rid of it's cryptic syntax.

[1]: https://github.com/severak/ginger


Regarding the multiple working copies feature, sounds the same as Git worktrees.


I took a quick scan and: it might be analogous, yes. Not sure if any subtlties apply.

The git workflow in the docs (https://git-scm.com/docs/git-worktree) looks really clunky.

Here’s what I do in fossil:

$ cd ~/work/project_x

$ fossil open ../fossils/projx.fsl (Edit, edit, edit)

$ fossil ci -m ‘descriptive msg’

(Get an idea, but for whatever reason do not want to disturb this directory)

$ mkdir ~/work/radical_idea

$ cd ~/work/radical_idea

$ fossil open ../fossils/projx.fsl <*note here there is no sense of this being an inferior or “sub” checkout*> (Edit, edit, edit)

$fossil ci -m “cool idea” —branch cool_idea # commit work to new branch

(Test..)

$ fossil co trunk #switch branch back to trunk

$ fossil merge cool_idea #merges our work from cool_idea branch into trunk branch

$ fossil ci -m ‘merge [cool_idea] for radical new feature’

$ fossil close # this working dir syncs everything necessary to the repo (../fossils/projx.fsl) and is now essentially “offline”

$ cd ..; rm -fr radical_idea

$ cd project_x (Note: this working directory now not a tip of trunk anymore…)

$ fossil update # now up to speed (Continue hacking…)


It looks to me like the main difference is that with Git you can't have multiple worktrees on the same branch. Apropos of your comment "Note: this working directory now not a tip of trunk anymore…", to Git being on the tip of a branch is a thing of note, so it doesn't ever want to accidentally invalidate another worktree's notes about being on the tip of a branch.

In Git, that'd be:

    $ cd ~/work/fossils/projx.git

    $ git worktree add ~/work/project_x master # add a worktree with the 'master' branch checked out

    $ cd ~/work/project_x
(Edit, edit, edit)

    $ git commit -m 'descriptive msg'
(Get an idea, but for whatever reason do not want to disturb this directory)

    $ git worktree add ~/work/radical_idea -b cool_idea # add a worktree with a new 'cool_idea' branch

    $ cd ~/work/radical_idea
(Edit, edit, edit)

    $ git commit -m "cool idea"
(Test..)

    $ git switch master # whoops, can't do this
    fatal: 'master' is already checked out at '~/work/project_x'

    $ # let's just do the merge in the other worktree, since we're about to switch anyway

    $ cd ~/work/project_x

    $ git worktree remove ~/work/radical_idea # or: rm -rf ~/work/radical_idea && git worktree prune

    $ git merge cool_idea
(Continue hacking…)


> <note here there is no sense of this being an inferior or “sub” checkout>

There is not in worktrees either. You can use worktrees with a bare repository for instance. It's just that for backwards compatibility and the convenience of most users a non-bare clone creates a default worktree. That's exactly what you see if you `git clone` a repository then `git worktree list`, a worktree corresponding to the working copy.


Seeing this makes me want to try out fossil now.


Here is a summary of similarities and differences on Git and Fossil functionalities: https://fossil-scm.org/home/doc/trunk/www/gitusers.md

Also quite interesting the general comparison: https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki

Both sites are written from Fossil guys, so they may not be 100% neutral.


Very good podcast, thank you. What I personally missed is a question about the future plans for SQLite. Also about version 4.


Really enjoyed this. Is it possible to subscribe using an email address? The subscribe page seems to ask for a bunch of apps (Apple Podcast etc) that I don't use. Would be great to just receive an email and listen on a browser


If you scroll to the bottom of the page and sign up for the newsletter, I will email you whenever there is a new episode out.

The newsletter will come out once a month when there is a new episode and has a bit of personal take on the episode and maybe some links to other things I've been up to.

If you just want a pure notification, I'm sure you could hook the RSS feed up to an RSS to email service. Thanks for listening!


Great podcast! I have been following it for a while. Can I ask how ended up starting it?


CoRecursive started because I did some episodes as a host on software engineering daily and it was lots of fun so I decided to start one of my own. The first 4 episodes were originally on software engineering daily.

Originally the idea was to give some platform to the areas of computing I was interested in, heavy use of types and functional programming but I think I really found my place as a podcast when I started to focus on peoples stories. So rather than tell me about SQLite, it is tell me about the story behind creating SQLite.


You are so polite, I am impressed with your ability not to push back on the interviewees. In this episode I would have asked "would sqlite really be worse today if you hadn't done your own VCS.". But I think I wouldn't be half as successful as you, people don't wanna be implicitly criticised by their interviewer :P

You are doing so great, favorite podcast by far!


> would sqlite really be worse today if you hadn't done your own VCS.

Yes. Fossil is not only the VCS for SQLite, Fossil is also built around SQLite. SQLite is a core component of Fossil. Thus, when I am working on Fossil, I am forced to interact with SQLite as a "user" instead of as a "developer". In geek-speak, it forces me to "eat my own dog food". This, in turn, prompts me to add needed features to SQLite and more generally to make the SQLite interfaces friendlier to application-developers.

One recent example: SQLite version 3.34.0 added the ability to include two or more recursive terms in a Recursive Common Table Expression. (See item 2 in https://www.sqlite.org/releaselog/3_34_0.html and subsequent links.) This feature was added specifically so that I could more easily write SQL statements that would walk the Fossil version history DAG, as described by the https://www.sqlite.org/lang_with.html#rcex3 link.

I did not develop Fossil with this "dogfooding" idea in mind. It was an unanticipated benefit of Fossil. But in the end, I think it might have been the most important benefit of using Fossil instead of some other VCS.


I'm glad you like the podcast! I do push back when something doesn't sound right. I agree though that I could have asked more about fossil. I really didn't look into fossil that much when I prepared for the interview so it was a bit of a blind spot for me.

Not included in the interview is a discussion we had about types. I really think SQLite should be stricter in enforcement of types and Richard doesn't buy it. I may include it in a future bonus episode but the reason I cut it was it didn't really include any details about story of SQLite, it was just two people disagreeing about types. I've done a lot of episodes about types in the past and I wanted this to be about build SQLite, and not about types.


I was wondering exactly about this. I, for one, would be highly interested in hearing this discussion!


I quite disagree. It's boring when they just say their talking points. The guests are experts in their field, they can handle some pushback and it would be much more educational for the rest of us to hear the answers to the questions we're really wondering about.

It doesn't have to be "implicit criticism", but an open mind doesn't blindly accept stuff, ask for justification.


The justification is where the story is, the tension and relief. This is where you really learn from the interview.


I was wondering why the episodes on the corecursive feed start at number 5.

I'll check your hosted episodes on Software Engineering Daily now!


Thanks, I have interviews on software engineering radio as well: https://www.se-radio.net/team/adam-gordon-bell/


He is on HN. It was a great segment.


Thanks for listening! I'm glad you enjoyed it.


"I'm going to write my own mail server. I was making notes on that even as we were setting up this call. That's a big problem, and that's at least as difficult if not more difficult than writing a database engine, but I don't want to be beholden to Gmail. I don't want them controlling my destiny. I don't want them controlling the record of all of my conversations. I want to control that myself, and so I'm going to go through a lot of pain and a lot of work and a lot of effort to come up with some solution that I can control myself. I can go out and lease a virtual machine out there in the cloud and run it myself and not depend on a third party to control my email."


Does anybody have any insight into " at least as difficult if not more difficult than writing a database engine"? It would seem to me that a mail server would be much easier than a database engine.


It might be difficult on a different level.

Dr.Hipp mentioned GMail bouncing email sent to him - but hosting email yourself, will just move you to a different side of the bounce - now GMail will be bouncing emails sent by you! See, for example, first comments on the recent HN thread here: https://news.ycombinator.com/item?id=27707857 (ctrl+f for "pain in the" if it's not the first one).

So it looks like convincing GMail to respect your small server is a non-technical problem, which might be an issue for a person with "only engineering" mindset. On the other side, Dr.Hipp looks like a right person to try to convince GMail to respect smaller mailservers!


My own personal experience (with a server that has SPF/DKIM/DMARC all set up, and not listed on public RBLs) has been that GMail, Hotmail, Comcast, and Charter have not been problems.

On the other hand, AT&T is most definitely a problem. They run their own internal RBL, and the best I can guess is that they're blocking Linode. Any mail I send to an AT&T customer gets bounced, and forwarding that bounce to the address they indicate it should be sent to to get delisted accomplishes nothing.


I have found that many RBL's block whole IP blocks owned by various vendors. I believe this is why Linode and others like Digital Ocean now block port 25 on VPSs and make you beg for them to open it. Too many spammers.

This is also a good reason to keep a VPS even if you aren't using it at the moment: it keeps the IP "clean" so that when you put it back into use, it is useful, as opposed to (possibly) some random IP that up until a month ago was spammer central.


> This is also a good reason to keep a VPS even if you aren't using it at the moment: it keeps the IP "clean" so that when you put it back into use, it is useful, as opposed to (possibly) some random IP that up until a month ago was spammer central.

That won't protect you from the block getting on a blacklist though.


No, it won't. But it's better than trying to send mail from an IP that a month ago was hosting phishing sites.


I once wrote some production email handling code. It is amazing email works at all given how poorly so many servers and clients stick to the spec. It was really eye opening. Email is more duct tape and bailing wire than I’ve ever seen anywhere else.


>I can go out and lease a virtual machine out there in the cloud and run it myself and not depend on a third party to control my email.

Emphasis here?


Yeah.

Counter-argument moving vms to different providers, ips etc is a lot more straightforward so maybe they don't "control" your mail..?"

And a third way of looking at is every mail has two endpoints. You don't control the server of who sent it to you or who you send it to. Google does. So they still control your mail..?


> he managed to segfault every single database engine he tried, including SQLite, except for Postgres. Postgres always ran and gave the correct answer. We were never able to find a fault in that. The Postgres people tell me that we just weren’t trying hard enough. It is possible to fault Postgres, but we were very impressed.

This is impressive for Postgres.

SQLite is incredible. I've used it over the years in a variety of important projects.


I remember in the early 2000s, I just got started with SQL and one of my reaction was "SQL is great, but why do we need a server? Why not use a simple library and files?" So I asked my coworkers if such a thing existed. And I had a lot of reactions along the lines of "it is stupid why do you want that? Just install a server on localhost" few people were convinced. It wasn't even a case of "nice idea, someone else should do it it", I think people just associated SQL with servers and by being new to it, I didn't have that preconceived idea.

A couple of years later sqlite was everywhere. It was exactly what I had in mind (but with nowhere enough skills to do it myself).


SQLite is the one technology that saved our software stack.

Back in 2016 we were struggling with management of separately-hosted SQL Server instances (network, firewall, user accounts, license, explaining shit to customers, etc.) Installing our software in a new customer environment would take days because of all of the extra mechanics involved. There were 8+ other windows services as well.

Fast forward to 2019 - We now use SQLite embedded inside the application instance. The entire system has been collapsed into a single binary distribution & service. Deploying our software to a completely blank windows server involves emailing the customer a ~120 megabyte zip file, asking them to extract it to C:\myapp and run sc.exe install.

This shift in how we manage persistence of business data allowed for us to salvage an otherwise completely lost ship. Our team is still incredibly small (<10 employees), and we are responsible for managing hundreds of customer environments. If we didn't have a reliable, self-contained persistence mechanism that "just works", we wouldn't have time to do anything else of value.


I remember installing MythTV around 2003. They very strongly insisted on backing it with MySQL, and it make it a lot harder to set up and maintain. Unless a DB is serving multiple hosts, something like SQLite really is the way to go. It also makes it halfway easy to have a simple, single-host setup, along with a path to a multi-host setup, since it's SQL (though you still have to real with dialect quirks).


> ...involves emailing the customer a ~120 megabyte zip file...

Hehe, nice test for their email servers & antivirus (which probably wants to unpack it to see if any nasty payload is included) :)


Good catch. To clarify - we email a link to the distribution (somewhere in an AWS S3 bucket), not attach the actual zip file. This would clearly be a nightmare for 99% of email providers.


Aha, that sounds a lot better, thx - indeed I was wondering if nowadays common max email size increased a lot since I set up my private email server ~10 years ago... :)


The default maximum size for Postfix is still 10 Megabytes: http://www.postfix.org/postconf.5.html#message_size_limit


SQLite is one of the finest pieces of software ever made. It’s one of those quiet innovations in the world that has multiplied the advancement of society and the chattering classes have absolutely no idea. I hope this story gets told more often, to more people. Kudos to Dr. Hipp!


He says he's going to write his own mail server. I use SQLite all the time, and I'm going to try Fossil when I get a bit of time to dig into it. I will definitely give his mail system a look when it's out.


I use SQLite pretty frequently. I switched to Fossil (from git). I'm happy. The overall architecture of having your repo be a single binary stored in one location, and your working directories being just working directories and not repos themselves is a breath of fresh air. Branching seems a little different, but my projects aren't big enough scope to have encountered any hiccups.

I did it mostly for the built in project management (wiki, bug tracking), and to get away from the big SaaS environments. But the sqlite binary file repo sold me on the architecture quickly.

Dr. Hipp is a fantastic software engineer and product designer. Both SQLite and Fossil do what they need, have only the bells and whistles needed, and have clean, consistent designs. I've browsed the source for SQLite before, and it's also clean, legible, and quality.


For those who liked the podcast, I would also recommend this one:

https://softwareengineeringdaily.com/2015/11/13/sqlite-with-...

Dr. Richard Hipp is truly an inspiration.


> Richard:It involves doing without. Here’s warning to all your listeners whenever a politician comes to you and says, “Oh, we’re going to take care of this problem for you.” What they’re really saying is, we’re going to take away some of your freedoms. The exercise for you, here, is to figure out what freedoms they’re going to take away in order to solve this problem, and they’re often well-disguised, but yeah, if you want to be free, that means doing things yourself.


The story of SQlite was told by Dr Hipp himself a few years ago. Great talk https://youtu.be/Jib2AmRb_rk


Thanks for that video.

Interesting to hear that SQLite is supported on the A350 for life service life of the design.

And Microsoft contracting for changes for Windows 10 bundling? That's shocking, considering how many small databases they have developed or purchased over the years.


That was more like a marketing speech.


No


What a lovely treat — and how fascinating to learn about the hidden role that DO—178B has played in this most ubiquitous of programs!


It makes me want to read DO—178B.

Hmmm, worldcat says the nearest library with a copy is several hundred miles away. I wonder if I can ILL it...


> The 100% MCD [modified decision/condition coverage] tests, that’s called TH3. That’s proprietary. I had the idea that we would sell those tests to avionics manufacturers and make money that way. We’ve sold exactly zero copies of that so that didn’t really work out.

That'd be an awesome contribution to open source, particularly at a time that there's a lot of collective awareness of how hard it is to secure C programs.

(There's probably still a business reason to keep it proprietary: it's a reason for people to go to them for changes, at least if they want the final product to be thoroughly tested. Alas, 'cause it would be cool to see.)

It's funny that he references preppers, survivalists, and dependency on society near the end, when talking about writing all of your own software dependencies. A while back someone thinking about it from the other end (i.e. saying we generally should accept some dependencies) said almost exactly the same thing in thread at https://twitter.com/hdevalence/status/1214234305754329088 . I draw no deep insight from this, but fun to see the same analogy from two very different perspectives. (Kind of closing the circle, the "let's be real about dependencies" article linked, https://wiki.alopex.li/LetsBeRealAboutDependencies , mentions in passing "SQLite is basically everywhere (for good reason)".)


Maybe a bit off-topic, but I noticed that https://www.sqlite.org/cpu.html hasn't been updated in over two years and given the consistently high development speed I'd be curious how the "CPU Usage" deals with all the new features. :-)


> You couldn’t just Google how to build a database

> Fortunately, I didn’t know any experts and so I did it anyway

He obviously lives by a certain ethos of self-reliance so not sure he would have talked to experts if could.

But makes me wonder: given it so easy now to find the experts compared to 2000, to google what they think - it is easier to get discouraged?


The question is: If he had asked experts early on, would he have ended up less self reliant? And would we have gotten an SQLite as good as it is now or would he have been told "oh that's too complex" and be discouraged by experts? :)


I’d love it if Dr Hipp created a client/server db. He could call it SQLMajor instead of SQLite :)


fun fact: the `ite` ending in SQLite is a reference to the `ite` ending in minerals (granite, magnetite, pyrite, bauxite...) so it is not a lightweight SQL but rather a rock-stable SQL


I don't care if this is canon or not, I like it enough that this is mine now and I'm using it


you're going to mine some sqlite?


So have we all been saying it with the wrong emphasis?

Sequelite vs Sequel-lite

How does creator pronounce it in podcast?


He pronounces it "ess-que-el ite".


Brevity usually wins, and I'll go with "sclite". Ungainly, but so is (say) "sclerotic".


Aight


or you can use postgresql


Also to note : The fossil project contains a makeheaders.c file that can be compiled as an executable. What it does is that it scans all the .c files and the .h files you give it and generates the necessary headers for each file. I am surprised it's the only simple solution I could find for this purpose. I'm grateful not to have to update .h files any more. Check out https://www.hwaci.com/sw/mkhdr/


Sounds like the old makedepend program.


Never heard of it. The only other thing I found was something called lzz but it seemed more complicated and C++ oriented. Looking up makedepend taught me that it had been superseded by Gnu Automake... Which seems to be a big machinery and another rabbit hole I am not sure I want to go into : I've already had too much fights with makefiles already. I really wonder how most C programmers do.


> DDG-79 Oscar Austin. That is a battleship

No, it is a destroyer, see https://www.surflant.usff.navy.mil/ddg79/. Other than Hollywood, no one had built a battleship since around the end of WWII.


Time I got myself a copy of this Art of Computer Programming book. See if it can make me a better coder.


It's interesting that when I interviewed him, he has all these built-it book cases behind him in his office. The idea of not needing the internet but just occasionally grabbing a book from behind you to look something up when developing software is very appealing to me.

When I interviewed Brian Kernighan one of his complaints was how so much of modern programming is just looking things up rather than building things and I think that Richard has escaped that fate by building things himself.


Before the internet, this is what we did. Read documentation and books. Now it's just a matter of applying Google-fu. I worry that this could be stunting our problem solving skills.


I really enjoyed those parts of each interview. They are very well done.


Any good recommendations for books on SQLite?

I’ve never spent much time with SQLite directly. My main interaction has been through Django and the ORM layer.

My day job uses a custom object database. While I’m familiar with SQL and relational DB concepts I haven’t worked heavily with them.


The SQLite website itself is pretty well written.


If you know SQL, all you got to do is understand the specific quirks of the database you use (e.g. In SQLite, specifying a data type for a column acts like a "hint" rather than a real constraint). The SQLite website has a pretty thorough manual that should more than suffice for this.


I had the opportunity to interact with Dr. Richard Hipp earlier in my career. He’s a wonderful human being.


Bizarre, listening to the episode and then open this here!


Thanks Adam, I love the podcast!


So what are the advantages of using DB like mysql, postgre?


Take a look at the link below, scroll down to the headline “Situations Where A Client/Server RDBMS May Work Better”.

https://sqlite.org/whentouse.html


At "Ask HN: Do you self-host your database?"

https://news.ycombinator.com/item?id=27671376

is this comment:

https://news.ycombinator.com/item?id=27673359

"We're currently transitioning from a multi-tenant 2TB postgres DB hosted on AWS RDS to using sqlite instead, a separate database for each client.

We're doing this for multiple reasons:

a) As our DB grew the service became very expensive, one of the biggest items in our AWS invoice;

b) Keeping the PG servers up to date is a pain, we simply don't have time for this;

c) We wanted to be able to migrate to other clouds and even be able to offer a self-hosted version of our platform."


SQLite was created, because client-server database Informix (similar to MySQL and PosgreSQL) did crash and and was unavailable to connect to. There are some edge cases where MySQL/PostgreSQL can start vacuum, be corrupted, etc making client/server database unavailable. SQLite is tested with billions of test cases.

In interview:

"The idea for SQLite actually came out of his frustrations with an existing database called Informix that was installed on a literal battleship."

"Why do we even need a server? Why can’t I pull this directly off the disk drive? That way if the computer is healthy enough, it can run our application at all, we don’t have dependencies that can fail and cause us to fail, and I looked around and there were no SQL database engines that would do that, and one of the guys I was working with says, “Richard, why don’t you just write one?” “Okay, I’ll give it a try.” "


access management, and scalability: can better do simultaneous writes, and use several copies horizontally

BUT: you probably need 1k+ or 10k+ requests per second to start getting advantages


> Here’s warning to all your listeners whenever a politician comes to you and says, “Oh, we’re going to take care of this problem for you.” What they’re really saying is, we’re going to take away some of your freedoms. The exercise for you, here, is to figure out what freedoms they’re going to take away in order to solve this problem, and they’re often well-disguised, but yeah, if you want to be free, that means doing things yourself.

Easily the most important take away in this excellent interview, but unfortunately, also the most contentious.


So what you are saying is we should hold big petro, chemical, and car companies accountable ourselves when they threaten billions of lives through ecosystem collapse level polution and climate catastrophe?

Great! Pick up a gun then!

Or maybe, the whole, "legislation is curbing everyones freedom!" thing is just a lie, fed to you by the most wealthy and powerfull to keep you from voting in a way that protects your freedom, and curbs theirs.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: