Richard is a great story teller and adventures that SQLite has taken him on pretty interesting. I think he in on hn as well ( @SQLite).
One interesting thing I found about how SQLite is developed is the amount of testing that is done. Between unit tests and parameterized tests and fuzzer tests they are doing billions of tests per release and spending days running them before each release. I know I don't like to wait for a long CI build before I deploy a service but something like SQLite needs whole different category of resilience.
That part really made me laugh. A few months ago, i developed a wrapper in swift for sqlite on a project. A coworker of mine had a bug on what sqlite returned, and i couldn't find anything wrong. He told me "it has to be a bug on sqlite", and i said "no, sqlite has no bugs… but i don't really understand what's going on". And so we posted a bug report on the sqlite forums. I told my friend "if you found a bug on sqlite, on your first try, i'll buy you a bottle of champaign".
Of course, it was a bug with my wrapper (included a \0 char at the end of strings).
I just forwarded him the link to your podcast, saying "we really had no chance"
I enjoyed this episode. I knew about all the tests before, but what came clear in this interview is that Sqlite inc is using their tests as a unique way to keep contracts on Sqlite development flowing. Since they can test it so much better than anyone else, it would be crazy not to hire them, rather than hiring someone else to develop a patch.
While they're keeping those tests proprietary, and relatedly have a closed development model, this strikes me as still better than typical open core strategies for monetizing free software, or at least pointing in better directions.
Agbell, since you're here, could I ask you add a standard link to the RSS feed for your podcast?
Minor nitpick: technically, the primary company that supports SQLite is an incorporated partnership named Hipp [ed note: yes, that's Mr. Richard Hipp], Wyrick [ed note: Hipp's wife] & Company, Inc. dba Hwaci (https://www.hwaci.com/).
Happens to use SQLite as its storage engine too (SQLInception?), which brings at least a couple (off the top of my head) features:
* SQL is used to query things, which is very powerful
* the repository is a single SQLite file. Easy to store, copy, and…
* one can checkout multiple working copies into various directories.
Or, in other words, I can have three discrete, independent workflows happening simultaneously, but only one copy of the repository backing them all.
It’s also got a built-in server that you can use locally or publish for remote people to access via their web browser that’s essentially GitHub-in-a-box.
I've migrated to git for my personal projects, mainly because magit (emacs frontend for git) is awesome.
But fossil is a pleasure to work with, especially on Windows, because a) it comes as a single executable that one can just copy to another machine, and b) it does not require ssh or anything else to be installed.
The web interface is amazing, especially considering the size of the executable file.
It looks to me like the main difference is that with Git you can't have multiple worktrees on the same branch. Apropos of your comment "Note: this working directory now not a tip of trunk anymore…", to Git being on the tip of a branch is a thing of note, so it doesn't ever want to accidentally invalidate another worktree's notes about being on the tip of a branch.
In Git, that'd be:
$ cd ~/work/fossils/projx.git
$ git worktree add ~/work/project_x master # add a worktree with the 'master' branch checked out
$ cd ~/work/project_x
(Edit, edit, edit)
$ git commit -m 'descriptive msg'
(Get an idea, but for whatever reason do not want to disturb this directory)
$ git worktree add ~/work/radical_idea -b cool_idea # add a worktree with a new 'cool_idea' branch
$ cd ~/work/radical_idea
(Edit, edit, edit)
$ git commit -m "cool idea"
(Test..)
$ git switch master # whoops, can't do this
fatal: 'master' is already checked out at '~/work/project_x'
$ # let's just do the merge in the other worktree, since we're about to switch anyway
$ cd ~/work/project_x
$ git worktree remove ~/work/radical_idea # or: rm -rf ~/work/radical_idea && git worktree prune
$ git merge cool_idea
> <note here there is no sense of this being an inferior or “sub” checkout>
There is not in worktrees either. You can use worktrees with a bare repository for instance. It's just that for backwards compatibility and the convenience of most users a non-bare clone creates a default worktree. That's exactly what you see if you `git clone` a repository then `git worktree list`, a worktree corresponding to the working copy.
Really enjoyed this. Is it possible to subscribe using an email address? The subscribe page seems to ask for a bunch of apps (Apple Podcast etc) that I don't use. Would be great to just receive an email and listen on a browser
If you scroll to the bottom of the page and sign up for the newsletter, I will email you whenever there is a new episode out.
The newsletter will come out once a month when there is a new episode and has a bit of personal take on the episode and maybe some links to other things I've been up to.
If you just want a pure notification, I'm sure you could hook the RSS feed up to an RSS to email service. Thanks for listening!
CoRecursive started because I did some episodes as a host on software engineering daily and it was lots of fun so I decided to start one of my own. The first 4 episodes were originally on software engineering daily.
Originally the idea was to give some platform to the areas of computing I was interested in, heavy use of types and functional programming but I think I really found my place as a podcast when I started to focus on peoples stories. So rather than tell me about SQLite, it is tell me about the story behind creating SQLite.
You are so polite, I am impressed with your ability not to push back on the interviewees. In this episode I would have asked "would sqlite really be worse today if you hadn't done your own VCS.". But I think I wouldn't be half as successful as you, people don't wanna be implicitly criticised by their interviewer :P
> would sqlite really be worse today if you hadn't done your own VCS.
Yes. Fossil is not only the VCS for SQLite, Fossil is also built around SQLite. SQLite is a core component of Fossil. Thus, when I am working on Fossil, I am forced to interact with SQLite as a "user" instead of as a "developer". In geek-speak, it forces me to "eat my own dog food". This, in turn, prompts me to add needed features to SQLite and more generally to make the SQLite interfaces friendlier to application-developers.
One recent example: SQLite version 3.34.0 added the ability to include two or more recursive terms in a Recursive Common Table Expression. (See item 2 in https://www.sqlite.org/releaselog/3_34_0.html and subsequent links.) This feature was added specifically so that I could more easily write SQL statements that would walk the Fossil version history DAG, as described by the https://www.sqlite.org/lang_with.html#rcex3 link.
I did not develop Fossil with this "dogfooding" idea in mind. It was an unanticipated benefit of Fossil. But in the end, I think it might have been the most important benefit of using Fossil instead of some other VCS.
I'm glad you like the podcast! I do push back when something doesn't sound right. I agree though that I could have asked more about fossil. I really didn't look into fossil that much when I prepared for the interview so it was a bit of a blind spot for me.
Not included in the interview is a discussion we had about types. I really think SQLite should be stricter in enforcement of types and Richard doesn't buy it. I may include it in a future bonus episode but the reason I cut it was it didn't really include any details about story of SQLite, it was just two people disagreeing about types. I've done a lot of episodes about types in the past and I wanted this to be about build SQLite, and not about types.
I quite disagree. It's boring when they just say their talking points. The guests are experts in their field, they can handle some pushback and it would be much more educational for the rest of us to hear the answers to the questions we're really wondering about.
It doesn't have to be "implicit criticism", but an open mind doesn't blindly accept stuff, ask for justification.
"I'm going to write my own mail server. I was making notes on that even as we were setting up this call. That's a big problem, and that's at least as difficult if not more difficult than writing a database engine, but I don't want to be beholden to Gmail. I don't want them controlling my destiny. I don't want them controlling the record of all of my conversations. I want to control that myself, and so I'm going to go through a lot of pain and a lot of work and a lot of effort to come up with some solution that I can control myself. I can go out and lease a virtual machine out there in the cloud and run it myself and not depend on a third party to control my email."
Does anybody have any insight into " at least as difficult if not more difficult than writing a database engine"? It would seem to me that a mail server would be much easier than a database engine.
Dr.Hipp mentioned GMail bouncing email sent to him - but hosting email yourself, will just move you to a different side of the bounce - now GMail will be bouncing emails sent by you! See, for example, first comments on the recent HN thread here: https://news.ycombinator.com/item?id=27707857 (ctrl+f for "pain in the" if it's not the first one).
So it looks like convincing GMail to respect your small server is a non-technical problem, which might be an issue for a person with "only engineering" mindset. On the other side, Dr.Hipp looks like a right person to try to convince GMail to respect smaller mailservers!
My own personal experience (with a server that has SPF/DKIM/DMARC all set up, and not listed on public RBLs) has been that GMail, Hotmail, Comcast, and Charter have not been problems.
On the other hand, AT&T is most definitely a problem. They run their own internal RBL, and the best I can guess is that they're blocking Linode. Any mail I send to an AT&T customer gets bounced, and forwarding that bounce to the address they indicate it should be sent to to get delisted accomplishes nothing.
I have found that many RBL's block whole IP blocks owned by various vendors. I believe this is why Linode and others like Digital Ocean now block port 25 on VPSs and make you beg for them to open it. Too many spammers.
This is also a good reason to keep a VPS even if you aren't using it at the moment: it keeps the IP "clean" so that when you put it back into use, it is useful, as opposed to (possibly) some random IP that up until a month ago was spammer central.
> This is also a good reason to keep a VPS even if you aren't using it at the moment: it keeps the IP "clean" so that when you put it back into use, it is useful, as opposed to (possibly) some random IP that up until a month ago was spammer central.
That won't protect you from the block getting on a blacklist though.
I once wrote some production email handling code. It is amazing email works at all given how poorly so many servers and clients stick to the spec. It was really eye opening. Email is more duct tape and bailing wire than I’ve ever seen anywhere else.
Counter-argument moving vms to different providers, ips etc is a lot more straightforward so maybe they don't "control" your mail..?"
And a third way of looking at is every mail has two endpoints. You don't control the server of who sent it to you or who you send it to. Google does. So they still control your mail..?
> he managed to segfault every single database engine he tried, including SQLite, except for Postgres. Postgres always ran and gave the correct answer. We were never able to find a fault in that. The Postgres people tell me that we just weren’t trying hard enough. It is possible to fault Postgres, but we were very impressed.
This is impressive for Postgres.
SQLite is incredible. I've used it over the years in a variety of important projects.
I remember in the early 2000s, I just got started with SQL and one of my reaction was "SQL is great, but why do we need a server? Why not use a simple library and files?" So I asked my coworkers if such a thing existed. And I had a lot of reactions along the lines of "it is stupid why do you want that? Just install a server on localhost" few people were convinced. It wasn't even a case of "nice idea, someone else should do it it", I think people just associated SQL with servers and by being new to it, I didn't have that preconceived idea.
A couple of years later sqlite was everywhere. It was exactly what I had in mind (but with nowhere enough skills to do it myself).
SQLite is the one technology that saved our software stack.
Back in 2016 we were struggling with management of separately-hosted SQL Server instances (network, firewall, user accounts, license, explaining shit to customers, etc.) Installing our software in a new customer environment would take days because of all of the extra mechanics involved. There were 8+ other windows services as well.
Fast forward to 2019 - We now use SQLite embedded inside the application instance. The entire system has been collapsed into a single binary distribution & service. Deploying our software to a completely blank windows server involves emailing the customer a ~120 megabyte zip file, asking them to extract it to C:\myapp and run sc.exe install.
This shift in how we manage persistence of business data allowed for us to salvage an otherwise completely lost ship. Our team is still incredibly small (<10 employees), and we are responsible for managing hundreds of customer environments. If we didn't have a reliable, self-contained persistence mechanism that "just works", we wouldn't have time to do anything else of value.
I remember installing MythTV around 2003. They very strongly insisted on backing it with MySQL, and it make it a lot harder to set up and maintain. Unless a DB is serving multiple hosts, something like SQLite really is the way to go. It also makes it halfway easy to have a simple, single-host setup, along with a path to a multi-host setup, since it's SQL (though you still have to real with dialect quirks).
Good catch. To clarify - we email a link to the distribution (somewhere in an AWS S3 bucket), not attach the actual zip file. This would clearly be a nightmare for 99% of email providers.
Aha, that sounds a lot better, thx - indeed I was wondering if nowadays common max email size increased a lot since I set up my private email server ~10 years ago... :)
SQLite is one of the finest pieces of software ever made. It’s one of those quiet innovations in the world that has multiplied the advancement of society and the chattering classes have absolutely no idea. I hope this story gets told more often, to more people. Kudos to Dr. Hipp!
He says he's going to write his own mail server. I use SQLite all the time, and I'm going to try Fossil when I get a bit of time to dig into it. I will definitely give his mail system a look when it's out.
I use SQLite pretty frequently. I switched to Fossil (from git). I'm happy. The overall architecture of having your repo be a single binary stored in one location, and your working directories being just working directories and not repos themselves is a breath of fresh air. Branching seems a little different, but my projects aren't big enough scope to have encountered any hiccups.
I did it mostly for the built in project management (wiki, bug tracking), and to get away from the big SaaS environments. But the sqlite binary file repo sold me on the architecture quickly.
Dr. Hipp is a fantastic software engineer and product designer. Both SQLite and Fossil do what they need, have only the bells and whistles needed, and have clean, consistent designs. I've browsed the source for SQLite before, and it's also clean, legible, and quality.
> Richard:It involves doing without. Here’s warning to all your listeners whenever a politician comes to you and says, “Oh, we’re going to take care of this problem for you.” What they’re really saying is, we’re going to take away some of your freedoms. The exercise for you, here, is to figure out what freedoms they’re going to take away in order to solve this problem, and they’re often well-disguised, but yeah, if you want to be free, that means doing things yourself.
Interesting to hear that SQLite is supported on the A350 for life service life of the design.
And Microsoft contracting for changes for Windows 10 bundling? That's shocking, considering how many small databases they have developed or purchased over the years.
> The 100% MCD [modified decision/condition coverage] tests, that’s called TH3. That’s proprietary. I had the idea that we would sell those tests to avionics manufacturers and make money that way. We’ve sold exactly zero copies of that so that didn’t really work out.
That'd be an awesome contribution to open source, particularly at a time that there's a lot of collective awareness of how hard it is to secure C programs.
(There's probably still a business reason to keep it proprietary: it's a reason for people to go to them for changes, at least if they want the final product to be thoroughly tested. Alas, 'cause it would be cool to see.)
It's funny that he references preppers, survivalists, and dependency on society near the end, when talking about writing all of your own software dependencies. A while back someone thinking about it from the other end (i.e. saying we generally should accept some dependencies) said almost exactly the same thing in thread at https://twitter.com/hdevalence/status/1214234305754329088 . I draw no deep insight from this, but fun to see the same analogy from two very different perspectives. (Kind of closing the circle, the "let's be real about dependencies" article linked, https://wiki.alopex.li/LetsBeRealAboutDependencies , mentions in passing "SQLite is basically everywhere (for good reason)".)
Maybe a bit off-topic, but I noticed that https://www.sqlite.org/cpu.html hasn't been updated in over two years and given the consistently high development speed I'd be curious how the "CPU Usage" deals with all the new features. :-)
The question is: If he had asked experts early on, would he have ended up less self reliant? And would we have gotten an SQLite as good as it is now or would he have been told "oh that's too complex" and be discouraged by experts? :)
fun fact: the `ite` ending in SQLite is a reference to the `ite` ending in minerals (granite, magnetite, pyrite, bauxite...) so it is not a lightweight SQL but rather a rock-stable SQL
Also to note : The fossil project contains a makeheaders.c file that can be compiled as an executable. What it does is that it scans all the .c files and the .h files you give it and generates the necessary headers for each file. I am surprised it's the only simple solution I could find for this purpose. I'm grateful not to have to update .h files any more. Check out https://www.hwaci.com/sw/mkhdr/
Never heard of it. The only other thing I found was something called lzz but it seemed more complicated and C++ oriented.
Looking up makedepend taught me that it had been superseded by Gnu Automake... Which seems to be a big machinery and another rabbit hole I am not sure I want to go into : I've already had too much fights with makefiles already.
I really wonder how most C programmers do.
It's interesting that when I interviewed him, he has all these built-it book cases behind him in his office. The idea of not needing the internet but just occasionally grabbing a book from behind you to look something up when developing software is very appealing to me.
When I interviewed Brian Kernighan one of his complaints was how so much of modern programming is just looking things up rather than building things and I think that Richard has escaped that fate by building things himself.
Before the internet, this is what we did. Read documentation and books. Now it's just a matter of applying Google-fu. I worry that this could be stunting our problem solving skills.
If you know SQL, all you got to do is understand the specific quirks of the database you use (e.g. In SQLite, specifying a data type for a column acts like a "hint" rather than a real constraint). The SQLite website has a pretty thorough manual that should more than suffice for this.
SQLite was created, because client-server database Informix (similar to MySQL and PosgreSQL) did crash and and was unavailable to connect to. There are some edge cases where MySQL/PostgreSQL can start vacuum, be corrupted, etc making client/server database unavailable. SQLite is tested with billions of test cases.
In interview:
"The idea for SQLite actually came out of his frustrations with an existing database called Informix that was installed on a literal battleship."
"Why do we even need a server? Why can’t I pull this directly off the disk drive? That way if the computer is healthy enough, it can run our application at all, we don’t have dependencies that can fail and cause us to fail, and I looked around and there were no SQL database engines that would do that, and one of the guys I was working with says, “Richard, why don’t you just write one?” “Okay, I’ll give it a try.” "
> Here’s warning to all your listeners whenever a politician comes to you and says, “Oh, we’re going to take care of this problem for you.” What they’re really saying is, we’re going to take away some of your freedoms. The exercise for you, here, is to figure out what freedoms they’re going to take away in order to solve this problem, and they’re often well-disguised, but yeah, if you want to be free, that means doing things yourself.
Easily the most important take away in this excellent interview, but unfortunately, also the most contentious.
So what you are saying is we should hold big petro, chemical, and car companies accountable ourselves when they threaten billions of lives through ecosystem collapse level polution and climate catastrophe?
Great! Pick up a gun then!
Or maybe, the whole, "legislation is curbing everyones freedom!" thing is just a lie, fed to you by the most wealthy and powerfull to keep you from voting in a way that protects your freedom, and curbs theirs.
Richard is a great story teller and adventures that SQLite has taken him on pretty interesting. I think he in on hn as well ( @SQLite).
One interesting thing I found about how SQLite is developed is the amount of testing that is done. Between unit tests and parameterized tests and fuzzer tests they are doing billions of tests per release and spending days running them before each release. I know I don't like to wait for a long CI build before I deploy a service but something like SQLite needs whole different category of resilience.