Hacker News new | past | comments | ask | show | jobs | submit login
Writing a SQLite clone from scratch in C (2017) (cstack.github.io)
210 points by rubyn00bie on July 4, 2021 | hide | past | favorite | 43 comments



Discussed at the time:

Writing a SQLite clone from scratch in C - https://news.ycombinator.com/item?id=15168467 - Sept 2017 (146 comments)


SQLite is already open source in the public domain.

https://sqlite.org/src/doc/trunk/README.md

Wouldn’t this be more worthwhile to write in [insert favorite modern language]?

I get it is to learn. C is a much more difficult language in which to work. A higher level language would allow one to better abstract the concepts and iterate faster


> Wouldn’t this be more worthwhile to write in [insert favorite modern language]?

sqlite isn’t your average C development, it’s got more lines of code written for tests than it does for actual code. Given the extensive tests written against sqlite, rewriting it from scratch in any language, even a more modern/safer one, would ultimately end up with a buggier program.

The only practical reason for rewriting sqlite is as a learning exercise. Which is what I believe the point behind this submitted project.


> Given the extensive tests written against sqlite, rewriting it from scratch in any language, even a more modern/safer one, would ultimately end up with a buggier program.

And yet, people keep finding use-after-free bugs in sqlite3 that allow attackers to escalate the memory corruption into arbitrary code execution... bugs that have affected major projects, including iCloud and Chrome; here are a handful: there are lots more even from just the past year :/.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-1363...

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-9327

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-1387...

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-2022...

The reality is that programming in C is just such a horrible thing--making it so easy to accidentally make mistakes that don't merely lead to "your database is corrupted" but escalate all the way to "the attacker is now running their own code on your computer"--that we simply need to stop doing it (preferably 15 years ago: there is really no excuse to not be at least using the C++ compiler to compile all of our C to get at minimum templates, deconstructors, and better type checks), get everyone else to stop doing it, and then actively go through and scour the ecosystem to remove it all, as every line of C running on your computer is a liability.

The underlying thing to appreciate here is that the goal is to make these kinds of mistakes not only much harder but impossible: I simply don't need tests to prove a Rust program (which I say as an example: I am not one of those Rust advocates that you see... I actually hate the language for not having exceptions and I even dislike some of the key people involved in both Rust and Cargo, and so my noting the dominating superiority of Rust to C should come as a pretty strong/sobering commentary; it is unfortunate as I honestly think they are making decisions that have been holding back widespread deployment of Rust, and yet they think they are doing well... hopefully they will claw their way out of the pit soon: at least some progress was finally made recently?) is at least better than sqlite3 apparently is in practice :(.


I agree with your general point that "it's very hard to write safe C code", and

> people keep finding use-after-free bugs in sqlite3

is true, but

> that allow attackers to escalate the memory corruption into arbitrary code execution... bugs that have affected major projects, including iCloud and Chrome; here are a handful: there are lots more even from just the past year :/

is just incorrect. I'd strongly encourage you to read https://www.sqlite.org/cves.html


sqlite3 dev: "yes it's an use after free, but it's fine because the attacker don't control SQL query on most applications"

Application dev: "yes it's an SQL injection, but it's fine because this database is only used for unimportant data"

The thing is that the real attacks usually come by chaining a bunch of vulnerabilities together.


Well, the first CVE I linked was a bug in the full-text search engine and was confirmed by Apple, so I'm pretty sure I'm correct; but like, even if individual bugs don't manage to affect specific projects, it seems pretty strange to just discount them all out of hand as if they aren't important: even one bug is too many if they are avoidable (and most of these C bugs are).


In addition, I recommend others read https://news.ycombinator.com/item?id=27736216. It’s a thoughtful discussion that dispels the unintentional FUD of this thread.

Before today, I didn’t know that CVEs aren’t vetted and can be easily spammed for self-gain[1]. I should be more skeptical the next time I see scores of links to CVEs with 0 comments and bare-bones descriptions.

[1] https://news.ycombinator.com/item?id=25612429


You'll have to pry C from my cold, dead hands.

But nobody uses my projects anyways :-/

edit: oh shit, saurik. I respect you work a lot. Still love my insecure "bad" language, though.

Would you have been able to make as big of an impact on securing our digital freedoms without the faults of memory and pointer errors, though? I almost feel like you owe a debt of gratitude to C's faults. I'm half kidding. But only half


I personally don't appreciate the idea of using Rust as an overall "only solution" replacement. Requires LLVM (a huge overkill on a system otherwise based on GCC), no proper support for dynamic linking, own build system, absolutely huge "ecosystem", No standardization (-> no competing implementations), SLOW to compile... Yada-yada.

Rust it self also has it's fare share of free after use etc bugs. Please check: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=Rust

I'm personally thankful that we have SQLite which is written in C - It's efficient, tiny and severe bugs are relatively rare.

As the original article quotes: “What I cannot create, I do not understand.”


> Requires LLVM (a huge overkill on a system otherwise based on GCC), no proper support for dynamic linking, own build system, absolutely huge "ecosystem", No standardization (-> no competing implementations), SLOW to compile.

All of those are secondary if Rust is able to reliably produce memory safe software.

> Rust it self also has it's fare share of free after use etc bugs. Please check: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=Rust

A relatively quick skim through those issues seems to indicate that the unsafe keyword really is unsafe. I wonder if SQLite can be written without it Rust?


> own build system

Nothing in Rust requires you to use Cargo. It's really convenient to have a good build system like Cargo—but if you like C style of manually invoking the compiler, rustc can do that too.

Rust is no worse by bundling Cargo. It strictly dominates the alternative, which would be to just ship rustc and allow the user to pick whatever build system they like. You still can pick your favorite build system; but if you don't have a particularly strong preference, Cargo is a very good default.


I 100% agree with your complaints about Rust; you might notice that I provided Rust as an example that I dislike, as it is well-known, not as an "only solution". However, I disagree (strongly) with your decision to continued interest in using C: please use almost anything at all other than C (including C++, which isn't perfect but is infinitely better than C++ due to features such as deconstructors).


I think you make very good points. In your post you also argued for the use of C++, and you commented on how much unhappy you are with the fact Rust has no exceptions.

So I take the liberty of asking you a little naughty question :-)

What is your take on "Google C++ Style Guide" advice on the use of C++ Exceptions?

https://google.github.io/styleguide/cppguide.html

"We do not use C++ exceptions"

https://google.github.io/styleguide/cppguide.html#Exceptions


That Google does it isn’t sufficient evidence that it’s good.


In this particular section, the authors already took good care to separate internal rules from general advice: "On their face, the benefits of using exceptions outweigh the costs, especially in new projects. … Things would probably be different if we had to do it all over again from scratch."


You are correct and I did not overlook that part.

Also as stated in the guidelines, some of the reasons are the intention of integrating the open source projects with an internal Google C++ code base that does not use exceptions. So integration would be difficult.

However, the question is how come such a large code base using C++ without Exceptions come to be. Specially on a company with so many employees chairs of C++ committees :-)

Just to make it clear, I prefer the C and Rust model around error handling. I will also concede that, for GUI applications and when on a coherent code base, you probably want C++ Exceptions.


Googles' ban on exceptions was a historical decision, based on the compilers available at the time. That they would not go back and revise all of their software is understandble: retrofitting exceptions into existing C++ code is painful; identifying the points where a catch handler is needed is typically a hard problem, and retrofitting exception safety anywhere, while always a good idea (even if exceptions aren't used) is a large task if you already have a lot of C++ lying around.

Googles strategy does come with some significant drawbacks, such as requiring init() functions everywhere (what happens if you forget one?), and needing to test error codes after every function call (easy to forget). It also locks them out of useful features like overloaded operators for the most part (since those have no error return option other than exceptions). In general, the 'happy path' becomes cluttered with error handling everywhere, leading to programs that dedicate more lines to error handling than actual processing.

Also, it says "WE do not use exceptions", not "NOBODY should use exceptions". It's a statement about the situation at Google, based on their unique circumstances, not a general guideline.


I haven't found a good solution to a fallible constructor without exceptions. You might want this if you have a C++ wrapper around a file or some other OS primitive. I would love some allowance for constructors returning std::optional. One problem with this is that child classes' constructors would have to return optional too or else they might throw an exception when they call optional::value.


Don't write faillible constructor. Instead, you can write a static method that returns a std::optional, for example.


That does not work for data members, unless you are willing to have the entirety of your software wrapped and unwrapped in optionals up to main, which looks pretty much like what people had in the 70s and thought : "okay, maybe there is a language feature we could have to abstract that repetitive mess"


Just have the non-throwing private constructor take the data member by rvalue reference and move them in place. This works in practice for so many projects that do C++ without exceptions.


But then you loose so many features of C++. Can't have aggregates anymore since everything needs private constructors -> more code -> more bugs.

Each possible contructor now needs a matching static method.

Can't put things in standard containers unless you rewrite all copy / move contructors (and you didn't forget to mark your move constructor noexcept, did you ?).

What happens when you have classes with more than 3 members ? Constructors with 12 arguments ? That's unambiguously terrible, and does not even save you from exceptions coming from C++ itself.

If 79% of surveyed C++ projects can use exceptions (https://www.jetbrains.com/lp/devecosystem-2020/cpp/), likely so can you.


It pains me that Rust targets LLVM, and doesn't use C as an intermediary language.

I would _love_ to use Rust on embedded and older hardware on which LLVM simply isn't available.


I'm not sure what value you're suggesting rustc would be able to bring there. If you want that in LLVM you can try your luck with the "resurrected" C backend: https://github.com/JuliaComputingOSS/llvm-cbe

I don't understand why I see so many requests for LLVM-based languages to change around their backend or IR, that seems to be a huge amount of work for comparatively little benefit. The correct thing to do there is to just add support for those to LLVM.


I don't want to "try my luck", I want to build software.

GCC, and various proprietary vendor compilers, are often the only C compilers available on some hardware; and llvm doesn't have a mature, stable and supported backend for them, either.


You might be interested in https://nim-lang.org/ and it's new deterministic memory management method!


Neat! I have tomorrow off so I'll give it a shot.


It's a surprisingly fantastic language. In my opinion, it's the perfect language. But here are some issues in my eyes:

* The community is tiny. A rounding error compared to Go, Rust, etc.

* The core team of contributors is even smaller. You could comfortably have an intimate dinner party with all of them. If they lose interest, the language would likely die.

* The library distribution story wasn't great last I checked.

Caveat: I haven't used Nim for more than a year now. Maybe all these issues are no longer relevant. I really want it to succeed, but who knows what the future holds.


Fair points, although I'd like to keep in mind: Never underestimate a single stubborn person or small team dedicated to an idea.

I'm a C++ developer and, for now, sticking to it on larger projects since I love the thinking behind the C++ evolution over the decades. Rust still feels like a hype/moving target. The big selling point appears to be memory safety, but that's just a detail and not enough to justify a switch. I'm keeping an eye on how Rust and Nim will evolve in the next 5–10 years, but for now, C++, Erlang, and Lisp are my go-to choices for projects which should last the upcoming decades.


I have spent my career as a C++ developer. What is more interesting to me about Rust isn't memory safety, but safe concurrency. But I have so much legacy code to deal with that all I can do is be influenced by Rust ideas: force all code to document assumptions about ownership and lifetimes, as if we had a borrow checker, and focus attention on redesign of code where this doesn't work cleanly.


> I have spent my career as a C++ developer. What is more interesting to me about Rust isn't memory safety, but safe concurrency.

The safety is limited to data races, not concurrency or parallelism problems.


> The underlying thing to appreciate here is that the goal is to make these kinds of mistakes not only much harder but impossible

Hate to break it to you, but it is not "impossible" to have CVEs in Rust [1].

1. https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=rust


Of course, but I would argue those aren't the same "kind of mistake".


Parent commenter didn't mean rewrite SQLite in X but rather the project author write the articles utilizing another (probably an easier to work with) language as they're showing the algorithms behind SQLite rather documenting codebase.


> sqlite isn’t your average C development, it’s got more lines of code written for tests than it does for actual code

It’d be a cool exercise to rewrite in a language that easily targets the C ABI and can thus use the same suite of tests for verification. I suspect Zig would be ideal for this, as would maybe Rust.

Edit: looks like the tests are in tcl not C so nevermind!


The author states on the page that they are building a new implementation in order to understand how to build it


Rewrite in your favorite modern language if you want to make something "better" (or "safer", or some other "-er")... but usually you end up with something worse because usually you don't have the years of trial-and-error, iteration, and refinement that most big name projects have.

Rewrite in C if you want to learn the underlying data structures, algorithms, and real-world computer science of the field. Database design is field with exceptional depth and many lessons to be learned.



If you actually read before posting, you would know that the author is doing this as a pedagogical exercise.


> Wouldn’t this be more worthwhile to write in [insert favorite modern language]?

Only if they want to learn how to write an SQLite clone.

But maybe their goal is to learn C. In that case the SQLite clone is a very good project because there's already huge test suite to test the new implementation.


There is also a pure go implementation of SQLite[0].

[0] https://gitlab.com/cznic/sqlite


Extremely well done nice




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: