The fact that the performance is so close between a database that requires NO SERVER and one that requires a dedicated server + ops management is a signal to me that all projects should start on SQLite unless there is a compelling reason not to.
It's funny, because to me it says pretty much the opposite: because sqlite has no server it works entirely in-process, whereas the client will be talking to postgres over a unix socket, or possibly a full-blown network socket. That postgres stays this close despite crossing a process boundary on every query is a point in its favor, not against it.
Plus it seems to be a readonly workload, which doesn't really play to postgres's strengths (not that it doesn't work, but...).
Postgres doesn't require a _dedicated_ server. If your requirements allow for SQLite but it didn't exist, you could conceivably run postgres on your app server to reduce the ops load if you really felt it was a significant effort.
If you might want to use multiple app instances connected to a shared DB, I would say it's probably easier to just use a local postgres container for dev and a managed cloud DB. Really not that much effort and you get automatic backups.
If you plan to never use a shared DB, SQLite is great though.
> you could conceivably run postgres on your app server to reduce the ops load if you really felt it was a significant effort.
Still, PGSQL requires some level of setup/maintenance: creating a user, permissions, a database, and a service, with config/security effort. If there is a way to run PGSQL as a library and tell it "just store your data in this dir" with no other steps, I am very interested to learn about it.
You can set up Postgres with three lines of bash and a Unix socket. It is more work; I wouldn't say it's appreciably more work, and if you need more rigorous data modeling (I frequently do) then it's worth considering if this is the mode in which you'd like to run your data stores (I wouldn't, but latency isn't the most critical thing for stuff I do).
I am not familiar with SQLite, or with how engineers embed it into apps, but I believe that for many apps shipping SQLite you don't need any of these steps: it is included as a library of sorts, not as a standalone server.
I am familiar with Java and H2, you just say in config somewhere: store db in this directory, and you are all set.
I think there is no such route currently for PGSQL.
Sorry, but it's more complex than this. As an example, NPM's `sqlite3` package does not contain the actual library. Neither does Ruby's. You need the platform library for quite a few runtimes and now your single-command future has...multiple commands.
Which isn't bad. This is a pretty low bar to clear.
(All that said, I am of the opinion that separating data storage from application logic is better for operations, for security, and for disaster recovery unless you desperately need to prioritize avoiding network latency above everything else, so this is all pretty moot to me.)
> As an example, NPM's `sqlite3` package does not contain the actual library. Neither does Ruby's.
That seems pretty... odd on their part. I know that Ruby gems in particular can and do include C sources, so there's no reason it couldn't ship sqlite3.c (or even the various .c files that get combined into sqlite3.c); if you're able to install Nokogiri (i.e. you have a usable C toolchain), then you should be able to compile sqlite3.c no problem.
Meanwhile, Python ships with an "sqlite3" module in its standard library (last I checked), so if you have Python installed at all then you almost certainly already have SQLite's library installed.
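e.g., this works with nothing but a stock Python install - a quick sketch, with a made-up file name:

```python
import sqlite3

con = sqlite3.connect("app.db")  # creates the file on first use; no server, no setup
con.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
con.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
con.commit()
print(con.execute("SELECT body FROM notes").fetchall())  # [('hello',)]
con.close()
```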
Nokogiri itself depends on an OS-supplied libxml2, doesn't it? And I think--don't quote me, Python isn't my thing and I only looked briefly--that Python on Ubuntu requires the `python-sqlite` APT package, which depends on both `python` and `libsqlite0`. I think CentOS is the same way? As a sysadmin, I'd expect it to be this way, so I'm not surprised by seeing this.
I think the thinking is that packing The Entire Library rather than your necessary bindings is overkill/duplication/etc etc. - it's the static versus dynamic argument in a sense, even though you may be dynamically linking while you pack your own library. Duplication versus OS-controlled security patching? Admittedly when most deployables end up being Docker containers (for good or for ill) that's less important, but most of these tools are from a distinctly earlier era of deployment tools.
> Nokogiri itself depends on an OS-supplied libxml2, doesn't it?
It does, but it still compiles bindings for it. My point is that if you're able to compile C code at all (including Nokogiri's bindings to libxml2), then strictly requiring some external libsqlite3.so is unnecessary - because it's pretty dang simple to just compile sqlite3.c alongside whatever other C code you're compiling.
> And I think--don't quote me, Python isn't my thing and I only looked briefly--that Python on Ubuntu requires the `python-sqlite` APT package, which depends on both `python` and `libsqlite0`
Right, and my point there is that if you've installed Python, then in doing so you've already installed SQLite (because your package manager did so automatically, because it's a dependency), so you don't need to run an additional command to install SQLite.
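Quick way to check which SQLite your interpreter already links against (output varies by system):

```python
import sqlite3
print(sqlite3.sqlite_version)  # version of the SQLite library bundled with Python
```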
> but it's more complex than this. As an example, NPM's `sqlite3` package does not contain the actual library. Neither does Ruby's. You need the platform library for quite a few runtimes and now your single-command future has...multiple commands.
I think the usual way to do it is to have a shared library file like sqlite.so in some directory (or installed through the OS package manager), plus a thin API binding for the specific runtime, shipped as a runtime package (e.g. on npm).
> you desperately need to prioritize avoiding network latency above everything else, so this is all pretty moot to me
The goal here is easy distribution of an embedded DB. Many apps do this with SQLite, for example the Dropbox client. There are many other examples: https://www.sqlite.org/famous.html
I see what you're saying now. But unless I'm very much missing the plot, you and everyone else are talking about a web application. The post to which you originally replied even constrained it as such:
> you could conceivably run postgres on your app server
So what you've been saying has been taken in the context of packing sqlite or Postgres next to your web application, which has pretty limited use cases, and reads really weirdly in that light.
The discussion was about an "app server" and ops maintenance.
We've just gotten used to the idea that bootstrapping/installing a web/API stack means changing tons of knobs and configs and installing many libs and software pieces.
But there is another approach: one self-contained binary/artifact that can include an embedded DB too, launched with a single command.
As my comment says, you are going to want the stand-alone client though, for a variety of reasons. The psql client usually comes with the postgresql-server, but apps embedding SQLite3 definitely don't come with the stand-alone command-line sqlite3 client.
Another cool thing is that you can initialize the Postgres DB when the container first starts with your own SQL (or sh) scripts that create your schemas, tables, views, etc., just by copying the scripts into /docker-entrypoint-initdb.d/.
Keep in mind the suggestion was a managed product in production but if you really need to customize your local image, you can mount in arbitrary initialization scripts as well.
If you need all this customization then did SQLite really fit the bill in the first place?
> if you really need to customize your local image
I need to change the config for the prod pgsql instance, because the default config is not necessarily suitable, performance-wise, for serious prod machines.
> If you need all this customization then did SQLite really fit the bill in the first place?
I am not familiar with SQLite, but I am somewhat familiar with Java's H2 DB, which follows a similar idea. You can embed it and all its configs into a self-contained Java jar file together with your app, and no changes to the host OS are needed.
You are likely wanting something like RDS [1], Azure [2], or even Linode (which I just discovered) [3]. Sure, you have to pay a larger premium for it, but these products do let you focus on the app rather than the DBA stuff.
yeah, the premium is very large if I want to do a lot of data crunching (lots of IOPS, cores, and RAM are needed), plus latency/throughput to those services will be much worse than to my local disk over the PCIe bus.
I think this is what it comes down to for me too. Yes there might be some use cases that really benefit from Postgres features like GIS and the native JSON type, but ultimately, for most use-cases, it's going to hinge on whether or not you ever expect to need to scale any part of the system horizontally, whether that's the frontends or the DB itself.
> Yes there might be some use cases that really benefit from Postgres features like GIS and the native JSON type
Or a proper UUID type, materialized views, enforced data-constraints, lateral joins, queries that consider multiple indexes, etc, etc.
It's not like there's just a couple of niches where PostgreSQL's features distinguish it from SQLite. SQLite is appropriate for very simple DB use-cases where you aren't too bothered by silent data inconsistency/corruption, but not a whole lot beyond that. It's a good little piece of software, but it hardly relegates PostgreSQL to "Just GIS things" or "Just JSON use-cases".
> SQLite is appropriate for very simple DB use-cases where you aren't too bothered by silent data inconsistency/corruption, but not a whole lot beyond that.
SQLite is also appropriate for use cases where you are bothered by silent data inconsistency/corruption; it's just that the hardware running PostgreSQL (or any "serious" DBMS, relational or not) is usually less prone to random inconsistency/corruption (uninterruptible redundant power supplies, a well-cooled environment, lower but more stable clock speeds on everything, ECC RAM, a RAID disk array, a corruption-resistant FS).
If you run PostgreSQL on the cheapest consumer hardware, expect random corruption of the database when the power cuts out several times in a row.
Sorry, "corruption" was probably misleading there. By "silent data inconsistency/corruption" I mean, "a bug in my application led to me storing strings in an integer column, or too-long data in a too-short column, and SQLite just silently let me do that"-type errors – the data consistency is logically (but not physically) corrupted. The author encountered 2 separate occurrences of this just in their simple use-case, evidently.
Neat, will definitely give this a try. One of my minor peeves about SQLite is having to set a bunch of pragmas on every new connection. Like, especially stuff such as foreign_keys=1... really, enforcement of foreign key constraints is something each connection has to opt into rather than being set once on the whole DB file and never thought about again? Blah.
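In the meantime I just wrap it in a tiny connection factory - a sketch, and the helper name is mine:

```python
import sqlite3

def connect(path: str) -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")    # per-connection; off by default
    con.execute("PRAGMA busy_timeout = 5000")  # also per-connection
    return con
```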
Well, unless you care about disaster recovery or availability.
For most business applications used by more than one user there are usually some expectations that data will be retained and the application will be available.
I read that as "dedicated server[process]", not "dedicated server[hardware]". While the latter would obviously imply the former, it's really the former that increases the complexity of managing the whole affair. Desktop applications with a dependency on mysql/postgres running on the same machine are, in my experience without exception, a pain in the ass for users.
Why should the database ever be publicly accessible? I've only ever seen databases being accessible by applications and maybe users (but those only for analytics databases), and have never experienced anyone working full time on certs and user access for a single database. There's usually a team for that, but they do that for the whole server fleet.
There are enough SQL oopsies in this article to show that you should NEVER use sqlite for production, except as filter databases or other sorts of throwaways. RDBMSs exist to enforce constraints on data.
That's silly; SQLite is the single most widely deployed database engine in the world, possibly the most deployed software, period. Its unit test suite is second to none. It's reliable enough that SMF, Solaris' boot and service management (sort of like systemd), depends on it.
That does not mean it is applicable to all database use-cases, but production-grade it undoubtedly is.
How many installations SQLite has is not that relevant in a discussion of SQLite vs. a full RDBMS. By that logic, a bank should use SQLite because it is deployed widely. You have to look at where they are deployed and at the use cases.
They also come with dramatically different tooling, which makes a huge difference - though more tooling does not mean that a full RDBMS is the better choice. It is just about what you need, what use case you have to cover, and what industry you are in.
> RDBMSs are there to enforce constraints on data.
If you did not ask SQLite to enforce type constraints on relations (with STRICT), it falls back to the relaxed, backwards-compatible behavior of previous versions of SQLite.
That being said, if you want actual validation, you probably need more complex CHECK expressions encoding your business rules anyway, and those work by default on any database.
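A rough sketch of both mechanisms (STRICT needs SQLite 3.37+; the schema is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- business rule
    ) STRICT
""")
for bad in ("oops", -5):  # wrong type, then a CHECK violation
    try:
        con.execute("INSERT INTO account (balance) VALUES (?)", (bad,))
    except sqlite3.IntegrityError as e:
        print(e)  # STRICT rejects the TEXT value; CHECK rejects the negative
```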
RDBMS constraints are usually oversold. For instance, things like foreign-key constraints are often avoided in production because they cause performance and maintenance nightmares.
Widely-used constraints like UNIQUE are production-quality in SQLite.
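Easy to verify with a throwaway sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT UNIQUE)")
con.execute("INSERT INTO users VALUES (?)", ("a@example.com",))
try:
    con.execute("INSERT INTO users VALUES (?)", ("a@example.com",))
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: users.email
```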
Uh, in most distributed environments you have no choice but to use a server, because you'll have clients on a (relatively) uncontrollable set of other machines.
If you’re writing something to run in an embedded/client app environment, then yeah why would you use Postgres for your one machine? You could, but it’ll add a lot of moving parts you’ll never need and probably don’t want (like remotely accessible network ports you’ll need to secure)
It seems to me (or at least, I'm hoping this is happening) that the pendulum is swinging back to simple deployments and away from AWS/Cloud/K8s for every tiny little app.
One thing I'd love to see is that the 'default' MVP deployment is your code + SQLite running on a single VM. That will scale well into "traction" for the vast majority of applications people are building.
I have a fully working SaaS with 30+ paying companies, with 20 employees each on average. I wrote it as a self-taught programmer via YouTube and Stack Overflow. It is hosted on $2/month Namecheap shared hosting. PHP back, jQuery front. I use cPanel to set things up.
It runs faster than 90% of webapps on the internet.
As a former php dev I absolutely believe this. I can't get over how slow and bloated stuff is these days. I can't even do front-end anymore.
We once did an app ages ago where the database was used to create a materialized view of code (PHP) + web pages for everyone and everything. We then rsynced that to 6 machines. This is ancient times, but this thing FLEW -> click click click click you could fly through the pages. It was a read heavy workload, but still, even the front-end (just hand coded) was somehow faster than all the latest and greatest going on these days.
Yeah, no, I meant that 30 orgs with an average of 20 employees is (likely) a trivial load, but it starts to add interesting complexity to the amount of code needed for RBAC.
Not OP, but like any other database? Even if the database supports it, you usually should not have users directly logged into the database anyway, so whether the database "has" or "lacks" access control doesn't matter.
What does matter, however, is enforcing parametrized queries everywhere. Unless all the db handles you pass to the client-handling code are read-only, chaos will ensue from the DDL permissions.
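Concretely, something like this - shown with sqlite3, but the same DB-API placeholder pattern applies to any driver, and the schema is made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (owner TEXT, body TEXT)")

user_input = "alice' OR '1'='1"  # hostile input
# never: f"SELECT body FROM docs WHERE owner = '{user_input}'"
rows = con.execute(
    "SELECT body FROM docs WHERE owner = ?", (user_input,)
).fetchall()  # the placeholder keeps the input as pure data
```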
I start by assuming a server-side bridge between the client and the database, and it's a given the queries are parameterized. So only the bridge is doing the "logging in".
Why is it superior to put all of the (bespoke) access control logic in the server side bridge rather than use what's available in the database (accessed by the bridge, not the client)?
I have been watching like a hawk for 6 months but I haven't stumbled upon a clear reason why this is done, except for "it helps source code db portability".
For a multiorg/multiuser application this seems like the crucial distinction between sqlite and postgresql.
Again I have no experience here, talk to me like I'm stupid (I really am!).
For a multiorg approach, you probably want isolated DBs anyways.
Within a single org, multiuser approach, there are 2 big problems that I remember with attempting to shoehorn DB auth into application auth:
* assuming you use a connection pool, you might run out of TCP connections/ports if you need to handle too much stuff;
say, for example, that your load balancer needs 3 application nodes behind it - you will need 2 (connections per user) x 3 (application nodes) connections just to handle one user - 6 connections/user. That will eat your database connection limit very fast, for no good reason.
* assuming you don't use a connection pool, you now have horrible latency on every connection (bad), and you need to handle plain text passwords (assuming you use scram-sha-256) or non-replay-resistant md5 hashes of user passwords, either sent in every client request or kept in a shared session system. No matter what you pick, you have a security disaster in the making (very bad).
In Postgres at least you can use SET ROLE when getting a connection from the pool and RESET ROLE when returning a connection to the pool, and avoid the problems you describe.
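Something like this sketch with psycopg2 - pool sizes, DSN, and the helper name are all my assumptions:

```python
from contextlib import contextmanager
from psycopg2 import sql
from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(1, 10, dsn="dbname=app")  # made-up DSN

@contextmanager
def connection_as(role: str):
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            # SET ROLE takes an identifier, so quote it as one
            cur.execute(sql.SQL("SET ROLE {}").format(sql.Identifier(role)))
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()  # also undoes the SET ROLE if the transaction aborted
        raise
    finally:
        with conn.cursor() as cur:
            cur.execute("RESET ROLE")  # return the session to its default role
        conn.commit()
        pool.putconn(conn)
```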
Ah, thank you very much. I will happily investigate these and ditch postgresql for sqlite if still true, for my planned use case.
sqlite looks like great technology to me (as is postgresql) but I am a bit of a fanatic for keeping the overall system as understandable as possible, so these questions are important (for me, I'm stupid).
That pain also comes if you're already deeply invested in cloud products. Nothing scales without effort, and pain points always arise where you don't expect them. But the simpler the setup, the simpler it often is to scale out the parts that need scaling - in this example, maybe a dedicated server or a separate database server at some point. But there can be many customers before that's necessary.
Preemptive scaling never really works and most projects never scale enough to warrant more than one server (unless you write very inefficient code).
As I understand it, the actual low-level database interface may be a shared-library mechanism in either case. At least, for a local database I'm pretty sure that's a function of the connection type. You're still dlopen()-ing an ODBC library, and the underlying glue between that and the rest of the userspace DB engine parts can be implemented in many ways. If there's a socket involved, it would probably be AF_UNIX.
Postgres and SQLite are very different in several aspects, e.g.
- types and enforcing type safety
- handling concurrency and transactions (available isolation levels)
- functions / procedures
- array support
Postgres can be much more valuable than just using it for "relational data storage". Its features can be a good reason for choosing it from the beginning.
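To make the last point concrete, the array support alone has no native SQLite equivalent - a sketch with psycopg2, where the connection string is made up:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # made-up DSN
with conn, conn.cursor() as cur:
    # psycopg2 adapts Python lists to Postgres arrays automatically
    cur.execute("SELECT %s::int[] && %s::int[]", ([1, 2, 3], [3, 4]))
    print(cur.fetchone()[0])  # True - the arrays overlap
```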