
The fact that the performance is so close between a database that requires NO SERVER and one that requires a dedicated server + ops management is a signal to me that all projects should start on SQLite unless there is a compelling reason not to.


It's funny, because to me it says pretty much the opposite: sqlite has no server and works entirely in-process, whereas the client will be talking to postgres over a unix socket, or possibly a full-blown network socket. You'd expect that overhead to put SQLite well ahead, not merely close.

Plus it seems to be a read-only workload, something for which postgres is not especially relevant (not that it doesn't work, but...).


Postgres doesn't require a _dedicated_ server. If your requirements allow for SQLite but it didn't exist, you could conceivably run postgres on your app server to reduce the ops load if you really felt it was a significant effort.

If you might want to use multiple app instances connected to a shared DB, I would say it's probably easier to just use a local postgres container for dev and a managed cloud DB in production. Really not that much effort, and you get automatic backups.

If you plan to never use a shared DB, SQLite is great though.


> you could conceivably run postgres on your app server to reduce the ops load if you really felt it was a significant effort.

Still, PGSQL requires some level of setup/maintenance: creating a user, permissions, a database, and a service, with config/security effort. If there is a way to run PGSQL as a library and just tell it "store your data in this dir" with no other steps, I am very interested to learn about it.


You can set up Postgres with three lines of bash and a Unix socket. It is more work; I wouldn't say it's appreciably more work, and if you need more rigorous data modeling (I frequently do) then it's worth considering if this is the mode in which you'd like to run your data stores (I wouldn't, but latency isn't the most critical thing for stuff I do).
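
A minimal sketch, assuming the Postgres binaries are already installed and on PATH (the data directory and socket directory are placeholders):

    initdb -D ./pgdata
    pg_ctl -D ./pgdata -o "-c listen_addresses='' -c unix_socket_directories='/tmp'" start
    createdb -h /tmp myapp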


I think it is more than that.

Steps will be:

- sudo apt install postgresql (more steps are needed if you are not satisfied with the OS default version)

- sudo su postgres

- psql -> create db, user, grant permissions (sketched below)

- modify the pg_hba.conf file to change connection permissions

- modify postgresql.conf to change settings to your liking because defaults are very out of touch with modern hardware

Another option would be to build your own docker image, but I am not good with this.
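
The psql step, as a sketch (role, database, and password are placeholders; a database's owner already has full rights, so explicit GRANTs are only needed for additional roles):

    sudo -u postgres psql -c "CREATE ROLE myapp LOGIN PASSWORD 'changeme'"
    sudo -u postgres psql -c "CREATE DATABASE myapp OWNER myapp"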


Whereas with SQLite it's:

- sudo apt install sqlite3 # you're going to want the client

- sudo chown myapp db.sqlite3

- sudo chown myapp . # make sure you have directory write permissions for WAL/journal

- sudo -u myapp sqlite3 db.sqlite3 'PRAGMA journal_mode=WAL;'

Nothing is ever single-command easy in all cases.


I am not familiar with SQLite, or with how engineers embed it in apps, but I believe that for many apps that ship SQLite you don't need all these steps: it is included as a kind of library, not as a standalone server.

I am familiar with Java and H2: you just say somewhere in the config "store the db in this directory", and you are all set.

I think there is no such route currently for PGSQL.


Sorry, but it's more complex than this. As an example, NPM's `sqlite3` package does not contain the actual library. Neither does Ruby's. You need the platform library for quite a few runtimes and now your single-command future has...multiple commands.

Which isn't bad. This is a pretty low bar to clear.

(All that said, I am of the opinion that separating data storage from application logic is better for operations, for security, and for disaster recovery unless you desperately need to prioritize avoiding network latency above everything else, so this is all pretty moot to me.)


> As an example, NPM's `sqlite3` package does not contain the actual library. Neither does Ruby's.

That seems pretty... odd on their part. I know that Ruby gems in particular can and do include C sources, so there's no reason it couldn't ship sqlite3.c (or even the various .c files that get combined into sqlite3.c); if you're able to install Nokogiri (i.e. you have a usable C toolchain), then you should be able to compile sqlite3.c no problem.

Meanwhile, Python ships with an "sqlite3" module in its standard library (last I checked), so if you have Python installed at all then you almost certainly already have SQLite's library installed.
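
Easy to sanity-check: this one-liner prints the version of the SQLite library the stdlib module is using:

    python3 -c 'import sqlite3; print(sqlite3.sqlite_version)'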


Nokogiri itself depends on an OS-supplied libxml2, doesn't it? And I think--don't quote me, Python isn't my thing and I only looked briefly--that Python on Ubuntu requires the `python-sqlite` APT package, which depends on both `python` and `libsqlite0`. I think CentOS is the same way? As a sysadmin, I'd expect it to be this way, so I'm not surprised by seeing this.

I think the thinking is that packing the entire library, rather than just your necessary bindings, is overkill/duplication/etc. etc. - it's the static-versus-dynamic argument in a sense, even though you may be dynamically linking while you pack your own library. Duplication versus OS-controlled security patching? Admittedly, when most deployables end up being Docker containers (for good or for ill) that's less important, but most of these tools are from a distinctly earlier era of deployment tools.


> Nokogiri itself depends on an OS-supplied libxml2, doesn't it?

It does, but it still compiles bindings for it. My point is that if you're able to compile C code at all (including Nokogiri's bindings to libxml2), then strictly requiring some external libsqlite3.so is unnecessary - because it's pretty dang simple to just compile sqlite3.c alongside whatever other C code you're compiling.

> And I think--don't quote me, Python isn't my thing and I only looked briefly--that Python on Ubuntu requires the `python-sqlite` APT package, which depends on both `python` and `libsqlite0`

Right, and my point there is that if you've installed Python, then in doing so you've already installed SQLite (because your package manager did so automatically, because it's a dependency), so you don't need to run an additional command to install SQLite.
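
On Linux you can even see the dependency directly; a sketch, assuming a CPython whose sqlite3 extension is dynamically linked against the system library:

    ldd "$(python3 -c 'import _sqlite3; print(_sqlite3.__file__)')" | grep -i sqlite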


> but it's more complex than this. As an example, NPM's `sqlite3` package does not contain the actual library. Neither does Ruby's. You need the platform library for quite a few runtimes and now your single-command future has...multiple commands.

I think the usual way to do it is to have a shared library file like sqlite.so in some directory (or installed through the OS package manager), plus a thin API binding for the specific runtime, shipped as a runtime package (e.g. on npm).

> you desperately need to prioritize avoiding network latency above everything else, so this is all pretty moot to me

The goal here is easy distribution of an embedded DB. Many apps do this with SQLite, for example the Dropbox client. Many other examples: https://www.sqlite.org/famous.html


I see what you're saying now. But unless I'm very much missing the plot, everyone else here is talking about a web application. The post to which you originally replied even constrained it as such:

> you could conceivably run postgres on your app server

So what you've been saying has been taken in the context of packing sqlite or Postgres next to your web application, which has pretty limited use cases, and it reads really weirdly in that light.


The discussion was about an "app server" and ops maintenance. We have just gotten used to the idea that if you want to bootstrap/install a web/API stack, you need to change tons of knobs and configs and install many libs and software pieces.

But there is another approach: one self-contained binary/artifact which can include an embedded DB too, and can be launched with one command.


As my comment says, you are going to want the stand-alone client though, for a variety of reasons. The psql client usually comes with the postgresql-server, but apps embedding SQLite3 definitely don't come with the stand-alone command-line sqlite3 client.


> you are going to want the stand-alone client though

I don't want a stand-alone client for an app that a customer downloaded from my site.


The official docker image seems quite good and handles a lot of the bootstrapping needs.


Could you be more specific about what exactly it handles from the steps above?

Also, docker adds complexity of its own.


I can't remember the exact command off the top of my head but it's something like:

    docker run -it -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:12

And bam, you have a server


Another cool thing is that you can initialize the Postgres db when the container first starts with your own SQL (or sh) scripts to create your schemas, tables, views, etc., just by copying the scripts into /docker-entrypoint-initdb.d/.

Like:

    FROM postgres:14.3

    COPY pg/pg-config.sql /docker-entrypoint-initdb.d/000-pg-config.sql
    COPY pg/schemas.sql /docker-entrypoint-initdb.d/010-schemas.sql
    COPY pg/meddra-tables.sql /docker-entrypoint-initdb.d/020-meddra-tables.sql
    # ...

    RUN chmod a+r /docker-entrypoint-initdb.d/*

    ENV POSTGRES_USER=myapp
    ENV POSTGRES_PASSWORD=xxx
    ENV POSTGRES_DB=mydb


I tend to do things like (after launching container):

    docker exec postgres psql -h localhost -p 5432 -U ${DBADMINUSER} -c "some SQL statement"


There is a config-change step above (pgsql defaults suck), and with docker in the mix, changing those settings becomes harder.

Also, I usually change the pg_hba.conf file too, to disallow outside connections.


Keep in mind the suggestion was a managed product in production, but if you really need to customize your local image, you can mount in arbitrary initialization scripts as well.
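
For example, the official image passes trailing arguments straight through to the postgres server binary, and the port binding can keep it off external interfaces; a sketch with illustrative values:

    docker run -d -e POSTGRES_PASSWORD=secret -p 127.0.0.1:5432:5432 \
      postgres:14 -c shared_buffers=2GB -c max_connections=200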

If you need all this customization then did SQLite really fit the bill in the first place?


> if you really need to customize your local image

I need to change the config for the prod pgsql instance, because the default config is not necessarily suitable, performance-wise, for serious prod machines.

> If you need all this customization then did SQLite really fit the bill in the first place?

I am not familiar with SQLite, but I am somewhat familiar with Java's H2 DB, which has a similar idea. You can embed it and all its config into a self-contained Java jar file together with your app, and no changes to the host OS are needed.


I only use this for local development personally; we use RDS in prod.


You are likely wanting something like RDS [1], Azure [2], or even Linode (which I just discovered) [3]. Sure, you have to pay a larger premium for it, but these products do allow you to focus on the app rather than the DBA stuff.

- [1] https://aws.amazon.com/rds/

- [2] https://azure.microsoft.com/en-us/services/postgresql/#overv...

- [3] https://www.linode.com/products/postgresql/


> pay a larger premium for it

yeah, the premium is very large if I want to do a lot of data crunching (lots of IOPS, cores, and RAM are needed), plus latency/throughput to those services will be much worse than to my local disk over the PCIe bus.


"never use a shared DB"

I think this is what it comes down to for me too. Yes there might be some use cases that really benefit from Postgres features like GIS and the native JSON type, but ultimately, for most use-cases, it's going to hinge on whether or not you ever expect to need to scale any part of the system horizontally, whether that's the frontends or the DB itself.


> Yes there might be some use cases that really benefit from Postgres features like GIS and the native JSON type

Or a proper UUID type, materialized views, enforced data-constraints, lateral joins, queries that consider multiple indexes, etc, etc.

It's not like there's just a couple of niches where PostgreSQL's features distinguish it from SQLite. SQLite is appropriate for very simple DB use-cases where you aren't too bothered by silent data inconsistency/corruption, but not a whole lot beyond that. It's a good little piece of software, but it hardly relegates PostgreSQL to "Just GIS things" or "Just JSON use-cases".


> SQLite is appropriate for very simple DB use-cases where you aren't too bothered by silent data inconsistency/corruption, but not a whole lot beyond that.

SQLite is also appropriate for use cases where you are bothered by silent data inconsistency/corruption; it's just that the hardware running PostgreSQL (or any "serious" DBMS, relational or not) is usually less prone to random inconsistency/corruption (uninterruptible redundant power sources, a well-cooled environment, lower but more stable clock speeds on everything, ECC RAM, a RAID disk array, a corruption-resistant FS).

If you run PostgreSQL on the cheapest consumer hardware, expect random corruption of the database when power runs out several times in a row.


Sorry, "corruption" was probably misleading there. By "silent data inconsistency/corruption" I mean, "a bug in my application led to me storing strings in an integer column, or too-long data in a too-short column, and SQLite just silently let me do that"-type errors – the data consistency is logically (but not physically) corrupted. The author encountered 2 separate occurrences of this just in their simple use-case, evidently.


> led to me storing strings in an integer column

Recent SQLite versions have STRICT tables, which forbid that.

https://www.sqlite.org/stricttables.html

> or too-long data in a too-short column

You can use a constraint for that:

    CREATE TABLE test ( 
        name TEXT NOT NULL CHECK(length(name) <= 20)
    ) STRICT;


Neat, will definitely give this a try. One of my minor peeves about SQLite is having to set a bunch of pragmas on every new connection. Like, especially stuff such as foreign_keys=1... really, enforcement of foreign key constraints is something each connection has to opt into, rather than being set once on the whole DB file and never thought about again? Blah.
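
For example, with the command-line client every invocation is a fresh connection, so the pragma has to ride along each time (file and table names here are placeholders):

    sqlite3 db.sqlite3 'PRAGMA foreign_keys=ON; DELETE FROM parent WHERE id = 1;'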


Well, unless you care about disaster recovery or availability.

For most business applications used by more than one user there are usually some expectations that data will be retained and the application will be available.


I read that as "dedicated server [process]", not "dedicated server [hardware]". While the latter would obviously imply the former, it's really the former that increases the complexity of managing the whole affair. Desktop applications with a dependency on mysql/postgres running on the same machine are, in my experience without exception, a pain in the ass for users.


Managing a publicly accessible DB and user certs to connect to it securely is like someone’s full time job at some companies.


Why should the database ever be publicly accessible? I've only ever seen databases being accessible by applications and maybe users (but those only for analytics databases), and have never experienced anyone working full time on certs and user access for a single database. There's usually a team for that, but they do that for the whole server fleet.


There are enough SQL oopsies in this article to show that you should NEVER use sqlite for any production, except as filter databases or other sorts of throwaways. RDBMSs are there to enforce constraints on data.


That's silly: SQLite is the single most widely deployed database engine in the world, possibly the most deployed software, period. Its unit test suite is second to none. It's reliable enough that SMF, Solaris' boot and service management (sort of like systemd), depends on it.

That does not mean it is applicable to all database use-cases, but production-grade it undoubtedly is.


How many installations SQLite has is not that relevant in a discussion of SQLite vs a full RDBMS.

By that logic, a bank should use SQLite because it is deployed widely.

You have to look at where they are deployed and at the use cases. They also come with dramatically different tooling, which makes a huge difference. More tooling does not mean that a full RDBMS is a better choice:

It is just about what you need, what use case you need to cover, and what industry you are in.


> NEVER use sqlite for any production

> RDBMSs are there to enforce constraints on data.

If you did not ask SQLite to enforce type constraints on relations (with STRICT), it will use the relaxed, backwards-compatible behavior expected from previous versions of SQLite.

That being said, if you want actual validation, you probably need more complex CHECK expressions with your business rules anyway, and those work by default on any database*.

https://www.sqlite.org/stricttables.html

  * unless you're a madman who runs things with "PRAGMA ignore_check_constraints = true;" or the equivalent; in that case, no DB can help you.


RDBMS constraints are usually oversold. For instance, it is often the case that things like foreign-key constraints are avoided in production because they cause performance and maintenance nightmares.

Widely-used constraints like UNIQUE are production-quality in SQLite.


Uh, in most distributed environments you have no choice but to use a server, because you'll have clients on a (relatively) uncontrollable set of other machines.

If you’re writing something to run in an embedded/client app environment, then yeah why would you use Postgres for your one machine? You could, but it’ll add a lot of moving parts you’ll never need and probably don’t want (like remotely accessible network ports you’ll need to secure)


yes!

It seems to me (or at least, I'm hoping this is happening) that the pendulum is swinging back to simple deployments and away from AWS/Cloud/K8s for every tiny little app.

One thing I'd love to see is that the 'default' MVP deployment is your code + SQLite running on a single VM. That will scale well into "traction" for the vast majority of applications people are building.


I have a fully working SaaS with 30+ companies paying, with 20 employees each on average. I wrote it as a self-taught programmer via YouTube and Stack Overflow. It is hosted on $2-a-month Namecheap shared hosting. PHP back end, jQuery front end. I use cPanel to set things up.

It runs faster than 90% of webapps on the internet.


As a former php dev I absolutely believe this. I can't get over how slow and bloated stuff is these days. I can't even do front-end anymore.

We once did an app, ages ago, where the database was used to create a materialized view of code (PHP) + web pages for everyone and everything. We then rsynced that to 6 machines. This was ancient times, but this thing FLEW -> click click click click, you could fly through the pages. It was a read-heavy workload, but still, even the front-end (just hand-coded) was somehow faster than all the latest and greatest going on these days.


Do you implement something akin to Role-Based Access Control for those 30x20 employees? If so, where do you store and manage that data?

I have zero experience with this but I am very curious how people do it in sqlite.


I think OP is saying their customers are mainly companies with ~20 employees, not that their company has 20 employees


Yeah, no, I meant that 30 orgs with an average of 20 employees each is (likely) a trivial load, but it starts to introduce interesting complexity in the amount of code needed for RBAC.


Not OP, but like any other database? Even if the database supports it, you usually should not have users logging directly into the database anyway, so whether the database "has" or "lacks" access control doesn't matter.

What does matter, however, is enforcing parametrized queries everywhere. Unless all the db handles you pass to the client-handling code are read-only, chaos will ensue from the DDL permissions.

https://xkcd.com/327/


I start by assuming a server-side bridge between the client and the database, and it's a given the queries are parameterized. So only the bridge is doing the "logging in".

Why is it superior to put all of the (bespoke) access control logic in the server side bridge rather than use what's available in the database (accessed by the bridge, not the client)?

I have been watching like a hawk for 6 months but I haven't stumbled upon a clear reason why this is done, except for "it helps source code db portability".

For a multiorg/multiuser application this seems like the crucial distinction between sqlite and postgresql.

Again I have no experience here, talk to me like I'm stupid (I really am!).


For a multiorg approach, you probably want isolated DBs anyways.

Within a single org, multiuser approach, there are 2 big problems that I remember with attempting to shoehorn DB auth into application auth:

* assuming you use a connection pool, you might run out of TCP connections/ports if you need to handle too much stuff;

say, for example, that your load balancer needs 3 application nodes behind it - you will need 2 (connections per user) x 3 (application nodes) connections just to handle one user - 6 connections/user. That will eat your database connection limit very fast, for no good reason.

* assuming you don't use a connection pool, you now have horrible latency on every connection (bad), and you need to handle plain-text passwords (assuming you use scram-sha-256) or non-replay-resistant md5 hashes of user passwords, either sent in every client request or kept in a shared session system. No matter what you pick, you have a security disaster in the making (very bad).


In Postgres at least you can use SET ROLE when getting a connection from the pool and RESET ROLE when returning a connection to the pool, and avoid the problems you describe.
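
A sketch of that pattern (role, database, and query are illustrative; the SET persists across the -c commands because they share one session):

    psql -d mydb -c "SET ROLE alice" -c "SELECT * FROM orders" -c "RESET ROLE"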


Ah, thank you very much. I will happily investigate these, and ditch postgresql for sqlite for my planned use case if that still holds.

sqlite looks like great technology to me (as is postgresql) but I am a bit of a fanatic for keeping the overall system as understandable as possible, so these questions are important (for me, I'm stupid).


Yup, I believe it. As an industry, 90% of companies are spending 90% of their time on crap that only the remaining 10% of companies actually need.

Good resume fodder though I guess.


If people are paying for the app, why not spring for a dedicated VM? It's only like $5 more a month and is way more predictable and customizable.


It seems a lot harder to set up correctly and use, and I don't really know what benefit I'd get.


In my experience the pain will come later, as it grows. I believe it.


That pain also comes if you're already deeply invested in cloud products. Nothing scales without effort, and pain points always arise where you don't expect them. But the simpler the setup, the simpler it often is to scale out the parts that need scaling: in this example, maybe a dedicated server or a separate database server at some point. But there can be many customers before that's necessary.

Preemptive scaling never really works and most projects never scale enough to warrant more than one server (unless you write very inefficient code).


And it's a great pain to have


I believe it!


hmm, in that case I would probably pick Postgres, because despite the added network call it is still faster than an embedded DB.


As I understand it, the actual low-level database interface may be a shared-library mechanism in either case. At least, for a local database, I'm pretty sure that's a function of the connection type. You're still dlopen()-ing an ODBC library, and the underlying glue between that and the rest of the userspace DB engine parts can be implemented in many ways. If there's a socket involved, it would probably be AF_UNIX.


I disagree:

Postgres and SQLite are very different in several aspects, e.g.

- types and enforcing type safety

- handling concurrency and transactions (available isolation levels)

- functions / procedures

- array support

Postgres can be much more valuable than just using it for "relational data storage". Its features can be a good reason for choosing it from the beginning.


Sure, you can start there. Once you scale, though, you're going to have lots of fun.


For many applications you're right. However, in the case of a dashboard for blog articles I don't think it could ever come to that.


I’ve never worked on a project that was scaled out that didn’t require at least moderate DB maintenance, such as a large migration or sharding.

A migration is all but expected. Might as well make things easy on yourself for the first year and start with SQLite.


We're talking about data from a blog, though. It's not like it's going to scale to have terabytes of entries anytime soon.



