A year ago the Russian-based Lingualeo (20 mln users) presented an article where...

Twisol · on Nov 30, 2021

> In a matter of hours, commenters were able to dump sensitive data (passwords and all) by constructing specially crafted queries.

To be fair, SQL injection attacks have been a common problem for a long time, and most websites did not expose their database directly like this.

It certainly feels like PostgREST and systems like it are dancing close to the fire here, since there's a lot less in between you and an attacker. But as far as I can tell, for me, right now, it's just that: a feeling.

I'd love to have a robust analysis of the access controls Postgres provides, and more importantly, how many mistakes (relative and absolute) need to be made for a breach. I'd like to know whether the security story here is fundamentally brittle, or whether it's practically viable and just needs better support for doing the "right" thing.

Personally, I like the architectural philosophy a lot. We saw a project on HN yesterday [0] that takes a similar approach. I'm oversimplifying, but I like that instead of trying to figure out what series of API calls will put the system in the state you want, you can simply put it in the state you want. It's easier for me to look at a schema/type and understand what it's supposed to mean than to look at a piece of code and figure out what it's supposed to do.

[0] https://news.ycombinator.com/item?id=29375911

michelpp · on Nov 30, 2021

> It certainly feels like PostgREST and systems like it are dancing close to the fire here, since there's a lot less in between you and an attacker.

There are definitely fewer buffered copies of your responses, yes.

This is one advantage few people know about PostgREST, once the database produces even a single row of data, it starts to stream JSON to the frontend. Users don't have to wait for the response to fully buffer in memory in some Python or Go process before being relayed on.

As for security, Postgres supports Row Level Security. You can literally protect every single row of your data with a solid security policy using a security mechanism older and tested longer than the average age of your typical HN reader.

This security mechanism doesn't just apply to your web frontend either, but to all SQL clients regardless of the frontend language. No other framework has such a deep integration with Postres' native security mechanism.

Twisol · on Nov 30, 2021

> As for security, Postgres supports Row Level Security. You can literally protect every single row of your data with a solid security policy using a security mechanism older and tested longer than the average age of your typical HN reader.

Maybe you're referring to the underlying concept rather than the implementation, but wasn't row-level security added to Postgres in 2016?

https://www.postgresql.org/docs/9.5/release-9-5.html

> This is one advantage few people know about PostgREST, once the database produces even a single row of data, it starts to stream JSON to the frontend. Users don't have to wait for the response to fully buffer in memory in some Python or Go process before being relayed on.

This is one big reason why I like this kind of architecture, though I'd generalize it a bit. More than streaming, you don't have to reimplement access control mechanisms (as noted), subscription mechanisms (AWAIT/NOTIFY), or batching/transactions in your API layer. So much of what you do in an API server is just proxying for functionality that already exists in the DB, and that alone is a reason to seriously consider this approach.

michelpp · on Nov 30, 2021

> Maybe you're referring to the underlying concept rather than the implementation, but wasn't row-level security added to Postgres in 2016?

Yes you are correct my statement was too vague, I was referring to Postgres' role based security in general at that point. RLS specifically was added in 9.5. Before that permission checking was table/column based, so you could still protect data on a per-role basis, just not as fine a grain.

Another subtle difference is that RLS hides rows you can't see, where as a REVOKE on a table or column would cause an error if you tried to access it. In either case unauthorized users cannot see protected data.

mjay23 · on Dec 1, 2021

Postgrest does not stream rows, it waits for the db with the full response then forwards it to the browser/client

yeputons · on Nov 30, 2021

Here are some links in Russian for the curous:

* The post: https://habr.com/ru/company/lingualeo/blog/515530/

* The corresponding talk: https://pgconf.ru/en/2020/264859

* Twitter thread with data breach: https://twitter.com/SanSYS/status/1299657221085835264

If the Twitter thread is accurate, their API received a list of DB field (and their JSON names to use) from the client. It was hard to guess names of fields, so someone decided to fuzz it a little bit and the API endpoint returned all existing fields in the database.

LAC-Tech · on Nov 30, 2021

IIRC the way postgREST works is that not everything in the database is accessible, just everything in your public schema.

So you put most of your logic in views. Which has gone out of fashion or something, I think.

michelpp · on Nov 30, 2021

I heard there was this one website that released software with a bug in it once that leaked data.

kgeist · on Nov 30, 2021

Of course, it was a bug in their code, which they promptly fixed. But it's the only case of a company with a large userbase using this kind of architecture that I know of, and their experience was far from smooth, so I thought I'd share.

michelpp · on Nov 30, 2021

My point was your anecdote has nothing to do with PostgREST. It's a foggy hand wave that could be laid over just about anything with an HTTP socket on one end and a database on the other.

kgeist · on Nov 30, 2021

I agree it's just an anecdote and I have nothing against PostgREST. I think it's an interesting approach which has its uses. Currently at my job all I've been doing lately has been replacing slow ORM-based access with pure SQL queries for performance/readability and I can see the advantages of ditching the layer between the frontend and the DB entirely, at least for queries.

But there are several things I'd like to see an analysis/comparison of, between the traditional approach and PostgREST's approach:

1) what is the cost of a mistake (what if I misconfigure row-based policy vs. misconfigure in the traditional backend)

2) the size of the attack surface (if we expose DB directly, as opposed to having an additional layer)

3) how many developers are proficient in Postgres enough to avoid #1 and #2 as opposed to traditional code-based security policies

michelpp · on Nov 30, 2021

> 1) what is the cost of a mistake (what if I misconfigure row-based policy vs. misconfigure in the traditional backend)

Good question. Ok let's do a thought experiment between PostgREST and Django. Every single Django production system I've seen logs into Postgres as a superuser (so it can do migrations).

PostgREST logs into the database as a user with NO privileges (because migration is a problem for a migration tool, not a web framework!) and then switches roles based on the (signed) JWT token. The PostgREST user is NEVER a superuser.

So you flub a Django permission and users can now run queries as a superuser and potentially leak everythinv. If you flub a PostgREST configuration, you leak data maybe in one or two tables.

> 2) the size of the attack surface (if we expose DB directly, as opposed to having an additional layer)

This is going to differ depending on who you ask. People talk about attack "surfaces" but I personally think of them as "attack paths". Does more code mean more paths? I think yes. Others think no.

> 3) how many developers are proficient in Postgres enough to avoid #1 and #2 as opposed to traditional code-based security policies

Postgres' security mechanism comes down to "who can see this" it's the same question those developers would be asking in Django. If they're not proficient enough to understand now to GRANT or REVOKE are they going to be proficient enough to understand the implications of their application logging into the database as a superuser?

minusf · on Nov 30, 2021

> Every single Django production system I've seen logs into Postgres as a superuser (so it can do migrations).

then all those production systems do it wrong. all the django DB user needs (even for migrations) is to be the OWNER of the schema you give it (the NAME setting in DATABASES). how you create this database is up to you and it's not django's job.

there are 2 cases i can think of where django needs extra permissions:

1. to run the test suite django will want to create/destroy a `test_projectname` database. for this `CREATE USER ... CREATEDB ...` is sufficient, no superuser needed. this should not be in your production database anyway but a throwaway DB where you run your CI/CD.

2. when some extension needs to be installed (like pg_trgm, etc) you might need superuser, but again, it's debatable if this is django's job. some projects try to do this in their migrations, but i prefer to do this during provisioning. the postgres docs say:

> Loading an extension ordinarily requires the same privileges that would be required to create its component objects. For many extensions this means superuser privileges are needed. However, if the extension is marked trusted in its control file, then it can be installed by any user who has CREATE privilege on the current database.

edit: minor clarification.

michelpp · on Nov 30, 2021

> then all those production systems do it wrong.

I agree!

Let's put aside that being the OWNER of a database object means that by default is granted all privileges to it, and that owners aren't (typically) filtered by Row Level Security.

Google "Django Postgres Tutorial" and the first link creates a superuser:

https://www.enterprisedb.com/postgres-tutorials/how-use-post...

The second link does not, but then it goes on to `GRANT ALL PRIVILEGES ON DATABASE` to that user:

https://www.section.io/engineering-education/django-app-usin...

The documentation has a handy guide on creating superusers through the web interface, and touts what a great feature it is!

https://djangocentral.com/creating-super-user-in-django/

To a security conscious DBA, this is insanity. Yes, you can lock down a Django app just like PostgREST in fact, and add some code to do SET ROLE and all that just like PostgREST, but no one does because the documentation on database setup in the Django does doesn't even mention superusers:

https://docs.djangoproject.com/en/3.2/ref/settings/#std:sett...

minusf · on Nov 30, 2021

let's not put it aside. i find it perfectly reasonable that the DB user django uses to connect to its own database has all privileges to that database.

there are ways to limit these privileges depending on the requirements both on django and postgres level, for example to have some read only users, etc.

don't google "Django Postgres Tutorial", the only django tutorial you need is the official django tutorial.

the djangocentral article discusses django superusers, not postgres ones. a django superuser has no postgres superuser privileges, they are 2 different things.

> Django does doesn't even mention superusers

of course it does not mention superusers because django does not recommend creating db superusers for security reasons and it mentions them only in connection with installing postgres extensions, and that is a limitation of postgres, not django.

in my professional opinion a run of the mill web application does not need row level security, setting up schema-wide privileges correctly is hard enough. but if one needs it, it's one google search away that some people have tried to do it with django as well, it's a postgres concept and as such can be implemented in any other framework connecting to it.

fluidcruft · on Nov 30, 2021

There's something to be said about layers of security providing redundancy to catch errors--that's part of what goes into highly reliable operations used in aerospace and health care. If you get the postgres wrong and that's all you're relying on, then that's all you've got. If you've got the same postgres wrapped in a more rigid validation then arguably there are more bugs that need to align for a mishap. Of course the flip side is that's even more complex and more difficult to implement. The reality is people probably tend to skimp on the postgres layer's security if they're burying it deep under other things. Whether you're focusing on the postgres layer or some higher later (python or whatever), that's still just one layer of protection.

sigstoat · on Nov 30, 2021

> If you've got the same postgres wrapped in a more rigid validation then arguably there are more bugs that need to align for a mishap.

> The reality is people probably tend to skimp on the postgres layer's security if they're burying it deep under other things.

so the reality is, they don't have layers. they'll have one security layer.

best to put it in postgres, where other folks who aren't in a hurry to deploy the feature your boss wants, have tested the security model.