A year ago the Russian-based Lingualeo (20 mln users) presented an article where they described their new architecture which has a similar concept. They have a simple proxy server in Go and all the logic happens in Postgres, i.e. frontend directly manipulates DB. Their claimed advantages were superior performance and less boilerplate/faster iteration times.
In a matter of hours, commenters were able to dump sensitive data (passwords and all) by constructing specially crafted queries. Another problem they had was that many core devs left because they perceived the new architecture a step backwards. Many also noted that they don't have adequate tools for version control anymore. I don't know how PostgREST compares, maybe it doesn't have such issues.
> In a matter of hours, commenters were able to dump sensitive data (passwords and all) by constructing specially crafted queries.
To be fair, SQL injection attacks have been a common problem for a long time, and most websites did not expose their database directly like this.
It certainly feels like PostgREST and systems like it are dancing close to the fire here, since there's a lot less in between you and an attacker. But as far as I can tell, for me, right now, it's just that: a feeling.
I'd love to have a robust analysis of the access controls Postgres provides, and more importantly, how many mistakes (relative and absolute) need to be made for a breach. I'd like to know whether the security story here is fundamentally brittle, or whether it's practically viable and just needs better support for doing the "right" thing.
Personally, I like the architectural philosophy a lot. We saw a project on HN yesterday [0] that takes a similar approach. I'm oversimplifying, but I like that instead of trying to figure out what series of API calls will put the system in the state you want, you can simply put it in the state you want. It's easier for me to look at a schema/type and understand what it's supposed to mean than to look at a piece of code and figure out what it's supposed to do.
> It certainly feels like PostgREST and systems like it are dancing close to the fire here, since there's a lot less in between you and an attacker.
There are definitely fewer buffered copies of your responses, yes.
This is one advantage few people know about PostgREST, once the database produces even a single row of data, it starts to stream JSON to the frontend. Users don't have to wait for the response to fully buffer in memory in some Python or Go process before being relayed on.
As for security, Postgres supports Row Level Security. You can literally protect every single row of your data with a solid security policy using a security mechanism older and tested longer than the average age of your typical HN reader.
This security mechanism doesn't just apply to your web frontend either, but to all SQL clients regardless of the frontend language. No other framework has such a deep integration with Postres' native security mechanism.
> As for security, Postgres supports Row Level Security. You can literally protect every single row of your data with a solid security policy using a security mechanism older and tested longer than the average age of your typical HN reader.
Maybe you're referring to the underlying concept rather than the implementation, but wasn't row-level security added to Postgres in 2016?
> This is one advantage few people know about PostgREST, once the database produces even a single row of data, it starts to stream JSON to the frontend. Users don't have to wait for the response to fully buffer in memory in some Python or Go process before being relayed on.
This is one big reason why I like this kind of architecture, though I'd generalize it a bit. More than streaming, you don't have to reimplement access control mechanisms (as noted), subscription mechanisms (AWAIT/NOTIFY), or batching/transactions in your API layer. So much of what you do in an API server is just proxying for functionality that already exists in the DB, and that alone is a reason to seriously consider this approach.
> Maybe you're referring to the underlying concept rather than the implementation, but wasn't row-level security added to Postgres in 2016?
Yes you are correct my statement was too vague, I was referring to Postgres' role based security in general at that point. RLS specifically was added in 9.5. Before that permission checking was table/column based, so you could still protect data on a per-role basis, just not as fine a grain.
Another subtle difference is that RLS hides rows you can't see, where as a REVOKE on a table or column would cause an error if you tried to access it. In either case unauthorized users cannot see protected data.
If the Twitter thread is accurate, their API received a list of DB field (and their JSON names to use) from the client. It was hard to guess names of fields, so someone decided to fuzz it a little bit and the API endpoint returned all existing fields in the database.
Of course, it was a bug in their code, which they promptly fixed. But it's the only case of a company with a large userbase using this kind of architecture that I know of, and their experience was far from smooth, so I thought I'd share.
My point was your anecdote has nothing to do with PostgREST. It's a foggy hand wave that could be laid over just about anything with an HTTP socket on one end and a database on the other.
I agree it's just an anecdote and I have nothing against PostgREST. I think it's an interesting approach which has its uses. Currently at my job all I've been doing lately has been replacing slow ORM-based access with pure SQL queries for performance/readability and I can see the advantages of ditching the layer between the frontend and the DB entirely, at least for queries.
But there are several things I'd like to see an analysis/comparison of, between the traditional approach and PostgREST's approach:
1) what is the cost of a mistake (what if I misconfigure row-based policy vs. misconfigure in the traditional backend)
2) the size of the attack surface (if we expose DB directly, as opposed to having an additional layer)
3) how many developers are proficient in Postgres enough to avoid #1 and #2 as opposed to traditional code-based security policies
> 1) what is the cost of a mistake (what if I misconfigure row-based policy vs. misconfigure in the traditional backend)
Good question. Ok let's do a thought experiment between PostgREST and Django. Every single Django production system I've seen logs into Postgres as a superuser (so it can do migrations).
PostgREST logs into the database as a user with NO privileges (because migration is a problem for a migration tool, not a web framework!) and then switches roles based on the (signed) JWT token. The PostgREST user is NEVER a superuser.
So you flub a Django permission and users can now run queries as a superuser and potentially leak everythinv. If you flub a PostgREST configuration, you leak data maybe in one or two tables.
> 2) the size of the attack surface (if we expose DB directly, as opposed to having an additional layer)
This is going to differ depending on who you ask. People talk about attack "surfaces" but I personally think of them as "attack paths". Does more code mean more paths? I think yes. Others think no.
> 3) how many developers are proficient in Postgres enough to avoid #1 and #2 as opposed to traditional code-based security policies
Postgres' security mechanism comes down to "who can see this" it's the same question those developers would be asking in Django. If they're not proficient enough to understand now to GRANT or REVOKE are they going to be proficient enough to understand the implications of their application logging into the database as a superuser?
> Every single Django production system I've seen logs into Postgres as a superuser (so it can do migrations).
then all those production systems do it wrong. all the django DB user needs (even for migrations) is to be the OWNER of the schema you give it (the NAME setting in DATABASES). how you create this database is up to you and it's not django's job.
there are 2 cases i can think of where django needs extra permissions:
1. to run the test suite django will want to create/destroy a `test_projectname` database. for this `CREATE USER ... CREATEDB ...` is sufficient, no superuser needed. this should not be in your production database anyway but a throwaway DB where you run your CI/CD.
2. when some extension needs to be installed (like pg_trgm, etc) you might need superuser, but again, it's debatable if this is django's job. some projects try to do this in their migrations, but i prefer to do this during provisioning. the postgres docs say:
> Loading an extension ordinarily requires the same privileges that would be required to create its component objects. For many extensions this means superuser privileges are needed. However, if the extension is marked trusted in its control file, then it can be installed by any user who has CREATE privilege on the current database.
Let's put aside that being the OWNER of a database object means that by default is granted all privileges to it, and that owners aren't (typically) filtered by Row Level Security.
Google "Django Postgres Tutorial" and the first link creates a superuser:
To a security conscious DBA, this is insanity. Yes, you can lock down a Django app just like PostgREST in fact, and add some code to do SET ROLE and all that just like PostgREST, but no one does because the documentation on database setup in the Django does doesn't even mention superusers:
let's not put it aside. i find it perfectly reasonable that the DB user django uses to connect to its own database has all privileges to that database.
there are ways to limit these privileges depending on the requirements both on django and postgres level, for example to have some read only users, etc.
don't google "Django Postgres Tutorial", the only django tutorial you need is the official django tutorial.
the djangocentral article discusses django superusers, not postgres ones. a django superuser has no postgres superuser privileges, they are 2 different things.
> Django does doesn't even mention superusers
of course it does not mention superusers because django does not recommend creating db superusers for security reasons and it mentions them only in connection with installing postgres extensions, and that is a limitation of postgres, not django.
in my professional opinion a run of the mill web application does not need row level security, setting up schema-wide privileges correctly is hard enough. but if one needs it, it's one google search away that some people have tried to do it with django as well, it's a postgres concept and as such can be implemented in any other framework connecting to it.
There's something to be said about layers of security providing redundancy to catch errors--that's part of what goes into highly reliable operations used in aerospace and health care. If you get the postgres wrong and that's all you're relying on, then that's all you've got. If you've got the same postgres wrapped in a more rigid validation then arguably there are more bugs that need to align for a mishap. Of course the flip side is that's even more complex and more difficult to implement. The reality is people probably tend to skimp on the postgres layer's security if they're burying it deep under other things. Whether you're focusing on the postgres layer or some higher later (python or whatever), that's still just one layer of protection.
In a matter of hours, commenters were able to dump sensitive data (passwords and all) by constructing specially crafted queries. Another problem they had was that many core devs left because they perceived the new architecture a step backwards. Many also noted that they don't have adequate tools for version control anymore. I don't know how PostgREST compares, maybe it doesn't have such issues.