
I always thought it was a bit crazy to let the client control the computational complexity of a query (to a certain degree) as it makes query optimization really difficult.

A few years back I designed a REST API and made the mistake of allowing too many query parameters. That made the database really hard to optimize: there were hundreds of possible filter combinations (some requiring joins) that a client could ask for, each needing index coverage, so while most of the common queries worked well, some of the "long tail" queries would always run into timeouts.

When using GraphQL with a backend like Hasura that generates SQL queries I would think that you have the same problem for complex data models, as there will be countless combinations of filters and joins that your database would need to efficiently cover. From what I understand most organizations solve this by either restricting the types of queries you can make through GraphQL or by just defining timeouts and dropping queries that take too long (which does not offer a great user experience).

To all of you who use GraphQL in practice, how do you solve this issue?

Hasura is the best example I know of a beautiful, well engineered, well documented product that solves the completely wrong problem.

It just converts a custom syntax represented in GraphQL into arbitrary SQL queries, which is absolutely absurd. They have mechanisms to restrict access, but the idea is flawed at its foundation.

GraphQL is great in that you define the exact queries that you know to be performant. Once you resolve your top-level payload, most of the nested fields will be non-parameterized relations. These are usually just tons of multi-gets.

So in practice, you have top-level queries that you know to be performant, followed by multi-gets that you know to be performant. If you do have a parameterized field, make sure it's fast.
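For illustration, a minimal graphene sketch of that shape (all names and data here are made up): the only top-level entry point is a primary-key lookup, so every query the schema can express starts from a known-performant access path.

    import graphene

    # Stand-in for the database; in production this would be a single
    # indexed SELECT ... WHERE id = %s.
    REPOS = {"1": {"id": "1", "name": "hasura"}}

    class Repo(graphene.ObjectType):
        id = graphene.ID()
        name = graphene.String()

    class Query(graphene.ObjectType):
        # Parameterized only by primary key -- no open-ended filters.
        repo = graphene.Field(Repo, id=graphene.ID(required=True))

        def resolve_repo(root, info, id):
            row = REPOS.get(id)
            return Repo(**row) if row else None

    schema = graphene.Schema(query=Query)
    print(schema.execute('{ repo(id: "1") { name } }').data)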


> GraphQL is great in that you define the exact queries that you know to be performant.

How is this meaningfully different from having an SQL query fronted by a REST API?


GQL can return the top level query plus relations and any other entities needed for the view in response to a single request. The fact you don’t need separate requests or view-specific endpoints is significant.
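E.g. (endpoint and field names invented), one POST fetches the user, their repos, and each repo's issue count, where REST would need several round trips or a bespoke endpoint:

    import json, urllib.request

    QUERY = """
    {
      user(id: "42") {
        name
        repos(first: 20) { name openIssueCount }
      }
    }
    """

    req = urllib.request.Request(
        "https://api.example.com/graphql",  # hypothetical endpoint
        data=json.dumps({"query": QUERY}).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(json.load(urllib.request.urlopen(req)))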


Right, but if the answer to "it's too slow" and/or "it's too unpredictable" is "limit it to known queries", then ... how is that meaningfully different from known queries in SQL?


You have control over what queries/entities get exposed in the graph. I think what he's saying is that most nested entities can be mapped to simple 'SELECT .. FROM .. WHERE id IN (1, 2, 3)' queries, and while a well-written JOIN might be more efficient, those simple queries have known performance and can benefit from caching, sharding, etc.


This was my gut reaction when I first learned about GraphQL, but in practice, when I used it, it didn't end up being an issue (although we didn't have to run at particularly large scale).

If your data models are pretty well structured, then basically every GraphQL object is a single table that you fetch by an indexed field, and traversing a relationship does a WHERE ... IN query on the indexed id field. The parameters you expose on queries should be for indexed fields. So in practice, you're not really autogenerating any particularly expensive queries, and if you assume the client isn't asking for tons of data it doesn't need, the query patterns aren't much different from REST. Except that in some cases the client can omit fields or related objects instead of re-using the REST route that's a superset of what it needs, so you can save resources sometimes too.

Then on top of all that, you have the option to not auto-map certain queries to the orm in special cases that are more complex, so you always can still write the code yourself to be more strict about what you allow. For writes, you typically would always do this to control exactly which fields you write.
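The relationship traversal above is basically one batched lookup per level. A self-contained sketch of that WHERE ... IN pattern (sqlite3 so it runs anywhere; schema and names invented):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE issues (id INTEGER PRIMARY KEY, repo_id INTEGER, title TEXT);
        CREATE INDEX idx_issues_repo ON issues (repo_id);
        INSERT INTO issues VALUES (1, 10, 'bug'), (2, 10, 'feature'), (3, 11, 'docs');
    """)

    def issues_for_repos(repo_ids):
        # One batched WHERE ... IN query per relationship level, not N+1.
        marks = ",".join("?" * len(repo_ids))
        rows = db.execute(
            f"SELECT id, repo_id, title FROM issues WHERE repo_id IN ({marks})",
            repo_ids,
        ).fetchall()
        by_repo = {rid: [] for rid in repo_ids}
        for id_, repo_id, title in rows:
            by_repo[repo_id].append({"id": id_, "title": title})
        return by_repo

    print(issues_for_repos([10, 11]))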


Restricting the queries is backwards. You should only expose the few that make sense in the domain of your app, and optimise for those.

GraphQL scares people because it seems to allow arbitrarily complex queries from applications. But that complexity was always there; REST just hides it in an unbounded number of predictable HTTP requests.


I don't get the "arbitrary complexity" issue. It's only a problem if you're auto generating everything in your schema, but otherwise you limit the fields of problematic types by hand and that's it.

The same computational challenges we've always had with mapping relational models to REST didn't go away, mainly: N+1 queries, recursive types, and query parameters adding complexity.


> Restricting the queries is backwards. You should only expose the few that make sense in the domain of your app, and optimise for those.

With exceptions:

Preventing endless recursion: otherwise a client could construct queries that time out [1]. In graphene, recursion can be checked via `info.path`.

Calculated / slow fields: it makes more sense to have dedicated objects if calculations are involved, so that if they're lists/connections, limits can be imposed on how many can be grabbed at a time.

Sometimes the only way to be sure is to only allow access to these objects via a direct ID lookup at the root, e.g.:

repos(id: <ID>) -> stats -> TotalLikesForAllIssues.

Assume that, if cold and uncached, it could take <1s to calculate that. Now imagine that being run multiple times:

User -> repos(first: 20) -> stats -> TotalLikesForAllIssues.

So in my case I want to make the data available, but it's not always a direct mapping to SQL. And I'm also enforcing rules internally that limit how, and how much, the client can query.

[1] Further reading: http://facebook.github.io/graphql/#sec-Fragment-spreads-must..., https://github.com/graphql/graphql-spec/issues/91
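For the recursion check, something like this works in graphene 3 / graphql-core 3, where `info.path` is a linked list of nodes with `prev` and `key` (in graphene 2 it was a plain list, so the walk differs); the field name here is hypothetical, and you'd call it at the top of the recursive field's resolver:

    from graphql import GraphQLError

    MAX_OCCURRENCES = 2  # how often one field may repeat on the path

    def guard_recursion(info, field_name):
        # Walk info.path back to the root, counting visits to field_name.
        count, node = 0, info.path
        while node is not None:
            if node.key == field_name:
                count += 1
            node = node.prev
        if count > MAX_OCCURRENCES:
            raise GraphQLError(f"'{field_name}' is nested too deeply")

E.g. guard_recursion(info, "repos") inside resolve_repos.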


The key word is: predictable. If your requests are predictable, then you can handle even the unbounded number.


> I always thought it was a bit crazy to let the client control the computational complexity of a query (to a certain degree) as it makes query optimization really difficult.

I like the arguments in favour of GraphQL. I think that what's missing is the ability to set resource limits on queries or users. That's relatively common in relational databases. I saw it used by Oracle DBAs on a regular basis, I've seen it done in Greenplum.

Makes me wonder if Hasura would run on Greenplum, actually. Then you would essentially put backpressure on the client to send less intensive queries.
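Whether Hasura runs on Greenplum I can't say, but since Greenplum is Postgres-derived, the usual Postgres knobs should apply. E.g. a per-session statement_timeout turns runaway queries into fast failures the client has to react to (DSN and query below are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=app user=api")  # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute("SET statement_timeout = '2s'")  # per-session cap
        try:
            cur.execute("SELECT * FROM expensive_report")  # hypothetical
        except psycopg2.errors.QueryCanceled:
            conn.rollback()  # surface "query too expensive" to the client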

Disclosure: I work for Pivotal, which sponsors and sells Greenplum.


Github's GraphQL API calculates a "rate limit score" based on query complexity and allocates a max points-per-hour [1].

Similarly, Postgraphile (which I'm more familiar with than Hasura) allows setting either a max query cost or a max query depth [2]. I haven't had to use either, as I don't provide a public GraphQL API (and can just whitelist the queries I intend my clients to run), but there are ways to limit query complexity.

1 - https://developer.github.com/v4/guides/resource-limitations/

2 - https://www.graphile.org/postgraphile/production/#limiting-g...
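For a flavour of what the depth limiting in [2] does under the hood, here's a rough sketch using graphql-core's parser (real implementations handle fragments, introspection, and cost weighting more carefully):

    from graphql import parse
    from graphql.language.ast import OperationDefinitionNode

    def query_depth(document):
        # Depth of the deepest selection set, counting each nesting level.
        def depth(node):
            sel = getattr(node, "selection_set", None)
            if sel is None:
                return 0
            return 1 + max(depth(child) for child in sel.selections)
        return max(
            depth(d)
            for d in document.definitions
            if isinstance(d, OperationDefinitionNode)
        )

    doc = parse("{ user { repos { issues { comments { body } } } } }")
    assert query_depth(doc) == 5  # reject above your configured limit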


I don't see how this is a problem for GraphQL or Hasura. If your UI allows all those filter combinations and joins, then your backend needs to allow them too, and you'll have to make them performant no matter what stack you have on the backend. If you don't allow them in the UI, then it doesn't matter whether the backend allows them or not, because it's not supported behavior.


Is this really an issue?

I mean, what's interesting isn't what is possible, but what people actually do.

Normally, you have an API and 2-3 different clients. The front-end devs define the queries while they're building those clients; once that's done, they use those exact queries most of the time, so you can optimize for them.


That's why GraphQL is probably okay when your backend and your clients are developed by you.

IMO, REST APIs are a better fit if you open APIs to the whole world. Then, at least, you have a limited set of queries that you can optimise and plan for.


> Is this really an issue?

Think of an "advanced search" feature on the client that allows for creating queries of arbitrary complexity (basically, a client-side query builder). The user could certainly go overboard, I'd imagine.


That problem exists with any backend. If you can't optimize it properly in the backend, restrict the UI.



