I'm impressed, but for other reasons. For one, I have no idea how to properly implement this. I mean, it really looks like a lot of trouble mapping this from GraphQL to... SQL? And what if the system is using some kind of NoSQL database which doesn't really have a very expressive query language, if any? Complexity just seems to explode. I also feel there's a risk of the client making a quite sub-optimal query, so probably some kind of policy should be implemented. All in all, there's a level of manageability that looks lost to me if GraphQL is implemented improperly, and to be honest, it looks easy to get wrong. I'm really looking forward to some book or guide, since the implementation is puzzling to me.
Has anybody considered this problem at all? (Giving too much flexibility to the client and allowing non-optimal queries, like joining several big tables or data collections without proper index support.) It's strange that all the materials I've seen about GraphQL hush up this question, which is essential for the future of this technology.
And it's so similar to the ORM issues the whole industry has experienced over the past 20 years. But perhaps more dangerous, due to the public nature of many APIs.
Instead of thinking of it as parsing queries, think of it as nested RPC. It's quite reasonable for an implementor to set a time limit or call limit to keep algorithmic-complexity attacks under control.
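To make the "nested RPC with a call limit" idea concrete, here's a minimal sketch in Python. Everything here is invented for illustration (the `Budget` class, the dict-based selection format, the `QueryTooExpensive` error); no real GraphQL library works exactly like this, but the principle is the same: every nested resolution spends from a shared budget, and the query is aborted once the budget runs out.

```python
# Hypothetical sketch: a nested query as nested RPC calls, with a hard
# cap on the total number of resolver invocations. All names invented.

class QueryTooExpensive(Exception):
    pass

class Budget:
    def __init__(self, max_calls):
        self.remaining = max_calls

    def spend(self):
        self.remaining -= 1
        if self.remaining < 0:
            raise QueryTooExpensive("resolver call limit exceeded")

def resolve(selection, data, budget):
    """Resolve a nested selection {field: subselection-or-None} against dicts.
    Each level of nesting costs one call from the budget."""
    budget.spend()
    result = {}
    for field, sub in selection.items():
        value = data[field]
        if sub is None:
            result[field] = value                       # leaf field
        elif isinstance(value, list):
            result[field] = [resolve(sub, item, budget) for item in value]
        else:
            result[field] = resolve(sub, value, budget)
    return result

data = {"user": {"name": "Ada", "friends": [{"name": "Bo"}, {"name": "Cy"}]}}
query = {"user": {"name": None, "friends": {"name": None}}}

print(resolve(query, data, Budget(10)))   # fits in the budget
# resolve(query, data, Budget(3)) would raise QueryTooExpensive,
# because the query needs 4 resolver calls (root, user, two friends).
```

The nice property is that the limit is enforced regardless of how deeply or widely the client nests the query, without the server having to reason about the query shape up front.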
OK, but a nested loop loses to other join algorithms (like merge join or hash join) in so many situations when you deal with large datasets, right? So again, it will be inefficient by default in many cases.
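The difference the comment above is pointing at can be shown in a few lines. This is a toy example (made-up `users`/`orders` data, not any GraphQL library): a naive per-item resolver rescans the whole `orders` collection for every user, which is the nested-loop join, while a batch-aware resolver builds a hash index in one pass and then does O(1) lookups, which is the hash join.

```python
# Illustrative only: joining "users" to their "orders" two ways.
from collections import defaultdict

users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
orders = [{"user_id": 1, "total": 5}, {"user_id": 1, "total": 7},
          {"user_id": 2, "total": 3}]

def nested_loop_join(users, orders):
    # Rescans all orders once per user: O(U * O).
    return {u["id"]: [o for o in orders if o["user_id"] == u["id"]]
            for u in users}

def hash_join(users, orders):
    # One pass to build the index, then O(1) lookups: O(U + O).
    index = defaultdict(list)
    for o in orders:
        index[o["user_id"]].append(o)
    return {u["id"]: index[u["id"]] for u in users}

assert nested_loop_join(users, orders) == hash_join(users, orders)
```

Both produce the same result; the point is that a field-by-field resolver falls into the nested-loop shape by default, and a server needs explicit batching (this is essentially what DataLoader-style libraries do) to get the hash-join shape.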
If you're talking about GraphQL, the implementation is undefined by default. It's just a query language spec; there isn't a "GraphQL Server" product. You can resolve data from many sources: a NoSQL database, an SQL-backed Hadoop cluster, etc. It's very much just the language the client talks to the server in.
If the client requests exactly what it needs, that shouldn't be more stressful on the server-side than spamming REST requests for all the same resources. Plus, it's easier to optimize when you know what the client wants. If there's something expensive, you could, for example, cache/index something extra. If the client were doing it themselves with a series of REST calls, you wouldn't be able to understand the real use-case. Even if you did know what aggregation they really needed, you wouldn't be able to fix the problem without updates to both the service and the clients.
Either way, it's easier to set sane limits than craft un-DOSable APIs. There is always a cost to satisfying queries. If you're trying to run a free service, it's a much bigger concern. If you're paying the bill, you're incentivized to investigate expensive/slow calls.
Disclaimer: I'm just talking from a REST developer perspective.
The nice thing about REST calls in their current form is that they are just that: calls. With proper monitoring you can see which ones you get more or less of, and with which parameters. They can be optimized as well as possible, but separately. You are right that it takes more analytics to figure out a series of calls (based on some token?) and maybe bundle them up by introducing a new endpoint (thus not breaking old clients).
But yet again, that is just one "query". With GraphQL it could be anything, and that's what bugs me. I find it challenging, in a good way.
Another thing I'm not sure about is the queries themselves, or rather the number of different ways you can write a query. Multiple users can request the same data, or almost the same, with queries written in different ways. Backend developers should then guarantee that those queries will be executed in a similar way, with predictable performance, much as SQL query optimization does. I had the "joy" of working with a database whose performance varied hugely with trivial changes to the query (it was not relational; it's actually discontinued now, thankfully). It was a huge PITA. I wouldn't want to serve an API like that.
It is definitely a problem with GraphQL. Some implementations try to set a time limit for queries, or they try to estimate the complexity of the query. Both are quite hard to do reliably.
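To illustrate why complexity estimation is hard to do reliably, here's a hypothetical static cost estimator: before executing, it walks a dict-based selection tree and counts fields, multiplying nested costs by an assumed fan-out for list fields. The fan-out constant, the "name ends in s means list" heuristic, and the selection format are all made up for this sketch; real estimators (e.g. complexity rules in various GraphQL server libraries) are more principled, but face the same fundamental problem: the guesses can be far off the real data.

```python
# Hypothetical query-cost estimator. The numbers and heuristics are
# invented; the point is the shape of the calculation.

LIST_FAN_OUT = 10  # assumed average number of items per list field

def estimated_cost(selection, is_list=lambda f: f.endswith("s")):
    """selection: {field: subselection-or-None}. Cost = fields visited,
    with nested cost multiplied by the assumed list fan-out."""
    cost = 0
    for field, sub in selection.items():
        cost += 1
        if sub is not None:
            child = estimated_cost(sub, is_list)
            cost += child * (LIST_FAN_OUT if is_list(field) else 1)
    return cost

query = {"user": {"name": None, "friends": {"name": None}}}
# user(1) + [name(1) + friends(1) + 10 * name(1)] = 13
print(estimated_cost(query))  # 13
```

A server could reject any query whose estimate exceeds a threshold. The weakness is visible right in the sketch: if a user actually has 10,000 friends rather than 10, the estimate is off by three orders of magnitude, which is exactly why the comment above calls this hard to do reliably.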
GraphQL essentially moves a lot of complexity that you usually have on the client side (determining what you need, making several sub-queries, combining results) to the server side. This is great for performance- or bandwidth-constrained clients, but it can increase the load on the server.
I experimented a little with GraphQL as a query language for services on performance-constrained embedded systems, instead of more RPC-based approaches, but went back to the latter as it's far more predictable, and I often didn't need the flexibility of arbitrary queries.
When you implement a GraphQL server, you don't map GraphQL to SQL. Instead, for each object type in your API, you define how to resolve each field, and you can use one of the various GraphQL server libraries to go from those object types to serving a whole API.
I would get more specific but it depends on which programming language you want to use. Check out the code examples & links to libraries in different languages on http://graphql.org/code/
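To show what "define how to resolve each field" means without committing to any particular library, here's a toy sketch: one small function per field, wired into a plain dict keyed by type name. The `Book`/`Author` schema, the dict-based selection format, and the field-name-to-type lookup are all invented for illustration; real libraries (graphql-js, graphene, etc.) give you a proper schema and executor, but the resolver-per-field idea is the same.

```python
# Toy illustration of the resolver idea, not a real GraphQL library.

BOOKS = {1: {"title": "Dune", "author_id": 10}}
AUTHORS = {10: {"name": "Frank Herbert"}}

RESOLVERS = {
    "Book": {
        "title":  lambda book: book["title"],
        # This field crosses into a different "data source" entirely.
        "author": lambda book: AUTHORS[book["author_id"]],
    },
    "Author": {
        "name": lambda author: author["name"],
    },
}

def execute(type_name, obj, selection):
    """Run each requested field's resolver; recurse into sub-selections."""
    out = {}
    for field, sub in selection.items():
        value = RESOLVERS[type_name][field](obj)
        if sub is None:
            out[field] = value
        else:
            # Toy type lookup: "author" -> "Author". A real library
            # gets the child type from the schema instead.
            out[field] = execute(field.capitalize(), value, sub)
    return out

print(execute("Book", BOOKS[1], {"title": None, "author": {"name": None}}))
# {'title': 'Dune', 'author': {'name': 'Frank Herbert'}}
```

Note how nothing here is SQL: each resolver is free to hit a SQL table, a key-value store, another service, or an in-memory dict, which is the point the comment above is making.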