
We have a `user` GraphQL type. It has 200+ fields and resolvers into other data. Fortunately, clients don't need to pull everything.

Well, within one of our frontend apps, someone wrote a "GetUser" query fragment, wrapped it in a React hook to make it easy to use, and now anytime anyone anywhere wants a user, even for one field, they get 100+ fields they don't want. And anytime someone needs a new field on a user, they don't create a new query; they just add the field to that "GetUser" query.
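To make the anti-pattern concrete, here's a hypothetical sketch of what that shared fragment looks like (the field names are made up; only the shape matters):

```graphql
# One shared fragment that every call site reuses and keeps growing.
fragment UserFields on User {
  id
  firstName
  lastName
  email
  avatarUrl
  # ...100+ more fields, because each new need lands here
  # instead of in a purpose-built query
}

query GetUser($id: ID!) {
  user(id: $id) {
    ...UserFields
  }
}
```

A component that only needs `firstName` still pays for the whole fragment.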

Now, I've described that setup to several GraphQL advocates (in our local community and online), and unfailingly they balk at it: you're doing GraphQL wrong, everyone makes mistakes, you should be more selective, etc etc etc. I think that's fair; being more specific sounds like a good pattern.

Except we're not certain performance wouldn't suffer if we moved everything to more selective queries. The issue is caching. If you have ten queries each requesting one field, the cache can't saturate until after ten network requests. But with one mega query, the remaining nine queries can just pull from the cache. Sure, the mega query takes longer than one mini-query, but it's not longer than ten mini-queries.
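The trade-off can be sketched with a toy query-level cache. The cache shape and the count of ten are illustrative assumptions, not measurements:

```typescript
// Simulated query-level cache: a query hits the network only if its
// exact result isn't already cached.
type Cache = Map<string, unknown>;

let networkRequests = 0;

function runQuery(cache: Cache, queryKey: string, result: unknown): unknown {
  if (!cache.has(queryKey)) {
    networkRequests += 1; // cache miss -> one round trip
    cache.set(queryKey, result);
  }
  return cache.get(queryKey);
}

// Ten selective queries, each asking for a different field:
const selective: Cache = new Map();
for (let i = 0; i < 10; i++) {
  runQuery(selective, `GetUserField${i}`, { [`field${i}`]: null });
}
console.log(networkRequests); // 10 -- the cache can't help until each has fired once

// One mega query reused from ten call sites:
networkRequests = 0;
const mega: Cache = new Map();
for (let i = 0; i < 10; i++) {
  runQuery(mega, "GetUser", { /* 100+ fields */ });
}
console.log(networkRequests); // 1 -- nine call sites read from the cache
```

Ten distinct selective queries cost ten round trips; the mega query costs one, at the price of over-fetching on that first request.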

I only outline this to say: GraphQL is not a magic bullet. Every single thing you can do which would ruin a REST API is also possible in GraphQL. REST API responses naturally evolve into god objects. GraphQL APIs solve the god object issue server-side by saying "god objects are fine", then push the responsibility of not abusing it onto the client (and it should be obvious, the client is the worst place to solve almost every problem, but that's another topic).

GraphQL is easier and better for some things, far harder for others. At the end of the day, it's no worse than REST. I don't recommend it, but only because it's new; unless a new technology offers some very easy-to-express, tangible benefits for your business, just wait for it to mature. When you get down to it, GraphQL's benefits are definitely not tangible yet, but they're still there, and I think over time it will evolve into something very interesting.



I agree with your point about caching, except I think it is missing one important detail that makes your argument less one-sided against GraphQL.

What you described is absolutely correct, but only if we cache by query. If we cache by objects and fields, none of the issues you described apply, and caching by object/field (as opposed to caching by query) generally seems like the better practice imo, aside from certain very specific scenarios.

In fact, it seems like the official GraphQL docs recommend that approach as well [0].

0. https://graphql.org/learn/caching/


No, it's still an issue. If you have QueryA which requests User { firstName } and QueryB which requests User { lastName }, both queries have to hit the network before both fields are cached.

If, instead, you have QueryC which does User { firstName, lastName }, but two usages of it (QueryC1/QueryC2), after QueryC1 requests, QueryC2 can use the cached results. This works whether you're doing Query/Operation level caching or Field/ID level caching. The former example works in neither. And QueryC is trivially faster than QueryA+QueryB because of network overhead.
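A minimal sketch of a normalized (field/ID-level) cache shows why QueryA + QueryB still costs two round trips while two usages of QueryC cost one. The cache key layout (`User:1.fieldName`) and query shapes are assumptions for illustration:

```typescript
// Normalized cache keyed by entity ID + field name.
const fieldCache = new Map<string, unknown>();
let roundTrips = 0;

// Run a query for some fields of User:1. Any missing field forces a
// network request; the response then populates the cache per field.
function query(fields: string[], server: Record<string, unknown>): void {
  const missing = fields.filter((f) => !fieldCache.has(`User:1.${f}`));
  if (missing.length > 0) {
    roundTrips += 1; // at least one requested field isn't cached yet
    for (const f of missing) fieldCache.set(`User:1.${f}`, server[f]);
  }
}

const server = { firstName: "Ada", lastName: "Lovelace" };

// QueryA then QueryB: two different fields, two round trips.
query(["firstName"], server);
query(["lastName"], server);
console.log(roundTrips); // 2

// Reset, then QueryC twice: the second usage is fully served from cache.
fieldCache.clear();
roundTrips = 0;
query(["firstName", "lastName"], server);
query(["firstName", "lastName"], server);
console.log(roundTrips); // 1
```

Field-level caching only pays off once a field has been fetched at least once; it can't conjure `lastName` out of a cache that has only ever seen `firstName`.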

This isn't necessarily an issue with GraphQL (and, I thought I was clear about this, but: I'm not against GraphQL). It's a behavior of both typical REST implementations and GraphQL. And, to be clear, GraphQL's out-of-the-box caching story is more powerful than that of any REST implementation I've seen short of hyper-engineered FAANG companies, because it enables really powerful field/ID-level caching across multiple queries. But it doesn't help in this case.

The point is that it's still very immature. Even the thought leaders (Apollo being the biggest one and worst offender) write blog posts filled with Best Practices and Recommendations that often convey horrible advice and flat-out misrepresent GraphQL's actual advantages over REST. GraphQL solves a lot of REST's problems; it does NOT solve REST's "god-object" class of problems, as the grandparent comment points out; and it introduces many new classes of problems that remain unsolved in the ecosystem because of how immature it is. One great example is OSI L7 inspection in the many tertiary tools a typical SaaS app has. Products like Datadog, CloudFront, and AWS ALB can do some really cool and powerful stuff out-of-the-box just by inspecting standard HTTP semantics. REST is basically just standard HTTP: your resource is in the path, query parameters, and headers, all very standard. GraphQL is not, so many of these tools don't work out-of-the-box with it. People are catching up, but again, it's immature.


Thanks for clarifying, I genuinely appreciate comments like this one that go into actual details, without vague fluff or generalized claims. At this point, I am fully with you on this one.


I wonder if this can be statically analyzed? If I have two child components that request bits of data, in theory those requests could flow up the component tree and only be triggered at the root by a component that intelligently coalesces requests: add in some logic to bucket requests by some amount (50ms/100ms) or some logic to explicitly trigger pending requests and it might allow the best of both worlds?


I think any timebox-based batching strategy would effectively just trade frontend performance for backend performance. Your backend would have to deal with fewer requests, and there's always a number of "things" the backend needs to do with every request regardless of the content (e.g. check a token against a database), so fewer requests is nice. But some components would have to wait up to X milliseconds to get their data, and if we're talking about a network request that takes 100ms, balancing the value of X to be big enough to actually have an impact, while being small enough to not double the time it takes for some components to get data, would prove very difficult.
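For concreteness, here's a rough sketch of the batcher being discussed: queries arriving within the same window get coalesced into one request. The names and the explicit `flush` call are assumptions; a real implementation would close the window with a timer (e.g. `setTimeout(..., 50)`):

```typescript
// Queries enqueue without touching the network; flush() sends them all
// as one coalesced request when the batching window closes.
type Pending = { query: string; resolve: (v: string) => void };

class QueryBatcher {
  private pending: Pending[] = [];

  // A component "fires" a query; nothing hits the network yet.
  enqueue(query: string): Promise<string> {
    return new Promise((resolve) => this.pending.push({ query, resolve }));
  }

  // Called when the window closes: one mega-request carries every
  // pending query, and each caller gets its own slice of the response.
  flush(send: (batch: string[]) => string[]): number {
    const batch = this.pending.splice(0);
    const results = send(batch.map((p) => p.query));
    batch.forEach((p, i) => p.resolve(results[i]));
    return batch.length; // how many queries one round trip served
  }
}

const batcher = new QueryBatcher();
batcher.enqueue("{ user { firstName } }");
batcher.enqueue("{ user { lastName } }");

// Pretend server: one result per query in the batch.
const served = batcher.flush((batch) => batch.map((q) => `result for ${q}`));
console.log(served); // 2 queries, 1 round trip
```

The cost is visible in `enqueue`: every caller's promise sits unresolved until the window closes, which is exactly the latency trade described above.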

The backend performance you could gain is kinda spurious anyway. We're talking about N requests being coalesced into 1 mega-request; I would rather clients send me N small requests, not 1 mega-request. Horizontal scaling is easy to set up and quick to do in real-time; vertical scaling is harder.

And while a solution like this is interesting in the domain of "you've got two sibling components rendered at the same time whose queries could be combined", that's not exactly the issue I described a few comments up. Caching doesn't really enter into play there; it would not be measurably faster to issue one mega-request for one component and let the other wait for the cache (which is, in the end, what we're talking about). Sure, it saves one network request, but 90% of a network request is just waiting on IO, and if there's one thing JavaScript is really good at, it's sitting around and waiting. It can wait for dozens of things at a time.

The issue is more specifically two components being rendered at different times. Imagine a User Preview card which displays a user's first name; the user clicks on that card, which takes them to a new page, the User Profile, which displays both their first and last name. Short of using a God Query for the Preview which covers a broad set of common user fields, like firing a shotgun and hoping you've got the cache saturated for unknown upcoming components, this situation will take two requests. The shotgun approach is what we do, and what many companies do; it works, but it's imprecise. It can hurt the performance of the leading components that pulled the short stick, if that God Query gets too big, and as an application evolves you're going to forget to keep that God Query updated with new fields you may need.
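The two options look roughly like this (hypothetical field names, just to make the shotgun visible):

```graphql
# The "shotgun" God Query: the Preview card only shows firstName, but
# requests a broad set of fields hoping to pre-warm the cache for
# whatever screen comes next.
query UserPreview($id: ID!) {
  user(id: $id) {
    id
    firstName
    lastName    # not shown in the preview, fetched "just in case"
    email
    avatarUrl
    # ...whatever else future components might need
  }
}

# The precise alternative: one selective query per view, which means
# a second round trip when the user clicks through to the Profile.
query UserProfile($id: ID!) {
  user(id: $id) {
    id
    firstName
    lastName
  }
}
```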

This problem is enticing because it feels like there should be a solution and we just haven't, as a community, found it yet. I think GraphQL, at its core, has the capability to solve it; it's not like REST, which is so undefined and hazy that any solution to something this complex would only work within one company's "way of doing things". I hope one day we figure it out, because I suspect a general, easy-to-use way of preemptively saturating a cache like this would be a MASSIVE performance boost to every GraphQL client ever written.


I think this is explicitly what Relay does.


Yes, caching is a pro FOR GraphQL, not against it.



