
We have a `user` GraphQL type. It has 200+ fields and resolvers into other data. Fortunately, clients don't need to pull everything.

Well, within one of our frontend apps, someone wrote a "GetUser" query fragment, wrapped it in a React hook to make it easy to use, and now anytime anyone anywhere wants a user, even for one field, they get 100+ fields they don't want. And anytime someone needs a new field on a user, they don't create a new query; they just add the field to that "GetUser" query.
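To make the anti-pattern concrete, here's a hypothetical sketch of what that shared fragment looks like (the field names are made up; only the shape matters):

```graphql
# One shared fragment that every call site reuses and keeps growing.
fragment UserFields on User {
  id
  firstName
  lastName
  email
  avatarUrl
  # ...100+ more fields, because each new need lands here
  # instead of in a purpose-built query
}

query GetUser($id: ID!) {
  user(id: $id) {
    ...UserFields
  }
}
```

A component that only needs `firstName` still pays for the whole fragment.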

Now, I've described that setup to several GraphQL advocates (in our local community and online), and unfailingly they balk at it: you're doing GraphQL wrong, everyone makes mistakes, you should be more selective, etc etc etc. I think that's fair; being more specific sounds like a good pattern.

Except we're not certain performance wouldn't suffer if we moved everything to more selective queries. The issue is caching. If you have ten queries each requesting one field, the cache can't saturate until after ten network requests. But with one mega query, the remaining nine queries can just pull from the cache. Sure, the mega query takes longer than one mini-query, but it's not longer than ten mini-queries.
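The trade-off can be sketched with a toy query-level cache. The cache shape and the count of ten are illustrative assumptions, not measurements:

```typescript
// Simulated query-level cache: a query hits the network only if its
// exact result isn't already cached.
type Cache = Map<string, unknown>;

let networkRequests = 0;

function runQuery(cache: Cache, queryKey: string, result: unknown): unknown {
  if (!cache.has(queryKey)) {
    networkRequests += 1; // cache miss -> one round trip
    cache.set(queryKey, result);
  }
  return cache.get(queryKey);
}

// Ten selective queries, each asking for a different field:
const selective: Cache = new Map();
for (let i = 0; i < 10; i++) {
  runQuery(selective, `GetUserField${i}`, { [`field${i}`]: null });
}
console.log(networkRequests); // 10 -- the cache can't help until each has fired once

// One mega query reused from ten call sites:
networkRequests = 0;
const mega: Cache = new Map();
for (let i = 0; i < 10; i++) {
  runQuery(mega, "GetUser", { /* 100+ fields */ });
}
console.log(networkRequests); // 1 -- nine call sites read from the cache
```

Ten distinct selective queries cost ten round trips; the mega query costs one, at the price of over-fetching on that first request.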

I only outline this to say: GraphQL is not a magic bullet. Every single thing you can do which would ruin a REST API is also possible in GraphQL. REST API responses naturally evolve into god objects. GraphQL APIs solve the god object issue server-side by saying "god objects are fine", then push the responsibility of not abusing it onto the client (and it should be obvious, the client is the worst place to solve almost every problem, but that's another topic).

GraphQL is easier and better for some things, far harder for others. At the end of the day, it's no worse than REST. I don't recommend it, but only because it's new; unless a new technology offers some very easy-to-express, tangible benefits for your business, just wait for it to mature. When you get down to it, GraphQL's benefits are definitely not tangible yet, but they're still there, and I think over time it will evolve into something very interesting.



I agree with your point about caching, except I think it is missing one important detail that makes your argument less one-sided against GraphQL.

What you described is absolutely correct, but only if we cache by query. If we cache by objects and fields, none of the issues you described apply, and caching by object/field (as opposed to caching by query) generally seems like the better practice imo, aside from certain very specific scenarios.

In fact, it seems like the official GraphQL docs recommend that approach as well [0].

0. https://graphql.org/learn/caching/


No, it's still an issue. If you have QueryA which requests User { firstName } and QueryB which requests User { lastName }, both queries have to hit the network before both fields are cached.

If, instead, you have QueryC which does User { firstName, lastName }, but two usages of it (QueryC1/QueryC2), after QueryC1 requests, QueryC2 can use the cached results. This works whether you're doing Query/Operation level caching or Field/ID level caching. The former example works in neither. And QueryC is trivially faster than QueryA+QueryB because of network overhead.
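A minimal sketch of a normalized (field/ID-level) cache shows why QueryA + QueryB still costs two round trips while two usages of QueryC cost one. The cache key layout (`User:1.fieldName`) and query shapes are assumptions for illustration:

```typescript
// Normalized cache keyed by entity ID + field name.
const fieldCache = new Map<string, unknown>();
let roundTrips = 0;

// Run a query for some fields of User:1. Any missing field forces a
// network request; the response then populates the cache per field.
function query(fields: string[], server: Record<string, unknown>): void {
  const missing = fields.filter((f) => !fieldCache.has(`User:1.${f}`));
  if (missing.length > 0) {
    roundTrips += 1; // at least one requested field isn't cached yet
    for (const f of missing) fieldCache.set(`User:1.${f}`, server[f]);
  }
}

const server = { firstName: "Ada", lastName: "Lovelace" };

// QueryA then QueryB: two different fields, two round trips.
query(["firstName"], server);
query(["lastName"], server);
console.log(roundTrips); // 2

// Reset, then QueryC twice: the second usage is fully served from cache.
fieldCache.clear();
roundTrips = 0;
query(["firstName", "lastName"], server);
query(["firstName", "lastName"], server);
console.log(roundTrips); // 1
```

Field-level caching only pays off once a field has been fetched at least once; it can't conjure `lastName` out of a cache that has only ever seen `firstName`.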

This isn't necessarily an issue with GraphQL (and, I thought I was clear about this, but: I'm not against GraphQL). It's a behavior of both typical REST implementations and GraphQL. And, to be clear, GraphQL's out-of-the-box caching story is more powerful than that of any REST implementation I've seen short of hyper-engineered FAANG companies, because it enables really powerful field/ID-level caching across multiple queries. But it doesn't help in this case.

The point is that it's still very immature. Even the thought leaders (Apollo being the biggest one and worst offender) write blog posts filled with Best Practices and Recommendations that often convey horrible advice and flat-out misrepresent GraphQL's actual advantages over REST. GraphQL solves a lot of REST's problems; it does NOT solve REST's "god-object" class of problems, as the grandparent comment points out; and it introduces many new classes of problems that remain unsolved in the ecosystem because of how immature it is. One great example is OSI L7 inspection in the many tertiary tools a typical SaaS app has. Products like Datadog, CloudFront, and AWS ALB can do some really cool and powerful stuff out-of-the-box just by inspecting standard HTTP semantics. REST is basically just standard HTTP: your resource is in the path, query parameters, and headers, all very standard. GraphQL is not, so many of these tools don't work out-of-the-box with it. People are catching up, but again, it's immature.


Thanks for clarifying, I genuinely appreciate comments like this one that go into actual details, without vague fluff or generalized claims. At this point, I am fully with you on this one.


I wonder if this can be statically analyzed? If I have two child components that request bits of data, in theory those requests could flow up the component tree and only be triggered at the root by a component that intelligently coalesces requests: add in some logic to bucket requests by some amount (50ms/100ms) or some logic to explicitly trigger pending requests and it might allow the best of both worlds?


I think any timebox-based batching strategy would effectively just trade frontend performance for backend performance. Your backend would have to deal with fewer requests, and there's always a number of "things" the backend needs to do with every request regardless of the content (e.g. check a token against a database), so fewer requests is nice. But some components would have to wait up to X milliseconds to get their data, and if we're talking about a network request that takes 100ms, balancing the value of X to be big enough to actually have an impact, while being small enough to not double the time it takes for some components to get data, would prove very difficult.
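For concreteness, here's a rough sketch of the batcher being discussed: queries arriving within the same window get coalesced into one request. The names and the explicit `flush` call are assumptions; a real implementation would close the window with a timer (e.g. `setTimeout(..., 50)`):

```typescript
// Queries enqueue without touching the network; flush() sends them all
// as one coalesced request when the batching window closes.
type Pending = { query: string; resolve: (v: string) => void };

class QueryBatcher {
  private pending: Pending[] = [];

  // A component "fires" a query; nothing hits the network yet.
  enqueue(query: string): Promise<string> {
    return new Promise((resolve) => this.pending.push({ query, resolve }));
  }

  // Called when the window closes: one mega-request carries every
  // pending query, and each caller gets its own slice of the response.
  flush(send: (batch: string[]) => string[]): number {
    const batch = this.pending.splice(0);
    const results = send(batch.map((p) => p.query));
    batch.forEach((p, i) => p.resolve(results[i]));
    return batch.length; // how many queries one round trip served
  }
}

const batcher = new QueryBatcher();
batcher.enqueue("{ user { firstName } }");
batcher.enqueue("{ user { lastName } }");

// Pretend server: one result per query in the batch.
const served = batcher.flush((batch) => batch.map((q) => `result for ${q}`));
console.log(served); // 2 queries, 1 round trip
```

The cost is visible in `enqueue`: every caller's promise sits unresolved until the window closes, which is exactly the latency trade described above.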

The backend performance you could gain is kinda spurious anyway. We're talking about N requests being coalesced into 1 mega-request; I would rather clients send me N small requests, not 1 mega-request. Horizontal scaling is easy to set up and quick to do in real-time; vertical scaling is harder.

And while a solution like this is interesting in the domain of "you've got two sibling components rendered at the same time whose queries could be combined", that's not exactly the issue I described a few comments up. Caching doesn't really enter into play there; it would not be measurably faster to issue one mega-request for one component and let the other wait for the cache (which is, in the end, what we're talking about). Sure, it saves one network request, but 90% of a network request is just waiting on IO, and if there's one thing JavaScript is really good at, it's sitting around and waiting. It can wait for dozens of things at a time.

The issue is more specifically two components being rendered at different times. Imagine a User Preview card which displays a user's first name; the user clicks on that card, which takes them to a new page, the User Profile, which displays both their first and last name. Short of using a God Query for the Preview which covers a broad set of common user fields, like firing a shotgun and hoping you've got the cache saturated for unknown upcoming components, this situation will take two requests. The shotgun approach is what we do, and what many companies do; it works, but it's imprecise. It can hurt the performance of the leading components that pulled the short stick, if that God Query gets too big, and as an application evolves you're going to forget to keep that God Query updated with new fields you may need.
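The two options look roughly like this (hypothetical field names, just to make the shotgun visible):

```graphql
# The "shotgun" God Query: the Preview card only shows firstName, but
# requests a broad set of fields hoping to pre-warm the cache for
# whatever screen comes next.
query UserPreview($id: ID!) {
  user(id: $id) {
    id
    firstName
    lastName    # not shown in the preview, fetched "just in case"
    email
    avatarUrl
    # ...whatever else future components might need
  }
}

# The precise alternative: one selective query per view, which means
# a second round trip when the user clicks through to the Profile.
query UserProfile($id: ID!) {
  user(id: $id) {
    id
    firstName
    lastName
  }
}
```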

This problem is enticing because it feels like there should be a solution and we just haven't, as a community, found it yet. I think GraphQL, at its core, has the capability to solve it; it's not like REST, which is so undefined and hazy that any solution to something this complex would only work within one company's "way of doing things". I hope one day we figure it out, because I suspect a general, easy-to-use way of preemptively saturating a cache like this would be a MASSIVE performance boost to every GraphQL client ever written.


I think this is explicitly what Relay does.


Yes, caching is a pro FOR GraphQL, not against it.



