Saturday, May 7, 2022

Can and will GraphQL ever replace REST?

https://www.quora.com/Can-and-will-GraphQL-ever-replace-REST/answer/Roman-Scharkov

  1. GraphQL will not replace REST since they both have their pros and cons.
  2. Graph APIs do have the potential to become a new standard in API development and surpass RPC-style APIs (including RESTful APIs) in the future, though it doesn’t have to be GraphQL in particular. Major paradigm shifts need to be adopted by the market for the revolution to happen.

1. RPC vs Query Languages:

REST is basically a CRUD-like form of RPC. Calling an HTTP method (GET, POST, PUT, DELETE) on a resource URL is equivalent to calling a remote procedure.

RPC is pretty simple - you call a remote function with a certain set of arguments and you get a certain set of data back - easy!

GraphQL, by contrast, is a query language. The main difference is that it puts clients in control of what data is returned, which is challenging: the server effectively becomes a query execution engine, which is much more difficult to implement.

Both approaches have their pros and cons.

  • RPC APIs
    • suffer from over- and underfetching problems.
    • special endpoints tend to become unmaintainable over time.
    • RESTful variants often suffer from the lack of a proper type system.
  • Graph APIs
    • must be protected from overly-complex queries.
    • suffer from caching difficulties (client-side caching only).
    • are very difficult to implement correctly!

2. The Overfetching Problem

Overfetching means you get more data than you actually needed, wasting network & compute resources. Imagine you had a tabular view of a list of customers and you only wanted to display the customers’ firstName, lastName, and id attributes. A typical, paged RESTful query will return all properties, even those you don’t need:

  GET /customers?limit=50

To work this around, you’d need to introduce special URL parameters and write the handler code in such a way that it respects them:

  GET /customers?limit=50&props=firstName,lastName,id

This does increase the complexity of the endpoint handler code.
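As a sketch of what that extra handler complexity looks like, here is a minimal Python handler that honors a hypothetical props parameter; the function name, parameter names, and the in-memory rows are illustrative, not from any particular framework:

```python
# Sketch of a REST handler that honors a hypothetical "props" URL parameter.
# All names here are illustrative assumptions.

def list_customers(query_params, db_rows):
    """Return customers, optionally projected to the requested properties."""
    limit = int(query_params.get("limit", 50))
    props = query_params.get("props")  # e.g. "firstName,lastName,id"
    rows = db_rows[:limit]
    if props is None:
        return rows  # default behavior: full objects (overfetching)
    wanted = set(props.split(","))
    # keep only the requested properties on each row
    return [{k: v for k, v in row.items() if k in wanted} for row in rows]
```

Every new optional parameter like this adds another branch the handler has to maintain, which is exactly the maintainability cost described above.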

GraphQL, however, solves this rather elegantly:

  {
    customers(limit: 50) {
      firstName
      lastName
      id
    }
  }

Only the properties that were actually requested are resolved.

There is a catch though…

GraphQL resolvers are usually written so that the customer profile entity is fetched from the database with all table columns (basically a SELECT * SQL query) before the customer graph node is resolved, whereas a REST endpoint handler might actually fire a SELECT firstName, lastName, id query limited to a particular set of columns. Even though I’ve never seen this tiny optimization in actual production code, it’s theoretically easier to achieve with REST.
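A minimal sketch of that column-limited projection, assuming the handler (or a sufficiently clever GraphQL resolver inspecting its selection set) knows which fields the client asked for; the table name and allowed column set are made up for illustration:

```python
# Sketch: deriving a column-limited SQL query from the requested fields,
# instead of SELECT *. Table and column names are illustrative assumptions.

ALLOWED_COLUMNS = {"id", "firstName", "lastName", "email", "birthDate"}

def build_customer_query(requested_fields):
    """Build a projected SELECT containing only known, requested columns."""
    cols = [c for c in requested_fields if c in ALLOWED_COLUMNS]
    if not cols:
        raise ValueError("no valid columns requested")
    return f"SELECT {', '.join(cols)} FROM customers"
```

Whitelisting columns against a known set also guards against injecting arbitrary identifiers into the query text.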


3. The Underfetching Problem

Underfetching is a major problem of RPC-style APIs because it introduces unnecessary roundtrips to the server, which ultimately increases latency, especially on mobile and generally slow networks.

In a recent GraphQL documentary, Nick Schrock (co-creator of GraphQL) shared an analogy for the underfetching problem that GraphQL solves by comparing RPC APIs to vending machines. The analogy isn’t exactly accurate, but it explains the RPC problem well enough: you press a button and you get something, so whenever you need multiple things you have to press multiple buttons one at a time (serially), which is slow.

To avoid having to wait on each button, people create special-purpose buttons that return multiple things at once (in one roundtrip). In API development this means special-purpose endpoints, which unfortunately become almost unmaintainable over time, especially when there are multiple different clients, each with their own needs, or frequent frontend iterations that keep changing the data requirements.

Now you might say “but with HTTP/2 I can fire multiple requests at once thanks to multiplexing”, and you’d be right, but only partially, because head-of-line blocking isn’t the only problem. Sometimes the choice of what function to call next depends on what data was returned by the previous call.

For example, let’s assume we have a view that displays all albums, all pictures inside them, and all comments on each picture:

  // API client pseudo-code
  albums = GET /user/1/albums
  for a in albums {
      pictures = GET /user/1/albums/$a/pictures
      for p in pictures {
          comments = GET /user/1/albums/$a/pictures/$p/comments
      }
  }

Also, let’s assume there were 4 albums each having 10 pictures. The above view would require 45 HTTP requests in total:

   1x GET /user/1/albums
   4x GET /user/1/albums/$a/pictures
  40x GET /user/1/albums/$a/pictures/$p/comments

With HTTP/2 multiplexing that’d be ~3 roundtrips (without multiplexing, ~45, though in practice somewhat fewer since HTTP/1.1 clients keep several TCP connections open and parallelize that way, much less efficiently than HTTP/2). If we were connected to the internet over a satellite link, we’d be waiting a whole second for the view to load (3 × ~350 ms)!

GraphQL, however, can resolve this view in a single roundtrip greatly reducing loading times:

  query($uid: ID!) {
    users(id: $uid) {
      albums {
        name
        pictures {
          src
          name
          comments {
            contents
            created
          }
        }
      }
    }
  }

4. Cacheability

REST leverages very powerful HTTP caching mechanisms by default: the URL represents a cacheable resource and the Cache-Control headers specify the terms.

There is a problem though: HTTP caching doesn’t play nicely with overfetching optimizations since this:

  GET /customer?props=id,email,firstName,lastName,birthDate

…and this:

  GET /customer?props=id,email,firstName,lastName,birthDate,bio

…are totally different resources from the standpoint of an HTTP client/server because, again, the URL is the resource identifier.

GraphQL relies on client-side caching only, making the client implementation responsible for it. A smart GraphQL client will cache entities by resolver arguments, such as id in query.users, so that if we previously fetched:

  query($uid: ID) {
    users(id: $uid) {
      email
      firstName
      lastName
      birthDate
    }
  }

…then a subsequent request with an additional bio property will automatically be reduced by the GraphQL client to just the bio prop, because everything else is already in the client’s cache and still valid:

  query($uid: ID) {
    users(id: $uid) {
      bio
    }
  }

It’s hard to tell which of the two is actually better at caching when GraphQL is used with proper client implementation because:

  • HTTP is able to leverage intermediate caches (proxies), reducing server load,
  • …while GraphQL is able to improve the user experience by caching smarter on the client’s device.
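The field-diffing behavior described above could be sketched like this; the cache layout and method names are assumptions for illustration, not any real GraphQL client’s API:

```python
# Sketch of client-side field diffing: given which fields are already cached
# for an entity, only the missing fields need to be requested from the server.

class EntityCache:
    def __init__(self):
        # (typename, entity id) -> {field name: cached value}
        self._store = {}

    def missing_fields(self, typename, entity_id, wanted):
        """Return the subset of wanted fields not yet in the cache."""
        cached = self._store.get((typename, entity_id), {})
        return [f for f in wanted if f not in cached]

    def merge(self, typename, entity_id, fields):
        """Merge freshly fetched fields into the cached entity."""
        self._store.setdefault((typename, entity_id), {}).update(fields)
```

After the first query merges email, firstName, lastName, and birthDate into the cache, asking for those fields plus bio yields only bio as missing, which is exactly the reduced follow-up query shown above. A real client would also need invalidation logic to decide when cached fields are no longer valid.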

5. A GraphQL API must be protected from overly-complex queries

An unprotected public GraphQL API can easily be shot down with just a single overly-complex query like this one:

  query {
    users {
      name
      friends {
        name
        friends {
          name
          friends {
            name
            friends {
              name
            }
          }
        }
      }
    }
  }

There are multiple ways to protect a GraphQL API from being overwhelmed though:

  • Query Whitelisting (a.k.a. “Persisted Queries”)
    • Easy and very effective but a little annoying sometimes.
  • Query-Depth Limitation
    • Less safe than query whitelisting.
  • Query Cost Analysis
    • Very, very difficult to get right.
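As a rough illustration of query-depth limitation, a naive check might simply bound how deeply selection sets nest. A real server would do this on the parsed query AST; counting raw braces, as below, is a simplification that ignores strings and comments:

```python
# Naive sketch of query-depth limitation: reject queries whose selection
# sets nest deeper than a threshold. Real implementations walk the parsed
# AST instead of counting braces in the raw text.

def query_depth(query: str) -> int:
    """Return the maximum brace-nesting depth of a query string."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

def check_depth(query: str, limit: int = 5) -> bool:
    """True if the query is within the allowed depth."""
    return query_depth(query) <= limit
```

The friends-of-friends query above nests six levels deep and would be rejected with a limit of five, while an ordinary flat query passes.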

Currently, I prefer the query whitelisting approach: I keep a list of all allowed queries on the server and have it reject any other query. When I realize I need a different query on the client, I quickly whitelist it on the server and that’s it!

One might ask: “but how does it differ from RPC then?”, and the answer is pretty obvious: I don’t have to rewrite the server when the client’s data requirements change, I just edit the whitelist.

Whitelisting is perhaps not the best option for APIs with multiple different 3rd-party clients because of the “can we have this query, please?” issues… or maybe not since you’re in total control over what queries can be run against your API. It depends.
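A minimal sketch of the whitelisting idea, assuming queries are identified by a hash of their raw text (real persisted-query setups differ in details such as normalization and how hashes are distributed to clients):

```python
# Sketch of query whitelisting ("persisted queries"): the server keeps a set
# of allowed query hashes and rejects everything else. Hashing the raw query
# text with sha256 is an assumption made for illustration.

import hashlib

class QueryWhitelist:
    def __init__(self, allowed_queries):
        self._hashes = {self._h(q) for q in allowed_queries}

    @staticmethod
    def _h(query: str) -> str:
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def is_allowed(self, query: str) -> bool:
        return self._h(query) in self._hashes

    def add(self, query: str) -> None:
        # "quickly whitelist it on the server" when a client needs a new query
        self._hashes.add(self._h(query))
```

Storing hashes instead of the full query text also lets clients send just the hash, saving bandwidth on large queries.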

6. GraphQL is very difficult to implement correctly

I’ve recently published a little GraphQL + Dgraph + Go tech demo showcasing a somewhat realistic combination of the mentioned technologies. But this demo has a major flaw: there’s no caching & batching optimization, which puts more load on the database than necessary, resulting in the famous n+1 request problem.
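For context, the batching optimization the demo lacks can be sketched DataLoader-style: individual loads are collected and then resolved in one batched database call instead of one call per entity. The names fetch_comments_batch and the dict-backed db below are stand-ins, not part of any real library:

```python
# Minimal DataLoader-style batching sketch: instead of one database query
# per picture (the n+1 problem), comment loads are collected and fetched
# in a single batched call. The "db" here is a plain dict standing in for
# a real database.

def fetch_comments_batch(picture_ids, db):
    # a real implementation would run one query like:
    #   SELECT ... FROM comments WHERE picture_id IN (...)
    return {pid: db.get(pid, []) for pid in picture_ids}

class CommentLoader:
    def __init__(self, db):
        self.db = db
        self.pending = []

    def load(self, picture_id):
        """Register a picture whose comments are needed."""
        self.pending.append(picture_id)

    def dispatch(self):
        """Resolve all pending loads with a single batched fetch."""
        results = fetch_comments_batch(self.pending, self.db)
        self.pending = []
        return results
```

With a loader like this, resolving comments for every picture in an album costs one database roundtrip instead of one per picture.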

Getting a RESTful API into production is much easier, especially with toolsets like OpenAPI, while GraphQL requires much more effort and experience to get right.

7. The GraphQL schema language isn’t powerful enough yet

  • GraphQL is missing generic programming capabilities pretty badly. Paginated lists, for example, are a mess:
  type A {}
  type B {}

  type ListA {
    size: Int!
    version: Int!
    items(
      after: AID
      limit: Int!
    ): [A!]!
  }

  type ListB {
    size: Int!
    version: Int!
    items(
      after: BID
      limit: Int!
    ): [B!]!
  }

  type Some {
    as: ListA!
    ab: ListB!
  }

instead of just:

  type A {}
  type B {}

  type List<T> {
    size: Int!
    version: Int!
    items(
      after: ID<T>
      limit: Int!
    ): [T!]!
  }

  type Some {
    as: List<A>!
    ab: List<B>!
  }
  • There are no package & import concepts, forcing me to either write the entire schema in a single file or resort to hacky concatenation techniques.
  • etc.

Conclusion

In my humble opinion, GraphQL is still in its infancy; this is just the beginning of the graph era. We’ll see other approaches to Graph-API development rise in the near future, including the Service Modelling Language that I’m currently working on.

Over the past 3+ years, I’ve realized there’s a way to drastically reduce the cost and complexity of backend development and greatly increase developer satisfaction, but GraphQL won’t cut it since it’s just a protocol specification that developers have to implement smartly by hand.

I developed a concept of a functional, strongly statically typed, 100% declarative yet Turing-complete programming language that describes scalable and easy to maintain web services in a very concise way:

  • Turing-complete transactions (a.k.a. GraphQL “mutations”).
  • Turing-complete access permissions, graph resolution- and business logic.
  • the underlying distributed graph database (embedded).
  • CQRS (queries won’t have any side-effects, guaranteed. Only transactions are allowed to produce side-effects which are automatically rolled back in case the transaction fails for some reason).
  • isolated API tests.
  • zero-downtime database migrations.
  • automatic semver 2.0 conform schema and API versioning.
  • etc.

All these features combined in a single language so the developer won’t have to care about…

  • the n+1 request problem.
  • database & API scaling.
  • database indexing.
  • query cost analysis & complexity protection.
  • type-differences between the client, schema, server, and the database.
  • database migrations.
  • backward-compatibility and versioning.
  • client-side caching, query aggregation and more.
  • etc.
