> caches SHOULD first normalize request content to remove semantically insignificant differences, thereby improving cache efficiency
This feels like a bad idea, since (a) different caches will support different content types and normalize them in different ways, leading to unexpected changes in behavior and (b) some servers may behave differently depending on something that a cache considers to be a "semantically insignificant" distinction. I'm not sure, in other words, if I trust caches to get this right.
It seems like it might be better to require clients to submit requests in a "pre-canonicalized" form, or to have caches allow this behavior but disable it by default.
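For what it's worth, a client-side pre-canonicalization step is easy to sketch, assuming a JSON query body (names are hypothetical, and real canonicalization would need to handle the content type in question):

```python
import json

def canonicalize(body: str) -> str:
    """Normalize a JSON query body: parse, then re-serialize with
    sorted keys and no insignificant whitespace, so bodies that differ
    only cosmetically produce the same bytes (and thus the same cache key)."""
    return json.dumps(json.loads(body), sort_keys=True, separators=(",", ":"))

# Two bodies that differ only in whitespace and key order...
a = canonicalize('{"limit": 10, "q": "foo"}')
b = canonicalize('{"q":"foo","limit":10}')
assert a == b  # ...canonicalize to the same form
```

If clients did this before sending, caches could key on the raw body bytes and skip normalization entirely.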
Future standards or non-standard systems MAY use different encodings; conformant implementations MUST NOT alter the sequence of bytes. They MAY perform a validation check and add additional headers.
My OCD is really happy that the symmetry is restored. I always felt that GET with the optional-but-actually-forbidden request body stands out. Once we transition from GET to QUERY, all HTTP transactions will be header+body in one direction, followed by header+body in the other.
> QUERY requests are both safe and idempotent with regards to the resource identified by the request URI. That is, QUERY requests do not alter the state of the targeted resource. However, while processing a QUERY request, a server can be expected to allocate computing and memory resources or even create additional HTTP resources through which the response can be retrieved.
The possible creation of extra HTTP resources (response resources?) seems to me contrary to idempotency. That seems more like the territory of POST.
If two identical QUERY requests might produce different response resources, how to square that with the fact that QUERY will be cacheable?
> The possible creation of extra HTTP resources (response resources?) seems to me contrary to idempotency.
If two repetitions of a QUERY request create the same extra HTTP resource(s), then it can be idempotent.
Idempotent means you can't tell the difference between 1 or N requests, not that you can't tell the difference between 0 and 1. Think about PUT, which is also idempotent.
GET is (and all methods, including all safe and idempotent methods, are) allowed to have side effects, per the spec. Safe and idempotent are not mathematical constructs as defined in HTTP, they are more “business” constructs.
I know what you mean. It feels like we're missing an idea of scope for resources. If there was some kind of transaction scope or session scope or something, then a QUERY could create resources within that scope, so we could know that in the long run, it has no side effects. But that would be antithetical to the idea of statelessness perhaps.
Or maybe we just need distributed garbage collection for URLs.
> GET is allowed to have side effects, just not beyond the first invocation of a given request.
GET can have side effects, and there is no difference between the first and subsequent invocations (because it is safe as well as idempotent). Were it idempotent but not safe, it could have side effects that the client was accountable for on the first request, but no different ones of that kind for subsequent uses.
The way I look at it is that the system must continue to meet its requirements (whatever they might be) whether it gets one GET request or many in response to a single action within the user agent (clicking a link, submitting a form, script making a request, etc.). In general, logging two requests instead of one does not violate any requirements and in fact logging every request, even duplicates, is the expected behavior. Adding the same item to a list twice in response to a single UI interaction, on the other hand, would not give the desired effect.
> The possible creation of extra HTTP resources (response resources?) seems to me contrary to idempotency.
A GET request might create additional (or modify existing) resources, say if the API exposed its own log via HTTP.
Both safe and idempotent are less expensive than one might naively think in the HTTP spec (which is good, because the naive understanding, while aesthetically seductive, isn't very practical at all.)
Some quotes from the relevant bits of RFC 7231:
“This definition of safe methods does not prevent an implementation from including behavior that is potentially harmful, that is not entirely read-only, or that causes side effects while invoking a safe method. What is important, however, is that the client did not request that additional behavior and cannot be held accountable for it.”
“The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm. In addition, it allows a user agent to apply appropriate constraints on the automated use of unsafe methods when processing potentially untrusted content.”
“Like the definition of safe, the idempotent property only applies to what has been requested by the user; a server is free to log each request separately, retain a revision control history, or implement other non-idempotent side effects for each idempotent request.”
“Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response.”
> In HTTP idempotent is where the state of the server remains unchanged
No, it's not. That's closer to “safe” than “idempotent” (safe also implies idempotent, but not the other way around), but even then it is not quite right, because even safe methods are allowed to have side effects; there is guidance about the kind and impact of side effects they shouldn't have.
So that means that, without a cache, repeating a QUERY might create two response resources but, with a cache, only one will be created. I find that odd. My understanding of HTTP idempotency is that it's more of a "whole-server" concept (excepting perhaps things like creation of log entries and metrics). Always creating a new resource for each request seems contrary to that.
A way to square creation of response resources with idempotency could be: the second identical QUERY that arrives should always reuse the result resource created by the first QUERY.
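A minimal sketch of that reuse strategy, assuming the server derives the result resource's name from a hash of the (canonicalized) request body, so repeating an identical QUERY reuses the same resource instead of minting a new one (all names hypothetical):

```python
import hashlib

class QueryHandler:
    """Sketch: idempotent QUERY handling via content-addressed results."""

    def __init__(self):
        self.results = {}  # result-resource store: location -> result

    def handle_query(self, body: bytes) -> str:
        # Content-address the result resource by the request body.
        key = hashlib.sha256(body).hexdigest()[:16]
        location = f"/results/q-{key}"
        if location not in self.results:
            self.results[location] = self.run(body)  # compute at most once
        return location  # e.g. returned via 303 See Other

    def run(self, body: bytes):
        return f"results for {body!r}"  # stand-in for real query execution

h = QueryHandler()
first = h.handle_query(b"select a from foo")
second = h.handle_query(b"select a from foo")
assert first == second          # identical QUERYs share one resource
assert len(h.results) == 1      # N requests, the effect of 1
```

With this scheme, N identical QUERYs have exactly the effect of one, which is all HTTP idempotency asks for.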
If I QUERY the current price of a stock, and then someone else sends an identical QUERY ten seconds later, they might get a different result. This is not because QUERY isn't idempotent.
I think that, when talking about idempotency, there's the implicit assumption that the "rest of the world" stays the same while the sequence of operations is performed.
rfc2616 says:
> Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request.
Idempotency is not about "you get the same result", it's about the effects of your http request on the server. Notice that the definition you quoted is in terms of side-effects, not results.
If a request changes the state of the server and another identical request changes the state of the server in a different way, it's not idempotent.
If a request doesn't change the state of the server at all it is idempotent, even if subsequent requests might get different responses (e.g. the stock quote example in my previous post).
If a request changes the state of the server but repeated identical requests don't have any different effect it is also idempotent. For example, DELETE is idempotent because DELETE-ing something N times is the same as deleting it one time.
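A toy illustration of that DELETE property (hypothetical in-memory store): the end state after N deletes is the same as after one, even though the responses may differ.

```python
store = {"/items/1": "payload"}  # hypothetical resource store

def delete(path: str) -> int:
    """Idempotent DELETE: state after N deletes equals state after one.
    (Status codes may differ - 204 then 404 - but the effect is the same.)"""
    existed = store.pop(path, None) is not None
    return 204 if existed else 404

codes = [delete("/items/1") for _ in range(3)]
assert store == {}               # same end state as a single DELETE
assert codes == [204, 404, 404]  # responses differ; effects don't
```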
I think this is pointing to the problem with your definition of 'idempotent'. Idempotency simply means that any number of additional identical requests will have the same effect on the state of the resource, not that they will have no effect. (And by 'have the same effect', we mean 'produce the same state', not 'alter state in the same way' - effects are algebraic projections.)
That's why it's called idempotent - 'doing the same' - rather than impotent.
As I read it, I think the idea there is to allow a pattern where the resulting resource refers to other resources that somehow encode the contents of the QUERY request body in their URL (or even a redirect to such a resource). For example, the result of QUERY is a page with an HTML table of the data, which also includes a server-side-rendered chart of the same data as an external image.
[Edit: the return of a redirect to a URL that somehow encodes the query is even given as an example in section 4.2]
> idempotent with regards to the resource identified by the request URI
That means that a QUERY request can change the state of the server, for example by creating new resources; there's exactly one resource it's not allowed to change.
That has always been the case ... requests get logged, and if the server exposes its access logs over HTTP, that's one thing for which a request won't be idempotent
Idempotent etc in the HTTP specs has always been more or less an attempt at a promise to the client "you should be able to repeat this request if you're not sure about success/failure without anyone claiming to implement HTTP being able to throw the book at you".
A resource is defined by a path, so if you have a `QUERY /documents` or `QUERY /albums` endpoint, the resource is all documents or albums that you are searching across, so it cannot add one of those items (like `POST /album`). It is possible that this could affect some other resource (e.g. an audit trail), which would mean that a `QUERY /logs/audit` endpoint must not add an audit log entry per the idempotent requirement.
Hum... You are complaining about a request having the side effect that a server may fork another process to answer it? There's not really much anybody can do about that.
It worked fine for him because he used curl, which allows GET with a body. But I was using Paw (similar to Postman), which refused to send it. I mentioned the issue to him, to which the reply was along the lines of "it's a non-issue, just use curl". I kid you not, 1 week after this coworker left for another job I fixed the service to accept POST requests.
If QUERY was around I'm sure I could've made a stronger case to fix it sooner.
Elasticsearch also encourages GET with body. But a request payload is undefined, according to the RFC:
A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.
I'm very happy about this proposal. The only sad thing is that it has come so late, after so many tools and protocols (e.g. GraphQL) already abuse POSTs for this use case.
I agree. It takes me back to arguments I had with my PM when I worked for a small SaaS close to 10 years ago. I had to use POST for a query API because of the limitations around GET & URL encoding of the parameters for the exact reasons outlined in TFA. She insisted it be a GET until I showed real, existing client queries that couldn't be handled. Only then did she relent. Same PM also insisted I send results of queries as a list of objects in JSON, instead of a more compact tabular format, because tables aren't REST-y. I lost that battle, and the serialized results of queries were an order of magnitude larger than they needed to be...
> She insisted it be a GET until I showed real, existing client queries that couldn't be handled... Same PM also insisted I send results of queries as a list of objects in JSON, instead of a more compact tabular format, because tables aren't REST-y.
I think I'm on the side of the PM with this one on both counts. You sound like someone who really cares about efficiency, performance, and edge cases -- a proper engineer. But PMs are supposed to bring us down to earth and say that simplicity and maintainability are more important than saving bytes and to not waste time fixing things that aren't broken.
In a past life I spent so much effort optimizing our stack to lower our AWS bill until a PM sat me down with the company's finances and showed me the teeeeny little bar that was our cloud expenses and then legitimately 20x taller bar that was salaries and basically said that spending money to buy back my or my team's time was more important.
This is cool and all, but why not just expand the scope of GET requests in newer HTTP standards? Maybe have an X-GET-QUERY header to indicate the type of GET request? The problem I see with a new method is that it isn't just webservers that need to support it, it is also webapps. Ideally this would be transparent to the webapp (which would just see really big arrays of GET params). The user-agents (browsers) would ideally support this transparently, whereas with a new method the JS/HTML would need to explicitly support it.
Creators of new parameters to be used in the context of application
protocols:
1. SHOULD assume that all parameters they create might become
standardized, public, commonly deployed, or usable across
multiple implementations.
2. SHOULD employ meaningful parameter names that they have reason to
believe are currently unused.
3. SHOULD NOT prefix their parameter names with "X-" or similar
constructs.
Note: If the relevant parameter name space has conventions about
associating parameter names with those who create them, a parameter
name could incorporate the organization's name or primary domain name
(see Appendix B for examples).
One of the issues that QUERY solves is that POST is overloaded and is being used for purposes beyond its intended responsibility. Shifting that overloading to GET feels to me like just another hacky approach. I prefer the well-defined, single responsibility that QUERY brings and restores to POST.
Once browsers offer support, the web app+ server would need to support it. Both of those are in control of the devs. The real problem lies in getting infrastructure teams to update the expensive F5 load balancers, and PA/CP/TP firewalls to process the requests. Those aren’t in control of the devs (unless they’re operating together well as a team)
With an official read-only header on POST instead, all middleware and proxies would have automatic support, and this could be adopted in months instead of years or decades...
Because unknown headers are already passed through safely in existing implementations, whereas unknown methods are handled in a variety of different ways
> The QUERY method provides a solution that spans the gap between the use of GET and POST. As with POST, the input to the query operation is passed along within the payload of the request rather than as part of the request URI. Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.
Is this really worth a change to every HTTP client library out there to support this? The limited applications that really need this can easily use POST and document their own semantics around this.
If anything the trend with GraphQL is to ignore HTTP verbs outright because they are limited and inexpressive beyond simple CRUD tasks.
Definitely. The HTTP spec has a gap that's being filled with a hack, albeit a widely accepted and implemented one. QUERY removes ambiguity, aids self-documentation of APIs, and improves caching.
It also seems trivial to fallback to POST for backwards compatibility, no? I'm not sure it needs every lib to be updated before devs can gain value from this.
A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.
Right. The HTTP RFCs have been backing off gently from the initial position that implementations should not send bodies with GETs and that the semantics of the GET request were defined purely in the request URI.
But presumably no-one is brave/foolhardy enough actually to redefine GET as having a semantic body because a bazillion different implementations (clients, servers and middle boxes) probably become non-compliant.
> redefine GET as having a semantic body because a bazillion different implementations (clients, servers and middle boxes) probably become non-compliant.

So what, actually? Apps that didn't use a GET body will not care anyway, and apps that will use an HTTP GET body will be tested anyway. So, unless somebody downgrades the HTTP server, what could be the problem?
Aside from how much easier it is to identify whether a component supports QUERY than which forms of GET it supports: GET and QUERY (like PUT and DELETE) have similar guarantees but different meanings, and are sometimes (but not always) useful against the same resource for different purposes. OPTIONS lets you tell the availability of each if they are different methods, but not whether one is GET without a body and the other is GET with a body.
> Implementations are free to use any format they wish on both the request and response.
The samples should include some non-SQL, completely made up ones, as I think a lot of people are going to fixate on the SQL-like syntax and its associated problems.
Doesn’t prescribe one, no, but when it is over HTTP, it’d be perfectly reasonable to have it accept QUERY for non-mutating requests, like it can currently use GET or POST.
There's no way to do client-side caching with this, which seems like a fatal omission — in any given situation where you would consider using QUERY, it'll almost always be more efficient to put the query in the parameters of a GET request.
I am unable to understand - what is the difference between GET and QUERY? Just that in QUERY you can send parameters in the request body? Do we need a new method for that?
Yes, because there are various assumptions about GET that won't fit if GET can suddenly contain a request body. For example, existing caching servers may continue to cache content based only on the URL and headers, ignoring the request body entirely, producing bad results. Additionally, there may be some more subtle problems related to the Content-Length header, which is supposed to NEVER be sent for GET, but would be required for QUERY (a request with a body needs either a Content-Length header or chunked Transfer-Encoding, while requests that can't contain a body shouldn't send either).
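To make the wire-format point concrete, here's roughly what a QUERY request might look like on the wire (a sketch; the Content-Type and query syntax are made up — per the draft, implementations can use any format they wish):

```python
def build_query_request(host: str, path: str, body: bytes) -> bytes:
    """Assemble a raw HTTP/1.1 QUERY request. Unlike GET, the body is
    semantically significant, so a Content-Length (or chunked
    Transfer-Encoding) is needed to frame it."""
    head = (
        f"QUERY {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Type: application/sql\r\n"   # hypothetical media type
        f"Content-Length: {len(body)}\r\n"
        f"\r\n"
    )
    return head.encode("ascii") + body

req = build_query_request("example.org", "/contacts", b"select a from foo")
assert req.startswith(b"QUERY /contacts HTTP/1.1\r\n")
assert b"Content-Length: 17\r\n" in req
```

A GET request would carry the same information in the request-target's query string instead, with no body and no Content-Length at all.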
Because then you don't know if something that supports GET supports the new broader definition or the old definition, whereas whether it does or does not support QUERY is more clear.
Also because a different method means OPTIONS tells you information about what is supported, while overloading GET would not.
And because “same guarantees” doesn't mean “means the same thing”; PUT and DELETE have the same guarantees (idempotent but not safe), but we don't use PUT with no body for DELETE.
I'm fairly certain that (technically) nothing's stopping you from doing so. However, there are so many libraries/clients/etc... that do not allow it that it would be almost impossible to patch them all. Adding a new method and having libraries add it and support it properly would be better.
The HTTP Query method is problematic: every request to a web server is by definition a query, so it is at a minimum poorly named. Second, most queries are not idempotent and the return value can and will change. In other words, YAGNI.
I don’t agree that "every request to a web server is by definition a query", similar to how not every SQL statement is a query. In terms of command-query separation, commands may return information about the execution of the command; the fact that they return something doesn’t make them a query. For example, an SQL UPDATE statement may return how many rows were updated, or that some error occurred; that doesn’t make it a query.
I didn't define request, I gave examples of possible kinds of requests. There are requests for information, and there are requests for action (and there may potentially be still other kinds of requests — it's an open enumeration). My point is that queries are requests for information, and not requests for action. Therefore "query" is a proper subset of "request", and thus not every request is a query.
HTTP is not just for the web. In fact, the vast majority of HTTP traffic doesn't involve the browser at all.
The examples are realistic and useful. E.g., Clickhouse uses POST methods for queries, and a ridiculous `&readonly=2` parameter to differentiate modifying queries from readonly SELECT queries.
> QUERY requests are both safe and idempotent with regards to the resource identified by the request URI.
Is that really what you want from a query operation? I read 'idempotent' as implying that result sets don't change over time, which would be surprising behavior for queries for most database-like things.
It's probably also worth mentioning that SQL's SELECT isn't idempotent in the way HTTP means it, because of the existence of session state, pessimistic locking, and the requirements of higher isolation levels. It would be useful for an RFC to define 'idempotent' in a way that clearly addressed these issues (and, for that matter, the larger topic of sessions/transactions) more clearly.
> When doing so, caches SHOULD first normalize request content to remove semantically insignificant differences, thereby improving cache efficiency
Unfortunately, again when you look at SQL by comparison, queries are not purely expressions of what to return. Practically, they also encode how to compute the query (either explicitly through hints, or implicitly through things like join order). These behaviors are weird, tricky, and change version-to-version.
> The QUERY method is subject to the same general security considerations as all HTTP methods as described in
As another commenter said, this is quite incomplete. Query parameter injection, DoS by locking, DoS by exploiting work the database needs to do to ensure isolation, DoS by extremely expensive query, etc.
> 4.2. Simple QUERY with indirect response (303 See Other)
At least the examples here are naive - most applications don't want query result sets to be easily accessible to others. The semantics of authn and authz need to be really crisp here to make sure that attackers can't access the location of other queries' result sets purely by guessing.
At least a "SHOULD use auth" or "SHOULD have large, unguessable, names" would be valuable here.
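E.g., a server could mint unguessable result locations along these lines (a sketch using Python's secrets module; the path scheme is hypothetical, and you'd still want authn/authz on top rather than relying on the name alone):

```python
import secrets

def result_location() -> str:
    """Mint an unguessable URL for a query result resource, so locations
    can't be enumerated; 32 random bytes ~ 256 bits of entropy."""
    return f"/results/{secrets.token_urlsafe(32)}"

a, b = result_location(), result_location()
assert a != b                          # fresh name each time
assert a.startswith("/results/")
```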
But if you GET /posts/123 and then do it again, and in between, the author updated the post, you’d expect to get the latest version of the post, no? That doesn’t make it non-idempotent, because your GET requests did not change the state at all.
I'm pretty sure timwis and zinekeller in this thread (and detaro in the sibling thread) are all saying the same thing. Idempotency implies the request in question does not change the state, not that the state would not have changed because of other operations in the meantime. GET and QUERY are meant to be idempotent but whether they really are in practice depends on how they've been implemented.
> But if you GET /posts/123 and then do it again, and in between, the author updated the post, you’d expect to get the latest version of the post, no?
Not necessarily - plenty of systems offer no such guarantees, and the Web is by design eventually consistent [0]. This is what content expiration and various other cache control mechanisms are for - it's not always so important to get the latest version of a document. For example, the HN logo or index.html can probably be safely cached for days, since they are very unlikely to change, and even if they do, it's unlikely to have a major problem if someone only sees the new version after a few days.
[0] Note that, at the extreme, due to special relativity, there is no absolute notion of "latest version" on the scale of geographically distributed computers: it's physically impossible to say if a request made in China to a server in the USA happened before or after a change on the server, if they happened close enough together - order of tens of milliseconds, an eternity in compute time.
Idempotency only refers to state changes from subsequent invocations of the call.
A command to add a user to the set of users that have upvoted a post would be idempotent. Because you can run it 20x and only the first call affects anything. A command to increase the upvote count for a comment by +1 would not be idempotent.
But idempotency is relevant because of two things:
1) is it safe to automatically retry the request? - this meshes well with what you're saying
2) is it safe to return a cached version of the response, instead of sending the request again? Idempotence in your sense is necessary but not sufficient for this case - hence the various content expiration and If-Modified-Since etc. headers.
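The set-vs-counter distinction from the upvote example can be shown in a few lines (a hypothetical sketch):

```python
upvoters = set()  # idempotent design: record WHO has upvoted
count = 0         # non-idempotent design: just increment a counter

def upvote_set(user: str) -> None:
    upvoters.add(user)  # repeating has no further effect

def upvote_count() -> None:
    global count
    count += 1          # every repetition changes state again

# Simulate the same request being retried 20 times:
for _ in range(20):
    upvote_set("alice")
    upvote_count()

assert len(upvoters) == 1  # 20 calls, same end state as 1 call
assert count == 20         # 20 calls, 20 distinct state changes
```

Only the set-based version is safe to retry automatically after a communication failure, which is exactly why the spec cares about the distinction.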
> I read 'idempotent' as implying that result sets don't change over time, which would be surprising behavior for queries for most database-like things.
That's not what idempotent means in HTTP.
> As another commenter said, this is quite incomplete. Query parameter injection, DoS by locking, DoS by exploiting work the database needs to do to ensure isolation, DoS by extremely expensive query, etc.
Is application-dependent and applies to all other HTTP methods too.
> A sequence is idempotent if a single execution of the entire sequence always yields a result that is not changed by reexecution of all, or part, of that sequence.
Which isn't, because of isolation, true in general of database queries. Obviously this is in context of RFC2616 saying that sequences of idempotent HTTP operations may not be idempotent in themselves, but that definition seems very incomplete in the context of database queries.
> Is application-dependent and applies to all other HTTP methods too.
Sure. But I don't think that's a good argument in the modern world. Over the last 22 years, we've learned a lot about the security concerns of running secure systems, and it seems reasonable to include those concerns in a section labelled "security considerations". SQL injection is a classic security bug, and should be a key concern of any reasonable new standard for sending queries between systems.
A full security section should probably also mention cache timing side-channels, locking-related covert channels, and other similar concerns that come up when you increase the semantic power of HTTP. It's not that POST doesn't have these concerns, it's that we've learned in the last two decades that they are real problems for many kinds of real systems.
HTTP idempotence is only concerned with effects of the request, not the result returned.
RFC7231:
> A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
note the on the server.
(or old specs, 2616: Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. - again, side-effects, not responses)
How does a read-only database query being repeated cause a change in the database?
You're right, and I was fuzzy about what I meant. I didn't mean (although I wasn't clear) that QUERY would have to return the whole result set over time - clearly that's beyond the scope of HTTP's definition of idempotency.
However, because of the existence of isolation and locking concerns in databases, even fairly simple queries are not idempotent. RFC2616 goes to some effort to (fuzzily, unfortunately) talk about sequences of operations, which would be useful here.
It seems as if the idea behind this proposal is to help out database folks. If so, that is misguided. POST is a better implementation than QUERY (or GET) at least for SQL databases. Here's why.
In SQL this is a query:
SELECT a, b, c FROM foo LIMIT 1
But this is also a "query" in many if not most connectivity APIs.
INSERT INTO foo VALUES (1, 2, 3)
Most client libraries don't know and don't care about the content of the query. It's the database's job to parse it and do the right thing. The difference between the above queries is that the first one returns a result set and the second returns an update count. Here's a simple example using Python and the clickhouse-driver library.
# An UPDATE to the database
client.execute('INSERT INTO iris SELECT * FROM another_iris_table')
# A harmless "query"
result = client.execute('SELECT COUNT(*) FROM iris')
print(result)
For this to work you need to use something underneath that is generic and works regardless of output. POST does this already. The clickhouse-driver does not use the HTTP protocol, though other ClickHouse drivers do. I'm just using it as an example of why you need a protocol that can handle any type of SQL "query" the same way on the wire. Otherwise the client will have to have a SQL parser to figure out which one to use. (Some clients actually do that but they are a very small minority.)
IMO this helps a lot more people than just database folks. Any web application which implements a fairly granular search/filtering mechanism for its resources may run into the URL character limit with GETs. QUERY sounds much more appropriate here than what most applications do today (i.e. abuse POSTs).
In that case it's not helpful to tie it to SQL. As my examples demonstrated, it's pretty useless for SQL database connectivity. If you are looking for a general query mechanism it would make more sense to have something that looks like GET with a body.
ClickHouse also supports GET as a verb. In addition to URL length issues the query needs to be URL-encoded which makes it difficult to read and debug.
p.s., It's interesting to see my post downvoted. It's more productive to show why it's wrong. I've worked on DBMS connectivity for over 30 years.
> The non-normative examples in this section make use of a simple, hypothetical plain-text based query syntax based on SQL with results returned as comma-separated values. This is done for illustration purposes only. Implementations are free to use any format they wish on both the request and response.
The examples in section 4 are just that, examples. They are not intended to be the only format a query may take. The issue with POST that QUERY solves is lack of an idempotency constraint.