Building Lob's API

7Figures2Commas · on Sept 18, 2013

> Pretty print by default is one of those small things but makes your API that much easier to use. It makes it easy for customers who are debugging to easily read data and pleasant to use.

A pretty print option should be available, but the extra whitespace adds to the content size, which is obviously a bad thing when you have any real volume. As the primary consumer of a production API is not a person, pretty print should not be the default.
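A minimal sketch of "compact by default, pretty on request" in Python. The `pretty_print` parameter name borrows the "?pretty_print=1" idea that comes up later in this thread. `render_json` is a hypothetical helper, not anything from Lob's API. Note that `json.dumps` adds a space after `,` and `:` by default, so a truly compact response needs explicit `separators`.

```python
import json
from urllib.parse import parse_qs, urlparse


def render_json(payload, query_string=""):
    """Serialize payload compactly unless the client opts in to pretty output.

    `pretty_print` is a hypothetical query parameter; "1", "true" or "yes"
    enable it, anything else (or its absence) keeps the compact default.
    """
    params = parse_qs(query_string)
    flag = params.get("pretty_print", ["0"])[0].lower()
    if flag in ("1", "true", "yes"):
        # Human-friendly: two-space indentation, one key per line.
        return json.dumps(payload, indent=2)
    # Machine-friendly default: no whitespace after ',' or ':'.
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    payload = {"id": 1, "name": "Ada", "pages": 2}
    print(render_json(payload))
    print(render_json(payload, urlparse("/letters?pretty_print=1").query))
```

A developer poking at the API in a browser adds `?pretty_print=1`; production clients that never send it get the smaller body.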

> Do not return total counts or loop through entire databases to get the large count.

The data that is available to consumers of an API should be based on actual use cases, and there is often a reasonable need for a consumer to have a total count. If there are performance implications associated with generating this, which won't always be the case, it's better to address those than leave the consumers of the API without the data they need.

ollyculverhouse · on Sept 18, 2013

Regarding your total count comment, that's what I thought and I completely agree. More often than not the extra performance hit is worth it for a better 'consumer experience'.

gfodor · on Sept 18, 2013

gzip compression kind of minimizes this issue.

7Figures2Commas · on Sept 18, 2013

There's still no reason to add additional whitespace to a response that isn't going to be consumed by a human.

Also, it's worth pointing out that not all of your API consumers will necessarily support compression. The best practice is to use an Accept-Encoding header to allow the consumer to explicitly request a compressed response. This ensures that you can still serve uncompressed responses where necessary.
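A rough sketch of that negotiation on the server side, assuming a Python handler that already has the serialized body as bytes. The client sends `Accept-Encoding`; the server only compresses when gzip is acceptable and labels the result with `Content-Encoding`. The header parsing here is deliberately simplified (it handles `q=` weights and `*` but not every corner of the HTTP spec), and `encode_response` is a hypothetical helper.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value allows gzip.

    Simplified parser: comma-separated codings with optional ';q=' weights;
    q=0 means "not acceptable", and '*' covers codings not listed.
    """
    if not accept_encoding:
        return False
    weights = {}
    for part in accept_encoding.split(","):
        coding, _, params = part.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        weights[coding.strip().lower()] = q
    if "gzip" in weights:
        return weights["gzip"] > 0
    return weights.get("*", 0) > 0


def encode_response(body, accept_encoding):
    """Compress body only if the client asked for it; return (bytes, headers)."""
    # Vary tells caches that the body depends on Accept-Encoding.
    headers = {"Content-Type": "application/json", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    # No (acceptable) Accept-Encoding: serve the uncompressed bytes.
    return body, headers
```

Clients that don't send the header, or send `gzip;q=0`, keep getting plain JSON.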

gfodor · on Sept 18, 2013

>There's still no reason to add additional whitespace to a response that isn't going to be consumed by a human.

The point is that if you don't know which requests are going to be consumed by a program and which by the eyeballs of developers testing your API, it's a legitimate design tradeoff to consider printing your responses in a human-readable way. (And yes, "?pretty_print=1" is another option, but of course this has its own set of obvious tradeoffs.)

You seem to think that a <1% hit on (compressed) response size outweighs the benefits of such a design choice. But the problem is you're expressing this as a fact rather than as an opinion. There are plenty of scenarios (I'd argue the majority) where a tiny percentage increase in the bytes you transfer is worth it if it lets developers easily debug your API. This is particularly true when you are an API-providing startup with a few people and minimizing support-side touch points is crucial to scaling.

7Figures2Commas · on Sept 18, 2013

What I wrote is based on a) best practice, b) the understanding that the vast majority of APIs are, in production, consumed by applications, not humans, and c) the knowledge that, in high-volume production environments, small inefficiencies can add up to larger inefficiencies.

Finally, I don't know where you came up with your 1% figure but I just ran a test on an API I use and the uncompressed pretty printed response was 24% larger than the uncompressed non-pretty printed response because of all the whitespace. I'd challenge you to find a scenario under which a pretty printed response of reasonable size is only 1% larger than its non-pretty printed counterpart.

callahad · on Sept 19, 2013

To add more data to this discussion, I've been playing with the GitHub Issues API recently, which pretty-prints by default. I grabbed the last hundred open issues from the Mozilla Persona repo and found:

        == Uncompressed ==
  Pretty Printed: 312,556 bytes
        Minified: 272,604 bytes
                  -------------
           Delta:  39,952 bytes (+14.66%)

Not as bad as a 24% hit, but that's all moot because gfodor very explicitly discussed using gzip when pretty-printing. Let's see what happens there:

           == Gzipped ==
  Pretty Printed:  41,748 bytes
        Minified:  40,648 bytes
                  -------------
           Delta:    1,100 bytes (+2.71%)

Not as good as the original claim of <1%, but still pretty darn negligible.

Data derived from this API endpoint: https://api.github.com/repos/mozilla/browserid/issues?per_pa...
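For anyone wanting to repeat this kind of comparison on their own API, here is a small Python sketch. It re-serializes a payload both ways and measures raw and gzipped sizes; the Delta percentage is computed relative to the minified size, which is how the tables above express it. The sample payload is synthetic, so its numbers will not match the GitHub figures; load a real saved response instead to get meaningful results.

```python
import gzip
import json


def size_report(payload):
    """Return byte sizes of pretty vs. minified JSON, raw and gzipped."""
    pretty = json.dumps(payload, indent=2).encode("utf-8")
    minified = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return {
        "pretty": len(pretty),
        "minified": len(minified),
        "pretty_gz": len(gzip.compress(pretty)),
        "minified_gz": len(gzip.compress(minified)),
    }


def delta_pct(bigger, smaller):
    """Percentage by which `bigger` exceeds `smaller` (the tables' Delta row)."""
    return (bigger - smaller) / smaller * 100


if __name__ == "__main__":
    # Synthetic stand-in for a list of issues; swap in a real response,
    # e.g. json.load() of a file saved from the endpoint above.
    issues = [
        {"number": n, "title": f"Issue {n}", "state": "open",
         "labels": [{"name": "bug"}], "user": {"login": f"user{n}"}}
        for n in range(100)
    ]
    r = size_report(issues)
    print("raw delta:  %.2f%%" % delta_pct(r["pretty"], r["minified"]))
    print("gzip delta: %.2f%%" % delta_pct(r["pretty_gz"], r["minified_gz"]))
```

Plugging the GitHub numbers into `delta_pct` reproduces the +14.66% and +2.71% rows.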

gfodor · on Sept 19, 2013

Thanks. My <1% claim was ill-founded: I was assuming the use of tabs and trailing braces to minimize single-character lines, so a little arithmetic in my head pointed to a small relative cost. But out-of-the-box pretty printing does not try to minimize whitespace like this. Seems like it could be a worthy hack.
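The tabs half of that hack is easy to try in Python, since `json.dumps` accepts a string for `indent`. This sketch only compares indentation styles; Python's `json` module still puts each closing brace on its own line, so the trailing-braces part would need a custom serializer, which is not shown here.

```python
import json


def indent_cost(payload):
    """Uncompressed sizes of 4-space-indented, tab-indented, and compact JSON."""
    spaces = json.dumps(payload, indent=4)
    # A string indent is used verbatim for each nesting level.
    tabs = json.dumps(payload, indent="\t")
    compact = json.dumps(payload, separators=(",", ":"))
    return len(spaces), len(tabs), len(compact)
```

Tabs save three characters per nesting level on every line, so the gap grows with how deeply nested the response is.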

eterm · on Sept 18, 2013

"Do not return total counts or loop through entire databases to get the large count"

How do you achieve this on a practical level? I'm pretty new to this kind of thing, and I've been coming up against this in my code; getting the totals has been a headache.

ollyculverhouse · on Sept 18, 2013

I wish they had elaborated on why not to return total counts. Does anyone have any idea why this is a bad idea? I thought it would be useful for the consumer so that they can account for pagination?

bavidar · on Sept 18, 2013

Founder here. Returning the total count is a very expensive operation, and most users don't use the result. By returning paginated results instead, users who do need that data can loop through all the pages and get the total count. It may add an extra step, but for 98% of users the API will run faster.
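What "loop through all the paginated pages" looks like from the client side, as a Python sketch. `fetch_page(limit, offset)` is a hypothetical stand-in for whatever client call returns one page of at most `limit` records. The loop stops at the first short (or empty) page, so counting N records costs about N / limit requests, plus one extra empty request when N is an exact multiple of the page size.

```python
def count_by_paging(fetch_page, limit=100):
    """Count all records by walking limit/offset pages.

    `fetch_page(limit, offset)` is a hypothetical client call returning a
    list of at most `limit` records. Returns (total, requests_made).
    """
    total = 0
    offset = 0
    requests = 0
    while True:
        page = fetch_page(limit, offset)
        requests += 1
        total += len(page)
        if len(page) < limit:
            # A short (or empty) page means there is nothing after it.
            return total, requests
        offset += limit
```

Against 139 records this takes 14 requests at a page size of 10 and 2 at a page size of 100, which is the cost the replies below push back on.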

GrinningFool · on Sept 18, 2013

It may make sense to expose a separate endpoint to retrieve the count. That way the 98% of users who don't need it can continue to ignore it, but you won't force the extra overhead of requiring the other 2% to page through all results to get the count.

While count is expensive, it seems it would be less so than full result set retrieval broken up over several requests?

You'll get some additional usage beyond the 2% who need it now (after all, if it's there people will use it while in its absence it would not be considered at all), but on the whole you should be able to find the threshold at which you'd come out ahead by offering it.
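A sketch of that split, using Python's built-in `sqlite3` as a stand-in database. The `letters` table and both handler names are hypothetical. The list handler never counts; the count handler runs a single `COUNT(*)` only when a client calls it. `COUNT(*)` can still be expensive on large tables in many databases; the point is just that the cost is paid on demand, in one query rather than one page request per 10 to 100 rows.

```python
import sqlite3


def list_letters(conn, limit=10, offset=0):
    """List endpoint: returns one page of ids, with no total count."""
    rows = conn.execute(
        "SELECT id FROM letters ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    ).fetchall()
    return {"data": [r[0] for r in rows]}


def count_letters(conn):
    """Separate count endpoint: pays for COUNT(*) only when a client asks."""
    (n,) = conn.execute("SELECT COUNT(*) FROM letters").fetchone()
    return {"count": n}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO letters (id) VALUES (?)",
                     [(i,) for i in range(1, 140)])
    print(list_letters(conn, limit=10, offset=130))
    print(count_letters(conn))
```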

7Figures2Commas · on Sept 18, 2013

Assume that I am building an interface that displays data I have retrieved from your API.

Are you saying that your API would require my interface to mimic your pagination, as opposed to being able to retrieve data in such a way that I could paginate as I saw fit?

bavidar · on Sept 18, 2013

You can paginate however you want. The default is to return 10. You can return anywhere from 1 to 100 results and use the offset parameter to get the next X results.
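The rules described here (default 10, range 1 to 100, an offset to move forward) can be sketched as a small Python validator. The `limit` and `offset` key names are assumptions, as is the choice to reject out-of-range values with an error; the comment doesn't say whether Lob rejects or clamps them.

```python
DEFAULT_LIMIT = 10
MAX_LIMIT = 100


def parse_paging(params):
    """Validate hypothetical limit/offset query params.

    Default limit is 10, allowed range 1-100, offset must be >= 0.
    Raises ValueError (including for non-integer strings) otherwise.
    """
    limit = int(params.get("limit", DEFAULT_LIMIT))
    offset = int(params.get("offset", 0))
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return limit, offset
```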

7Figures2Commas · on Sept 18, 2013

Maybe I'm missing something, but how can I display pagination of my own choosing if I can't calculate the total number of pages that exist?

More importantly, let's say that I am retrieving the default 10 results per request. Let's say there are 15 results. If I request the second "page", what will the next_url value in the response be?

In other words, how can you provide an accurate next_url in your responses if you're not calculating the total number of results? At some point, aren't you providing a next_url that will return 0 results?
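One common way to answer this without a total count (not necessarily what Lob does) is to ask the data layer for one row more than the page size. If the extra row comes back, there is a next page; if not, `next_url` can be omitted. In this Python sketch, `fetch_rows(limit, offset)` and the URL shape are hypothetical.

```python
from urllib.parse import urlencode


def paginate(fetch_rows, base_url, limit=10, offset=0):
    """Build one page and its next_url without computing a total count.

    Requests limit + 1 rows: the extra row only signals that more data
    exists and is not returned to the client.
    """
    rows = fetch_rows(limit + 1, offset)
    next_url = None
    if len(rows) > limit:
        query = urlencode({"limit": limit, "offset": offset + limit})
        next_url = f"{base_url}?{query}"
    return {"data": rows[:limit], "next_url": next_url}
```

With the 15-result example, the first page gets a `next_url`, and the second page returns the remaining 5 results with `next_url` set to `None`, so no link ever points at an empty page.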

thejosh · on Sept 19, 2013

So rather than seeing "139 results found", I would have to make over a dozen HTTP requests to get the full result count? How is that less expensive?
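The "over a dozen" figure follows from the page sizes the founder gives (default 10, maximum 100). For N results and a page size of limit, a client that can only walk pages needs about:

```latex
\text{requests} = \left\lceil \frac{N}{\text{limit}} \right\rceil,
\qquad \left\lceil \frac{139}{10} \right\rceil = 14,
\qquad \left\lceil \frac{139}{100} \right\rceil = 2
```

When N is an exact multiple of the page size, a client that stops at the first short page needs one extra, empty request on top of this.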

datboitom · on Sept 18, 2013

I just pinged one of the founders and told him to reply, so hopefully he'll come and answer your question!

crymer11 · on Sept 18, 2013

Where's the HATEOAS?

rkv · on Sept 18, 2013

Bob Loblaw Lob Blog