Anyone tested S3's static page hosting under heavy load? I would think you could just update the static file as a result of some events fired by your internal monitoring process.
We use S3 behind CloudFront with a 1-second max-age to serve The Verge liveblog. It's been nothing but rock solid. We essentially create a static site and push up JSON blobs. See here:
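The publish step is essentially just a PUT to S3 with a short Cache-Control header; a minimal sketch with boto3 (the bucket name and payload are invented):

    import json
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="liveblog-example",         # hypothetical bucket name
        Key="live.json",
        Body=json.dumps({"entries": []}),  # whatever the liveblog tooling emits
        ContentType="application/json",
        CacheControl="max-age=1",          # the short TTL CloudFront can respect
    )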
This is really interesting -- thanks for sharing. It seems to me that you could probably have nginx running on a regular box and then CloudFront as a caching CDN to avoid the S3 update delay.
Probably could figure that out, yeah. But we didn't want to take any chances given how important it was to get our live blog situation under control.
[edit]
Which is to say, we wanted a rock solid network and to essentially be a drop in a bucket of traffic, even at the insane burst that The Verge live blog gets.
Could you say more about using both Cache-Control and a query string of epoch time? In particular the query string has me puzzled. On its face it seems to decrease your cache hit ratio, with little or no benefit.
I'm assuming the epoch time is the client's local time. The clock skew across the client population increases the number of cache keys active at any one time.
The incrementing query string also forces a new cache key once per second. Those force a cache miss and a complete request to S3 even when the content hasn't changed. It's even worse with the skew: you now force a cache miss per second for each unique epoch value across your clients' clocks.
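To make that concrete, here's roughly what each client ends up requesting (Python sketch; the hostname and parameter name are made up):

    import time

    # Each client stamps the URL with its *own* clock, so two clients
    # whose clocks differ by even one second hit two different cache keys.
    url = "https://cdn.example.net/live.json?t=%d" % int(time.time())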
Without the query string, the cache could issue a conditional GET for live.json. That would save latency and bytes, since the origin could respond with a 304 instead of a complete 200.
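For comparison, the conditional-GET version would look something like this (Python with requests; the URL is hypothetical):

    import requests

    resp = requests.get("https://cdn.example.net/live.json")
    etag = resp.headers.get("ETag")

    # On the next poll, revalidate instead of busting the cache.
    resp = requests.get(
        "https://cdn.example.net/live.json",
        headers={"If-None-Match": etag} if etag else {},
    )
    if resp.status_code == 304:
        print("not modified; reuse the cached payload")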
Great point. I can't speak for the guys who made the decision to append the timestamp to the query string, but I assume the concern was intermediate network caches that don't honor low TTLs. I don't know how well founded that is, but we never have to deal with the issue if we take control of it with the query string.
It'd be interesting to see how wide the key space gets due to clock skew. I suppose we could pick a fixed start point and treat it as a global counter incremented every second; when someone comes in for the first time they get synced to that counter, which is then used to ensure a fresh CloudFront hit.
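Something like this, maybe (Python sketch; the fixed start time is arbitrary and the parameter name is a guess):

    import time

    START = 1388534400  # arbitrary agreed-upon start: 2014-01-01 00:00 UTC

    def global_counter():
        # Computed server-side and handed to clients on first load, so the
        # whole population shares one cache key per second instead of one
        # per unique client clock.
        return int(time.time()) - START

    url = "https://cdn.example.net/live.json?c=%d" % global_counter()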
I think at the end of the day these issues haven't been a huge concern for a one-month emergency project, but they are good points.
S3 is great for static content. I was taking the AWS ops course and the instructor mentioned that some very large organizations redirect their site to S3 when under DDoS so they can stay online. In fact, he said AWS recommended this solution to them?! Can you fathom being under DDoS and being told, hey, just redirect that our way ;)