Hacker News new | past | comments | ask | show | jobs | submit login

you can't operate at any scale at all without mechanisms in place to know perfectly well whether an issue is impacting a single customer or if your world is on fire



You'd like to think so, but surprisingly large number of "large scale" things operate on the "everything is fine" until too many people complain about the fire.


Caches make problems fun too.

Quite often you see automated tests that check how well your cache/in memory data are working. But when some other customer that isn't in the hot path tries to access their request times out. I've seen a lot of people making automated checking systems fail at things like this.


The phrase “the hardest parts of computer science is caching and naming things” come to mind.


I see 2 things here but you're off by one.


Yes, but those mechanisms take time to determine this.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: