Do large websites like Twitter and Facebook log every single POST / GET request?
2 points by jrMunicipalDev 1 day ago | 4 comments
I’m relatively inexperienced in handling large-scale applications, but I’ve recently been tasked with scaling a web application for a large city. One major challenge I’m facing is logging HTTP requests efficiently.

Currently, we use Kibana and keep logs for 4 weeks, and we’re handling around 1 million logs per day. As more services are added and traffic increases, I’m concerned about storage costs, performance impact, and best practices for managing logs at scale. Most of the devs working here are also fresh out of school with limited experience.

Given that platforms like Twitter, Facebook, and Reddit handle millions to billions of requests per day, do they log every single GET and POST request, or do they use sampling, aggregation, or filtering?

What are the trade-offs between complete logging vs. selective logging, especially in terms of performance, storage, compliance, and debugging?

If anyone has experience working with large-scale logging systems, I’d love to hear how different companies approach this problem. Any guidance would be greatly appreciated!






Store them in something like Google BigQuery, partitioned by day.

Storage costs almost nothing (it’s backed by Google Cloud Storage behind the scenes), while each query you run against the data is comparatively expensive.

You don’t need interactive exploration; you can run queries on a case-by-case basis. Worst case you pay to scan one day of logs (or less; with reserved capacity you pay a fixed price).

0 devops needed.

If you want a self-hosted big database that is easy and scalable for that usage, check ClickHouse.
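
Rough sketch of what I mean, using the google-cloud-bigquery Python client (untested; the project, dataset, and field names are made up):

    # Sketch only: day-partitioned request log table in BigQuery.
    # Project/dataset/table and column names are invented for illustration.
    from google.cloud import bigquery

    client = bigquery.Client()

    table = bigquery.Table(
        "my-project.logs.http_requests",
        schema=[
            bigquery.SchemaField("ts", "TIMESTAMP"),
            bigquery.SchemaField("method", "STRING"),
            bigquery.SchemaField("path", "STRING"),
            bigquery.SchemaField("status", "INT64"),
        ],
    )
    # Partition by day on the request timestamp so queries can scan a single day.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="ts"
    )
    client.create_table(table, exists_ok=True)

    # Case-by-case query: the partition filter limits the scan (and the bill)
    # to one day of logs.
    query = """
        SELECT path, COUNT(*) AS hits
        FROM `my-project.logs.http_requests`
        WHERE DATE(ts) = '2025-01-15'
        GROUP BY path
        ORDER BY hits DESC
        LIMIT 20
    """
    for row in client.query(query).result():
        print(row.path, row.hits)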


This is fine. In our situation we also kept a couple of summary tables in the regular database so we could track things that we commonly queried (total visits per country, authenticated visits, etc.).

Before I left we also started tracking every hit, but with less information, on a weekly rotating basis: we could always query the last 7 days of logs while only storing about 20% of the information in the full log. It added a little to the cost, but queries within the 7-day window were effectively free, which Support and Analytics loved :)
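
The roll-up itself was nothing fancy, something along these lines (a sketch only, with made-up field names):

    # Sketch of a daily roll-up feeding the summary tables; field names invented.
    from collections import Counter
    from datetime import date

    def summarize_day(log_rows, day: date):
        """Aggregate one day of raw request logs into the counts we queried often."""
        visits_by_country = Counter()
        authenticated = 0
        for row in log_rows:
            if row["ts"].date() != day:
                continue
            visits_by_country[row["country"]] += 1
            if row["user_id"] is not None:
                authenticated += 1
        # These totals go into a small summary table in the regular database,
        # so the common questions never have to touch the raw logs.
        return {
            "day": day,
            "by_country": dict(visits_by_country),
            "authenticated": authenticated,
        }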


They might sample requests and only store 10%, not 100%.
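
Something like this, as a sketch (deterministic hash-based sampling so a given request is consistently kept or dropped; the rate and the always-keep rule are made up):

    # Minimal sampling sketch: keep ~10% of requests, but always keep server errors.
    import hashlib

    SAMPLE_RATE = 0.10

    def should_log(request_id: str, status: int) -> bool:
        if status >= 500:  # never drop server errors
            return True
        # Hash-based sampling is deterministic, so retries of the same
        # request id are consistently kept or dropped.
        digest = hashlib.sha1(request_id.encode()).digest()
        bucket = int.from_bytes(digest[:2], "big") / 0xFFFF
        return bucket < SAMPLE_RATE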

Kibana is an overhead nightmare.

1 million logs per day is a lot; I don’t want to diminish that. However, standard logging on a robust system can handle it. Store and compress the logs, then move them to something like S3 to process them, run analytics, or meet retention requirements.
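
A minimal sketch of the compress-and-ship step, assuming boto3 (the bucket name and paths are made up):

    # Sketch: gzip a rotated log file and ship it to S3 for retention/analytics.
    import gzip
    import shutil
    import boto3

    def archive_log(path: str, bucket: str = "city-app-logs") -> None:
        # Compress the rotated log file next to the original.
        gz_path = path + ".gz"
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Upload under a simple prefix; lifecycle rules on the bucket can
        # handle expiry after the retention period.
        key = "raw/" + gz_path.rsplit("/", 1)[-1]
        boto3.client("s3").upload_file(gz_path, bucket, key)

    archive_log("/var/log/app/access-2025-01-15.log")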





