I’m relatively inexperienced in handling large-scale applications, but I’ve recently been tasked with scaling a web application for a large city. One major challenge I’m facing is logging HTTP requests efficiently.
Currently, we use Kibana and keep logs for 4 weeks, and we’re handling around 1 million logs per day. As more services are added and traffic increases, I’m concerned about storage costs, performance impact, and best practices for managing logs at scale. Most of the devs working here are also fresh out of school with limited experience.
Given that platforms like Twitter, Facebook, and Reddit handle millions to billions of requests per day, do they log every single GET and POST request, or do they use sampling, aggregation, or filtering?
What are the trade-offs between complete and selective logging, especially in terms of performance, storage, compliance, and debugging?
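For concreteness, here is a rough sketch of the kind of selective logging I’m asking about (sample routine successful requests, always keep errors and slow requests). The sample rate and threshold are placeholders, not something we run today:

```python
import logging
import random

logger = logging.getLogger("http.access")

# Placeholder values; real numbers would come from config.
SAMPLE_RATE = 0.05        # keep ~5% of routine successful requests
SLOW_THRESHOLD_MS = 1000  # always keep anything slower than this

def should_log(status_code: int, duration_ms: float) -> bool:
    """Decide whether a single request gets written to the log store."""
    if status_code >= 400:                 # errors are always kept for debugging
        return True
    if duration_ms >= SLOW_THRESHOLD_MS:   # slow requests are always kept
        return True
    return random.random() < SAMPLE_RATE   # sample the rest

def log_request(method: str, path: str, status_code: int, duration_ms: float) -> None:
    if should_log(status_code, duration_ms):
        logger.info("%s %s -> %d (%.1f ms)", method, path, status_code, duration_ms)

# A fast, successful GET is only logged ~5% of the time; a 500 is always logged.
log_request("GET", "/health", 200, 3.2)
log_request("POST", "/api/orders", 500, 842.0)
```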
If anyone has experience working with large-scale logging systems, I’d love to hear how different companies approach this problem. Any guidance would be greatly appreciated!
Storage is very cheap (it’s backed by Google Cloud Storage), almost nothing, but you pay comparatively more for each query that reads data out of the system.
You don’t need interactive exploration; you can run queries on a case-by-case basis. Worst case you pay to scan one day of logs (or, if you buy reserved capacity, you pay a fixed price instead).
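To put a number on “almost nothing”: at the volume in the question, the retained data set is tiny by object-storage standards. The per-entry size below is an assumption, not a measurement:

```python
# Back-of-envelope estimate using the numbers from the question.
LOGS_PER_DAY = 1_000_000
AVG_ENTRY_BYTES = 1_000   # assumed average size of one HTTP log entry
RETENTION_DAYS = 28       # the 4 weeks mentioned above

daily_gb = LOGS_PER_DAY * AVG_ENTRY_BYTES / 1e9
retained_gb = daily_gb * RETENTION_DAYS

print(f"~{daily_gb:.1f} GB/day ingested")        # ~1.0 GB/day
print(f"~{retained_gb:.0f} GB kept at any time") # ~28 GB total
```

Even if entries average several kilobytes, the retained set stays well under a terabyte, which is why the query/scan side, not storage, dominates the bill in this model.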
0 devops needed.
If you want a self-hosted database that is easy to run and scales well for this kind of workload, check out ClickHouse.
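With ClickHouse the usual shape is a MergeTree table partitioned by day with a TTL matching your retention window, so old data ages out automatically and queries only scan the partitions they need. A minimal sketch, assuming the clickhouse-driver Python client against a local server (table and column names are made up):

```python
from datetime import datetime
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client(host="localhost")  # assumes a local ClickHouse server

# Daily partitions plus a 28-day TTL to match the 4-week retention window.
client.execute("""
    CREATE TABLE IF NOT EXISTS http_logs (
        ts          DateTime,
        method      LowCardinality(String),
        path        String,
        status      UInt16,
        duration_ms Float32
    )
    ENGINE = MergeTree
    PARTITION BY toDate(ts)
    ORDER BY (ts, status)
    TTL ts + INTERVAL 28 DAY
""")

# Batched inserts are the idiomatic way to write logs to ClickHouse.
client.execute(
    "INSERT INTO http_logs (ts, method, path, status, duration_ms) VALUES",
    [(datetime.now(), "GET", "/api/users", 200, 12.5)],
)

# A typical debugging query only touches one day's partition.
rows = client.execute(
    "SELECT path, count() AS hits "
    "FROM http_logs "
    "WHERE toDate(ts) = yesterday() AND status >= 500 "
    "GROUP BY path ORDER BY hits DESC LIMIT 20"
)
print(rows)
```

The TTL is what keeps the retention policy enforcement out of your hands entirely; you never have to run cleanup jobs yourself.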