
While I agree generally, I think it's useful to distinguish between freeform logs and structured logs. Freeform logs are the typical logs developers build up by string concatenation. Structured logs have a schema and are generally stored in a proper database (maybe even a SQL database to allow easy processing). Imagine that each request in an RPC system results in a row in Postgres. Those are very useful, and you can derive metrics from them reasonably well.
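
To make that concrete, here's a minimal sketch of what such a structured request log could look like in Postgres (table and column names are just illustrative):

    -- One row per RPC request; every field is typed and queryable.
    CREATE TABLE request_log (
        id          bigserial   PRIMARY KEY,
        created_at  timestamptz NOT NULL DEFAULT now(),
        method      text        NOT NULL,   -- e.g. 'GetUser'
        caller      text        NOT NULL,   -- calling service or client id
        status_code int         NOT NULL,
        duration_ms double precision NOT NULL
    );

    -- Deriving a metric from the same rows: p95 latency per method, last hour.
    SELECT method,
           percentile_cont(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95_ms
    FROM   request_log
    WHERE  created_at > now() - interval '1 hour'
    GROUP  BY method;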


Structured logging has saved my ass so many times. Make a call to an API? Log it. Receive a call to an API? Log it. Someone starts pointing fingers, pull out the logs.

I love to attach logs to other rows in the database: (tableName, rowID, logID, actionID?). Something goes wrong in the middle of the night? Here's all the rows affected.
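Roughly like this, assuming a log table with id and created_at columns (names are mine, not gospel):

    -- Ties a log entry to any row it touched, in any table.
    CREATE TABLE log_link (
        log_id     bigint NOT NULL REFERENCES log(id) ON DELETE CASCADE,
        table_name text   NOT NULL,
        row_id     bigint NOT NULL,
        action_id  bigint,          -- optional: which business action did it
        PRIMARY KEY (log_id, table_name, row_id)
    );

    -- "What did that 3am incident actually touch?"
    SELECT l.table_name, l.row_id
    FROM   log_link l
    JOIN   log g ON g.id = l.log_id
    WHERE  g.created_at BETWEEN '2025-01-01 02:00' AND '2025-01-01 04:00';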

Log retention is easy. Use foreign keys with cascade delete: removing a row from the Log table removes all the attached log data. If you need to keep a log, add a boolean flag. Need to be sure? Use a rule to block deletion if the flag is set. (You're using Postgres, right?)
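
Sketch, under the same assumptions as above (a log table with id and created_at; the rule syntax is plain Postgres):

    -- Detail rows disappear with their parent log row.
    CREATE TABLE log_data (
        log_id  bigint NOT NULL REFERENCES log(id) ON DELETE CASCADE,
        payload jsonb  NOT NULL
    );

    -- Flag the rows you need to keep.
    ALTER TABLE log ADD COLUMN keep boolean NOT NULL DEFAULT false;

    -- Belt and braces: silently refuse to delete flagged rows.
    CREATE RULE protect_kept_logs AS
        ON DELETE TO log
        WHERE old.keep
        DO INSTEAD NOTHING;

    -- Routine retention: everything older than 90 days goes, kept rows stay.
    DELETE FROM log WHERE created_at < now() - interval '90 days';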


That’s all fine until you’re running at reasonably large volume. Logging to an RDBMS breaks down very quickly at scale; it’s a very expensive tool for the job.


Agreed that storing structured logs in a relational database can be very expensive. But you can store structured logs in analytical databases such as ClickHouse. When properly configured, it can efficiently store and query trillions of log entries per node. Search for the Cloudflare and Uber case studies on storing logs in ClickHouse.
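
For a rough idea (this is a sketch, not Cloudflare's or Uber's actual schema), a MergeTree table ordered and partitioned by time is what gives ClickHouse its compression and scan speed:

    CREATE TABLE logs
    (
        ts       DateTime64(3),
        service  LowCardinality(String),
        level    LowCardinality(String),
        message  String,
        attrs    Map(String, String)
    )
    ENGINE = MergeTree
    PARTITION BY toDate(ts)
    ORDER BY (service, ts);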

There are also specialized databases for structured logs, which efficiently index all the fields of all the ingested logs and allow fast full-text search over all the ingested data. For example, VictoriaLogs [1] is built on architecture ideas from ClickHouse to achieve high performance and a high compression ratio.

[1] https://docs.victoriametrics.com/VictoriaLogs/


Sure, but that's not what the OP was saying. A pipeline of evented/ETL'd/UDP'd logs into an analytical DB like ClickHouse is a fairly standard and reasonable thing to do at scale. Putting it straight into Postgres is not.


Why does it break down?

Too many row insertions? Too many DB clients at the same time?

I'm genuinely curious here.



