Hacker News

We did something similar at Netflix. We kept all the aggregations but also stored all the raw data. The raw data would be pushed out of the live system, replaced there with aggregates, and stored as flat text in S3. If for some reason you needed the old data, you just put in a ticket with the monitoring team to load it back into the live system (I think this is even self-service now).
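A minimal sketch of the "replace raw with aggregates, archive the raw as flat text" step, assuming a plain dict stands in for the live store and per-minute count/sum/min/max are the aggregates. The function names (`rollup`, `archive_raw`, `expire_raw`) and the aggregate schema are hypothetical, not Netflix's actual pipeline, and the S3 upload itself is omitted (the returned bytes are what you would upload).

```python
import gzip
from collections import defaultdict


def rollup(points, bucket_seconds=60):
    """Aggregate raw (timestamp, value) points into per-bucket summaries."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {"count": len(vs), "sum": sum(vs), "min": min(vs), "max": max(vs)}
        for start, vs in sorted(buckets.items())
    }


def archive_raw(points):
    """Serialize raw points as tab-separated flat text and gzip it."""
    text = "".join(f"{ts}\t{value}\n" for ts, value in points)
    return gzip.compress(text.encode("utf-8"))


def expire_raw(live_store, bucket_seconds=60):
    """Replace raw points in the live store with aggregates; return the archive blob."""
    raw = live_store.pop("raw")
    live_store["aggregates"] = rollup(raw, bucket_seconds)
    return archive_raw(raw)


store = {"raw": [(0, 1.0), (30, 3.0), (60, 5.0)]}
blob = expire_raw(store)
print(store["aggregates"])
```

The live system keeps only the small aggregate dict; the full-resolution history survives as one compressed text blob per time range.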

The system would then load the data from S3 back into the live system via Hadoop. Turns out it was pretty cheap to store highly compressible files in S3.
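The reload path, sketched under the assumption that an in-memory bytes blob stands in for the S3 object and a simple loop stands in for the Hadoop job. `load_archive` and `restore_raw` are hypothetical names. The last few lines illustrate why the storage was cheap: metric text (monotonic timestamps, few distinct values) is highly repetitive, so gzip shrinks it substantially.

```python
import gzip


def load_archive(blob):
    """Parse a gzip-compressed flat-text archive back into (timestamp, value) points."""
    points = []
    for line in gzip.decompress(blob).decode("utf-8").splitlines():
        ts, value = line.split("\t")
        points.append((int(ts), float(value)))
    return points


def restore_raw(live_store, blob):
    """Reload archived raw points into the live store, keeping them time-ordered."""
    live_store.setdefault("raw", []).extend(load_archive(blob))
    live_store["raw"].sort()
    return live_store


# Synthetic metric series: one point every 10 seconds, values cycling through 0..4.
points = [(1_600_000_000 + i * 10, float(i % 5)) for i in range(10_000)]
text = "".join(f"{ts}\t{v}\n" for ts, v in points).encode("utf-8")
blob = gzip.compress(text)
print(f"{len(text)} bytes as text -> {len(blob)} bytes gzipped")
```

Real ratios depend on the data; this only shows that the round trip is lossless and that repetitive text compresses well.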




I've built a similar system as well. Raw data is compressed and stored in S3, but aggregations are stored in Postgres. The data stored in Postgres is a compressed binary representation of a histogram, and I added a few C functions to Postgres to do things like select x, y, histogram_percentile(histogram_merge(data), 99) group by x, y, etc.
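A Python analogue of the histogram functions, to show why merging then taking a percentile works. This is not a Postgres extension and not the commenter's code: the fixed bucket bounds, the uint32 little-endian encoding, and the nearest-rank percentile (which returns a bucket's upper bound) are all assumptions. `histogram_merge` and `histogram_percentile` mirror the names in the query; `p99_by_group` is a hypothetical stand-in for the `GROUP BY x, y`.

```python
import bisect
import math
import struct
from collections import defaultdict

# Hypothetical fixed bucket upper bounds (e.g. latency in ms).
BOUNDS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, float("inf")]


def encode(counts):
    """Pack per-bucket counts as little-endian uint32s (a compact binary histogram)."""
    return struct.pack(f"<{len(BOUNDS)}I", *counts)


def decode(blob):
    return list(struct.unpack(f"<{len(BOUNDS)}I", blob))


def histogram_from_values(values):
    """Bucket i counts values in (BOUNDS[i-1], BOUNDS[i]]."""
    counts = [0] * len(BOUNDS)
    for v in values:
        counts[bisect.bisect_left(BOUNDS, v)] += 1
    return encode(counts)


def histogram_merge(blobs):
    """Aggregate step: sum bucket counts element-wise across histograms."""
    total = [0] * len(BOUNDS)
    for blob in blobs:
        for i, c in enumerate(decode(blob)):
            total[i] += c
    return encode(total)


def histogram_percentile(blob, p):
    """Nearest-rank percentile, reported as the upper bound of the matching bucket."""
    counts = decode(blob)
    n = sum(counts)
    if n == 0:
        return None
    rank = max(1, math.ceil(p * n / 100))
    cumulative = 0
    for bound, c in zip(BOUNDS, counts):
        cumulative += c
        if cumulative >= rank:
            return bound


def p99_by_group(rows):
    """Rough equivalent of: select x, y, histogram_percentile(histogram_merge(data), 99) ... group by x, y"""
    groups = defaultdict(list)
    for x, y, data in rows:
        groups[(x, y)].append(data)
    return {key: histogram_percentile(histogram_merge(blobs), 99) for key, blobs in groups.items()}


rows = [
    ("web", "us-east", histogram_from_values([1, 3, 8, 40])),
    ("web", "us-east", histogram_from_values([90, 150])),
    ("db", "eu-west", histogram_from_values([0.5] * 99 + [700])),
]
print(p99_by_group(rows))  # → {('web', 'us-east'): 200, ('db', 'eu-west'): 1}
```

Because merging is just addition of counts, histograms pre-aggregated at any granularity can be combined later without touching the raw data, at the cost of percentile precision limited to the bucket bounds.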



