Hacker News

> ElasticSearch was never really meant for log storage anyway.

Indeed, it wasn't designed for log storage, though it happened to match this use case well (now less so with every release).

> There are far better ways to handle log analysis, particularly when your primary query is counting things that happen over T instead of finding log entries that match a query (which it always is)

Oh? This is the first time I've heard that my use case (storing logs from syslog for later diagnostics) counts things over time. Good to know. I may ask you later for more insight about my environment.

> streaming analysis is a much better fit than indexing, just lesser known.

Well, I do this, too, and not only from logs. I still need arbitrary term search for logs.




The snark is totally unnecessary, since the vast majority of people deploy ELK to do reporting. Full term search is achievable with grep; what does ES give you for your non-reporting use case, given that troubleshooting is an extremely low-frequency event? Are you primarily leaning on its clustering and replication? Genuinely curious.

The per-record overhead and the cost of building multiple indexes at log-line rate are just two of the many reasons not to run your use case in ES; I don't even think about it. Honestly, I think it's an even poorer fit than reporting.


ELK > grep for searching. As the other poster said, per-field filtering and rapid pivoting is a MUCH more effective workflow than grepping for string fragments and hoping they match the right field in a syslog message.
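To make that concrete, here's a rough sketch of the kind of per-field query the Elasticsearch query DSL allows, written as a plain Python dict. The field names (`program`, `host`, `message`) are hypothetical; they depend entirely on how your pipeline parses syslog.

```python
# Sketch of a per-field Elasticsearch query DSL body, built as a plain
# Python dict. Field names are assumptions about the parsed log schema.

def failed_ssh_query(host):
    """Build a query body: sshd events on one host whose message
    mentions a failure. Exact-value filters go in `filter` (not scored);
    full-text matching goes in `must`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"program": "sshd"}},
                    {"term": {"host": host}},
                ],
                "must": [
                    {"match": {"message": "failed"}},
                ],
            }
        }
    }
```

With grep you'd match "failed" anywhere in the raw line; here it's constrained to the message field of sshd events on one specific host, and each clause can be swapped out independently when pivoting.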

And you keep talking about how much you know and how ELK is literally worse than grep for searching on fields in logs for troubleshooting, yet you offer no alternative setups or use cases. You're hand-waving.

I've seen some of ELK's performance issues at scale, and I'd be interested in what's out there, because it's not my area of expertise. But you are just yelling "dataflow" and "streaming analytics".


> The snark is totally unnecessary, since the vast majority of people deploy ELK to do reporting.

You shouldn't have used such an authoritative universal quantifier. There are plenty of sysadmins who use ES for this case; you apparently just happened to be exposed only to using it with websites.

So what do ES+Kibana give me over grep? Search over specific fields (my logs are parsed into a proper data structure), including the type of event (different types for different daemons, obviously), a query language, and a frontend with histograms.

Mind you, troubleshooting around a specific event is but one of the things sysadmins do with logs. There are also other uses, all landing in the realm of post-hoc analysis.


Kibana and histograms are reporting. Now the snark is even more confusing, since you’re doing exactly what I say is a poor fit, but claiming it’s not your use case. I spend what time I can trying to show those very same sysadmins you’re talking about why ES is a poor architecture for log work, particularly at scale.

As an SRE, I’ve built high volume log processing at every employer in multiple verticals, including web. I know what sysadmins do. Not a fan of the condescension and assumptions you’re making. I have an opinion. We differ. That’s fine. Let it be fine.


> Kibana and histograms are reporting. [...] you're doing exactly what I say is a poor fit, but claiming it's not your use case.

You must be from the species that can predict each and every report before it's needed. Good for you.

Also, I didn't claim that I don't use reports known in advance; I do use them. But there are cases when preparing such a report just to see one trend is overkill, and there's still troubleshooting, which is helped by the query language. Your defined-in-advance reports don't help with that.

> I spend what time I can trying to show those very same sysadmins you’re talking about why ES is a poor architecture for log work, particularly at scale.

OK. What works "particularly at scale", then?

Also, do you realize that "particularly at scale" is quite a rare setting, that "a dozen or fewer gigabytes a day" is much, much more common, and that ES works (worked) reasonably well for that?


You should read the Dremel and Dataflow papers as examples of alternative approaches, and dial down your sarcastic attitude by about four clicks. You don't need to define reporting ahead of time when things are architected well; it's quite possible to do ad-hoc and post-hoc analysis without indexing every record. At small scale, your questions are quite infrequent and the corpus is small, so waiting on a full scan isn't the end of the world.
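For what it's worth, the "scan instead of index" idea at small scale fits in a few lines of Python: answer an ad-hoc question by streaming over the raw lines once, keeping only a small aggregate in memory. The timestamp layout here is an assumption; adjust the slicing to your syslog flavor.

```python
# Minimal full-scan aggregation: count matching lines per minute with
# no index at all. Assumes each line starts with an ISO-like timestamp,
# e.g. '2015-06-01T12:34:56 host prog: message' (an assumption).
from collections import Counter

def count_per_minute(lines, needle):
    """Stream over log lines once; bucket lines containing `needle`
    by the minute they occurred in."""
    buckets = Counter()
    for line in lines:
        if needle in line:
            minute = line[:16]  # 'YYYY-MM-DDTHH:MM'
            buckets[minute] += 1
    return buckets
```

At a few GB a day this scan finishes in seconds to minutes on one box, which is the whole argument: the query is rare, so paying for it lazily beats paying for indexes on every record up front.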

A dozen or fewer gigabytes a day means: use grep. Running ES on that is just like throwing Hadoop at that log volume.

This was an opportunity to learn from someone with a different perspective, and I could learn something from yours, but instead, you’ve made me regret even saying anything. I’m sorry, I just can’t engage with you further.

(Edit: I’m genuinely mystified that discussing alternative architectures is somehow arrogant “pissing on” people. Why personalize this so much?)


So, basically, you have/had access to closed software designed specifically for working with system logs, and based on that you piss on everybody who uses what they have at hand on a smaller scale. Or at least that is how I see your comments here.

I may need to tone down my sarcasm, but likewise, you need to tone down your arrogance about working at Google or compatible.

But still, thank you for the search keyword ("dremel"). I certainly will read the paper (though I don't expect too many very specific ideas from a publication ten pages long), since I dislike the current landscape of only having ES, flat files, and paid solutions for storing logs at a rate of a few GB per day.

> A dozen or less gigabytes a day means: use grep. This is just like throwing Hadoop at that log volume.

No, not quite. I also use grep and awk (and App::RecordStream) for that. I still want a query language for working with this data, especially combined with an easily usable histogram plotter.
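That middle ground (flat files plus a small ad-hoc query layer) is cheap to approximate. A hedged sketch, with a made-up record layout: filter already-parsed records with an arbitrary predicate and draw a quick ASCII histogram, which is most of what I want from Kibana at this volume.

```python
# Ad-hoc query over parsed log records plus an ASCII histogram.
# The record shape (dicts with 'hour' and 'daemon' keys) is an
# assumption standing in for whatever your parser produces.
from collections import Counter

def histogram(records, predicate, key):
    """Count records matching `predicate` per `key(record)` bucket
    and render one bar per bucket, sorted by bucket label."""
    buckets = Counter(key(r) for r in records if predicate(r))
    return "\n".join(
        "%s %s" % (bucket, "#" * buckets[bucket])
        for bucket in sorted(buckets)
    )
```

Usage is one line per question, e.g. `histogram(records, lambda r: r["daemon"] == "sshd", lambda r: r["hour"])` for sshd activity by hour; swapping the predicate or key is the "rapid pivoting" discussed above, just without an index.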



