
I think Dan has braggadocio'd his time estimates, or his task is somewhat different from what he describes. I mean, the guy talks fast, like really fast, so I suppose he's quick, but mere minutes for something like this doesn't seem realistic unless it's extremely routine. Instead, it seems ad-hoc-ish and exploratory to me, like something that needs to be considered and planned out rather than done between 2 sips of coffee. (I am considering his whole task here, not just the scp'ing of files.)

He's talking about log files from a few hundred THOUSAND servers that result in several terabytes of data that have to be parsed. He doesn't say exactly what he's looking for, but the point is he's trying to answer some questions about performance for more than a few different applications. Are these simple questions, or involved ones which spawn other questions? We don't know, but even if they're easy questions, there are many applications involved and many servers.

Right off the bat, for something THAT BIG, I think it's reasonable to figure out what you're going to do with a sample of logs before downloading "home depot" onto your hard drive. So this is definitely a multi-pass kind of job: start with a survey, then try a bigger chunk, and if everything's OK do the rest.
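To make the multi-pass idea concrete, here is a minimal sketch, not anything Dan describes. It shuffles a host list once and splits it into three growing, non-overlapping batches (survey, bigger chunk, the rest). The host names, the `staged_batches` helper, and the 0.1% / 5% / 100% cut-offs are all assumptions picked for illustration.

```python
import random


def staged_batches(hosts, fractions=(0.001, 0.05, 1.0), seed=0):
    """Yield non-overlapping batches of hosts at growing cumulative fractions.

    The fractions are cumulative: 0.1% first, then up to 5%, then everything.
    """
    shuffled = list(hosts)
    # Fixed seed so the survey sample is reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    start = 0
    for frac in fractions:
        # Always advance by at least one host, never past the end.
        end = max(start + 1, round(len(shuffled) * frac))
        end = min(end, len(shuffled))
        if end > start:
            yield shuffled[start:end]
        start = end


# Hypothetical fleet of 300,000 hosts.
hosts = [f"host{i:06d}" for i in range(300_000)]
batches = list(staged_batches(hosts))
sizes = [len(b) for b in batches]
print(sizes)
```

You would run your parsing and sanity checks on the first batch, and only move on to the next one once the output looks right.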

Next, I think it's advisable to consider factors about the servers themselves: the application versions, whether or not the applications were running (and why not), the hardware, the role of the server, whether or not the server was up (and why not). Is this metadata about each server available (can you trust it?) or is it something that has to be queried each time on each server? Dan says this is supposed to be a couple of years of data; has each server been through upgrades? When? Is that relevant? We don't know any of this, but it would at least have to be considered by someone doing this task.

After the data is parsed there's slicing and dicing to do for the purpose of graphs. That takes lots of time; I am assuming he's not just talking about extracting one figure for each application and plotting it.

For someone that is all set-up and on top of things, this seems like something that is a day's work and easily more, not counting follow-up work and validation to further investigate the additional questions that would inevitably (in my opinion) be raised on such a big dataset.




I think you underestimate the value of pipelining here. You could spend time narrowing down the set of logs to download... but in the time it takes to figure that out, you might as well just download them all.
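A rough sketch of what I mean by pipelining, under made-up assumptions: `fetch` is a hypothetical stand-in for copying one host's log (it just returns fake lines), and the log format is invented. Several downloader threads feed a bounded queue while a single parser thread consumes it, so parsing overlaps with the transfers instead of waiting for them.

```python
import queue
import threading


def fetch(host):
    # Hypothetical stand-in for copying one host's log; returns fake lines.
    return [f"{host} GET /a 200 12ms", f"{host} GET /b 500 340ms"]


def run_pipeline(hosts, workers=4):
    q = queue.Queue(maxsize=16)  # bounded, so downloads can't outrun parsing
    counts = {}  # status code -> number of lines; touched only by the parser

    def downloader(chunk):
        for host in chunk:
            q.put(fetch(host))

    def parser():
        while True:
            lines = q.get()
            if lines is None:  # sentinel: all downloads are finished
                break
            for line in lines:
                status = line.split()[3]
                counts[status] = counts.get(status, 0) + 1

    parse_thread = threading.Thread(target=parser)
    parse_thread.start()
    # Give each downloader an interleaved slice of the host list.
    threads = [
        threading.Thread(target=downloader, args=(hosts[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    q.put(None)
    parse_thread.join()
    return counts


print(run_pipeline([f"h{i}" for i in range(10)]))
```

In real life the parser would usually be the cheap part and the network the bottleneck, which is exactly why starting the full download early costs you little.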

Having "home depot" locally available for analysis is never a bad thing, plus you may be racing against time re log rotation, etc.

> For someone that is all set-up and on top of things, this seems like something that is a day's work and easily more, not counting follow-up work and validation to further investigate the additional questions that would inevitably (in my opinion) be raised on such a big dataset.

In the middle of a SEV, you don't have a day to perform this kind of analysis. 15 minutes till the SLA clock starts ticking and customers are owed refunds.


OK, but it's not a "SEV", he's looking at 2 years of logs on hundreds of thousands of servers over a wide variety of applications. Nothing appears to be "down". This is more like an investigation looking for some high-value efficiency improvements (which is something he's written about).

I am sure he did it "quick", but mere minutes of work and mostly just waiting around for something like that analysis? I doubt it!


Seems pretty normal to me - we used to do this kind of thing on the regular. SEV analysis often requires digging through months of past logs to ascertain all the affected customers. If that takes a day every time you hit a customer-facing issue, your team ends up spending all their time on after-action reports and never gets any engineering done.



