Big data is a misnomer. Complex data is a better description. Having a terabyte of simple data with 2 columns is really not that difficult to analyze and won't give you much information, whereas a few hundred MB of data with complex relationships and many dimensions can yield tons of information and is far more difficult to analyze.
Difficulty in "big data" should be about its horizontal breadth (covering many aspects of a system) rather than its vertical depth (covering one aspect of a system in great resolution).
The devil is in the details. Big Data is really a massive cluster of VMs running maxed out Excel spreadsheets, and instrumented to restart automatically and restore from redundant backup, a la RAID, when the Excel process crashes one of the Windows VMs.
The current limit is 1,048,576 rows per worksheet. However, there is a tool called PowerPivot which lets you get around that limit and do analysis on larger data sets.
He's saying his file has way more rows than that, so maybe they upped the limit in more recent versions of Excel? (I think he also wrote a bunch of VBA and hooked into some external systems too.)
True, the point is that many people writing these stories cannot really tell (or don't care about) the difference. "Big data" is a sexy label, so they go with it regardless of whether it's actually relevant.
Most of the people here do, so these comments are really preaching in the wrong place...
Probably not, but the year that I joined they had processed a million resumes. So they probably have some level of data (ranging from phone screen only to on-site interview) on anywhere from 8 to 12 million engineering candidates. For the folks who have come on site there might be 5-8k words of text in their file; for phone screens, probably less than 1k, depending on whether they include a code sample or not. Most of the folks they processed at the time didn't get to on-site interviews, so it probably skews to the lower end.
Its "not" big data in the sense that it needs a cluster to process but it is a pretty large sample set of the current population of engineers who might want to work there.
They've reported receiving 1 million applications per year. If even a fraction of those get interviewed (with 1-5 interviews per candidate), that's a good chunk of data. Correlate that with regular performance reviews of 30k employees... I'd say that's a small Big Data problem.
He's not talking about 30k rows, he's talking about 30k people. It could easily be big data if you monitor & document their every working moment, but they probably aren't doing that so you're probably right.
1 million applications received. Say 10% of those go into some sort of evaluation process = 100k assessments/year.
Say 10% of those go through an interview panel (on average 3 interviews each) = 30k assessments/year.
For 30k employees with (say on average) 2 assessments per year = 60k assessments/year.
So that's 1 million CVs per year on which to do some sort of evaluation, and roughly 200k individual assessments per year. Over the past five years, that's roughly 6 million data points.
Since there's no hard-and-fast rule on this, that's why I called it small Big Data.
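For what it's worth, here's a minimal sketch of that back-of-envelope math in Python. All the rates and counts are just the assumptions stated above, not figures from any actual company:

    # Back-of-envelope estimate of the data volume described above.
    # Every rate and count here is an assumption from the comment, not real data.
    applications_per_year = 1_000_000
    evaluation_rate = 0.10           # ~10% of applications get some evaluation
    interview_rate = 0.10            # ~10% of those reach an interview panel
    interviews_per_candidate = 3     # average interviews per panel candidate
    employees = 30_000
    reviews_per_employee = 2         # performance assessments per employee per year
    years = 5

    evaluations = applications_per_year * evaluation_rate                  # 100k/year
    interviews = evaluations * interview_rate * interviews_per_candidate   # 30k/year
    reviews = employees * reviews_per_employee                             # 60k/year

    assessments_per_year = evaluations + interviews + reviews              # ~190k/year
    data_points = years * (applications_per_year + assessments_per_year)

    print(f"Assessments per year: {assessments_per_year:,.0f}")
    print(f"Data points over {years} years: {data_points:,.0f}")
    # ~190,000 assessments/year and ~5.95 million data points over five years,
    # which is where the "roughly 200k" and "roughly 6 million" figures come from.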
Definitely not. And it's a good example of how useful POD ("plain old data") can be. They ask 6 team members 18 questions about what they think of their boss, give those 108 data points to her, and it's tremendously valuable.
Seems like it would fit into a normal database. Or maybe even an unwieldy Excel spreadsheet.