Hacker News

Is Google's HR data set really "Big Data" or just "data?"

Seems like it would fit into a normal database. Or maybe even an unwieldy Excel spreadsheet.




I quote my friend who works in "Big Data":

"sometimes I think Big Data is just Excel on 128GB of RAM"


Big data is a misnomer; complex data is a better description. Having a terabyte of simple data with 2 columns is really not that difficult to analyze and won't give you much information, whereas a few hundred MB of data with complex relationships and many dimensions can yield tons of information and is far more difficult to analyze.

Difficulty in "big data" should be about its horizontal breadth (covering many aspects of a system) rather than its vertical depth (covering one aspect of a system in great resolution).


Not to go all senselessly pedantic, but doesn't Excel have a limit of like 55,000 rows?


The devil is in the details. Big Data is really a massive cluster of VMs running maxed out Excel spreadsheets, and instrumented to restart automatically and restore from redundant backup, a la RAID, when the Excel process crashes one of the Windows VMs.


If you're being pedantic, it's 1,048,576 rows from Excel 2007, 65,536 rows before that.


2^20 rows! :)
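For the curious, both of Excel's documented row limits really are exact powers of two (a throwaway Python sanity check, nothing Excel-specific):

```python
# Excel's documented worksheet row limits:
# 65,536 rows (2^16) through Excel 2003, 1,048,576 rows (2^20) since Excel 2007.
old_limit = 2 ** 16
new_limit = 2 ** 20
print(old_limit)  # 65536
print(new_limit)  # 1048576
```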


Current limit is 1,048,576 per worksheet. However there is a tool called Power Pivot which lets you get around that limit and do analysis on larger data sets.


I didn't see this response when I replied to the parent. Power Pivot is pretty great when you can use it.


Not since 2007, when it got bumped to over 1 million. http://office.microsoft.com/en-us/excel-help/excel-specifica...


He's saying his file has way more rows than that, so maybe they upped the limit in more recent versions of Excel? (I think he also wrote a bunch of VBA and hooked into some external systems too.)


Not when you use it with Power Pivot.


16,384 columns and 1,048,576 rows, actually.


2^14 columns * 2^20 rows * 8 bytes per cell = 128GB. Bang on.
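That back-of-envelope arithmetic checks out exactly, assuming 8 bytes (one double-precision float) per cell, which is the commenter's assumption rather than how Excel actually stores cells:

```python
# A maxed-out Excel worksheet, at one float64 per cell:
columns = 2 ** 14           # 16,384
rows = 2 ** 20              # 1,048,576
bytes_per_cell = 8
total_bytes = columns * rows * bytes_per_cell
print(total_bytes // 2 ** 30, "GiB")  # 128 GiB, bang on
```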


Just data. Working with "big data" is just boring old business intelligence in 99.9% of the cases.


Is it just me, or is this kind of comment one of the most common on HN lately whenever a story about data analysis comes up?

We get it, big data isn't really "big" unless you're talking giga(tera?)bytes.

Don't take it upon yourself to educate every single person who misuses the term. It's not worth it. :)


I guess it's a natural reaction to people wanting to jump on that "big data" bandwagon. Kind of like size envy I guess? So sad...


True, the point is that many people writing these stories can't really tell (or don't care about) the difference. "Big data" is a sexy term, so they go with it regardless of whether it's actually relevant.

Most of the people here do, so these comments are really preaching in the wrong place...


Probably not, but the year that I joined they had processed a million resumes. So they probably have some level of data (ranging from phone screen only to on-site interview) on anywhere from 8 to 12 million engineering candidates. For the folks who have come on-site there might be 5-8k words of text in their file; for phone screens, probably less than 1k, depending on whether they include a code sample or not. Most of the folks they processed at the time didn't get to on-site interviews, so it probably skews to the lower end.

It's "not" big data in the sense that it needs a cluster to process, but it is a pretty large sample set of the current population of engineers who might want to work there.


But then you'd have a tech company that isn't spruiking a buzzword.


They've reported receiving 1 million applications per year. If even a fraction of those get interviewed (with 1-5 interviews per candidate), that's a good chunk of data. Correlate that with regular performance reviews of 30k employees... I'd say that's a small Big Data problem.


30k? Data. Not Big Data

And "Small Big Data" is probably data as well.


He's not talking about 30k rows, he's talking about 30k people. It could easily be big data if you monitor & document their every working moment, but they probably aren't doing that so you're probably right.


Yes, 30k people, so it's what? Some interview reports, some performance reviews, HR report/history of the employee?

It really doesn't look like something big.


1 million applications received. Say 10% of those go into some sort of evaluation process = 100k assessments/year. Say 10% of those go through an interview panel of (on average) 3 interviews = 30k assessments/year. For 30k employees with (say, on average) 2 assessments per year = 60k assessments/year.

So that's 1 million CVs per year on which to do some sort of evaluation, and roughly 200k individual assessments per year. Over the past five years, that's roughly 6 million data points.

Since there's no hard-and-fast rule on this, that's why I called it small Big Data.
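For what it's worth, the estimate above reproduces in a few lines (the 10% pass rates, the 3 interviews per panel, and the 5-year window are all the commenter's stated assumptions, not Google's numbers):

```python
applications_per_year = 1_000_000
evaluations = applications_per_year // 10          # 100k assessments/year
panel_interviews = (evaluations // 10) * 3         # 10k candidates x 3 interviews = 30k/year
employee_reviews = 30_000 * 2                      # 60k performance assessments/year
assessments_per_year = evaluations + panel_interviews + employee_reviews  # 190k, "roughly 200k"
five_year_total = (applications_per_year + assessments_per_year) * 5
print(five_year_total)  # 5950000 -- roughly 6 million data points
```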


Even if it's 100 million rows, that's something a single beefy server running SQL Server 2012 can handle. That's not big data.

Big data is a million times 100 million rows.


> Big data is a million times 100 million rows.

[citation needed]

This whole thread is pointless. There is no definition of Big data.



Definitely not. And it's a good example of how useful POD ("plain old data") can be. They ask 6 team members 18 questions about what they think of their boss and give those 108 datapoints to her and it's tremendously valuable.


It's "deep learning" data, hahaha.



