
I think I'm in the minority, but I really hate the term, "data scientist." It usually seems to mean "senior statistician, but with the training and credentials expected of an RA" (to clarify, that isn't meant as a comment on the original article). I would be especially skeptical about hiring someone who self-identifies as a "data scientist." People are trained as statisticians, biostatisticians, computer scientists, or in various subspecialties that end in "-metrician" (e.g. econometrician, psychometrician, cliometrician); no one is trained as a "data scientist." Unless you're hiring someone really junior, you want the "data scientist" to have a specialty -- anyone good will have one.

But the best way to find a good "data scientist" is probably the best way to find a good programmer -- be one yourself; tap your professional network; and hire people as consultants/freelancers on non-critical projects before making a real commitment. Identifying someone with a deep skill that one doesn't possess oneself is pretty much impossible. And on the flip side, I have trouble imagining that someone who really knows what he or she is doing would want to work for some unknown.

If you want someone to scrape and clean data with Perl and generate some scatter plots and histograms, look for undergrads with good grades who worked as Research Assistants, or recent grads working as RAs at consulting firms, research centers, government agencies, or think tanks. They'll do great (by and large), they've had some informal training from a more senior researcher to help put everything in context, and faculty often steer their best students into those sorts of jobs, so there's a pretty strong quality screen. I'm sure there are other places to find people too.



>but I really hate the term, "data scientist."

I think most people do, but I've never heard a good term for the job. It's like "we want someone who can take large amounts of data and do something awesome with it". What do you call that?

>Unless you're hiring someone really junior, you want the "data scientist" to have a specialty -- anyone good will have one.

Not sure I agree with this; I want people who are well-rounded. I think it's great to find someone who has specialized in something, but I'd want that person to be able to bring the rest of his or her abilities up to par. Example: let's say you specialized in machine learning. If you don't understand building scalable systems, you can't take a holistic view of a project; how will you know whether your algorithms can scale to a production environment? Or, if you can't program well, you can't write the code to actually get your algorithms into place. Or, if you don't understand the business side of things, you won't be able to build trust with the rest of the company, and hence you won't be able to contribute.


>"we want someone who can take large amounts of data and do something awesome with it". What do you call that?

Analyst


That's close to what I used to be called. It's a very overused and vague term, I'd argue far worse than the already bad 'data scientist'. Search job postings for 'analyst' and see the wide variety of jobs that turn up.


>Analyst

Analyst is a four-letter word in programming circles, isn't it?


>"we want someone who can take large amounts of data and do something awesome with it"

Yeah, that's kind of the problem; until "awesome" gets defined, it's awfully hard to be specific about what the company needs. But this is why it's going to be really hard for someone who couldn't do part of the job themselves to hire effectively.

As to the second point, I guess it's going to depend on the business and on whether the statistical component is crucial or just peripheral. It's nice if everyone is pretty broadly well trained. But if you're a hedge fund building algorithmic trading rules, you need different people than a marketing research firm or a litigation consulting agency.


You're not alone. I have grumbled about it for years. I know Hilary Mason and others have advocated for the term, and I am very sensitive to the hardships of working in such an interdisciplinary manner, but it really is a goofy name.

It has always seemed to me to be just an excuse to run away from the "Artificial Intelligence 2.0" moniker and all the negative connotations that would imply. I dislike the label "Data Science" because there is not really much "Science" with a capital S being done by the people I have had the chance to meet who adopt the moniker.

I have always thought that "Knowledge Engineer" was a more descriptive and useful term for what they actually do. The more abstract the work gets, the more it seems to fall into the field properly known as Computational Mathematics.


Using the term Engineer willy-nilly is frowned upon in many countries where it's a professional designation.


Why not "Statistician," "Computational statistician," or "computer engineer/statistician"?


All good, but I think statistics, as I understand it, is only one facet of the work involved. And the outcome or goal of their work generally seems to be the creation of a knowledge-based system for predictive analysis: deriving ontological meaning from numerical data (mostly about humans and human activity).

The Wikipedia article on knowledge engineering (not a bad one, if a little short) kinda captures it for me [0]:

* Assessment of the problem

* Development of a knowledge-based system shell/structure

* Acquisition and structuring of the related information, knowledge and specific preferences (IPK model)

* Implementation of the structured knowledge into knowledge bases

* Testing and validation of the inserted knowledge

* Integration and maintenance of the system

* Revision and evaluation of the system.

[0] http://en.wikipedia.org/wiki/Knowledge_engineering


Right, but the question is whether statistics is the most important facet of the work involved. Everything you've listed except "maintenance of the system" is part of statistics (which I'm going to define broadly as: 'what statisticians do'), and system maintenance is only missing in a "keep servers online" sense, not a "make sure that the concept we implemented remains appropriate" sense.


Theory: Data scientist is a contraction, not a description

viz > Data_Analyst + Computer_Scientist ~= Data_Scientist


Maybe the heuristic:

description of primary data set + "Analyst"

might be a good way to come up with more specific job titles.


Seconded; also +1 for any mention of cliometrics, anywhere, ever.



