I am a mathematician by trade, and was doing development along with other stuff (reverse engineering and security work, first in my own company, then at Google). So ...
1) I think working knowledge of ML is extremely useful to many developers, and generally under-taught in universities. See the old Joel article which mentions "Google uses Bayesian filtering like MS uses the IF statement" (http://www.joelonsoftware.com/items/2005/10/17.html). A well-rounded developer should know the basics (logistic regression, SVMs, some things about CDNNs, etc.); it will make him much more adept at problem-solving. I suspect Google's internal push to get people up to speed is not to turn them all into ML researchers, but rather to make sure that everybody "knows the basics well enough".
So I think it is useful to teach developers about the things ML has to offer.
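To make "the basics" concrete, here is roughly the level I have in mind - a minimal logistic-regression sketch (scikit-learn and a bundled toy dataset, chosen purely for illustration, nothing Google-specific):

    # A minimal "knows the basics" sketch: plain logistic regression on a
    # bundled toy dataset. Everything here is illustrative, not prescriptive.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=5000)   # defaults otherwise, no tuning
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))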
2) Mathematically, it seems that in ML the "engineering" side has run far ahead of the theory side. The sudden breakthrough in the mid-2000s is IMO still not fully understood - and parts of it may have been very accidental. Initially, it was thought that pre-training was the big breakthrough, but it is quite unclear what the big breakthrough was. It could be that simply the increase of data / compute sizes and the switch to minibatch-SGD explains why modern DNNs generalize well (interesting paper on the topic: https://arxiv.org/abs/1509.01240). There is a lot of good mathematics to be written, but I am not sure whether the folks at Google will write it - given the incentive structures (performance reviews, impact statements) it is unlikely that somebody gets promoted for "cleaning up the theory".
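For readers who haven't seen the term: minibatch-SGD just means updating the parameters from gradients computed on small random batches rather than the full dataset. A toy numpy sketch (the model, learning rate and batch size are arbitrary illustration choices, not taken from the paper):

    # Minibatch SGD on a linear least-squares model (toy numpy sketch).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))             # synthetic data
    w_true = rng.normal(size=20)
    y = X @ w_true + 0.1 * rng.normal(size=10_000)

    w = np.zeros(20)
    lr, batch_size = 0.01, 64                     # assumed hyperparameters
    for step in range(2_000):
        idx = rng.integers(0, len(X), size=batch_size)    # sample a minibatch
        Xb, yb = X[idx], y[idx]
        grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)    # gradient of mean squared error
        w -= lr * grad                                    # SGD update

    print("parameter error:", np.linalg.norm(w - w_true))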
3) From a development perspective: There are a ton of interesting engineering problems underneath the progress in ML. If you look at Jeff Dean, he is a superstar engineer, not necessarily a mathematician, and a lot of the progress the Google Brain team made was engineering advances to scale / distribute etc. - so by training the engineers in ML, you also get better infrastructure over time.
So I don't think they are sending "developers into ML swamps"; I think they are trying to reach the point where "Google uses DNNs like MS uses IF".
I don't think your points are invalid, but I think you overvalue the data that's available and relevant to most programming tasks. And without novel data, ML can offer little novel value.
Google, Facebook, M$ Research, and perhaps Yahoo are extreme outliers. They have zettabytes of broad unstructured text data, so they mine it. Everybody else has megabytes of narrow structured data, most of it commercial transactions of their products. That stuff has already been effectively mined by traditional basic OLAP methods. Most/all of the value has been extracted.
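By "traditional basic OLAP methods" I mean the usual group-by / roll-up analysis that any BI stack already runs - something like the following (hypothetical column names, pandas used just for illustration):

    # OLAP-style roll-up of transaction data (hypothetical columns).
    import pandas as pd

    tx = pd.DataFrame({
        "region":  ["EU", "EU", "US", "US"],
        "product": ["A", "B", "A", "B"],
        "revenue": [100.0, 50.0, 80.0, 120.0],
    })

    # Revenue by region and product -- the kind of cube most shops already have.
    print(tx.groupby(["region", "product"])["revenue"].sum())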
Mainstream software apps have yet to show the value of using ML. Such apps have access to very limited data of very narrow relevance. The utility of ML in such domains isn't new; it's classic optimization. Or it's Bayesian anticipation. But it's not a game changer. Frankly, the use of ML in most mainstream apps is more likely to add distraction and annoyance as the computer mispredicts your intent -- like Microsoft Bob did.
Maybe "life in the cloud" will create new opportunities for smarter software. But I definitely don't want free apps making their own decisions when to notify me. I guarantee that will get old immediately. So how will this work? Frankly, I can't guess. Like Apple's iAds, programming ML into the mainstream or cloud sounds like an idea that will serve the software / cloud vendor far better than the user.
I don't know why randcraw is being downvoted here: his/her points are vital clarifications.
Humans have been gathering and analyzing data for thousands of years. We have _not_ waited for Google's latest ML or neural nets to do analyses. Otherwise I'd be carving this post onto a stone for future generations to peruse.
The valuable and understandable AI, the step that will make a difference, isn't in "big data" - it's in figuring out how to do what those humans have been doing all those thousands of years.
Think outside consumer-facing applications. Medicine, biology, geology (oil, gas, and mining), finance, transportation. Tons of data, tons of dollars, and important problems.
I work in a big pharma analyzing image and experimental data. In a prior life I analyzed social cliques from vast numbers of user transactions. In both cases it seems like greater volumes of data should lead to deeper insights. But as it happens, the amount of useful actionable information in that data was surprisingly limited.
Often the available sensors/assays failed to detect reliable info. Or the phenomenon of interest interdepended on too many variables expressed with too great a dynamic range for us to detect reliably or model usefully. (The present lull in genomics R&D illustrates this well, as does the automated interpretation of signals like EEG and NMR spectra.) And the signals that we can extract are often uninterpretable or sporadic. Alas, gathering more data won't yield more signal. Given the present limit on sensor resolution, you just get more mixed signals.
The potential of all ML is limited by the depth of the data, which is essential for discriminating subtler signals. In the domains you mention (medicine, biology, geology, other sciences) I'm convinced we need better sensors more than greater amounts of the same data available now. We need better hypotheses which lead to better ideas of where to look and what to look for. In general, ML can't help with that. Until we better imagine how the mechanism might work, our questions remain too vague.
To wit, I'm afraid that applying ML to most software apps will suffer from the same limited ROI. I suspect that most app and user data is too shallow for mining to add appreciable value, no matter how clever it is.
Most of that data isn't "big data". And most of it has been analyzed thoroughly. Sure, ML will be used to re-analyze it, but with mostly the same results. As randcraw states "Most/all of the value has been extracted."
Only a wild-eyed ML "gold digger" could imagine that there is a vein of gold in those mines. The reality is that, with few exceptions, we'll find more lumps of coal.
Perhaps I should switch from an ML swamp metaphor to an ML mine metaphor? <--Hah! Do that with ML!
I think it depends on the job. Maybe a web developer has less to gain from extensive knowledge of ML. But I agree that every computer scientist (whether he works as a software engineer or not) should have some knowledge of ML; there are many things in the curriculum that are less important than ML.
As a snarky remark: maybe I am not yet qualified enough for real criticism as a CS student, but I don't like such sharp distinctions between engineering and theory. All the "trial and error" in ML can be a useful guide to working out the theory. Also, I guess the work of Jeff Dean is quite often more theoretical than the work of an average engineer. While I feel that if we have not developed a theory behind such tools, we have not really understood them, no one knows how complex these things really are. I think/feel this makes ML-related engineering harder than software projects with a well-understood theory.
I just hope there are enough computer scientists / mathematicians at universities (or Google ;) ) looking sharply at all the progress made in ML from the engineering side and asking themselves "what does that really mean?", because that's a hell of an interesting problem.
I may be wrong; my lecture on ML is next semester ;)