
> However, these models are largely big black-boxes. There are a lot of things we don’t understand about them.

I'm getting kind of sick of this "deep learning is a black box" trope, because it's really not true anymore. Yes, it's a black box if you just use "some data and 20 lines of code using a basic deep learning API" as mentioned in the article. But if you spend some time understanding the architecture of networks and read the latest literature, it's pretty clear how they function, how they encode data, and how and why they learn what they do.
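
For reference, the "20 lines of code" version really is about that size. Here's a rough sketch of what it typically looks like, using the Keras API on made-up toy data (the layer sizes and hyperparameters here are purely illustrative):

    # Rough sketch only: toy data, arbitrary layer sizes and hyperparameters.
    import numpy as np
    from tensorflow import keras

    x_train = np.random.rand(1000, 20)                    # placeholder inputs
    y_train = (x_train.sum(axis=1) > 10).astype("int32")  # placeholder labels

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, batch_size=32)

That's the black box in action; the point is that real understanding only starts once you look past this level.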

Because neural networks are so dramatically more effective than they used to be, in so many domains, it's true that we don't yet have a good understanding of optimal ways to build, train, and optimize networks. But that is exactly why there is so much excitement -- because there is a lot to discover, and a lot of progress that can be made quickly.

I agree with the author that fundamental physical and geometric approaches are still relevant and useful, and have been somewhat ignored recently, but the fact remains: If you and I as individuals want to maximize our personal impact, and capture as much value as we can while working on interesting problems, deep learning is an excellent field in which to do that.

It's kind of like we just discovered a nice vein of gold, and the silver miners are like "yeah, but we don't know much about that vein of gold and how long it will last." Which is true, but in the meantime, there's a lot of easy money to be made, and ultimately, both types of resources are important and synergistic.




>> I'm getting kind of sick of this "deep learning is a black box" trope, because it's really not true anymore.

I know, right? Like, take this model I trained this morning. Here's the parameters it learned:

[0.230948, 0.00000000014134, 0.1039402934, 0.000023001323, 0.00000000000005]

I mean, what's "black-box" about that, really? You can instantly:

(a) See exactly what the model is a representation of.

(b) Figure out what data was used to train it.

(c) Understand the connection between the training data and the learned model.

It's not like the model has reduced a bunch of unfathomably complex numbers to another, equally unfathomable bunch. You can tell exactly what it's doing - and, with some visualisation, it gets even better.

Because then it's a curve. Everyone groks curves, right?

Right, you guys?

/s obviously.


Not to mention that the bug which causes it to detect a cat as a panda is instantly visible. You really should change that 0.00000000014134 to a 0.000000000141339.


> I'm getting kind of sick of this "deep learning is a black box" trope, because it's really not true anymore.

That's fair/probably true.

I think there are two things that drive that--one, lack of a widely shared deep understanding of the field[0] (and not really needing a deep understanding to get good results--as both you and the author pointed out), and two, the fact that it feels like cheating, compared to the old ways of doing things. :P

[0] When the advice on getting a basic understanding is "read a textbook, then read the last 5 years of papers so that you aren't hopelessly behind", there just isn't going to be widespread understanding.


Fair. How about an excellent 4-minute YouTube video to get a basic understanding? :)

https://www.youtube.com/watch?v=AgkfIQ4IGaM


I'll have to watch this later, but I'd argue the issue, at least for me, isn't really surface-level understanding. (At least, the kind I think could plausibly be imparted in 4 minutes. :))

The basic idea of deep learning has always seemed straightforward to me[0]. However, at least my perception is that it feels like there's a lot of deep magic going on in the details at the level that Google/Microsoft/Amazon/researchers are doing deep learning. That's honestly true of most active research areas[1], but since those results are also the results that keep getting a lot of attention, the "it's a black box" feeling makes sense to me. :)

[0] Having done both some moderately high level math and having a CS background, I feel like most ideas in CS fit this description, though. Our devil is the details.

[1] For instance: fairly recent results in weird applications of type theory are also super cool, and require some serious wizardry, but those get much less attention. (And are, I think, more taken for granted, since who doesn't understand a type system? /s)


Having until very recently worked in deep learning at Google, I can assure you that if you read and watch enough recent public papers and talks, you will be very, very close to the latest thinking of researchers at these companies.

You're right that it can take some time to do this edification work and develop the understanding for yourself -- the research is broader and more specialized than it appears at first glance -- and it does help to be surrounded by smart people puzzling over the same types of problems, but there's very little secret magic here. It is, however, of benefit to these companies to develop a public image of exclusivity and wizardry in their research; I fell into this trap too, before I saw how the sausage is made.

If you want to make your own fundamental innovations in deep learning, it can be very resource-intensive, both computationally and otherwise. However, it is easy to apply the current state-of-the-art to a broad spectrum of applications in novel ways.

One of the reasons I left is that I think there is a big opportunity in applying these powerful basic principles and approaches to more domains. The research companies are, IMO, focused on businesses that are or have the potential to become very, very large, and that can take advantage of their ability to leverage massive amounts of capital. This leaves many openings for new medium-sized businesses. Of course, as you grow, you can take stabs at progressively larger problems.


I'm with you 100%. RF has been around for ages, but it still is "black magic" to most EEs (after most people finish the standard 100-level courses describing op-amps, they tend to go into the digital domain and leave analog work to that small demographic). One EE will be able to design a fantastic 7-layer-poly multiprocessing chip in his garage using Cadence and the TSMC 65nm libs, while someone else will be able to design a flawless cavity filter at 16 GHz. People have specific domains of expertise, even when they hold the same "EE" or "CS" or "Math" degree from the same university, largely based on which courses they elected to take in their 3rd and 4th years.

Likewise, fields advance quickly. I can grok how a Z80 or 6502 works from NAND to Tetris, but even a mediocre second-year grad student would wipe the floor with me. I, too, went pretty far down the road of mathematics, but watching MSRI lectures from the last few years leaves me struggling to keep up in the field (algebraic topology) where I once felt comfortable. If you don't keep up with your field, you're going to be lost.

The reason I think the 'black magic' trope keeps on being bandied about is that most people reading the articles describing ImageNet et al. just don't have the background necessary to grok it[1]. If you had asked them a year ago what the convolution operator was, they'd have scratched their heads. When they try to go and read that ImageNet paper, they'll be left even more confused, because the last time they thought about linear algebra was in their freshman year of uni. It'd be analogous to trying to write some computational fluid dynamics modeling software after not having touched diff eqs for a decade.
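
To make the convolution point concrete: the operator behind all those conv layers is just a sliding dot product. A minimal sketch in plain numpy, with a toy image and kernel, no padding or strides (and strictly speaking this is cross-correlation, which is what conv layers actually compute):

    import numpy as np

    def conv2d(image, kernel):
        # Slide the kernel over the image, taking a dot product at each position.
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    image = np.random.rand(5, 5)       # toy "image"
    kernel = np.array([[1.0, -1.0]])   # crude edge-detecting kernel
    print(conv2d(image, kernel))

The ImageNet-style networks just stack lots of these (with learned kernels) plus nonlinearities, which is where the linear algebra background starts to matter.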

[1] This isn't to disparage those who don't have that background - everyone has their domain of expertise. I'm just trying to emphasize why the conception of 'black magic' exists. It's quite simple: when you have only a tenuous grasp of the foundational knowledge upon which some theory is built, you will have difficulty learning abstractions built upon those foundations.


Ah, this is interesting, because I've recently dabbled a bit in RF. My path went like this:

1) Interested in doing something with RF, don't know much about it, know that people say it's black magic.

2) Do some research... Ah, this is a pretty deep topic, and it might take a while to develop the necessary intuition.

3) Become competent enough to solve my immediate problem, recognize that it is an extensive field in which there is a lot of specialized practical knowledge that could be acquired.

4) Accept that I have higher life priorities than to go down the RF rabbit hole, but feel that I could learn it if I wanted to invest the time. No longer feels like black magic.

I think there is a distinction between fields like deep learning and RF, where most of the information is public if you know where to look, and, say, cryptanalysis or nuclear weapon design or even stage magic, where the details and practical knowledge of the state of the art are more locked behind closed doors. And for a field that you're not familiar with, it can be initially unclear which category it falls into. I think the existence of public conferences on the topic is a good indicator, though.


I would love to hear more about these "weird applications of type theory". Any references?


So it turns out you can basically use type theory to encode a surprisingly large number of desirable traits about your program. (Caveat being that as you get more restrictive, you reject more "good" programs at compile time--no free lunch with Rice's theorem.)

For example: In this paper, they basically use types (with an inference algorithm) to catch kernel/user pointer confusion in the Linux kernel. (https://www.usenix.org/legacy/event/sec04/tech/johnson.html)

It turns out you can encode a lot of other interesting properties in a type system (esp. if you're building on top of the existing type system), though--you can ensure that a Java program has no null dereferences (https://checkerframework.org/ has a checker that does this), and Coq uses its type system to ensure that every program halts (as a consequence, though, it isn't actually Turing complete).

There are also cool things like Lackwit (http://ieeexplore.ieee.org/document/610284/), which basically (ab)used type inference algorithms to answer questions about a program ("does this pointer ever alias?", etc.).
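
If you want the flavour of the kernel/user pointer trick without reading the paper, here's a toy analogue using Python type annotations (the UserPtr/KernelPtr names and stub functions are made up; a checker like mypy does the catching, not the runtime):

    from typing import NewType

    # Distinct "tagged" types for the two pointer provenances (illustrative only).
    UserPtr = NewType("UserPtr", int)
    KernelPtr = NewType("KernelPtr", int)

    def copy_from_user(src: UserPtr) -> bytes:
        return b""  # stub: would validate and copy from user space

    def read_kernel(ptr: KernelPtr) -> bytes:
        return b""  # stub: direct kernel-space read

    p = UserPtr(0xdeadbeef)
    read_kernel(p)  # mypy rejects this: "UserPtr" is not a "KernelPtr"

The real systems infer most of the annotations instead of making you write them, which is what makes the approach practical on something the size of the Linux kernel.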


Surely it's a black box in a much deeper sense than that. We're nowhere near being able to prove that a given net implements a particular transformation approximately correctly. So for example, if you were to train a net to reconstruct 3D geometry with apparent success, there would be no way to validate that except by testing it on lots of examples. Similarly, we would not know how to give a precise characterisation of which images the network could analyse correctly.


For the type of problems we commonly 'attack' with neural networks, the same criticism applies to all solutions. For the earlier methods we can generally prove that they definitely don't or can't work in the arbitrary general case, and we can prove some correctness bounds that hold only for a highly simplified artificial case whose assumptions don't really represent the real-world problem we want to tackle. In many domains where neural nets are being used successfully, "we would not know how to give a precise characterisation" is a fundamental limitation of the task that applies to all possible methods.

I'm working on other ML areas, not computer vision, but to use your own example: what methods of reconstructing complex 3D geometry from real-world photographs have some way of proving that their transformation is correct within certain bounds?

For this particular example, can it even be provable in theory, given that we know how to make misleading objects that produce visual illusions and appear to have a much different 3D shape than they actually do? (e.g. https://www.youtube.com/watch?v=qJGT-aZKCYk was an example found by a quick search)

In many real-world domains "lots of examples" is the only proper source of truth; any formal model would be a poor approximation of it, and you'd want a system that can and will diverge from that model when seeing further real-world data - it should be able to correct for the simplifications of the formal model instead of being provably bound to it.


I may have chosen a bad example in the form of reconstructing 3D geometry. But take sentence parsing instead. If I parse a sentence using classical techniques, I know exactly how that works and exactly which sentences would or wouldn't be accepted. That level of understanding isn't (currently) possible with deep learning techniques, even if in some cases they perform better.
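
To be concrete about "knowing exactly which sentences are accepted", here's a toy hand-written CFG sketch using NLTK (the grammar is obviously invented; the point is that membership in the language is completely determined by the rules):

    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> 'I' | Det N
        VP  -> V NP
        Det -> 'the'
        N   -> 'man'
        V   -> 'saw'
    """)
    parser = nltk.ChartParser(grammar)

    print(list(parser.parse("I saw the man".split())))  # one parse: accepted
    print(list(parser.parse("saw the man I".split())))  # no parses: rejected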

It's true that some problems are so ill-defined that all you can judge is whether or not a particular technique succeeds over a sample of interesting cases. But not all problems are like that.

>what methods of reconstructing complex 3D geometry from real world photographs have some way of proving that their transformation is correct within certain bounds?

The issue I have with this is that it's essentially giving up on understanding how reconstruction of 3D geometry works. One might at least hope that the techniques that make it possible to do this from real-world photographs are, with some idealization, the same techniques that make it possible to do this (nondeterministically) from a 2D perspective rendering of a 3D scene made of polygons. And we certainly can prove results about the effectiveness of those techniques. I think it's far too early to give up on that possibility and just say "it's all a mess, and whatever methods happen to work happen to work".

>we know how to make misleading objects that produce visual illusions to appear to have a much different 3d shape that they do?

But that supports my point, I think. We can prove that those objects have the propensity to give rise to illusions given certain assumptions regarding the algorithms used to reconstruct the scene. We can't (yet) prove what kinds of objects would fool a deep learning model.


Okay, let's take sentence parsing instead; I've got much more background in that. If we're looking at classical techniques in the sense of 'classical' as in techniques popularized pre-neural-networks, some 10+ years ago, e.g. something like Charniak-Johnson or Nivre's MaltParser (generally augmented with all kinds of tricks, model ensembles, transfer learning, custom preprocessing for structured data such as dates, and a whole NLP pipeline before the syntax part even starts - all of this was pretty much a must-have in any real usage), then all the same criticisms apply: the factors that the statistical model learns aren't human-understandable, and the concept of "accepted sentences" is meaningless (IMHO rightfully so) - the parser accepts everything, and the question is about the proper interpretation/ranking of potentially ambiguous parts. Even simple methods such as lexicalized PCFGs fall into this bucket; pretty much all the "knowledge" is embedded in the learned probabilities, and there isn't much meaningful interpretability lost by switching from a PCFG to neural networks.

On the other hand, if we think of 'classical techniques' as something a textbook would describe as an example, e.g. a manually, carefully built non-lexicalized CFG, then these fall under the "highly simplified artificial case" I talked about earlier - they provide a clean, elegant, understandable solution to a small subset of the problem, a subset that no one really cares about. They simply aren't anywhere near competitive on any real-world data: they either "don't accept" a large portion of even very clean data, or they provide a multitude of interpretations while lacking the power to produce good rankings comparable to state-of-the-art pipelines run by statistical learning or, recently, neural networks.

Furthermore, syntactic parsing of sentences has exactly these theoretical limits to provability - there is no good source of "truth", and no good source of truth is possible. If you follow a descriptive linguistic approach, then English (or any other language) can only be defined by lots of examples; and if you follow a prescriptive approach (which could be made into some finite formal model), then you get plenty of "ungrammatical" sentences even in literal literary English, e.g. in reviewed and corrected works by respected authors, and even more so in any real-world text you're likely to encounter. Careful human annotators disagree on some 3% of primitive elements, i.e. once every 2-3 sentences - how can you prove that any model will get the correct answer if, for almost half the sentences, we cannot agree on what exactly the correct answer is?


Of course it is not possible to prove that a particular algorithm "does the right thing", because "doing the right thing" is an inherently vague notion. The issue with neural networks is that we often don't understand how they work even under idealized conditions. In the case of the PCFG, we can characterize precisely which sentences will or won't be parsed for a given parsing algorithm. We are never going to have an explanation of how real world parsing works because the real world is too complicated. But we might hope to figure out how the techniques that work in the real world can be understood as extensions of techniques that work in idealized conditions. The PCFG is a good example of that. There's nothing to understand about the probabilities. As you say, they're amalgamations of an indefinite number of real-world factors. But there is a core to the parsing algorithm that we do understand.
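
To make "the core we do understand" concrete, here's a toy non-lexicalized PCFG sketch using NLTK (the rules and probabilities are invented for illustration). The set of parseable sentences is fixed entirely by the rules; the probabilities only decide which of the ambiguous parses wins - classic PP attachment in this case:

    import nltk

    grammar = nltk.PCFG.fromstring("""
        S   -> NP VP                            [1.0]
        NP  -> 'I' [0.4] | Det N [0.4] | NP PP [0.2]
        VP  -> V NP [0.7] | VP PP [0.3]
        PP  -> P NP                             [1.0]
        Det -> 'the'                            [1.0]
        N   -> 'man' [0.5] | 'telescope' [0.5]
        V   -> 'saw'                            [1.0]
        P   -> 'with'                           [1.0]
    """)

    parser = nltk.ViterbiParser(grammar)
    sentence = "I saw the man with the telescope".split()
    for tree in parser.parse(sentence):  # yields the single most probable parse
        tree.pretty_print()

Swap in learned probabilities (or a neural scorer) and the ranking changes, but which strings are in the language at all remains a property of the grammar we can reason about.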


People have used genetic algorithms to evolve sets of images that fool DL machine vision models. Your point stands, though: there's no deterministic method that I know of that can generate images optimized to fool DL models, and that'd indicate a black-box model.
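
For flavour, the evolutionary approach is roughly this kind of loop - a minimal sketch assuming some classifier confidence function to attack (stubbed out here; in the actual papers it's a trained conv net):

    import numpy as np

    def panda_confidence(image):
        # Placeholder for a real model's confidence that `image` is a panda.
        return float(np.tanh(image.mean()))

    def evolve_fooling_image(steps=500, pop=20, noise=0.05, seed=0):
        # Simple (1+lambda)-style evolution: mutate the current best, keep the top scorer.
        rng = np.random.default_rng(seed)
        best = rng.random((32, 32))
        best_score = panda_confidence(best)
        for _ in range(steps):
            candidates = np.clip(best + rng.normal(0, noise, size=(pop, 32, 32)), 0, 1)
            scores = [panda_confidence(c) for c in candidates]
            i = int(np.argmax(scores))
            if scores[i] > best_score:
                best, best_score = candidates[i], scores[i]
        return best, best_score

    image, score = evolve_fooling_image()
    print(score)  # with a real model, this climbs toward a confident (wrong) label

Run against a real model, the result is typically an image that looks like noise to us but gets a very confident label, which is exactly the "we can't characterize what fools it" problem.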


> I'm getting kind of sick of this "deep learning is a black box" trope

I'm not even a deep learning fan (see username), but this is a meme people need to get rid of. This meme is probably hurting people more than the actual opacity of deep learning.



