This seems like a great introduction to the history. I have a problem with it, t...

twillmas · on Sept 22, 2017

I'd also like to see more of a "teaching" post that can walk through the math incrementally.

Many of the deep learning courses assume "high school math", but my school must have skipped matrices, so I've been watching Khan Academy videos.

Are there any good posts / books on walking through the math of deep learning from a true beginner's perspective?

yorwba · on Sept 22, 2017

The other replies are already telling you that these are just examples. I want to stress that these are completely unrelated examples, which is bad form IMO.

If the first example had been kept, then the second would have been "Error = (6 - (2·3 + 1))² + (9 - (2·6 + 1))² + (18 - (2·12 + 1))² = (-1)² + (-4)² + (-7)² = 66", which is what compute_error_for_line_given_points evaluates to.
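To make that arithmetic easy to check, here is a minimal sketch in Python. The helper name `sum_squared_error` is my own, not the post's function. I'm assuming the error is the plain sum of squared residuals with no averaging over the number of points, since that is what the formula above spells out.

```python
def sum_squared_error(b, m, points):
    """Sum of squared residuals for the line y = m*x + b (hypothetical helper)."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Same values as the first example: intercept 1, slope 2, three (x, y) points.
points = [[3, 6], [6, 9], [12, 18]]

# Residuals are -1, -4 and -7, so the squares add up to 1 + 16 + 49.
error = sum_squared_error(1, 2, points)
print(error)  # 66
```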

The third would have been "Error = (6 - (m·3 + b))² + (9 - (m·6 + b))² + (18 - (m·12 + b))² = 3·b² + 42·b·m - 66·b + 189·m² - 576·m + 441" and its derivative would have to be taken in two directions, giving "dError/dm = 42·b + 378·m - 576" and "dError/db = 6·b + 42·m - 66". Visualizing that slope would require a 3D plot.
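If you want to convince yourself of the expansion and the two partial derivatives, one option is to compare them against finite differences. This is only a sketch: the function names and the step size `h` are my own choices, not anything from the post.

```python
POINTS = [(3, 6), (6, 9), (12, 18)]


def error(b, m):
    """Sum of squared residuals for y = m*x + b over POINTS."""
    return sum((y - (m * x + b)) ** 2 for x, y in POINTS)


def error_expanded(b, m):
    """The same error, written as the expanded polynomial in b and m."""
    return 3 * b**2 + 42 * b * m - 66 * b + 189 * m**2 - 576 * m + 441


def analytic_gradient(b, m):
    """Partial derivatives of the expanded polynomial: (dError/dm, dError/db)."""
    d_m = 42 * b + 378 * m - 576
    d_b = 6 * b + 42 * m - 66
    return d_m, d_b


def numeric_gradient(b, m, h=1e-6):
    """Central-difference approximation of (dError/dm, dError/db)."""
    d_m = (error(b, m + h) - error(b, m - h)) / (2 * h)
    d_b = (error(b + h, m) - error(b - h, m)) / (2 * h)
    return d_m, d_b


b, m = 1.0, 2.0
print(error(b, m), error_expanded(b, m))  # both forms agree at this point
print(analytic_gradient(b, m))            # (222.0, 24.0)
print(numeric_gradient(b, m))             # close to the analytic values
```

At b = 1, m = 2 both derivatives are positive, so the error shrinks if either parameter is nudged downward.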

letlambda · on Sept 22, 2017

>Am I missing something fundamental here?

Yeah, these aren't magic formulas, they are just examples.

>In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?

It's an example. The first two arguments define a line y = 2x + 1, the pairs are (x,y) points being used to compute the error.

"To play with this, let’s assume that the error function is Error=x^5−2x^3−2"

This is just an example of a function used as exposition to talk about derivatives.

It isn't even an error function though. An error function has to be a function of at least two variables.

emilwallner · on Sept 22, 2017

Good point. They are all example data. The "[[3,6],[6,9],[12,18]]" can be thought of as the coordinates of a comet, and 2 is your predicted correlation, the slope, followed by 1, your predicted constant, the y-intercept. In this case, you want to change 2 and 1 to find the combination that results in the lowest error. It's the same with "Error = 4^2 + (-1)^2 + 6^2": it's an example of an error function. Does that make sense?
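As a rough sketch of what "change 2 and 1 to find the lowest error" can look like in code, here is plain gradient descent on the same three points. The function names, the learning rate and the number of steps are assumptions I picked so that this tiny example converges; they are not values from the post.

```python
POINTS = [(3, 6), (6, 9), (12, 18)]


def error(b, m):
    """Sum of squared residuals for y = m*x + b over POINTS."""
    return sum((y - (m * x + b)) ** 2 for x, y in POINTS)


def gradient(b, m):
    """Partial derivatives of the error with respect to b and m."""
    d_b = sum(-2 * (y - (m * x + b)) for x, y in POINTS)
    d_m = sum(-2 * x * (y - (m * x + b)) for x, y in POINTS)
    return d_b, d_m


def gradient_descent(b, m, learning_rate=0.002, steps=10_000):
    """Repeatedly step b and m against the gradient (assumed hyperparameters)."""
    for _ in range(steps):
        d_b, d_m = gradient(b, m)
        b -= learning_rate * d_b
        m -= learning_rate * d_m
    return b, m


# Start from the guess used in the example: intercept 1, slope 2.
b, m = gradient_descent(1.0, 2.0)
print(b, m, error(b, m))
```

Starting from an error of 66, this should settle near b = 1.5 and m = 19/14 (about 1.357), where both partial derivatives are zero and the error is roughly 0.64.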