
This seems like a great introduction to the history. I have a problem with it, though.

In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?

Later in that same example, there is an "Error = 4^2 + (-1)^2 + 6^2". Where did those values come from?

Later, there's another form: "Error = x^5 - 2x^3 - 2". What about these?

There seem to be magic formulae everywhere, with no real explanation in the article about where they came from. Without that, I have no way of actually understanding this.

Am I missing something fundamental here?



I'd also like to see more of a "teaching" post that can walk through the math incrementally.

Many of the deep learning courses assume "high school math", but my school must have skipped matrices, so I've been watching Khan Academy videos.

Are there any good posts / books on walking through the math of deep learning from a true beginner's perspective?


The other replies are already telling you that these are just examples. I want to stress that these are completely unrelated examples, which is bad form IMO.

If the first example had been kept, then the second would have been "Error = (6 - (2·3 + 1))² + (9 - (2·6 + 1))² + (18 - (2·12 + 1))² = (-1)² + (-4)² + (-7)² = 66", which is what compute_error_for_line_given_points evaluates to.
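That computation can be sketched in a few lines. This is an assumed reconstruction of the article's `compute_error_for_line_given_points` using the usual sum-of-squared-errors definition, with `b` the intercept and `m` the slope:

```python
# Sum-of-squared-errors for the line y = m*x + b over a list of (x, y) points.
# A sketch of the article's function, assuming the standard SSE definition.
def compute_error_for_line_given_points(b, m, points):
    total_error = 0
    for x, y in points:
        total_error += (y - (m * x + b)) ** 2
    return total_error

# The article's call: b = 1, m = 2, three example points.
print(compute_error_for_line_given_points(1, 2, [[3, 6], [6, 9], [12, 18]]))
# (-1)^2 + (-4)^2 + (-7)^2 = 66
```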

The third would have been "Error = (6 - (m·3 + b))² + (9 - (m·6 + b))² + (18 - (m·12 + b))² = 3·b² + 42·b·m - 66·b + 189·m² - 576·m + 441" and its derivative would have to be taken in two directions, giving "dError/dm = 42·b + 378·m - 576" and "dError/db = 6·b + 42·m - 66". Visualizing that slope would require a 3D plot.
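Those two partial derivatives are easy to sanity-check numerically. The following sketch (my own, not from the article) compares the closed-form expressions above against central finite differences at an arbitrary point (m, b) = (2, 1):

```python
# Error surface over (m, b) for the three example points, plus the
# closed-form partial derivatives given above.
def error(m, b):
    pts = [(3, 6), (6, 9), (12, 18)]
    return sum((y - (m * x + b)) ** 2 for x, y in pts)

def d_error_dm(m, b):
    return 42 * b + 378 * m - 576

def d_error_db(m, b):
    return 6 * b + 42 * m - 66

# Central finite-difference check at (m, b) = (2, 1).
h = 1e-6
m, b = 2.0, 1.0
num_dm = (error(m + h, b) - error(m - h, b)) / (2 * h)
num_db = (error(m, b + h) - error(m, b - h)) / (2 * h)
print(num_dm, d_error_dm(m, b))  # both approx. 222
print(num_db, d_error_db(m, b))  # both approx. 24
```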


> Am I missing something fundamental here?

Yeah, these aren't magic formulas; they're just examples.

>In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?

It's an example. The first two arguments define a line y = 2x + 1, the pairs are (x,y) points being used to compute the error.

"To play with this, let’s assume that the error function is Error=x^5−2x^3−2"

This is just an example of a function used as exposition to talk about derivatives.
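For that toy function the "slope" the article talks about is just the ordinary calculus derivative, f'(x) = 5x^4 - 6x^2. A quick sketch (mine, not the article's code) checking it against a finite difference:

```python
# The article's toy function f(x) = x^5 - 2x^3 - 2 and its derivative,
# verified with a central finite difference at x = 1.5.
def f(x):
    return x ** 5 - 2 * x ** 3 - 2

def df(x):
    return 5 * x ** 4 - 6 * x ** 2

h = 1e-6
x = 1.5
print(df(x), (f(x + h) - f(x - h)) / (2 * h))  # both approx. 11.8125
```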

It isn't even an error function though. An error function has to be a function of at least two variables.


Good point. They are all example data. The "[[3,6],[6,9],[12,18]]" can be thought of as the coordinates of a comet; 2 is your predicted correlation (the slope), and 1 is your predicted constant (the y-intercept). In this case, you want to change 2 and 1 to find the combination that results in the lowest error. It's the same with "Error = 4^2 + (-1)^2 + 6^2": it's an example of an error function. Does that make sense?
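That search for the lowest-error combination is exactly what gradient descent does. A minimal sketch (assumed, not the article's implementation) starting from the guesses b = 1, m = 2 and repeatedly stepping against the gradient:

```python
# Gradient descent on the sum-of-squared-errors for the example points,
# starting from the article's guesses b = 1, m = 2.
points = [[3, 6], [6, 9], [12, 18]]

def step(b, m, lr=0.001):
    # Partial derivatives of sum((y - (m*x + b))^2) w.r.t. b and m.
    grad_b = sum(-2 * (y - (m * x + b)) for x, y in points)
    grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in points)
    return b - lr * grad_b, m - lr * grad_m

b, m = 1.0, 2.0
for _ in range(10000):
    b, m = step(b, m)
print(round(m, 3), round(b, 3))  # converges toward m = 19/14, b = 1.5
```

The learning rate and iteration count here are arbitrary choices small enough to converge for this tiny problem.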



