This seems like a great introduction to the history. I have a problem with it, though.
In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?
Later in that same example, there is an "Error = 4^2 + (-1)^2 + 6^2". Where did those values come from?
Later, there's another form: "Error = x^5 - 2x^3 - 2". What about this one?
There seem to be magic formulae everywhere, with no real explanation in the article about where they came from. Without that, I have no way of actually understanding this.
The other replies are already telling you that these are just examples. I want to stress that these are completely unrelated examples, which is bad form IMO.
If the first example had been kept, then the second would have been "Error = (6 - (2·3 + 1))² + (9 - (2·6 + 1))² + (18 - (2·12 + 1))² = (-1)² + (-4)² + (-7)² = 66", which is what compute_error_for_line_given_points evaluates to.
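For anyone who wants to check that 66, here is a minimal Python sketch. It assumes the argument order (b, m, points) implied by the call in the article, and a plain sum of squared residuals with no averaging. The article's actual function may differ, for example by dividing by the number of points.

```python
def compute_error_for_line_given_points(b, m, points):
    """Sum of squared residuals for the line y = m*x + b (assumed form)."""
    total = 0
    for x, y in points:
        residual = y - (m * x + b)  # vertical distance from point to line
        total += residual ** 2
    return total


points = [[3, 6], [6, 9], [12, 18]]
error = compute_error_for_line_given_points(1, 2, points)
print(error)  # 66, i.e. (-1)^2 + (-4)^2 + (-7)^2
```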
The third would have been "Error = (6 - (m·3 + b))² + (9 - (m·6 + b))² + (18 - (m·12 + b))² = 3·b² + 42·b·m - 66·b + 189·m² - 576·m + 441" and its derivative would have to be taken in two directions, giving "dError/dm = 42·b + 378·m - 576" and "dError/db = 6·b + 42·m - 66". Visualizing that slope would require a 3D plot.
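Those two partial derivatives can be checked in code too. This is only a sketch: the helper names (error, gradient, descend), the learning rate and the step count are my own choices, not anything from the article.

```python
POINTS = [(3, 6), (6, 9), (12, 18)]


def error(b, m, points=POINTS):
    # Sum of squared residuals for the line y = m*x + b.
    return sum((y - (m * x + b)) ** 2 for x, y in points)


def gradient(b, m, points=POINTS):
    # Partial derivatives of the error with respect to b and m.
    d_b = sum(-2 * (y - (m * x + b)) for x, y in points)
    d_m = sum(-2 * x * (y - (m * x + b)) for x, y in points)
    return d_b, d_m


def descend(b, m, learning_rate=0.001, steps=20000):
    # Plain gradient descent: step against the gradient each iteration.
    for _ in range(steps):
        d_b, d_m = gradient(b, m)
        b -= learning_rate * d_b
        m -= learning_rate * d_m
    return b, m


print(gradient(1, 2))  # (24, 222), matching 6b + 42m - 66 and 42b + 378m - 576
best_b, best_m = descend(1.0, 2.0)
print(best_b, best_m, error(best_b, best_m))
```

Setting both derivatives to zero by hand gives b = 1.5 and m = 19/14, which is where the loop should end up when started from the article's b = 1, m = 2.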
>Am I missing something fundamental here?
Yeah, these aren't magic formulas, they are just examples.
>In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?
It's an example. The first two arguments define the line y = 2x + 1 (intercept 1, then slope 2), and the pairs are the (x, y) points used to compute the error.
"To play with this, let’s assume that the error function is Error=x^5−2x^3−2"
This is just an example of a function used as exposition to talk about derivatives.
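If it helps, here is a rough Python sketch of what "following the derivative downhill" looks like for that one-variable function. The starting point, learning rate and step count are arbitrary picks of mine, not values from the article.

```python
def f(x):
    return x**5 - 2 * x**3 - 2


def f_prime(x):
    # d/dx (x^5 - 2x^3 - 2) = 5x^4 - 6x^2
    return 5 * x**4 - 6 * x**2


def descend(x, learning_rate=0.01, steps=500):
    # Repeatedly move against the slope.
    for _ in range(steps):
        x -= learning_rate * f_prime(x)
    return x


x_min = descend(2.0)
print(x_min, f(x_min))
```

Starting from x = 2 this settles at the local minimum where 5x^4 = 6x^2, i.e. x = sqrt(6/5). The function has no global minimum (it keeps falling as x goes to minus infinity), so the starting point matters.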
It isn't even an error function though. An error function has to be a function of at least two variables.
Good point. They are all example data. The "[[3,6],[6,9],[12,18]]" can be thought of as the coordinates of a comet; 2 is your predicted correlation (the slope) and 1 is your predicted constant (the y-intercept). In this case, you want to change 2 and 1 to find the combination that results in the lowest error. It's the same with "Error = 4^2 + (-1)^2 + 6^2": it's an example of an error function. Does that make sense?