(2) Linear Algebra
(2.1) Linear Equations
The start of linear algebra was seen in high school algebra: solving systems of linear equations.
E.g., we seek numerical values of x and y so that

  3 x - 2 y = 7
  -x + 2 y = 8

So, that is two equations in the two unknowns x and y.
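For instance, a quick numerical check of this little system (NumPy is just an assumed, convenient choice here; the original comment uses no code):

  # A minimal sketch: solve the 2 x 2 system above.
  import numpy as np

  A = np.array([[ 3.0, -2.0],
                [-1.0,  2.0]])   # coefficients of x and y
  b = np.array([7.0, 8.0])       # right-hand side

  xy = np.linalg.solve(A, b)     # unique solution since det(A) = 4 != 0
  print(xy)                      # [7.5, 7.75], i.e., x = 7.5, y = 7.75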
Well, for positive integers m and n, we can have m linear equations in n unknowns (the example above is linear; a careful definition is omitted here). Then, depending on the constants, there will be none, one, or infinitely many solutions.
E.g., likely the central technique of ML and data science is fitting a linear equation to data. There the central idea is the set of normal equations, which are linear (and, crucially, have a coefficient matrix that is symmetric and positive semi-definite, as covered carefully in linear algebra).
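As a hedged sketch of that idea (the data below is made up and NumPy is an assumed choice, neither from the original comment), fitting a line y ~ c0 + c1 t by the normal equations looks like:

  import numpy as np

  # Made-up data for illustration only.
  t = np.array([0.0, 1.0, 2.0, 3.0])
  y = np.array([1.1, 2.9, 5.2, 6.8])

  # Design matrix: a column of ones and the column of t values.
  X = np.column_stack([np.ones_like(t), t])

  # Normal equations: (X^T X) c = X^T y.  The matrix X^T X is symmetric
  # and positive semi-definite, as noted above.
  c = np.linalg.solve(X.T @ X, X.T @ y)
  print(c)   # [intercept, slope]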
(2.2) Gauss Elimination
The first technique for attacking linear equations is Gauss elimination. With it we can determine whether there are none, one, or infinitely many solutions. If there is exactly one solution, we can find it. If there are infinitely many, we can find one solution and characterize the rest in terms of arbitrary values of several of the variables.
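A small illustration of one of the outcomes, using SymPy's row reduction as a stand-in for hand elimination (the library choice is an assumption; the comment names only the method):

  import sympy as sp

  # Augmented matrix [A | b] for a system with infinitely many solutions:
  #   x + y + z = 1
  #   2x + 2y + 2z = 2   (just a multiple of the first equation)
  M = sp.Matrix([[1, 1, 1, 1],
                 [2, 2, 2, 2]])

  rref, pivots = M.rref()
  print(rref)    # row-reduced form: one nonzero row remains
  print(pivots)  # (0,) -- one pivot for three unknowns, so two of the
                 # variables can take arbitrary values: infinitely many solutions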
(2.3) Vectors and Matrices
A nice step forward in working with systems of linear equations is the subject of vectors and matrices.
A good start is just the system

  3 x - 2 y = 7
  -x + 2 y = 8

we saw above. What we do is rip out the x and y and call that pair a vector, collect the coefficients on the left into a matrix, and regard the constants on the right side as another vector. Then the left side becomes the matrix product of the matrix of coefficients and the vector of the unknowns x and y.
The matrix will have two rows and two columns, written roughly as

  /       \
  |  3 -2 |
  |       |
  | -1  2 |
  \       /

So, this matrix is said to be 2 x 2 (2 by 2).
Sure, for positive integers m and n, we can have a matrix that is m x n (m by n), which means m rows and n columns.
The vector of the unknowns x and y is 2 x 1 and is written

  /   \
  | x |
  |   |
  | y |
  \   /
So, we can say that the matrix is A; the unknowns are the components of vector v; the right side is vector b; and the system of equations is

  Av = b

where Av is the matrix product of A and v. How is this product defined? It is defined to give us just what we had with the equations we started with -- here omitting a careful definition.
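To spell out what that definition does in this example (my own sketch, not from the original comment): component i of Av is the sum over j of A[i][j] times v[j], which reproduces the left-hand sides of the equations.

  import numpy as np

  A = np.array([[ 3.0, -2.0],
                [-1.0,  2.0]])
  v = np.array([7.5, 7.75])      # the solution (x, y) found earlier

  # The matrix-vector product written out from the definition:
  Av = np.array([sum(A[i, j] * v[j] for j in range(2)) for i in range(2)])
  print(Av)        # [7. 8.] -- the right-hand side b
  print(A @ v)     # the same thing via NumPy's built-in product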
So, we use a matrix and two vectors as new notation to write our system of linear equations. That's the start of matrix theory.
It turns out that our new notation is another pillar of civilization.
Given an m x n matrix A and an n x p matrix B, we can form the m x p matrix product AB. Amazingly, this product is associative. That is, if we also have a p x q matrix C, then we can form the m x q product

  ABC = (AB)C = A(BC)

It turns out this fact is profound and powerful.
The proof is based on interchanging the order of two summation signs, and that fact generalizes.
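A numerical spot check of associativity, not a proof (random matrices, with shapes chosen to match the m x n, n x p, p x q pattern above):

  import numpy as np

  rng = np.random.default_rng(0)
  A = rng.standard_normal((2, 3))   # m x n
  B = rng.standard_normal((3, 4))   # n x p
  C = rng.standard_normal((4, 5))   # p x q

  print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True, up to roundoff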
Matrix product is the first good example of a linear operator in a linear system. The world is awash in linear systems. There is a lot on linear operators, e.g., Dunford and Schwartz, Linear Operators. Electronic engineering, acoustics, and quantum mechanics are awash in linear operators.
To build a model of the real world, for ML, AI, data science, etc., the obvious first cut is to build a linear system.
And if one linear system does not fit very well, then we can use several, in patches of some kind.
(2.4) Vector Spaces
For the set of real numbers R and a positive integer n, consider the set V of all n x 1 vectors of real numbers. Then V is a vector space. We can write out the definition of a vector space and see that the set V does satisfy that definition. That's the first vector space we get to consider.
But we encounter lots more vector spaces; e.g., in 3 dimensions, a 2-dimensional plane through the origin is also a vector space.
Gee, I mentioned dimension; we need a good definition and a lot of associated theorems. Linear algebra has those.
So, for matrix A, vector x, and vector of zeros 0, the set of all solutions x to

  Ax = 0

is a vector space, and it and its dimension are central to what we get in many applications, e.g., at the end of Gauss elimination, fitting linear equations to data, etc.
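For a concrete look at that solution set (SciPy's null_space is an assumed convenience here; any way of computing a basis would do):

  import numpy as np
  from scipy.linalg import null_space

  A = np.array([[1.0, 1.0, 1.0],
                [2.0, 2.0, 2.0]])   # rank 1, so the solutions of Ax = 0
                                    # form a 2-dimensional vector space

  N = null_space(A)                 # columns form a basis of that space
  print(N.shape[1])                 # 2 -- its dimension
  print(np.allclose(A @ N, 0.0))    # True: every column solves Ax = 0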
(2.5) Eigenvalues and Eigenvectors
Eigen in German translates to English roughly as own, proper, characteristic, or some such.
Well, for an n x n matrix A, we might have that

  Ax = lx

for some number l and some nonzero vector x. In this case what matrix A does to vector x is just change its length by the factor l and keep its direction the same. So, l and x are quite special. Then l is an eigenvalue of A, and x is a corresponding eigenvector of A.
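A quick numerical check of that definition (NumPy again, as an assumed convenience):

  import numpy as np

  A = np.array([[2.0, 1.0],
                [1.0, 2.0]])

  vals, vecs = np.linalg.eig(A)             # eigenvalues 3 and 1 here
  for l, x in zip(vals, vecs.T):            # columns of vecs are eigenvectors
      print(l, np.allclose(A @ x, l * x))   # True for each pair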
These eigen quantities are central to the crucial singular value decomposition, the polar decomposition, principal components, etc.
(2.6) Texts
A good, now quite old, intermediate text in linear algebra is the one by Hoffman and Kunze, IIRC now available for free as a PDF on the Internet.
A special, advanced linear algebra text is P. Halmos, Finite Dimensional Vector Spaces, written in 1942 when Halmos was an assistant to John von Neumann at the Institute for Advanced Study. The text is an elegant finite-dimensional introduction to infinite-dimensional Hilbert space.
At
http://www.american.com/archive/2008/march-april-magazine-co...
is an entertaining article about Harvard's course Math 55. At one time that course used that book by Halmos and also, see below, Baby Rudin.
For more there are
Richard Bellman, Introduction to Matrix
Analysis.
Horn and Johnson, Matrix Analysis.
There is much more, e.g., on numerical methods. There a good start is LINPACK: the software, the associated documentation, and references.
(5) More
The next two topics would be probability theory and statistics.
For a first text in either of these two, I'd suggest you find several leading research universities, call their math departments, and find what texts they are using for their first courses in probability and statistics. I'd suggest you get the three most recommended texts, carefully study the most recommended one, and use the other two for reference.
Similarly for calculus and linear algebra.
For more, that would take us into a ugrad math major. Again, make some phone calls for a list of recommended texts. One of those might be
W. Rudin, Principles of Mathematical Analysis,
aka "Baby Rudin". It's highly precise and challenging.
For more,
H. Royden, Real Analysis
W. Rudin, Real and Complex Analysis
L. Breiman, Probability
M. Loeve, Probability
J. Neveu, Mathematical Foundations of the Calculus of Probability
The last two are challenging.
For Bayesian statistics, that's conditional expectation, from the Radon-Nikodym theorem, with a nice proof by John von Neumann in Rudin's Real and Complex Analysis.
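For concreteness, the defining property that the Radon-Nikodym argument delivers can be written as follows (a sketch in LaTeX notation added here; it is not in the original comment):

  % For integrable X on (Omega, F, P) and a sub-sigma-algebra G of F,
  % E[X | G] is the (essentially unique) G-measurable random variable with
  \[
    \int_A \mathbb{E}[X \mid \mathcal{G}] \, dP = \int_A X \, dP
    \qquad \text{for every } A \in \mathcal{G}.
  \]
  % Existence follows from the Radon-Nikodym theorem applied to the
  % measure nu(A) = \int_A X dP, which is absolutely continuous with
  % respect to P.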
After those texts, you can often derive the main results of statistics on your own or just use Wikipedia a little. E.g., for the Neyman-Pearson result in statistical hypothesis testing, there is a nice proof via the Hahn decomposition, which appears in the development of the Radon-Nikodym theorem.
I have been inspired by some of your past posts suggesting a path for studying mathematics and doing graduate level work, and have changed my direction to try and follow what you suggest. Is there any way I can get in touch with you privately? (I'm not looking for help with specific technical questions if you're concerned about that.)
Are you doing the "Get the book. Read the book. Do the exercises." method? If you are, what's your experience?
I have had some books stored up since forever, and graycat's post did motivate me to finally get around to reading them, but I find it hard to integrate into my daily routine. His 24h challenge killed my productivity for a day, and I can't really afford to get distracted by some tricky proof when I'm supposed to do something else.
Yes, I'm working through a few books that way. I didn't see his 24h challenge so I'm not sure what it is, but what has been effective for me is blocking off a few hours every day to work on this stuff. I haven't gotten to the really difficult material he's talking about yet, but I'm looking forward to seeing how this goes. Good luck to both of us!
In a different comment chain on the same submission (https://news.ycombinator.com/item?id=15024640), he challenged the commenters disagreeing with him to do these exercises in 24 hours. The tone was pretty abrasive, TBH, but I found the questions interesting enough that I tackled them in earnest.
I posted my solution attempts, so don't scroll down too far if you want to try them on your own ;)