I'm not sure what scope you have in mind for your first project, but I would recommend you begin by understanding and implementing some well-known algorithms. Start with a Gaussian mixture model trained with the EM algorithm. Then do linear and logistic regression. The perceptron is also pretty simple, and it is a simpler relative of the widely used support vector machine.
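To give a sense of how small these first implementations can be, here is a minimal sketch of the classic perceptron update rule in plain Python. The function names and the toy data are made up for illustration, and it assumes labels are coded as +1 and -1.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Classic perceptron rule; labels are assumed to be +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: stop early
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Toy, linearly separable data (invented for this example).
X = [(2.0, 1.0), (3.0, 2.5), (2.5, 3.0), (-1.0, -2.0), (-2.0, -1.5), (-3.0, -0.5)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

On separable data like this the loop stops as soon as a full pass makes no mistakes. On data that is not separable it just runs out the epoch budget, which is worth seeing for yourself.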
There are so many possible directions that somebody might want to go, though. If you're ultimately interested in reinforcement learning, you don't immediately need to understand EM in order to "get" Q-learning. Or, to work on discrete problems related to computer vision, a grounding in linear programming and algorithms will get you further than logistic regression or SVMs.
I'll agree that anybody doing machine learning should understand the basic idea of maximum likelihood learning (e.g., logistic regression). But how far do you go beyond that? This is really the heart of the issue, I think. Yes, there are a ton of things that are really useful to know, and any machine learning student should learn them at some point. If you're trying to do an interesting project that teaches you something about research (and thus looks good on a graduate school application), though, what is the best use of your time?
You're right in that you need direction. Each subclass of machine learning has some elementary algorithms worth learning:
For unsupervised learning or probabilistic latent models, the EM algorithm, Metropolis-Hastings sampling, variational methods (variational EM, variational Bayes, expectation propagation).
For supervised learning, linear regression, perceptrons, support vector machines.
For reinforcement learning, Q-learning, E^3, Rmax.
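As an example of how elementary some of these are, here is a rough sketch of tabular Q-learning in plain Python. The five-state "walk right to the goal" environment, the hyperparameters, and all the names are assumptions made up for this illustration. It explores with a uniformly random policy, which works because Q-learning is off-policy.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # uniformly random behaviour policy
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best action in the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

Reading the greedy policy off the table (it should prefer moving right everywhere) and printing the Q-values after a few episodes is a good way to see the reward propagate backwards from the goal.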
Start by doing. Pick a relatively easy algorithm, implement it, and fully understand why it's doing what it's doing. If you start out by implementing something with extreme math-fu, it may just seem like magic.
My first ML project was to implement STAGGER (Schlimmer and Granger, 1986), which is a very simple algorithm for handling concept drift. Then I trained it on the domain {red, green, blue} X {square, circle, triangle} X {small, medium, large}. I fed it 40 positive examples of small red square, then 40 positive examples of large green triangle, then 40 positive examples of medium blue circle, and watched the learned concept change. I understood how and why it worked, and that felt pretty good.
For what it's worth, I think it's important to start with interesting problems. My first interaction with ML was implementing Naive Bayes from scratch (i.e., no libraries) to classify spam, borrowing many ideas from PG's A Plan for Spam. This is what got me really interested in the field, much more than randomly picking up topics; there are just so many areas to choose from. Another approach would be to read up on standard supervised learning techniques and just observe how the parameters of these algorithms behave on datasets. Something like Weka really comes in handy if you wish to focus on analyzing the behavior of such techniques first. Best of luck!
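For anyone curious what "from scratch" looks like, here is a minimal sketch of a textbook multinomial Naive Bayes classifier with add-one smoothing. To be clear, this is the generic textbook version, not the exact scoring scheme from A Plan for Spam, and the function names and the four training messages are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(documents):
    """documents: list of (text, label) pairs. Returns the counts needed to classify."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocabulary = set()
    for text, label in documents:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # log prior plus summed log likelihoods, with add-one (Laplace) smoothing
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, just to exercise the code.
training = [
    ("win cash prize now", "spam"),
    ("cheap pills win money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(training)
```

Working in log space avoids underflow once messages get long, and the smoothing keeps a single unseen word from zeroing out a class.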
Doing. If I just read, I don't get the other parts, like getting the debugging aspect correct, finding out the requirements of the environment, etc. Plus, I have less investment if I haven't typed the code in myself.
Edit: And continue with the formalized learning through reading (and experimentation). Later get a handle on common conventions, as they usually help accuracy/readability.
The handwriting recognition database here is fantastic for testing a variety of simple ML models:
http://yann.lecun.com/exdb/mnist/
In our machine learning class, we would use data from the KDD cup for our projects. Why don't you create a submission for old KDD cups and see if your model can do better than random? 1998 is good for logistic and linear regression:
http://www.kdnuggets.com/meetings/kdd98/kdd-cup-98.html
Using the 2007 dataset, you can try out some of the matrix factorization methods that have worked well for the Netflix prize:
http://www.cs.uic.edu/~liub/Netflix-KDD-Cup-2007.html
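If matrix factorization is new to you, the basic idea fits in a few lines. Below is a rough sketch that fits user and item factor vectors by stochastic gradient descent on observed (user, item, rating) triples. It does not read the KDD Cup files: the toy ratings, the hyperparameters, and the function names are all assumptions for illustration.

```python
import random


def factorize(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.02,
              epochs=300, seed=0):
    """Fit factor vectors by SGD on squared error with L2 regularization."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(users[u], items[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items


def rmse(ratings, users, items):
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(users[u], items[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


# Made-up (user, item, rating) triples: 4 users, 3 items.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
users, items = factorize(toy_ratings, n_users=4, n_items=3)
train_error = rmse(toy_ratings, users, items)
```

This only measures training error. On the real data you would hold out some ratings and watch the held-out RMSE while tuning the number of factors and the regularization.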
These should all be relatively small tasks. Learning how to interpret your results and iterate your models to make them better will take longer than understanding the algorithms.