Implementing interpreters is cool. But the key value of studying compilers is that it provides you with the tools to build robust, extensible input-processing (and, to a slightly lesser extent, data-processing and output) code. If programmers are confident using regular expressions and context-free grammars as appropriate, even if it's just coding up recursive descent parsers, I think they can produce much better input processing in their software.
And a large quantity of software needs to parse input, and possibly turn it into different output, at some point in its execution.
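To make that concrete, here's the kind of throwaway recursive descent parser I mean: a sketch in Python for integer arithmetic. The grammar and names are my own for illustration, not from any particular library.

```python
import re

# Tokenizer: integers, the four operators, and parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Recursive descent over the grammar:
         expr   := term (('+'|'-') term)*
         term   := factor (('*'|'/') factor)*
         factor := INT | '(' expr ')'
    Evaluates as it parses (integer division, for simplicity)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def factor():
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            value = expr()
            expect(")")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        pos += 1
        return int(tok)

    def term():
        value = factor()
        while peek() in ("*", "/"):
            op = peek()
            expect(op)
            rhs = factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input: {peek()!r}")
    return result
```

The point isn't the calculator; it's that one function per grammar rule scales up to real input formats, and the error messages come almost for free.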
This reads to me much less like "What every CS major should know" than "Wouldn't it be nice if all CS majors knew...." What I mean by that is I'm not sure it's possible to learn all of that stuff in 4 years at any school anywhere, but it's all useful in certain very broad contexts.
Of course, I may lack perspective, having been a math major (or due to some other deficiency in my background), but I consider myself pretty comfortable with about half the list and somewhat familiar with about half the remainder. How much of it did I learn as an undergrad, though? Not very damn much, and I don't think I would have been exposed to all of it even if I had done CS instead of math.
Of course book-learning and in-class learning need to be supplemented with a large amount of self-directed learning. As a professor, I'm sure Matt is perfectly well acquainted with the limitations of a 4-year degree program.
This brings up an interesting question: assuming that these things can't be taught in four years and that they matter, should we expect that a degree in CS should require more time?
Keywords that I believe would be worth mentioning:
- Lambda Calculus
- Reflective and Meta-programming
- Meta-object Protocol
- Closures
- Continuations
- Monads
- Arrows
- First-class Everything
- Stack and Register-based Programming
- XML
- Linear Algebra
- Fractal and Wavelet Image Compression
- Regular Expressions
- Clojure
- LaTeX
NB. The concepts to which the above keywords refer may or may not have been covered by the article. The keywords themselves, however, are absent.
Additional reading suggestions:
- Jon Bentley's Programming Pearls
- Tom Mitchell's Machine Learning
- Douglas Hofstadter's Gödel, Escher, Bach
- Brian Kernighan and Rob Pike's Unix Programming Environment
I recommend The Incompleteness Phenomenon over GEB[1]. It thoroughly presents Gödel's Completeness and Incompleteness theorems at a level appropriate to the collegiate Math or Computer Science student.
The discussion of Gödel's theorems is not the only thing that makes GEB an interesting read.
What's more, GEB presents Aristotelianism vs. Platonism in the context of _computation_.
_That_ is a question CS majors should at least be aware of.
The question of whether computations can _ever_ be brought to have consciousness.
Then again, given my limited knowledge, I am far from being an authority on the topic and look forward to gaining more insight by hearing other opinions.
Undoubtedly. However, from the standpoint of teaching a class to Math and Computer Science students, you can and should present the backing theories in full detail. Students benefit greatly from having a common write-up of the proofs. The philosophical discussion can be saved for the classroom (with supporting supplementary readings).
The best match among the existing categories, IMHO, is "Theory," which I assume refers to "Theory of Computation."
It is there that formal systems are brought up and CS majors should be aware of the hierarchy and limitations of formal systems, which are discussed in GEB.
If I were to place GEB under an arbitrary category, I would probably suggest "Philosophy of Computation."
You mention version control in passing, but it probably deserves more emphasis. CS majors who aren't comfortable using version control are likely to be at a disadvantage.
OCaml may work as an alternative to SML.
You could also mention Information Retrieval, alongside Machine Learning.
Compilers/parsing/interpreters/etc. could really be its own top level item.
Learning to use profilers / performance optimization. _The Practice of Programming_ covers this (and many other things!) quite well.
All in all, a great list. Probably also useful to suggest starting points for people (like myself) who are self-taught.
To be clear: I consider understanding version control essential. I'll emphasize it.
I agree that OCaml and SML are roughly interchangeable for educational purposes. I picked SML because it's slightly "cleaner."
As a compilers prof, I agree that compilers should be its own item, but politically, that's a tough sell. Teaching compilers needs to be sold in the service of another end for most faculties.
Cleaner, yes. I picked OCaml because it seemed easier to find information about, and more widely ported.
Also, I think most people fail to appreciate how useful compiler techniques are outside of "compilers". Even just knowing a little about lexing and parsing goes a long way.
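To illustrate "a little lexing goes a long way": a serviceable lexer is about a dozen lines with regular expressions and named groups. This is just a sketch, and the token set is made up for illustration.

```python
import re

# Token types and their patterns, tried in order. ERROR catches anything
# the earlier patterns miss, so bad input fails loudly instead of silently.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
LEXER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    """Yield (token_type, lexeme) pairs, skipping whitespace."""
    for m in LEXER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        yield kind, m.group()
```

The same pattern handles log lines, config files, query strings, and so on; the payoff is that the downstream code deals in tokens, not raw characters.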
For Prolog, I'd also recommend _The Art of Prolog_ by Sterling & Shapiro and/or Clocksin's _Clause & Effect_.
I think probabilistic learning (or machine learning) should have its own section. Recommender systems, sentiment analysis, and graphical models do not really fall into the traditional understanding of AI. It is more stats than CS and logic.
Also, 'Reflections on Trusting Trust' is a good one to add under security. I'd also recommend 'What's your threat model?' http://iang.org/ssl/wytm.html
The only thing that I had in my undergrad education that I miss in this list is experience with an assembly language. It probably fits in best in the discussion on Architecture: programming in an assembly language, at the very least, makes programmers appreciate high-level languages and makes them think about the underlying hardware.
Missing: "How to install, use and administer Windows."
It is awfully sad how many self-described "hackers" are unable to reconfigure an IP address on the world's most popular desktop OS.
I have trouble trusting such "engineers" with the ability to design great software when they haven't even educated themselves on both major perspectives on OS design.
For the sake of discussion, here's a counter argument to that. Windows is the most common desktop OS, as you note.
That means it is the one most likely to be officially supported everywhere you go. You get to your new job, you'll find a Windows box on your desk, set up by IT, configured the way they like it--and the way they want it to stay.
If you run OS X or Linux at home, and only use Windows on boxes provided and administered by someone else, why learn how to administer Windows boxes? None of us have time to become experts in every technology we encounter as users. Not knowing how to reconfigure an IP address on Windows seems no worse than not knowing how to change a spark plug on a rental car.
Are there more significant inabilities you are _actually_ complaining about here?
I don't have any idea how Windows 7's (or Vista, or OS X, etc. hell, I vaguely remember how to do this on my *nix box) networking software works, but I guarantee you I could google it and solve my problem in minutes.
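For what it's worth, the thing you'd google up is a one-liner. From memory (so treat the exact syntax and names as approximate), on Windows it's along the lines of:

```shell
# Set a static address. The interface name varies per machine;
# "Local Area Connection" is just the old default, so check
# `netsh interface show interface` first. Addresses here are made up.
netsh interface ip set address name="Local Area Connection" static 192.168.1.50 255.255.255.0 192.168.1.1

# And back to DHCP:
netsh interface ip set address name="Local Area Connection" source=dhcp
```

Which rather supports the point: it's a lookup, not a body of knowledge.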
People may be used to different technologies. Many of us just google it or check the docs on the fly for that kind of troubleshooting or configuration.
But that kind of skill/knowledge is different from the understanding a computer scientist would require. I would be more worried if someone who claims to know networking doesn't know popular protocols or algorithms, for example the sliding window protocol.
The issues you face in designing, understanding, and analyzing such algorithms will change you more than knowing where Windows stores the setting for one of its protocol parameters.
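For anyone who hasn't run into it, the sender side of a sliding window protocol (Go-Back-N flavor) fits in a toy Python class. This is a simplified model for the sake of the argument, not a real implementation: no timers, no actual I/O, and the class name is mine.

```python
class GoBackNSender:
    """Toy Go-Back-N sliding-window sender: at most `window` unacknowledged
    frames in flight; a cumulative ACK for sequence number n acknowledges
    everything up to and including n."""

    def __init__(self, data, window=4):
        self.data = list(data)
        self.window = window
        self.base = 0      # oldest unacknowledged sequence number
        self.next_seq = 0  # next frame to transmit

    def sendable(self):
        """Frames that may be transmitted right now (within the window)."""
        frames = []
        while (self.next_seq < len(self.data)
               and self.next_seq < self.base + self.window):
            frames.append((self.next_seq, self.data[self.next_seq]))
            self.next_seq += 1
        return frames

    def on_ack(self, ack):
        """Cumulative ACK: slide the window forward past `ack`."""
        if ack >= self.base:
            self.base = ack + 1

    def on_timeout(self):
        """Go back: schedule every unacknowledged frame for retransmission."""
        self.next_seq = self.base
```

Even this toy raises the interesting questions (why cumulative ACKs? what does the window size trade off?), which is exactly the kind of thinking the Windows settings dialog never will.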
Do you think that malware deserves its own section? Or might it be simply a related field? I have yet to hear of a computer security course that does (real) forensics or encourages creativity to better understand the criminal mind.