Communicating Sequential Processes (Hoare),
The Next 700 Programming Languages (Landin),
As We May Think (Bush),
Can Programming Be Liberated from the von Neumann Style? (Backus)
And this seems to be a cool course: https://canvas.harvard.edu/courses/34992/assignments/syllabu...
> This course examines papers every computer scientist should have read, from the 1930s to the present. It is meant to be a synthesizing experience for advanced students in computer science: a way for them to see the field as a whole, not through a survey, but by reliving the experience of its creation. The idea is to create a unified view of the field of computer science, for students who already know something about it, by replaying its entire evolution at an accelerated frame rate.
I actually found this to be an odd mix. Are we selecting papers that had an influence on computer science (as in, the theory of computation), or that had an impact on technology? Or are we just using CS as a catch-all for "all things computer"?
The Turing paper is foundational for CS, but without it, would the technology have evolved differently? Probably not. Most software engineers have not read it. Conversely, the IP standard is a technological cornerstone, but there's hardly any science in it. It's just a specification of a fairly simple protocol that you need to know when doing almost anything network-adjacent.
Something doesn't feel quite right to me seeing the PageRank paper in a short list alongside Turing and Shannon's foundational work on computability and information theory. “On Computable Numbers, with an Application to the Entscheidungsproblem” is almost 90 years old at this point and just as true and relevant now as it was then. Is PageRank even as relevant today as it was in 1998, let alone another 50 years from now?
We can’t estimate what will be relevant 50 years from now.
If quantum or biological computing find some success, none of these are going to be relevant.
That said, PageRank is important for understanding the modern tech world. Search is something we take for granted, and it's great that there's a deeply mathematical (and somewhat counterintuitive) theory behind it.
I wonder how pagerank was influential to CS as a field?
even mapreduce is more a really clever technique than a boundary-pushing or boundary-identifying extension of the field. unlike, say, CSP --- which is missing from the list.
still, i like the conciseness and structure of the list. it could evolve into a nice book :-D
My understanding of the term "history of CS" would be "how CS evolved" as a field. How do we think about processing data with computers? What can we compute? What are the limits on how fast we can compute? How do we talk about and present algorithms, the prescriptions for these computations? How do we talk about data and how we structure it (leading to SQL, and SGML/XML/JSON)?
Pagerank, in contrast, is a very specific type of breakthrough, a breakthrough for its application domain. But it's not a breakthrough for how we compute or how we think about computation.
The big insight of the PageRank paper is the use of Monte Carlo integration to approximate PageRank, not PageRank itself; that is what makes the problem much easier to distribute.
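Roughly, the random-surfer Monte Carlo idea looks like this (a toy sketch with a made-up four-page graph, not code from the paper): simulate many walks that stop with probability 1-d at each step, and the normalized visit counts approximate the PageRank vector.

    import java.util.*;

    class MonteCarloPageRank {
        public static void main(String[] args) {
            // Toy link graph: index = page, values = pages it links to (made up).
            int[][] links = { {1, 2}, {2}, {0}, {2} };
            double d = 0.85;        // damping factor: probability of following a link
            int walks = 200_000;    // number of simulated random surfers
            double[] visits = new double[links.length];
            Random rng = new Random(42);

            for (int w = 0; w < walks; w++) {
                int page = rng.nextInt(links.length);    // surfer starts anywhere
                visits[page]++;
                while (rng.nextDouble() < d) {           // with prob. 1-d, stop the walk
                    int[] out = links[page];
                    if (out.length == 0) break;          // dangling page: end the walk
                    page = out[rng.nextInt(out.length)]; // follow a random outlink
                    visits[page]++;
                }
            }
            double total = 0;
            for (double v : visits) total += v;
            // Normalized visit counts approximate the PageRank vector.
            for (int i = 0; i < visits.length; i++)
                System.out.printf("page %d: %.3f%n", i, visits[i] / total);
        }
    }

Each walk is independent, which is why this style of estimate distributes so easily across machines.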
I've read five of the seven papers on the list. The two I haven't read are Cerf and Kahn's, and Berners-Lee's.
Turing's paper on computability was particularly hard to follow, for me, because he used these gothic-font upper-case characters to name all sorts of objects, and all those characters looked kinda the same to me! I had to use auxiliary materials to be able to make my way through the paper. Today, I would recommend reading it with Charles Petzold's easy-to-follow book on the paper: https://www.amazon.com/Annotated-Turing-Through-Historic-Com...
Cook's paper on NP-completeness was also hard to follow (again, for me). As with Turing's paper, I had to use auxiliary materials to make my way. Today, I would recommend reading instead an introductory book on computational complexity that works through Cook's proof.
Shannon's paper is a work of art, clearly articulated and beautifully written. It's just not casual reading, to put it mildly.
Brin and Page's paper, and Codd's paper, are not hard to follow, at least as I remember them, but understanding Brin and Page's work requires some knowledge of Linear Algebra.
Where does the Brin-and-Page paper require linear algebra? It mentions "eigenvector" once, in a tangential remark. The "simple iterative algorithm" is how you find the fixed point of any contraction mapping, linear or not. Knowing that it is also an eigenvector is just a distraction -- you aren't going to use Gaussian elimination, not if you know what is good for you.
It doesn't require linear algebra to understand the paper or how the algorithm works, but it does require linear algebra to understand why the algorithm works. In general, since the induced 1-norm of a stochastic matrix S is exactly equal to 1 but not smaller than 1, the mapping x↦Sx is NOT a contraction. Neither convergence of the power method nor uniqueness of fixed point are guaranteed. (If there are multiple fixed points, there are multiple inconsistent rankings.)
In the paper, the significance of the so-called "damping factor" is not clear. However, with linear algebra, we know that the damping factor makes the stochastic matrix positive rather than merely nonnegative. Hence the Perron eigenvalue is "simple" (i.e. of multiplicity one), the Perron vector is unique and the power iteration must converge to it.
That's easier explained using fixed-point theory: The damping factor makes the mapping x↦Sx into an actual contraction (on the space of probability distributions). Not to mention that it has a simple common-sense justification (you don't want to get stuck in a subnetwork that only links to itself, or drown out a subnetwork that has more outlinks than inlinks).
There is probably some gain from understanding the algorithm specifically as a Markov chain iteration (if nothing else, it provides a great example for Markov chain iteration), but I think it's perfectly possible -- and easier -- to understand it as a fixed-point iteration on a compact space. And I am someone who does algebra for a living and normally explains everything algebraically if ever possible...
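Concretely, the damped fixed-point iteration is only a few lines. A toy sketch (made-up four-page column-stochastic matrix, not code from the paper): iterate x <- d*Sx + (1-d)/n until the change is tiny; with d < 1 the map is a contraction on probability vectors, so it converges to a unique ranking.

    class PageRankPowerIteration {
        public static void main(String[] args) {
            // Toy column-stochastic matrix S: S[i][j] = prob. of moving from page j to page i
            // (each column sums to 1; the graph is made up for illustration).
            double[][] S = {
                {0.0, 0.0, 1.0, 0.0},
                {0.5, 0.0, 0.0, 0.0},
                {0.5, 1.0, 0.0, 1.0},
                {0.0, 0.0, 0.0, 0.0}
            };
            int n = S.length;
            double d = 0.85;                          // damping factor
            double[] x = new double[n];
            for (int i = 0; i < n; i++) x[i] = 1.0 / n;   // start from the uniform distribution

            for (int iter = 0; iter < 100; iter++) {
                double[] next = new double[n];
                for (int i = 0; i < n; i++) {
                    double sum = 0.0;
                    for (int j = 0; j < n; j++) sum += S[i][j] * x[j];
                    next[i] = d * sum + (1.0 - d) / n;    // damped fixed-point step
                }
                double delta = 0.0;
                for (int i = 0; i < n; i++) delta += Math.abs(next[i] - x[i]);
                x = next;
                if (delta < 1e-12) break;                 // contraction => fast convergence
            }
            for (int i = 0; i < n; i++) System.out.printf("page %d: %.4f%n", i, x[i]);
        }
    }

Nothing here needs to know that x is also an eigenvector; it's just repeated application of a contraction until the ranking stops moving.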
Yes, but the paper blathers on and on in prose about history and related work and goals and future work. It might only mention eigenvector once in a tangential remark, but that remark is like 80% of the algorithm content of the paper.
Oh man, if you think Shannon's A Mathematical Theory of Communication is his most foundational contribution to computer science, you haven't seen his master's thesis from a decade prior.
> He sketches out a hypothetical “Turing Machine,” proving that, if something is computable at all, a machine (in principle) can handle it.
That's not what Turing proved. Instead, what he proved in his paper was that there are some problems which aren't solvable by Turing Machines (and therefore presumably by any machine). That's the Entscheidungsproblem (decision problem) referenced in the title.
What TFA references is the so-called Church-Turing thesis, which is exactly that, a thesis. It can't really be proven, although we have very strong reason to believe it given that in almost 100 years nobody has found a system of computation more powerful than Turing machines.
And in 100 years we have gone in completely the opposite direction in terms of where we think artificial intelligence lies. Rather than constructing a machine stocked with all known truths, the modern search for artificial intelligence with machines mostly sifts through human commentary to create an organized feuilleton machine, versus Leibniz's axiomatic approach.
No, Turing had precisely that approach of feeding the machine learning material in mind[1], but you have to build a machine capable of consuming a generic body of knowledge before you can throw anything at it.
There have also been attempts to canonicalize knowledge on the web (semantic web) and lots of things inside of computers are in fact deterministic.
But that is not the direction that AI seems to be taking, it is "smart" because it does a good job of parroting humans that sound "smart", not because it "knows" truths.
I believe you are wrong that "nobody has found a system of computation more powerful than Turing Machines". A Turing machine cannot perform indeterminacy; the actor model, however, can.
Non-deterministic Turing machines [1] are the standard way to define Non-deterministic complexity classes like NP or NEXP, so there are definitely Turing machines with indeterminacy.
I read that sentiment here a few years ago but couldn't get anything more out of it than that actors can race, whereas a Turing machine is deterministic. I could very well have it wrong.
If you were computing with actors, and you also had a sufficiently-detailed spec about the actor model, is there some particular algorithm you could not compute by just executing a TLA+ spec of your actor algorithm using Turing-ish software?
Since everyone likes chiming in with their own additions to the list, here's mine:
While Cook was the first to introduce NP-completeness, Karp's paper presenting 21 problems that SAT can be polynomially reduced to was also an enormous cornerstone that helped kick off more general interest in Cook's theory.
They're not papers, but I would give special mention to Why Software Is Eating the World by Marc Andreessen and Amazon's original 1997 letter to shareholders.
"Companies in every industry need to assume that a software revolution is coming. This includes even industries that are software-based today."
Just wrote a blog about the explosion of papers titled after Attention Is All You Need [1]. Also figured out the name probably didn’t originate from one of the authors.
To my understanding, the underlying issue is how to structure code in a maintenance-friendly way. It's just very easy to go awry with unrestricted, wild goto. More often than not there are alternative control flows that are easier to follow mentally. And things like labels in Java[1] already capture most of the relevant cases in which a generic goto statement might feel like a valid approach. This doesn't mean there is absolutely no case where a goto might be the most elegant, easiest way to implement something, just that such cases are few and exceptional.
I mean, no one feels like a laser is the proper way to cut butter, but a laser is sometimes the most accurate cutting option.
"Modern" goto (well, there aren't many modern languages with goto, but anyway) is semi-structured. Most importantly, it's local: you cannot jump into or out of subroutines, which was Dijkstra's major peeve. Using goto for backwards jumps is also usually discouraged post-Dijkstra.
Solid list. Two that influenced me personally were:
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1), 67-82.
And the corresponding search paper. Got me started in search and optimization (and Prolog).
Licklider, J. C. (1960). Man-computer symbiosis. IRE transactions on human factors in electronics, (1), 4-11.
More of a philosophical outlook but the thought of man-computer symbiosis instead of "computer solves it" has stuck with me (and is quite relevant in this day and age).
I would also add J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", 1977 [1]. LZ (Lempel-Ziv) is the foundation of many data compression algorithms that are still in use today.
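For a feel of the core idea, here is a toy LZ77-style sketch (my own simplification, not the 1977 algorithm verbatim): scan a sliding window for the longest match and emit (offset, length, next character) triples. Real implementations add entropy coding on top.

    import java.util.*;

    class TinyLZ77 {
        // Emits (offset, length, next char) triples over a sliding window --
        // the core LZ77 idea, greatly simplified.
        static List<int[]> compress(String s, int window) {
            List<int[]> out = new ArrayList<>();
            int i = 0;
            while (i < s.length()) {
                int bestLen = 0, bestOff = 0;
                int start = Math.max(0, i - window);
                for (int j = start; j < i; j++) {
                    int len = 0;
                    // Extend the match as long as characters agree and a literal "next char" remains.
                    while (i + len < s.length() - 1 && s.charAt(j + len) == s.charAt(i + len)) len++;
                    if (len > bestLen) { bestLen = len; bestOff = i - j; }
                }
                out.add(new int[]{ bestOff, bestLen, s.charAt(i + bestLen) });
                i += bestLen + 1;
            }
            return out;
        }

        public static void main(String[] args) {
            for (int[] t : compress("abracadabra abracadabra", 16))
                System.out.printf("(%d, %d, '%c')%n", t[0], t[1], (char) t[2]);
        }
    }

The second occurrence of "abracadabra" collapses into a single back-reference, which is the whole point.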
Does anyone actually use Paxos in real life? And let me ask the same question for all other academic distributed algorithms right away. I recall seeing a study some 10 years ago checking the major cloud providers for Byzantine fault tolerance and finding none of them to exhibit the qualities that the known algorithms would guarantee; apparently they would just rely on timing and hoping things don't get too twisted.
Yes, it's very widely used at Google through Chubby, which underpins many core pieces of infrastructure, including name resolution. (It used to be common practice to depend more directly on Chubby for synchronization of state via shared global files, but that fell out of favor about 6 years ago due to reliability risks associated with just blasting out changes globally without any canarying.)
> I recall seeing a study some 10 years ago checking the major cloud providers for Byzantine fault tolerance and finding none of them to exhibit the qualities that the known algorithms would guarantee.
I'm curious which study you're referring to. I could believe that while Paxos might be used as a building block, it might not be used in a consistent manner throughout.
Also, note that not all of the variations of Paxos handle Byzantine faults.
Yes, if you're using any sort of multi-master DB, you're using a variant of Paxos, Raft, or something which you shouldn't really trust until aphyr blasts it into low Earth orbit.
The paper itself is very approachable and worth spending an hour or so on.
I remember trying to read the paper and giving up somewhere early in the proof. I certainly don't think it gives a great intuition why the algorithm holds, so "approachability" is a matter of definition (does it tell a fun story? sure yeah).
It would be interesting to see the opposite of this; which papers are really interesting and look useful, but did not end up having a significant impact?
I'd argue both, as well as McCulloch & Pitts. Maybe Boltzmann or Rumelhart (backprop) as well. Honestly, I wouldn't know where to stop, there are so many cool papers.
Yeah. But before AlexNet, GPUs were only for graphics and esoteric papers in scientific computing. The realization that conv layers map well to CUDA cores has led to GPU production becoming a national security issue.
I read Shannon’s paper every year and it gives me new insights every single time. It transcends computer science. And Claude was one of the coolest “nerds” I’ve met.
This is the arch-rival of Unix+C, and also one of its best CS friends (GNU Emacs is still there, and it was widely used on Unixen, alongside the GNU (Unix clone) tools it reuses).
If this is up your alley, Ideas That Created the Future[1] has like 40 more, with decent introductions to their ideas, and provides a sort of thematic throughline from Aristotle to modern computer science.
I think any list like this has to include:
"Time, clocks, and the ordering of events in a distributed system" By Lamport also, in almost any networked system having strict ordering is fantastically useful. And of course how could we also forget "The Byzantine generals problem".
I guess if we all add our favourite papers, we’ll end up with a very long list indeed. My own additions:
As we may think by Vannevar Bush, and then something on Project Xanadu. There doesn’t seem to be any foundational paper on the latter, but there is Ted Nelson’s book Literary Machines.
We have Shannon's "Communication Theory of Secrecy Systems" as arguably the beginning of modern cryptography and then Diffie & Hellman's "New Directions in Cryptography" which first introduced public-key cryptography.
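For illustration, here's a toy Diffie-Hellman exchange with tiny made-up parameters (real deployments use standardized primes of 2048+ bits): both sides publish g^secret mod p and each raises the other's value to its own secret, landing on the same shared key.

    import java.math.BigInteger;
    import java.security.SecureRandom;

    class ToyDiffieHellman {
        public static void main(String[] args) {
            BigInteger p = BigInteger.valueOf(23);  // public prime modulus (toy-sized)
            BigInteger g = BigInteger.valueOf(5);   // public generator

            SecureRandom rng = new SecureRandom();
            BigInteger two = BigInteger.valueOf(2);
            // Private exponents in [1, p-2]; real code would use much larger ranges.
            BigInteger a = new BigInteger(8, rng).mod(p.subtract(two)).add(BigInteger.ONE);
            BigInteger b = new BigInteger(8, rng).mod(p.subtract(two)).add(BigInteger.ONE);

            BigInteger A = g.modPow(a, p);           // Alice sends g^a mod p
            BigInteger B = g.modPow(b, p);           // Bob sends g^b mod p

            BigInteger aliceShared = B.modPow(a, p); // (g^b)^a mod p
            BigInteger bobShared   = A.modPow(b, p); // (g^a)^b mod p
            System.out.println("shared secrets match: " + aliceShared.equals(bobShared));
        }
    }
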
BTC is just combining all the past research into an application, which has its own place, but sadly not here. You might wanna read this [0] for all the past ideas that Satoshi took.
I'm not so sure. In my (under-read) mental model, blockchain takes you from fail-stop fault-tolerance (a la Paxos) to Byzantine fault-tolerance, i.e. how do you compute in a massively distributed system when no node has any reason to trust any other node.
Especially the Codd paper (what if Codd had died in World War 2?) makes me wonder: What are the non-obvious (kind of arbitrary even) yet extremely helpful concepts that no one has thought of or that didn't find the resonance that they should have?
I mean widely useful things, maybe not very obvious things, that no one has thought of yet or that are rotting in obscurity. If Codd hadn't basically invented relational databases, would we even know what we're missing?
For example: Rust took a bunch of ideas from some research languages that one guy did 1-3 decades ago at the time. These ideas have probably always been good and useful, but they weren't put to use, not all of them at least.
I would add "On the Criteria to Be Used in Decomposing Systems into Modules" (1972, ~7800 citations) by Parnas, though more software engineering than computer science in a strict sense.
Not a single woman?
No Barbara Liskov - Programming with abstract data types?
No Grace Hopper - The education of a computer?
No Frances Allen - Program optimization?
Should the work of women be on that list for the sole reason that they are women? There are many more men who have written papers far more influential than the ones you've mentioned, yet they didn't make the list. If you believe in equality, then you have to believe that the work of people who happen to be women can compete on its own merit. The absence of women in that list isn't necessarily evidence of bias, as implied in your remark.
I'd put Liskov's Programming with abstract data types up against any of them. Fran Allen's work was so fundamental it's hard to find compiler stuff that doesn't build on her work.
You asked for "citations", the thread is literally filled with references to them. How is it not bad faith to have to prove to you things that you can easily check for yourself?
You misunderstood the request. Your original comment was claiming that there were many papers far more influential than any of the papers named that were by women. I was requesting evidence of this influence. In response you say that what, all of the references filling this thread are more influential than say Liskov or Allen? If not all, which ones?
The original comment you were responding to was pointing out that none of the papers listed were by women, and suggested several that were that are undeniably influential. Perhaps you think they aren't because you haven't read them, or presumably even heard of them?
I don’t think representation needs to be a thing for a personal list on a blog. Government? Absolutely. Corporate? maybe. That said, of course there have been many critical female contributions in the field. However, it’s also a numbers game since CS academia has been very gender/sex lopsided to this day. So production would represent that (sad) reality.
Ha, me too! When the first title included Entscheidungsproblem I thought this would be intellectual edgelording. But, this is a legit list.
Of course, you can't do justice to the entire field with such a short list. Two papers on the web but none on graphics, for example. But it's a fine reading list for sure.
Come on, we now have systems that can believably answer arbitrary questions in human language. This is literally what I dreamed of when I got into computing like 25 years ago, and would be considered science fiction only 5 years ago. As a side effect, entire tasks as important as machine translation and summarization have pretty much been solved for major languages.
Regardless of whether you buy the full hype or you think they're just stochastic parrots, I think it more than qualifies to make the second list (and probably the first, but I get that there's no perspective to be so sure about that).
The paper itself (as a paper, i.e. an explanation of the underlying results) is quite bad, by the way. It's better to learn about Transformers from one of the many good blog posts. But that doesn't detract from its influence.