This is a bit of a mess to read but I don't see anything particularly novel. There is a lot of bad software out there, though, and what the author describes sounds a lot closer to good software. No idea what this has to do with Waterloo though.
A few relevant quotes:
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they’ll be obvious." -- Fred Brooks, The Mythical Man Month (1975)
It's no coincidence that ADTs and data-driven design are commonplace today -- I would even argue they are so commonplace that most programmers are not even aware they are stylistic choices.
Many, many folks get caught up chasing complexity for all the wrong reasons. Define your data, stake your boundaries and just write the damn code.
Maybe not novel, but I think the author is describing an "A-Ha" moment that most programmers have to go through at some point. I think most new programmers don't think about data, they think about goals. "I want this sprite to move from here to there, and when you press a button a laser comes out of the gun" or "I want to make a webpage that lets you create polls for Twitter". They think about the end results and just mess around with data structures until they accomplish their goals. This style of programming is very brittle and breaks as soon as you change your goals. Moving to a mental model where you think about data first, and that your goals are an expression of that underlying data leads to simpler code and more robust architectures that can be expanded as needed.
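The shift the parent describes can be sketched in a few lines (TypeScript here; the poll example and all names are hypothetical): model the data first, and the "goals" become small pure functions over that data.

```typescript
// Data-first: pin down the shape of a poll before writing any behavior.
type Poll = {
  question: string;
  options: string[];
  votes: number[]; // votes[i] counts ballots for options[i]
};

// The goals are then just pure functions over that data.
// Returning a new Poll (instead of mutating) keeps the model easy to reason about.
function castVote(poll: Poll, optionIndex: number): Poll {
  const votes = poll.votes.map((v, i) => (i === optionIndex ? v + 1 : v));
  return { ...poll, votes };
}

function winner(poll: Poll): string {
  const best = poll.votes.indexOf(Math.max(...poll.votes));
  return poll.options[best];
}
```

If the goal later changes (ranked voting, expiry dates), you extend the data definition and the functions follow; the goal-first version tends to need a rewrite instead.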
> This is a bit of a mess to read but I don't see anything particularly novel.
It's okay that someone blogs about an old idea and puts a new name on it. It's still a good thing even if it's obvious to some. The author came to their own understanding of this concept in a particular way and decided to share that way with other people. You don't have to knee-jerk critique them for it.
I don't think that Brooks was talking about tables (relations) in the RDBMS sense. Still, your point stands: you can have really mangy data structures, too.
I worked with a bunch of smarter-than-me UW grads after graduating.
My “how to write large systems” takeaway from that early point in my career was to focus on the interfaces between the various parts. What I’d never thought about until now is that that is a very data-centric viewpoint.
- What system has what data?
- In what shape?
- What shape does the next system need its data in?
- Are the interfaces between these orthogonal? Shallow? Easy to grok? Tight (as opposed to leaky)?
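A minimal sketch of that interface-as-data-shape view (TypeScript; the billing/invoice types are made up for illustration). Once each system's shape is explicit, the interface between them collapses into one plain, easily-tested transformation:

```typescript
// What system A has:
type BillingRecord = { customerId: string; cents: number };
// What system B needs its data shaped as:
type InvoiceLine = { customer: string; amountUsd: string };

// The entire interface between the two systems is this one total function.
function toInvoiceLine(r: BillingRecord): InvoiceLine {
  return { customer: r.customerId, amountUsd: (r.cents / 100).toFixed(2) };
}
```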
Sounds a lot like the style I learned from working through HTDP[1]. Define data types, then pass the data around through functions. You work through it in a dynamic language, but you keep track of all your data types, writing them above your function definitions and making sure they line up, kind of like a manual, self-imposed type-checking system.
One thing that amazed me while working through HTDP was that almost all my bugs came from not understanding the data types correctly, or messing something up with the process of manual type-checking. Once I understood the data structures I was trying to pass around and compute with, the bugs almost always melted away.
Now I program basically everything in a language with type checking (mostly TypeScript), treating data types and type definitions as the foundation. I'm amazed to see how 95% of the pain, complexity, and bugs have just melted away.
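The HTDP habit maps almost directly onto a typed language. A small sketch (TypeScript; the Shape type is an illustrative stand-in for HTDP's itemization-style data definitions): the "signature you used to track by hand" becomes a checked discriminated union.

```typescript
// Data definition first: a Shape is one of two variants, distinguished by `kind`.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "rect"; width: number; height: number };

// The function follows the shape of the data; the compiler checks
// that every variant is handled (the manual HTDP discipline, automated).
function area(s: Shape): number {
  switch (s.kind) {
    case "circle": return Math.PI * s.radius ** 2;
    case "rect": return s.width * s.height;
  }
}
```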
Funny how school lessons are not applied in the real world. I worked at a Waterloo unicorn, and we had a single refund function that either authorized or denied a refund, with dozens of parameters in the function header.
I'd like to do more of this, but every time I use TypeScript in React/Next.js I end up hitting breakpoints in transpiled JS code instead of the TS source, especially in my tests/when running single files, and I've tried so many debug configurations.
I went to the University of Waterloo in the mid 2000s and never heard this kind of philosophy.
Still, having quite a bit of experience under my belt, I don’t think it’s actually going to be uniquely helpful for writing code. As I’ve gained more mastery, I’ve started to think more abstractly about the system as a whole. Sure, data flow is one aspect you should consider. The mechanical aspects of code are necessary to consider too (e.g. what makes code maintainable and robust against mistakes). But how all the different pieces cooperate to create a complex system that solves your problem, that’s the way to get some real insights. Thinking about the system, you can start to think about how to change the requirements rather than just trying to solve within some external constraints. Being able to move seamlessly up and down the abstraction stack is hugely important.
I agree 100% that focusing on the code is completely misguided. But so is focusing on data. Data by itself is useless; it’s what you can do with the data and how you can use it. Just shuttling bits around is by itself pointless unless all you’re building is basic data viz. And ultimately, this by itself is only one approach. For example, AI systems depend on data cleaning today. That’s not at all about how you shuttle data around, nor will that perspective help you. ML systems depend on more scientifically rigorous approaches. A data perspective might help you optimize the performance of those systems, but that’s a smaller aspect of what AI systems are trying to solve (not unimportant, but smaller than the entire thing itself). Smaller perspectives aren’t bad, but they limit the space you can play in (which may sometimes be the goal, but keep that in mind).
All that being said, a systems-level perspective is also limiting. You’re sometimes not going to have the domain expertise to actually solve some problems by yourself. You want to take on lots of different perspectives and have a good sense of which situation calls for which perspective. And sometimes you may not have the ability to take a certain perspective. That’s where colleagues can help to complement your weaknesses.
My dad studied CS at UW in the 70s. He told me about using a language called “WATFIV” (pronounced “Wat Five”), which stood for “Waterloo FORTRAN Four.”
Here I was years ago wondering whether the Yourdon formalism or the Gane and Sarson formalism was better for doing your data flow analysis.
It turns out doing dataflow analysis is just pretty much scorned by the programming community so it was moot.
People just want to start coding and get that immediate dopamine hit of positive feedback. The answer to which formalism was better was "Agile" where you don't need to plan or even understand data flow (because it will emerge spontaneously); just write code between Ritalin hits.
One result of this is that there are no good (free) tools out there to support either dataflow analysis formalism. I get blank looks from coworkers when I ask to see the dataflow analysis for their systems.
Of course, I picked up the dataflow analysis thing working at a Waterloo startup as my first post-graduation job. The university I went to focused on data structures and algorithms (the how of software) rather than dataflow (the what of software). My first job taught me that data structures and algorithms are necessary but not sufficient.
Not just another discussion on programming style, but program development, with a focus on data.
I like the idea of having models and enforcing them. For example, testing that three different API endpoints of a service match each other's idea of their objects. This gives us a sanity check when we verify the frontend state.
If we could separate scraping from modeling constraints, we could potentially collect data separately from the verification step. Then we aren't left waiting for UI DOM stuff when we verify the model. The latter can happen separately, and extremely quickly.
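One way that separation could look, as a hedged sketch (TypeScript; `isUser` and `endpointsAgree` are invented names): keep the model constraints as plain predicates over already-collected JSON, so verification never touches the DOM or the network.

```typescript
// The model the endpoints are supposed to agree on.
type User = { id: string; email: string };

// A constraint as a plain predicate (also a TS type guard).
function isUser(x: unknown): x is User {
  return typeof x === "object" && x !== null &&
    typeof (x as any).id === "string" &&
    typeof (x as any).email === "string";
}

// Cross-check that two endpoints returned the same idea of a user.
// Runs instantly over captured payloads; the scraping happened earlier.
function endpointsAgree(a: unknown, b: unknown): boolean {
  return isUser(a) && isUser(b) && a.id === b.id && a.email === b.email;
}
```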
At my last job I supported a system that was largely built by one person over the preceding 20+ years. This was roughly his philosophy as well.
He told me, paraphrased since this was 20 years ago, "Whenever I'm designing a new feature, I always look at what needs to come out at the end, then I can figure out what needs to go in at the start and how it has to flow through."
BMath (CS) 2002 here. This description is spot on the way I think about development, and is a bit of a superpower to be able to do well. I'm not totally sure it's a UW-ism though. I can certainly recall a couple of very formative Tompa courses where he impressed the importance of taking a data-first view of design, and I think we had a stronger bias towards data structures than most other schools whose grads I've worked with. But overall I think that sentiment grew weaker in my upper years, when a more conventional algorithms approach took over.
I will say though, that I've also noticed the contrast before with MIT grads, who tend to have a very strong LISP bent to their styles. It's true that each school has their own unique flavour, and much like accents it may just be that you don't notice your own.
0. There is no magic anywhere. Anywhere down a stack, in a system or in code, they are all just bits of code. The behavior lives somewhere. (Genchi genbutsu.)
1. Getting from point A to point B is dataflow analysis. One can even deconstruct the RTL micro- and macrocode designs of a CPU or GPU this way. Input, process, output, and feedback encapsulate the represented behavior, be it a shell pipe, streaming IO class, Kafka, firewall, audio effect generator, or microcontroller.
Static compilers try to be efficient dataflow analysts, gathering as much liveness and constraint information as possible to apply optimization transformations. It's interesting that static optimization passes are usually implemented as middleware-like patterns that stack.
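That "passes as stacking middleware" observation can be illustrated with a toy pass pipeline (TypeScript; `constantFold` is a deliberately tiny stand-in for a real optimization pass): each pass maps program to program, and the pipeline is just composition.

```typescript
// A toy expression IR and a pass type: program in, program out.
type Expr = number | { op: "+" | "*"; left: Expr; right: Expr };
type Pass = (e: Expr) => Expr;

// One illustrative pass: fold subtrees whose operands are both constants.
const constantFold: Pass = (e) => {
  if (typeof e === "number") return e;
  const left = constantFold(e.left);
  const right = constantFold(e.right);
  if (typeof left === "number" && typeof right === "number") {
    return e.op === "+" ? left + right : left * right;
  }
  return { ...e, left, right };
};

// The "middleware stack": run passes in order, feeding each one's output forward.
const runPasses = (passes: Pass[]): Pass =>
  (e) => passes.reduce((acc, pass) => pass(acc), e);
```

Real compilers stack dozens of such passes; the uniform program-to-program signature is what makes the stacking work.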
Waterloo uses the HTDP book to teach freshmen introductory programming and CS. I am sure there are many students who take CS135 with no prior knowledge of programming. They are taught a functional language without state or mutation.
My question is how they fare when they have to use imperative languages later in the CS program, with messy for loops, mutation, and memory allocation. Is the transition easier because they did CS135 first, or harder?
To be frank, I don't think imperative language use is going away anytime soon, so they need to learn the best of both worlds; hence the question.
This was touched on a bit in the article -- a problem that I've always had with OOP (as in class-and-method syntax) languages. I'm always worrying about the functions (i.e. code) too much. There is this tendency to abstract from the data prematurely, ending up with many small data silos that talk to each other without a real understanding of what's happening in the global picture. That's often not a great way to design data structures. Of course, syntax is just syntax, and in most languages the OOP part is only optional, but for me this has been a real effect. I've had more success just constraining myself to procedural style.
This is a bit unrelated, but does anyone know how Waterloo became one of the best schools for math/CS in Canada? I'm from Canada myself and almost went to Waterloo, and it's always been a bit weird to me that Waterloo, a school that isn't really known for anything else, doesn't have a long history, and is located in a random town unconnected to large firms, banks, and/or other universities (unlike the Bay Area), has become so good for CS. Does anyone know the history behind it?
You have it a bit upside down. The Bay Area doesn't get good students because of the Valley; it's the other way around. The Valley was made by good students from Stanford and Berkeley.
Unis have always been a bit out of the way. They were not 'sponsored by banks'; they were sponsored by churches and congregations, and later the elite.
Waterloo was Waterloo College, a Lutheran Seminary, and grew out from there.
It was successful probably because it was very much focused on tech, unlike most other schools, and didn't appeal to multigenerational families but to 'anyone'. The local Mennonites are also extremely good students; you don't hear about them, but they get good grades.
It's a great tech school, but one of the ugliest, most sparse and uninspiring campuses imaginable. If we think of a traditional uni as an 'Ivy campus' or 'Oxford' aesthetically, UW is like one of those 1960s concrete-block, Soviet-utilitarian kinds of places. I mean, it could be worse.
It also embraced the co-op system earlier than the other universities, which led to more real-world experience in tech for the students, and I think that filtered up to the faculty too over time.
>UW is like one of those 1960's, concrete block kind of Soviet Utilitarian places
The way I came to appreciate the aesthetic of UW was to realize it is a school of the 20th and 21st centuries, whereas the Ivies are schools of the 18th and 19th centuries.
When you think of an abstract university setting, ‘the future’ looks like UW.
There are a number of policy decisions they made that sacrifice the wealth of the school and its professors, and they care deeply about conflicts of interest. For example, any patents you get or inventions you make are your own property, for both professors and students. Professors are actively allowed to fight textbook fees, e.g. by teaching from their own material, and they're often prohibited from benefiting if it's their own book (I think they can give it away at cost or a marginal markup).

The students are not particularly affluent, so there's a good hacker culture going on (necessity breeds creativity). Engineering exams have a formal exam bank (can't remember if it's student-run or university-sponsored) that gives you all historical exams for a subject. This ensures professors can't just keep reusing the same material, which would otherwise help students cheat by getting previous years' exams (vs. actually learning the material). There's a focus on a mix of individual study/evaluation and group work.

There's also the famous co-op program that they pioneered, which everyone is trying to mimic, that connects them to industry. In the CS and math departments they do a good job training for international competitions to get that prestige up. They also had really talented educators who cared about getting kids to have fun in the first year (attrition rates would be a lot worse if the brutalism started early).
At this point it’s a reinforcing flywheel just like it is with MIT, Stanford, and Berkeley. I think they went with a different route though. They give minimal scholarships and afaik they don’t go out of their way to recruit wunderkids.
I suspect there isn’t any single answer / magic secret. They just built a good culture centered around teaching kids STEM effectively, kids and parents recognized it quickly enough which created natural competition to get in until it became a flywheel effect.
> inventions you make are your property for professors and students.
Originally this was only for professors (and I'm not so sure there weren't conflicts of interest). My first employer fought the university for the right to his Masters' thesis, and won, establishing the precedent for grad students (and hence my job).
His thesis was 1983, so it was well established by then. (It's not itself online, but in case anyone's curious, a description of a project using the work is at http://doi.org/10.1145/1096419.1096446 thanks to ACM now making all old papers free.)
It's a giant undergrad school (36,000 undergrads and barely 6,000 postgrads). There are more undergrads at Waterloo than at MIT, Stanford, CMU, Caltech, Harvard, Princeton and Yale... combined! And they do co-op, meaning their "break" semester alternates from summer to fall to winter.
From what alumni told me, undergrads are incentivized to apply everywhere for internships as part of their courses, especially during the off-cycle (winter) terms when they are effectively the only ones looking. Co-op also means someone who can't convince an employer to pay for them won't graduate, so there's a nice selection bias.
Reminds me of being taught hand-cut recursive descent compilers, back in the day. The process was: define what info is required and passed around, then the code becomes (almost) trivial.
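A sketch of that "define the data, then the code is trivial" experience (TypeScript; this toy tokenizer and parser-evaluator handle only single-digit sums like "1+2+3", purely for illustration):

```typescript
// First, pin down the data that gets passed around: the token stream.
type Token = { kind: "num"; value: number } | { kind: "plus" };

// Tokenizer: single-digit numbers and '+' only, for the sake of the sketch.
function tokenize(src: string): Token[] {
  return src.split("").map((c) =>
    c === "+" ? { kind: "plus" as const } : { kind: "num" as const, value: Number(c) });
}

// With the token shape fixed, the recursive-descent-style walk is nearly mechanical:
// read a number, then repeatedly consume '+' followed by another number.
function parseSum(tokens: Token[]): number {
  let pos = 0;
  const next = () => tokens[pos++];
  let total = (next() as { kind: "num"; value: number }).value;
  while (pos < tokens.length) {
    next(); // consume '+'
    total += (next() as { kind: "num"; value: number }).value;
  }
  return total;
}
```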
Yeah, the majority of programming is data plumbing. Getting it from Point A in Format 1 to Point B in Format 2, and maybe changing a couple of values along the way (which might involve a detour to Point C in Format 3 so you can use that fancy GPU to work on a bunch of them at once or whatever). The important parts of a coding philosophy are, IMO, how easy it is to understand and how easy it is to maintain.
Another beauty of this kind of thinking is that it’s really empowering if you don’t have any computer science background. Coming from somewhere like physics, grasping how computers are actually simple opens very direct paths for getting things done.
This isn't about code but about architecture. It's generally true that if you architect the system well then the code will be relatively easy to write, but it helps to know how to code when you go architect.
Spot on. Updated to our current hardware and networking environment, you must also ask how much data needs to be moved, what latency is tolerable, and are the delivery requirements best effort, at least once, or exactly once.
Many programmers don't need to think about algorithms because what you get from a library is often good enough, say, a simple sort function.
But it is true that while most programming can be seen as transforming one kind of data into another, it is the algorithms that do that transformation, such as when transforming a list into a sorted list.
But an algorithm can be packaged into a function so that you only need to know what the function does, not how it does it. Data cannot similarly be "encapsulated". Except you could say that getter methods are a way of encapsulating data.
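A small sketch of that last point (TypeScript; `Scores` is an invented example): a getter-style method can hide both the stored representation and the algorithm behind it, so callers depend on "what" rather than "how".

```typescript
// The representation (an unsorted array) and the sorting algorithm
// are both hidden behind the class's small interface.
class Scores {
  private xs: number[] = [];

  add(x: number): void {
    this.xs.push(x);
  }

  // Getter-style encapsulation: callers get a sorted view and never
  // learn how (or whether) the data is actually stored sorted.
  sorted(): number[] {
    return [...this.xs].sort((a, b) => a - b);
  }
}
```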
Applying one philosophy to everything typically fails as size and complexity increase. Large enterprises spent over a decade implementing systems by modeling data and flows during the era of structured analysis. Popularity waned as results did not match the initial excitement.
Modeling an ADT sure, but an air traffic control system by following the data flows? Not likely.