Thanks for posting! I'm a big fan of Mike Acton and the data oriented paradigm. Abstraction and encapsulation (which is what OOP is to me) often hinder deeper understanding of the problem, or at least hinder efficient implementation. Instead they create conceptual and maintenance boundaries (which might be what's needed).
Yup I'm a fan of Mike Acton as well, always fun to show someone who hasn't seen Data Oriented Design a path to a 10-50x perf improvement that they didn't think was possible.
The first time I saw it was in one of Bruce Dawson's courses, where we did simple image manipulation. I think the task was to implement bitblit (move one rect of a bitmap into another rect on another bitmap).
I remember him grabbing one random student's assignment, throwing in some rudimentary timing and then saying, "let's see if we can do any better". By the end of ~15 minutes he'd sped the thing up 800x with a combination of reading by row rather than column, aligned reads and some other tricks that looked like black magic at the time. Looking back now it all seems fairly obvious once you know how the hardware works.
Because it isn't just about cache optimization. How you organize your data impacts the types of transformations that can and need to be done. Utilizing SIMD is another thing that is extremely sensitive to data layout.
In the mid-late 90s, when I started writing C++, I had a gut instinct that methods and data shouldn't be encapsulated together. I couldn't explain why though.
Rich Hickey started clarifying it for me a little though. I can't dig up the interview he did, but he talked about classes being mini-DSLs and hindering reuse. That makes sense to me.
I much prefer generic functions and multi-dispatch in Common Lisp's CLOS and the new language Stanza - it just seems more flexible.
But generally, I want to think about data transformations and not little "machines" (objects) doing things to their state internally.
Yeah, I do think of OOP as dynamic / multi-dispatch. It's the cleaner approach to abstraction.
At the end of the day most programs can be thought of as state machines in some way. But it's often hard to split them into smaller state machines. The thing is, state is hard to compose if it's encapsulated :-)
> If there's a rocket in the game, rest assured that there is a "Rocket" class (Assuming the code is C++) which contains data for exactly one rocket and does rockety stuff.
Probably true.
> With no regard at all for what data tranformation is really being done, or for the layout of the data.
I agree that OO design is usually not done with regard for the layout of the data.
> Or for that matter, without the basic understanding that where there's one thing, there's probably more than one.
Say what? In C++, you'd simply create another object that is another instance of the Rocket class. Behold, "more than one".
> Though there are a lot of performance penalties for this kind of design, the most significant one is that it doesn't scale. At all. One hundred rockets costs one hundred times as much as one rocket.
Well, see, one hundred rockets have one hundred times the data that one rocket has. I don't see how you can avoid that, no matter how you represent the data or whether you do it as OO or something else.
But you don't duplicate the code 100 times, just the data. You might also have 100 instances of a pointer to a virtual function table, which means you've wasted 100 times the size of one pointer. That's not impressing me with the inefficiency here.
So you can color me unimpressed with this whole argument on lie #2.
You’ve missed a lot of important real world details about what’s going on in the system, which is the heart of what Mike Acton talks about.
> Say what? In C++, you'd simply create another object that is another instance of the Rocket class. Behold, "more than one".
So now you have a bunch of separate instances of rockets which are probably scattered throughout heap memory which will lead to cache misses on every access. Mike is describing the lost optimization potentials here because you didn’t think to reason about how you will use this data.
A simple example: every rocket needs to update its position every frame and probably obeys a velocity equation defined in the game. Iterating through every rocket scattered out through heap memory is already going to kill you with cache misses. But additionally, since all the rockets obey the same equation, we should be using SIMD to compute everything, which can let us do 4x-16x operations (or more depending on hardware) for the same cost as doing one. But chances are your Rocket class ivars are not nicely laid out for SIMD (AoS vs. SoA), so you will be forced to copy or swizzle a bunch of data, which negates the performance benefits you are trying to win with SIMD. If you designed your data upfront with this idea in mind (many rockets for batch operations), then you get both cache optimization wins and SIMD wins, and now we are talking about speedups that can easily be 10x-100x. And we haven’t even touched the possibility of further parallelizing this across multiple cores.
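To make the AoS vs. SoA point concrete, here is a minimal sketch of the two layouts and a batch position update. The names (`RocketAoS`, `RocketsSoA`, `integrate`) are hypothetical and not taken from any of Mike Acton's talks; the update rule is a plain "position += velocity * dt", assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: one object per rocket. Shown only for contrast;
// consecutive x values are 24 bytes apart, which is awkward for SIMD loads.
struct RocketAoS {
    float x, y, z;
    float vx, vy, vz;
};

// Structure-of-arrays: one contiguous array per field, for all rockets.
struct RocketsSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;

    void add(float px, float py, float pz, float dx, float dy, float dz) {
        x.push_back(px);   y.push_back(py);   z.push_back(pz);
        vx.push_back(dx);  vy.push_back(dy);  vz.push_back(dz);
    }

    std::size_t size() const { return x.size(); }
};

// Advance every rocket by dt in one call. Each loop walks contiguous floats
// with no branches, which is the shape optimizing compilers can usually
// vectorize on their own (whether they do depends on compiler and flags).
void integrate(RocketsSoA& r, float dt) {
    const std::size_t n = r.size();
    assert(r.y.size() == n && r.z.size() == n);
    assert(r.vx.size() == n && r.vy.size() == n && r.vz.size() == n);

    for (std::size_t i = 0; i < n; ++i) r.x[i] += r.vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) r.y[i] += r.vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) r.z[i] += r.vz[i] * dt;
}
```

The caller makes one `integrate(rockets, dt)` call per frame instead of one virtual `update()` call per rocket, so the "many rockets" case is the only case the code is written for.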
A real world example they gave at GDC on SIMD was a many-players-to-many-doors problem they had to solve. On every frame, any door near any player had to open automatically, like in Star Trek. 30 doors and 100 players means they have 3000 tests to run. The original algorithm wasn’t data oriented; it was what Mike Acton would probably call typical C++ BS: a single Door class. On strict CPU budgets to handle everything else in the game, the cache misses alone were worrisome. In the talk they (obviously) convert to SIMD with the idea of ‘many’ using Data Oriented Design (which also solves the cache miss problem). They got a 20x-100x speed up (depending on the number of players and doors).
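For a rough idea of what the "many doors, many players" shape looks like in code, here is a sketch under my own assumptions: 2D positions, a single shared trigger radius, and hypothetical names (`Positions`, `doors_to_open`). It is not the code from the GDC talk and it uses no explicit SIMD intrinsics; it only arranges the data so the inner loop is a branch-free pass over contiguous floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA container: all x coordinates together, all y together.
struct Positions {
    std::vector<float> x, y;
};

// For each door, returns 1 if any player is within `radius`, else 0.
// With 30 doors and 100 players this performs the 3000 distance tests.
std::vector<unsigned char> doors_to_open(const Positions& doors,
                                         const Positions& players,
                                         float radius) {
    assert(doors.x.size() == doors.y.size());
    assert(players.x.size() == players.y.size());

    const float r2 = radius * radius;  // compare squared distances, no sqrt
    std::vector<unsigned char> open(doors.x.size(), 0);

    for (std::size_t d = 0; d < doors.x.size(); ++d) {
        const float door_x = doors.x[d];
        const float door_y = doors.y[d];
        unsigned char any = 0;
        // Inner loop: contiguous reads, no early exit, no branches.
        for (std::size_t p = 0; p < players.x.size(); ++p) {
            const float dx = players.x[p] - door_x;
            const float dy = players.y[p] - door_y;
            any |= static_cast<unsigned char>(dx * dx + dy * dy <= r2);
        }
        open[d] = any;
    }
    return open;
}
```

Note there is no Door object and no Player object here; the function only sees the two fields it actually needs.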
Ignoring the possibility for compression, read "cost" as the cost of executing a procedure on many instances at once. The cost decreases if you have control over data layout. Think sorting / searching, cache efficiency, SIMD.
It helps to think about relational databases (and, there, why column stores are much faster for many tasks).
With DOD you are building a refinery: you look at what comes through, how long it takes to process, how it's stored, where your pipe has the smallest diameter, and which distillation unit takes the longest.
With Object Orientation you will do the same, but with n differently typed bottles in boxes, the boxes in containers which are driven around as a whole by trucks. You might be able to do so in great comfort - everybody on this planet knows how to ship boxes. But you will trade away control and efficiency.
I totally disagree that #2 is a lie. Code should be designed around a model of some part of the world, it's just that what the author is describing is a pretty bad way of modeling the world.
Take this part: "If there's a rocket in the game, rest assured that there is a "Rocket" class (Assuming the code is C++) which contains data for exactly one rocket and does rockety stuff. With no regard at all for what data tranformation is really being done, or for the layout of the data. Or for that matter, without the basic understanding that where there's one thing, there's probably more than one."
This is an enormous straw man. You don't need to be using C++, or even a real OO language, to design your code around a model of the world. I'd go so far as to say that C++ is a pretty poor choice of language for modeling the real world. And even in an OO language, no decent practitioner writes one-class-per-object. That's not how objects are intended to work, and even pretty bad practitioners of OO don't usually screw it up that badly.
And the underlying thing here is that if you model interactions in the real world accurately, at least the parts that are relevant to what you're trying to do, the data transformations and layout tend to fall into place naturally. Of course there are exceptions; we don't have leak-free abstractions yet.
> And even in an OO language, no decent practitioner writes one-class-per-object.
But that's the stereotypical example of OO design. You have duck->paint(), duck->quack(), duck->plunge() all in one class (file) and of course the dependency mess and the scattering of aspects throughout the project.
These have definitely been problems in my own software design attempts and in many of the projects I've seen.
And even if you make more classes, so that your design is more like one class per concept/aspect, I think the criticism of Mr. Acton is: if you have many instances of a given class, then there must be a better way than calling a method on each individual instance.
In other words, the idea is that classes are fine (they promote modularization), but there shouldn't be more than one instance of each class.
>> > And even in an OO language, no decent practitioner writes one-class-per-object.
> But that's the stereotypical example of OO design. You have duck->paint(), duck->quack(), duck->plunge() all in one class (file) and of course the dependency mess and the scattering of aspects throughout the project.
So much nonsense here. One class per object is absolutely not the stereotypical example of OO design. class != file. And dependency management is usually a problem because junior devs pull in a billion half-baked libraries to solve a problem--it's not an inherent problem with OO and it's certainly not a problem with trying to model the real world.
I'm not even particularly in love with OO. In fact, I think that functional paradigms often do a better job of modeling the real world. What I'm really disagreeing with is the claim that modeling the real world is a bad practice.
> And even if you make more classes, so that your design is more like one class per concept/aspect, I think the criticism of Mr. Acton is: if you have many instances of a given class, then there must be a better way than calling a method on each individual instance.
If that was Acton's criticism, then he should have said that instead of saying that "code should be designed around a model of the world" is a lie. Particularly since, if you're acting on a large list of objects, representing it as if each object is acting on its own is a pretty bad representation of reality.
> In other words, the idea is that classes are fine (they promote modularization), but there shouldn't be more than one instance of each class.
Now you're just confused. Acton specifically was criticizing one instance per class in the section I quoted, and now you're saying that's what he's supporting?
And for the record, if there's only one instance of your class, you didn't need a class.
> So much nonsense here. One class per object is absolutely not the stereotypical example of OO design.
I didn't say that. I said: The stereotypical example is "one class per (concept / class of) real world thing(s)". Like "Rocket" or "Duck".
> If that was Acton's criticism, then he should have said that instead of saying that "code should be designed around a model of the world" is a lie.
It needs just a little context or reading between the lines to understand the intentions instead of twisting words to make accusations.
> Now you're just confused. Acton specifically was criticizing one instance per class in the section I quoted, and now you're saying that's what he's supporting?
I am not confused. He wasn't criticizing "one runtime instance per class", but "one runtime instance per real world thing". That's something different.
The quote reads: rest assured that there is a "Rocket" class which contains data for exactly one rocket. I translate: he suggests combining all "real world rockets" in a single runtime object instead of representing each rocket in its own runtime object.
Concretely, he would make a "Rockets" class instead of a "Rocket" class, because that typically allows for simpler and more efficient implementation. (Of course, if there were also planes or missiles or bullets, he would think twice before making a Rockets class).
As I commented elsewhere on this page, there are very close analogies to relational databases -- especially the column-store flavour.
Mike has given multiple talks on Data Oriented Design.
Another example he gave at CppCon was a Chair class. In a real game, you may have a static chair, a dynamic lighting chair, a breakable chair, a physics chair. There is a tendency to make these all relate through some common Chair class because they all share some "chairness" in the real world. But in reality, the data and transformations each one needs have nothing in common, and trying to shoehorn them into some relationship because it resembles something in the real world is counterproductive.
What we call the "real world" already is a model ... and a simplistic one.
Lots of classes don't have a "real world" equivalent (linked lists, allocators, renderers, finite-state machines, functors, observers, octrees, parsers/loaders...).
The "real world" will mostly give you incentive to abuse inheritance, while missing what the interfaces and abstractions are.
My argument isn't related to performance.
I don't have a problem with having a "Rocket" class; however, if this class is responsible for everything "Rockety" (audio rendering, video rendering, serialization, target following, collision detection, ...), then it's gathering lots of dependencies at one single point, making the class painful to reuse, and more generally painful to depend upon.
I've seen this, and it's a design nightmare. Were you going to reuse this "Rocket" class in another application anyway? Does the rest of your application see rockets as instances of "Rocket" classes?
> Lots of classes don't have a "real world" equivalent (linked lists, allocators, renderers, finite-state machines, functors, observers, octrees, parsers/loaders...). The "real world" will mostly give you incentive to abuse inheritance, while missing what the interfaces and abstractions are.
This is a pretty myopic view of what exists in the real world. Lists absolutely exist in the real world: a shopping list, a to-do list, a list of messages received. Lists are such a common data structure because they frequently provide a good model of things in the real world, in this case, a way that humans organize sequential items, tasks, or messages.
As for your other examples, sure, some of them are pretty poor representations of what exists. And that's a great argument for why you should find a better abstraction. My argument from square one was that you should be modeling your things off the real world.
Really? In what world do you live where Rockets render audio?
Again, this is just a straw man. What you're describing isn't even attempting to model reality, so you can't use it as a basis for arguing against modeling reality.
I'm not saying that he said rockets render audio, I'm saying that has nothing to do with modeling the real world. If you're criticizing the idea of modeling the real world, we should talk about actual models of the real world.
We agree it's not a good way. But it's how the stereotypical OOP project will implement it.
The other way I'm familiar with is implementing rocket audio rendering at a central location (where also sheep and houses audio rendering is implemented).
That's the data oriented way. But the "drawback" is that one then needs access to the "raw" rocket data (one could still do it with a Rocket class, but would have to make a very complicated data-getter interface). Not very OO.
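A tiny sketch of what I mean by a central location, with everything in it assumed for illustration: the `Rockets`, `Sheep` and `Voice` types, the `collect_voices` function, and the gain numbers are all made up. The point is only that the audio code reads raw fields from each kind of thing directly, with no per-object getter interface in between.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical raw data for two kinds of things that make noise.
struct Rockets {
    std::vector<float> x, y;
    std::vector<float> thrust;
};

struct Sheep {
    std::vector<float> x, y;
};

// What the mixer consumes: a position and a gain, nothing else.
struct Voice {
    float x, y, gain;
};

// Central audio step: the one place that knows how rockets and sheep sound.
// The gain values are arbitrary placeholders.
std::vector<Voice> collect_voices(const Rockets& rockets, const Sheep& sheep) {
    assert(rockets.x.size() == rockets.y.size());
    assert(rockets.x.size() == rockets.thrust.size());
    assert(sheep.x.size() == sheep.y.size());

    std::vector<Voice> voices;
    voices.reserve(rockets.x.size() + sheep.x.size());

    for (std::size_t i = 0; i < rockets.x.size(); ++i) {
        // Louder rockets for more thrust.
        voices.push_back({rockets.x[i], rockets.y[i], rockets.thrust[i] * 0.5f});
    }
    for (std::size_t i = 0; i < sheep.x.size(); ++i) {
        // Every sheep bleats at the same fixed gain.
        voices.push_back({sheep.x[i], sheep.y[i], 0.1f});
    }
    return voices;
}
```

The trade-off is the one described above: `collect_voices` depends on the raw layout of `Rockets` and `Sheep`, which is exactly what encapsulation would forbid.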
Thanks for your friendly reaction to my criticism, by the way.
(Please note that I was specifically talking about "linked" lists: when we use them, their "linked" part rarely matches something in the real-world)
OK, let's go for more examples of well-used classes not representing real-world things: a solver, an interpreter, a compiler, an AST, an arithmetic expression, a grammar, a saved game, a DSP filter, a target architecture, a process, a thread, an AI script, ...
My "Rocket" example isn't a strawman, it's a counter-example. You can ignore the audio rendering part if you don't find it plausible.
Rockets do make their own rocket noise, don't they?
(if you prefer, we can call this "sound synthesis" instead of "audio rendering").
My point is that this code probably shouldn't be in the same class as, for example, the "target following" code.
It would be an obvious violation of the single responsibility principle (SRP).
Actually, I've seen lots of SRP violations caused by blindly "modelling the real world" (shape drawing functions inside the "Picture" class, inverted inheritance hierarchies, focusing on implementation reuse instead of focusing on interfaces, Square class derived from Rectangle class, ...). This alone is not a good enough design guide, and is sometimes counterproductive.
Have a look at how game developers struggle with deep class hierarchies. Many of them are moving, for the better, to Entity/Component based designs.
This is a step back from "real-world modelling" (since you don't need/have a "Rocket" class anymore).
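For anyone who hasn't seen one, here is a deliberately minimal Entity/Component sketch. All names (`World`, `Position`, `Velocity`, `move_system`) are hypothetical, and it uses `std::unordered_map` for brevity; real engines typically store components in dense arrays instead. A "rocket" here is just an entity id that happens to have a position and a velocity.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Entity = std::uint32_t;  // an entity is only an id, it has no behavior

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// Components live in per-type tables keyed by entity id.
struct World {
    std::unordered_map<Entity, Position> positions;
    std::unordered_map<Entity, Velocity> velocities;
    Entity next_id = 0;

    Entity create() { return next_id++; }
};

// A "system": moves every entity that has both a position and a velocity.
// It neither knows nor cares whether that entity is a rocket.
void move_system(World& w, float dt) {
    assert(dt >= 0.0f && "time step must not be negative");
    for (const auto& entry : w.velocities) {
        auto it = w.positions.find(entry.first);
        if (it != w.positions.end()) {
            it->second.x += entry.second.dx * dt;
            it->second.y += entry.second.dy * dt;
        }
    }
}
```

A chair would be an entity with a `Position` and no `Velocity`, so `move_system` simply never touches it; no class hierarchy is needed to express that.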
I'm not sure that code is ephemeral. It seems to congeal into a thixotropic mass.
But it's clear that data itself - that mutable-state heterogeneous-structured blob - has deep value, and handling that data appropriately is very important. This isn't treated adequately in the zeitgeist.