The data-oriented design process for game development

evilturnip · on May 27, 2022

Although the article focuses on Unity DOTS, Unreal Engine 5 introduced its own data-oriented system called Mass.

Originally introduced for its particle system, the underlying system is a pure data-oriented framework that is supposed to be extremely fast.

https://docs.unrealengine.com/5.0/en-US/overview-of-mass-ent...

dgb23 · on May 27, 2022

> The core of DOD is not about optimization or making fast programs through hardware consideration; it is about organizing programs around a deep knowledge of data and its transformation.

This is a very surprising take to say the least. I always understood data orientation in game dev as exactly that. Can experts chime in on this?

Tomis02 · on May 27, 2022

Not an expert but have a strong opinion. Many developers reject DOD because "performance isn't that important in [some software type]". However, using DOD can have other benefits unrelated to performance, IMO.

From the article: "DOD promotes solving concrete problems as opposed to generic ones". Given that DOD is biased against unnecessary abstractions, this can improve the code's readability (e.g. no more Abstract Factory Manager Generators) and thus maintainability. As Mike Acton pointed out, it's very useful to be able to easily reason about what your software is doing (not just a particular class, in isolation).

The quote you mentioned seems to be in opposition to the OOP-centric view of the world, and hints at alternative ways of organizing code (PODs and transformations).

memco · on May 27, 2022

Andrew Kelley gave a presentation where he discusses how DOD was actually giving significant performance improvements for zig: https://vimeo.com/649009599. DOD may not be explicitly concerned with performance, but it could provide performance improvements in some situations.

runevault · on May 27, 2022

DOD 100% gives performance benefits. The problem is that some people act like that is the ONLY benefit. That's why you see more and more pro-DOD people trying to make the other benefits clear: once people stop thinking of it as ONLY an optimization technique, those who aren't as worried about blazing fast performance will still consider it in case the other benefits help with their problem.

memco · on May 27, 2022

Part of why I reference the talk is that he was advocating for it in a broader context because he found it simplified the architecture and reduced memory and execution time for his project which isn’t a game. I would love to see more exposition on the subject in different contexts so people get a feel for what DOD might look like in different spheres of development especially in more general programming environments. I don’t yet have much experience with it myself but I know that each time I have explored a different design pattern I have learned useful things so I look forward to learning more about it and finding ways to use it.

runevault · on May 27, 2022

Totally, the Zig stuff has me super excited. I've wanted to see what DOD would do in a compiler for a long time. Also going composition in a way that doesn't require classes in the traditional sense can allow easier transformation of how various types of data are processed. For me that might be the most exciting part of DOD.

bitwize · on May 27, 2022

I'm not an expert but I am writing a game using DOD and ECS, in Java, for Android, and it's a 2D game so performance is not an utmost concern. I can say that using the ECS pattern and being data-driven has absolutely saved my bacon and made the dev process so much easier. Tricky things, like adding new enemy or elevator behavior, have turned out to be not so tricky when I get to the implementation bit, because the ECS makes things orthogonal when they would be tangled up with each other in a strictly OO model.

DixieDev · on May 27, 2022

It's a strange sentence for sure. Organising your programs around data and its transformation happens to mean you are aware of how hardware organises, represents, and transforms data, and accommodate that to keep things running smoothly.

I would say the purpose of DoD is most certainly to make fast programs, but guess the point of that quote is just to further emphasise that this is all about data.

Warwolt · on May 27, 2022

I think this is a very important distinction though, because if DOD is to be possible to consider as a programming paradigm you'd have to be able to state how a program is meant to be built up from scratch.

An object oriented programmer would propose the program to be built up from objects, the functional programmer would propose to build with pure functions. I would imagine a DOD programmer would propose to build the program with data transforms, which is a more general thing than just "make program go fast".

Of course making good use of hardware is easier in a paradigm that uses hardware level data transforms as its building block, but I don't think it makes any sense to make that the _defining_ feature of the paradigm.

jstimpfle · on May 29, 2022

The emphasis is on the data, not the transforms. The code is often not the interesting part.

zmgsabst · on May 27, 2022

I see DOD as enabling optimization, because your code is “naturally” laid out in a way that exposes data representations and transforms you can create optimized versions of.

But you can do DOD without that being your goal, because ultimately DOD is saying that you use “classes of objects and maps between them” as your framework for design. That it allows you to substitute efficient versions for each is a benefit, not a requirement.

pdpi · on May 27, 2022

The shape of the architecture, and the practical effects of that architecture on performance are two distinct topics. You can have a long and interesting discussion about the advantages and disadvantages of data-oriented programming versus the object-oriented approach without ever touching the topic of performance.

Also, DOD isn’t fast unto itself, it just lends itself to fast implementations.

drittich · on May 28, 2022

Not an expert. My take is this:

I was thrilled several times when I started exploring ECS. As a long time pseudo OO developer (not a fan) I found that first of all ECS really did help me write fast update loops, and when the loop became slow it was easy to determine what was slow simply by commenting out different system update calls. IMO it really delivers on the performance front if you pay attention.

Second, you can't always do this, but I had cases where once an entity had no more components, it simply ceased to exist, and there was no separate collection simply containing entities themselves. A novel approach compared to OO, and I found it refreshing.
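A minimal sketch of that "no separate entity collection" idea, assuming a toy world with two component types. All names here (`World`, `Position`, `Health`, `exists`) are hypothetical and not taken from any particular engine:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical names throughout; this is not any particular engine's API.
using Entity = std::uint32_t;

struct Position { float x, y; };
struct Health   { int hp; };

struct World {
    // One collection per component type; there is no list of entities.
    std::unordered_map<Entity, Position> positions;
    std::unordered_map<Entity, Health>   healths;

    // An entity "exists" only while at least one component refers to its id.
    bool exists(Entity e) const {
        return positions.count(e) != 0 || healths.count(e) != 0;
    }
};
```

Erasing the last component keyed by an id is then all it takes for that entity to be gone; there is nothing else to clean up.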

Third, it encourages you to think about each component separately, because they will have their own collection and engine. This makes refactoring so easy, because you are looking at typically a handful of properties at most, and a small dedicated engine routine. This delivers on many of the promises of microservices.

The last obvious but IMO incredibly important benefit I found was the loose coupling that resulted. Sure not everything is isolated, I would often need to share components between services, but usage and purpose was clear and obvious, and I found myself refactoring components with ease. Contrast this with OO design that encourages abstract concepts like User which, it turns out, have the gravity of black holes and can swallow up any number of properties that SEEM like they should go there. Good luck teasing this apart later, and even remembering what half of them do.

I found the concept in the article that humans are more likely to make additive than subtractive changes profound, and it seems we must recognize this and fight against it as developers wherever we can. (It's definitely true in the audio world, where, e.g., people are much more likely to make additive than subtractive EQ changes, unless they're professionals.)

peteradio · on May 27, 2022

Saying you want to make a highly optimized fast program doesn't guide you how to actually do it. The pattern says "when we focused on the data layer, optimization and speed came more naturally".

interroboink · on May 27, 2022

Compare this with the first sentence after the abstract:

"Data-oriented design (DOD) grew when game developers needed to use modern hardware architectures for performant games, and existing software processes did not meet their needs."

So, I think performance and DOD have been closely tied together from the start. There has been some evangelism to push it out of "it's just for optimization" territory (and reasonably so), but I'm not sure I agree with their notion of "core".

danbolt · on May 27, 2022

I’ve used ECS in various games, and there’s a huge advantage in being able to “map out” your tick from a bird’s eye view.

Or, since each of your systems is relatively self-contained, your `update` function will more or less allow you to inspect which order they’re in and when they’ll iterate. I’ve worked on games where that logic has been hidden behind a bunch of vtables, and that made it unclear where/when certain logic was happening.
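As a rough illustration of what such an `update` can look like, here is a sketch with three made-up systems over a made-up `Body` component (the names and the physics are assumptions for the example, not from any engine):

```cpp
#include <vector>

struct Body { float x, vx; };   // hypothetical component

// Each system is a plain function over plain data.
void apply_gravity(std::vector<Body>& bodies, float dt) {
    for (Body& b : bodies) b.vx -= 9.8f * dt;
}

void integrate(std::vector<Body>& bodies, float dt) {
    for (Body& b : bodies) b.x += b.vx * dt;
}

void clamp_to_floor(std::vector<Body>& bodies) {
    for (Body& b : bodies) {
        if (b.x < 0.0f) { b.x = 0.0f; b.vx = 0.0f; }
    }
}

// The whole tick, in execution order, readable top to bottom.
void update(std::vector<Body>& bodies, float dt) {
    apply_gravity(bodies, dt);
    integrate(bodies, dt);
    clamp_to_floor(bodies);
}
```

Reordering or commenting out one line of `update` changes the tick in an obvious way, which is the bird's-eye view being described.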

drittich · on May 28, 2022

ECS is a good example of multi-benefit DoD because it's clearly meant to be performant, but there are significant benefits that come from using discrete components rather than typically more complex OO classes, for example. I've built pretty large update loops (using dozens of systems), but the complexity doesn't seem to grow much as I add more, and systems themselves are generally small. Maintaining correct order of system update calls has been the trickiest part.

corysama · on May 27, 2022

DOD actually started as an alternative to class inheritance hierarchies. The performance benefits were recognized and popularized after it was implemented.

https://www.gamedevs.org/uploads/data-driven-game-object-sys...

thisNeeds2BeSad · on May 27, 2022

Eh, inheritance is relatively performance friendly. All the stuff is, after all, wrapped into one onion-shaped object. The problem starts with pointers, aka loose composition, and not having "work-bundles" for the hot loop, which ideally come as arrays of work-packages.
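To make the pointer-versus-array distinction concrete, here is a small sketch with a hypothetical `Particle` type. Both functions compute the same thing; only the memory layout the loop walks over differs:

```cpp
#include <memory>
#include <vector>

struct Particle { float pos, vel; };   // hypothetical work item

// Pointer-chasing layout: each object lives wherever the allocator put it,
// so the loop follows one pointer per element.
float step_scattered(std::vector<std::unique_ptr<Particle>>& ps, float dt) {
    float sum = 0.0f;
    for (auto& p : ps) { p->pos += p->vel * dt; sum += p->pos; }
    return sum;
}

// "Work bundle" layout: the loop walks one contiguous array of items.
float step_packed(std::vector<Particle>& ps, float dt) {
    float sum = 0.0f;
    for (Particle& p : ps) { p.pos += p.vel * dt; sum += p.pos; }
    return sum;
}
```

Whether the packed version is actually faster for a given workload is something to measure rather than assume.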

MA, MA, MA

jayd16 · on May 27, 2022

The goal is to ship games. Games have a fuzzy definition and what you're building always changes during the process. Iteration is very important. You can see that by noticing that every game engine has a non-compiled scripting tier, usually on top of a slow-to-compile core.

DoD lets you tweak data w/o a recompile. Data is composable and transferable. DoD is highly iterative and that's a major reason why it's in the running.

That said, it's also nice that DoD can fairly easily be array oriented and that's where a lot of the speed comes from.

As far as DOTS goes, Unity has been advertising the performance pretty hard because in the short term it's going to be a step down ergonomically.

jcelerier · on May 27, 2022

Since the article talks a bit about this part of DoD - plugging my C++ library for automatic conversion of AoS layout to SoA: https://github.com/celtera/ahsohtoa if it can be useful to anyone ; the entire code is pretty short and fully documented to explain the techniques used: https://github.com/celtera/ahsohtoa/blob/main/include/ahsoht...

tyleo · on May 27, 2022

Why not just write it as SoA to begin with? I think the library is cool. It’s cool that this can be done with C++ and it’s cool that someone did it. However, when considering something like this for production I always think, “folks will understand things better if they have to type it themselves and can _see_ the structure in the source code.”

Are there real production uses for this or do you consider it a toy?
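For readers unfamiliar with the two layouts being compared, this is roughly what "writing it as SoA to begin with" means. It is a hand-written sketch with made-up field names, not the library's generated output:

```cpp
#include <cstddef>
#include <vector>

// Array of structures (AoS): one record per enemy, stored as std::vector<EnemyAoS>.
struct EnemyAoS { float x, y; int hp; };

// Structure of arrays (SoA): one array per field; index i is "enemy i".
struct EnemiesSoA {
    std::vector<float> x, y;
    std::vector<int>   hp;

    // Every field must be pushed together to keep the columns the same length.
    std::size_t add(float px, float py, int health) {
        x.push_back(px);
        y.push_back(py);
        hp.push_back(health);
        return x.size() - 1;
    }
};

// A loop that only needs hp reads only the hp array.
int total_hp(const EnemiesSoA& e) {
    int sum = 0;
    for (int h : e.hp) sum += h;
    return sum;
}
```

The bookkeeping in `add` (keeping every column in sync by hand) is the kind of boilerplate an automatic AoS-to-SoA conversion would take over.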

jcelerier · on May 27, 2022

> Why not just write it as SoA to begin with?

To give my personal reason: I have absolutely never managed to find SoA style code readable at all aha

> folks will understand things better if they have to type it themselves

I guess it comes down to if you work in a top-down or bottom-up way to create an understanding of a codebase. Generally, I start by looking at the class names to get a general idea of the architecture and make some mental diagrams of it, and then I zoom in to the level of detail needed. I know other people work better by reading individual functions and working upward from there but it's definitely not my case.

klik99 · on May 27, 2022

I’ve been doing a lot of stuff in the UE4/5 Slate framework lately which uses a declarative construction syntax - the amount of boilerplate to enable the required declarative syntax is frustrating and would love to have it automatically done.

But truthfully I’m not a fan of doing this kind of declarative DSL within c++, it’s just not a pleasant experience and would rather use yaml/toml or embedded lua for the data driven interface

dgb23 · on May 27, 2022

I think that’s generally a good point. I guess the utility here is to have a clearer model of “this stuff belongs to this object”. Reminds me a bit of relational projections and views. But at the same time it goes against a principle in this paradigm, to program closer to the machine and the data layout.

chrsig · on May 27, 2022

This definitely has my attention...I like that you parameterized the underlying storage. Dislike the dependency on boost :(

Not having to have an extra code generation step is definitely nice though.

The SoA -> AoS difficulty has driven me to julia for my personal projects, there's a library called StructArrays.jl[0] that is quite similar to your project here.

[0] https://github.com/JuliaArrays/StructArrays.jl

jcelerier · on May 27, 2022

The dependency on boost is not mandatory:

- Either you can use standalone PFR (it does not really change anything as it's almost exactly the same code, just outside of namespace boost) but apparently that is an issue ahah

- Or you can lobby the committee to accept P1061 ; there's already a compiler implementation based on clang-12: https://github.com/ricejasonf/llvm-project/tree/ricejasonf/p...

With it, the code becomes even simpler and would not need PFR at all ; a function such as

    std::size_t create()
    {
      [&]<std::size_t... N>(std::index_sequence<N...>)
      {
        (std::get<N>(vec).push_back({}), ...);
      }
      (indices{});

      return size() - 1;
    }

just becomes

    std::size_t create()
    {
      auto& [... v] = vec;
      (v.push_back({}), ...);

      return size() - 1;
    }
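
For anyone without a P1061-capable compiler, a similar shape is possible in plain C++17 with `std::apply`. This is a sketch that assumes `vec` is a `std::tuple` of `std::vector`s (which the `std::get<N>(vec)` calls suggest, but the snippets above don't show); the `soa` wrapper is hypothetical and not the library's actual class:

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical minimal SoA container: one std::vector per field type.
template <typename... Fields>
struct soa {
    std::tuple<std::vector<Fields>...> vec;

    // All columns have the same length, so the first one is representative.
    std::size_t size() const { return std::get<0>(vec).size(); }

    // Append a value-initialized element to every column; return its index.
    std::size_t create() {
        std::apply([](auto&... v) { (v.push_back({}), ...); }, vec);
        return size() - 1;
    }
};
```

`std::apply` unpacks the tuple into the lambda's parameter pack, which plays the role that `auto& [... v] = vec;` plays in the P1061 version.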

Warwolt · on May 27, 2022

Nice article! I've been curious about DoD for a while, but have found most existing articles and videos on the subject to be very vague and fuzzy. This more precise treatment of what the author means by DoD is welcome in my opinion.

runevault · on May 27, 2022

Hopefully once Unity's ECS portion of DOTS gets stabilized we'll start seeing more and more practical examples of the design pattern, because Unity tutorials are rarely architecture-astronaut material; they tend to focus on "this is how I did xyz and now my character jumps!"

colbyhub · on May 27, 2022

Lately, I've been wanting to explore the DOD pattern in the world of full-stack web development to see if there would be similar benefits. Might put together a proof-of-concept this weekend!

Has anyone else explored this?

wswope · on May 27, 2022

I've been doing something in this vein for a big personal project, using this python library: https://nackjicholson.github.io/aiosql/.

In short, I'm using a run of the mill stack (Caddy/Gunicorn/Flask/Postgres) - but with the twist that all my core logic is defined in plaintext SQL files, which get bound into namespaced Python methods by aiosql. Routing, error handling, templating, etc. are all done in Python - but data manipulation and processing are outsourced to the DB level. All database object definitions are laid out in a massive, idempotent "init_db" method that gets called at launch, so I can essentially point the app at a fresh instance of Postgres and rebuild from scratch. The design is primarily driven by my personal distaste for ORMs, but I've found it extremely beneficial in terms of rigid typing, integrity checks, and performance.

evilturnip · on May 27, 2022

If it's just to serve CRUD apps or run web sites, the bottleneck is network time and browser rendering time, unless you're doing actual data processing.

And if you are doing data processing, it's easier to use a data science library like Pandas for Python, which implicitly has DOD built in.