> and may even be useful in higher-level garbage-collected languages to reduce pressure on the garbage collector.
> The approach described here works very well though with a data-oriented architecture, where central systems work on arrays of data items packed tightly in memory.
I have been pushing us into a new data-oriented architecture for our software, but for reasons of domain modeling (managing complexity) rather than performance. This is a very interesting read from my perspective.
Let me give an example - We took deep, circular object graphs like:
And turned them into 2 flat arrays of Account and Customer. A third, relation type was also introduced to arbitrarily relate these types together. The relation type also conveys all facts that pertain to the nature of the relationship, rather than forcing this knowledge onto Account or Customer.
This ideology is carried to the logical extreme - We do not permit any nesting of complex types in our revised domain model object graph. There is a root type which contains all of the List<T>s of domain types (and applicable relation types). The domain model is composed such that we can directly map every single type & public property into SQL tables. And we do. Because SQL is amazing for executing business logic.
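Concretely, the shape ends up something like this (a rough sketch in C rather than our actual C#, with invented names, just to show the layout):

    /* Hypothetical sketch: flat collections of domain types, related only by IDs. */
    #include <stddef.h>

    typedef struct { unsigned char bytes[16]; } Guid;   /* stand-in for a GUID */

    typedef struct { Guid id; char name[64];  } Customer;
    typedef struct { Guid id; double balance; } Account;

    /* The relation type carries every fact about the relationship itself,
       instead of pushing that knowledge into Account or Customer. */
    typedef struct {
        Guid customer_id;
        Guid account_id;
        int  is_primary_holder;   /* invented example of a relationship fact */
    } CustomerAccountRelation;

    /* The root type owns all the flat collections; nothing nests. */
    typedef struct {
        Customer                *customers;  size_t customer_count;
        Account                 *accounts;   size_t account_count;
        CustomerAccountRelation *relations;  size_t relation_count;
    } DomainModel;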
Now that I am thinking this through, performance is certainly also seeing a benefit here. We are no longer serializing these absolutely horrific JSON object graphs. It's just a bunch of flat complex type arrays now, with everything related together using GUIDs.
From a developer QOL perspective, one could argue that this is more difficult to work with because everything is separated by relations. I would say if you are fortunate enough to be working with something like .NET, you can abuse the shit out of LINQ and not really feel the added complexity.
> I would say if you are fortunate enough to be working with something like .NET, you can abuse the shit out of LINQ and not really feel the added complexity.
In my own database mapping code for my game (I use RethinkDB) I made a type called DatabaseLink<T> where T : DatabaseEntry, with a Value property that will retrieve the link target from the database cache. It's almost as fast/convenient as a simple field, and it serializes to a GUID.
> Because SQL is amazing for executing business logic.
Can you explain this a bit more? I've always heard that there are supposed benefits to writing business logic in SQL, and I've made efforts to put it into practice, but the more I deal with it, the more I dislike it.
I just finished rewriting a complex calculation that was being done in SQL into C#, and the C# runs about 10 times faster, is easier to understand, easier to debug, easier to change, and easier to test. The SQL was written as a series of views each building on top of another, some doing grouping, some doing calculations, etc. until it gets the final result which is copied into a table.
Let's say I need to calculate intermediate result A (like maybe a giant case statement) as input to calculate B, C, and D, which will then go on to be combined in various ways. In order to avoid inlining complex calculation A in 3 different places, I can either write a sub-select, a CTE, put it in a view, or copy it to a new table variable. But no matter how I handle it, I end up with layers and layers of nested select statements, each performing one more step of the calculation. Doing the same thing in an imperative language usually ends up seeming trivial compared to doing the same thing in SQL. In C#, I'd just add a function or read-only property to calculate A, and then use that wherever it's needed, but in SQL, adding such a requirement can mean restructuring the entire query.
In another case, I took a stored procedure that selected data into a table variable and manipulated it, and rewrote the whole thing in C#, getting a 15x performance gain (from almost a minute down to 3-4 seconds, with pretty much all of that time spent retrieving the relevant rows from various tables). It does the exact same work, but with classes instead of a table variable. When I originally started working on the stored procedure, it used cursors and took 15 minutes to run.
In another case I was recently working on, what should be an efficient calculation joining 5 tables together into a view ended up taking several seconds to select one record and is called quite frequently (i.e. like 50 times in a second during application startup). I added an index to speed it up, but instead that index made other queries unbearably slow because the optimizer for those queries completely changed how they were executed based on the new index. So I put triggers to copy the view to a cache table every time something changes that the view depends on. But now it takes several seconds for each update of a row, which is still a win, because writes are maybe a few times a day, but reads are continuous. But it means that updating multiple rows at once will now be really, really slow, because the cache table has to always be up to date, and the trigger has to update it preemptively rather than lazily, because querying from the cache table can't cause it to update itself. I might end up having to rewrite the trigger to do the joining in C# if it causes too many performance problems.
On another database, I have to manually call update statistics pretty frequently, or the query optimizer breaks (i.e. extreme slowness), because frequently appending hundreds of records to a table that holds over a hundred thousand rows with an auto-increment primary key should obviously invalidate the entire query plan. I'm not blaming this on a poor job by the query optimizer. Rather, on the fact that SQL depends too much on letting a generic query optimizer decide on the fly what it thinks is the best way to accomplish a complex query, instead of writing code that will always have deterministic performance characteristics no matter what the statistics say. It's a blessing and a curse that the query optimizer can change based on actual data. But that seems much more important for writing ad hoc queries, and when writing an application, I think it's more important to have more direct control over the performance characteristics.
SQL definitely has its strengths, and I have no desire to stop using it, but I don't understand the claim that it's good for writing general business logic. Maybe I just haven't seen the places where it really shines in this regard, so I would like to hear about ways it's better, because maybe I'm doing it wrong.
We don't use SQL as the authoritative store of business state. We use C# types and all the code that goes around them for most of our heavy-lifting.
We map our C# domain model instance into a SQL database (SQLite in-memory) each time we want to execute business rule SQL against the state. These databases are ephemeral. For some interactions, we might generate 5-6 of them per user action. It is very fast to do this when your row counts are low (i.e. projections per user or per user transaction).
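To make that concrete, the per-action flow looks roughly like this (a sketch against the raw sqlite3 C API rather than our actual .NET stack; the table and the rule are invented):

    #include <stdio.h>
    #include <sqlite3.h>

    /* Callback that prints each result row of the rule query. */
    static int print_row(void *unused, int argc, char **argv, char **col) {
        (void)unused; (void)col;
        for (int i = 0; i < argc; ++i) printf("%s ", argv[i] ? argv[i] : "NULL");
        printf("\n");
        return 0;
    }

    /* Ephemeral projection: build an in-memory DB, load the relevant rows,
       run the rule as SQL, throw the whole database away. */
    int run_rule(void) {
        sqlite3 *db = NULL;
        if (sqlite3_open(":memory:", &db) != SQLITE_OK) return -1;

        sqlite3_exec(db,
            "CREATE TABLE Account(Id TEXT PRIMARY KEY, Balance REAL);"
            "INSERT INTO Account VALUES ('a1', 120.0), ('a2', -35.5);",
            NULL, NULL, NULL);

        /* The "business rule": flag overdrawn accounts. */
        sqlite3_exec(db, "SELECT Id FROM Account WHERE Balance < 0;",
                     print_row, NULL, NULL);

        sqlite3_close(db);   /* the database evaporates here */
        return 0;
    }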
Pointers are just handles to heap memory. All I see advocated for here is a per-"system" memory manager that also has custom code that gets called on deref. On one side you get a chance to do some logic before a deref; on the other, you no longer know how long a deref takes.
I like this approach too and use it a lot, so it's good to see it advocated for. I just don't understand why they have to be "better than pointers" when they are just a different tradeoff.
Not mentioned: the collateral damage. You'll probably trash your debugger's object visualizer in doing this. And probably other facilities in your IDE that help you deal with class members, really.
This is not an unsolvable problem. In the simplest case you can (in a debug build) add a private pointer to the actual object to the handle structure. If you're doing something more complicated, then you presumably know what you're doing and the debugging strategies/tradeoffs involved.
I haven't thought this through totally but you can write visualizers in VS[1]. I don't know if it'd work but there's a fair amount of power in natvis expressions.
I do write natvis but it seems to be based on the assumption that the object (+ globals) are sufficient for displaying the data. Which isn't the case if your handle is an index into some other object/array at a random place in memory (unless you make the array a global).
In my opinion, when you're serious about debugging your code, you'll use your own debugging tools, and from experience I'd say that using handles is not an issue.
But yeah, standard IDEs won't see anything behind the opaque reference.
I've worked in fields where no debugger was available, not even printf. (embedded software, interrupt driven)
I had to write ad-hoc debugging tools under difficult constraints.
I grew the habit of debugging by reading the code first, then using traces, then writing custom tests or ad-hoc tools as a last-resort solution. (extremely rare)
I used Visual Studio debugger and GDB a few times, but I've not been convinced by the productivity gains, quite the contrary, but I am probably not using them well.
In practice, the most difficult bugs are often tied to concurrency issues, and I haven't found much help with those general purpose tools.
I mean, these are orthogonal, not mutually exclusive. You need to use whatever combination of tools makes sense for your tasks. Sometimes it's a debugger, sometimes it's printf, sometimes it's a hard reboot, sometimes it's an infinite loop, etc... and sometimes there just isn't any great solution. Nobody was suggesting you have to use a debugger everywhere or that a debugger is the solution to every problem. Just like none of the solutions you mentioned is. I'm just saying if you are using a debugger/IDE (which I can assure people use quite seriously, even if you don't come across such situations personally), then you might want to watch out for the repercussions.
I'm already having to write my own operating system, CPU architecture, and programming language in order to do "Hello world" the "serious" way, and now you want me to write my own debugger, too?
I agree. Perhaps also known as "data-oriented design". Decompose entities into components by modelling the data in a normalised way, as is done when modelling for relational databases. Group together similar data as records in homogeneous arrays, and batch process it without need for OOP polymorphic branching etc.
This way of structuring applications may not be a good fit for all software development contexts. It may be a good fit for game development with high performance requirements, or similar kinds of situations.
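As a tiny illustration of the "homogeneous arrays, batch processed" part (a sketch with invented component names, not taken from the article):

    #include <stddef.h>

    /* One array per component type ("column"), not one object per entity. */
    typedef struct { float x, y; } Position;
    typedef struct { float dx, dy; } Velocity;

    #define MAX_ENTITIES 1024
    static Position positions[MAX_ENTITIES];
    static Velocity velocities[MAX_ENTITIES];
    static size_t   entity_count;

    /* A system touches only the columns it needs, in a straight loop:
       no virtual dispatch, predictable memory access. */
    void integrate(float dt) {
        for (size_t i = 0; i < entity_count; ++i) {
            positions[i].x += velocities[i].dx * dt;
            positions[i].y += velocities[i].dy * dt;
        }
    }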
I've been learning about it in a simulation environment, not a game dev one. I think some, but not all, of the constraints are similar. I'll take a look at those references.
Nah there’s far far more to ECS than just allocating this way.
Put it this way: You can do ECS without this strategy, and you can do this strategy without doing ECS. Therefore: they are not conceptually related.
However, they go really well together.
(My creds: introduced/advocated-for ECS at two different game studios, with varying levels of success. One already used the handle strategy heavily, for all kinds of resources, but definitely wasn’t doing ECS.)
How would this work for a hierarchical structure like a B-tree?
Also: If, say, 30% of your system is pointers and you have them "spread all over the place", wouldn't moving all your pointers to a central location result in pointers "spread all over" two different places?
"Cache oblivious" systems organize B-trees and related so related objects are close together in memory. It seems like a system with bits of things relating to each other scattered about would have to just use some ad-hoc similarity measure to minimize cache hits.
I don't think this advice is supposed to apply to /actual/ data structures. Rather, it applies to "business-logic objects" (e.g. video game entities or whatever).
> How would this work for a hierarchical structure like a B-tree?
For something like B-trees you want to make a distinction between internal and external references. Internal references ("child pointers") can just be 32-bit indices into an array of fixed-size items (each the size of a cache line or a multiple thereof), or even naked pointers if you don't mind the inefficiency on 64-bit platforms. Depending on the requirements, external references are often best represented as some kind of handle with explicit registration. This is especially helpful if you need to manage the lifetime of persistent B-trees (where different trees can share subtrees): the external reference handles serve as GC roots, so you require them to be explicitly registered, while the internal references are owned by the B-tree heap and can therefore be direct while still being traceable or refcountable. This also works well for dealing with reclamation in a concurrent system.
So, you almost always want to distinguish internal vs external references, and handles are primarily a solution candidate for external references.
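A rough sketch of what the internal-reference side can look like (field sizes and node order are invented; the point is just that child links are small indices into the tree's own node array):

    #include <stdint.h>

    enum { BTREE_ORDER = 8 };            /* invented order */

    typedef uint32_t node_index;         /* internal reference: an index, not a pointer */
    #define NIL_NODE ((node_index)0xFFFFFFFFu)

    typedef struct {
        uint32_t   key_count;
        uint64_t   keys[BTREE_ORDER - 1];
        node_index children[BTREE_ORDER];    /* 4 bytes per link instead of 8 */
    } BTreeNode;

    typedef struct {
        BTreeNode *nodes;        /* the tree owns this contiguous node array */
        uint32_t   node_count;
        node_index root;
    } BTree;

    /* External references would be separate, explicitly registered handles that
       can act as GC roots for persistent/shared subtrees. */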
You could encode the item's location in the tree into the handle itself, though this assumes that the tree will never be rebalanced while that handle remains active.
This is basically (from what I understand) how hierarchical triangular meshes (HTM) work. The "coordinate" is a number describing the location of a coordinate's triangle within a larger triangle within a larger triangle within a larger triangle and so on until you get to an octahedron.
Sometimes a tree or graph really is the right way, but if you want memory locality, you could have an array of nodes and an array of edges that reference the nodes by index. That’s effectively what the author is proposing (pooling), rather than allocating all of the nodes and edges individually.
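For example (a minimal sketch, names invented):

    #include <stdint.h>

    typedef struct { float value; } Node;            /* node payload */
    typedef struct { uint32_t from, to; } Edge;      /* indices into nodes[] */

    typedef struct {
        Node *nodes; uint32_t node_count;
        Edge *edges; uint32_t edge_count;
    } Graph;

    /* Batch-process every edge over two contiguous arrays; it's still a graph,
       just laid out for locality. */
    float sum_edge_endpoints(const Graph *g) {
        float total = 0.0f;
        for (uint32_t i = 0; i < g->edge_count; ++i)
            total += g->nodes[g->edges[i].from].value + g->nodes[g->edges[i].to].value;
        return total;
    }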
AST, CST, CFG, DFG, PDG, or any other data structure for representing a program. But I can't imagine it being done with a simple array or a hash map. They all end in T or G! How would you approach that with your way of doing things? I guess if you work on game engines this is a relevant problem? Or if not, maybe a graphics data structure like a BSP? How do you do that without trees? Or an NPC navigation data structure? How do you do that without graphs?
Just wanted to say thank you for this comment. It led me to this paper http://www.cs.ucf.edu/~jmesit/publications/scsc%202005.pdf which introduced some new ideas to me alongside concepts I was already familiar with, so it’s been generally easy but useful reading so far :)
I've watched this talk before, and I really think it's just about transposition. Column rather than row. If you have a graph and represent it as an adjacency array... lol it's still a graph you're just looking at it differently! You haven't gotten rid of the inherent graph property of the data structure!
But all of these are true for pointers too? Preallocate memory, return a pointer to it, lazily fill in parts later. You're already doing that with a handle anyway, whether it's indexing into array memory or something else.
As I see it, this just isn't true. I think you're going to have to elaborate on this with an actual concrete example (i.e. code) if you'd like to make a convincing argument.
> But can you agree that a pointer does not give you anything else than a direct reference to a memory area?
Depends what you mean. Not if I take it literally, given you can generally shove a few extra bits into pointers. But in a sense it's approximately true.
> Because of the indirection, a handle can do much more, depending on the implementation.
Not unless you're playing semantics with what "handle" means. I'm going on the assumption that we can assume it's an opaque reference of equal size as a pointer to data that is or will be in the machine's user-mode memory. If that's not what you mean, then this is a seriously pointless argument, given you can obviously define "handles" to anything (even data on another machine) where nobody in their right mind would be arguing you could just replace it with a pointer...
Again, this is why you really need to provide examples to make a convincing argument (and you really can't expect people to wait for you to publish an article if you're going to bring it up beforehand). So we don't end up arguing over something that turns out to be pointless.
> I'm going on the assumption that we can assume it's an opaque reference of equal size as a pointer to data that is or will be in the machine's user-mode memory.
If the handle is an array index or a hash, it can be much smaller than a pointer: 8 bits in extreme cases, often 16 bits (if there are only a few thousand objects of that type), and we almost never need more than 32 bits.
Such handles can easily have less memory overhead than actual pointers, even after accounting for the redundancies needed for some runtime checks.
If you shove a few extra bits into a pointer, then it can't directly be used to dereference something.
In practice it would be what I call an immediate handle (a handle that can be resolved to a pointer with a test and simple bitmask)
void* ResolveHandle(handle h); This is the basic API, and it will return nullptr if this handle is not allocated or has been freed. Or it will block if this handle is a promise. The handle itself can be an index, but also the hash of a string, the string being a path to a resource that is not even loaded yet.
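The index-flavored case can be as simple as this (my own sketch, not the poster's code; the promise/blocking and string-hash variants are left out):

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t handle;

    #define MAX_SLOTS 4096u                 /* power of two so the mask works */
    static void *slot_ptr[MAX_SLOTS];       /* NULL while a slot is unallocated or freed */

    /* "Immediate handle": resolving is one mask plus one test. */
    void *ResolveHandle(handle h) {
        uint32_t slot = h & (MAX_SLOTS - 1u);
        return slot_ptr[slot];              /* NULL => not allocated or already freed */
    }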
> If you shove a few extra bits into a pointer, then it can't directly be used to dereference something. In practice it would be what I call an immediate handle (a handle that can be resolved to a pointer with a test and simple bitmask)
That's just a smart pointer. Unlike a handle, it's anything but opaque. (And in your comment you specifically lumped smart pointers under "pointers" too, so we should be in agreement.)
> void* ResolveHandle(handle h); this is the basic API, and it will return nullptr if this handle is not allocated or has been freed. Or it will block if this handle is a promise.
Again, there's nothing preventing you from making a smart pointer that does exactly what you're talking about here, which (again) both of us would lump under "pointer".
Perhaps worth elaborating on: The concepts of handles and pointers, as I'm talking about them at least, are fairly language-agnostic. What makes a handle a "handle" to me (and I would've thought to you too) isn't the wrapper around it; it's the opaqueness. Not from a mechanical sense (i.e. not the public vs. private or the language syntax, which aren't even relevant in every language) but from an informational sense: whether it can be "dereferenced" without additional information.
Are there any programming languages that natively support them? Like, something that's physically a tuple of (data structure pointer, index), but with the syntax allowing you to directly access it like any other reference. Bonus points for "get all handles to me" as an operation.
The D language uses that to represent slices. It's a native type that has two fields - a pointer and a length. They're used like pointers, but with bounds checking.
It's not just good for overflows; it's a better paradigm, because you're passing the context of the totality of the data to the target. It's impossible to get the "wrong" length because it's an intrinsic part of the reference.
I've been proposing adding this to C for years, but there's been no uptake on it. I don't get this reluctance to exterminate the biggest source of security bugs in C code, but that sort of reluctance is why I started the D project.
> I don't get this reluctance to exterminate the biggest source of security bugs in C code
The forward-thinking people who care about security bugs in C code - the ones who aren't reluctant - did exactly what I'd argue you've done: given up on C. It's selection bias in action.
You started the D project. I'm RIIRing. Others prefer Python. Or perhaps one might use a language that compiles down into C, allowing you the best of both worlds - a decent programming language, and the ability to target whatever wretched excuse for a C89 compiler that has been forced upon you by whatever niche platform you're targeting.
Fixing C's biggest mistake won't fix C. Fixing C's 10 biggest mistakes won't fix C. Fixing C's 100 biggest mistakes won't fix C, and by that point, it will no longer be C. There's an argument to be made for incremental improvements, so I won't discourage your efforts, but it's also the kind of pragmatic, boring, middle ground, half answer that won't make anyone outright excited either.
For excitement, DasBetterC is the answer. It looks and feels just like C, but is a modern language.
But adding slices is an easy add to C, it is backwards compatible, it has zero impact on performance, it does not degrade anything about the C experience, it can be incorporated incrementally, and half of the memory vulnerabilities are taken behind the woodshed and shot.
I'll even claim it leads to performance increases (from not having to do strlen over and over).
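For anyone who hasn't used a language with them, the feature is essentially this (a sketch of the idea, not the actual proposal):

    #include <stddef.h>
    #include <stdio.h>

    /* A slice: pointer and length travel together, so the callee never guesses
       the size and never re-derives it with strlen(). */
    typedef struct { const char *ptr; size_t len; } str_slice;

    void print_n_times(str_slice s, int n) {
        for (int i = 0; i < n; ++i)
            printf("%.*s\n", (int)s.len, s.ptr);   /* bounds come from the slice */
    }

    int main(void) {
        str_slice hello = { "hello", 5 };
        print_n_times(hello, 3);
        return 0;
    }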
I really do like slices, but implementing them as an (interior) pointer and a length creates some problems for GC'd languages. Is D GC'd by default?
The main problem is that a GC needs to find the start of an object to scan it. Go originally solved this with 2-bit tags per word of memory. Now, I think they use size classes.
For Virgil, I am considering adding slices but initially implementing them as a triple of (Array, start, length), to simplify this problem. This is pretty much required for Virgil's JVM backend, but the native backend can be more efficient in the future.
Btw, can you racily write slice values and observe tearing in D?
It has a GC, but uses it only when operator "new" is called and a couple other cases.
> can you racily write slice values and observe tearing in D?
Shared data (meaning data accessible by multiple threads) is handled separately through locking library functions. Shared data types are distinct from other types.
Can you racily write the same field containing a slice from multiple threads? Unless you have atomic multi-word writes, you could observe tearing and go out of bounds silently.
One thing that JVM guarantees is that racy programs cannot break the type system. This gets hard when you have multi-word quantities.
One problem you encounter if you have something like this is that you end up with unnecessary states in your program. Like you start storing these in arrays and then frequently find the data structure pointers for all your objects are the same, because the owner of that array already assumes the same underlying data structure. Doubly (or multiply) so if the object itself has another pointer to its owning(?) data structure inside it. This inserts degrees of freedom into your program that you don't have any use for, and probably wouldn't have otherwise.
What's the downside here? It's hard to predict sometimes, but in general, there's increased potential for redundant fields to go out of sync. (And if your arrays are large, it can end up being a giant waste of memory too.) One thing I've gradually learned over time is that one of the best mechanisms for preventing your program from entering an erroneous state is to make it so that erroneous states cannot exist in the first place. If something can't go wrong, it won't go wrong. That means avoiding degrees of freedom in your program that you don't actually need. Which is precisely the opposite of what we have here, where the same information is redundantly copied across lots of objects.
And there are also other side-effects possible here, e.g., now you'll run into trouble if you ever decide to use the handles atomically (since now it's 2 pointers' worth of data), and you might also run into trouble with your IDE given that you can no longer use some built-in facilities for OOP, debugging, and/or visualization. There are likely more that I'm failing to remember off the top of my head.
Of course there are some advantages too, but my point is: it's hardly an obvious design choice, and it can work out to your disadvantage in ways you don't expect.
OP specifically uses that redundancy to prevent use-after-free errors. That redundancy is limited to the handle, and checked at every dereference (in the get_pointer() so you don't forget to check). Desyncs between a handle and the object it points to ought not to be possible at this point.
If you remove the redundancy, you have to give up on use-after-free checks. If you want to prevent use-after-free anyway, you need to use a simpler allocation pattern (stack or arena), or fall back to heap allocated pointers (smart pointers in C++, various kind of references in Rust…). But heap allocated pointers were precisely what we were trying to avoid in the first place, most notably because they're not cache friendly, cause memory fragmentation, make leaks harder to detect, and in many cases simply take up much more space than handles (remember, pointers are 8 bytes nowadays).
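Roughly how that redundancy works (a sketch of the idea, not the article's actual code; names invented):

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_ITEMS 1024

    typedef struct { uint16_t index; uint16_t generation; } item_handle;

    static float    items[MAX_ITEMS];         /* the system's own tightly packed storage */
    static uint16_t generations[MAX_ITEMS];   /* bumped every time a slot is freed */
    static int      alive[MAX_ITEMS];

    /* Every dereference re-checks the generation, so a handle kept across a free
       resolves to NULL instead of silently aliasing the slot's new occupant. */
    float *get_pointer(item_handle h) {
        if (h.index >= MAX_ITEMS) return NULL;
        if (!alive[h.index] || generations[h.index] != h.generation) return NULL;
        return &items[h.index];
    }

    void free_item(item_handle h) {
        if (get_pointer(h) == NULL) return;    /* stale handle or double free: caught */
        alive[h.index] = 0;
        generations[h.index]++;                /* invalidates every outstanding handle */
    }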
Using handles atomically is probably a consequence of a very bad idea: threads with shared mutable state and locks. As much as humanly possible, you don't want to share any mutable memory to begin with. Send messages instead: it's less error-prone, and with current computers, it's often faster as well. See http://ithare.com/ on (re)actors for the gory details.
About debugging: some people read code by running it in the default step by step debugger provided by their IDE. I don't understand that way of thinking, so I don't bother catering to it. My code then is optimised for readability without special tools. Which I hardly ever need in practice. (Of course, sometimes tools like GDB or Valgrind do save my life, but that's a rare enough occurrence that losing a bit of time when that happens is not a big deal.)
Last quip: the existence of OOP tools is hardly a criterion when choosing how OOP your program should be. If a pointer fest hurts performance too much, or if your domain (there are many) is not suited to OOP, tools won't compensate for the impedance mismatch.
Few things that expect a pointer do it by expecting an opaque dereferenceable thing. Most things that expect pointers actually expect a specific concrete pointer type (even if it's a pointer to an opaque thing.)
IMHO, a language with "real suppport" for handles would use them everywhere, instead of pointers—i.e. make all places where pointers pass through "wide enough" for handles to be passed instead—with reduction to a pointer being a kind of WPO for cases where it's proven to be possible.
That can give you the syntax! But what I'm curious about is more the implications of actually having this as a part of the language. Does it open up interesting space in optimization or memory safety?
This is a bit like what Regent does. "Pointers" are really offsets into regions, which means that we can transparently move them around a distributed machine on the user's behalf.
> move all memory management into centralized systems (like rendering, physics, animation, …), with the systems being the sole owner of their memory allocations
> when creating an item, only return an ‘index-handle’ to the outside world, not a pointer to the item
> in the index-handles, only use as many bits as needed for the array index
> only convert a handle to a pointer when absolutely needed, and don’t store the pointer anywhere
It's not. If you implement this strategy in a garbage collected language, you can end up with use-after-frees or double frees (though with the advantage of type safety—if you use a handle of type Foo after freeing it, you'll never stomp on the memory of an object of type Bar, for any Bar≠Foo).
In C garbage is collected "manually", by calling malloc and free. If you design your own heap, you also have to design the memory management around it.
Strange to read if you were programming the Macintosh in the 80s. I kind of thought of handles as just what you used when you didn't have an OS that supported an MMU...