> and may even be useful in higher-level garbage-collected languages to reduce pressure on the garbage collector.
> The approach described here works very well though with a data-oriented architecture, where central systems work on arrays of data items packed tightly in memory.
I have been pushing us into a new data-oriented architecture for our software, but for reasons of domain modeling (managing complexity) rather than performance. This is a very interesting read from my perspective.
Let me give an example - We took deep, circular object graphs like:
And turned them into 2 flat arrays of Account and Customer. A third, relation type was also introduced to arbitrarily relate these types together. The relation type also conveys all facts that pertain to the nature of the relationship, rather than forcing this knowledge onto Account or Customer.
This ideology is carried to the logical extreme - We do not permit any nesting of complex types in our revised domain model object graph. There is a root type which contains all of the List<T>s of domain types (and applicable relation types). The domain model is composed such that we can directly map every single type & public property into SQL tables. And we do. Because SQL is amazing for executing business logic.
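Concretely, the shape ends up something like this (a rough sketch in C rather than our actual C#, with invented names, just to show the layout):

    /* Hypothetical sketch: flat collections of domain types, related only by IDs. */
    #include <stddef.h>

    typedef struct { unsigned char bytes[16]; } Guid;   /* stand-in for a GUID */

    typedef struct { Guid id; char name[64];  } Customer;
    typedef struct { Guid id; double balance; } Account;

    /* The relation type carries every fact about the relationship itself,
       instead of pushing that knowledge into Account or Customer. */
    typedef struct {
        Guid customer_id;
        Guid account_id;
        int  is_primary_holder;   /* invented example of a relationship fact */
    } CustomerAccountRelation;

    /* The root type owns all the flat collections; nothing nests. */
    typedef struct {
        Customer                *customers;  size_t customer_count;
        Account                 *accounts;   size_t account_count;
        CustomerAccountRelation *relations;  size_t relation_count;
    } DomainModel;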
Now that I am thinking this through, performance is certainly also seeing a benefit here. We are no longer serializing these absolutely horrific JSON object graphs. It's just a bunch of flat complex type arrays now, with everything related together using GUIDs.
From a developer QOL perspective, one could argue that this is more difficult to work with because everything is separated by relations. I would say if you are fortunate enough to be working with something like .NET, you can abuse the shit out of LINQ and not really feel the added complexity.
> I would say if you are fortunate enough to be working with something like .NET, you can abuse the shit out of LINQ and not really feel the added complexity.
In my own database mapping code for my game (I use RethinkDB) I made a type called DatabaseLink<T> where T : DatabaseEntry, with a Value property that will retrieve the link target from the database cache. It's almost as fast/convenient as a simple field, and it serializes to a GUID.
> Because SQL is amazing for executing business logic.
Can you explain this a bit more? I've always heard that there are supposed benefits to writing business logic in SQL, and I've made efforts to put it into practice, but the more I deal with it, the more I dislike it.
I just finished rewriting a complex calculation that was being done in SQL into C#, and the C# runs about 10 times faster, is easier to understand, easier to debug, easier to change, and easier to test. The SQL was written as a series of views each building on top of another, some doing grouping, some doing calculations, etc. until it gets the final result which is copied into a table.
Let's say I need to calculate intermediate result A (like maybe a giant case statement) as input to calculate B, C, and D, which will then go on to be combined in various ways. In order to avoid inlining complex calculation A in 3 different places, I can either write a sub-select, a CTE, put it in a view, or copy it to a new table variable. But no matter how I handle it, I end up with layers and layers of nested select statements, each performing one more step of the calculation. Doing the same thing in an imperative language usually ends up seeming trivial compared to doing the same thing in SQL. In C#, I'd just add a function or read-only property to calculate A, and then use that wherever it's needed, but in SQL, adding such a requirement can mean restructuring the entire query.
In another case, I took a stored procedure that selected data into a table variable and manipulated it, and rewrote the whole thing in C#, getting a 15x performance gain (from almost a minute down to 3-4 seconds, with pretty much all of that time spent retrieving the relevant rows from various tables). It does the exact same work, but with classes instead of a table variable. When I originally started working on the stored procedure, it used cursors and took 15 minutes to run.
In another case I was recently working on, what should be an efficient calculation joining 5 tables together into a view ended up taking several seconds to select one record and is called quite frequently (i.e. like 50 times in a second during application startup). I added an index to speed it up, but instead that index made other queries unbearably slow because the optimizer for those queries completely changed how they were executed based on the new index. So I put triggers to copy the view to a cache table every time something changes that the view depends on. But now it takes several seconds for each update of a row, which is still a win, because writes are maybe a few times a day, but reads are continuous. But it means that updating multiple rows at once will now be really, really slow, because the cache table has to always be up to date, and the trigger has to update it preemptively rather than lazily, because querying from the cache table can't cause it to update itself. I might end up having to rewrite the trigger to do the joining in C# if it causes too many performance problems.
On another database, I have to manually call update statistics pretty frequently, or the query optimizer breaks (i.e. extreme slowness), because frequently appending hundreds of records to a table that holds over a hundred thousand rows with an auto-increment primary key should obviously invalidate the entire query plan. I'm not blaming this on a poor job by the query optimizer. Rather, on the fact that SQL depends too much on letting a generic query optimizer decide on the fly what it thinks is the best way to accomplish a complex query, instead of writing code that will always have deterministic performance characteristics no matter what the statistics say. It's a blessing and a curse that the query optimizer can change based on actual data. But that seems much more important for writing ad hoc queries, and when writing an application, I think it's more important to have more direct control over the performance characteristics.
SQL definitely has its strengths, and I have no desire to stop using it, but I don't understand the claim that it's good for writing general business logic. Maybe I just haven't seen the places where it really shines in this regard, so I would like to hear about ways it's better, because maybe I'm doing it wrong.
We don't use SQL as the authoritative store of business state. We use C# types and all the code that goes around them for most of our heavy-lifting.
We map our C# domain model instance into a SQL database (SQLite in-memory) each time we want to execute business rule SQL against the state. These databases are ephemeral. For some interactions, we might generate 5-6 of them per user action. It is very fast to do this when your row counts are low (i.e. projections per user or per user transaction).
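To make that concrete, the per-action flow looks roughly like this (a sketch against the raw sqlite3 C API rather than our actual .NET stack; the table and the rule are invented):

    #include <stdio.h>
    #include <sqlite3.h>

    /* Callback that prints each result row of the rule query. */
    static int print_row(void *unused, int argc, char **argv, char **col) {
        (void)unused; (void)col;
        for (int i = 0; i < argc; ++i) printf("%s ", argv[i] ? argv[i] : "NULL");
        printf("\n");
        return 0;
    }

    /* Ephemeral projection: build an in-memory DB, load the relevant rows,
       run the rule as SQL, throw the whole database away. */
    int run_rule(void) {
        sqlite3 *db = NULL;
        if (sqlite3_open(":memory:", &db) != SQLITE_OK) return -1;

        sqlite3_exec(db,
            "CREATE TABLE Account(Id TEXT PRIMARY KEY, Balance REAL);"
            "INSERT INTO Account VALUES ('a1', 120.0), ('a2', -35.5);",
            NULL, NULL, NULL);

        /* The "business rule": flag overdrawn accounts. */
        sqlite3_exec(db, "SELECT Id FROM Account WHERE Balance < 0;",
                     print_row, NULL, NULL);

        sqlite3_close(db);   /* the database evaporates here */
        return 0;
    }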
Pointers are just handles to heap memory. All I see advocated for here is a per-"system" memory manager that also has custom code that gets called on deref. On one side you get a chance to do some logic before a deref; on the other, you no longer know how long a deref takes.
I like this approach too and use it a lot, so it's good to see it advocated for. I just don't understand why they have to be "better than pointers" when they are just a different tradeoff.
Not mentioned: the collateral damage. You'll probably trash your debugger's object visualizer in doing this. And probably other facilities in your IDE that help you deal with class members, really.
This is not an unsolvable problem. In the simplest case you can (in a debug build) add a private pointer to the actual object to the handle structure. If you're doing something more complicated, then you presumably know what you're doing and the debugging strategies/tradeoffs involved.
I haven't thought this through totally but you can write visualizers in VS[1]. I don't know if it'd work but there's a fair amount of power in natvis expressions.
I do write natvis but it seems to be based on the assumption that the object (+ globals) are sufficient for displaying the data. Which isn't the case if your handle is an index into some other object/array at a random place in memory (unless you make the array a global).
In my opinion, when you're serious about debugging your code, you'll use your own debugging tools, and from experience I'd say that using handles is not an issue.
But yeah, standard IDEs won't see anything behind the opaque reference.
I've worked in fields where no debugger was available, not even printf. (embedded software, interrupt driven)
I had to write ad-hoc debugging tools under difficult constraints.
I grew the habit of debugging by reading the code first, then using traces, then writing custom tests or ad-hoc tools as a last-resort solution. (extremely rare)
I used Visual Studio debugger and GDB a few times, but I've not been convinced by the productivity gains, quite the contrary, but I am probably not using them well.
In practice, the most difficult bugs are often tied to concurrency issues, and I haven't found much help with those general purpose tools.
I mean, these are orthogonal, not mutually exclusive. You need to use whatever combination of tools makes sense for your tasks. Sometimes it's a debugger, sometimes it's printf, sometimes it's a hard reboot, sometimes it's an infinite loop, etc... and sometimes there just isn't any great solution. Nobody was suggesting you have to use a debugger everywhere or that a debugger is the solution to every problem. Just like none of the solutions you mentioned is. I'm just saying if you are using a debugger/IDE (which I can assure people use quite seriously, even if you don't come across such situations personally), then you might want to watch out for the repercussions.
I'm already having to write my own operating system, CPU architecture, and programming language in order to do "Hello world" the "serious" way, and now you want me to write my own debugger, too?
I agree. Perhaps also known as "data-oriented design". Decompose entities into components by modelling the data in a normalised way, as is done when modelling for relational databases. Group together similar data as records in homogeneous arrays, and batch process it without need for OOP polymorphic branching etc.
This way of structuring applications may not be a good fit for all software development contexts. It may be a good fit for game development with high performance requirements, or similar kinds of situations.
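As a tiny illustration of the "homogeneous arrays, batch processed" part (a sketch with invented component names, not taken from the article):

    #include <stddef.h>

    /* One array per component type ("column"), not one object per entity. */
    typedef struct { float x, y; } Position;
    typedef struct { float dx, dy; } Velocity;

    #define MAX_ENTITIES 1024
    static Position positions[MAX_ENTITIES];
    static Velocity velocities[MAX_ENTITIES];
    static size_t   entity_count;

    /* A system touches only the columns it needs, in a straight loop:
       no virtual dispatch, predictable memory access. */
    void integrate(float dt) {
        for (size_t i = 0; i < entity_count; ++i) {
            positions[i].x += velocities[i].dx * dt;
            positions[i].y += velocities[i].dy * dt;
        }
    }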
I've been learning about it in a simulation environment, not a game dev one. I think some, but not all, of the constraints are similar. I'll take a look at those references.
Nah there’s far far more to ECS than just allocating this way.
Put it this way: You can do ECS without this strategy, and you can do this strategy without doing ECS. Therefore: they are not conceptually related.
However, they go really well together.
(My creds: introduced/advocated-for ECS at two different game studios, with varying levels of success. One already used the handle strategy heavily, for all kinds of resources, but definitely wasn’t doing ECS.)
How would this work for a hierarchical structure like a B-tree?
Also: If, say, 30% of your system is pointers and you have them "spread all over the place", wouldn't moving all your pointers to a central location result in pointers "spread all over" two different places?
"Cache oblivious" systems organize B-trees and related so related objects are close together in memory. It seems like a system with bits of things relating to each other scattered about would have to just use some ad-hoc similarity measure to minimize cache hits.
I don't think this advice is supposed to apply to /actual/ data structures. Rather, it applies to "business-logic objects" (e.g. video game entities or whatever).
> How would this work for a hierarchical structure like a B-tree?
For something like B-trees you want to make a distinction between internal and external references. Internal references ("child pointers") can just be 32-bit indices into an array of fixed-size items (each the size of a cache line or a multiple thereof), or even naked pointers if you don't mind the inefficiency on 64-bit platforms. Depending on the requirements, external references are often best represented as some kind of handle with explicit registration. This is especially helpful if you need to manage the lifetime of persistent B-trees (where different trees can share subtrees): the external reference handles serve as GC roots, so you require them to be explicitly registered, while the internal references are owned by the B-tree heap and can therefore be direct while still being traceable or refcountable. This also works well for dealing with reclamation in a concurrent system.
So, you almost always want to distinguish internal vs external references, and handles are primarily a solution candidate for external references.
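A rough sketch of what the internal-reference side can look like (field sizes and node order are invented; the point is just that child links are small indices into the tree's own node array):

    #include <stdint.h>

    enum { BTREE_ORDER = 8 };            /* invented order */

    typedef uint32_t node_index;         /* internal reference: an index, not a pointer */
    #define NIL_NODE ((node_index)0xFFFFFFFFu)

    typedef struct {
        uint32_t   key_count;
        uint64_t   keys[BTREE_ORDER - 1];
        node_index children[BTREE_ORDER];    /* 4 bytes per link instead of 8 */
    } BTreeNode;

    typedef struct {
        BTreeNode *nodes;        /* the tree owns this contiguous node array */
        uint32_t   node_count;
        node_index root;
    } BTree;

    /* External references would be separate, explicitly registered handles that
       can act as GC roots for persistent/shared subtrees. */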
You could encode the item's location in the tree into the handle itself, though this assumes that the tree will never be rebalanced while that handle remains active.
This is basically (from what I understand) how hierarchical triangular meshes (HTM) work. The "coordinate" is a number describing the location of a coordinate's triangle within a larger triangle within a larger triangle within a larger triangle and so on until you get to an octahedron.
Sometimes a tree or graph really is the right way, but if you want memory locality, you could have an array of nodes and an array of edges that reference the nodes by index. That’s effectively what the author is proposing (pooling), rather than allocating all of the nodes and edges individually.
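For example (a minimal sketch, names invented):

    #include <stdint.h>

    typedef struct { float value; } Node;            /* node payload */
    typedef struct { uint32_t from, to; } Edge;      /* indices into nodes[] */

    typedef struct {
        Node *nodes; uint32_t node_count;
        Edge *edges; uint32_t edge_count;
    } Graph;

    /* Batch-process every edge over two contiguous arrays; it's still a graph,
       just laid out for locality. */
    float sum_edge_endpoints(const Graph *g) {
        float total = 0.0f;
        for (uint32_t i = 0; i < g->edge_count; ++i)
            total += g->nodes[g->edges[i].from].value + g->nodes[g->edges[i].to].value;
        return total;
    }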
AST, CST, CFG, DFG, PDG, or any other data structure for representing a program. But I can't imagine it being done with a simple array or a hash map. They all end in T or G! How would you approach that with your way of doing things? I guess if you work on game engines this is a relevant problem? Or if not, maybe a graphics data structure like a BSP? How do you do that without trees? Or an NPC navigation data structure? How do you do that without graphs?
Just wanted to say thank you for this comment. It led me to this paper http://www.cs.ucf.edu/~jmesit/publications/scsc%202005.pdf which introduced some new ideas to me alongside concepts I was already familiar with, so it’s been generally easy but useful reading so far :)
I've watched this talk before, and I really think it's just about transposition. Column rather than row. If you have a graph and represent it as an adjacency array... lol it's still a graph you're just looking at it differently! You haven't gotten rid of the inherent graph property of the data structure!
But all of these are true for pointers too? Preallocate memory, return a pointer to it, lazily fill in parts later. You're already doing that with a handle anyway, whether it's indexing into array memory or something else.
As I see it, this just isn't true. I think you're going to have to elaborate on this with an actual concrete example (i.e. code) if you'd like to make a convincing argument.
> But can you agree that a pointer does not give you anything else than a direct reference to a memory area?
Depends what you mean. Not if I take it literally, given you can generally shove a few extra bits into pointers. But in a sense it's approximately true.
> Because of the indirection, a handle can do much more, depending on the implementation.
Not unless you're playing semantics with what "handle" means. I'm going on the assumption that we can assume it's an opaque reference of equal size as a pointer to data that is or will be in the machine's user-mode memory. If that's not what you mean, then this is a seriously pointless argument, given you can obviously define "handles" to anything (even data on another machine) where nobody in their right mind would be arguing you could just replace it with a pointer...
Again, this is why you really need to provide examples to make a convincing argument (and you really can't expect people to wait for you to publish an article if you're going to bring it up beforehand). So we don't end up arguing over something that turns out to be pointless.
> I'm going on the assumption that we can assume it's an opaque reference of equal size as a pointer to data that is or will be in the machine's user-mode memory.
If the handle is an array index or a hash, it can be much smaller than a pointer: 8 bits in extreme cases, often 16 bits (if there are only a few thousand objects of that type), and we almost never need more than 32 bits.
Such handles can easily have less memory overhead than actual pointers, even after accounting for the redundancies needed for some runtime checks.
If you shove a few extra bits into a pointer, then it can't directly be used to dereference something.
In practice it would be what I call an immediate handle (a handle that can be resolved to a pointer with a test and simple bitmask)
void* ResolveHandle(handle h); This is the basic API, and it will return nullptr if this handle is not allocated or has been freed. Or it will block if this handle is a promise. The handle itself can be an index, but also the hash of a string, the string being a path to a resource that is not even loaded yet.
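The index-flavored case can be as simple as this (my own sketch, not the poster's code; the promise/blocking and string-hash variants are left out):

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t handle;

    #define MAX_SLOTS 4096u                 /* power of two so the mask works */
    static void *slot_ptr[MAX_SLOTS];       /* NULL while a slot is unallocated or freed */

    /* "Immediate handle": resolving is one mask plus one test. */
    void *ResolveHandle(handle h) {
        uint32_t slot = h & (MAX_SLOTS - 1u);
        return slot_ptr[slot];              /* NULL => not allocated or already freed */
    }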
> If you shove a few extra bits into a pointer, then it can't directly be used to dereference something. In practice it would be what I call an immediate handle (a handle that can be resolved to a pointer with a test and simple bitmask)
That's just a smart pointer. Unlike a handle, it's anything but opaque. (And in your comment you specifically lumped smart pointers under "pointers" too, so we should be in agreement.)
> void* ResolveHandle(handle h); this is the basic API, and it will return nullptr if this handle is not allocated or has been freed. Or it will block if this handle is a promise.
Again, there's nothing preventing you from making a smart pointer that does exactly what you're talking about here, which (again) both of us would lump under "pointer".
Perhaps worth elaborating on: The concepts of handles and pointers, as I'm talking about them at least, are fairly language-agnostic. What makes a handle a "handle" to me (and I would've thought to you too) isn't the wrapper around it; it's the opaqueness. Not from a mechanical sense (i.e. not the public vs. private or the language syntax, which aren't even relevant in every language) but from an informational sense: whether it can be "dereferenced" without additional information.
Are there any programming languages that natively support them? Like, something that's physically a tuple of (data structure pointer, index), but with the syntax allowing you to directly access it like any other reference. Bonus points for "get all handles to me" as an operation.
The D language uses that to represent slices. It's a native type that has two fields - a pointer and a length. They're used like pointers, but with bounds checking.
It's not just good for overflows; it's a better paradigm, because you're passing the context of the totality of the data to the target. It's impossible to get the "wrong" length because it's an intrinsic part of the reference.
I've been proposing adding this to C for years, but there's been no uptake on it. I don't get this reluctance to exterminate the biggest source of security bugs in C code, but that sort of reluctance is why I started the D project.
> I don't get this reluctance to exterminate the biggest source of security bugs in C code
The forward-thinking people who care about security bugs in C code - the ones who aren't reluctant - did exactly what I'd argue you've done: given up on C. It's selection bias in action.
You started the D project. I'm RIIRing. Others prefer Python. Or perhaps one might use a language that compiles down into C, allowing you the best of both worlds - a decent programming language, and the ability to target whatever wretched excuse for a C89 compiler that has been forced upon you by whatever niche platform you're targeting.
Fixing C's biggest mistake won't fix C. Fixing C's 10 biggest mistakes won't fix C. Fixing C's 100 biggest mistakes won't fix C, and by that point, it will no longer be C. There's an argument to be made for incremental improvements, so I won't discourage your efforts, but it's also the kind of pragmatic, boring, middle ground, half answer that won't make anyone outright excited either.
For excitement, DasBetterC is the answer. It looks and feels just like C, but is a modern language.
But adding slices is an easy add to C, it is backwards compatible, it has zero impact on performance, it does not degrade anything about the C experience, it can be incorporated incrementally, and half of the memory vulnerabilities are taken behind the woodshed and shot.
I'll even claim it leads to performance increases (from not having to do strlen over and over).
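For anyone who hasn't used a language with them, the feature is essentially this (a sketch of the idea, not the actual proposal):

    #include <stddef.h>
    #include <stdio.h>

    /* A slice: pointer and length travel together, so the callee never guesses
       the size and never re-derives it with strlen(). */
    typedef struct { const char *ptr; size_t len; } str_slice;

    void print_n_times(str_slice s, int n) {
        for (int i = 0; i < n; ++i)
            printf("%.*s\n", (int)s.len, s.ptr);   /* bounds come from the slice */
    }

    int main(void) {
        str_slice hello = { "hello", 5 };
        print_n_times(hello, 3);
        return 0;
    }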
I really do like slices, but implementing them as an (interior) pointer and a length creates some problems for GC'd languages. Is D GC'd by default?
The main problem is that a GC needs to find the start of an object to scan it. Go originally solved this with 2-bit tags per word of memory. Now, I think they use size classes.
For Virgil, I am considering adding slices but initially implementing them as a triple of (Array, start, length), to simplify this problem. This is pretty much required for Virgil's JVM backend, but the native backend can be more efficient in the future.
Btw, can you racily write slice values and observe tearing in D?
It has a GC, but uses it only when operator "new" is called and a couple other cases.
> can you racily write slice values and observe tearing in D?
Shared data (meaning data accessible by multiple threads) is handled separately through locking library functions. Shared data types are distinct from other types.
Can you racily write the same field containing a slice from multiple threads? Unless you have atomic multi-word writes, you could observe tearing and go out of bounds silently.
One thing that JVM guarantees is that racy programs cannot break the type system. This gets hard when you have multi-word quantities.
One problem you encounter if you have something like this is that you end up with unnecessary states in your program. Like you start storing these in arrays and then frequently find the data structure pointers for all your objects are the same, because the owner of that array already assumes the same underlying data structure. Doubly (or multiply) so if the object itself has another pointer to its owning(?) data structure inside it. This inserts degrees of freedom into your program that you don't have any use for, and probably wouldn't have otherwise.
What's the downside here? It's hard to predict sometimes, but in general, there's increased potential for redundant fields to go out of sync. (And if your arrays are large, it can end up being a giant waste of memory too.) One thing I've gradually learned over time is that one of the best mechanisms for preventing your program from entering an erroneous state is to make it so that erroneous states cannot exist in the first place. If something can't go wrong, it won't go wrong. That means avoiding degrees of freedom in your program that you don't actually need. Which is precisely the opposite of what we have here, where the same information is redundantly copied across lots of objects.
And there are also other side-effects possible here, e.g., now you'll run into trouble if you ever decide to use the handles atomically (since now it's 2 pointers' worth of data), and you might also run into trouble with your IDE given that you can no longer use some built-in facilities for OOP, debugging, and/or visualization. There are likely more that I'm failing to remember off the top of my head.
Of course there are some advantages too, but my point is: it's hardly an obvious design choice, and it can work out to your disadvantage in ways you don't expect.
OP specifically uses that redundancy to prevent use-after-free errors. That redundancy is limited to the handle, and checked at every dereference (in the get_pointer() so you don't forget to check). Desyncs between a handle and the object it points to ought not to be possible at this point.
If you remove the redundancy, you have to give up on use-after-free checks. If you want to prevent use-after-free anyway, you need to use a simpler allocation pattern (stack or arena), or fall back to heap allocated pointers (smart pointers in C++, various kind of references in Rust…). But heap allocated pointers were precisely what we were trying to avoid in the first place, most notably because they're not cache friendly, cause memory fragmentation, make leaks harder to detect, and in many cases simply take up much more space than handles (remember, pointers are 8 bytes nowadays).
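Roughly how that redundancy works (a sketch of the idea, not the article's actual code; names invented):

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_ITEMS 1024

    typedef struct { uint16_t index; uint16_t generation; } item_handle;

    static float    items[MAX_ITEMS];         /* the system's own tightly packed storage */
    static uint16_t generations[MAX_ITEMS];   /* bumped every time a slot is freed */
    static int      alive[MAX_ITEMS];

    /* Every dereference re-checks the generation, so a handle kept across a free
       resolves to NULL instead of silently aliasing the slot's new occupant. */
    float *get_pointer(item_handle h) {
        if (h.index >= MAX_ITEMS) return NULL;
        if (!alive[h.index] || generations[h.index] != h.generation) return NULL;
        return &items[h.index];
    }

    void free_item(item_handle h) {
        if (get_pointer(h) == NULL) return;    /* stale handle or double free: caught */
        alive[h.index] = 0;
        generations[h.index]++;                /* invalidates every outstanding handle */
    }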
Using handles atomically is probably a consequence of a very bad idea: threads with shared mutable state and locks. As much as humanly possible, you don't want to share any mutable memory to begin with. Send messages instead: it's less error-prone, and with current computers, it's often faster as well. See http://ithare.com/ on (re)actors for the gory details.
About debugging: some people read code by running it in the default step by step debugger provided by their IDE. I don't understand that way of thinking, so I don't bother catering to it. My code then is optimised for readability without special tools. Which I hardly ever need in practice. (Of course, sometimes tools like GDB or Valgrind do save my life, but that's a rare enough occurrence that losing a bit of time when that happens is not a big deal.)
Last quip: the existence of OOP tools is hardly a criterion when choosing how OOP your program should be. If a pointer fest hurts performance too much, or if your domain (there are many) is not suited to OOP, tools won't compensate for the impedance mismatch.
Few things that expect a pointer do it by expecting an opaque dereferenceable thing. Most things that expect pointers actually expect a specific concrete pointer type (even if it's a pointer to an opaque thing.)
IMHO, a language with "real suppport" for handles would use them everywhere, instead of pointers—i.e. make all places where pointers pass through "wide enough" for handles to be passed instead—with reduction to a pointer being a kind of WPO for cases where it's proven to be possible.
That can give you the syntax! But what I'm curious about is more the implications of actually having this as a part of the language. Does it open up interesting space in optimization or memory safety?
This is a bit like what Regent does. "Pointers" are really offsets into regions, which means that we can transparently move them around a distributed machine on the user's behalf.
> move all memory management into centralized systems (like rendering, physics, animation, …), with the systems being the sole owner of their memory allocations
> when creating an item, only return an ‘index-handle’ to the outside world, not a pointer to the item
> in the index-handles, only use as many bits as needed for the array index
> only convert a handle to a pointer when absolutely needed, and don’t store the pointer anywhere
It's not. If you implement this strategy in a garbage collected language, you can end up with use-after-frees or double frees (though with the advantage of type safety—if you use a handle of type Foo after freeing it, you'll never stomp on the memory of an object of type Bar, for any Bar≠Foo).
In C garbage is collected "manually", by calling malloc and free. If you design your own heap, you also have to design the memory management around it.
Strange to read if you were programming the Macintosh in the 80s. I kind of thought of handles as just what you used when you didn't have an OS that supported an MMU...