I've always found these kinds of tricks fascinating. There are all sorts of crazy things you can do to save some bytes here and there. One of my personal favorites is the XOR linked list[0]. It's a doubly linked list that uses as much memory as a singly linked list. Instead of separate prev and next pointers, each node stores the XOR of the two. During a traversal you can recover the next pointer via next = prev XOR (prev XOR next), giving you the address of the next node in the list.
Anyways, I did some digging into TAOCP. In section 2.3.1 Knuth mentions threaded binary trees[1]. They are trees that you can traverse without a stack, and they do not use more memory than a regular binary tree. Compare this to the method used in the post, which requires you to store parent pointers. The trick is that in a binary tree, lots of pointers are going to be null. Instead of storing null there, you can store a special value that provides information on how to traverse the tree. Distinguishing between a regular pointer and one of these special values requires only a single bit, and you can often store that bit within the pointer itself, requiring no additional space over a regular binary tree.
For a threaded binary tree, if the left child is null, you instead store the predecessor node. If the right child is null, you store the successor node. Finding the successor now becomes:
1) Check if the right child is null. If so the right child pointer points to the successor. We are done.
2) Follow the right child pointer. Then keep following left child pointers until you reach a node whose left pointer is a thread (i.e. would otherwise be null). This gives us the smallest value greater than the current value.
One small advantage of threaded binary trees is you traverse the tree slightly less. If the right child is null, instead of traversing back up the tree as in the post, you can jump straight to the successor node.
>You can often store that bit within the pointer itself
Alternatively you can follow what Java did with compressed oops: use 4 bytes per pointer on 64-bit archs (instead of 8) and shift the pointers prior to dereference, effectively enabling 32GB heaps with 4-byte pointers.
It's a good idea IMO, but the two aren't alternatives, i.e. not mutually exclusive — you can do both. See Knuth's “A Flame About 64-bit Pointers” from 2008 (https://cs.stanford.edu/~knuth/news08.html, starting with “It is absolutely idiotic to have 64-bit pointers when I compile a program that uses less than 4 gigabytes of RAM”). Work on making this possible in Linux started as the x32 ABI in 2011 (see Wikipedia https://en.wikipedia.org/w/index.php?title=X32_ABI&oldid=887... or this LWN article: https://lwn.net/Articles/456731/); unfortunately it looks like there's discussion about removing it (Dec 2018 thread starting here https://lkml.org/lkml/fancy/2018/12/10/1145 though apparently I can't figure out how to navigate the LKML tree using constant space).
I've found myself using 32-bit "array indexes" to halve my pointer sizes for some code I'm writing (for fun). When doing linked data structures (linked lists, trees, etc.), a huge amount of the data ends up being pointers.
Consider a typical binary-tree with a 64-bit integer as its value. You use 24-bytes per node (value, left child, right child) if you use 64-bit pointers, but only use 16-bytes per node if you use 32-bit pointers (or array indexes).
Now, a 32-bit pointer can address at most 4 billion nodes across the whole tree. But 4 billion is more than enough for many programs.
EDIT: Consider that 4 billion nodes of size 16 bytes (where each node is the value + left child + right child struct discussed earlier) will take up 64GB of RAM.
EDIT2: And half the time, my brain gets short-circuited and I end up recreating some terrible form of segmented memory before having to slap myself for thinking up such a horrible idea. In any case, a surprising amount of memory is used up as pointers in almost all the code I write. If you care about fitting as much data into L1 cache (64kB on modern machines), you will absolutely want to minimize your data usage.
Woooow, this is sooo cool !!! I can’t unsee it anymore !! I just did one of the variations on paper (the subtraction one presented at the bottom of the wikipedia link), this is great !!
I'm wondering though, this isn't totally free, is it? I guess there's an extra addition/subtraction every time you go from one cell to the next, since instead of reading the next/prev pointer you now need to read a value and compute the pointer from it. Am I understanding this correctly? (Not a computer scientist, so I've never had a formal algorithms class.)
Very very very neat nonetheless, thank you very much ;)
The extra calculation doesn't change the theoretical asymptotic time complexity because O((k+1)f(n)) = O((k)f(n)) = O(f(n)). In a real computer most of the calculation time will be spent dealing with cache misses and reading from memory, especially in a linked list which is very easy to code in a cache-unfriendly manner. So it is basically free.
The upper 16 bits are usually unused on x86_64 as well (the virtual address space is only 48 bits), though whether they're all ones or all zeros depends on bit 47, since canonical addresses are sign-extended.
And doubles have on the order of 2^53 distinct NaN bit patterns. It's possible to create a tagged value that is either a double, a 32-bit integer, a pointer, or a small string, and still have space for more.
This reminds me of all the problems Apple had with 24 bit addressing and smuggling data in the unused eight bits at the top of each pointer. Andy Hertzfeld considers that his worst blunder:
Not UB but rather implementation defined. The compiler will not do anything weird here, but you rely on platform behavior that's not specified by the C lang spec.
Not UB in the slightest. You can take a pointer, cast it to uintptr_t, do whatever you want to that integer, then cast it back to a pointer. It's only UB if the resulting pointer isn't valid or you cast it to the wrong type (and strict aliasing is enabled).
Morris traversal of a binary tree is fairly straightforward once you clue into the fact that the child nodes' null left and right pointers are being reused to reference the predecessor and successor nodes respectively. That's it. Simple idea with big implications. I'll have a look at XOR linked lists. I've heard of them before, but the implementation always looked awkward enough for it not to be worth it, imo.
The downside of an XOR-linked list is that a single pointer to a list node isn’t enough information to traverse the list. Instead, you need to know the addresses of two adjacent nodes. They also don’t have an innate directionality to them; swapping your two local pointers will have you following the list in the opposite direction.
If you require the highest performance and your trees are write-once, read-often (and immutable), it's worth looking into whether you should just sort the nodes in traversal order in one contiguous block of memory. Storing the edges as indices into the block is handy in this case.
The article used parent pointers in node to avoid using a stack (native or custom made). One downside of a parent pointer is that your identical subtrees can no longer share structure.
I agree. I would also point out that this doesn't have much advantage (if it has any) over simple recursion, performance wise.
First, the depth (the amount of storage for traversal) typically is on the order of log2(n) and so this gets amortized very quickly when traversing nodes.
What if the tree is unbalanced?
If the tree is unbalanced you will have all sorts of other problems like problems with inserting or searching. Since trees are typically used for efficient searching and traversal is done more efficiently with just plain array (or list) it is really hard to find any real-world trees that are not kept balanced somehow.
Suppose your `Tree` is an array of `Nodes` sorted in whatever traversal order is expected to be good. Edges are fully defined as pairs of indices (Node[A] -> Node[B] means you need to know just A and B): Nodes can contain their own upstream indices.
If you need per-traversal state per node (such as visited status or intermediate dataflow), it's very handy to allocate another flat array for that and use the node indices.
I've looked into this in a compiler setting, transforming ASTs and intermediate representations.
I was confused by that as well before reading the answers: the confusion stemmed from only thinking about the linear traversal or search use cases, where an ordered array plus binary search is all you'd ever need for reading. But when the tree shape itself is still needed, indices can be better than pointers, particularly when adding sidecar arrays for extension or as temporary scribble space (see sibling comment).
Hurting because of memory alignment or something else?
Given we have indices into the serialisation for the child nodes, I would have thought that nodes not starting at consistent intervals doesn't hurt us, for traversal at least.
I'd quite like to use this idea for a file-backed immutable tree. The array of children would have variable size, with a bit field to index the nth child in the sparse array.
Constant-time array indexing depends on a fixed element size. So a memory layout that stores nodes of non-fixed size could not be a plain array of nodes.
I suppose you could (should) write your own allocator to ensure nodes are allocated in a contiguous piece of memory, and have a separate array of indexes / pointers that tells you where the n-th node starts. That is another array that'll have to be kept in cache though.
> You can get rid of any stacks whatsoever by keeping a parent pointer in the tree node data structure.
You can actually in-order traverse a binary tree without any memory overhead at all (no parent pointer nor an implicit or explicit stack) using Joe Morris' traversal.
I found it surprising that this can be done, but effectively it uses the NULL slots as temporary storage.
Reminds me of Sean Parent's recounting[1] of how his son's beads and string toy could represent a tree data structure with stateless traversal and some other interesting properties, like:
> ...to erase elements from some point in my model to some other point in my model, what I do is grab those two locations on the string, and I pick the whole thing up, and the beads that fall off are the ones that get erased ... (48:41)
It seems like by adding parent pointer one is shifting complexity from the traversal into the data structure, in order to avoid a stack. I know there are places where avoiding an explicit stack can be nice (perhaps a BVH tree in a shader or other places where constant space is required?) but are there more good reasons for it?
A DAG (such as a singly linked tree) has so many benefits for memory safety, GC performance, ability to extend it from a binary tree to N children etc., that it seems like an odd tradeoff in most scenarios.
Obviously, the cost of keeping read-only data (like parent pointers) in the tree and mutable data in the traversal state can be amortized if we have many ongoing traversals rather than one. For example, a massively parallel GPU computation running an arbitrary number of tree searches simultaneously in a fixed amount of memory.
> You can get rid of any stacks whatsoever by keeping a parent pointer in the tree node data structure. Effectively, this turns the tree into a (sort of) state machine. While traversing, you need no memory other than the current and previous node/state. The traversal algorithm is very simple:
While this sounds innocent, truly consider the ramifications of this decision.
Your node has now grown to be +8 bytes (the size of a 64-bit pointer). If you have 1-million nodes in your binary tree, you are using +8MB of memory.
-------
Lets consider the alternative: lets say you have 1-million nodes, and you have a red-black tree (imperfect balance: instead of a perfectly balanced 20-depth tree you have some branches which are 40-depth).
Traversing to 40-depth requires 40x pointers on the stack, or 320 bytes.
----------
In effect, you're spending 8 bytes per node (which could be millions, or even billions of nodes) to save O(log n) space off of your stack frame.
While the state machine is very "clean", it seems like a very bad tradeoff from an algorithmic / memory-space point of view. I'd rather spend O(log n) space on the stack than an extra 8 bytes per node in the heap.
I think there's something to be said about the clarity and cleanliness of the state-machine design. Its quite possible that some algorithms are easier to write with this "parent pointer in node" methodology. However, anyone seeking the highest performance per unit-memory will have to see that the +8 bytes per node is a terrible, terrible tradeoff.
-------------
> Update: Todd Lehman pointed out that given node-level locks, this algorithm allows concurrent traversal and update of the tree. Any atomic operation other than detaching a non-leaf node is safe.
Hmm... the parallel-angle I haven't thought about. But it does seem to be a potential building block as maybe even a lock-free data-structure.
Or just use locks, since locks are easier to think about.
Insertion and deletion into the tree seems possible, but "rebalance" is very difficult for me to think about (since it possibly requires modifying many, many nodes). Locking all nodes involved would be a heavy approach and probably work.
Red-black colors are kept track of for self-balancing purposes. Maybe more colors are needed to make the methodology "clean" for multi-threaded / lock free purposes.
> You can get rid of any stacks whatsoever by keeping a parent pointer in the tree node data structure. Effectively, this turns the tree into a (sort of) state machine.
How can it be a state machine if you aren't mutating it?
In fact, you can traverse a tree without recursion (or any equivalent stack-like structure) and without parent pointers. And then you actually use the tree as a state machine: you stash the reverse path pointer in the tree itself by overwriting the downward links temporarily and then putting them back.
This is useful for garbage collection; you can have a garbage collector that is guaranteed never to blow the stack in the marking phase.
> How can it be a state machine if you aren't mutating it?
You are confusing a state machine (which is an abstract model of computation) with the behavior of a program implementing it. The set of possible states and the conditions of transition from one state to the next never change.
These conditions can depend on mutable state, as is the case described by the author (in this case, mutable state is kept in the current and previous node pointers).
Note that if your language has non-precise garbage collection, having parent pointers guarantees that any spurious pointer into any node on the tree will keep the entire tree alive.
Generally, trees are poor for performance, as you zig-zag around memory and stall on cache misses. But in situations where you have to do tree traversal and have to be as fast as possible, trading memory for the parent pointer (which probably fits within the same cache line as the rest of the node anyhow, so is in practice basically free) rather than recursion or an explicit stack that stresses memory in the tight loop is probably a winner! Definitely worth profiling in those situations.
Assume you have 64GB of memory devoted to binary tree (of just integers), assuming the tree is balanced.
Your node is two pointers + an integer (24 bytes). That gives 2666666667 nodes. A balanced tree will have 32 levels (log2(2666666667) ≈ 31.3). When doing recursion you just need to keep one pointer per level, so this is 32 pointers, or just about 256 bytes of your cache.
Also, even if that bookkeeping were in memory (and not in cache), this is still orders of magnitude less effort than accessing the memory for the large data structure itself.
L1 cache lines are typically 64 bytes long, and you're only reading 16 bytes of it per level of the tree. And a random pointer tree traversal is the worst case for CPU branch prediction, too.
There are methods for optimising the use of various tiers of cache, trying to pack multiple nodes into one 64 byte line, arranging nodes such that left and right children are usually adjacent in memory so prediction losses are minimised, and other tricks.
> When doing recursion you just need to keep pointer to the level, so this is 32 times pointer or just about 256 bytes of your cache.
What language do you use where a recursive function invocation only requires a call frame large enough to hold a single parameter? By what magic does the function know where to return? And what are you doing in that function that doesn't require any temporaries whatsoever?
At best you're looking at a couple of kilobytes. For all but the most constrained environments (toaster, a 4Kb kernel thread stack, etc), this isn't really worth worrying about. And compared to the pointer chasing, all those pushes and pops aren't much of a problem, either.
However, in a language like C without function closures, non-recursive traversal can keep code much more concise and clear. This is the real win, IMO.
You could simulate the recursion with an iteration, a depth counter, and a stack array of 32 pointers (64 for overkill). This removes the call-frame overhead and makes the temporaries explicit too.
But that's not recursion, which pretty much by definition implies the use of repeated invocations of the same function(s). Explicitly building a stack is exactly what the article was discussing, though it's far more clever than the obvious approach. I've seen AVL implementations that use an explicit array for maintaining a stack in lieu of recursion or parent pointers.
In any event, you always need a stack of some sort. Recursion is one way to accomplish that, but saying that you're simulating recursion by explicitly building a stack reverses the categories.
Well, these are dynamic structures which are constructed over a period of time, with pointers pointing all over the place... imagine creating a singly linked list of 20-30M integers and then running a trivial sum/accumulate over the whole thing, vs doing the same thing with a vector/array.
The gotos did set my nerves bristling a bit at first sight, but it's the right tool for the job (or at least gets the job done), so I was OK with it. I guess other languages may have better/cleaner ways to achieve it.
I love the discussion that it sparked here too. It's wonderful and educational to see so many smart people breaking down the problem space, discussing de/merits of this particular technique, presenting alternative approaches with analysis on memory use, performance..
I missed the introductory part where you mentioned that these are binary trees only and not just trees. I think you should mention it more prominently.
Yes, sure, there's an isomorphism. Using that fact, is it then so trivial to show how this FSM approach works out if you don't have a binary tree to start with? I think it would make an interesting follow-up blog post. But failing that, it would have been nice to call out at the top that these are binary trees.
It definitely doesn't work if you don't change your in-memory data representation, and it requires you to change it in a way that forecloses the possibility of FP-persistent tree structures or efficient DAGs. Compared to that, representing your ordered-tree nodes as cons chains is a barely-significant change.
It does generalize, though, to ordered-tree nodes that aren't represented as cons chains. You just have to walk through the n child pointers when you're moving back up the tree to search for the one you're returning from, adding an extra runtime cost factor proportional to n but no extra memory. Moreover, if you devote three or four registers to an amnesic stack, you can avoid that cost for the bottom two or three levels of the tree, which will necessarily contain the vast majority of tree nodes.
My university studies covered pre-order, post-order, and in-order traversal, red-black trees, and AVL trees. This included proofs of correctness, big-O analysis of the data structures, multiple implementations (recursive vs iterative), and practical issues (e.g. using a profiler, unit tests, source control, object-oriented programming, and roughly four different implementations in Scheme, Ruby, Haskell, and Java).
I'm proud of my university education, but it decidedly did NOT cover the method discussed in this blog post. (Nor other popular subjects, such as Robin Hood hashing. My professor did cover skip lists and cuckoo hashing however)
There's so much to learn about trees out there, that I doubt any university class can really cover them all (even with 3-months per semester and maybe 3 or 4 different classes discussing the subject).
Even IF you did go to a university, there's no guarantee you've learned everything. And even if you were taught it at a university, there's no guarantee you'd actually remember it all. I'm sure I've forgotten plenty of things from my classes (or in the case of "Dynamic programming", I never really learned it the first time around...)
I encountered this method during high school (around 2005), though I can't remember if it was part of the class or something I worked out independently during class time.
I think it came about because a couple of us were determined to prove trees could be used without recursion, as at the time it was a daunting subject for most of us and the teacher wasn't very good at explaining it (how does it stop??/etc). The idea is based on applying a doubly-linked list - something that was part of the class work - to trees.
Just an observation: for some reason, people take it as elitism when you point out that a complete, well-rounded curriculum has certain advantages. And this article is a good illustration that there is value in hiring an engineer rather than a boot-camp code fighter.
Data structures are a very specific subject within IT engineering; everyone studying it should take the whole curriculum, including recursively scanning data structures and how to transform those recursive procedures into iterative code (or an FSM, as the article says).
I'd take an English major who is self-taught in computer science over a CS grad any day. And I'd take a bootcamp kid who went on to study the theory themselves over a Stanford kid who only knows this shit because he was tested on it. I'm honestly, at this point, not sure why we even teach software engineering in schools, as we so manifestly fail at imparting anything to the kids, and the ones that come out as decent engineers would have been so anyway. Oh right, it's signalling.
Whatever comparison you do, you should compare representative subsets of both populations in similar situations in life, not just cherry-pick the self-selected, successful survivors from one population against the duds from the other.
There is no comparison between five dedicated years of highly academic study and six months of touching the techs du jour. Will the boot campers also have a graph-theory background? Compilers and assembly, or just webpack preprocessors? Database design and network protocols, or just picking stacks?
Are you sure you are not comparing a responsible 35-year-old who did a boot camp to sustain his family with an entitled post-grad IT engineer who is still a bit high from the graduation party? It would be hypocritical not to admit that a hungry, serious, battle-tested individual will have the right attitude to build out a role, rather than the unserious, entitled individual who just wants to see what he can get out of it and makes no effort to continuously improve.
Once on a plane I met a CEO who told me he would take an engineer any day for decision-making or business-related roles. Was he crazy, or was it just his previous experience conditioning him? You can take whoever you want for your company, even someone who studied gender studies and ended up taking a code camp hoping to pay off their debt, and who now makes high-impact blog posts about their experience in the digital IT world as a person of XYZ gender.
> Whatever comparison you do, you should compare a representative subset of both populations in similar situations in life
Well, yes and no.
If you're an 18 year old who wants to program computers and you read Turing and Dijkstra for fun, then go study math or physics, which you won't have much time to do later. Or pick a subject in the humanities if you like to read and write. Is it a great way to become a programmer? Maybe not, it would be better to just go and get an internship or a job, but (1) smart companies are hard to find and (2) you'd be missing out on a chance to get a university education at the age when most people do so.
On the other hand, if you're picking coworkers, go with the self-motivated one who had to learn everything they learned because they were interested in it, over the one who had to learn it to get a degree in a field that everyone knows is a ticket to a high-paying job.
> Once in a plane I met a CEO who told me he would take any day an engineer for the decision making role or business related roles. Was he crazy
Sure, this is the same point: if the only thing you're good at is your specialization, then you're probably not going to be very good at it. Unless it's something like chess that really has minimal relation with anything else in society.
The book "Elephant In The Brain" devotes an entire chapter to how the education system is effectively one large, expensive signaling mechanism. It was an interesting read that I'd recommend checking out. For the record I completely agree with you and I'm saying this as someone with a STEM degree. The "signaling" aspects of our education system completely break down in tech. I would personally need a lot more info than whether someone is an MIT or Stanford grad to want to work with them.
Looking at it from the other side: maybe most would have become decent engineers, but a large number wouldn't have the maturity to apply themselves.
They would be unhappy because they didn't know the landscape of their profession - students get a taste of everything.
Some need a safe space to develop life skills they didn't learn at home: self-reliance, communication...
And yeah, signalling. Because starting at a job now is hard, getting fired is risky. The English major has the maturity, the goal and the life skills. He already has less risk with his degree. The bootcamp kid? Not so much. The bootcamp history major? Good!
The English major (assuming a good liberal education) is well-placed to understand his or her role as an engineer. He or she probably well knows how much there is to learn!
Of course, as you point out, a "large number" won't apply themselves, and won't become engineers, but they'll still have a well-rounded education. So if I was advising a smart youngster on how to get a job writing software in 6-8 years, I'd advise them to study English or Math or Physics or History before computer science. Pick a general concentration in the humanities or sciences that interests you and learn everything you can about the world through the lens of that field. Then go and pick up some programming languages, do an online masters program in the hot tech du jour, and write some code and get an internship or a job. Whatever you do, don't waste your chance at a university education studying computer science if that's what you want to do. The curriculum will bore you, it will move too slowly in all the wrong directions, and you'll pick up a bunch of obsolete preconceptions that you'll be poorly equipped to discard later.
Of course this has implications for what CS programs should be doing and mostly aren't, but that's neither here nor there.
No, I'm disagreeing. Kids just out of high school are not full adults. They should be able to learn to apply themselves while not losing the chance to be an engineer. University allows for that freedom; an apprenticeship doesn't.
Universities are a win-win-win for people, bosses, and government. Especially for the majority, the average students who can't retrain or self-study as easily. And for minorities, parents without degrees, the middle and lower classes, or even those outside a family business.
Aside: I was that smart youngster. I chose math; I wouldn't have graduated. English would have been much worse. I was bored, but at least not burned out. And I have used every course practically; six years of education and almost none of it could be reliably carved out to create a bootcamp.
Kids out of high school would have been parents already or within a few years in most of human history. We keep them infantilized but that is a feature of our culture, not what kids are capable of. You could say a 30 year old isn't a "full adult" either, assuming you continue to mature and grow, right?
My question is why do we think it's better to put all the learning for a particular career at the front of your life when you have no experience, and don't even know why you're going to need to know any of what you're learning? It's wasteful to learn something you think you're going to use and then find yourself in a career you prepared for and don't like. The average 18 year old doesn't know much about adult life (because we insulate them from it more and more) and has no idea what a professional in most fields actually does every day, so why do we expect them to pick an occupation before having any experience in it and then commit themselves to it for an intensive course of study?
I'm not arguing against university education, if it is truly general. But if it is just a passport to a particular career path, which is increasingly the case, then (a) it's too early to pick a career, and (b) if you want a career, and know what you want, it makes more sense to start working first and then learn what you need according to the demands of the work. That's why I'm in favor of continuing education and apprenticeships, and in general having more diverse life paths rather than trying to fit everyone to the same pattern, which is what seems to be happening now with college being expected for everyone.
Obviously there are careers like medicine that are going to require additional intensive training, and that needs to be front-loaded for obvious reasons. However, most people are working longer and changing careers more than in the past, and the ability to learn as you go should be more valuable to society than expecting that everyone get a university education while at the same time turning universities into trade schools.
Dude, I think you didn't read the article, or at least you didn't understand it. Aristotle knows that shit cold, and a transformation of a recursive procedure into iterative code, which requires an explicit stack, isn't what the article presents.
[0] https://en.wikipedia.org/wiki/XOR_linked_list
[1] https://en.wikipedia.org/wiki/Threaded_binary_tree