
I've always found these kinds of tricks fascinating. There's all sorts of crazy things you can do to save some bytes here and there. One of my personal favorites is the XOR linked list[0]. It's a doubly linked list that uses as much memory as a singly linked list. Instead of separate prev and next pointers, each node stores a single link field equal to prev XOR next. As you're doing a traversal you can compute the next pointer as next = prev XOR link = prev XOR (prev XOR next), giving you the address of the next node in the list.
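A minimal C sketch of the traversal (the node layout and names are mine, just to illustrate the idea):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical node: one "link" field stores prev XOR next. */
    struct xnode {
        int value;
        uintptr_t link;   /* (uintptr_t)prev ^ (uintptr_t)next */
    };

    /* Walk from the head towards the tail; prev starts out as NULL (0). */
    void xlist_walk(struct xnode *head, void (*visit)(int)) {
        struct xnode *prev = NULL, *cur = head;
        while (cur != NULL) {
            visit(cur->value);
            /* next = prev XOR (prev XOR next) */
            struct xnode *next = (struct xnode *)((uintptr_t)prev ^ cur->link);
            prev = cur;
            cur = next;
        }
    }

Starting from the tail with the same two-pointer state walks the list in the other direction.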

Anyways, I did some digging into TAOCP. In section 2.3.1 Knuth mentions threaded binary trees[1]. They are trees that you can traverse without a stack and do not use more memory than a regular binary tree. This is compared to the method used in the post which requires you to store the parent pointers. The trick is that in a binary tree, lots of pointers are going to be null. Instead of storing null there, you can store a special value that provides information on how to traverse the tree. Indicating the difference between a regular pointer and one of these special values requires only a single bit. You can often store that bit within the pointer itself, requiring no additional space over a regular binary tree.

For a threaded binary tree, if the left child is null, you instead store the predecessor node. If the right child is null, you store the successor node. Finding the successor now becomes:

    1) Check if the right child is null. If so the right child pointer points to the successor. We are done.
    2) Follow the right child pointer. Then keep following the left child pointer until the left child is null. This gives us the smallest value greater than the current value.
One small advantage of threaded binary trees is that you traverse the tree slightly less. If the right child is null, instead of traversing back up the tree as in the post, you can jump straight to the successor node.
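In C the successor search comes out to something like this (a rough sketch; the thread flags are shown as separate bool fields for clarity, though as mentioned you'd normally hide that bit inside the pointer itself):

    #include <stdbool.h>
    #include <stddef.h>

    struct tnode {
        int value;
        struct tnode *left, *right;
        bool lthread, rthread;   /* true: the pointer is a thread, not a child */
    };

    /* In-order successor, no stack and no parent pointers.
       Assumes the maximum node's right thread points at NULL or a header node. */
    struct tnode *successor(struct tnode *n) {
        if (n->rthread)
            return n->right;     /* 1) the right "child" is the successor */
        n = n->right;            /* 2) real right child... */
        while (!n->lthread)      /*    ...then real left children all the way down */
            n = n->left;
        return n;
    }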

[0] https://en.wikipedia.org/wiki/XOR_linked_list

[1] https://en.wikipedia.org/wiki/Threaded_binary_tree




> You can often store that bit within the pointer itself

Alternatively you can follow what Java did with compressed oops and use 4 bytes per pointer on 64-bit archs (instead of 8) and shift the pointers prior to dereference, effectively enabling 32GB heaps with 4-byte pointers.
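The idea in C, roughly (a sketch of the scheme, not the JVM's actual implementation; names are made up): because objects are 8-byte aligned, a 32-bit value scaled by 8 can address 2^32 * 8 bytes = 32GB relative to a heap base.

    #include <stdint.h>

    static char *heap_base;   /* start of the managed heap */

    /* 32-bit compressed reference -> real pointer: shift, then add the base. */
    static inline void *decompress(uint32_t c) {
        return heap_base + ((uintptr_t)c << 3);
    }

    /* real pointer -> 32-bit compressed reference */
    static inline uint32_t compress(void *p) {
        return (uint32_t)(((char *)p - heap_base) >> 3);
    }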


It's a good idea IMO, but the two aren't alternatives, i.e. not mutually exclusive — you can do both. See Knuth's “A Flame About 64-bit Pointers” from 2008 (https://cs.stanford.edu/~knuth/news08.html, starting with “It is absolutely idiotic to have 64-bit pointers when I compile a program that uses less than 4 gigabytes of RAM”). Work on making this possible in Linux started as the x32 ABI in 2011 (see Wikipedia https://en.wikipedia.org/w/index.php?title=X32_ABI&oldid=887... or this LWN article: https://lwn.net/Articles/456731/); unfortunately it looks like there's discussion about removing it (Dec 2018 thread starting here https://lkml.org/lkml/fancy/2018/12/10/1145 though apparently I can't figure out how to navigate the LKML tree using constant space).


I've found myself using 32-bit "array indexes" to halve my "pointer sizes" for some code I'm writing (for fun). When doing linked data-structures (linked lists, trees, etc. etc.), a huge amount of the data ends up being a pointer.

Consider a typical binary tree with a 64-bit integer as its value. You use 24 bytes per node (value, left child, right child) if you use 64-bit pointers, but only 16 bytes per node if you use 32-bit pointers (or array indexes).
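Roughly like this (a sketch; the pool size and the sentinel are arbitrary):

    #include <stdint.h>

    #define NIL UINT32_MAX                /* sentinel for "no child" */

    struct node {                         /* 8 + 4 + 4 = 16 bytes */
        uint64_t value;
        uint32_t left, right;             /* indexes into the pool below */
    };

    static struct node pool[1u << 20];    /* nodes live in one big array */

    static inline struct node *deref(uint32_t i) {
        return &pool[i];                  /* "dereferencing" is an array lookup */
    }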

Now, a 32-bit index can address at most 4 billion nodes across the whole tree. But 4 billion is more than enough for many programs.

EDIT: Consider that 4 billion nodes of size 16 bytes (where each node is the value + left child + right child struct discussed earlier) will take up 64GB of RAM.

EDIT2: And half the time, my brain gets short-circuited and I end up recreating some terrible form of segmented memory before having to slap myself for thinking up such a horrible idea. In any case, a surprising amount of memory is used up as pointers in almost all the code I write. If you care about fitting as much data as possible into L1 cache (64kB on modern machines), you will absolutely want to minimize your data usage.



Woooow, this is sooo cool!!! I can't unsee it anymore!! I just did one of the variations on paper (the subtraction one presented at the bottom of the wikipedia link), this is great!!

I'm wondering though, this isn't totally free, is it? I guess there's an extra addition/subtraction calculation every time you go from one cell to the next, since instead of reading the next/prev pointer you now need to read a value and calculate the pointer from it. Am I understanding this correctly? (Not a computer scientist, so I've never had a formal algorithms class.)

Very very very neat nonetheless, thank you very much ;)


The extra calculation doesn't change the theoretical asymptotic time complexity, because constant factors drop out: O((k+1)·f(n)) = O(k·f(n)) = O(f(n)). In a real computer most of the time will be spent dealing with cache misses and reading from memory, especially in a linked list, which is very easy to code in a cache-unfriendly manner. So it is basically free.


Thank you very much for this explanation !


> You can often store that bit within the pointer itself

Yep, the lower 1-3 bits in many pointers will always be zero due to how C run-time allocators and structure alignment work out of the box.
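For example (a sketch using uintptr_t; the mask assumes at least 2-byte alignment, i.e. one free low bit):

    #include <stdint.h>
    #include <assert.h>

    #define TAG_MASK ((uintptr_t)1)   /* low bit used as the flag */

    static inline void *tag_set(void *p) {
        assert(((uintptr_t)p & TAG_MASK) == 0);   /* relies on alignment */
        return (void *)((uintptr_t)p | TAG_MASK);
    }

    static inline int tag_get(void *p) {
        return (int)((uintptr_t)p & TAG_MASK);
    }

    static inline void *tag_clear(void *p) {
        return (void *)((uintptr_t)p & ~TAG_MASK);   /* must clear before dereferencing */
    }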


The upper 16 bits are usually unused on x86_64 as well (since the x86_64 virtual address space is only 48 bits), though whether they're ones or zeros depends on bit 47 (canonical addresses are sign-extended).
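If you do smuggle data up there, something like this restores a canonical pointer (illustration only: it assumes 48-bit addressing, relies on arithmetic right shift of signed values, and breaks under 5-level paging):

    #include <stdint.h>

    /* Pack a 16-bit tag into the top bits of a (hypothetical) 48-bit pointer. */
    static inline uintptr_t pack(void *p, uint16_t tag) {
        return ((uintptr_t)p & 0x0000FFFFFFFFFFFFull) | ((uintptr_t)tag << 48);
    }

    static inline uint16_t tag_of(uintptr_t v) {
        return (uint16_t)(v >> 48);
    }

    /* Rebuild the canonical address by sign-extending from bit 47. */
    static inline void *unpack(uintptr_t v) {
        return (void *)((intptr_t)(v << 16) >> 16);
    }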


And doubles have on the order of 2^53 NaN bit patterns. It's possible to create a tagged value that holds either a double, a 32-bit integer, a pointer or a small string, and still have space for more.

Sadly the original Mozilla link is broken. https://wingolog.org/archives/2011/05/18/value-representatio...
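The NaN-boxing idea from that post, roughly (a sketch; the tag layout here is made up, and a real implementation has to canonicalize genuine NaN doubles before boxing them):

    #include <stdint.h>
    #include <string.h>

    typedef uint64_t value_t;                 /* every value is one 64-bit word */

    #define QNAN    0x7FF8000000000000ull     /* exponent all ones + quiet bit */
    #define TAG_PTR 0x0001000000000000ull     /* hypothetical "pointer" tag */

    static value_t box_double(double d) {     /* real doubles are stored as-is */
        value_t v;
        memcpy(&v, &d, sizeof v);
        return v;
    }

    static value_t box_ptr(void *p) {         /* pointers hide in the NaN payload */
        return QNAN | TAG_PTR | ((uintptr_t)p & 0x0000FFFFFFFFFFFFull);
    }

    static int is_double(value_t v) {
        return (v & QNAN) != QNAN;            /* anything outside the NaN space */
    }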


This reminds me of all the problems Apple had with 24 bit addressing and smuggling data in the unused eight bits at the top of each pointer. Andy Hertzfeld considers that his worst blunder:

http://www.folklore.org/StoryView.py?project=Macintosh&story...


Yeah, 64-bit is a lot of address space, and masking/shifting provides quite a few bits if one is willing to manage their own pointers.


Hard to get this done properly in C I imagine. You're getting into UB the moment you start bit-fiddling with pointers I think.


Not UB but rather implementation defined. The compiler will not do anything weird here, but you rely on platform behavior that's not specified by the C lang spec.


Nah, so long as you cast to an integer (preferably uintptr_t) before fiddling with it you're fine¹

¹ I think, but to be honest it's been some time since I really knew the arcane C rules and I'm not confident anymore


Not UB in the slightest. You can take a pointer, cast it to uintptr_t, do whatever you want to that int, then cast it back to a pointer. It's only UB if the resulting pointer isn't valid or you cast it to the wrong type (and strict aliasing is enabled).


Hm, good to know.


Morris traversal of a binary tree is fairly straightforward once you clue into the fact that null right-child pointers are temporarily being repurposed to point at a node's in-order successor. That's it. Simple idea with big implications. I'll have a look at XOR linked lists. I've heard of them before but the implementation always looked awkward enough for it to not really be worth it imo.
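For reference, a minimal C sketch of Morris in-order traversal (node layout assumed; the temporary right-pointer threads are created on the way down and removed once the left subtree has been visited):

    #include <stddef.h>

    struct bnode {
        int value;
        struct bnode *left, *right;
    };

    void morris_inorder(struct bnode *root, void (*visit)(int)) {
        struct bnode *cur = root;
        while (cur != NULL) {
            if (cur->left == NULL) {
                visit(cur->value);
                cur = cur->right;
            } else {
                /* in-order predecessor: rightmost node of the left subtree */
                struct bnode *pred = cur->left;
                while (pred->right != NULL && pred->right != cur)
                    pred = pred->right;
                if (pred->right == NULL) {
                    pred->right = cur;    /* lay down a temporary thread */
                    cur = cur->left;
                } else {
                    pred->right = NULL;   /* second visit: remove the thread */
                    visit(cur->value);
                    cur = cur->right;
                }
            }
        }
    }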


The downside of an XOR-linked list is that a single pointer to a list node isn’t enough information to traverse the list. Instead, you need to know the addresses of two adjacent nodes. They also don’t have an innate directionality to them; swapping your two local pointers will have you following the list in the opposite direction.


Don't forget that a regular tree with N children can be represented with a binary tree. Left points to first child. Right points to next sibling.
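E.g. (a small sketch of the left-child/right-sibling encoding):

    #include <stddef.h>

    struct nary {
        int value;
        struct nary *left;    /* first child  */
        struct nary *right;   /* next sibling */
    };

    /* Visit all children of n by walking the sibling chain. */
    void visit_children(struct nary *n, void (*visit)(int)) {
        for (struct nary *c = n->left; c != NULL; c = c->right)
            visit(c->value);
    }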



