Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Good article, but I'll mention two gotchas:

1. Storing nodes in a resizable array means as the input program grows the compiler will need a larger and larger block of contiguous memory (which may or may not be available). You could work around this by allocating page-sized blocks to pool from.

2. Care needs to be taken with how AST nodes are represented in code. For example, using a union type to store nodes is self-defeating as a union type is as large as its largest member and since not all AST nodes are equal size, this means small AST nodes will be pointlessly padded to account for the size of the largest AST node.



> small AST nodes will be pointlessly padded to account for the size of the largest AST node.

This is a great point, and something I mentioned in my blog post on custom-bitwidth integer types: https://alic.dev/blog/custom-bitwidth

Discriminated unions can work, but you have to be clever with your memory footprint.


Sometimes I wish we didn't end up with flat address spaces. The "virtual memory" technique allows us to stitch fragmented regions of physical memory into a contiguous region of virtual memory and if virtual address space was segmented, it simply would not ever become fragmented; any memory region could always be grown in-place, without intersecting with any other region, so realloc() implementations could just drop their memcopy() paths... oh well.


Can be mitigated by a rope(?) structure (i.e. the vector is split into chunks of fixed size, no re-allocation required) with real pointers. It will be unsafe inside, but safe (read-only) interface seems to be possible.

Deallocation will be O(n), but still much faster than a tree.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: