Heap allocators are already ridiculously smart. But there are just too many divergent use cases for allocators.
I wrote a simple string heap allocator (grab a 16MB block and hand out 256-byte chunks) and the resulting code was 85x* faster than jemalloc ( http://pastebin.com/dAqa4dbN ) [* I know the way I am benchmarking is shit and probably way off from real world usage, it's hard to do benchmarking ideally though.]
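The scheme described above can be sketched roughly like this: carve one big slab into fixed 256-byte chunks and chain the free ones through an intrusive singly linked free list, so alloc and free are each a couple of pointer moves. This is my own minimal reconstruction (names and all), not the code from the pastebin, and it shares the same limitations: no thread safety, no large blocks, no double-free detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE  (16u * 1024u * 1024u)  /* one 16MB slab */
#define CHUNK_SIZE 256u                   /* fixed-size chunks */

static unsigned char *slab;
static void *free_head;  /* head of the intrusive free-chunk list */

static void pool_init(void) {
    slab = malloc(POOL_SIZE);
    free_head = NULL;
    /* Thread every chunk onto the free list; each free chunk's first
       bytes store the pointer to the next free chunk. */
    for (size_t off = 0; off + CHUNK_SIZE <= POOL_SIZE; off += CHUNK_SIZE) {
        *(void **)(slab + off) = free_head;
        free_head = slab + off;
    }
}

static void *pool_alloc(void) {
    void *p = free_head;
    if (p) free_head = *(void **)p;  /* pop from the free list */
    return p;                        /* NULL when the slab is exhausted */
}

static void pool_free(void *p) {
    *(void **)p = free_head;         /* push back; no double-free check */
    free_head = p;
}
```

The fast path is branch-light and trivially inlinable, which is a large part of why a benchmark like this looks so lopsided against a general-purpose allocator.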
But of course, mine wasn't thread safe, couldn't handle larger blocks (it just fell through to malloc), sucked at fragmentation, and couldn't detect double-frees.
Similarly, I found a massive speed boost for strings by using my own hand-rolled memcpy [[ while(l--) *t++ = *s++; ]] ... undoubtedly because all the trickery in setting up SSE and handling lengths not evenly divisible by the register sizes was more expensive than just dumbly transferring the data, when most strings are very small in size. Though I am sure memcpy would destroy me if I wanted to copy 2GB of data in one go.
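One way to get both ends of that tradeoff is a small wrapper: run the dumb byte loop below some size cutoff, where its lack of setup wins, and defer to the library memcpy above it, where the vectorized paths dominate. This is a sketch under my own assumptions; the function name and the 64-byte threshold are mine, not measured values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy small buffers with a plain byte loop, large ones with the
   library memcpy. The 64-byte cutoff is an illustrative guess; the
   right value depends on the platform and should be benchmarked. */
static void small_copy(char *t, const char *s, size_t l) {
    if (l < 64) {
        while (l--) *t++ = *s++;   /* the "dumb" loop from above */
    } else {
        memcpy(t, s, l);           /* let the SIMD path earn its setup */
    }
}
```

Note that real C libraries already branch on size internally; the point of the wrapper is only to skip the function call and dispatch overhead entirely for the tiny-string common case.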
All of the extra logic to be smart will help you when it actually works, but will make your worst case substantially more painful.
Let's say you had to create 100 objects. Would you rather create all 100 objects at 10ns each (1,000ns total), or 95 objects at 1ns each and the remaining 5 at 1000ns each (5,095ns total)? What if you were writing a particularly weird app that ended up needing 1000ns for every alloc it did?
One immediate concern I'd have with a smart exponential allocator is what happens when you're memory constrained and your application is a web server or SQL database.
Back when we used 32-bit machines, the exponential container would kill us. You couldn't use all of the memory without running out of RAM. When you're at 1GB of space, do you really want to double that on insertion? Yeah, yeah, demand paging; it doesn't work in practice. My instinct is that you want a doubling allocator with a max, some sigmoid that responds to free space and allocation rates.
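A crude approximation of that idea is a capped growth policy: double while the container is far from the ceiling, then taper to fixed increments as it gets close. The function, thresholds, and taper step here are all my own invention, just to make the shape of the curve concrete; a real policy would also consult free space and allocation rate as suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Next capacity for a growing container, assuming cur > 0.
   Doubles in the low range, then grows by limit/8 increments once
   past half the ceiling, and never exceeds the ceiling itself. */
static size_t next_capacity(size_t cur, size_t limit) {
    size_t grown;
    if (cur < limit / 2)
        grown = cur * 2;          /* doubling regime, far from the cap */
    else
        grown = cur + limit / 8;  /* taper as we approach the ceiling */
    return grown < limit ? grown : limit;
}
```

On a 32-bit box with a 1GB container, this trades the catastrophic "double to 2GB or die" step for a few extra reallocations near the top.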
For FF, they might be better off just making a couple of memory pools per page and bump allocating. Throw the pool away when the page is closed, and be able to migrate a long-running tab into a dynamically allocated pool.
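Bump allocation is about as cheap as allocation gets: a pointer increment per alloc, no per-object free, and the whole pool is reclaimed with a single free on teardown. A minimal sketch of the per-page pool idea, with structure and names of my own choosing (this is not how Firefox actually does it):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One pool per page: allocations bump a pointer forward, nothing is
   freed individually, and destroying the pool discards everything. */
typedef struct {
    unsigned char *base, *next, *end;
} bump_pool;

static bump_pool pool_create(size_t size) {
    bump_pool p;
    p.base = p.next = malloc(size);
    p.end = p.base ? p.base + size : p.base;
    return p;
}

static void *bump_alloc(bump_pool *p, size_t n) {
    n = (n + 7) & ~(size_t)7;                 /* round up to 8-byte alignment */
    if ((size_t)(p->end - p->next) < n)
        return NULL;                          /* pool exhausted */
    void *out = p->next;
    p->next += n;
    return out;
}

static void pool_destroy(bump_pool *p) {
    free(p->base);                            /* one free reclaims the page's pool */
    p->base = p->next = p->end = NULL;
}
```

Migrating a long-lived tab out would mean copying its live objects into a conventionally allocated pool before the page pool is thrown away, which is the expensive step this scheme defers until it's actually needed.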
I suspect a large portion of your performance improvement was just from code inlining. Your simple allocator was probably inlined, whereas malloc requires a function call (as the code lives in a shared library).