Am I the only one wondering why Facebook hasn't implemented a compression backend in memcache, much like Reiser4 and ZFS have done?
They've made it very clear that they're RAM-limited (capacity in particular), so why not just have the processor compress and decompress memcache values on the way in and out with a fast, relatively lightweight compression algorithm?
You could even tune the algorithm to detect duplicate or similar data and coalesce it into atomic blobs that represent multiple informational objects.
It seems like their big cost is putting together machines with tons of RAM for their memcache clusters, so why not bring that cost down?
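To make the trade-off concrete, here's a minimal client-side sketch in PHP, assuming the pecl/memcached extension; the compressed_set/compressed_get wrappers are hypothetical names, and memcached itself offers nothing like this on the server side:

    <?php
    // Minimal sketch of the trade-off: burn a little CPU to compress values
    // before they sit in memcached RAM. Wrapper names are made up here;
    // memcached offers no server-side compression option.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    function compressed_set(Memcached $mc, $key, $value, $ttl = 0) {
        // Level-1 zlib: fast, and still shrinks HTML-like data substantially.
        return $mc->set($key, gzcompress(serialize($value), 1), $ttl);
    }

    function compressed_get(Memcached $mc, $key) {
        $raw = $mc->get($key);
        return $raw === false ? false : unserialize(gzuncompress($raw));
    }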
I've wondered the same thing--compression would be enormously helpful to us since we're RAM-bound (even with tons of RAM) and store a lot of easily compressible HTML. Further, our memcached instances show almost no CPU load.
memcache clients already support client-side compression, which compresses the data before it goes over the network. It wouldn't make sense to move that to the server.
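For example, with the pecl/memcached extension it's a single client option (the option constant is real; the HTML payload below is just made-up filler):

    <?php
    // Client-side compression with the pecl/memcached extension: values above
    // the compression threshold are compressed by the client before they go
    // over the wire, and transparently decompressed on get().
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);
    $mc->setOption(Memcached::OPT_COMPRESSION, true);

    // Roughly the kind of payload being discussed: large, repetitive HTML.
    $largeHtml = str_repeat('<div class="story"><p>Hello, world.</p></div>', 1000);

    $mc->set('page:home', $largeHtml);   // stored compressed
    $html = $mc->get('page:home');       // decompressed on the way back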
If you are making an argument to recode your entire site from PHP to some other language, the answer is you just lost that argument.
This only works if execution time was a major part of the argument, and the site meets the conditions for benefiting from HipHop discussed in the article.
I was afraid the usefulness of HipHop would be this limited, given that it's no easy feat to create a PHP-to-C++ compiler that handles C library dependencies (of which PHP has a lot!) well.
BTW, it was the second time in a week that a product created incredible buzz in the HN community without anyone being able to try it out (the other was the iPad, of course), and I was amazed at the amount of well-informed opinion based on so little information.
PS: This blog is a good read if you are interested in Facebook architecture, scaling, and design issues in general.
We're in the process of opening the list and approving members to the group. Code will follow soon after any recent changes have been merged to the branch and we're sure that anything Facebook-specific is removed.
From the presentation, it sounded like it generates a C++ object for each user-created PHP object (1 to 1).
We will all know when they release the source.