1 & 2. I don't see any reason to believe there will ever be a new machine with non-power-of-two word size, and I'm doubtful that there are even any historical post-standardization C implementations like that. Certainly Linux, which makes much bigger assumptions about word size and the relationship of types, will never support such hypothetical systems. Generality is good if it buys you anything practical, but my approach has been to avoid excess generality unless it has a practical benefit. This approach has been very beneficial in the dynamic linker, where assuming that things that are the same on real-world archs actually are the same keeps the per-arch code to maintain at only some 30 lines (which might grow to 60-100 once TLS is supported), as opposed to many hundreds or even thousands in other implementations.
At this point, musl does not have official documentation/manual. When it does, these sorts of requirements as well as all the implementation documentation required by ISO C and POSIX will be documented.
3. The casts are all necessary (C does not define implicit conversions between distinct object pointer types, other than to and from void *) and correct. Casting to (void *) rather than an explicit type is my preference because it avoids repeating the type in multiple places, but in any case this is purely a stylistic matter and has nothing to do with the generated code or correctness.
4. The ONES macro was leftover from other files on which this code was based (strchr and strcpy). Obviously it's not needed in memcpy which copies a fixed number of bytes without searching for any terminator character, so it could/should be removed. Thanks for catching this.
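To make point 3 concrete, here is a minimal sketch of the two cast spellings. It is not taken from musl's memcpy; `load_word` is a hypothetical name used only for illustration, and the caller is assumed to pass a pointer to a real, suitably aligned size_t object.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from musl: read one size_t through a
   byte pointer. The caller must guarantee that s points at a real,
   suitably aligned size_t object. */
static size_t load_word(const unsigned char *s)
{
    const size_t *w;

    /* Writing "w = s;" is a constraint violation: there is no implicit
       conversion from unsigned char * to size_t *. */

    /* Going through void * works because void * converts implicitly to
       any object pointer type, so the target type is named only once,
       in the declaration of w. */
    w = (const void *)s;

    /* The explicit spelling "w = (const size_t *)s;" performs the same
       conversion and should generate the same code. */
    return *w;
}
```

Either spelling compiles to the same thing; the difference is only how many places need editing if the word type ever changes.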
Machines with a word size that's not a multiple of 8 cannot support POSIX at all. POSIX requires CHAR_BIT==8, and to support the POSIX and C11 memory model for threads, bytes must be individually addressable without a read-modify-write cycle on a larger memory unit. So this kind of thing really is irrelevant as far as POSIX is concerned.
The lore is that the name POSIX, not the idea for POSIX, came from RMS and was a form of disguised spite for the process (i.e. that he intended it to be read as Piece Of Shit *IX, or similar). I'm not sure whether there's any evidence to corroborate this part of the lore, but in the early days POSIX was definitely not representative of RMS's ideas/vision. These days the standards process and even the document itself are a lot more open (though still not up to RMS's standards).
CHAR_BIT==8 was not mandated by POSIX until 2001 when it was aligned with C99, and seems to have been mandated as a consequence of C99's requirement that [u]intXX_t types not have any padding bits, and the requirement (in POSIX) that network-related interfaces use uint8_t, uint16_t, and uint32_t (which, per C99, cannot all exist unless CHAR_BIT==8).
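The chain of reasoning can be written down as compile-time checks. This is only a sketch, assuming a C11 compiler (for _Static_assert) on a POSIX-like target. If uint8_t exists it has exactly 8 value bits and no padding, and since every object occupies at least one byte, CHAR_BIT can be at most 8; C already requires it to be at least 8.

```c
#include <limits.h>
#include <stdint.h>

/* These fail at compile time on any target where the POSIX assumption
   does not hold, instead of misbehaving at run time. */
_Static_assert(CHAR_BIT == 8, "POSIX (since 2001) requires 8-bit bytes");
_Static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");

/* With no padding bits, the storage size in bits equals the width. */
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t has no padding");
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t has no padding");
```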
I was going to note that it wasn't under the MIT License the last time I checked, but it turns out they relicensed it recently (May 6, 2012, http://www.etalabs.net/musl/download.html ) under the MIT License. Thanks for pointing it out!
Some things come to mind when looking through the source. Perhaps they are pedantic gripes, but perhaps they need some work. I had these observations while looking through memcpy.c:
1. Assumption of power of 2 word size. The ALIGN macro is defined as one less than the size of size_t (which should be enough to hold the length of something in memory). I would be more comfortable with it being defined as the size of a processor WORD rather than as a length of memory.
2. Using the ALIGN macro in calculations of misaligned memory copies. This goes to #1's point: Suppose the word size is not a power of 2? Suppose the word size is 6 octets rather than 4 or 8. The & mask might produce incorrect results since there would be 0's in the mask.
3. There seems to be a lot of casting going on where it either shouldn't be or it looks wrong. A good example of this is the cast to void on line 19. If there is a cast, why isn't it to the (size_t*) type? Even if this is correct, I now have to think a lot more about it, or hunt through the commit log. Yes, I could. No, I shouldn't have to.
4. The ONES macro is unused, and the ALIGN macro is not part of a header somewhere. An alignment macro seems like it might be beneficial in other parts of the library. I recognize that this increases interdependency, but I feel it may be justified since it would be useful in other places, and may be changed in the future.
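The concern in points 1 and 2 can be shown with a small sketch. The helper names below are hypothetical and not from musl; they only contrast the mask test (size minus one, as ALIGN is described above) with a remainder test that works for any nonzero size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: alignment test using a mask of size-1.
   Correct only when size is a power of two, because only then is
   size-1 a contiguous run of low 1 bits. */
static int is_aligned_mask(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}

/* Hypothetical helper: alignment test using the remainder.
   Works for any nonzero size, at the cost of a division. */
static int is_aligned_mod(uintptr_t addr, size_t size)
{
    return addr % size == 0;
}
```

With size 8 the two agree: address 12 is reported as misaligned by both. With a 6-octet word the mask is 5 (binary 101), so address 2 passes the mask test even though 2 is not a multiple of 6, and address 6 fails it even though it is one.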
I am very pleased that this library is being developed. I think competition drives up the quality of code, and libc is one of the places where high quality code is invaluable. (The comparison chart they provided was impressive!) I must admit I don't have a good solution to the issues I found, so I'm afraid my criticism may not be as constructive as I would prefer.
Thanks for making this, it was quite pleasant to look at non Drepper'd libc code. :)
I agree that it's conceptually nice to support non-power of two word sizes, but how often do those come up these days in practice? If none of the architectures they're targeting actually have such a word size, it seems like it might be a useful simplifying assumption.
I would agree in full, those oddball machines don't come up very often. But when they do, it's really nice to have something that makes as few assumptions about the underlying hardware as possible. If I recall correctly C89 doesn't make any requirements about size (except that sizeof(char) is always 1).
It is rare that libraries get rewritten from the ground up, so this would be the perfect time to get the code right.
I think the usual benchmark if you get to the point that you can run Linux and use an extensive libc is compatibility with existing software, not necessarily a conformant implementation.
I've been experimenting with using NetBSD pkgsrc on musl to see how far it goes. Many hundreds of packages compile and pass all their tests, including, for instance, every major language interpreter, various development libraries and resources, and XFCE4. Things that don't compile usually fall into one of two categories: (1) using #ifdef __linux__ to mean #ifdef __GLIBC__, and (2) fundamentally unportable packages that have huge hacks for every single platform they support. There's rumbling about creating a package support wiki where people can document what packages work, don't work, or require patches to work, but it doesn't yet exist.
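For category (1), the fix is usually to test for the C library rather than the kernel. A minimal sketch follows, assuming only that glibc defines __GLIBC__ once any of its headers has been included; `libc_flavor` is a hypothetical function name.

```c
#include <stdlib.h>  /* any libc header; on glibc this defines __GLIBC__ */

/* Hypothetical helper: report which code path was compiled in.
   Testing __linux__ here would wrongly select the glibc path on a
   Linux system that uses a different libc. */
static const char *libc_flavor(void)
{
#if defined(__GLIBC__)
    return "glibc";      /* glibc-specific extensions may be used here */
#else
    return "non-glibc";  /* stick to ISO C and POSIX interfaces here */
#endif
}
```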
Don't really get the theory behind changing the default stack size for threads. Feels like they did it just to be different which might get someone scratching their heads for a bit.
The glibc default thread stack size is unacceptable/broken for a couple of reasons. It eats memory like crazy (usually 8-10 megs per thread), and by memory I mean commit charge, which is a highly finite resource on a well-configured system without overcommit. Even if you allow overcommit, on 32-bit systems you'll exhaust virtual memory quickly, putting a low cap on the number of threads you can create (at 10 megs each, just 300 threads will use all 3GB of address space).
With that said, musl's current default is way too low. It's caused problems with several major applications such as git. We're in the process of trying to establish a good value for the default, which will likely end up being somewhere between 32k and 256k. I'm thinking 80k right now (96k including guard page and POSIX thread-local storage) but I would welcome evidence/data that helps make a good choice.
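Whatever the default ends up being, an application that knows it needs a deeper stack can ask for one explicitly through the standard POSIX attribute interface. A minimal sketch; `spawn_with_stack` and `echo_arg` are hypothetical names, and the 256 KiB figure in the usage note is an arbitrary example rather than a recommendation.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example thread function: returns its argument unchanged. */
static void *echo_arg(void *arg)
{
    return arg;
}

/* Hypothetical helper: start fn(arg) on a new thread with an explicit
   stack size instead of the libc default. Returns 0 on success or an
   error number, e.g. EINVAL if stack_size is below PTHREAD_STACK_MIN. */
static int spawn_with_stack(pthread_t *t, size_t stack_size,
                            void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_size);
    if (!err)
        err = pthread_create(t, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return err;
}
```

For example, spawn_with_stack(&t, 256 * 1024, echo_arg, &x) followed by pthread_join(t, &ret) runs the thread on a 256 KiB stack regardless of what the library would have picked.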