The concern with performance seems a little overwrought. strlcpy is only slow in the bad case where it truncates, since it must still scan the rest of the source to return its full length, and truncation is ideally not the common case. I've never heard of or seen a performance bottleneck traced to a strlcpy in the hot path.
If you really cared about performance, you'd be using nothing but memcpy with careful length tracking. Regardless of algorithmic runtime, any function that examines bytes as it copies will be slower than a length-based copy.
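To make "memcpy with careful length tracking" concrete, here is a minimal sketch. The `strbuf` struct and `strbuf_append` function are hypothetical names invented for illustration, not part of any standard library; the sketch assumes the caller already knows the length of every piece it appends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracked buffer: len is the number of bytes in use,
   cap is the total size of data. Invariant: len < cap, so there is
   always room for the terminating NUL. */
struct strbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append exactly n bytes from src. Returns 1 on success, 0 if the
   bytes plus the terminator would not fit. No byte of src is examined;
   the copy is driven purely by the known length. */
int strbuf_append(struct strbuf *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL && b->len < b->cap);

    if (n > b->cap - b->len - 1)
        return 0;

    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 1;
}
```

The catch is that the length has to come from somewhere: this only pays off when lengths are carried alongside the data instead of being rediscovered with strlen.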
I think one use for this kind of thing is logging: you may have a fixed-size buffer, both to keep overhead down (you don't want to allocate extra memory) and to prevent overly verbose logs from spamming the output. In that case, waiting to calculate the length of a 10 MB string just so you can fit it into a 1 KB buffer is unacceptable.
For your second point: not necessarily! memcpy would be slightly faster if you are keeping the length around, but if you're dealing with arbitrary strings you won't have it. Calculating the full length beforehand is a no-go, as I mentioned above, and using memchr first to get the length and then memcpy is not going to be faster.
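For reference, the memchr-then-memcpy variant mentioned above might look like the sketch below. `bounded_copy` is a hypothetical helper name, not a standard function. Unlike strlcpy, it never looks at more than `dst_size - 1` bytes of the source, so a 10 MB input costs no more than a 1 KB one; the trade-off is that it cannot report how long the source really was. Whether it beats a single-pass copy is something you would have to measure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 bytes of src into dst
   and always NUL-terminate when dst_size > 0. Returns the number of
   bytes copied, not counting the terminator. */
size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    if (dst_size == 0)
        return 0;

    size_t limit = dst_size - 1;

    /* Look for the terminator only within the part that can fit. */
    const char *nul = memchr(src, '\0', limit);
    size_t n = nul ? (size_t)(nul - src) : limit;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

A caller that needs to detect truncation can check whether the return value equals `dst_size - 1` and `src[dst_size - 1]` is not the terminator.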