tl;dr - healthy individuals without a family history may be more likely to refuse the screening (fewer refusers than controls were diagnosed with CRC). This is why the study can't omit refusers from its calculations.
You're right, find won't call stat() unless it needs to. It doesn't need to just to recurse through the directory tree, because these days readdir() returns dirent.d_type, which (on filesystems that fill it in) tells it whether a name is a directory.
Yes, those stat() calls can be very slow in aggregate on a large tree, especially on an HDD (due to seeking) or a network filesystem, if the stat information isn't already cached.
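To illustrate the same idea outside of find, here is a minimal Python sketch. It assumes a Unix-like system where the filesystem reports d_type; os.scandir() exposes that information, so entry.is_dir(follow_symlinks=False) can usually be answered without a separate stat() call. The helper name count_entries is made up for this example, and this is not how find itself is implemented.

```python
import os
import tempfile


def count_entries(root):
    """Walk `root` iteratively and return (directories, non-directories).

    Hypothetical helper for illustration. It never calls os.stat() itself;
    is_dir() normally uses the d_type that came back with the directory
    listing and only falls back to a stat-like call if the type is unknown.
    """
    dirs = files = 0
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                # Symlinks are not followed, so they count as non-directories.
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    stack.append(entry.path)
                else:
                    files += 1
    return dirs, files


if __name__ == "__main__":
    # Small throwaway tree so the sketch can be run as-is.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        open(os.path.join(tmp, "a", "f1"), "w").close()
        print(count_entries(tmp))
```

Anything that needs more than the entry type (size, mtime, ownership) still has to go through stat(), which is where the per-file cost on a cold cache comes from.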