Even if the problem isn't in that layer, it would be easy to fix there: just insert an instruction before `popcnt` that kills (zeroes) the destination register, so there is no stale value left to wait for. Intel ships microcode updates regularly to fix this sort of issue, so I'd expect this one to be fixed before too long.