While micro-optimizations are interesting, there are two questions left unanswered:
- Does this change noticeably affect the total runtime? The checksum seems simple enough that the slight difference here wouldn't show up in PNG benchmarks.
- The proposed solution uses AVX2, which is not currently used in the original codebase. Would any other part of the processing benefit from using newer instructions?
If checksum calculation were a substantial portion of image decoding, I think that would be a strong case for simply not verifying the checksum.
If you put corrupted data into a PNG decoder, I don't think it's awfully important to most users whether they get a decode error or a garbled image out.
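For the first question, a rough way to get a number is to time the checksum alone against a full inflate of the same data. The sketch below is only an illustration under several assumptions: it assumes the checksum in question is Adler-32 (the one at the end of the zlib stream inside a PNG; swap in `zlib.crc32` if the chunk CRC is meant), it uses Python's `zlib` rather than the codebase under discussion, and `checksum_share` and the synthetic payload are hypothetical names and data, not a real PNG benchmark.

```python
import time
import zlib


def checksum_share(payload: bytes, repeats: int = 20) -> float:
    """Ratio of Adler-32 time to zlib inflate time for the same payload.

    Hypothetical helper for illustration. zlib.decompress already verifies
    the stream's Adler-32, so the ratio approximates the checksum's share
    of inflate time, not of a whole PNG decode (which also does unfiltering).
    """
    compressed = zlib.compress(payload)

    def best_of(fn) -> float:
        # Take the fastest of several runs to reduce scheduling noise.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best

    t_checksum = best_of(lambda: zlib.adler32(payload))
    t_inflate = best_of(lambda: zlib.decompress(compressed))
    return t_checksum / t_inflate


if __name__ == "__main__":
    # Assumption: a repetitive synthetic buffer stands in for image data.
    data = bytes(range(256)) * 4096
    print(f"checksum / inflate time: {checksum_share(data):.3f}")
```

A small ratio would support the guess that a faster checksum does not show up in end-to-end benchmarks. The result depends heavily on how compressible the data is, so real image data would be needed before drawing conclusions.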