Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much wronger.

And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.

> What if wikipedia is using the incorrect word in a lot of locations,

Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor, but to beat trivial compressions you do have to be more smart than plain looking at the data - that's the point.

> And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile

That's not how it works. With all due respect, much smarter people than us has been thinking about it for many years - let's not try to make up why it's wrong after thinking about it badly for 3 minutes.




> And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.

It's possible to have an algorithm that is consistently closer in meaning but also consistently gets commas (or XML) wrong and pays a penalty every time.

Let's say both that algorithm and its competitor are using 80MB at this stage, before fixups.

Which one is more correct?

If you say "the one that needs fewer bytes of fixups is more correct", then that is a valid metric but you're not measuring human knowledge.

A human knowledge metric would say that the first one is a more correct 80MB lossy encoding, regardless of how many bytes it takes to restore the original text.

> Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor

You can't just declare it won't be a deciding factor. If different algorithms are good at different things, it might be a deciding factor.

> That's not how it works. With all due respect, much smarter people than us has been thinking about it for many years - let's not try to make up why it's wrong after thinking about it badly for 3 minutes.

Prove it!

Specifically, prove they disagree with what I'm saying.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: