Some background before I explain why your suggestion is "not even wrong". If you can predict the next X bits given the previous Y bits of observation, you don't need to store the next X bits: the decompressor can just run the same prediction algorithm you used and write out its predictions without correction. At a high level this is the same as general-purpose AI. If you can predict the next X bits given the previous Y bits of observation, you can make a reasoned decision ("choose option A now so that the next X bits have a certain outcome").
The above is actually the premise of the competition and the reason it exists. What I've said is covered in detail in academic papers by Marcus Hutter et al., who runs this competition: lossless compression can be a scorecard for prediction, which is the same thing as AGI.
Now saying "they should just make it lossy" misses the point. Do you know how to turn lossy data into lossless? You store some data everytime you're wrong on your prediction. ie. Store data when you have loss to make it lossy. This is arithmetic coding in a nutshell. You can turn any lossy data into lossless data with arithmetic coding. This will require more data the more loss you have. The lossless requirement gives us a scorecard of how well the lossy prediction worked.
If you ask for this to be lossy compression, you throw out that scorecard and bring in a lot of unwanted subjectivity.
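To make the mechanics concrete, here's a toy sketch in Python (a made-up "repeat the last character" predictor, not real arithmetic coding; a real coder charges fractional bits based on the model's probabilities instead of whole corrections):

  def predict(history):
      # Toy predictor: guess that the next character repeats the previous one.
      # (Stand-in for whatever model a real entry would use.)
      return history[-1] if history else " "

  def compress(text):
      corrections = []                        # (position, actual char) wherever we mispredict
      for i, ch in enumerate(text):
          if predict(text[:i]) != ch:
              corrections.append((i, ch))
      return len(text), corrections           # fewer corrections = better predictor

  def decompress(length, corrections):
      fixes, out = dict(corrections), ""
      for i in range(length):
          out += fixes.get(i, predict(out))   # trust the shared predictor unless corrected
      return out

  text = "aaabbbbccd"
  stored = compress(text)
  assert decompress(*stored) == text          # lossless round trip
  print(stored)                               # the corrections are the loss scorecard

The decompressor runs the exact same predictor, so only the mispredictions have to be shipped.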
> Some background before I explain why your suggestion is "not even wrong".
The phrase you quote is generally used to imply a stupid or unscientific suggestion. Your succeeding comments about what you think AGI is carry a certitude that isn't warranted.
It's good that you are trying to supply knowledge where you think it is lacking, and I understand there are fora where this sort of public-school lecturing is amusing, but I think your tone is misplaced here.
Years-long compression challenge with dozens of geniuses participating: exists
Random person on the internet: let me improve this thing I've never heard of by using the one fact I know about compression, that there are two kinds
It's absolute hubris and a waste of everyone's time to chime in with low-value, trash comments like "they should make it lossy". It's not unreasonable at all to take a snarky tone in response. "Not even wrong" absolutely applies here, and they carefully, patiently, and in great detail explained why.
I often feel the same way when discussions pop up here or on other forums about topics I'm familiar with. Like randos declaring that researchers in deep learning are "obviously doing it wrong" and should instead do X, where X is an entire subfield that has existed for years with a lot of activity, etc.
So I get where you're coming from. But I'd suggest that a place like HN is in fact a place for random people to inject their half-baked takes. It is just a discussion board where lots of the comments will be uninformed or wrong. Take it or leave it. If you want something else, you need to find more niche communities that are, by their nature, more difficult to find and less public, including IRL discussion, clubs, conferences etc. But it has its use: you and I can jump into any thread, type out what we think after 2 minutes, and get some response. But of course someone even more novice might think that we know more than just that 2 minutes of consideration, and they learn our junk opinion as if it were the result of long experience. It's unavoidable, since nobody knows who the rest of the commenters are.
Online discussions are incredibly noisy, and often even the people who use the jargon and seem knowledgeable to an outsider can be totally off-base, essentially just imitating what the particular science or field "sounds like". Unfortunately, you only learn this gradually and over a long time. If you learn stuff through forums, Reddit, HN, blogs, substacks etc., the first-person experience can be very misleading, because you will soak up lots of nonsense as well. Reading actual books and taking real courses is still very much relevant.
HN and co. are more like the cacophony of what the guy on the street thinks. Very noisy, and only supposed to be a small treat on top of rigorous study. You shouldn't expect to see someone truly breaking new ground in this comment thread. If it disturbs you, you can skip the comments. But trying to "forbid" it, or gatekeep, is futile. It's like trying to tell people in a bar not to discuss how bad the soccer team's coach is, because they don't really have the relevant expertise. Yeah, sure, but people just wanna chat and throw ideas around. It's on the reader to know not to take it too seriously.
ISWYM, but the problem isn't really people making suggestions; it's the way they make them that grates. If Mr Internet-random-guy wants to introduce the issue of lossy compression, then by all means ask why not, but don't say they should.
It comes across as arrogance, probably because it is, and then it sucks up plenty of the time of others who actually do know the subject and have to put things right.
Even more bloody annoying is when people ask questions that even the most cursory web search would answer. Wikipedia is usually a very good place to start. I guess that for these people, the cost is externalised as other people's wasted time.
Then again, we all take turns at being the stupid one, so who am I to complain.
It's inherent in reading comments. And it's also inherent in encountering mere mortals in the real world. And remember how the most annoying and stupid people keep going on about how all the people they meet are stupid and annoying. There's no point in piling on another layer. Close the tab, or comment constructively and charitably. Otherwise you end up with stuff like the badX subreddits (badhistory, badphilosophy), which get their adrenaline/dopamine fix by seeking out explanations they see as ignorant/naive/arrogant and sneering at them, while self-aggrandizing and feeling like they're in the inner circle that knows it all.
The other thing is, you never see all the people who do go to Wikipedia, google or check a book. They won't comment "Hello I'm not commenting now because I went to Wikipedia". They just don't comment.
And Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."
People are more prone to comment out of frustration than other feelings.
You may find the book "The Structure of Scientific Revolutions" interesting if you have not read it. The author posits that it isn't the people entrenched in a field who offer breakthrough advancements, but instead outsiders looking in.
More often than not, they are aggressively rebutted, which leads to the belief that science progresses one funeral at a time. Perhaps it is you who needs to guard against hubris?
I wish I could claim the logic, but I am a nobody who knows nothing. The author of the book, however, has impressed people for over 70 years with this line of thought and I too agree with him.
I did read the thread. I'm objecting to the idea that stacking lossless corrections onto lossy compression and then measuring the total is a good way to measure what we want to measure here, wrt human knowledge. It may be the best we have, but it's not good.
Why should we care what you think, though? I'm not being nasty, but unless you have a reputation in that field, you have to give some cogent argument or it's just some random, possibly uninformed opinion.
There is nothing wrong with jumping in and saying stuff that probably isn't right on an internet forum! That doesn't make it valuable or insightful, it's just how forums work. Your comment was cool with me and you shouldn't feel like you need to change at all.
That said, the response to your comment was insightful and made interesting points. You did in fact kick off a very interesting conversation!
I feel your tone-policing is misplaced here. When someone is that ignorant or foolish, they should be told so, so that they can hopefully recalibrate or something. There's enough inchoate nonsense on the Internet that I appreciate the efforts to keep discussion on a little higher level here.
> There's enough inchoate nonsense on the Internet
I agree 100%. But the top-level comment is not an example of such.
However, the reply in question – and your comment – are certainly examples of the kind of tone-deaf, needlessly aggressive, hostile, confrontational, and borderline malicious posts I wish I could cleanse the Internet of wholesale.
> tone-deaf, needlessly aggressive, hostile, confrontational, and borderline malicious posts I wish I could cleanse the Internet of wholesale.
I have never said this before, but maybe you're a little too sensitive?
In any event, I feel your characterization of my comment borders on ad hominem and certainly it seems to violate the site guideline to interpret comments charitably.
Banter by definition is teasing, which is insulting. That, along with being humorous in that context and not to be taken seriously, is what makes it banter rather than any other form of exchange.
Let's just go to the dictionary:
Banter (n): the playful and friendly exchange of teasing remarks.
Teasing (adj): intended to provoke or make fun of someone in a playful way.
Is there any evidence that arithmetic coding works at the level of concepts?
As a thought experiment, suppose Copernicus came up with this Hutter prize idea and declared that he would provide the award to whoever could compress the text of his book on the epicycle-based movement of planets around the sun (De revolutionibus orbium coelestium).
Today we can explain the actual motion to high accuracy with a single sentence that would have been understandable in that age: "A line drawn from the sun to any planet sweeps out equal areas in equal times."
This, however, is mostly useless in attempting to win the Copernican Hutter prize. Predicting the wording of some random human's choosing (especially at length) is very far removed from the ability to predict in general.
Arithmetic coding isn't the key thing here; it's just a bitwise algorithm. You predict the next bit is a 1 with 90% certainty? Then you don't need to store much data with arithmetic coding. That's all arithmetic coding is here.
What you're getting at is the 'predictor' that feeds into the arithmetic coder, and that's wide open and can work any way you want it to. LLMs absolutely have context, which is similar to what you're asking about, and they are good predictors of output given complex input (pass GPT a mathematical series and ask it what comes next; if it's right, that's really helpful for compression, as you wouldn't need to store the whole series in full).
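A back-of-the-envelope illustration of that cost (Python, with assumed probabilities): an ideal arithmetic coder charges about -log2(p) bits, where p is the probability the predictor gave to the bit that actually occurred.

  import math

  def cost_bits(p_one, actual):
      # Ideal arithmetic-coding cost for one bit, given the predictor's
      # probability that the bit is 1 and the bit that actually occurred.
      p = p_one if actual == 1 else 1.0 - p_one
      return -math.log2(p)

  print(cost_bits(0.90, 1))   # ~0.15 bits: confident and right, nearly free
  print(cost_bits(0.90, 0))   # ~3.3 bits: confident and wrong, expensive
  print(cost_bits(0.50, 1))   # exactly 1 bit: no prediction, no savings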
So you don't think that writing a wiki article about this could be made smaller by encoding the info in a few logical steps and adding some metadata on what kind of sentence should follow what? The part about decompressing it is the AI part. Where to place a comma can be added at roughly constant cost for any contending program.
Sure my point is more that this adding of commas business dwarfs any real prediction.
Suppose there were a superintelligence that figured out the theory of everything for the universe. It's unclear that it would actually help with this task. You could likely derive things like gravitation, chemistry, etc. easily enough, but the vast majority of your bits would still be used attempting to match the persona and wording of the various Wikipedia authors.
This superintelligence would be masked by some LLM that is slightly better at faking human wording.
But that comma will have the exact same price between 2 contending lossy compressions. In fact, it is a monotonic function of the difference, so the better your lossy compression is, the better your arithmetic one will be — making you measure the correct thing in an objective way.
It’s like, smart people have spent more than 3 minutes on this problem already.
> But that comma will have the exact same price between 2 contending lossy compressions.
Why do you think that? Do you have proof of that?
> making you measure the correct thing
If we're trying to measure knowledge, then the exact wording is not part of being correct.
Very often you will have to be less correct to match Wikipedia's wording. A better lossy encoding of knowledge would have a higher cost to correct it into a perfect match of the source.
> Why do you think that? Do you have proof of that?
You want to encode “it’s cloudy, so it’ll rain”. Your lossy, intelligent algorithm comes up with “it is cloudy so it will rain”.
You save the diff and apply it. If another, worse algorithm can only produce “it’s cloudy so sunny”, it will have to pay more in the diff, which scales with the number of differences between the produced and original string.
You can be less correct if that cumulatively produces better results; that's the beauty of the problem. The last-"mile" difference costs everyone the same, as a function of the difference.
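A rough sketch of that accounting (Python; the strings are the examples from this thread, and counting character edits is only a crude stand-in for what an arithmetic coder would actually charge):

  import difflib

  original = "it's cloudy, so it'll rain"
  guess_a  = "it is cloudy so it will rain"   # close to the original wording
  guess_b  = "it's cloudy so sunny"           # wanders further from it

  def correction_chars(guess, target):
      # Count the target characters that must be inserted or replaced to
      # patch the lossy guess back into the exact original text.
      ops = difflib.SequenceMatcher(None, guess, target).get_opcodes()
      return sum(j2 - j1 for tag, i1, i2, j1, j2 in ops if tag != "equal")

  print(correction_chars(guess_a, original))   # small patch
  print(correction_chars(guess_b, original))   # bigger patch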
How about "it is cloudy so it will rain" and "it's cloudy, so sunny"? Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much wronger.
You seem to be assuming that a less intelligent algorithm is worse at matching the original text in every way, and I don't think that assumption is warranted.
I'll rephrase the last line from my earlier post: What if wikipedia is using the incorrect word in a lot of locations, and the smart algorithm predicts the correct word? That means the smart algorithm is a better encoding of knowledge, but it gets punished for it.
In that case the last mile cost is higher for a smart algorithm.
And even when the last mile cost is roughly the same, the bigger of a percentage it becomes, the harder it is to measure anything else.
And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile, even though such a result should be a huge win. There are lots of possible ways to encode knowledge that will drag things just a bit away from the original arbitrary wording. So even if you use the same sub-algorithm to do the last mile, it will have to spend more bits. I don't think this is an unlikely scenario.
> Then since we're looking at the commas for this argument, the second algorithm is paying less for comma correction even though it's much wronger.
And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.
> What if wikipedia is using the incorrect word in a lot of locations,
Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor, but to beat trivial compressors you do have to be smarter than just looking at the data - that's the point.
> And it shuns any algorithm that's (for example) 5% better at knowledge and 2% worse at the last mile
That's not how it works. With all due respect, much smarter people than us have been thinking about it for many years - let's not try to make up why it's wrong after thinking about it badly for 3 minutes.
> And? It will surely have to be on average more correct than another competitor, otherwise its size will be much larger.
It's possible to have an algorithm that is consistently closer in meaning but also consistently gets commas (or XML) wrong and pays a penalty every time.
Let's say both that algorithm and its competitor are using 80MB at this stage, before fixups.
Which one is more correct?
If you say "the one that needs fewer bytes of fixups is more correct", then that is a valid metric but you're not measuring human knowledge.
A human knowledge metric would say that the first one is a more correct 80MB lossy encoding, regardless of how many bytes it takes to restore the original text.
> Then you write s/wrongword/goodword for a few more bytes. It won't be a deciding factor
You can't just declare it won't be a deciding factor. If different algorithms are good at different things, it might be a deciding factor.
> That's not how it works. With all due respect, much smarter people than us have been thinking about it for many years - let's not try to make up why it's wrong after thinking about it badly for 3 minutes.
Prove it!
Specifically, prove they disagree with what I'm saying.
There’s no way I could reproduce a calculus textbook verbatim, but I can probably prove all the important theorems in it.
Even then, given half of any sentence in the book, I don’t rate my chances of reproducing the next half. That’s more a question of knowing the author’s style than knowing calculus itself.
Let's say you always come up with the exact same text given an initial seed (say, what you ate that day). Then the remaining cost really is just the arithmetic coding of the difference, since the process is deterministic.
Now, who will get a smaller result with that added arithmetic coding: you, knowing the proofs, or a random guy who doesn't even speak the language? Doesn't that then measure intelligence, all else being equal?
If it's 99.99% accurate, arithmetic coding would need to store next to no data.
Arithmetic coding is optimal at turning probabilistic predictions into lossless data; there's provably no way to do it more efficiently. The better the predictions, the less correction data it needs.
So, given this, why even dwell on approaches that add any form of subjectivity? Arithmetic coding is right there, and it's a simple algorithm.
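Rough numbers behind "next to no data stored" (Python; this assumes the model really does assign 99.99% probability to the bit that occurs, which is what the accuracy figure is standing in for here):

  import math

  p = 0.9999                                  # probability given to the bit that occurs
  cost_right = -math.log2(p)                  # ~0.00014 bits per correctly predicted bit
  cost_wrong = -math.log2(1 - p)              # ~13.3 bits per miss

  n_bits = 8 * 10**9                          # roughly 1 GB of input
  expected_bits = n_bits * (p * cost_right + (1 - p) * cost_wrong)
  print(expected_bits / 8 / 1e6, "MB")        # about 1.5 MB instead of 1000 MB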
There's a section in the link above, "Further Recommended Technical Reading relevant to the Compression=AI Paradigm", and they define it in a reasonably precise mathematical way. It's well accepted at this point. If you can take input and predict what will happen given some options, you can direct towards a certain goal. This ability to direct towards a goal effectively defines AGI. "Make paperclips", and the AI observes the world, works out what decisions need to be made to maximise paperclip output, and then starts taking those decisions; that is essentially what we mean by AGI, and prediction is a piece of it.
I have no stake in this btw; I just had a crack at the above challenge in my younger days. I failed, but I want to get back into it. In theory, a small LLM without any pre-existing training data (to keep the size down) that trains itself on the input as it feeds predictions to an arithmetic coder, with the same process on the decompression side, should work really well here. But I don't have the time these days. Sigh.
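For what it's worth, here's a skeleton of that pipeline in Python, with a tiny adaptive byte-frequency model standing in for the self-training LLM; it only computes the ideal code length rather than doing the actual arithmetic coding:

  import math
  from collections import Counter

  class AdaptiveModel:
      # Order-0 stand-in for the self-training model: starts from a flat
      # prior over bytes and updates its counts as the data streams past.
      def __init__(self):
          self.counts = Counter({b: 1 for b in range(256)})
          self.total = 256

      def prob(self, byte):
          return self.counts[byte] / self.total

      def update(self, byte):
          self.counts[byte] += 1
          self.total += 1

  def ideal_code_length_bits(data):
      # What an arithmetic coder driven by this model would charge; the
      # decompressor runs the identical model, so nothing else is stored.
      model, bits = AdaptiveModel(), 0.0
      for b in data:
          bits += -math.log2(model.prob(b))
          model.update(b)
      return bits

  sample = b"abababababababab" * 64
  print(ideal_code_length_bits(sample) / 8, "bytes vs", len(sample), "raw")

Swap the frequency model for one that learns longer-range structure and the code length drops accordingly; that's the whole game.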
> This ability to direct towards a goal effectively defines AGI
No it doesn't, though it may be argued to be a requirement.
That's the point of the previous commenter - that you are making unjustified assertions using an extrapolation of the views of some researchers. Reiterating it with a pointer to why they believe that to be the case doesn't make it more so.
If that's your favoured interpretation, fine, but that's all it is at this point.
Go argue with the scientists, who state pretty much what I just said verbatim, including full links with proofs, at http://prize.hutter1.net/hfaq.htm#ai :)
> One can prove that the better you can compress, the better you can predict; and being able to predict [the environment] well is key for being able to act well. Consider the sequence of 1000 digits "14159...[990 more digits]...01989". If it looks random to you, you can neither compress it nor can you predict the 1001st digit. If you realize that they are the first 1000 digits of π, you can compress the sequence and predict the next digit. While the program computing the digits of π is an example of a one-part self-extracting archive, the impressive Minimum Description Length (MDL) principle is a two-part coding scheme akin to a (parameterized) decompressor plus a compressed archive. If M is a probabilistic model of the data X, then the data can be compressed (to an archive of) length log(1/P(X|M)) via arithmetic coding, where P(X|M) is the probability of X under M. The decompressor must know M, hence has length L(M). One can show that the model M that minimizes the total length L(M)+log(1/P(X|M)) leads to best predictions of future data. For instance, the quality of natural language models is typically judged by its Perplexity, which is equivalent to code length. Finally, sequential decision theory tells you how to exploit such models M for optimal rational actions. Indeed, integrating compression (=prediction) into sequential decision theory (=stochastic planning) can serve as the theoretical foundations of super-intelligence (brief introduction, comprehensive introduction, full treatment with proofs).
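To make the L(M) + log(1/P(X|M)) part concrete, here's a toy two-part-code comparison in Python (the 16 bits to describe the parameter is an assumed number, not something from the FAQ):

  import math

  # X: 1000 coin flips, 900 of them heads.
  n, heads = 1000, 900

  def fit_bits(p_heads):
      # log2(1/P(X|M)): the arithmetic-coded size of the data under model M.
      return -(heads * math.log2(p_heads) + (n - heads) * math.log2(1 - p_heads))

  cost_fair   = 0  + fit_bits(0.5)   # "fair coin": free to state, fits badly (~1000 bits)
  cost_biased = 16 + fit_bits(0.9)   # "90% heads": pay for the parameter, fits well (~485 bits)
  print(cost_fair, cost_biased)      # the better model wins on total code length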
Whether or not you agree, a lot of people do. There is a trivial sense in which a perfect compression algorithm is a perfect predictor (if it ever mispredicted anything, that error would make it a sub-optimal compressor for a corpus that included that utterance), and there are plenty of ways to prove that a perfect predictor can be used as an optimal actor (if you ever mispredicted the outcome of an event worse than what might be fundamentally necessary due to limited observations or quantum shenanigans, that would be a sub-optimal prediction and hence you would be a sub-optimal compressor), a.k.a. an AGI.
Where a lot of us get off the fence is when we remove "perfect" from the mix. I don't personally think that performance on a compression task correlates very strongly with what we'd generally consider as intelligence. I suspect good AGIs will function as excellent compression routines, but I don't think optimizing on compression ratio will necessarily be fruitful. And I think it's quite possible that a more powerful AGI could perform worse at compression than a weaker one, for a million reasons.
If you had a perfect lossless compressor (one that could compress anything down to its fundamental Kolmogorov complexity), you would also definitionally have an oracle that could compute any computable function.
Intelligence would be a subset of the capabilities of such an oracle.
No, because getting a UTM to compute any computable function only requires a tiny set of instructions for generating every possible permutation over ever-increasing lengths of tape.
It will run forever, and I would agree that in that set there will be an infinite number of functions that when run would be deemed intelligent, but that does not make the computer itself intelligent absent first stumbling on one of those specific programs.
EDIT: Put another way, if the potential to be made to compute in a way we would deem intelligent is itself intelligence, then a lump of random particles is intelligent because it could be rearranged into a brain.
The analogy is not a random lump of particles but all particles in all configurations, of which you are a subset. Is the set of you plus a rock intelligent?
This is a fundamentally flawed argument, because a computer is not in all the states it can execute at once, and naively iterating over the set of possible states might well not have found a single intelligence before the heat death of the universe.
If we were picking me out of an equally large set of objects, then I'd argue that no, the set is not meaningfully intelligent, because the odds of picking me would be negligible enough that it'd be unreasonable in the extreme to assign the set any of my characteristics.
Sorry but i hope my long-winded explanation explains it.
In computer science we have a way to score how good lossy data is. That way is to make it lossless and look at how much data the arithmetic coder needed to correct it from lossy to lossless.
This is a mathematically perfect way to judge (you can't correct lossy data any more efficiently than an arithmetic coder). All the entries here do in fact make probabilistic predictions on the data, and they all use arithmetic coding. So the suggestion misses a key point of the CS involved here. I don't mean to be rude about it, but the idea does need correcting.
Only if you're using a very particular and honestly circular-sounding definition of "good".
Some deviations are more important than others, even if you're looking at deviations that take the same amount of data to correct.
Think about film grain. Some codecs can characterize it, remove it when compressing, and then synthesize new visually matching grain when decompressing.
Let's say it takes a billion bytes to turn the lossy version back into the lossless version.
The version with synthetic film grain still needs a billion bytes or maybe even slightly more bytes, even if the synthetic grain is 95% as good as real grain. The cost to turn it lossless is the wrong metric.