
Have you never been to a book store?

You don't have to read every word of a book to understand if it's interesting to you. I purchased a bunch of technical books the other day that I had never heard of based on opening them up, reading a bit of the intro and flipping through the examples.

Relatively few of my favorite books have come through recommendations compared to those that I have come across through serendipitous discovery.

For anyone who has no time to browse books, most of the best books in existence will be of little interest.


"Don't cast your pearls before swine".

When I was younger I used to passionately defend the things I found beautiful, but after years of experience talking with people who are passionate about their fields and learning, and with those who never will be, I've come to this: If you lack the innate curiosity to explore those things others have declared marvelous, then this book will offer you no value.

Every time I crack this book open I get excited, and I've read it multiple times and done most of the exercises. I can think of few other books that really expose the beauty and, simultaneously, the strong engineering foundations of software.

You have "tons of experience programming" and sound like you've already decided you know what needs to be known (otherwise why even ask rather than just read it free online), so I doubt this will offer you anything you haven't already seen before.


> If you lack the innate curiosity to explore those things others have declared marvelous, then this book will offer you no value.

> Every time I crack this book open I get excited, and I've read it multiple times and done most of the exercises. I can think of few other books that really expose the beauty and, simultaneously, the strong engineering foundations of software.

---

https://www.stilldrinking.org/programming-sucks ( https://news.ycombinator.com/item?id=7667825 and others)

> Every programmer occasionally, when nobody’s home, turns off the lights, pours a glass of scotch, puts on some light German electronica, and opens up a file on their computer. It’s a different file for every programmer. Sometimes they wrote it, sometimes they found it and knew they had to save it. They read over the lines, and weep at their beauty, then the tears turn bitter as they remember the rest of the files and the inevitable collapse of all that is good and true in the world.

I recommend (not ironically) Double Binded Sax by the group named Software.


> I recommend (not ironically) Double Binded Sax by the group named Software.

After you finish that, I recommend queuing up Friedrich Nietzsche by Klaus Schulze.


> otherwise why even ask rather than just read it free online

Not GP, but my time is limited, so asking "is this something an experienced programmer would find worthwhile and insightful" is a fair question.


To be honest I'd recommend just taking a look. It's been 10 years since I first encountered the book (and programming) and I'm sure the lessons can be found elsewhere.

But I find the book a marvel of pedagogy and rank it as maybe one of the greatest textbooks of all time across disciplines. The lessons are packed so densely yet so concisely that you'll appreciate different things on successive reads.

If you're experienced it will also read very easily and quickly, so it becomes quite an easy and enjoyable skim, and then you don't have to rely on other people's accounts of whether it is or isn't worth your time.


I might find it beautiful, but I'm also just f'in busy. Why do you assume that if I would enjoy it, I'd have already done it? Enjoying things is great, but it's not my main desideratum for doing something.


Then it's not for you and you can do something else, like commenting about why you should read it.


> I can think of few other books

I’m interested to know which books …


I don't think it's that fake. There were people, researchers, that were obsessed with ELIZA back in the day. I personally have a friend who is constantly thinking about fine-tuning a model on his ex's texts to recreate chatting with her. I suspect many others are less extreme but still seek out a form of companionship they have full control over.

In the visual space, if we're honest, most of the work being done on Stable Diffusion is with the aim of creating "intimate" art. Might not be "companionship" in the family sense, but absolutely a good chunk of the Civit.ai world is, more-or-less, concerned with the mass creation of "waifus".

I don't think that we're going to have many startups succeed in this space, because I suspect the main driver is to have conversations/generate images that people don't entirely want there to be a record of, or watched by, a private company. But I actually suspect that the number of people using AI for companionship of some form or another is much higher than it appears.


Boston at number 6?

I find this list hard to take too seriously. Despite the city's incredible university culture, it's always been fundamentally anti-startup from its very nature.

Boston is a city where credentials matter far more than ideas, and even then there's a strong, strong culture of "you aren't special, nobody is". People are skeptical of new ideas, compensation for tech has always been abysmal, and people tend to be surprisingly risk averse.

I've lived there a few times, and there are parts of the city I love, but nobody seriously interested in anything disruptive is going to be hanging around that area too long after graduation.


Taking a look at Boston in this report, it is dominated by startups in categories like agriculture, drug research, and fintech.

I don't think these are "startups" like a couple of guys and a beer keg - they are looking at well-funded business ventures.


Boston is a suit & tie startup scene, completely different from the West Coast. You won't find anyone wearing shorts and a Hawaiian shirt in the office.


That’s because Boston is cold and we think Hawaiian shirts are dorky. Casual culture is alive and well, but different


I used to be from the East Coast; those pocket protectors are not just for pens but for snack storage.


Google Mitch Kapor Hawaiian shirt. Nobody in Boston VC or angel investing wears a tie. They might wear a jacket and laundered shirt when meeting a limited partner. Nobody pitching them wears a tie.


It is a scene that is very serious and hard to enter. If a fresh grad or someone with tons of experience wants exposure to a fast-paced, burn-you-out startup culture, there is absolutely nothing like West Coast startups.


Even if credentials matter more than ideas, there are more than enough credentialed people to go around and soak up funding.

Plus, tech isn’t the only startup dimension. Boston has a ton of medical stuff too.


> it's always been fundamentally anti-startup from its very nature.

Apollo, Akamai, BBN, DEC, Data General, Draper Labs, EMC, Prime, Raytheon, Stratus, Thinking Machines, Wang...and numerous others I'm probably forgetting.

The 128 and 495 belt used to be chock full of tech companies.

Nowadays it's a lot of bioscience startups.


Boston is the biotech capital of the US.


Isn’t Boston usually ranked 2nd or 3rd in these lists? I think the MIT ecosystem, military / defense, and biotech usually make it punch above its population size. Boston itself is fairly tiny, but when you include Cambridge and the surrounding cities it’s like 5 million people.


What's your beef?

>Boston at number 6? ... I find this list hard to take too seriously.

and Silicon Valley, London, New York, Tel Aviv, and Los Angeles are ranked higher than it; the real headline is that Seattle dropped to 20.

After Boston come Singapore, Beijing, Seoul, Tokyo, Shanghai, and Washington DC.

What is your suggested list? I just don't understand what you and many other commenters are saying.


Pretty sure a lot of it is related to MIT and biotech.


I use VS Code every day on Linux and have never had an issue with it.


I want to second this regarding desktop Linux.

With each major PC build I've tried to go Linux desktop and ultimately failed, because I just got tired of switching to Windows for gaming, along with periodic cases of essential devices/apps just not working right.

I tried a final time a few years back. I even made sure I had a separate drive just for the Windows partition I felt was inevitable... turns out it never happened.

Gaming on desktop linux is phenomenal now, and I haven't run into any issues with some device or essential app not working. There is no personal task that I ever have to use an old macbook for and it never made sense to add that Windows partition. Occasionally I have to get more fiddly with driver settings to get things to work just right, but not enough for it to be a major annoyance.

Overall, if you're tired of your personal OS increasingly spying on you and slowly turning into an advertising machine, the Linux desktop is ready and quite enjoyable!


> Occasionally I have to get more fiddly with driver settings to get things to work just right, but not enough for it to be a major annoyance.

And honestly, until I recently erased Windows from my personal computer, I was battling Win10 settings frequently enough that usability of Mint, for example, is on par.


I just use a single monitor/mouse/keyboard/mic/camera and then have a USB switch they all plug into so I can easily swap between work and personal. It makes switching between workstations a matter of a few seconds, and with a bit of wire management it can be quite clean.


What USB switch do you use? I've been considering getting this exact setup.


> hard not to regret the choice

If you can't handle "regret" in these cases, then you probably shouldn't be in a position where you're deriving the vast majority of your income/wealth from investments (which is fundamentally what a CEO does).

It's astounding how many ICs can't wrap their heads around the concept that holding onto your RSUs makes absolutely no financial sense. With rare exceptions, this doesn't make sense for anyone. And yet, fear of "regret" keeps people holding.

But it's not shocking that even in tech many ICs are not good at reasoning financially. If you want to be a co-founder and hold a lot of your wealth in investments, it's essential that you learn to reason, plan, and accept outcomes accordingly. Otherwise you're more-or-less a professional gambler.


I’m going to build on that and explain why it doesn’t make sense to hold on to RSUs.

Disclaimer: I’m an IC myself.

I worked for my 1st company for 15 years. Held on to their RSUs most of the time. Then moved to another (public) company and stayed there for a year before leaving. Now in a startup with a lower salary and no immediate liquidity on my stock options.

When you work at a public company, you have multiple exposures to the company’s growth: the RSUs that have already vested, the RSUs that haven’t vested yet, and your own career growth and the salary increases that go with a successful company. If you were early enough, you also get market cred for having made the company successful. If the company goes under (or shrinks, or lays people off), all those assets are at risk.

Usually, one has more in granted stock than in vested stock. If your company just went public, you might have a lot more sellable than in your pipeline, but even that is unusual. Usually, you’ll still have more in the pipeline than you’ve already vested.

If your company has been public for a while, you should get frequent refreshes, which means you still have a significant numbers of unvested shares.

Regardless, you should sell as soon as you can, because of the remaining exposure through unvested equity. Use the proceeds to place in an ETF, or in a high-yield savings account, or some more aggressive investment strategy. Or use it for the downpayment on your house, or fund your kid’s college funds, whatever floats your boat.

Anyways, keep in mind that you still have a significant exposure to the growth of the company through your unvested equities. If you’re worried about short-term cap gains, don’t be: if you sell immediately, there’s no growth between cost basis and selling price, so no cap gain. Another upside to selling is that you’re not bound by the blackout periods, so your assets are much more liquid. And remember, you still have exposure through whatever hasn’t vested yet.


I would say that for most RSU lots it's better to wait for long-term capital gain taxes to kick in before selling.


That's the incorrect belief that causes so many people to hold their RSUs. The day you vest the RSU is the day someone decided to:

   (1) give you the amount in cash (as regular income) 
   (2) take that cash and buy that stock on your behalf
   (3) turn around and give you the stock
and somehow you decide to let (2) and (3) happen without returning to the cash position in (1) and buying whatever else you would prefer to hold. The LTCG clock starts on that day, and all you're doing by holding your vested RSUs is letting someone else make the buying decision for you.

(that's assuming that there's an ability to liquidate the RSU on the vest date)
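
For concreteness, here's a minimal sketch of that equivalence (made-up share counts and price, ignoring fees and tax withholding): keeping freshly vested shares leaves you in exactly the same position as being paid the cash and choosing to buy the stock yourself.

    # Hypothetical numbers: 100 shares vesting at $200/share.
    shares, vest_price = 100, 200.0

    # (a) The RSUs vest and you simply keep the shares.
    position_a = {"cash": 0.0, "stock": shares * vest_price}

    # (b) You are handed the same value in cash and immediately buy the stock yourself.
    cash = shares * vest_price
    position_b = {"cash": cash - shares * vest_price, "stock": shares * vest_price}

    assert position_a == position_b  # holding vested RSUs is an active decision to buy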


At vesting time you are taxed (immediately) at ordinary income rates on the fair market value the day that it vests, and that's what the cost basis is set to. If you sell on that day, your capital gains from the sale will be (near) $0.

The only reason to wait for LTCG on RSUs is if you decided to hold it for some non-zero amount of time after vesting and then the stock price shot up. But then you're also taking on the risk that the stock price will drop again before the year has passed, and end up with less post-tax money than if you'd sold at short-term tax rates.
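
A rough numeric sketch of that trade-off, with made-up share counts and flat illustrative tax rates (real brackets, state taxes, and withholding will differ):

    shares, vest_price = 100, 200.0
    cost_basis = shares * vest_price  # basis is set to FMV on the vest date

    def after_tax_proceeds(sale_price, long_term):
        gain = shares * sale_price - cost_basis
        rate = 0.15 if long_term else 0.35  # illustrative LTCG vs STCG rates
        return shares * sale_price - max(gain, 0) * rate  # ignoring loss harvesting

    print(after_tax_proceeds(200.0, long_term=False))  # sell at vest: gain ~0, so ~no extra tax
    print(after_tax_proceeds(260.0, long_term=True))   # held a year and the price rose
    print(after_tax_proceeds(150.0, long_term=True))   # held a year and the price fell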


Some companies might make you hold for a few months until the next earnings report and trading window. After that it depends on your tolerance for risk and your attitude about the IRS.


How does that work?


Earnings reports happen once a quarter between the company and the public. A couple of business days after that, employees (without material nonpublic info) may trade company shares for the next month or so. Maybe you can't sell April shares until mid-July, and then you have to decide whether to wait until next July to minimize tax on gain.

Sometimes you can elect to sell every released share in a quarter, or file a 10b5-1 plan with a schedule, but you have to do that during a trading window.


Most (all?) public tech companies have policies that prohibit employees from trading the company's stock outside designated windows following a quarterly earnings release.


I believe in diversification and index funds for most people, but this seems overdone.

The issue here is that sometimes if you procrastinate about diversifying, it pays off very well. As a Google employee (who joined after IPO), it was by far my best investment and funded my retirement.

I guess that's accidental gambling. I did have other investments.


The way you can test if it's accidental gambling is by answering the following:

If you had worked at a different company with pure cash comp equivalent to your RSUs, would you have invested the same $$ in Google stock? Or would you have invested it instead in an aggressive but diversified portfolio (e.g. 100% S&P 500 or even just a bucket of blue-chip tech stocks).

I am confident that for the vast majority of tech employees they would choose the latter if they were operating in a pure cash regime.


No, I definitely wouldn't have invested so much in Google. However, I'm not sure how much to attribute to it being a default choice, versus the differences between an inside versus an outside view.

It's easier to be comfortable investing long-term in something you know well. While there's a lot I'll never know about Google, I think I understand the company somewhat better than others. For example, I can discount a lot of news articles as being written by people who don't really understand the culture. If I hadn't worked there, I might worry more.

That's less and less true, though, as much has changed since I left. And for investment purposes, maybe that bias only seemed to be helpful, versus an outside view?


Yes that’s accidental gambling. Or what I like to call “at the right place at the right time”.

Ask a Yahoo employee how that same plan would have worked out for them.

That being said, good for you. :)


Mostly agreed, but as an employee you do have some semblance of material non-public information that gives you a structural edge in assessing the stock. (This probably works better at a 1k-5k company than a Google/FB, but I can't say because I haven't worked at the big faangs).

I've benefited financially from having a good sense of how well things are going and holding/selling accordingly (within the confines of the law and blackout periods, of course).


> non-public information that gives you a structural edge in assessing the stock

This can also cut the other direction too. I had a slightly negative sentiment about Google during my tenure there due to the organization I was in. When earnings call season rolled around it didn't matter since the ads revenue line always dominated everything else.


I'll agree that it's not super common that holding onto RSUs makes sense, but I think it's more common than "rare exceptions".

Ultimately it's an investing decision. If you believe the stock price is going up at a rate faster than the rest of the market, and are willing to accept the risk that a concentrated position like that entails, then that can make financial sense.

For people who want to hold their RSUs but still want to diversify to some extent, my usual recommendation is to pick some percent of the shares that vest every quarter to sell immediately, and hold the rest. And -- critically -- to stick with that commitment every single quarter, and not fall into the trap of thinking "oh, the stock seems to be doing so well, I'll skip the sale this quarter". (Of course, a measured re-evaluation of the plan is a reasonable and good thing to do every so often.)
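
Something like this is all the plan needs to be; the 50% split and the quarterly vest counts below are made up for illustration.

    SELL_FRACTION = 0.5  # hypothetical: sell half of every vest, no exceptions

    def quarterly_split(vested_shares):
        to_sell = round(vested_shares * SELL_FRACTION)
        return to_sell, vested_shares - to_sell

    for quarter, vested in enumerate([120, 120, 130, 130], start=1):
        sell, keep = quarterly_split(vested)
        print(f"Q{quarter}: sell {sell} at vest, keep {keep}")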


> Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

I have kids so I'm presuming I'm allowed to have an opinion here.

This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Once they have the basics down concept acquisition time shrinks rapidly and kids can easily learn their new favorite animal in as little as a single example.

Compare this to LLMs which can one-shot certain tasks, but only if they have essentially already memorized enough information to know about that task. It gives the illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts.

Beyond just learning a new animal, humans are able to learn entirely new systems of reasoning in surprisingly few examples (though it does take quite a bit of time to process them). How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.


> kids can easily learn their new favorite animal in as little as a single example

Until they encounter a similar animal and get confused, at which point you understand the implicit heuristic they were relying on. (E.g. they confused a dairy cow for a zebra, which means their heuristic was "black-and-white quadruped".)

Doesn't this seem remarkably close to how LLMs behave with one-shot or few-shot learning? I think there are a lot more similarities here than you give it credit for.

Also, I grew up in South Korea where early math education is highly prioritized (for better or for worse). I remember having to solve 2 dozen arithmetic problems every week after school with a private tutor. Yes, it was torture and I was miserable, but it did expose me to thousands more arithmetic questions than my American peers. All that misery paid off when I moved to the U.S. at the age of 12 and realized that my math level was 3-4 years above my peers. So yes, I think human intelligence accuracy also does improve with more training data.


Not many zebras where I live but lots of little dogs. Small dogs were clearly cats for a long time no matter what I said. The training can take a while.


This. My 2.5 y.o. still argues with me that a small dog she just saw in the park is a "cat". That's in contrast to her older sister, who at 5 is... begrudgingly accepting that I might be right about it after the third time I correct her.


The thing is that the labels "cat" and "dog" reflect a choice in most languages to name animals based on species, which manifests in certain physical/behavioral attributes. Children need to learn by observation/teaching and generalization that these are the characteristics they need to use to conform to our chosen labelling/distinction, and that other things such as size/color/speed are irrelevant.

Of course it didn't have to be this way - in a different language animals might be named based on size or abilities/behavior, etc.

So, your daughter wanting to label a cat-sized dog as a cat is just a reflection of her not yet having aligned her own generalization with what you mean when you say "cat" vs "dog".


And once they learn sarcasm, small dogs are cats again :-)


My favourite part of this is when they apply their new words to things that technically make sense, but don't. My daughter proudly pointed at a king wearing a crown as "sharp king" after learning about knives, saws, etc.


> How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems. During my mathematics education I had to practice solving a lot of problems dissimilar to what I had seen before. Even in the theory part, a lot of it was actually about filling in details in proofs and arguments, and reformulating challenging steps (by words or drawings). My notes on top of a mathematical textbook are much more than the text itself.

People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions. The original article is spot on that there is no AGI pathway in the current research direction. But there are huge incentives for ignoring this.


> Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems.

I think it's more accurate to say that they learn math by memorizing a sequence of steps that result in a correct solution, typically by following along with some examples. Hopefully they also remember why each step contributes to the answer as this aids recall and generalization.

The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly. This is just standard training. Understanding the motivation of each step helps with that memorization, and also allows you to apply that step in novel problems.

> The original article is spot on that there is no AGI pathway in the current research direction.

I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits for problem solving if trained enough, and parametric memory generalizes their operation to many more tasks.

They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Composition tasks are still challenging, but parametric memory is a big step in the right direction for that too. Accurate comparative and compositional reasoning sound tantalizingly close to AGI.


> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes. Me and Terence Tao on the same exact math training data would not yield two mathematicians of similar skill.

While it's true that memorization of properties, structure, operations, and what should be applied when and where is involved, there is a much deeper component of knowing how these all relate to each other and grasping their fundamental meaning and structure. Some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations using just the description or only a few examples (or to be able to at all, where everyone else struggles).

> I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits

It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

From: https://arxiv.org/abs/2405.15071

> The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.


> Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes

Everyone starts by memorizing how to do basic arithmetic on numbers, their multiplication tables and fractions. Only some then advance to understanding why those operations must work as they do.

> It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

Yes, I acknowledged that when I said "Composition tasks are still challenging". Comparisons and composition are both key to abstract reasoning. Clearly parametric memory and grokking have shown a fairly dramatic improvement in comparative reasoning with only a small tweak.

There is no evidence to suggest that compositional reasoning would not also fall to yet another small tweak. Maybe it will require something more dramatic, but I wouldn't bet on it. This pattern of thinking humans are special does not have a good track record. Therefore, I find the original claim that I was responding to("there is no AGI pathway in the current research direction") completely unpersuasive.


I started by understanding. I could multiply by repeat addition (each addition counted one at a time with the aid of fingers) before I had the 10x10 addition table memorized. I learned university level calculus before I had more than half of the 10x10 multiplication table memorized, and even that was from daily use, not from deliberate memorization. There wasn't a day in my life where I could recite the full table.

Maybe schools teach by memorization, but my mom taught me by explaining what it means, and I highly recommend this approach (and am a proof by example that humans can learn this way).


> I started by understanding. I could multiply by repeat addition

How did you learn what the symbols for numbers mean and how addition works? Did you literally just see "1 + 3 = 4" one day and intuit the meaning of all of those symbols? Was it entirely obvious to you from the get-go that "addition" was the same as counting using your fingers which was also the same as counting apples which was also the same as these little squiggles on paper?

There's no escaping the fact that there's memorization happening at some level because that's the only way to establish a common language.


There's a difference between memorizing meanings of words (addition is same as counting this and then the other thing, "3" means three things) and memorizing methods (table of single digit addition/multiplication to do them faster in your head). You were arguing the second, I'm a counterexample. I agree about the first, everyone learns language by memorization (some rote, some by use), but language is not math.


> You were arguing the second, I'm a counterexample.

I still don't think you are. Since we agree that you memorized numbers and how they are sequential, and that counting is moving "up" in the sequence, addition as counting is still memorizing a procedure based on this, not just memorizing a name: to add any two numbers, count down on one as you count up on the other until the first number reaches zero, and the number that counted up is the sum. I'm curious how you think you learned addition without memorizing this procedure (or one equivalent to it).

Then you memorized the procedure for multiplication: given any two numbers, count down on one and add the other to itself until the counted down number reaches one. This is still a procedure that you memorized under the label "multiplication".
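
To make that concrete, here's a minimal sketch of the two procedures just described, with counting up or down by one as the only primitive:

    def add(a, b):
        while a > 0:             # count down on one...
            a, b = a - 1, b + 1  # ...while counting up on the other
        return b

    def multiply(a, b):
        total = 0
        while a > 0:               # count down on one...
            total = add(total, b)  # ...adding the other to a running total
            a -= 1
        return total

    assert add(2, 3) == 5
    assert multiply(4, 3) == 12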

This is exactly the kind of procedure that I initially described. Someone taught you a correct procedure for achieving some goal and gave you a name for it, and "learning math" consists of memorizing such correct procedures (valid moves in the game of math if you will). These moves get progressively more sophisticated as the math gets more advanced, but it's the same basic process.

They "make sense" to you, and you call it "understanding", because they are built on a deep foundation that ultimately grounds out in counting, but it's still memorizing procedures up and down the stack. You're just memorizing the "minimum" needed to reproduce everything else, and compression is understanding [1].

The "variation in outcomes" that an OP discussed is simply because many valid moves are possible in any given situation, just like in chess, and if you "understand" when a move is valid vs. not (eg. you remember it), then you have an advantage over someone who just memorized specific shortcuts, which I suspect is what you are thinking I mean by memorization.

[1] https://philpapers.org/rec/WILUAC-2


I think you are confusing "memory" with strategies based on memorisation. Yes, memorising (i.e. putting things into memory) is always involved in learning in some way, but that is too general and not what is discussed here. "Compression is understanding" possibly to some extent, but understanding is not just compression; that would be a reduction of what understanding really is, as it involves a certain range of processes and contexts in which the understanding is actually enacted rather than purely "memorised" or applied, and that is fundamentally relational. It is so relational that it can even go deeply down to how motor skills are acquired or spatial relationships understood. It is no surprise that tasks like mental rotation correlate well with mathematical skills.

Current research in early mathematical education now focuses on teaching certain spatial skills to very young kids rather than (just) numbers. Mathematics is about understanding of relationships, and that is not a detached kind of understanding that we can make into an algorithm, but deeply invested and relational between the "subject" and the "object" of understanding. Taking the subject and all the relations with the world out of the context of learning processes is absurd, because that is in the exact centre of them.


Sorry, I strongly disagree.

I did memorize names of numbers, but that is not essential in any way to doing or understanding math, and I can remember a time where I understood addition but did not fully understand how names of numbers work (I remember, when I was six, playing with a friend at counting up high, and we came up with some ridiculous names for high numbers because we didn't understand decimal very well yet).

Addition is a thing you do on matchsticks, or fingers, or eggs, or whatever objects you're thinking about. It's merging two groups and then counting the resulting group. This is how I learned addition works (plus the invariant that you will get the same result no matter what kind of object you happen to work with). Counting up and down is one method that I learned, but I learned it by understanding how and why it obviously works, which means I had the ability to generate variants - instead of 2+8=3+7=... I can do 8+2=9+1=..., or I can add ten at a time, etc'.

Same goes for multiplication. I remember the very simple conversation where I was taught multiplication. "Mom, what is multiplication?" "It's addition again and again, for example 4x3 is 3+3+3". That's it, from that point on I understood (integer) multiplication, and could e.g. wonder myself at why people claim that xy=yx and convince myself that it makes sense, and explore and learn faster ways to calculate it while understanding how they fit in the world and what they mean. (An exception is long multiplication, which I was taught as a method one day and was simple enough that I could memorize it and it was many years before I was comfortable enough with math that whenever I did it it was obvious to me why what I'm doing here calculates exactly multiplication. Long division is a more complex method: it was taught to me twice by my parents, twice again in the slightly harder polynomial variant by university textbooks, and yet I still don't have it memorized because I never bothered to figure out how it works nor to practice enough that I understand it).

I never in my life had an ability to add 2+2 while not understanding what + means. I did for half an hour have the same for long division (kinda... I did understand what division means, just not how the method accomplishes it) and then forgot. All the math I remember, I was taught in the correct order.

edit: a good test for whether I understood a method or just memorized it would be, if there's a step I'm not sure I remember correctly, whether I can tell which variation has to be the correct one. For example, in long multiplication, if I remembered each line has to be indented one place more to the right or left but wasn't sure which, since I understand it, I can easily tell that it has to be the left because this accomplishes the goal of multiplying it by 10, which we need to do because we had x0 and treated it as x.


The point is the memorization exercise requires orders of magnitude fewer examples for bootstrapping.


Does it though? It's a common claim, but I don't think that's been rigorously established.


> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Perhaps that is how you learned math, but it is nothing like how I learned math. Memorizing steps does not help; I sucked at it. What works for me is understanding the steps and why we use them. Once I understood the process and why it worked, I was able to reason my way through it.

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly.

Did you look at the types of problems presented by the ARC-AGI test? I don't see how memorization plays any role.

> They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Then let's see how they do on the ARC test. While it is possible that generalized circuits can develop in LLMs with enough training, I am pretty skeptical until we see results.


> Perhaps that is how you learned math, but it is nothing like how I learned math.

Memorization is literally how you learned arithmetic, multiplication tables and fractions. Everyone starts learning math by memorization, and only later start understanding why certain steps work. Some people don't advance to that point, and those that do become more adept at math.


> Memorization is literally how you learned arithmetic, multiplication tables and fractions

I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure". Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?


> I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure"

What did you understand, exactly? You understood how to "count" using "numbers" that you also memorized? You intuitively understood that addition was counting up and subtraction was counting down, or did you memorize those words and what they meant in reference to counting?

> Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?

The procedure to add or subtract fractions by establishing a common denominator, for instance. The procedure for how numerators and denominators are multiplied or divided. I could go on.


Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

I do have the single digit multiplication table memorized now, but there was a long time where that table had gaps and I would use my understanding of how numbers worked to calculate the result rather than remembering it. That same process still occurs for double digit numbers.

Mathematics education, especially historically, has indeed leaned pretty heavily on memorization. That doesn't mean it's the only way to learn math, or even a particularly good one. I personally think over-reliance on memorization is part of why so many people think they hate math.


> Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

Sure, I did that plenty too, but that doesn't refute the point that memorization is core to understanding mathematics; it's just a specific kind of memorization that results in maximal flexibility for minimal state retention. All you're claiming is that you memorized some core axioms/primitives and the procedures that operate on them, and then memorized how higher-level concepts are defined in terms of that core. I go into more detail on the specifics here:

https://news.ycombinator.com/item?id=40669585

I agree that this is a better way to memorize mathematics, eg. it's more parsimonious than memorizing lots of shortcuts. We call this type of memorizing "understanding" because it's arguably the most parsimonious approach, requiring the least memory, and machine learning has persuasively argued IMO that compression is understanding [1].

[1] https://philpapers.org/rec/WILUAC-2


Every time I see people online reduce the human thinking process to just production of a perceptible output, I start questioning myself, whether somehow I am the only human on this planet capable of thinking and everyone else is just pretending. That can't be right. It doesn't add up.

The answer is that both humans and the model are capable of reasoning, but the model is more restricted in the reasoning that it can perform since it must conform to the dataset. This means the model is not allowed to invest tokens that do not immediately represent an answer but have to be derived on the way to the answer. Since these thinking tokens are not part of the dataset, the reasoning that the LLM can perform is constrained to the parts of the model that are not subject to the straight jacket of training loss. Therefore most of the reasoning occurs in-between the first and last layers and ends with the last layer, at which point the produced token must cross the training loss barrier. Tokens that invest into the future but are not in the dataset get rejected and thereby limit the ability of the LLM to reason.


> People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions

And almost all of it is just more text, or described in more text.

You're very much right about this. And that's exactly why LLMs work as well as they do - they're trained on enough text of all kinds and topics, that they get to pick up on all kinds of patterns and relationships, big and small. The meaning of any word isn't embedded in the letters that make it, but in what other words and experiences are associated with it - and it so happens that it's exactly what language models are mapping.


It is not "just more text". That is an extremely reductive view of human cognition and experience that does justice to nothing. Describing things in text collapses too many dimensions. Human cognition is multimodal. Humans are not computational machines; we are attuned to and in constant allostatic relationship with the changing world around us.


I think there is a component of memorizing solutions. For example, for mathematical proofs there is a set of standard "tricks" that you should have memorized.


Sure, memory helps a lot; it allows you to concentrate your mental effort on the novel or unique parts of the problem.


> How many homework questions did your entire calc 1 class have? I'm guessing less than 100…

I’m quite surprised at this guess and intrigued by your school’s methodology. I would have estimated >30 problems a week on average across 20 weeks for myself.

My kids are still in pre-algebra, but they get way more drilling still, well over 1000 problems per semester once Zearn, iReady, etc. are factored in. I believe it’s too much, but it does seem like the typical approach here in California.


I preferred doing large problem sets in math class because that is the only way I felt like I could gain an innate understanding of the math.

For example after doing several hundred logarithms, I was eventually able to do logs to 2 decimal places in my head. (Sadly I cannot do that anymore!) I imagine if I had just done a dozen or so problems I would not have gained that ability.


> This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Sure, but they learn a lot of labels.

> How many homework questions did your entire calc 1 class have? I'm guessing less than 100

At least 20 to 30 a week, for about 10 weeks of class. Some weeks were more, and I remember plenty of days where we had 20 problems assigned a day.

Indeed, I am a huge fan of "the best way to learn math is to do hundreds upon hundreds of problems", because IMHO some concepts just require massive amounts of repetition.


> illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts

Now imagine how much your kid would learn if the only input he ever received was a sequence of words.


Are you saying it's not fair to LLMs, because the way they are taught is different?

The difference is that we don't know better methods for them, but we do know of better methods for people.


I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day. It takes at least 4 years for a human children to be in any way comparable in performance to GPT-4 on any task both of them could be tested on; do people really believe GPT-4 was trained with more data than a 4 year old?


> do people really believe GPT-4 was trained with more data than a 4 year old?

I think it was; the guesstimate I've seen is that GPT-4 was trained on 13e12 tokens, which over 4 years is 8.9e9/day, or about 1e5/s.

Then it's a question of how many bits per token — my expectation is 100k/s is more than the number of token-equivalents we experience, even though it's much less than the bitrate even of just our ears let alone our eyes.
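
Spelling out that back-of-the-envelope arithmetic (the 13e12 token count is just the guesstimate above, not an official figure):

    tokens = 13e12                # guesstimated GPT-4 training-token count
    days = 4 * 365
    per_day = tokens / days       # ~8.9e9 tokens/day
    per_second = per_day / 86400  # ~1.0e5 tokens/s
    print(f"{per_day:.1e} tokens/day, {per_second:.1e} tokens/s")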


Interesting analysis, makes sense. I wonder how we should account for the “pre-built” knowledge that is transferred to a newborn genetically and from the environment at conception and during gestation. Of course things like epi-genetics also come into play.

The analogies get a little blurry here, but perhaps we can draw a distinction between information that an infant gets from their higher-level senses (e.g. sight, smell, touch, etc) versus any lower-level biological processes (genetics, epi-genetics, developmental processes, and so on).

The main point is that there is a fundamental difference: LLMs have very little prior knowledge [1] while humans contain an immense amount of information even before they begin learning through the senses.

We need to look at the billions of years of biological evolution, millions of years of cultural evolution, and the immense amounts of environmental factors, all which shape us before birth and before any “learning” occurs.

[1] The model architecture probably counts as hard-coded prior knowledge contained before the model begins training, but it is a ridiculously small amount of information compared to the complexity of living organisms.


I think it's fair that both LMMs and people get a certain (even unbounded) amount of "pretraining" before actual tasks.

But after the training, people are much better equipped to do single-shot recognition and cognitive tasks on imagery and situations they have not encountered before, e.g. identifying (from pictures) which animal is being shown, even if it is only the second time they have seen that animal (the first being when they were shown that this animal is a zebra).

So, basically, after initial training, I believe people are superior in single-shot tasks—and things are going to get much more interesting once LMMs (or something after that?) are able to do that well.

It might be that GPT-4o can actually do that task well! Someone should demo it, I don't have access. Except, of course, GPT-4o already knows what zebras look like, so it would have to be something other than exactly that.


> I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day.

Yeah, but they're seeing mostly the same thing day after day!

They aren't seeing 10k stills of 10k different dogs, then 10k stills of 10k different cats. They're seeing $FOO thousand images of the family dog and the family cat.

My (now 4.5yo) toddler did reliably tell the difference between cats and dogs the first time he went with us to the local SPCA and saw cats and dogs that were not our cats and dogs.

In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.


> In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.

I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children books and toys he handled. In our case, this was a significant source of animal recognition skills of my daughters.


> I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children books and toys he handled.

No images or photos (no books).

TV, certainly, but I consider it unlikely that animals in the animation style of Peppa Pig help the classifier.

Besides which, we're still talking under a dozen cats/dogs seen till that point.

Forget about cats/dogs. Here's another example: he only had to see a burger patty once to determine that it was an altogether new type of food, different from (for example) a sausage.

Anyone who has kids will have dozens of examples where the classifier worked without a false positive off a single novel item.


So a billion years of evolutionary search plus 20 years of finetuning is a better method?


Two other points - I've also forgotten a bunch, but also know I could "relearn" it faster than the first time around.

To continue your example, I know I've learned calculus and was lauded at the time. Now I could only give you the vagaries, nothing practical. However I know if I was pressed, I could learn it again in short order.


> This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Yes. All that learning is feeding off one another. They're learning how reality works. Every bit of new information informs everything else. It's something that LLMs demonstrate too, so it shouldn't be a surprising observation.

> Once they have the basics down concept acquisition time shrinks rapidly

Sort of, kind of.

> and kids can easily learn their new favorite animal in as little as a single example.

Under 5 they don't. Can't speak what happens later, as my oldest kid just had their 5th birthday. But below 5, all I've seen is kids being quick to remember a name, but taking quite a bit longer to actually distinguish between a new animal and similarly looking ones they already know. It takes a while to update the classifier :).

(And no, they aren't going to one-shot recognize an animal in a zoo that they saw first time on a picture hours earlier; it's a case I've seen brought up, and I maintain that even most adults will fail spectacularly at this test.)

> Compare this to LLMs which can one-shot certain tasks, but only if they have essentially already memorized enough information to know about that task. It gives the illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts.

Correct, in the sense that the models don't update their weights while you use them. But that just means you have to compare them with ability of humans to one-shot tasks on the spot, "thinking on their feet", which for most tasks makes even adults look bad compared to GPT-4.

> How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

I don't believe someone could learn calc in 100 exercises or less. Per concept like "addition of small numbers", or "long division", or "basic derivatives", or "trivial integrals", yes. Note that in-class exercises count too; learning doesn't happen primarily by homework (mostly because few have enough time in a day to do it).


> But that just means you have to compare them with ability of humans to one-shot tasks on the spot, "thinking on their feet", which for most tasks makes even adults look bad compared to GPT-4.

This simply is not true as stated in the article. ARC-AGI is a one-shot task test that humans reliably do much, much better on than any AI model.

> I don't believe someone could learn calc in 100 exercises or less.

I learned the basics of integration in a foreign language I barely understood by watching a couple of diagrams get drawn out and seeing far less than 100 examples or exercises.


Coming up with and quickly adopting new terms to sound "hip" is one of the most important skills for AI practitioners. We've had "agent-based" concepts in CS for decades, but if you're "in" you'll of course refer to "agentic" workflows and the like.

It makes sense to come up with terms to describe common patterns: Chain-of-Thought, RAG, etc. are good examples of this. But the passion some members of this community have for being intentionally confusing is tiresome.

