
For the uninitiated, this is by the same author as the many other "The Illustrated..." blog posts.

A particularly popular one: https://jalammar.github.io/illustrated-transformer/

Always very high quality.



Thanks so much for mentioning this. His name carries a lot of weight for me as well.


Have you read his book Hands-On Large Language Models?

Looks interesting, but I'm skeptical that a book can feasibly stay up to date with the speed of development.


> Looks interesting, but I'm skeptical that a book can feasibly stay up to date with the speed of development.

The basic structure of the base models has not really changed since the first GPT launched in 2018. You still have to understand gradient descent, tokenization, embeddings, self-attention, MLPs, supervised fine-tuning, RLHF, etc. for the foreseeable future.
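
Those pieces are also compact enough to write down. As a minimal sketch (names and shapes are illustrative, not taken from the book or any particular codebase), single-head self-attention is just a few lines of NumPy:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head) projections
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot products
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)              # softmax over the keys
        return w @ V                               # (seq_len, d_head)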

Adding RL-based CoT training would be a relatively straightforward addendum to a new edition, and it's an application of long-established methods like PPO.
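
For reference, the core of PPO, the clipped surrogate objective, also fits in a few lines. A minimal sketch (eps=0.2 is the usual default; the argument names are illustrative):

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
        # Probability ratio between the updated and the frozen policy.
        ratio = np.exp(logp_new - logp_old)
        # Clipping the ratio keeps each update close to the old policy.
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
        return -np.mean(np.minimum(ratio * advantages, clipped * advantages))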

All "generations" of models are presented as revolutionary -- and results-wise they maybe are -- but technically they are usually quite incremental "tweaks" to the previous architecture.

Even more "radical" departures like state space models are closely related to the same basic techniques and architectures.


> gradient descent

Funny mentioning the math but not the Transformer encoders...


Transformer encoders are not really popular anymore, and all the top LLMs are decoder-only architectures. But encoder models like BERT are used for some tasks.

In any case, self-attention and MLPs are the crux of Transformer blocks, be they in the decoder or the encoder.
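
To make that concrete, here is a structural sketch of one pre-LayerNorm Transformer block (the sub-layers are passed in as callables; nothing here is from a specific implementation):

    def transformer_block(x, attn, mlp, norm1, norm2):
        # Two residual sub-layers: self-attention, then a position-wise MLP.
        # Encoder and decoder blocks share this shape; a decoder's attn
        # additionally applies a causal mask so position i can't see i+1.
        x = x + attn(norm1(x))
        x = x + mlp(norm2(x))
        return x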


> Transformer encoders are not really popular anymore

references, please


I have not, but Jay has created a ton of value and knowledge for free, and I don't fault him for throwing in an ad for his book / trying to benefit a bit financially.


Yeah, no shade for someone selling their knowledge; I'm just trying to suss out how useful the book is for learning foundations.


Foundations don't change much with "the speed of development".


That's a good point



