
For the uninitiated, this is by the same author as the many other "The Illustrated..." blog posts.

A particularly popular one: https://jalammar.github.io/illustrated-transformer/

Always very high quality.



Thanks so much for mentioning this. His name carries a lot of weight for me as well.


Have you read his book Hands-On Large Language Models?

Looks interesting, but I'm skeptical that a book can feasibly stay up to date with the speed of development.


> Looks interesting, but I'm skeptical that a book can feasibly stay up to date with the speed of development.

The basic structure of the base models has not really changed since the first GPT launched in 2018. You still have to understand gradient descent, tokenization, embeddings, self-attention, MLPs, supervised fine-tuning, RLHF, etc. for the foreseeable future.
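
Those pieces are also compact enough to write down. As a minimal sketch (names and shapes are illustrative, not taken from the book or any particular codebase), single-head self-attention is just a few lines of NumPy:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head) projections
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot products
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)              # softmax over the keys
        return w @ V                               # (seq_len, d_head)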

Adding RL-based CoT training would be a relatively straightforward addendum to a new edition, and it's an application of long-established methods like PPO.
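
For reference, the core of PPO, the clipped surrogate objective, also fits in a few lines. A minimal sketch (eps=0.2 is the usual default; the argument names are illustrative):

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
        # Probability ratio between the updated and the frozen policy.
        ratio = np.exp(logp_new - logp_old)
        # Clipping the ratio keeps each update close to the old policy.
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
        return -np.mean(np.minimum(ratio * advantages, clipped * advantages))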

All "generations" of models are presented as revolutionary -- and results-wise they maybe are -- but technically they are usually quite incremental "tweaks" to the previous architecture.

Even more "radical" departures like state space models are closely related to the same basic techniques and architectures.


> gradient descent

Funny mentioning the math but not the Transformer encoders...


Transformer encoders are not really popular anymore, and all the top LLMs are decoder-only architectures. But encoder models like BERT are used for some tasks.

In any case, self-attention and MLPs are the crux of Transformer blocks, be they in the decoder or the encoder.
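
To make that concrete, here is a structural sketch of one pre-LayerNorm Transformer block (the sub-layers are passed in as callables; nothing here is from a specific implementation):

    def transformer_block(x, attn, mlp, norm1, norm2):
        # Two residual sub-layers: self-attention, then a position-wise MLP.
        # Encoder and decoder blocks share this shape; a decoder's attn
        # additionally applies a causal mask so position i can't see i+1.
        x = x + attn(norm1(x))
        x = x + mlp(norm2(x))
        return x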


> Transformer encoders are not really popular anymore

references, please


I have not, but Jay has created a ton of value and knowledge for free, and I don't fault him for throwing in an ad for his book / trying to benefit a bit financially.


Yeah, no shade for someone selling their knowledge; I'm just trying to suss out how useful the book is for learning foundations.


Foundations don't change much with "the speed of development".


That's a good point



