Hacker News

I am surprised they allow only 32k tokens when the Reformer can handle a context length of 1M tokens on 16GB of VRAM. It seems like they still have ways to optimize it further.


Is the Reformer as capable as this model? It sounds like there's a trade-off.


It's not. The Reformer uses locality-sensitive hashing (LSH) to reduce attention complexity from O(n^2) to O(n log n), with the aim of fitting into 16GB roughly what a full-attention model would need 100GB for, at comparable quality. But nobody scaled it up to 1000 GPUs, because its purpose was the opposite: doing more with less hardware.
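To make the LSH trick concrete, here is a minimal NumPy sketch of the idea (my own illustration, not Reformer's actual API; it assumes random-hyperplane hashing, shared query/key vectors as in the paper, and attention restricted to each bucket, ignoring Reformer's multi-round hashing and chunking details):

```python
import numpy as np

def lsh_buckets(x, n_bits=4, seed=0):
    # Random-hyperplane LSH: similar vectors tend to get the same sign
    # pattern, and hence the same bucket id.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((x.shape[-1], n_bits))
    bits = (x @ planes) > 0                        # (seq, n_bits) sign pattern
    return bits.astype(int) @ (1 << np.arange(n_bits))  # pack bits -> bucket id

def lsh_attention(qk, v, n_bits=4, seed=0):
    # Shared-QK attention restricted to LSH buckets: instead of the full
    # O(n^2) score matrix, each position attends only within its bucket.
    buckets = lsh_buckets(qk, n_bits, seed)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = qk[idx] @ qk[idx].T / np.sqrt(qk.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out
```

With 2^n_bits buckets of roughly equal size, each position compares against ~n/2^n_bits others instead of all n, which is where the memory savings come from.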



