Interesting. Thanks for the response. Do you have any resources where I can educ...

fzzzy · on April 25, 2024

Well, when Llama 1 came out I signed up and downloaded it, and that led me to llama.cpp. I followed the instructions to quantize the model so it would fit in my graphics card's memory. Later, as more models like Llama 2 and Mixtral came out, I would download and evaluate them.
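
For a rough sense of why quantizing helps a model fit on a graphics card, here is a back-of-the-envelope sketch of weights-only memory. The 7-billion parameter count and the bit widths are illustrative assumptions, not details from the comment above, and real usage is higher once the KV cache and activations are counted.

```python
def weight_memory_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 7B-parameter model at a few common precisions.
N_PARAMS = 7_000_000_000
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from about 14 GB at 16-bit to about 3.5 GB at 4-bit, which is the difference between not fitting and fitting on many consumer cards.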

I kept up with Hacker News posts and read up on anything in the comments that I didn't understand. I've also found the LocalLLaMA subreddit to be a great way to learn.

Any time I saw a comment mentioning a tool, I would try it: ollama, kobold.cpp, sillytavern, textgen-webui, and more.

I also have a friend who has been into AI for many years, and we always exchange links to new things. I developed a retrieval-augmented generation (RAG) app with him, along with a "transformation engine" pipeline.

So: following AI stories on HN and Reddit, learning by doing, and applying what I learned to real projects.

FezzikTheGiant · on April 25, 2024

Thanks. Very cool. Have you ever tried to implement a transformer from scratch? Like in the "Attention Is All You Need" paper? Can a first- or second-year college student do it?

xyc · on April 26, 2024

Andrej Karpathy's course is a good resource: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...

fzzzy · on April 27, 2024

I haven't tried it yet, but I do intend to. I think the code for LLM inference is quite straightforward. The complexity lies in collecting the training corpus and doing RLHF well. That's just my intuition.
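
To give a feel for how compact the core of inference can be, here is a minimal sketch of single-head causal scaled dot-product attention, the operation described in the "Attention Is All You Need" paper mentioned above. It is pure Python for readability; the function names and toy inputs are hypothetical, and a real model adds learned projections, multiple heads, and many stacked layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of vectors, one per token position.
    Position i may only attend to positions 0..i.
    """
    d_k = len(k[0])
    out = []
    for i, q_i in enumerate(q):
        # Scaled scores against the current and earlier positions only.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


# Toy inputs, made up for illustration.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = causal_attention(q, k, v)
print(result)
```

The first position can only see itself, so its output is just its own value vector; the second position blends both value vectors, weighted toward the key it matches best.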

elliotto · on April 25, 2024

Hi, I work at a startup where we train, fine-tune, and run inference on models on a GCP Kubernetes cluster with some A100s.

There isn't really that much information about how to do this properly because everyone is still working it out and it changes month by month. It requires a bunch of DevOps and infrastructure knowledge above and beyond the raw ML knowledge.

Your best bet is probably just to tool around and see what you can do.