WeMoveOn's comments

but is Switch-C even usable? iirc the training set was nowhere near enough for a model of that size to be coherent in a conversation


the different formats you can export data in are pretty great, especially the SQL export option.


> much of the work is repetitive, but it comes with its edge cases

for the repetitive stuff, just use Copilot embedded in whatever editor you use.

the edge cases are tricky. to actually avoid them, the model would need an understanding of both the use case (which is easy to describe to the model) and the codebase itself (which is difficult, since descriptions/docstrings aren't enough to capture the complex behaviors that arise from interactions between parts of your codebase).

idk how you would train/finetune a model to somehow have this understanding of your codebase. I doubt plain next-token prediction would get you there; you'd likely have to create chat data discussing the intricacies of your codebase and do DPO/RLHF to bake it into your model.
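for a rough idea, preference tuning with HuggingFace TRL looks something like the sketch below. everything here is illustrative: the preference pairs are made-up examples of codebase-specific Q&A, and TRL kwarg names shift a bit between releases.

    # minimal DPO sketch with TRL; the pairs are hypothetical examples of
    # "knows our codebase" vs "plausible but wrong" answers.
    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    pairs = [
        {
            "prompt": "What does process_order() do when the cart is empty?",
            "chosen": "It returns early with an EmptyCartError before touching the DB.",
            "rejected": "It processes the order normally.",
        },
        # ...many more pairs covering your codebase's edge cases
    ]

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder base model
    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-codebase", beta=0.1),  # beta = KL penalty strength
        train_dataset=Dataset.from_list(pairs),
        processing_class=tokenizer,  # older TRL releases call this `tokenizer`
    )
    trainer.train()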

look into techniques like QLoRA that reduce the memory needed during tuning, and platforms like Vast.ai to rent GPUs for cheap.
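a bare-bones QLoRA setup looks roughly like this (a sketch assuming a Mistral-class model with transformers + peft + bitsandbytes; the model id and hyperparams are just placeholders):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # load the frozen base model in 4-bit (NF4, as in the QLoRA paper)
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", quantization_config=bnb, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    # train only small low-rank adapters on top of the quantized weights
    lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # which projections get adapters
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of all params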

RAG/agents could help, though probably not enough on their own. you could store info about each function in your codebase, such as its signature, the functions it calls, its docstring, and known edge cases associated with it. if you don't have docstrings, using an LLM to generate them is feasible.
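a toy version of that index, stdlib only (the retrieval here is naive keyword overlap as a stand-in for embedding search, and the edge-case field is a hypothetical slot you'd fill by hand or with an LLM):

    import ast

    def index_functions(source: str) -> list[dict]:
        """collect name, signature, docstring, and called functions."""
        entries = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                calls = sorted({
                    n.func.id for n in ast.walk(node)
                    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                })
                args = ", ".join(a.arg for a in node.args.args)
                entries.append({
                    "name": node.name,
                    "signature": f"{node.name}({args})",
                    "docstring": ast.get_docstring(node) or "",
                    "calls": calls,
                    "edge_cases": [],  # filled in by hand or by an LLM
                })
        return entries

    def retrieve(entries: list[dict], query: str, k: int = 3) -> list[dict]:
        """crude stand-in for embedding search: rank by word overlap."""
        words = set(query.lower().split())
        def score(e):
            return len(words & set(f"{e['name']} {e['docstring']}".lower().split()))
        return sorted(entries, key=score, reverse=True)[:k]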


can someone explain how his costs went down to $1? he essentially just replaced GPT-4 with a tuned variant of Mixtral 8x7B, which requires multiple GPUs to run. even if he quantized the model himself, he would still need to pay for the hardware and infra, which would cost more than $1. is he self-hosting or something?


his username checks out


Did you now?? I'll have you know that I wrote the full word2vec paper on a roll of shabby two-ply tissue paper during my time in a Taco Bell stall. Sadly, it was then used to mop up my dietary regrets and was subsequently lost to the foul wretches of the sewage system. Left with nothing but the memories of my groundbreaking thoughts and the lingering aroma of liquid feces, I texted Mikolov an idea I had about using neural nets to map sequences of text tokens from one language to another, only for him to reply "lol thx" and ghost me. I was quite negatively surprised when he decided to take this to the public courts of Facebook and failed to mention the "brainest boi alive™" who gave him this idea in the first place.


Seriously, his explanations on topics go well beyond the lectures some of my professors provide and could probably benefit a lot of students if given as a resource... If only academia weren't so distrustful of those outside their circles...


But he's not even outside their circles, right? Doesn't 3b1b have a PhD? I can tell you at least that we watched some 3b1b videos in my university undergrad math classes last year!


That's so encouraging to hear. I would've loved uni maths with some high-quality videos like his to watch, even if just to reinforce the topics after class.


Wikipedia thinks he has a Bachelor's from Stanford [1]. That said, I love his videos too. I started with The Essence of Linear Algebra [2], which is just fantastic and really shaped my intuition for the topic.

[1] https://en.wikipedia.org/wiki/3Blue1Brown

[2] https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...


Grant Sanderson does not have a PhD; he holds a Bachelor's degree in Mathematics from Stanford University.


This guy got me through my engineering degree


How did you come up with 40b for the memory? Specifically, why 0.7 * total params?


It's just a rough estimate, given that these things are fairly linear: the original Mistral 7B was 15 GB and the new one is 86 GB, whereas fully duplicating it (8 * 15 GB) would suggest a 120 GB size. So 86/120 ≈ 0.72 for the actual size, suggesting roughly 28% memory savings. This of course doesn't really account for any multiple-vs-single-file overhead and such, so it's likely to be a bit off.
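as a quick sanity check on those numbers (checkpoint sizes in GB, from above):

    single = 15             # original Mistral 7B checkpoint
    mixtral = 86            # Mixtral 8x7B checkpoint
    naive = 8 * single      # 120 GB if the 8 experts shared nothing
    print(mixtral / naive)  # ≈ 0.72, i.e. ~28% saved, presumably from weights the experts share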


lit

