WeMoveOn's comments

but is Switch-C even usable? iirc the training set was nowhere near enough for a model of that size to be coherent in a conversation


the different formats you can export data in are pretty great, especially the SQL export option.


> much of the work is repetitive, but it comes with its edge cases

for the repetitive stuff, just use Copilot embedded in whatever editor you use.

the edge cases are tricky. to actually avoid them, the model would need an understanding of both the use case (which is easy to describe to the model) and the codebase itself (which is difficult, since descriptions/docstrings aren't enough to capture the complex behaviors that arise from interactions between parts of your codebase).

idk how you would train/finetune a model to somehow have this understanding of your codebase. I doubt plain next-token prediction would get you there; you'd likely have to create chat data discussing the intricacies of your codebase and do DPO/RLHF to bake it into your model.
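for a rough idea, preference tuning with HuggingFace TRL looks something like the sketch below. everything here is illustrative: the preference pairs are made-up examples of codebase-specific Q&A, and TRL kwarg names shift a bit between releases.

    # minimal DPO sketch with TRL; the pairs are hypothetical examples of
    # "knows our codebase" vs "plausible but wrong" answers.
    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    pairs = [
        {
            "prompt": "What does process_order() do when the cart is empty?",
            "chosen": "It returns early with an EmptyCartError before touching the DB.",
            "rejected": "It processes the order normally.",
        },
        # ...many more pairs covering your codebase's edge cases
    ]

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder base model
    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-codebase", beta=0.1),  # beta = KL penalty strength
        train_dataset=Dataset.from_list(pairs),
        processing_class=tokenizer,  # older TRL releases call this `tokenizer`
    )
    trainer.train()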

look into techniques like QLoRA that reduce the memory needed during tuning, and platforms like Vast.ai to rent GPUs for cheap.
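a bare-bones QLoRA setup looks roughly like this (a sketch assuming a Mistral-class model with transformers + peft + bitsandbytes; the model id and hyperparams are just placeholders):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # load the frozen base model in 4-bit (NF4, as in the QLoRA paper)
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", quantization_config=bnb, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    # train only small low-rank adapters on top of the quantized weights
    lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # which projections get adapters
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of all params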

RAG/agents could help, though probably not enough on their own. you could store info about each function in your codebase, such as its signature, the functions it calls, its docstring, and known edge cases associated with it. if you don't have docstrings, using an LLM to generate them is feasible.
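a toy version of that index, stdlib only (the retrieval here is naive keyword overlap as a stand-in for embedding search, and the edge-case field is a hypothetical slot you'd fill by hand or with an LLM):

    import ast

    def index_functions(source: str) -> list[dict]:
        """collect name, signature, docstring, and called functions."""
        entries = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                calls = sorted({
                    n.func.id for n in ast.walk(node)
                    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                })
                args = ", ".join(a.arg for a in node.args.args)
                entries.append({
                    "name": node.name,
                    "signature": f"{node.name}({args})",
                    "docstring": ast.get_docstring(node) or "",
                    "calls": calls,
                    "edge_cases": [],  # filled in by hand or by an LLM
                })
        return entries

    def retrieve(entries: list[dict], query: str, k: int = 3) -> list[dict]:
        """crude stand-in for embedding search: rank by word overlap."""
        words = set(query.lower().split())
        def score(e):
            return len(words & set(f"{e['name']} {e['docstring']}".lower().split()))
        return sorted(entries, key=score, reverse=True)[:k]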


can someone explain how his costs went down to $1? he essentially just replaced GPT-4 with a tuned variant of Mixtral 8x7B, which requires multiple GPUs to run. even if he quantized the model himself, he would still need to pay for the hardware and infra, which would cost more than $1. is he self-hosting or something?


his username checks out


Did you now?? I'll have you know that I wrote the full word2vec paper on a roll of shabby two-ply tissue paper during my time in a Taco Bell stall. Sadly, it was then used to mop up my dietary regrets and was subsequently lost to the foul wretches of the sewage system. Left with nothing but the memories of my groundbreaking thoughts and the lingering aroma of liquid feces, I texted Mikolov an idea I had about using neural nets to map sequences of text tokens from one language to another, only for him to reply "lol thx" and ghost me. I was quite negatively surprised when he decided to take this to the public courts of Facebook and failed to mention the "brainest boi alive™" who gave him this idea in the first place.


Seriously, his explanations on topics go well beyond the lectures some of my professors provide and could probably benefit a lot of students if given as a resource... If only academia weren't so distrustful of those outside their circles...


But he's not even outside their circles, right? Doesn't 3b1b have a PhD? I can tell you at least that we watched some 3b1b videos in my university undergrad math classes last year!


That's so encouraging to hear. I would've loved uni maths with some high-quality videos like his to watch, even if just to reinforce the topics after class.


Wikipedia thinks he has a Bachelor's from Stanford [1]. That said, I love his videos too. I started with The Essence of Linear Algebra [2], which is just fantastic and really shaped my intuition for the topic.

[1] https://en.wikipedia.org/wiki/3Blue1Brown

[2] https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...


Grant Sanderson does not have a PhD; he holds a Bachelor's degree in Mathematics from Stanford University.


This guy got me through my engineering degree


How did you come up with 40b for the memory? Specifically, why 0.7 * total params?


It's just a rough estimate, given that these things are fairly linear: the original Mistral 7B was 15 GB and the new one is 86 GB, whereas fully duplicating it (8 * 15 GB) would suggest a 120 GB size. So 86/120 ≈ 0.72 for the actual size, suggesting roughly 28% memory savings. This of course doesn't really account for any multiple-vs-single-file overhead and such, so it's likely to be a bit off.
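as a quick sanity check on those numbers (checkpoint sizes in GB, from above):

    single = 15             # original Mistral 7B checkpoint
    mixtral = 86            # Mixtral 8x7B checkpoint
    naive = 8 * single      # 120 GB if the 8 experts shared nothing
    print(mixtral / naive)  # ≈ 0.72, i.e. ~28% saved, presumably from weights the experts share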


lit

