xeckr's comments

Good.

Can't access the site. Did one of you get offended at the title and then give the server a tight hug?


Archive link available below.


The dress, anyone?


I see people ITT saying "memorizing", do they all happen to mean "mesmerizing"?


I see people ITT saying "all", do they both happen to mean "both"?


Do you mean the person ITT saying "all"? Whether it's two people or a dozen, if there are people ITT using the same word, then they're all using the word.


They „could of“ meant that.

:-)


I agree. Hacking a smart TV to bypass/replace its OS would make for an interesting project, though.


They're currently doing human trials on this in Japan.


Looks like it punches way above its weight(s).

How far are we from running a GPT-3/GPT-4 level LLM on regular consumer hardware, like a MacBook Pro?


We’re already past that point! MacBooks can easily run models exceeding GPT-3.5, such as Llama 3.1 8B, Qwen 2.5 7B, or Gemma 2 9B. These models run at very comfortable speeds on Apple Silicon, and they are distinctly more capable and less prone to hallucination than GPT-3.5 was.

Llama 3.3 70B and Qwen 2.5 72B are certainly comparable to GPT-4, and they will run on MacBook Pros with at least 64GB of RAM. However, I have an M3 Max and I can’t say that models of this size run at comfortable speeds. They’re a bit sluggish.
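
If you want to kick the tires on the small end of this, here's a minimal sketch using the mlx-lm Python package on Apple Silicon (pip install mlx-lm; the Hugging Face repo name is just one example of an MLX-converted build, swap in whatever you like):

    # Minimal sketch: load a 4-bit MLX conversion and generate text locally.
    # The repo name is an example/assumption; any mlx-community instruct model works the same way.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
    prompt = "Explain unified memory on Apple Silicon in two sentences."
    text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=False)
    print(text)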


The coolness of local LLMs is THE only reason I'm sadly eyeing an upgrade from an M1 with 64GB to an M4/M5 with 128+GB.


Compare performance on various Macs here as it gets updated:

https://github.com/ggerganov/llama.cpp/discussions/4167

On my machine, Llama 3.3 70B runs at ~7 text-generation tokens per second on a 128GB MacBook Pro (Max chip), while generating GPT-4-feeling text with more in-depth responses and fewer bullet lists. Llama 3.3 70B also doesn't fight the system prompt; it leans in.

Consider e.g. LM Studio (0.3.5 or newer) for an MLX (Metal) centered UI; include MLX in your search term when downloading models.

Also, do not skimp on storage. At 60GB-100GB per model, it only takes a day of experimentation to fill 2.5TB of storage in your model cache. And remember to exclude that path from your Time Machine backups.
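
If you prefer to script against it, LM Studio can also expose a local OpenAI-compatible server; here's a sketch assuming the default port 1234 and whatever model identifier you have loaded (both are assumptions about your local setup):

    # Sketch: talk to LM Studio's local OpenAI-compatible server with the standard openai client.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally
    resp = client.chat.completions.create(
        model="mlx-community/Llama-3.3-70B-Instruct-4bit",  # placeholder: use the model you loaded in LM Studio
        messages=[{"role": "user", "content": "Summarize this thread in one sentence."}],
    )
    print(resp.choices[0].message.content)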


Thank you for all the tips! I'd probably go 128GB/8TB out of masochism. Curious what puts so many of the M4s in the red currently.


It's all memory-bandwidth related -- what's slow is streaming the model's weights through memory for every generated token, basically. The last Apple die with all the memory channels was the M2 Ultra, and I bet that's what tops those leaderboards. The M4 hasn't had an Ultra release yet; when it does (and it seems likely it will), that will be the one to get.
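
A rough sketch of that arithmetic: single-stream generation streams (roughly) all of the quantized weights through memory for each token, so bandwidth divided by model size is the ceiling on tokens per second. The bandwidth figures below are approximate spec-sheet numbers, treat them as assumptions:

    # Rough decode-speed ceiling: each generated token reads ~all weights once,
    # so tokens/sec <= memory_bandwidth / model_size_in_bytes.
    def ceiling_tok_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    model_gb = 40  # Llama 3.3 70B at ~4-bit quantization is roughly 40 GB of weights
    for chip, bw in [("M4 (base)", 120), ("M3 Max (top config)", 400), ("M2 Ultra", 800)]:
        print(f"{chip}: ~{ceiling_tok_per_s(bw, model_gb):.0f} tok/s ceiling for a 70B Q4 model")

That ~10 tok/s ceiling for a Max-class chip lines up with the ~7 tok/s reported above once real-world overhead is included.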


What if you have a MacBook Air with 16GB? (The benchmarks don't seem to show memory.)


You could definitely run an 8B model on that, and some of those are getting very capable now.

The problem is that often you can't run anything else. I've had trouble running larger models in 64GB when I've had a bunch of Firefox and VS Code tabs open at the same time.


I thought VS Code was supposed to be lightweight, though I suppose with extensions it can add up.


8B models with larger contexts, or even 9-14B parameter models quantized.

Qwen2.5 Coder 14B at 4-bit quantization could run, but you will need to be diligent about what else you have in memory at the same time.
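
Back-of-the-envelope sizing for that (weights only; KV cache for long contexts and runtime overhead come on top, so treat these as lower bounds):

    # Approximate weight footprint: parameters * bits_per_weight / 8 bytes.
    def weight_gb(params_billion: float, bits: float) -> float:
        return params_billion * bits / 8  # billions of params * bits/8 ~= GB

    print(weight_gb(14, 4))   # Qwen2.5 Coder 14B at 4-bit: ~7 GB
    print(weight_gb(9, 4))    # a 9B model at 4-bit: ~4.5 GB
    print(weight_gb(8, 8))    # an 8B model at 8-bit: ~8 GB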


I have an M2 Air with 24GB and have successfully run some 12B models such as mistral-nemo. I had other stuff going as well, but it's best to give it as much of the machine as possible.


I recently upgraded to exactly this machine for exactly this reason, but I haven't taken the leap and installed anything yet. What's your favorite model to run on it?


I bought an old used desktop computer, a used 3090, and upgraded the power supply, all for around 900€. I haven't assembled it all yet, but it will be able to comfortably run 30B-parameter models at 30-40 T/s. The M4 Max can do ~10 T/s, which is not great once you really want to rely on it for your productivity.

Yes, it is not "local" as I will have to use the internet when not at home. But it will also not drain the battery very quickly when using it, which I suspect would happen to a MacBook Pro running such models. Also, 70B models are out of reach of my setup, but I think they are painfully slow on Mac hardware anyway.


I'm returning my 96GB M2 Max. It can run unquantized Llama 3.3 70B, but tokens per second are slow as molasses, and I still couldn't find any use for it; I just kept going back to Perplexity when I actually needed to find an answer to something.


Interesting. You're using the FP8 version, I'm guessing? How many tokens/s are you getting, and with which software? MLX?


I'm waiting for next gen hardware. All the companies are aiming for AI acceleration.


Sorry, I'm not up to date, but can you run GPTs locally or only vanilla LLMs?


>MacBooks can easily run models exceeding GPT-3.5, such as Llama 3.1 8B, Qwen 2.5 7B, or Gemma 2 9B.

If only those models supported anything other than English


Llama 3.1 8B advertises itself as multilingual.

All of the Qwen models are basically fluent in both English and Chinese.


Llama 8B is multilingual on paper, but the quality in other languages is very bad compared to English. It generally understands the grammar, and you can understand what it's trying to say, but the choice of words is off most of the time, often complete gibberish. If you can imagine the output of an undertrained model, this is it. Meanwhile, GPT-3.5 had far better output that you could use in production.


Cohere just announced Command R7B. I haven't tried it yet, but their larger models are the best multilingual models I've used.


Is the subtext to this uncensored Chinese support?



> gpt-3.5-turbo is generally considered to be about 20B params. An 8B model does not exceed it.

The industry has moved on from the old Chinchilla scaling regime, and with it the conviction that LLM capability is mainly dictated by parameter count. OpenAI didn't disclose how much pretraining they did for 3.5-Turbo, but GPT-3 was trained on 300 billion tokens of text data. In contrast, Llama 3.1 was trained on 15 trillion tokens.

Objectively, Llama 3.1 8B and other small models have exceeded GPT-3.5-Turbo in benchmarks and human preference scores.

> Is a $8000 MBP regular consumer hardware?

As user `bloomingkales` notes below, a $499 Mac Mini can run 8B-parameter models. An $8,000 expenditure is not required.


>> Llama 3.3 70B and Qwen 2.5 72B are certainly comparable to GPT-4

> I'm skeptical; the llama 3.1 405B model is the only comparable model I've used, and it's significantly larger than the 70B models you can run locally.

Every new Llama generation has managed to beat larger models from the previous generation with smaller ones.

Check Kagi's LLM benchmark: https://help.kagi.com/kagi/ai/llm-benchmark.html

Check the HN thread around the 3.3 70b release: https://news.ycombinator.com/item?id=42341388

And their own benchmark results in their model card: https://github.com/meta-llama/llama-models/blob/main/models%...

Groq's post about it: https://groq.com/a-new-scaling-paradigm-metas-llama-3-3-70b-...

Etc


They still do not beat GPT-4, however.

And benchmarks are very misleading in this regard. We've seen no shortage of even 8B models claiming that they beat GPT-4 and Claude in benchmarks. Every time this happens, once you start actually using the model, it's clear that it's not actually on par.


GPT-4 from March 2023, not GPT-4o from May 2024.


> Is a $8000 MBP regular consumer hardware?

May want to double-check your specs. 16" w/128GB & 2TB is $5,400.


> Is a $8000 MBP regular consumer hardware? If you don't think so, then the answer is probably no.

The very first Apple Macintosh was not far from that price at its release, adjusted for inflation of course.


A Mac with 16GB RAM can run Qwen 7B, Gemma 9B, and similar models that are somewhere between GPT-3.5 and GPT-4.

Quite impressive.


on what metric?

Why would OpenAI bother serving GPT4 if customers would be just as happy with a tiny 9B model?


https://lmarena.ai/

Check out the lmsys leaderboard. It has an overall ranking as well as ranking for specific categories.

OpenAI is also serving GPT-4o mini. That said, AFAIU it's not known how large/small mini is.

Being more useful than GPT3.5 is not a high bar anymore.


Don't confuse GPT-4 and GPT-4o.

GPT-4o is a much better experience than the smaller local models. You can see that in the lmarena benchmarks or from trying them out yourself.


M4 Mac mini 16GB for $500. It's literally an inferencing block (small too, fits in my palm). I feel like the whole world needs one.


> inferencing block

Did you mean _external gpu_?

Choose any video card with 12GB or more of GDDR6 (or better) and you'll have at least double the performance of a base M4 mini.

The base model is almost an older generation. Thunderbolt 4 instead of 5, slower bandwidths, slower SSDs.


> you'll have at least double the performance of a base m4 mini

For $500 all included?


The base mini is $599.

Here's a config for around the same price: all brand-new parts for $573. You can spend the difference improving any part you wish, or maybe get a used 3060 and go AM5 instead (Ryzen 8400F). Both paths are upgradeable.

https://pcpartpicker.com/list/ftK8rM

Double the LLM performance. Half the desktop performance. But you can use both at the same time. Your computer will not slow down when running inference.


That’s a really nice build.


Another possible build is to use a mini-PC and its M.2 connections.

You'll need a mini-PC with two M.2 slots, like this:

https://www.amazon.com/Beelink-SER7-7840HS-Computer-Display/...

And a riser like this:

https://www.amazon.com/CERRXIAN-Graphics-Left-PCI-Express-Ex...

And some courage to open it and rig the stuff in.

Then you can plug a GPU into it. It should have decent load times. Better than an eGPU, worse than the AM4 desktop build, fast enough to beat the M4 (once the data is in the GPU, it doesn't matter).

It makes for a very portable setup. I haven't built it, but I think it's a reasonable LLM choice comparable to the M4 in speed and portability while still being upgradable.

Edit: and you'll need an external power supply of at least 400W :)


It's easy to argue that Llama-3.3 8B performs better than GPT-3.5. Compare their benchmarks, and try the two side-by-side.

Phi-4 is yet another step towards a small, open, GPT-4 level model. I think we're getting quite close.

Check the benchmarks comparing to GPT-4o on the first page of their technical report if you haven't already https://arxiv.org/pdf/2412.08905


Did you mean Llama-3.1 8B? Llama 3.3 currently only has a 70B model as far as I’m aware.


Why would you want to, though? You can already get free access to large LLMs, and nobody is doing anything groundbreaking with them.


I only use local, open source LLMs because I don’t trust cloud-based LLM hosts with my data. I also don’t want to build a dependence on proprietary technology.


We're there: Llama 3.1 8B beats the $20/month Gemini Advanced. Telosnex with a Llama 3.1 8B GGUF from bartowski: https://telosnex.com/compare/ (How!? tl;dr: I assume Google is sandbagging and hasn't updated the underlying Gemini.)
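
Telosnex is an app, but you can drive the same kind of bartowski GGUF directly from Python with llama-cpp-python; here's a sketch where the repo id and filename glob are assumptions about the usual naming:

    # Sketch: download a 4-bit GGUF from Hugging Face and chat with it locally.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",  # assumed repo name
        filename="*Q4_K_M.gguf",   # pick the ~4-bit quant
        n_gpu_layers=-1,           # offload all layers to Metal/GPU
        n_ctx=8192,
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Is an 8B model really GPT-3.5 class?"}],
        max_tokens=200,
    )
    print(out["choices"][0]["message"]["content"])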


We're there. Llama 3.3 70B is GPT-4 level and runs on my 64GB MacBook Pro: https://simonwillison.net/2024/Dec/9/llama-33-70b/

The Qwen2 models that run on my MacBook Pro are GPT-4 level too.


Saying these models are at GPT-4 level is setting anyone who doesn't place special value on the local aspect up for disappointment.

Some people do place value on running locally, and I'm not against them for it, but realistically no 70B-class model has as much general knowledge or understanding of nuance as any recent GPT-4 checkpoint.

That being said, these models are still very strong compared to what we had a year ago, and they're capable of useful work.


I said GPT-4, not GPT-4o. I'm talking about a model that feels equivalent to the GPT-4 we were using in March of 2023.


I remember using GPT-4 when it first dropped to get a feeling of its capabilities, and no, I wouldn't say that llama-3.3-70b is comparable.

At the end of the day, there's only so much you can cram into any given number of parameters, regardless of what any artificial benchmark says.


I envy your memory.


You're free to intentionally miss their point; it does them no good.


I wouldn't call 64GB MacBook Pro "regular consumer hardware".


I have to disagree. I understand it's very expensive, but it's still a consumer product available to anyone with a credit card.

The comparison is between something you can buy off the shelf like a powerful Mac, vs something powered by a Grace Hopper CPU from Nvidia, which would require both lots of money and a business relationship.

Honestly, people pay $4k for nice TVs, refrigerators and even couches, and those are not professional tools by any stretch. If LLMs needed a $50k Mac Pro with maxed out everything, that might be different. But anything that's a laptop is definitely regular consumer hardware.


There's definitely been plenty of hardware capable of running LLMs out there for a while, Mac or not. A couple of 4090s or P40s will run 3.1 70B. Or, since price isn't a limit, there are other easier & more powerful options like a tinybox (https://tinygrad.org/#tinybox:~:text=won%27t%20be%20consider...).


Yeah, a computer which starts at $3900 is really stretching that classification. Plus if you're that serious about local LLMs then you'd probably want the even bigger RAM option, which adds another $800...


An optioned-up minivan is also expensive but doesn't cost as much as a firetruck. It's expensive but still very much consumer hardware. A 3x4090 rig is more expensive and still consumer hardware. An H100 is not; you could buy about 7 of these optioned-up MBPs for a single H100.


In my experience, people use the term in two separate ways.

If I'm running a software business selling software that runs on 'consumer hardware', then the more people who can run my software, the more people who can pay me. For me, the term means the hardware used by a typical-ish consumer. I'll check the Steam hardware survey, find that the 75th-percentile gamer has 8 cores, 32GB RAM, and 12GB VRAM - and I'd better make sure my software works on a machine like that.

On the other hand, 'consumer hardware' could also be used to simply mean hardware available off-the-shelf from retailers who sell to consumers. By this definition, 128GB of RAM is 'consumer hardware' even if it only counts as 0.5% in Steam's hardware survey.


On the Steam Hardware Survey, the average gamer uses a computer with a 1080p display too. That doesn't somehow make every gaming laptop with a 2K screen sold in the last half decade a non-consumer product. For that matter, the average gaming PC on Steam is itself above average relative to the average computer. The typical office computer or school Chromebook is likely several generations older and doesn't have an NPU or discrete GPU at all.

For AI and LLMs, I'm not aware of any company even selling the model assets directly to consumers; they're either completely unavailable (OpenAI) or freely licensed, so the companies training them aren't really dependent on what the average person has for their commercial success.


In the early '80s, people were spending more than $3k for an IBM 5150. For that price you got 64 kB of RAM, a floppy drive, and a monochrome monitor.

Today, lots of people spend far more than that for gaming PCs. An Alienware R16 (unquestionably a consumer PC) with 64 GB of RAM starts at $4700.

It is an expensive computer, but the best mainstream computers at any particular time have always cost between $2500 and $5000.


https://status.openai.com/

Seems to be an issue that isn't region-specific.


It guessed Hebrew for me too when I was going for a hell of a German accent.


That's basically a new standard. I like it, but getting all JSON parsers on board will be tricky if not impossible. Maybe it deserves its own name. How about JSONc (JAYSON-SEE)?

Edit: I read the fine print on the page after posting this... Turns out that I'm not the first to come up with jsonc. The author just isn't a huge fan and wants JSON parsers to accept comments. Good luck with that!
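
Until parsers accept comments natively, the practical route is a preprocessing pass (or an existing JSON5/JSONC library). Here's a minimal hand-rolled sketch that strips // and /* */ comments while leaving string contents alone, then hands the result to a stock parser:

    import json

    def strip_json_comments(text):
        # Remove // line comments and /* block */ comments, but leave
        # comment-like sequences inside string literals untouched.
        out, i, n = [], 0, len(text)
        in_string = False
        while i < n:
            c = text[i]
            if in_string:
                out.append(c)
                if c == "\\" and i + 1 < n:       # keep escaped chars (e.g. \")
                    out.append(text[i + 1])
                    i += 1
                elif c == '"':
                    in_string = False
            elif c == '"':
                in_string = True
                out.append(c)
            elif c == "/" and i + 1 < n and text[i + 1] == "/":
                while i < n and text[i] != "\n":  # drop the rest of the line
                    i += 1
                continue
            elif c == "/" and i + 1 < n and text[i + 1] == "*":
                i += 2
                while i + 1 < n and not (text[i] == "*" and text[i + 1] == "/"):
                    i += 1
                i += 2                            # skip past the closing */
                continue
            else:
                out.append(c)
            i += 1
        return "".join(out)

    doc = """{
      // line comment
      "url": "http://example.com",  /* block comment */
      "n": 1
    }"""
    print(json.loads(strip_json_comments(doc)))  # {'url': 'http://example.com', 'n': 1}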

