Do you mean the people ITT saying "all"? Whether it's two people or a dozen, if there are people ITT using the same word, then they're all using the word.
We’re already past that point! MacBooks can easily run models exceeding GPT-3.5, such as Llama 3.1 8B, Qwen 2.5 8B, or Gemma 2 9B. These models run at very comfortable speeds on Apple Silicon. And they are distinctly more capable and less prone to hallucination than GPT-3.5 was.
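If you want to try this yourself, here's a minimal sketch using the mlx-lm Python package on Apple Silicon; the model repo name is just an example of a 4-bit community conversion, swap in whichever model you prefer:

```python
# pip install mlx-lm
from mlx_lm import load, generate

# Example 4-bit MLX conversion from the mlx-community org on Hugging Face;
# any 8B-class model like this fits comfortably in a MacBook's unified memory.
model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")

prompt = "Explain the difference between RAM and unified memory in two sentences."
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))
```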
Llama 3.3 70B and Qwen 2.5 72B are certainly comparable to GPT-4, and they will run on MacBook Pros with at least 64GB of RAM. However, I have an M3 Max and I can’t say that models of this size run at comfortable speeds. They’re a bit sluggish.
OMM, Llama 3.3 70B runs at ~7 text-generation tokens per second on a MacBook Pro Max with 128GB, while generating GPT-4-feeling text with more in-depth responses and fewer bullet lists. Llama 3.3 70B also doesn't fight the system prompt; it leans in.
Consider e.g. LM Studio (0.3.5 or newer) for a Metal/MLX-centered UI, and include MLX in your search term when downloading models.
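LM Studio can also expose whatever model you've loaded through an OpenAI-compatible local server (port 1234 by default), so a rough sketch of talking to it from Python looks like this; the model identifier below is an example, use whatever name LM Studio shows for your loaded model:

```python
# pip install openai -- the client just points at LM Studio's local server
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # identifier of the model loaded in LM Studio
    messages=[{"role": "user", "content": "Summarize this thread in one sentence."}],
)
print(resp.choices[0].message.content)
```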
Also, do not scrimp on the storage. At 60-100GB per model, a day of experimentation can easily fill 2.5TB of model cache. And remember to exclude that path from your Time Machine backups.
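For a rough sense of where those numbers come from, on-disk size is basically parameter count times bits per weight (plus a bit of metadata). A quick back-of-the-envelope:

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameters * bits per weight, ignoring small metadata overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(approx_model_size_gb(70, 16))   # ~140 GB: unquantized fp16/bf16 70B
print(approx_model_size_gb(70, 8))    # ~70 GB: 8-bit quant
print(approx_model_size_gb(70, 4.5))  # ~39 GB: a typical 4-bit GGUF/MLX quant
```

A handful of 70B downloads at a few different quantizations is all it takes to burn through terabytes.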
It's all memory-bandwidth related -- what's slow is loading these models into memory, basically. The last die from Apple with all the memory channels was the M2 Ultra, and I bet that's what tops those leaderboards. The M4 hasn't had an Ultra release yet; when it does (and it seems likely it will), that will be the one to get.
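A back-of-the-envelope way to see why bandwidth dominates: each generated token has to stream roughly the whole set of weights through memory once, so bandwidth divided by model size gives a ceiling on tokens per second. The bandwidth figures below are approximate published specs, not measurements:

```python
def tok_per_sec_ceiling(mem_bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    return mem_bandwidth_gb_s / model_size_gb

# ~40 GB is roughly a 4-bit 70B quant.
print(tok_per_sec_ceiling(800, 40))  # M2 Ultra-class bandwidth -> ~20 tok/s ceiling
print(tok_per_sec_ceiling(400, 40))  # Max-class bandwidth -> ~10 tok/s, in line with the ~7-10 tok/s reported above
```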
You could definitely run an 8B model on that, and some of those are getting very capable now.
The problem is that often you can't run anything else. I've had trouble running larger models in 64GB when I've had a bunch of Firefox and VS Code tabs open at the same time.
I have an M2 Air with 24GB, and have successfully run some 12B models such as mistral-nemo. I had other stuff going as well, but it's best to give it as much of the machine as possible.
I recently upgraded to exactly this machine for exactly this reason, but I haven't taken the leap and installed anything yet. What's your favorite model to run on it?
I bought an old used desktop computer, a used 3090, and upgraded the power supply, all for around 900€. I haven't assembled it all yet, but it will be able to comfortably run 30B-parameter models at 30-40 T/s. The M4 Max can do ~10 T/s, which is not great once you really want to rely on it for your productivity.
Yes, it is not "local" in the sense that I will have to use the internet when not at home. But it will also not drain the battery very quickly when I'm using it, which I suspect would happen to a MacBook Pro running such models. Also, 70B models are out of reach of my setup, but I think they are painfully slow on Mac hardware anyway.
I'm returning my 96GB M2 Max. It can run unquantized Llama 3.3 70B, but tokens per second are slow as molasses, and I still couldn't find any use for it; I just kept going back to Perplexity when I actually needed to find an answer to something.
Llama 8B is multilingual on paper, but the quality is very bad compared to English. It generally understands grammar, and you can understand what it's trying to say, but the choice of words is off most of the time, often complete gibberish. If you can imagine the output of an undertrained model, this is it. Meanwhile, GPT-3.5 had far better output that you could use in production.
> gpt-3.5-turbo is generally considered to be about 20B params. An 8B model does not exceed it.
The industry has moved on from the old Chinchilla scaling regime, and with it the conviction that LLM capability is mainly dictated by parameter count. OpenAI didn't disclose how much pretraining they did for 3.5-Turbo, but GPT-3 was trained on 300 billion tokens of text data. In contrast, Llama 3.1 was trained on 15 trillion tokens of data.
Objectively, Llama 3.1 8B and other small models have exceeded GPT-3.5-Turbo in benchmarks and human preference scores.
> Is a $8000 MBP regular consumer hardware?
As user `bloomingkales` notes down below, a $499 Mac Mini can run 8B parameter models. An $8,000 expenditure is not required.
>> Llama 3.3 70B and Qwen 2.5 72B are certainly comparable to GPT-4
> I'm skeptical; the llama 3.1 405B model is the only comparable model I've used, and it's significantly larger than the 70B models you can run locally.
Every new Llama generation has managed to beat the previous generation's larger models with smaller ones.
And benchmarks are very misleading in this regard. We've seen no shortage of even 8B models claiming that they beat GPT-4 and Claude in benchmarks. Every time this happens, once you start actually using the model, it's clear that it's not actually on par.
Here's a config for around the same price. All brand-new parts for 573. You can spend the difference improving any part you wish, or maybe get a used 3060 and go AM5 instead (Ryzen 8400F). Both paths are upgradeable.
Double the LLM performance. Half the desktop performance. But you can use both at the same time. Your computer will not slow down when running inference.
Then you can plug a GPU into it. It should have decent load times: better than an eGPU, worse than the AM4 desktop build, and fast enough to beat the M4 (once the data is in the GPU, it doesn't matter).
It makes for a very portable setup. I haven't built it, but I think it's a reasonable LLM choice comparable to the M4 in speed and portability while still being upgradable.
Edit: and you'll need an external power supply of at least 400W :)
I only use local, open source LLMs because I don’t trust cloud-based LLM hosts with my data. I also don’t want to build a dependence on proprietary technology.
We're there: Llama 3.1 8B beats Gemini Advanced, the $20/month offering. Telosnex with Llama 3.1 8B GGUF from bartowski. https://telosnex.com/compare/ (How!? tl;dr: I assume Google is sandbagging and hasn't updated the underlying Gemini.)
Saying these models are at GPT-4 level is setting anyone who doesn't place special value on the local aspect up for disappointment.
Some people do place value on running locally, and I'm not against them for it, but realistically no 70B-class model has the general knowledge or grasp of nuance of any recent GPT-4 checkpoint.
That being said, these models are still very strong compared to what we had a year ago and capable of useful work.
I have to disagree. I understand it's very expensive, but it's still a consumer product available to anyone with a credit card.
The comparison is between something you can buy off the shelf like a powerful Mac, vs something powered by a Grace Hopper CPU from Nvidia, which would require both lots of money and a business relationship.
Honestly, people pay $4k for nice TVs, refrigerators and even couches, and those are not professional tools by any stretch. If LLMs needed a $50k Mac Pro with maxed out everything, that might be different. But anything that's a laptop is definitely regular consumer hardware.
There have definitely been plenty of sources of hardware capable of running LLMs out there for a while, Mac or not. A couple of 4090s or P40s will run 3.1 70B. Or, since price isn't a limit, there are other easier & more powerful options like a [tinybox](https://tinygrad.org/#tinybox:~:text=won%27t%20be%20consider...).
Yeah, a computer which starts at $3900 is really stretching that classification. Plus if you're that serious about local LLMs then you'd probably want the even bigger RAM option, which adds another $800...
An optioned-up minivan is also expensive, but it doesn't cost as much as a firetruck. It's expensive but still very much consumer hardware. A 3x4090 rig is more expensive and still consumer hardware. An H100 is not; you can buy something like 7 of these optioned-up MBPs for a single H100.
In my experience, people use the term in two separate ways.
If I'm running a software business selling software that runs on 'consumer hardware' the more people can run my software, the more people can pay me. For me, the term means the hardware used by a typical-ish consumer. I'll check the Steam hardware survey, find the 75th-percentile gamer has 8 cores, 32GB RAM, 12GB VRAM - and I'd better make sure my software works on a machine like that.
On the other hand, 'consumer hardware' could also be used to simply mean hardware available off-the-shelf from retailers who sell to consumers. By this definition, 128GB of RAM is 'consumer hardware' even if it only counts as 0.5% in Steam's hardware survey.
On the Steam Hardware Survey, the average gamer uses a computer with a 1080p display too. That doesn't somehow make any gaming laptop with a 2K screen sold in the last half decade a non-consumer product. For that matter, the average gaming PC on Steam is itself above average relative to the average computer: the typical office computer or school Chromebook is likely several generations older and doesn't have an NPU or discrete GPU at all.
For AI and LLMs, I'm not aware of any company even selling the model assets directly to consumers; they're either completely unavailable (OpenAI) or freely licensed, so the companies training them aren't really dependent on what the average person has for commercial success.
That's basically a new standard. I like it, but getting all JSON parsers on board will be tricky if not impossible. Maybe it deserves its own name. How about JSONc (JAYSON-SEE)?
Edit: I read the fine print on the page after posting this... Turns out that I'm not the first to come up with jsonc. The author just isn't a huge fan and wants JSON parsers to accept comments. Good luck with that!
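For what it's worth, you don't need parser buy-in to use comments today; stripping them before handing the text to a standard parser is only a few lines. A rough sketch that handles // and /* */ outside of strings:

```python
import json

def strip_json_comments(text: str) -> str:
    """Remove // line comments and /* block */ comments that appear outside strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        c = text[i]
        if in_string:
            out.append(c)
            if c == "\\" and i + 1 < n:   # keep escape sequences intact, including \"
                out.append(text[i + 1])
                i += 1
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
            out.append(c)
        elif text.startswith("//", i):
            nl = text.find("\n", i)       # drop everything up to the newline
            if nl == -1:
                break
            i = nl
            continue
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)  # drop everything up to the closing */
            i = n if end == -1 else end + 2
            continue
        else:
            out.append(c)
        i += 1
    return "".join(out)

doc = '{ "name": "demo", /* block */ "port": 1234 // trailing note\n}'
print(json.loads(strip_json_comments(doc)))  # {'name': 'demo', 'port': 1234}
```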