Exploring Microsoft's Phi-3-Mini and its integration with tool like Ollama

simonw · 2024-12-29T01:39:55 1735436395

Phi-3 is five months old now. I suggest trying Phi 3.5 instead - it's effectively the same size (2.2GB from HF - Phi-3 Mini is 2.2GB as well) but should provide better results.

If you have Ollama installed for that there are plenty of other interesting models to try out too. I like the Llama 3.2 small models, or if you have a whole lot of RAM (I use 64GB on an M2 MacBook Pro) you can run Llama 3.3 70B which is genuinely GPT-4 class: https://simonwillison.net/2024/Dec/9/llama-33-70b/

pkaye · 2024-12-29T03:47:16 1735444036

How does Llama-3.2 3B compare to Phi-3.5?

ron0c · 2024-12-28T18:22:54 1735410174

This is the AI I am excited for. Data and execution local to my machine. I think Intel is betting on this with the copilot included processors. I hope ollama or other local AI services will be able to utilize these co-processors soon.

ekianjo · 2024-12-28T18:32:54 1735410774

The NPUs on laptops don't have access to enough memory to run very large models.

talldayo · 2024-12-28T18:53:42 1735412022

Oftentimes they do. If they don't, it's not very hard to page memory to and from the NPU until the operation is completed.

The bigger problem is that this NPU hardware isn't built around scaling to larger models. It's laser-focused on dense computation and low-precision inference, which usually isn't much more efficient than running the same matmul as a compute shader. For Whisper-scale models that don't require insanely high precision or super sparse decoding, NPU hardware can work great. For LLMs it is almost always going to be slower than a well-tuned GPU.

650REDHAIR · 2024-12-28T20:28:35 1735417715

Right, but for most people do they need access to a huge model locally?

e12e · 2024-12-29T02:05:07 1735437907

AFAIU NPUs are for things like voice input/output, computer vision/hand gesture io, knowing how many people/who are in front of the camera etc. Always on, real-time "ai peripherals" - not content generation.

I believe Microsoft calls them "SLMs - Small Language Models".

ben_w · 2024-12-28T20:47:23 1735418843

Most people shouldn't host locally at all.

Of those who do, I can see students and researchers benefiting from small models. Students in particular are famously short on money for fancy hardware.

My experience trying one of the Phi models (I think 3, might have been 2) was brief, because it failed so hard: my first test was to ask for a single page web app Tetris clone, and not only was the first half the output simply doing that task wrong, the second half was a sudden sharp turn into python code to train an ML model — it didn't even delimit the transition, one line javascript, the next python.

diggan · 2024-12-28T22:07:39 1735423659

> My experience trying one of the Phi models (I think 3, might have been 2) was brief

The Phi models are tiny LMs, maybe SLM is more fitting label than LLM (Large -> Small). As such, you cannot throw even semi-complicated problems at them. Things like "autocomplete" and other simpler things are the use cases you'd use it for, not "code this game for me", you'll need something much more powerful for that.

ben_w · 2024-12-28T22:28:11 1735424891

> Things like "autocomplete" and other simpler things are the use cases you'd use it for, not "code this game for me", you'll need something much more powerful for that.

Indeed, clearly.

However, it was tuned for chat, and people kept telling me it was competitive with the OpenAI models for coding.

ron0c · 2025-01-06T13:36:44 1736170604

Asking a leading LLM to "code a game" is a tall order. I have found a lot of success with self hosted small models to accomplish coding that would have taking me months without. I just break down the "code me a game" to its parts.

Think of it like an extended auto complete.

miohtama · 2024-12-28T22:24:54 1735424694

Maybe a better solution is privately hosted cloud solution, or just any SaaS that cannot violate data privacy by design.

sofixa · 2024-12-28T22:59:23 1735426763

> any SaaS that cannot violate data privacy by design

And that is hosted in a jurisdiction that forces them to take it seriously, e.g. Mistral in France that has to comply with GDPR and any AI and privacy regulations out of the EU.

msoad · 2024-12-28T18:44:18 1735411458

in my opinion there is room for small and fast and large and slow but much smarter models. Use cases like phone keyboard autocomplete and next few words suggestion in coding or writing need very fast models that should by definition small. Very large models that are much smarter are also useful, for instance debugging issues or proofreading long letters.

Cursor really aced this. The Cursor model is very fast to suggest useful inline completions and then leaves big problems to big models.

mycall · 2024-12-28T18:47:55 1735411675

Could chaining models together via tool calls based on benchmarking that would redirect to the best model allow for smaller models to perform as well as big[er] models for memory constrained/local environments?

isoprophlex · 2024-12-28T20:22:13 1735417333

Yes, indeed, see for example https://arxiv.org/abs/2310.03094

weekay · 2024-12-29T04:07:39 1735445259

Are there any other tools similar to pieces.app that are useful and preferably open-source, which can be integrated into the developer workflow? I’ve used Heynote, which helps to some extent, but it’s not a direct fit and isn’t a complete AI developer workflow companion.

maccam912 · 2024-12-28T18:08:57 1735409337

Is there any rule of thumb for small language models vs large language models? I've seen phi 4 called a small language model but with 14 billion parameters, it's larger than some large language models.

ekianjo · 2024-12-28T18:12:05 1735409525

7b to 9b is usually what we call small. the rule of thumb is a model that you can run on a single GPU.

exitb · 2024-12-28T22:13:46 1735424026

It’s not a useful distinction. The first LLMs had less than 1 billion parameters anyway.

kittikitti · 2024-12-29T03:08:29 1735441709

I would claim that even 500 million parameters could be considered large.

akudha · 2024-12-28T19:26:00 1735413960

Apologies for the dumb question - can these models be used at my work, i.e, for commercial purposes? What is the legality of it?

minimaxir · 2024-12-28T20:51:27 1735419087

Phi-3-mini has a MIT license, which is commercial friendly: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct

nicce · 2024-12-28T20:56:01 1735419361

Do we know for sure that model is not trained with copyrighted material or with GPL-lisenced code? That is the biggest issue right now.

minimaxir · 2024-12-28T21:20:26 1735420826

That is the case with every LLM (except a couple research experiments) and won’t be resolved until the courts do.

Literally every tech company that uses LLMs would be in legal trouble if that becomes the precedent.

nicce · 2024-12-28T22:06:40 1735423600

Yes. It is a bigger problem than the correct lisence of the model, and I feel that original commenter is not aware of that.

Many companies are waiting for court decisions and are not using even GitHub Copilot. There is even growing business for making analysis for binaries and source code whether they use GPL code or not.

smallerize · 2024-12-28T19:29:55 1735414195

In the USA, code generated by a computer cannot be copyrighted. So you can use it for commercial purposes, but you can't control it the way you could with code that you wrote yourself. And that's legally fine, but your company's legal department might not like that idea.

lodovic · 2024-12-28T20:04:45 1735416285

That's not entirely accurate. In the US, computer-generated code can be copyrighted. The key point is that copyright protection extends to the original expression in the code, but not to its functional aspects, such as algorithms, system design, or logic.

kittikitti · 2024-12-29T03:03:00 1735441380

This person has no idea what they're talking about. "Code generated by a computer"?

smallerize · 2024-12-29T03:38:47 1735443527

"works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author"

https://arstechnica.com/information-technology/2023/02/us-co...

akudha · 2024-12-28T19:34:47 1735414487

But this model can be used for more than generating code, no?