More

manca · 2025-07-20T04:10:14 1752984614

I literally had the same experience when I asked the top code LLMs (Claude Code, GPT-4o) to rewrite the code from Erlang/Elixir codebase to Java. It got some things right, but most things wrong and it required a lot of debugging to figure out what went wrong.

It's the absolute proof that they are still dumb prediction machines, fully relying on the type of content they've been trained on. They can't generalize (yet) and if you want to use them for novel things, they'll fail miserably.

h4ck_th3_pl4n3t · 2025-07-20T05:01:50 1752987710

I just wished the LLM model providers would realize this and instead would provide specialized LLMs for each programming language. The results likely would be better.

chuckadams · 2025-07-20T05:17:39 1752988659

The local models JetBrains IDEs use for completion are specialized per-language. For more general problems, I’m not sure over-fitting to a single language is any better for a LLM than it is for a human.

abrookewood · 2025-07-20T04:47:58 1752986878

Clearly the issue is that you are going from Erlang/Elixir to Java, rather than the other way around :)

Jokes aside, they are pretty different languages. I imagine you'd have much better luck going from .Net to Java.

tsimionescu · 2025-07-20T06:49:29 1752994169

Sure, it's easier to solve an easier problem, news at eleven. In particular, translating from C# to Java could probably be automated with some 90% accuracy using a decent sized bash script.

mattmanser · 2025-07-20T08:23:18 1752999798

I once redid a project from VB.Net to C# and pretty much did that.

People misjudge many tasks as 'hard' when they are in fact easy but tedious.

The problem is you need a high degree of accuracy, which you don't get with LLMs.

The best you can do is set the LLM on a loop and try and Brute force it, which is the current vibe 'coding' trick.

I sound pessimistic but I'm actually shocked at how effective it is.

nine_k · 2025-07-20T06:48:45 1752994125

This mostly means that LLMs are good at simpler forms of pattern matching, and have much harder time actually reasoning at a significant depth. (It's not easy even for human intellect, the finest we currently have.)

nerdsniper · 2025-07-20T06:58:29 1752994709

Claude Code / 4o struggle with this for me, but I had Claude Opus 4 rewrite a 2,500 line powershell script for embedded automation into Python and it did a pretty solid job. A few bugs, but cheaper models were able to clean those up. I still haven't found a great solution for general refactoring -- like I'd love to split it out into multiple Python modules but I rarely like how it decides to do that without me telling it specifically how to structure the modules.

conception · 2025-07-20T06:54:56 1752994496

I’m curious what your process was. If you just said “rewrite this in Java” I’d expect that to fail. If you treated the llm like a junior developer or an official project, worked with them to document the codebase, come up with a plan, tasks for each part of the code base and a solid workflow prompt- I would expect it to succeed.

4hg4ufxhy · 2025-07-20T12:50:06 1753015806

There is a reason to go the extra mile for juniors. They eventually learn and become seniors. With AI I'd rather just do it myself and be done with it.

conception · 2025-07-20T18:51:52 1753037512

But you can just do it once with AI. It’s just a script process that you would set up for any project. It’s just an on boarding process.

And I’ll know by when I say do it once I mean, obviously processes have to be it on to get exactly what you want out of them, but that’s just how process works . Once it’s working the way you want to just reuse it.

Marazan · 2025-07-20T08:53:16 1753001596

Yes, if you do all the difficult time consuming bits I bet it would work.

conception · 2025-07-20T18:52:45 1753037565

You don’t do the work you just work with them to get the work done to plan it out.

FeepingCreature · 2025-07-20T10:19:47 1753006787

It's still a lot faster than doing it yourself ime.

(Yes I've seen the study, it doesn't account for motivation.)

credit_guy · 2025-07-20T12:49:01 1753015741

If you try to ride a bicycle, do you expect to succeed at the first try? Getting AI code assistants to help you write high quality code takes time. Little by little you start having a feel for what prompts work, what don't, what type of tasks the LLMs are likely to perform well, which ones are likely to result in hallucinations. It's a learning curve. A lot of people try once or twice, get bad results, and conclude that LLMs are useless. But few people conclude that bicycles are useless if they can't ride them after trying once or twice.

hammyhavoc · 2025-07-20T04:11:16 1752984676

They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.

zer00eyz · 2025-07-20T04:27:07 1752985627

I will give you an example of where you are dead wrong, and one where the article is spot on (without diving into historic artifacts).

I run HomeAssistant, I don't get to play/use it every day. Here, LLM's excel at filling in the (legion) of blanks in both the manual and end user devices. There is a large body of work for it to summarize and work against.

I also play with SBC's. Many of these are "fringe" at best. LLM's are as you say "not fit for purpose".

What kind of development you are using LLM's for will determine your experience with them. The tool may or may not live up to the hype depending how "common", well documented and "frequent" your issue is. Once you start hitting these "walls" you realize that no, real reason, leaps of inference and intelligence are still far away.

SecuredMarvin · 2025-07-20T07:09:38 1752995378

I also made this experience. As long as the public level of knowledge is high, LLMs are massively helpful. Otherwise not so much and still hallucinating. It does not matter if you think highly of this public knowledge. QFT, QED and Gravity are fine, AD emulation on SAMBA, or Atari Basic not so much.

If I would program Atari Basic, after finishing my Atari Emulator on my C64, I would learn the environment and test my assumptions. Single shot LLMs questions won't do it. A strong agent loop could probably.

I believe that LLMs are yanking the needle to 80%. This level is easy achievable for professionals of the trade and this level is beyond the ability of beginners. LLMs are really powerful tools here. But if you are trying for 90% LLMs are always trying to keep you down.

And if you are trying for 100%, new, fringe or exotic LLMs are a disaster because they do not learn and do not understand, even while being inside the token window.

We learn that knowledge, (power) and language proficiency are an indicator for crystalline but not fluid intelligence

otabdeveloper4 · 2025-07-20T07:43:14 1752997394

> yanking the needle to 80%

80 percent of what, exactly? A software developer's job isn't to write code, it's understanding poorly-specified requirements. LLMs do nothing for that unless your requirements are already public on Stackoverflow and Github. (And in that case, do you really need an LLM to copy-paste for you?)

zer00eyz · 2025-07-20T09:29:52 1753003792

> fluid intelligence

How about basic intelligence. Kids logic puzzles.

https://daydreampuzzles.com/logic-puzzles/

LLM's whiffing hard on these sorts of puzzles is just amusing.

It gets even better if you change the clues from innocent things like "driving tests" or "day care pickup" to things that it doesn't really want to speak about. War crimes, suicide, dictators and so on.

Or just flat out make up words whole cloth to use as "activates" in the puzzles.

motorest · 2025-07-20T05:02:50 1752987770

> They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.

This comment is detached from reality. LLMs in general have been proven to be effective at even creating complete, fully working and fully featured projects from scratch. You need to provide the necessary context and use popular technologies with enough corpus to allow the LLM to know what to do. If one-shot approaches fail, a few iterations are all it takes to bridge the gap. I know that to be a fact because I do it on a daily basis.

otabdeveloper4 · 2025-07-20T07:39:05 1752997145

> because I do it on a daily basis

Cool. How many "complete, fully working" products have you released?

Must be in the hundreds now, right?

motorest · 2025-07-20T11:28:58 1753010938

> Cool. How many "complete, fully working" products have you released?

Fully featured? One, so far.

I also worked on small backing services, and a GUI application to visualize the data provided by a backing service.

I lost count of the number of API testing projects I vibe-coded. I have a few instruction files that help me vibecode API test suites from the OpenAPI specs. Postman collections work even better.

And I'm far from an expert in the field.

What point were you trying to make?

jeltz · 2025-07-20T12:38:53 1753015133

If you are far from an expert in the field maybe you should refrain from commenting so strongly because some people here actually are experts.

So you have built a few small PoCs, does not tell us much.

motorest · 2025-07-20T12:46:55 1753015615

> If you are far from an expert in the field maybe you should refrain from commenting so strongly because some people here actually are experts.

Your opinion makes no sense. Your so called experts are claiming LLMs don't do vibecoding well. I, a non-expert, am quite able to vibecode my way into producing production-ready code. What conclusion are you hoping to draw from that? What do you think your experts' opinion will achieve? Will it suddenly delete the commits from LLMs and all the instruction prompts I put together? What point do you plan to make with your silly appeal to authority?

I repeat: non-experts are proving to be possible, practical, and even mundane what your so-called experts claim to not work. What do you plan to draw from that?

hammyhavoc · 2025-07-20T15:57:11 1753027031

No, experts are saying that vibe coding sucks. LLMs absolutely do vibe coding, and the output is shit.

otabdeveloper4 · 2025-07-20T11:55:13 1753012513

> What point were you trying to make?

The point is that software developers can't evaluate their own work. (Especially the kind of n00b developers that use LLMs.)

You initially made wild claims about insane productivity gains that turned out to be just one small product and a lot of wasted time under scrutiny.

(Asking LLMs to write tests is a waste of time. LLMs can't evaluate risks, which is the only reason to write tests in the first place.)

hammyhavoc · 2025-07-20T15:54:42 1753026882

Let's see it then.

cactusplant7374 · 2025-07-20T11:34:13 1753011253

A good trial would be pointing an LLM at Jira and telling it to finish the backlog.

motorest · 2025-07-20T12:58:55 1753016335

> A good trial would be pointing an LLM at Jira and telling it to finish the backlog.

GitHub is already rolling out this feature.

https://github.blog/news-insights/product-news/github-copilo...

hammyhavoc · 2025-07-20T15:53:12 1753026792

So, where is the substance?

Do what I couldn't with these supposedly capable LLMs:

- A Wear OS version of Element X for Matrix protocol that works like Apple Watch's Walkie Talkie and Orion—push-to-talk, easily switching between conversations/channels, sending and playing back voice messages via the existing spec implementation so it works on all clients. Like Orion, need to be able to replay missed messages. Initiating and declining real-time calls. Bonus points for messaging, reactions and switching between conversations via a list.

- Dependencies/task relationships in Nextcloud Deck and Nextcloud Tasks, e.g., `blocking`, `blocked by`, `follows` with support for more than one of each. A filtered view to show what's currently actionable and hide what isn't so people aren't scrolling through enormous lists of tasks.

- WearOS version of Nextcloud Tasks/Deck in a single app.

- Nextcloud Notes on WearOS with feature parity to Google Keep.

- Implement portable identities in Matrix protocol.

- Implement P2P in Matrix protocol.

- Implement push-to-talk in Element for Matrix protocol ala Discord, e.g., hold a key or press a button and start speaking.

- Implement message archiving in Element for Matrix protocol ala WhatsApp where a message that has been archived no longer appears in the user's list of conversations, and is instead in an `Archived` area of the UI, but when a new message is received in it, it comes out of the Archive view. Archive status needs to sync between devices.

Open source the repo(s) and issue pull requests to the main projects, provide the prompts and do a proper writeup. Pull requests for project additions need to be accepted and it all needs to respect existing specs. Otherwise, it's just yet more hot air in the comments section. Tired of all this empty bragging. It's a LARP and waste of time.

As far as I'm concerned, it is all slop and not fit for purpose. Unwarranted breathless hype akin to crypto with zero substance and endless gimmicks and kidology to appeal to hacks.

Guarantee you can't meaningfully do any of the above and get it into public builds with an LLM, but would love to be proven wrong.

If they were so capable, it would be a revolution in FOSS, and yet anyone who heavily uses it produces a mix of inefficient, insecure, idiotic, bizarre code.

manca · 2025-07-11T02:47:59 1752202079

Elon mentioned that Grok's 4 image and video understanding capabilities are somewhat limited and he suggested a new version of the foundation model is being trained to address these issues. According to the "Humanity's Last Exam" benchmark, though, it seems to perform reasonably well, if not the best among the SOTA models.

I agree, though - the timing of the release is a bit unfortunate and it felt like rushed a bit, since not even a model card is available.

joaogui1 · 2025-07-11T06:57:41 1752217061

They used a text-only subset of HLE

manca · 2024-10-16T18:02:42 1729101762

Isn't this exactly what Google's new NotebookLM does?

Cool idea anyway :)

martins_irbe · 2024-10-17T07:55:43 1729151743

I wouldn't compare this to NotebookLM; this is more like a ChatGPT-generated script plus ElevenLabs. It doesn't have the feel of NotebookLM.

manca · on June 30, 2024

When I read lossless, I immediately thought about the editing of the real lossless formats like ProRes, MJPEG2000, HuffYUV, etc. But what this ultimately does it remuxes the original container in a new one without touching the elementary stream (no reencoding).

It's no wonder that it uses FFMpeg to do the heavy-lifting, but I think it's worthwhile for the community to understand how this process ultimately works.

In a nutshell, every single modern video format you know about - mp4, mov, avi, ts, etc - is ultimately the extension of the container that could contain multiple video and audio tracks. The tracks are called Elementary Streams (ES) and they are separately encoded using appropriate codecs such as H264/AVC, H265/HEVC, AAC, etc. Then during the process called "muxing" they are put together in a container and each sample/frame is timestamped, so the ESes can be in sync.

Now, since the ES is encoded, you don't get frame-level accuracy when seeking for example, because the ES is compressed and the only fully decodable frame is an I-Frame. Then every subsequent frame (P, or B) is decoded based on the information from the IFrame. This sequence of IPPBPPB... is called GOP (Group of Pictures).

The cool part is that you could glean the type of the frame, even though it's encoded by looking into NAL units (Network Abstraction Layer), which have specific headers that identify each frame type or picture slice. For example for H264 IFrame the frame-type byte is like 0x07, while the header is 0x000001.

Putting all this together, you could look into the ES bitstream and detect GOP boundaries without decoding the stream. The challenge here is of course that you can't just cut in the middle of the GOP, but the solution for that is to either be ok with some <1sec accuracy, or just decode the entire GOP which is usually 30 frames and insert an IFrame (fully decoded frame can be turned into an IFrame) in the resulting output. That way all you do is literally super fast bit manipulation and copy from one container into another. That's why this is such an efficient process if all you care about is cutting the original video into segments.

manca · on May 2, 2024

I love projects like this. It shows the true potential of what LLMs and RAG can unlock. Imagine applying the same method on the actual content within the threads and extract the sentiment, as well as summarize the key points of a particular thread -- the options are limitless.

My only piece of advice, though: try to do the reranking using some other rerankers instead of an LLM -- you'll save both on the latency AND the cost.

Other than that, good job.

jnnnthnn · on May 2, 2024

Thanks! I tried a few other approaches and found the LLM results were overall better (latency and cost aside). Maybe that should be an option made available to users though...

isoprophlex · on May 2, 2024

Cohere has a very cheap, fast and effective reranking API!

https://cohere.com/rerank

bardiapour · on May 2, 2024

i think not, better results >>> better latency + cost

robrenaud · on May 2, 2024

Maybe a combined approach beats either? Let some non-LLM reranker quickly spit out two results, and fill in the rest with the LLM.

manca · on April 30, 2024

This is exactly what https://www.perplexity.ai/ is trying to do. Maybe not "RAGing" the entire internet, but sure using the mapping between natural language query to their own (probably) vector database which contains "source of truth" from the internet.

The way how they build that database and what models they use for text tokenization, embeddings generation and ranking at "internet" scale is the secret sauce that enabled them to raise more than $165M to date.

For sure this is where the internet search will be in a couple of years and that's why Google got really concerned when original ChatGPT was released. That said, don't assume Google is not already working on something similar. In fact, the main theme of their Google Next conference was about LLMs and RAG.

manca · on April 24, 2024

A lot of the answers to your question focus solely on the infra piece of the deployment process, which is just one, albeit, important piece of the puzzle.

Each model is built using some predefined model architecture and the majority of the LLMs of today are the implementation of Transformer architecture, based on the "Attention is All You Need" paper from 2017. That said, when you fine-tune a model, you usually start from a checkpoint and then using techniques like LORA or QLORA you compute new weights. You do this in your training/fine-tuning script using PyTorch, or some other framework.

Once the training is done you get the final weights -- a binary blob of floats. Now you need to use those weights back into the inference architecture of the model. You do that by using the framework which is used for training (PyTorch) to construct the inferencing pipeline. You can build your own framework/inferencing engine too if you want and try to beat PyTorch :) The pipeline will consist of things like:

- loading the model weights

- doing pre-processing on your input

- building the inference graph

- running your input (embeddings/vectors) through the graph

- generating predictions/results

Now, the execution of this pipeline can be done on GPU(s) so all the computations (matrix multiplications) are super fast and the results are generated quickly, or it can still run on good old CPUs, but much slower. Tricks like quantization of model weights can be used here to reduce the model size and speed up the execution by trading-off precision/recall.

Services like ollama, or vllm abstract away all the above steps and that's why they are very popular -- they might even allow you to bring your own (fine-tuned) model.

On top of the pure model execution, you can create a web service that will serve your model via a HTTP or gRPC endpoint. It could accept user query/input and return a JSON with the results. Then it can be incorporated in any application, or become part of another service, etc.

So, the answer is much more than "get the GPU and run with it" and I think it's important to be aware of all the steps required if you want to really understand what goes into deploying custom ML models and putting them to a good use.

FezzikTheGiant · on April 24, 2024

Thanks for the insightful response. This is exactly the type of answer I was looking for. What's the best way to educate myself on the end-to-end process of deploying a production grade model smartly in a cost efficient manner?

FezzikTheGiant · on April 24, 2024

This might be asking for too much but is there a guide that explains each part of this process? Your comment made the higher level way clearer for me and I'd like to go into the weeds a bit on each of these

fzzzy · on April 24, 2024

download llama.cpp

convert the fine tuned model into gguf format. choose a number of quantization bits such that the final gguf will fit in your free ram + vram

run the llama.cpp server binary. choose the -ngl number of graphics layers which is the max number that will not overflow your vram (i just determine it experimentally, i start with the full number of layers, divide by two if it runs out of vram, multiply by 1.5 if there is enough vram, etc)

make sure to set the temperature to 0 if you are doing facts based language conversion and not creative tasks

if it's too slow, get more vram

ollama, kobold.cpp, and just running the model yourself with a python script as described by the original commenter are also options, but the above is what i have been enjoying lately.

everyone else in this thread is saying you need gpus but this really isn't true. what you need is ram. if you are trying to get a model that can reason you really want the biggest model possible. the more ram you have the less quantized you have to make your production model. if you can batch your requests and get the result a day later, you just need as much ram as you can get and it doesn't matter how many tokens per second you get. if you are doing creative generation then this doesn't matter nearly as much. if you need realtime then it gets extremely expensive fast to get enough vram to host your whole model (assuming you want as large a model as possible for better reasoning capability)

FezzikTheGiant · on April 24, 2024

Interesting. Thanks for the response. Do you have any resources where I can educate myself about this? How did you learn what you know about LLMs?

fzzzy · on April 25, 2024

Well, when Llama 1 came out I signed up and downloaded it, and that led me to llama.cpp. I followed the instructions to quantize the model to fit in my graphics card. Then later when more models like llama2 and mixtral came out I would download and evaluate them.

I kept up on hacker news posts and any comments about things I didn't understand. I've also found the localllama subreddit to be a great way to learn.

Any time I saw a comment on anything I would try it, like ollama, kobold.cpp, sillytavern, textgen-webui, and more.

I also have a friend who has been into ai for many years and we always exchange links to new things. I developed a retrieval augmented generation (rag) app with him and a "transformation engine" pipeline.

So following ai stories on hn and reddit, learning through doing, and applying what I learned to real projects.

FezzikTheGiant · on April 25, 2024

Thanks. Very cool. Have you ever tried to implement a transformer from scratch? Like in the Attention is all you need paper? Can a first/second year college student do it

xyc · on April 26, 2024

Andrej Karpathy's course is a good resource: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThs...

fzzzy · on April 27, 2024

I haven't tried it yet, but I do intend to. I think the code for llm inference is quite straightforward. The complexity lies in collecting the training corpus and doing good rlhf. That's just my intuition.

elliotto · on April 25, 2024

Hi, I work at a startup where we train / fine tune / inference models on a gcp kubernetes cluster on some a100s.

There isn't really that much information about how to do this properly because everyone is working it out and it changes month by month. It requires a bunch of DevOps an infrastructure knowledge above and beyond the raw ml knowledge.

Your best bet is probably just to tool around and see what you can do.

d13 · on April 24, 2024

Try this: https://github.com/mlabonne/llm-course

FezzikTheGiant · on April 25, 2024

Thanks!! This is really cool

manca · on Feb 27, 2024

If you don't care about the details of how those model servers work, then something that abstracts out the whole process like LM Studio or Ollama is all you need.

However, if you want to get into the weeds of how this actually works, I recommend you look up model quantization and some libraries like ggml[1] that actually do that for you.

[1] https://github.com/ggerganov/ggml

manca · on Feb 27, 2024

I've tried code-llama with Ollama, along with Continue.dev and found it to be pretty good. The only downside is that I couldn't "productively" run the 70B version, even on my MBP with M3 Max with 36GB of RAM (which interestingly should be enough to hold quantized model weights). It was simply painfully slow. 34B one works good enough for most of my use-cases, so I am happy.

0x008 · on Feb 27, 2024

I tried to use codellama 34B and I think it is pretty bad. For Example I asked it to convert a comment into a docstring and it would hallucinate a whole function around it.

gpjt · on Feb 28, 2024

What quantization were you using? I've been getting some weird results with 34b quantized to 4 bits -- glitching, dropped tokens, generating Java rather than Python as requested. But 7b, even at 4 bits, works OK. Posted about it earlier on this evening: https://www.gilesthomas.com/2024/02/llm-quantisation-weirdne...

3abiton · on Feb 27, 2024

Same, CodeLlama 70B is known to suck. Deepseek is the best for coding so far in my experience, Mixtral 8x7B is another great contender (to be frank, for most tasks). Miqu is making a buzz, but so far I haven't tested it personally yet.

manca · on Oct 12, 2023

Galaksija was truly a "masterpiece" at that time, made by a single person by stitching together various smuggled parts from the West. I have a huge admiration and respect for Voja, especially after he decided to give up everything in Serbia and move to the US and start from scratch on his own in his late sixties!

He's a very humble man despite his remarkable impact and influence on the early tech industry in Yugoslavia. He and Dejan Ristanovic [1] started one of the first PC magazines in the 80's which was the bastion of progress filled with ingenious articles and insights collected from all over the world mostly by the word of mouth (remember there was no internet back then). They and a few others actually founded the first ISP and BBC in Yugoslavia in the late eighties.

Anyway, I am glad to see this article on HN and would suggest you all to watch Voja's interview [2] given to Computer History Museum in Mountain View where Galaksija rightfully got its own piece of the history.

[1] https://en.wikipedia.org/wiki/Dejan_Ristanovi%C4%87

[2] https://www.youtube.com/watch?v=nPLyzOEobw8&ab_channel=Compu...

grujicd · on Oct 12, 2023

It's hard to overstate how important how these magazines "Računari u kući", "Svet kompjutera" and "Moj Mikro" were important during 80s. There was very limited amount of computer literature to be found, so we all learnt 90% of what we knew from these magazines. Hats off to Dejan Ristanović, Voja Antonić, and all others who wrote for these magazines. They were the light that guided many of us to our future careers.