If anything, I'm getting more hyped up over time. Here are the things I've used LLMs for, with success in all areas, as a solo technical founder.
Business advice, including marketing, reaching out to investors, understanding SAFE notes (follow-up questions after watching the Y Combinator videos), and customer interview design. All things that, as an engineer, I had never done before.
Creating SQL queries for all kinds of business metrics, including monthly/daily active users, breakdown of users by country, abusive-user detection, and more.
Automated unit test creation. Not just the happy path either.
Automated data repository creation, based on a one-shot example and MySQL text output describing the tables involved. From this, I have super-fast data repositories that use raw SQL to get/write data (a rough sketch of what I mean is below).
Helping with challenging code problems that would otherwise need hours of searching google or reading the docs.
Database and query optimization.
Code Review. This has caught edge case bugs that normal testing did not detect.
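To give a flavour of the SQL-metrics and raw-SQL repository items, here's a rough sketch in Python of the kind of thing I mean, not the actual generated code; table and column names (events, users, user_id, created_at, country) are made up for illustration, and any DB-API 2.0 connection (e.g. mysql-connector-python) would do.

    # Rough sketch only: placeholder schema, raw SQL via a plain DB-API cursor.
    class MetricsRepository:
        def __init__(self, conn):
            self.conn = conn  # a DB-API 2.0 connection

        def monthly_active_users(self, year: int, month: int) -> int:
            """Distinct users with at least one event in the given month."""
            sql = (
                "SELECT COUNT(DISTINCT user_id) FROM events "
                "WHERE YEAR(created_at) = %s AND MONTH(created_at) = %s"
            )
            cur = self.conn.cursor()
            cur.execute(sql, (year, month))
            (count,) = cur.fetchone()
            return count

        def users_by_country(self):
            """(country, user_count) rows for a per-country breakdown."""
            cur = self.conn.cursor()
            cur.execute("SELECT country, COUNT(*) FROM users GROUP BY country")
            return cur.fetchall()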
I'm going to try out aider + Claude 3.5 Sonnet on my codebases. I have heard good things about it and some rave reviews on X/Twitter. I watched a video where an engineer had a bug, described it to some tool (which wasn't specified, but I suspect aider), and Claude created a test to reproduce the bug and then fixed the code. The test passed; they then did a manual test, and the bug was gone.
> Helping with challenging code problems that would otherwise need hours of searching google or reading the docs.
I'm glad this has been working for you -- generally, any time I actually have a really difficult problem, ChatGPT just makes up the API I wish existed. Then when I bring it up to ChatGPT, it just apologizes and invents a new API.
LLMs aren't good when you drift out of the training distribution. You want to be hitting the meat in the middle and leveraging the LLM to blast through it quickly.
That means LLMs are great for scaffolding, prototypes, the v0.1 of new code, especially when it's very ordinary logic but in a language or library you're not 100% up to speed on.
One project I was on recently was translation: converting a JS library into Kotlin. In-editor AI code completion made this really quick: I pasted a snippet of JS for translation in a comment, and the AI completed the Kotlin version. It was frequently not quite right, but it was way faster than going without. In particular, when there were repeated blocks of code for different cases that differed only slightly, once I got the first block correct, the LLM picked up on the pattern in-context and applied it correctly for the remaining blocks. Even when it's wrong, if it has an opportunity to learn locally, it can do so.
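The pattern looked roughly like this; my target was Kotlin, but Python stands in here just to show the shape, and the clamp() snippet is a made-up example rather than code from that library:

    # Paste the original as a comment, let the in-editor completion write the
    # translation underneath. (Illustrative only: the real project targeted
    # Kotlin, and clamp() is an invented example, not the actual library code.)
    #
    # // JS original:
    # // function clamp(x, lo, hi) {
    # //   return Math.min(Math.max(x, lo), hi);
    # // }

    def clamp(x: float, lo: float, hi: float) -> float:
        """The kind of completion you get from the commented JS above."""
        return min(max(x, lo), hi)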
I’ve come to a similar conclusion, and realized that I’m slowly learning how to wield this new tool.
The sweet spot is when you need something and you’re sure it’s possible but just don’t know how (or it’s too time-consuming). E.g. change the CSS to X, rewrite this Python code in TypeScript, use the pattern of this code to do Y, etc.
Reminds me of the early days of Google, where you had to learn how to write a good search query. You learn you need more than a word or two, but not to write a whole essay, etc.
Has to be something pretty generic. I'm trying to write a little C program that talks to an LCD display via the SPI bus--something I've done a few times before, but not with this particular display and MCU. There is no LLM that can even begin to reason this out, since they've been mostly trained on web-dev content.
Is there no documentation about that uC and LCD controller? I'd assume they've been trained on every HTML, PDF, and video tutorial out there, as well as the datasheets for all of those chips and the example microcontroller C code in vendor documentation. Sure, that's maybe less than the amount of HTML web-app tutorials, but if we assume it's been fed all of the vendor's documentation about a uC and their library documentation as well, the ones it would fail on are undocumented chips out of China that have no datasheets (which make for a very fun reverse-engineering project, mind you), or something very new. Even without that, though, it's still able to dot-product (not going to call it "reasoning") its way to at least hallucinating code to talk to an LCD controller chip via SPI for a uC it has never even heard of, so I can't agree with "even begin to reason this out".
You don't learn how to program SPI from reading documentation about an LCD controller. You need a lot more context and an understanding of how to string together basic operations, which are quite often not detailed in parts documentation.
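To make that concrete, this is roughly the shape of the bring-up sequence I mean, sketched in Python with spidev/RPi.GPIO rather than the MCU C involved here; the pin numbers and command opcodes are placeholders, not any particular controller:

    # Illustrative only: the point is the ordering of operations (reset,
    # command vs. data via the D/C line, delays), which the datasheet rarely
    # spells out end to end. Pins and opcodes below are placeholders.
    import time
    import spidev
    import RPi.GPIO as GPIO

    DC_PIN, RESET_PIN = 24, 25          # data/command select, hardware reset

    GPIO.setmode(GPIO.BCM)
    GPIO.setup([DC_PIN, RESET_PIN], GPIO.OUT)

    spi = spidev.SpiDev()
    spi.open(0, 0)                      # bus 0, chip select 0
    spi.max_speed_hz = 4_000_000
    spi.mode = 0

    def command(byte):
        GPIO.output(DC_PIN, GPIO.LOW)   # D/C low = command byte
        spi.xfer2([byte])

    def data(byte_list):
        GPIO.output(DC_PIN, GPIO.HIGH)  # D/C high = parameter/pixel data
        spi.xfer2(byte_list)

    # Typical bring-up: hardware reset, software reset, wake from sleep,
    # wait for the panel, then switch the display on.
    GPIO.output(RESET_PIN, GPIO.LOW);  time.sleep(0.05)
    GPIO.output(RESET_PIN, GPIO.HIGH); time.sleep(0.12)
    command(0x01); time.sleep(0.12)     # software reset (placeholder opcode)
    command(0x11); time.sleep(0.12)     # sleep out (placeholder opcode)
    command(0x29)                       # display on (placeholder opcode)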
I think that you have a serious misunderstanding of the capabilities of LLMs - they cannot reason out relationships among documents that easily. They cannot even tell you what they don't know to finish a given task (and I'm not just talking one-shot here, agent frameworks suffer from the same problem).
You keep making these claims that it's harder than it appears but have yet to back them up. I'd be more than happy to update my understanding of the capabilities of these things if you can actually show me limitations of the technology. Until then, just saying SPI is harder than that, when I've written the code for a microcontroller to interface to an LCD at the advent of Google (so before Stack Overflow and ChatGPT) and so deeply know the frustrations of board bring-up, doesn't convince me that it's my understanding that's wrong rather than yours.
Oh, you're that rcarmo? So cool! So help me understand, because that's what we're all here for, curiosity: with a bit more specificity, why is SPI to an LCD controller for this as-yet-unspecified microcontroller harder for an LLM than it seems? I buy that it is, but you've yet to give any sort of evidence, and appeal to authority is a logical fallacy.
I found that ChatGPT needs to be reined in with the prompts, and then it does a very impressive job. E.g. you can create a function prototype (with input and output expectations) and, in the body, spell out the logic you are thinking about in meta-code. Then tell it to write the actual code. It's also good if you want to immerse yourself in a new programming language: outline what kind of program you want, and expect the results to be different from what you thought, but insightful.
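For example, the kind of scaffold I hand it before asking for the real implementation; the function, fields, and rules here are invented purely for illustration:

    # A prototype with input/output expectations plus the logic as meta-code;
    # ChatGPT is then told to replace the body with real code.
    def dedupe_contacts(contacts: list[dict]) -> list[dict]:
        """
        Input: dicts with 'email', 'name', 'last_seen' (ISO date string).
        Output: one dict per unique email (case-insensitive), keeping the
        entry with the most recent 'last_seen'; first-seen order preserved.
        """
        # meta-code:
        # - normalise each email to lowercase
        # - group entries by normalised email
        # - within a group keep the entry with the max 'last_seen'
        # - return results in the order the emails first appeared
        raise NotImplementedError  # ChatGPT fills this in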
Now if you throw larger context or more obscure interface expectations at it, it'll start to discard code and hallucinate.
Do you provide code examples? In my experience, the more specific you get with your problem, the more specific the provided solutions are (probably a "natural" occurrence in LLMs). Hallucinated APIs are sometimes a problem for me, but then I just specify which API to use.
Why do you need an LLM if you know what you want it to do? Just write the code rather than wrangling with the LLM; it isn't like writing code takes much time when you know what it should do.
Not OP, but my response: because I am lazy and would like to save the 1-5 minutes it would take me to actually write it. When there are dozens of these small things a day, the saved time really adds up.
For me, it depends on the problem. I avoid LLMs for anything complex, because I prefer to think it through myself. But there are often times when you know exactly what you want and what it should look like. Let's say you need a simple web API to help you with a task. These days I'd typically ask an LLM to write the app. It will usually get some stuff wrong, but after a quick glance I can steer it to fix the problems (like: you didn't handle errors, etc.).
That way I can generate a simple few-hundred-line app in minutes. There is no way I could type that fast even if I knew exactly what characters to write, which isn't always the case. Like, oftentimes I know exactly what to do and I know it's OK when I see the code, but writing it would require me to look into the docs here and there.
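For scale, the kind of app I mean is roughly this: a minimal, made-up Flask endpoint, including the error handling I usually have to ask for explicitly.

    # Minimal sketch of the sort of throwaway helper API I ask an LLM for;
    # the endpoint and fields are invented for illustration.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/convert", methods=["POST"])
    def convert():
        payload = request.get_json(silent=True)
        if payload is None or "value" not in payload:
            # the error handling the LLM usually skips until you ask
            return jsonify({"error": "expected JSON with a 'value' field"}), 400
        try:
            value = float(payload["value"])
        except (TypeError, ValueError):
            return jsonify({"error": "'value' must be a number"}), 400
        return jsonify({"result": value * 2.54})  # e.g. inches -> cm

    if __name__ == "__main__":
        app.run(port=8000)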
- Getting over the blank-canvas hurdle. This is great for kick-starting a small project; even if the code isn't amazing, it gets my brain to the "start writing code and thinking about algos/data structures/the interesting problem" stage rather than being held up at "Where to begin?", metaphorically where to place my first stroke. This helps somewhat.
- Sometimes an LLM has helped when I'm stuck on issues, but this is hit and miss. More specifically, it will often show a solution that jogs my brain and gets me there ("oh yeah, of course"). However, I've noticed I'm more in that state when tired and needing sleep, so the LLM might let me push a bit longer, making up for a tired brain. That's arguably more harmful, to be honest: without the LLM I go to sleep and then magically, like brains do, solve four hours of issues in 20 minutes after waking up.
So the LLM might be helping in ways that actually indicate you should sleep, as your brain is slooooowwwwing down.
Yes, this. I was skeptical of and disgusted by a lot of what was being done or promised using LLMs, but that was because I initially saw a lot of wholesale "make thing for me" being hyped or discussed.
In practice, I have found them to be good tools for getting going or un-stuck, and use them more like an inspiration engine, or brain kick-starter.
I know it’s not a completely fair comparison, but to me this question is kind of missing the point. It’s like asking “Why take a cab if you know where you want to go?”
It's such a poor comparison it's ridiculous. A better analogy is "why take a cab if you know where you want to go and provide the car and instructions on how to drive"
No, it's like saying "why take a cab where you have to provide the driver so much guidance on driving as to be equal to or greater than the effort of driving yourself."
That makes sense, LLM training data probably has tons of common problems in it, but maybe only a few or no instances of really niche, difficult ones. So it just comes up with a bunch of garbage.
My experience has been similar: it is amazing for stuff I am a beginner at but kinda useless for my actual work. It was invaluable today when I was trying to grasp CA zoning laws, but it's almost useless for coding.
This also points to why it will never (imo) be "intelligent". It will never be able to take all its knowledge and use that to solve a problem it doesn't have training data for.
It's nice that everybody is trying to help with the way you're prompting, but just use Bing Copilot or Phind for this, not ChatGPT.
It'll generate a bunch of queries to Google (well, "to Bing" I guess in that case) based on your question, read the results for you, base its answer on them, and provide you with sources so you can check whether it actually used anything from those pages.
I only use ChatGPT for documentation when I have no idea where I'm going at all, and I need a lay of the land on best practices and the way forward.
For specifics, Bing Copilot. Essentially a true semantic web search
I assume it knows the big stuff like the PyTorch API and the major JS and React libs, and then I just paste in the docs or even the implementation code for any libs it needs to know beyond that.
There was a movie that came out in 2001 called "Artificial Intelligence", at a time when we were still figuring out how things like search engines and the online economy were going to work. It had a scene where the main characters went to a city and visited a pay-per-question AI oracle. It was very artistically done, but it really revealed (in hindsight) how naive we were about how "online" was going to turn out.
When I look at the kinds of AI projects I have visibility into, there's a parallel where the public are expecting a centralized, all knowing, general purpose AI, but what it's really going to look like is a graph of oddball AI agents tuned for different optimizations.
One node might be slow and expensive but able to infer intent from a document, but its input is filtered by a fast and cheap one that eliminates uninteresting content, and it could offload work to a domain-specific one that knows everything about URLs, for example. More like the network of small, specialized computers scattered around your car than a central know-it-all computer.
> When I look at the kinds of AI projects I have visibility into, there's a parallel where the public are expecting a centralized, all knowing, general purpose AI
I don't think this is entirely fair to "the public". Media was stuffed with AI company CEOs claiming that AGI was just around the corner. Nvidia, OpenAI and Musk, Zuckerberg, and others were positively starry eyed at how, soon, we'd all be just a GPU matmul away from intelligence. "The public" has seen these eye watering amounts of money shifting around, and they imply that it must mean something.
The entire system has been acting as if GenAI was right around the corner.
Maybe there's a term confusion here: GenAI has come to mean generative AI (LLMs, diffusion models, ...) rather than general AI. People call that AGI; now people also talk about AIS, which I take to mean "human level on a narrow domain only," while AGI is "generally intelligent at roughly human level."
My personal belief is that AIS is not a real thing (in the sense I wrote above) because narrow-domain competence is tightly coupled to general-domain competence. Even very autistic people who are functional in some domain actually have a staggering range of competences that we tend to ignore because we expect them in humans. I think machines will be similar.
Anyway, AGI or AIS is not around the corner at all. But that doesn't mean that there isn't a lot of value to be had from generative AI now or in the near future. Will this be a small fraction of the value from Web 1.0 and Web 2.0? Will it be approximately the same? Will it be a multiple? I think that's the question. I think it's clear that assistants for software engineers are somewhat valuable now (evidence: I get value out of them). How valuable? Well, more than Stack Exchange, less than a good editor. That's still a lot, for me. I won't pay for it though...
And this points to the killer issue: there isn't a good way to monetize this. There isn't a good way to monetize the web, so we got adverts (a bad way). What will be the equivalent for LLMs? We just don't know right now. Interestingly, there seems to be very little focus on this! Instead folks are studying the second-order value: using this "free thing" we can drive productivity... or quality... increase opportunities... create a new business?
I was definitely confusing the terms. I was thinking of AGI, but I remembered that the G was for general, and GenAI "felt" right (probably because it's used in a similar enough context).
Replace all the instances of GenAI with AGI in my post.
It's an interesting observation that the economics aren't there yet. I think it's generally assumed that if we find something valuable, we can probably figure out how to monetize it. That's not necessarily true though. In the same but opposite vein, it doesn't necessarily need to be useful to stick around. It's possible AI is forever going to be useless (in objective terms, maybe it will make people less efficient) but find a monetization strategy that keeps it around (maybe it makes people feel good).
A ton of the technology economy isn't really based on objective metrics of usefulness. Microsoft isn't the biggest because they're the most useful. We don't look to the quality of windows to understand if people will buy the next version. We don't look at the google search results as an indicator of google's profitability.
> The entire system has been acting as if GenAI was right around the corner.
To be clear, I think it is. It's just not going to be a hologram of a wizard in a room you can ask a question to for a quarter, which is what these chat bots and copilots you see today are modeled around.
Every question I've asked of ChatGPT, Meta, and Gemini has returned results that were either obvious or wrong. Pointing out how wrong the answers were got the predictable "I apologize" response, which then returned another obvious answer.
I consider all these AI engines to be interactive search engines where the results need to be double checked. The only thing these engines do, for me perhaps, is save some search time so I don't have to click around on a lot of sites to scroll for some semblance of an answer to verify.
I still get a lot of LLM "memes", for example "In today's digital landscape" or "is crucial." Also, Liang et al. noted a 10x increase in certain words like "meticulous" or "intricate" since the introduction of LLMs.
If it’s returning results that were obvious, why were you asking the question?
And I don’t believe that the other ~50% were wrong.
> The only thing these engines do, for me perhaps, is save some search time so I don't have to click around on a lot of sites to scroll for some semblance of an answer to verify.
IME they basically rephrase the information I've put into them, rarely adding anything I didn't already imply I knew by my formulation of the question.
Something to keep in mind is that gambling rules apply: if enough people flip coins, there is always someone experiencing a winning streak, someone experiencing a losing streak, and a majority that gets a mixed bag of roughly breaking even (mediocre usefulness and a waste of time).
My first week of using GPT-4 every day, I experienced great answer after great answer, and I was convinced the world would change in a matter of months, that natural language translation was now a solved problem, etc. etc.
But my luck changed, and now I get some good answers and some idiotic answers, so it's mostly not worth my time. Some people never get a good answer in their few dice rolls before writing off the technology
Well, time to run it locally then :)
Check out ollama.com. Llama 3.1 is pretty crazy, especially if you can run the 405B one.
Otherwise, use Mistral/Mixtral or something similar.
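If it's running locally, querying it from Python is about this simple; this assumes the ollama daemon is up, the `ollama` pip package is installed, and you've pulled a model (the 8B tag is shown since 405B needs serious hardware).

    # Minimal sketch against a local ollama server; model tag and prompt are
    # just examples.
    import ollama

    response = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": "Summarise SPI chip select in two sentences."}],
    )
    print(response["message"]["content"])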
> The only thing these engines do, for me perhaps, is save some search time.
This. Saving time (or money, if you see them as the same) is the whole point, actually.
Intelligence is supposed to be in the context of shared goals or beliefs. In your case, and in the case of most humans, time and money are the context.
Are ant (insect) networks intelligent? Possibly; they do help millions of them communicate quickly. But ants don't have a brain.
Are beings that make decisions without conscious choices intelligent? Possibly, if they can escape death through amazing ability at any instant. But these beings don't have a frontal cortex that can make decisions by rational inquiry.
Debating this point is a bit like bikeshedding and beside the point.
The point is that they can't think anything like humans do, but a network of them can seemingly do intelligent things. Intelligence is only about shared goals and beliefs with an agent (the other ants in this example) and achieving them.
Most of it is mediocre creativity and a low willingness to adapt their prompting patterns for an optimal ratio of effort to output quality. Most people who don't understand yet expect LLMs to read their minds, when they would be better off orienting themselves as students who have zero experience developing the arsenal of strategies that elicit productivity gains.
They haven't developed any intuition into which kinds of questions are worth prompting, which kinds of context are effective, or even which kinds of limitations apply to which models.