After reading this Twitter thread and watching the talk it was referencing, I can't help thinking that a chat/text interface to AI is lazy product design.
Problem is, generative AI sucks at this. I've had multiple instances where I'd say something like "you did this part of the code wrong, fix it like this" and it would go "sorry for the confusion, here's the code with your suggestion", then repeat the initial code or do something else entirely. GPT-4, Copilot X, JetBrains AI - I've seen an instance of this problem in all of them.
Personally, Copilot is a magical typing speedup, and that got me enthusiastic about the next step - but it looks like the next step is going to require another breakthrough, considering GPT-4's hardware requirements and actual performance.
A chat interface as a search bot is the only use case outside of Copilot where I've found it useful - regurgitating the relevant parts of internal documentation like search on steroids; even 3.5 is relatively decent at this.
I completely disagree. I've found that GPT-4 and Copilot Chat generate good suggestions most of the time. What kind of things are you trying to use codegen for?
You also need to provide adequate context. I don't think source code alone is good enough context most of the time (e.g. you often need an error message as well), unless you're pulling in multiple chunks of code.
To me it sounds like what's described is similar to my own interactions with ChatGPT, where:
1. Write a prompt, get a full script in response (good! as I expected)
2. Realize something is missing in the prompt, or want to improve a specific function.
3. Prompt like "For function foo_bar(), give me just an updated version of that function that adds exception handling for missing files".
4. ChatGPT ignores the (admittedly implied in my language above) "just that function" part, and rewrites the entire script.
While usually it only modifies the part of the script I asked it to, it's annoying because (1) it's slower to give me the whole thing back and (2) I need to do additional checks in case the function I care about depends on things outside of it that might have been changed.
I've had this a few times, so I'm in the habit of not trying to iterate with ChatGPT when asking it for scripts, but instead just doing the rest myself.
(Disclaimers: I haven't tried too hard to find a solution to this; there might be one. I also don't generally combine ChatGPT with Copilot - I use one or the other on different projects - and I haven't tried other forms of using GPT-4...)
Are you using GPT-4? I've found it very responsive to follow-up suggestions. I don't re-specify the whole ask though, just ask for the changes like "can you add error handling for missing files?".
"fix this code" is very broad, it will likely do better with more of a prompt.
"you will be given a section of code wrapped in ``` your task is analysis this code for any possible issues. You should list any issues you find and why you believe it's an issue. Then explain how to correct it"
But the context is there: e.g. I get a bad suggestion from the LLM, I suggest a fix, it acknowledges the correction and then ignores the instructions when generating code.
I had a similar thought, but I would like to maintain control over what the LLM has access to. It would be difficult to prevent it from seeing things you might have temporarily hardcoded or generally do not want to expose.
I started working on something for this purpose called j-dev [0]. It started as a fork of smol-dev [1], which basically gets GPT to write your entire project from scratch. And then you would have to iterate on the prompt to nuke everything and re-write it, filling in increasingly complicated statements like "oh, except in this function make sure you return a promise".
j-dev is a CLI that gives a prompt similar to the one in the parent article [2]. You start with a prompt and the CLI fills in the directory contents (excluding anything gitignored). Then the LLM requests access to the files it thinks it needs. And then it can edit, delete or add files, or ask for a followup based on your response. It does stuff like make sure the file exists, show you a git diff of changes, respond if the LLM is not following the system prompt, etc.
It also addresses the problem that a lot of these tools eat up way too many tokens, so a single prompt to something like smol-dev would eat up a few dollars on every iteration.
It's still very much a work in progress and I'll probably do a Show HN next week, but I would love some feedback.
Sounds cool! I'm looking forward to the Show HN. Would definitely recommend recording a video of it in action. A video makes it much easier to understand what the tool can do.
> video makes it much easier to understand what the tool can do
For you.
Long before YouTube versus WikiHow we've had "visual learners", "practical learners", and those who prefer learning from clear writing.
So it's probably not just some sort of generational or way-of-thinking divide; it's probably a cohort of personas who far prefer text to rapidly understand what things do, and a cohort who prefer someone to "show and tell"[^1].
That said, the balance does seem to have shifted in recent decades, perhaps for Americans around the same time (correlation not causation) as free play outside and big three over-the-air TV networks gave way to helicopter parenting of fully programmed days and/or the first 150 channel MTV generation.
Have you seen a lot of non-lazy design in the last decade? It’s one trend after another and a general tendency for the UI to offer the bare minimum in functionality and call it design.
I agree - “fixes” are a cool opportunity. I’ve created a runtime analysis of code execution that spots certain types of flaws and anti-patterns (that static analyzers can’t find). Then I’m using a combination of the execution trace, code, and finding metadata (e.g. OWASP URL) to create a prompt. The AI responds with a detailed description of the problem (in the style of a PR comment) and a suggested fix. Here’s a short video of it in action - lmk what you think.
It looks quite powerful. I would focus on adoption and usability over adding any more features. I feel like there's a lot of value there already, but I'm not exactly sure how I'd integrate it into my workflow.
The CI integration sounds like the most interesting part to me, since I usually let things fail in CI then go back and fix them.
It's kind of in an interesting spot because it's not instant feedback like a linter/type checker, but only running it in CI feels like a waste of potential.
Thanks for the advice! I agree with your characterization as somewhere between “instant” and “too late” (eg in prod) feedback. We are focusing on the code editor and GitHub Actions at the moment. For example, figuring out what happens after the GitHub Action identifies a problem. Do you try and fix it directly in the browser? Or go back to the code editor to inspect the issue and work on the AI-assisted fix? Fixing “in browser” feels awkward to me, but I have seen some videos of Copilot X doing this so maybe it’s possible? Working with the code back in the code editor is of course much more powerful, but it takes some work to set up the context locally to work on the fix. Wdyt?
We've been using windmill for our internal tooling and dashboards and it's been great! Genuinely excited to see GPT-4 integration, we'll definitely give it a go.
Some things we've done with windmill so far:
* Slack bots / alerts from scheduled jobs -> All sourced in one location
* One source of truth for connectors (google drive / sheets, slack, postgres database aggs)
* Complex ATS flows (We load our applicants through a windmill task that runs LLMs to parse against a rubric we made, scores then get pushed back into the ATS system as an AI review, then that job reports to slack which helps us prioritize)
* Dashboards (though I'll admit to some sharp edges here in the UI builder) running complex Python + pandas tasks that pull from Postgres and render dashboards for us with pretty Vega-Lite plots
* Job orchestration -- (though this is partially on hold) we have prototyped orchestrating some of our large workloads, data pipelines, etc. in a few different ways. (we run spark jobs and ray jobs on distributed clusters, and have been trying to use windmill to orchestrate and schedule these, with alerts)
Additionally, windmill made running and tracking these things (like the ATS system) accessible to my "low"-technical co-founder, who regularly will hop in and run the job or browse through error logs of previous runs.
Lastly, I found some bugs, reported them in their Discord, and heard back from rubenf very quickly - fixes went in rapidly too! Huge shoutout to the great work the Windmill team is doing.
Are startups built on ChatGPT about as useful as ChatGPT? I really do not see the value-add from anyone yet.
An interesting question is, what are you providing above copying and pasting a load of my data into ChatGPT in a way that I’m not sure I gave you permission to?
For a given task, ChatGPT seems to do way better if it is given a precise prompt with detailed instructions. This includes describing the context of the problem, maybe what kind of data it will be operating on, and especially the format in which to output its answer. Once the answer is produced in the expected format, the application can then integrate it into its own data and make it immediately available.
An embedded AI assistant in a product can provide a lot of this context, whereas a simple user of ChatGPT on the web might not end up with an answer that they can use immediately. I have no doubt that many people spend way more time preparing data for and extracting data out of ChatGPT than it takes to get the answer.
You can see an example of this in this demo of a tool called Tana: https://youtu.be/FlqpK8ucf8s?t=310 – the user is asking for airline home bases, and has to add "do not mention the country" for the data to have the right format. Once the prompt is correct though, the app is able to ingest this generated data and merge it with the user's.
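To make the point about output format concrete, here is roughly what this looks like from the application side - just a sketch with the OpenAI Python client, reusing the airline example from the Tana demo; the model name and the exact JSON shape are my own choices, not anything from the demo:

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = (
        "List the home base airport for each of these airlines: "
        "Lufthansa, KLM, Qantas.\n"
        "Respond with JSON only: a list of objects with the keys "
        "'airline' and 'home_base' (IATA code). Do not mention the "
        "country and do not add any commentary."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )

    # Because the format was pinned down in the prompt, the answer can be
    # ingested directly instead of being massaged by hand afterwards.
    rows = json.loads(response.choices[0].message.content)
    for row in rows:
        print(row["airline"], "->", row["home_base"])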
Startups built on ChatGPT can be useful, but their value depends on the specific applications and services they offer. While ChatGPT can generate content, these startups need to showcase unique features and services that go beyond simply using ChatGPT. Regarding data usage, reputable startups should always prioritize user consent and comply with privacy regulations to ensure data is handled responsibly. Transparency and clear user consent are essential aspects of building trust in AI-powered applications.
I could tell this was ChatGPT-generated after the first sentence and a half. So wishy-washy, and saying "it depends", which works as an answer regardless of the question it is answering.
I've started to build this exact kind of setup for everything I do into Emacs. It has dramatically improved my productivity and made development a joy. Rather than just having the completion-style AI of GitHub Copilot, I freely move around my work, highlight functions and ask for completions. Or I provide an entire package/directory as context and ask for new files. I can iterate with followups. I can ask for in-place responses or commentary in a sidebar. I've set up predesignated prompts/questions and tied them to keybindings. I've discovered that I can provide stacktraces to GPT4 with my code context and it can pretty quickly debug my code. It feels like pair programming with a very fast junior engineer.
I do this with code, design documents, and task planning.
> I've discovered that I can provide stacktraces to GPT4 with my code context and it can pretty quickly debug my code.
I was blown away by how effective it is when I tried it. Unfortunately there are cases where it doesn't work well, and it can still hallucinate solutions when it has no idea what the problem is, but generally speaking it's excellent.
My goal was to tie this into VS Code and attach it to processes that would expose an error stack to the AI agent when things went wrong, and it would then iteratively attempt to find and test solutions for you. I couldn't get the success rate high enough to bother publishing it, but it was really fun and I learned a lot. I also saw that GitHub is essentially doing this and much more with Copilot X, so my solution would never compare anyways. I think this will become a huge time saver in the near future, though. When it worked well, I could have a solution to the error and tests to validate the solution within 30 seconds or so.
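For anyone curious, the core loop was conceptually something like this - a heavily simplified command-line sketch, not the actual extension; the file name, the use of pytest, and the prompt wording are all stand-ins:

    import re
    import subprocess
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()
    SOURCE = Path("buggy_module.py")  # hypothetical file under repair
    MAX_ATTEMPTS = 3

    def run_tests():
        """Run the test suite and return (passed, combined output)."""
        proc = subprocess.run(["pytest", "-x"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    passed, output = run_tests()
    for _ in range(MAX_ATTEMPTS):
        if passed:
            print("Tests pass.")
            break
        prompt = (
            "This Python module has a bug. Below is the file, then the "
            "failing test output / stack trace. Reply with the corrected "
            "file in a single ```python code block and nothing else.\n\n"
            f"```python\n{SOURCE.read_text()}\n```\n\nTest output:\n{output}"
        )
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
        if match:
            SOURCE.write_text(match.group(1))  # apply the proposed fix
        passed, output = run_tests()           # ...and re-check it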
Wow. Ya this is great. Any dotfiles to skim through? I am a Doom Emacser myself, slowly learning the elisp flow to make my own config, and this seems like something I could chew on. :)
- Did you use any fine-tuning with this or is this standard GPT4?
- Does quality of response improve with any additional examples?
- Does this perform well for specific languages?
It's interesting because we came to a different conclusion with Autopilot[0] - context and learning are incredibly important for result quality, and GPT-4 doesn't (yet) support fine-tuning but will soon, and we'll definitely be taking advantage of that. Not just for quality, but also for speed (less time spent gathering context and processing input tokens).
My view is, everyone has access to ChatGPT and GitHub Copilot, and so the idea is to provide value in excess of what ChatGPT/Copilot can do. Part of that is embedding it in the UI, but (especially for internal tools, which tend to be shorter) the improvement isn't huge over copy/paste or using Copilot in VS Code.
However, beyond UI integration, we can intelligently pull context on related files, connected DBs/resources, SDKs you're using, and so on. And that's something ChatGPT can't do (for now). The quality of response, from what we saw, dramatically improved with the right docs and examples pulled in.
And yes, GPT-4 does much better on JS (React specifically) and Python. It's just whatever it's trained on, and there's a ton more JS/Python code out there.
Being able to click a button and send in the script, context, and error message is, I believe, a huge change, not a small one. It speeds up the turnaround time a lot.
I will find out over the next 2 weeks at least. I hope this will change how I program.
No need to fine-tune, since Windmill uses normal code without custom syntax, so the GPT-4 corpus works very well. That's one of the benefits of not using a DSL: most tooling works well out of the box, including VS Code Copilot if you use Windmill via the GitHub/VS Code integration.
Windmill is such a great product and having the code editor and the execution layer in the same product gives a lot of power.
The only similar product I know of that has this approach is Darklang.
Do we as an industry have a good name for this yet?
(I'm not talking about other low-code platforms where code is secondary. Both Darklang and Windmill are code-first, giving them a huge advantage when doing things like AI, since they can test on real data really fast, making the turnaround speed and time to production potentially really low.)
Seems like only an editor. Does it host and run your code as well?
Windmills "Fix error", looks so nice. But you need a tight
Integration between the editor / code and the execution platform.
We do support these types of workflows in our spin on Python notebooks: https://einblick.ai
Since we control the code and the execution we can do a lot of interesting things with sending specific context to the LLMs like this "Fix error" feature.
Love Windmill so much. It's blazing fast, super easy to try out locally with docker-compose, and Ruben and his team are shipping new features / fixing bugs every day, and the product only gets better.
Windmill is amazing, incredibly powerful and easy to run anywhere. Thank you so much for making it open source and building in public. I wish you a LOT of success!
There seems to be a bug in the regex in the video as well as the text below. [a-zA-z] is going to include the characters between Z and a as well as the upper and lower case letters. The claim by GPT that it matches "any letter (upper or lower case)" doesn't seem to catch it either for some reason.
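Quick demonstration in Python, for anyone who wants to see it concretely:

    import re

    pattern = re.compile(r"[a-zA-z]")  # note the lowercase z after "A-"

    # The range A-z spans ASCII 65..122, which also covers the six
    # punctuation characters sitting between "Z" (90) and "a" (97):
    for ch in "[\\]^_`":
        print(repr(ch), bool(pattern.match(ch)))  # all True, though none are letters

    print(bool(re.match(r"[a-zA-Z]", "_")))       # False with the corrected class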
This is how I expected the latest Copilot to work: integrate more data from the disk, see runtime variables when in debugging mode, and receive the error messages when the program crashes. Unfortunately it didn't improve much.
One of these days I would love to work at a company where all of you service providers didn't need SOC2 compliance and a 10-week risk assessment prior to even signing a contract.
Yes, we will do this next and address this in the post. We wanted to validate the approach first using the best model available but using another model is just one endpoint change away in the codebase https://github.com/windmill-labs/windmill/blob/8d550a7ea5708...
As an open-source project ourselves, it's a pretty obvious next step!
I don't think it'd give them access at any time, but related data/code gets sent to OpenAI when you use it. All LLM models that work with code work that way; otherwise it'd be unfeasible.
However, OpenAI pinky promises they don't use API data for anything, like training. Maybe that makes you feel a bit safer, although probably it shouldn't.
> However, OpenAI pinky promises they don't use API data for anything, like training. Maybe that makes you feel a bit safer, although probably it shouldn't.
It doesn't. I don't trust OpenAI or Sama. Frankly, I'm even hesitant to use VSCode now, even with its customizable privacy/telemetry settings (though I can at least limit its network access).
If Windmill had Langchain and/or LlamaIndex integration, that would be incredibly awesome. That's the AI integration I'd love to see in a low-code product.
Langflow/Flowise (especially the former) has almost everything I need within the AI world, and Windmill has everything else. Bridging the gap between them would be pretty great.
This looks really interesting and I hadn't really thought of this as a use case before. Does anyone here know if a similar product is available as an extension for VS Code? I know that I would love to be able to tell VS to just comment all of my functions for me and ensure that the typing is correct.
Continue (disclaimer: I am a creator) is a VS Code extension that enables you to highlight a section of code or file and tell it to comment all of your functions and ensure that the typing is correct. Here is the point in our demo video where we show this: https://youtu.be/3Ocrc-WX4iQ?t=31. You could even create a custom slash command to do this if you do it frequently and want it done in the same specific way: https://continue.dev/docs/customization#custom-commands
Currently the context is limited to your database schemas, your resources, and the current script code. But we will soon augment it with the rest of your code. The techniques applied here would remain relevant, but you would need to find the most relevant context of your codebase first, since you can't (or shouldn't) cram the whole codebase into your prompt.
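One common way to do that kind of selection (not necessarily what we'll end up shipping - just a sketch with OpenAI embeddings and cosine similarity; the chunks, query, and model name here are made up):

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
        return np.array([d.embedding for d in resp.data])

    # Hypothetical chunks harvested from the rest of the codebase.
    chunks = [
        "def get_user(db, user_id): ...",
        "def send_slack_alert(channel, message): ...",
        "CREATE TABLE orders (id serial, user_id int, total numeric);",
    ]
    query = "fix the script that aggregates order totals per user"

    chunk_vecs = embed(chunks)
    query_vec = embed([query])[0]

    # Cosine similarity, then keep only the top-k chunks for the prompt.
    scores = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top_k = np.argsort(scores)[::-1][:2]
    context = "\n\n".join(chunks[i] for i in top_k)
    print(context)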
https://twitter.com/transitive_bs/status/1646778220052897792
https://www.youtube.com/watch?v=rd-J3hmycQs
Features like their "AI Fix" feel much more like the direction things should go. Providing context-based actions without requiring user input.
E.g. use the database schema, existing code, etc as context to suggest actions without a user having to type it out.
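Roughly what I have in mind, as a sketch - the schema, script, and error here are invented, and the prompt wording is mine, not Windmill's:

    from openai import OpenAI

    client = OpenAI()

    # Context the tool already has on hand - nothing typed by the user.
    db_schema = "CREATE TABLE users (id serial primary key, email text not null);"
    script = "def main(db):\n    return db.query('SELECT nme FROM users')"
    error = 'column "nme" does not exist'

    prompt = (
        "You are fixing a failing script. Use the database schema, the script, "
        "and the runtime error below. Reply with the corrected script only.\n\n"
        f"Schema:\n{db_schema}\n\nScript:\n{script}\n\nError:\n{error}"
    )

    fix = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    print(fix)  # surfaced to the user as a one-click suggested fix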