> Why not have static analysis tools on the other side of those generations that constrain how the LLM can write the code?
We do have such tools: we call them programmers, and without them you don't get much useful output at all. Beyond that, static analysis tools aren't powerful enough to detect the kinds of problems and issues these language models create.
I'd be interested to know the answer to this as well. Considering the wealth of AI IDE integrations, it's eyebrow-raising that there seem to be zero instances of this. It seems like fairly low-hanging fruit to rule out tokens that are clearly syntactically or semantically invalid.
I’d like to constrain the output of the LLM by accessing the probabilities for the next token, picking the candidate with the highest probability that is also valid in the type system, and using that. Originally OpenAI did expose next-token probabilities, but apparently that made it easy to steal the weights, so they turned the feature off.
This can be done: I gave mine a justfile and early in the project very attentively steered it towards building out quality checks. CLAUDE.md also contains instructions to run those after each iteration.
What I'd like to see is the CLI's interaction with VSCode etc extending to understand things which the IDE has given us for free for years.
Previously at Sprout Social where I built their ML inference platform - reduced deployment time from 6 months to 6 hours and cut AWS costs by $500K/yr.
Looking for interesting problems in AI infrastructure, performance optimization, or building products from scratch.
Location: Wisconsin
Remote: Yes
Willing to relocate: Yes
Technologies: Python, PyTorch, Kubernetes, Docker, AWS, FastAPI, ONNX, MLOps
Resume: https://drive.google.com/file/d/1qO8XdisNTFq_wmrQGDKnu6eWDi2g33Me/view
GitHub: https://github.com/Mockapapella
Email: In profile or on resume
AI/ML Engineer specializing in high-performance deployments. Built distributed systems handling 30K QPS, developed a neural network for Rocket League gameplay, and created platforms that cut model deployment time from 6 months to 6 hours. Saved $500K/yr in infrastructure costs through optimization at previous role. Former technical founder with experience in humanoid robotics and AI writing assistance. I write about my projects and musings on my blog: https://thelisowe.substack.com/
Seeking roles focusing on ML infrastructure, model optimization, post-training, or full-stack AI engineering.
This is a good article on the "fog of war" for GPU inference. Modal has been doing a great job of aggregating and disseminating info on how to think about high quality AI inference. Learned some fun stuff -- thanks for posting it.
> the majority of organizations achieve less than 70% GPU Allocation Utilization when running at peak demand — to say nothing of aggregate utilization. This is true even of sophisticated players, like the former Banana serverless GPU platform, which operated at an aggregate utilization of around 20%.
Saw this sort of thing at my last job. Was very frustrating pointing this out to people only for them to respond with ¯\_(ツ)_/¯. I posted a much less tactful article (read: rant) than the one by Modal, but I think it still touches on a lot of the little things you need to consider when deploying AI models: https://thelisowe.substack.com/p/you-suck-at-deploying-ai-mo...
Honestly I thought you guys had launched already (and didn't know you were a part of YC), been aware of you guys for years now it seems. Congrats on the launch! Hope the twitter issues aren't causing you guys too many problems.
Normally I'd send this as a DM or email, but I think it could be useful for others to learn about how to use your service/the limitations of it. A couple weeks ago I made a search for:
In early 2023, Andrej Karpathy said something like "large training runs are a good test of the overall health of the network." Something something resilience as well I think. I need you to find it.
Unfortunately it wasn't able to find it, but it was either in a tweet or a really long presentation, neither of which are good targets for search. It was around the same time that this (https://www.youtube.com/watch?v=c3b-JASoPi0) video was posted, like within a couple weeks before or after. How could I have improved my query? Does exa work over videos?
All of this is just stuff I kind of made up and wanted in the song, but it meaningfully improved the output over just tags. I think "steering/nudging the generation space" is a decent idea for how I feel like this affects the output.
I also often use them to mark song structure with tags like [intro], [break], and [chorus], and sometimes get more descriptive with them, describing things or moments I'd like to happen. Again, adherence is not perfect, but they seem to help steer things.
One of my favorite tags I've seen is [Suck the entire song through vacuum] and well... I choose to believe, check out 1:29 https://suno.com/s/xdIDhlKQUed0Dp1I
Worth playing around with a bunch, especially if you're not quite getting something interesting or in the direction you want.
Brackets such as [Verse] help provide waveform separation in the edit view so that you can easily edit that section without manually dragging the slider.
Others such as [Interrupt] will produce a DJ-like fade-out / announcement ("that was <Artist name>, next up...") / fade-in, providing an opportunity to break the AI out of the repetitive loops it obsesses over.
I've used [Bridge] successfully, and [Instrumental] [No vocals] work reliably as well (there are also instrumental options, but I still use brackets out of habit I guess).