Hacker News | Mockapapella's comments

> When it’s able to create code that compiles, the code is invariably inefficient and ugly.

Why not have static analysis tools on the other side of those generations that constrain how the LLM can write the code?


> Why not have static analysis tools on the other side of those generations that constrain how the LLM can write the code?

We do have that; we call those tools programmers, and without them you don't get much useful output at all. Beyond that, static analysis tools aren't powerful enough to detect the kinds of problems and issues these language models create.


I'd be interested to know the answer to this as well. Considering the wealth of AI IDE integrations, it's eyebrow-raising that there seem to be no instances of this. It seems like low-hanging fruit to rule out tokens that are clearly syntactically or semantically invalid.


I’d like to constrain the output of the LLM by accessing the probabilities for the next token, picking the token that has the highest probability and is also valid in the type system, and using that. Originally OpenAI did give you the probabilities for the next token, but apparently that made it easy to steal the weights, so they turned that feature off.
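
That scheme can be sketched in a few lines. This is a toy: `is_valid_prefix` stands in for a real type checker, and the next-token distribution is faked rather than pulled from a model API.

```python
# Toy sketch of validity-constrained greedy decoding.
# is_valid_prefix is a stand-in for a real checker: here it only
# requires that parentheses stay balanced so far.

def is_valid_prefix(text: str) -> bool:
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return True

def pick_token(prefix: str, token_logprobs: dict[str, float]) -> str:
    """Take the most probable token whose continuation is still valid."""
    for tok, _ in sorted(token_logprobs.items(), key=lambda kv: -kv[1]):
        if is_valid_prefix(prefix + tok):
            return tok
    raise ValueError("no valid continuation")

# Fake distribution: ")" is most likely but invalid at the start.
print(pick_token("", {")": -0.1, "(": -0.5, "x": -1.0}))  # -> (
```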


It's been tried already and doesn't work. Very often a model needs to emit tokens that aren't valid yet but will become so later.
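
The failure mode is easy to demonstrate: almost every prefix of a valid program fails a whole-program syntax check, so a naive "reject invalid tokens" filter would block the path to correct code. A small illustration using Python's own parser:

```python
import ast

def parses(src: str) -> bool:
    """Whole-program syntax check; prefixes of valid code usually fail it."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

print(parses("x = (1 +"))     # -> False: invalid as it stands...
print(parses("x = (1 + 2)"))  # -> True: ...yet the line above is its prefix
```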


This can be done: I gave mine a justfile and, early in the project, attentively steered it toward building out quality checks. CLAUDE.md also contains instructions to run those checks after each iteration.

What I'd like to see is the CLI's integration with VSCode etc. extended to understand the things the IDE has given us for free for years.


SEEKING WORK | Remote | AI Infrastructure & Performance

Location: Wisconsin

---

Last September I built an AI inference tool that hit #3 on HN (https://news.ycombinator.com/item?id=41620530). It processed 17.3M messages in 24 hours and only cost $17 to run.

I specialize in:

- LLM inference optimization (FastAPI, proper batching, memory management)

- CI/CD pipelines for ML deployments

- Making AI systems cost-effective at scale

Recent work: FrankenClaude (reasoning injection experiments, https://thelisowe.substack.com/p/frankenclaude-injecting-dee...), self-driving Rocket League (https://thelisowe.substack.com/p/building-an-ai-that-plays-r...), diffdev (AI-powered code modification tool, https://pypi.org/project/diffdev/).

Previously at Sprout Social, where I built their ML inference platform and reduced deployment time from 6 months to 6 hours while cutting AWS costs by $500K/yr.

Looking for interesting problems in AI infrastructure, performance optimization, or building products from scratch.

Tech: PyTorch, FastAPI, K8s, Docker, AWS, ONNX

---

Resume: https://drive.google.com/file/d/1qO8XdisNTFq_wmrQGDKnu6eWDi2... GitHub: github.com/Mockapapella Blog: thelisowe.substack.com

Contact: My email is in my bio or on my resume


    Location: Wisconsin
    Remote: Yes
    Willing to relocate: Yes
    Technologies: Python, PyTorch, Kubernetes, Docker, AWS, FastAPI, ONNX, MLOps
    Resume: https://drive.google.com/file/d/1qO8XdisNTFq_wmrQGDKnu6eWDi2g33Me/view
    GitHub: https://github.com/Mockapapella
    Email: In profile or on resume
AI/ML Engineer specializing in high-performance deployments. Built distributed systems handling 30K QPS, developed a neural network for Rocket League gameplay, and created platforms that cut model deployment time from 6 months to 6 hours. Saved $500K/yr in infrastructure costs through optimization at my previous role. Former technical founder with experience in humanoid robotics and AI writing assistance. I write about my projects and musings on my blog: https://thelisowe.substack.com/

Seeking roles focusing on ML infrastructure, model optimization, post-training, or full-stack AI engineering.


This is a good article on the "fog of war" for GPU inference. Modal has been doing a great job of aggregating and disseminating info on how to think about high quality AI inference. Learned some fun stuff -- thanks for posting it.

> the majority of organizations achieve less than 70% GPU Allocation Utilization when running at peak demand — to say nothing of aggregate utilization. This is true even of sophisticated players, like the former Banana serverless GPU platform, which operated at an aggregate utilization of around 20%.

Saw this sort of thing at my last job. It was very frustrating to point it out to people only for them to respond with ¯\_(ツ)_/¯. I wrote a much less tactful article (read: rant) than Modal's, but I think it still touches on a lot of the little things you need to consider when deploying AI models: https://thelisowe.substack.com/p/you-suck-at-deploying-ai-mo...
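
To make the quoted utilization numbers concrete, here's the back-of-envelope math (the hourly rate is an assumed, illustrative figure):

```python
# At 20% aggregate utilization, every useful GPU-hour effectively
# costs 5x the sticker price. The rate below is an assumption.
hourly_rate = 4.00   # $/GPU-hour, illustrative
utilization = 0.20   # aggregate utilization from the quote above
effective = hourly_rate / utilization
print(f"${effective:.2f} per useful GPU-hour")  # -> $20.00 per useful GPU-hour
```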


Nice article! I had to restrain myself from ranting on our blog :)


Honestly, I thought you guys had launched already (and I didn't know you were part of YC); it seems I've been aware of you for years now. Congrats on the launch! Hope the Twitter issues aren't causing you too many problems.

Normally I'd send this as a DM or email, but I think it could be useful for others to learn about how to use your service/the limitations of it. A couple weeks ago I made a search for:

    In early 2023, Andrej Karpathy said something like "large training runs are a good test of the overall health of the network." Something something resilience as well I think. I need you to find it.
Unfortunately it wasn't able to find it, but the quote was either in a tweet or a really long presentation, neither of which is a good target for search. It was around the same time that this video (https://www.youtube.com/watch?v=c3b-JASoPi0) was posted, within a couple weeks before or after. How could I have improved my query? Does Exa work over videos?


I think I found it! Unfortunately we do not include tweets in our search index

> TLDR LLM training runs are significant stress-tests of an overall fault tolerance of a large computing system acting as a biological entity.

https://x.com/karpathy/status/1765424847705047247


Holy shit I think that might be it! I have been looking for that tweet for like a year now. Thanks!


Could you elaborate on the instructions in brackets part?


Sure, you can do a lot of things here... stuff in [brackets] isn't sung.

For example I was trying to steer a melodic techno prompt recently in a better direction by putting stuff like this upfront:

    [intro - dramatic synths, pulsing techno bass]
    [organic percussive samples]
    [rolling galloping pulsing gritty bassline]
    [soaring experimental synths, modulation heavy, echoes, sound design, 3d sound]
    [lush atmosphere, variation]
    [hypnotic groovy arpeggiation arps]
    [sampled repetitive trippy vocal]
All of this is just stuff I kind of made up and wanted in the song, but it meaningfully improved the output over tags alone. I think "steering/nudging the generation space" is a decent mental model for how this affects the output.

I also often use them for song structure: [intro], [break], [chorus], and I even get more descriptive with these, describing things or moments I'd like to happen. Again, adherence is not perfect, but it seems to help steer things.

One of my favorite tags I've seen is [Suck the entire song through vacuum], and well... I choose to believe. Check out 1:29: https://suno.com/s/xdIDhlKQUed0Dp1I

Worth playing around with a bunch, especially if you're not quite getting something interesting or in the direction you want.


Brackets such as [Verse] help provide waveform separation in the edit view so that you can easily edit that section without manually dragging the slider.

Others such as [Interrupt] will produce a DJ-like fade-out / announcement ("that was <Artist name>, next up...") / fade-in, providing an opportunity to break the AI out of the repetitive loops it obsesses over.

I've used [Bridge] successfully, and [Instrumental] and [No vocals] work reliably as well (there are also dedicated instrumental options, but I still use brackets out of habit, I guess).


    Location: Wisconsin
    Remote: Yes
    Willing to relocate: Yes
    Technologies: Python, PyTorch, Kubernetes, Docker, AWS, FastAPI, ONNX, MLOps
    Resume: https://drive.google.com/file/d/1qO8XdisNTFq_wmrQGDKnu6eWDi2g33Me/view
    GitHub: https://github.com/Mockapapella
    Email: In profile or on resume
AI/ML Engineer specializing in high-performance deployments. Built distributed systems handling 30K QPS, developed a neural network for Rocket League gameplay, and created platforms that cut model deployment time from 6 months to 6 hours. Saved $500K/yr in infrastructure costs through optimization at my previous role. Former technical founder with experience in humanoid robotics and AI writing assistance. I write about my projects and musings on my blog: https://thelisowe.substack.com/

Seeking roles focusing on ML infrastructure, model optimization, post-training, or full-stack AI engineering.


Great background!

Interested in building bleeding-edge VLM infrastructure at VLM Run? https://vlm-run.notion.site/vlm-run-hiring-25q1-staff


Thanks, looks interesting. Sent an email

