Good point. Also (tangent), I followed your profile link to https://sibylline.dev and am thoroughly impressed. Stoked to have found your treasure trove of repos and insights.
>GPT-5 showed significant improvement only in one benchmark domain - which is Telecom. The other ones have been somehow overlooked during model presentation - therefore we won’t bother about them either.
I work at OpenAI and you can partly blame me for our emphasis on Telecom. While we no doubt highlight the evals that make us look good, let me defend why the emphasis on Telecom isn't unprincipled cherry picking.
Telecom was made after Retail and Airline, and fixes some of their problems. In Retail and Airline, the model is graded against a ground truth reference solution. Grading against a reference solution makes grading easier, but has the downside that valid alternative solutions can receive scores of 0 by the automatic grading. This, along with some user model issues, is partly why Airline and Retail scores stopped climbing with the latest generations of models and are stuck around 60% / 80%. I'd bet you $100 that a superintelligence would probably plateau around here too, as getting 100% requires perfect guessing of which valid solution is written as the reference solution.
In Telecom, the authors (Barres et al.) made the grading less brittle by grading against outcome states, which can be reached via multiple valid solutions, rather than by matching against a single reference solution. They also improved the user modeling, among other things. So Telecom is the much better eval, with a much cleaner signal, which is partly why models can score as high as 97% instead of getting mired at 60%/80% by brittle grading and other issues.
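To make the distinction concrete, here's a minimal sketch (my own illustration, not the actual tau-bench code) of reference-matching versus outcome-state grading:

    # Reference matching: brittle, because a valid alternative
    # sequence of actions scores 0.
    def grade_by_reference(agent_actions, reference_actions):
        return agent_actions == reference_actions

    # Outcome grading: any solution path that reaches the required
    # final state passes.
    def grade_by_outcome(final_state, expected_state):
        return all(final_state.get(k) == v
                   for k, v in expected_state.items())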
Even if I had never seen GPT-5's numbers, I like to think I would have said ahead of time that Telecom is much better than Airline/Retail for measuring tool use.
Incidentally, another thing to keep in mind when looking critically at OpenAI and others reporting their scores on these evals is that the evals give no partial credit - so sometimes you can have very good models that do all but one thing perfectly, which results in very poor scores. If you tried generalizing to tasks that don't trigger that quirk, you might get much better performance than the eval scores suggest (or vice versa, if your tasks trigger a quirk not present in the eval).
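To put a number on how harsh all-or-nothing grading is, here's a back-of-the-envelope illustration (the numbers are mine, purely hypothetical):

    # A model that executes each of 20 required steps correctly 95% of
    # the time still fails roughly 64% of tasks outright.
    per_step_accuracy = 0.95
    steps = 20
    print(per_step_accuracy ** steps)  # ~0.358 task pass rate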
"During our conversation, I was continually struck by the degree to which Hotz and his company are anti-mimetic. Like many founders of tech startups—thanks to the influence of Peter Thiel – Hotz has a passing familiarity with René Girard’s theory of mimetic desire. The theory, now supported by a trove of empirical evidence, posits that our desires do not originate in us but are always learned from models."
That makes me think of https://store.steampowered.com/app/2262930/Bombe/ which is a version of Minesweeper where instead of clicking on squares you define (parametric!) rules that propagate information around the board automatically. Your own rules skip all the easy parts for you. As a result, every challenge you get is by definition a problem that you've never considered before. It's fun, but also exhausting.
Nothing for the use cases I have in production other than more platform support, but those can be compile-time features for the specific environments that need them. I want 0 lines of dead code in production, for easy auditing.
Before anyone puts the blame on Nx, or Anthropic, I would like to remind you all what actually caused this exploit. The malicious code was shipped in a package that was uploaded using a stolen "token" (a string of characters used as a sort of "username+password" to access a programming-language package-manager repository).
But that's just the delivery mechanism of the attack. What caused the attack to be successful were:
1. The package manager repository did not require signing of artifacts to verify they were generated by an authorized developer (a sketch of what such a check could look like follows this list).
2. The package manager repository did not require code signing to verify the code was signed by an authorized developer.
3. (presumably) The package manager repository did not implement any heuristics to detect and prevent unusual activity (such as uploads coming from a new source IP or country).
4. (presumably) The package manager repository did not require MFA for the use of the compromised token.
5. (presumably) The token was not ephemeral.
6. (presumably) The developer whose token was stolen did not store the token in a password manager that requires the developer to manually authorize unsealing of the token by a new requesting application and session.
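For the signing points (1 and 2), here's a minimal sketch of what a registry-side check could look like, using Ed25519 from Python's `cryptography` package; the function and its role are my own illustration, not any real registry's API:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PublicKey,
    )

    def verify_upload(tarball: bytes, signature: bytes,
                      registered_public_key: bytes) -> bool:
        # The registry keeps the developer's registered public key, so a
        # stolen upload token alone is useless without the offline
        # signing key.
        try:
            key = Ed25519PublicKey.from_public_bytes(registered_public_key)
            key.verify(signature, tarball)
            return True
        except InvalidSignature:
            return False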
Now, after all those failures, if you were affected and a GitHub repo was created in your account, that is a failure of:
1. You, for not keeping your GitHub tokens/auth in a password manager that requires you to manually authorize unsealing of the token for each new requesting application and session.
So what really caused this exploit is a series of missing security mechanisms, all completely preventable, all of which could have been easily added years ago by any competent programmer. The fact that they were not in place and mandatory is a fundamental failure of the entire software industry, because 1) this is not a new attack; it has been going on for years, and 2) we are software developers; there is nothing stopping us from fixing it.
This is why I continue to insist there need to be building codes for software, with inspections and fines for not following through. This attack could have been used on tens of thousands of institutions to bring down finance, power, telecommunications, hospitals, the military, etc. And the scope of these attacks and their impact will only increase with AI. Clearly we are not responsible enough to write software safely and securely on our own, so we must have a building code that forces us to do it.
Bingo. The best trials are those that allow the user to determine whether the product is capable of solving the user’s immediate problem without actually solving it unless the product is purchased.
I'd recommend anyone interested in Confidential Computing read the work of Rodrigo Branco (@BSDaemon) to understand why it's mostly a failure and a PR stunt by cloud providers: it gives the illusion that the customer stays in control, while the hardware capabilities CC is built upon are insecure (and, most of the time, can't be fixed by a firmware or microcode update).
Top 5 codebases for changing my mind about things:
Wietse Venema's Postfix mail server. It taught me tons about security posture. Its architecture I'd describe as microservices before microservices was a thing - but contrary to the modern take on microservices (mostly a tool for decomposing work across large, semi-isolated groups), this was primarily about security and simplicity.
Spring framework - this opened my eyes to ways of working that I hadn't really thought enough about before. The developers on that project have a culture of deeply considering the needs of their users (who are Java developers, often in an enterprise environment).
Git - the thing I like about the Git codebase is that once you've covered the objects database (e.g. blobs, trees, and commits) and the implementation of refs, everything else just feels like additional incremental features. With those core concepts, everything else is built harmoniously on top (see the sketch at the end of this comment).
Varnish by Poul-Henning Kamp is another one - it feels like he went to great lengths to make that codebase a teaching tool, despite the fact that it's also a top-tier reverse proxy.
The last one isn't a codebase, but it will help with software design in the large: studying how the lieutenants model works in the Linux kernel.
Thinking about my answers, I think I've highlighted something subtly different from "well designed codebases" - it's more a list of codebases that left a notable, long-lasting impression on me because of the design decisions they made.
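As a footnote to the Git item above: the objects database is simple enough that you can reproduce its core in a few lines. This sketch mirrors what `git hash-object` does for a blob:

    import hashlib

    # Every Git object is "<type> <size>\0<payload>", addressed by the
    # SHA-1 of that whole byte string.
    def git_blob_hash(content: bytes) -> str:
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Matches: echo 'hello' | git hash-object --stdin
    print(git_blob_hash(b"hello\n"))
    # ce013625030ba8dba906f756967f9e9ca394464a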
Glad to see a fellow fundamental indexer on HN! As a US-based investor, I personally invest in the RAFI US broad market fundamental index (FNDB ETF), which has kept up with the Vanguard US total market over the past 10 years - except in the bubbly years of 2020/2021 and 2024/2025 - even with a higher expense ratio.
In my case, after observing the Covid-19 craziness in the market, I decided to dig further into value strategies and discovered this gem from Research Affiliates in the Journal of Portfolio Management circa 2012, which completely convinced me of fundamental indexation as a superior alternative to a market-cap-weighted total market index.
My abstract algebra class had it exactly backwards. It started with a lot of needless formalism culminating in Galois theory. This was boring to most students, as they had no clue why the formalism was invented in the first place.
Instead, I wish it had shown how the sausage was actually made in the original writings of Galois [1]. That would have been far more interesting to students, as it shows the struggles that went into making the product - not to mention the colorful personality of the founder.
The history of how concepts were invented for the problems their creators faced does far more to motivate students to build a mental model than canned capsules of knowledge.
Also, if anyone wants to go down the rabbit hole of why SAML is hard to implement, this is a pretty interesting writeup of a major 0-day vuln we discovered earlier this year: https://workos.com/blog/samlstorm
>> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.
It's OK to dislike unit testing, but please don't redefine the term to avoid it. That's not helpful. Instead, try to find the papers (by NASA or IBM?) that show unit testing finds only very few actual bugs, making it low value.
That said, there are IMHO some units more worth testing than others.
I disagree with the title; loops are tail-recursive functions, but tail-recursive functions are not loops (in the sense that squares are rectangles, but rectangles are not squares).
It is true that every tail recursive function can be converted into a semantically equivalent loop via a transformation like CPS (which the author mentions). However, for mutually tail-recursive functions, this conversion loses control flow information. This is because after the CPS transformation, calls to the other function become calls to a continuation; this call usually must be implemented as an indirect jump. On the other hand, mutually tail-recursive functions can call each other with direct/statically-known jumps.
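Here's a minimal sketch of that conversion in Python, with mutually tail-recursive functions rewritten trampoline-style so each tail call becomes a returned thunk (the example functions are mine, purely illustrative):

    # Each "tail call" returns a thunk instead of calling directly.
    def is_even(n):
        return True if n == 0 else (lambda: is_odd(n - 1))

    def is_odd(n):
        return False if n == 0 else (lambda: is_even(n - 1))

    # The loop equivalent: `step()` is an indirect call, because the
    # target is only known at runtime - exactly the control flow
    # information loss described above.
    def trampoline(step):
        while callable(step):
            step = step()
        return step

    print(trampoline(lambda: is_even(10 ** 6)))  # True, constant stack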
This loss of information might appear trivial, but in practice it has some important consequences. Classic examples are interpreter loops. It is well known that computed gotos can result in modest to large speedups for interpreters [1]. The reason is that computed gotos create one indirect jump per opcode, so a branch predictor can take advantage of correlations between opcodes. For example, looking at Python disassembly, the header of a standard range for loop compiles down to three opcodes in sequence: GET_ITER, FOR_ITER, STORE_FAST [2]. A branch predictor can recognize that the target of the "FOR_ITER" indirect jump will likely be the "STORE_FAST" instruction pointer; it cannot predict this in the naive implementation where jumps for all instructions are "merged" into a single indirect jump / switch at the top of the loop body. In this case, computed goto is effectively equivalent to a CPS transformation whose closures require no storage on the heap.
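You can see that opcode sequence for yourself with the `dis` module (exact output varies across CPython versions, but the loop header is these three opcodes):

    import dis

    def f(xs):
        total = 0
        for x in xs:  # header: GET_ITER / FOR_ITER / STORE_FAST
            total += x
        return total

    dis.dis(f)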
Suppose, however, we know even more information about the instruction sequence; for example, we know ahead of time that every FOR_ITER opcode will be followed by a STORE_FAST opcode. We could completely replace the indirect jump with a direct jump to the instruction pointer for the STORE_FAST opcode. Because modern branch predictors are very good, this will have about the same performance in practice as the computed goto loop.
However, consider the limiting case where we know the entire instruction sequence beforehand. If we write our interpreter as many mutually tail-recursive functions, with one function for every instruction, an optimizing compiler can replace every indirect call with a direct (tail-recursive) call to the function that implements the next instruction's opcode. With a good enough optimizer / partial evaluator, you can turn an interpreter into a compiler! This is known as the first Futamura projection [3].
To see an example of this in action, I wrote a prototype of a Brainfuck compiler via the Futamura projection; it uses LLVM as a partial evaluator [4]. The main interesting function is `interpret`, which is templated on the program counter / instruction. That is, `interpret` is really a family of mutually tail-recursive functions which statically call each other as described above. For short Brainfuck programs, the LLVM optimizer is able to statically compute the output of the Brainfuck program. (The one in the Godbolt link compiles to a loop, likely because LLVM does not want to unroll the mutual recursion too much.) You can play around with different Brainfuck programs by modifying the `program` string on line 5.
I believe you've covered some working solutions in your presentation. They limit LLMs to providing information/summaries and taking tightly curated actions.
There are currently no fully general solutions to data exfiltration, so things like local agents or computer use/interaction will require new solutions.
My personal perspective is that the best we can do is build secure frameworks that LLMs can operate within, carefully controlling their inputs and interactions with untrusted third party components. There will not be inherent LLM safety precautions until we are well into superintelligence, and even those may not be applicable across agents with different levels of superintelligence. Deception/prompt injection as offense will always beat defense.
When the chatbot can also make cutting remarks pointing out your insecurities, nag you about chores and responsibilities, withhold affection, make you waste your time doing things the chatbot wants to do, or have you make soul-crushing smalltalk with the chatbot's parents, and you can't leave because you had children with it, and who knows if you can even do better - you're getting too old to start over anyway - then you can call it real love.
The real crime is an economic system that limits the spread of knowledge and access to other "human rights" by requiring everyone to hustle to survive (and, if possible, increase capital gains for the financial overlords), when we are already technologically equipped to feed and house all of mankind well - instead we let thousands of children starve to death each day and restrict access to education so that billions miss out on their intellectual development, a void easily filled with addictive media full of rage and distraction. Pirating books is just a symptom of this wretched system. And it is not enough - RISE, HN! .. towards RBE & beyond..
Thanks. For a while there, it wasn't clear to me which side of the line I was walking.
Something that stuck with me from Poor Charlie’s Almanack is that low expectations are a cornerstone of a happy life. I built this for myself first, so when people actually signed up and paid, it was incredibly motivating. I was thrilled to spend my free time treating those early customers like royalty and building more of what they wanted.
If I had instead come into this with the expectation of quick success, I doubt I would have made it through those early years.
And cheers from one bootstrapper to another. It's not easy, but I can't imagine a more rewarding way to build.
My absolute favorite use of MCP so far is Bruce Hauman's clojure-mcp. In short, it gives the LLM (a) a bash tool, (b) a persistent Clojure REPL, and (c) structural editing tools.
The effect is that it's far more efficient at editing Clojure code than any purely string-diff-based approach, and if you write a good test suite it can rapidly iterate back and forth just editing files, reloading them, and then re-running the test suite at the REPL -- just like I would. It's pretty incredible to watch.
Something I've realized about LLM tool use: if you can reduce a problem to something that can be solved by an LLM in a sandbox using tools in a loop, you can brute-force that problem.
The job then becomes identifying those problems and figuring out how to configure a sandbox for them, what tools to provide and how to define the success criteria for the model.
That still takes significant skill and experience, but it's at a higher level than chewing through that problem using trial and error by hand.
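A minimal sketch of that shape, with `llm` and `tools` as hypothetical stand-ins rather than any real API:

    def solve_in_sandbox(llm, tools, task, success, max_steps=50):
        history = [task]
        for _ in range(max_steps):
            action = llm.next_action(history, tools)  # hypothetical call
            result = tools.run(action)  # executed inside the sandbox
            history.append((action, result))
            if success(result):  # the caller-defined success criteria
                return result
        return None  # brute force failed within budget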
> I have an unusually high need to own the understanding of any thing I'm learning
This is called deprivation sensitivity. It's different from intellectual curiosity: the former is a need to understand, while the latter is a need to know.
Deprivation sensitivity comes with anxiety and stress, whereas intellectual curiosity is associated with joyous exploration.
I score very high on deprivation sensitivity. I have an unbridled drive to acquire and retain important information.
It's a blessing and a curse - an exhausting way to live. I love it, but sometimes I wish I was not neurodivergent.