ozten's comments

Yes. The ARC-AGI benchmark was supposed to last years and is already saturated. The authors are currently creating the second version.

Orbital Materials is designing wafer substrates that capture carbon and reuse excess heat.

Logged in, I am wildly mistargeted. When I go anonymous via private browsing, the YT ads are softcore porn. Is Google having trouble with inventory or ???

Yeah, I'm totally shocked at the amount of softcore troll porn spread across Meta, TikTok, YouTube and probably any other mainstream tech property. Is Apple allowing this at all? I'm no church lady but I don't want this stuff cast onto my screens unless I'm searching for it specifically. They are pushing it on people involuntarily.

I guess the question is whether your parents and your grandparents formed opinions based on a sample size of N = 1.

I got massive productivity gains from having an LLM fill out my test suite.

It is like autocomplete and macros... "Based on these two unit tests, fill out the suite considering b, c, and d. Add any critical corner case tests I have missed or suggest them if they don't fit well."

It is on the human to look at the generated tests and ensure they a) are comprehensive, b) are useful, and c) communicate clearly.


Can you extend that - what was the domain, how did you start? I would like to give this a try but am not quite sure I get it?

Backend coding for web services.

In the past I would hand write 8 or 9 unit tests. Now I write the first one or two and then brain dump anything else into the LLM prompt. It then outputs mine plus 6 or more.

I delete any that seem low value or ridiculous or have a follow up prompt to ask for refinements. Then just copy/pasta back into the codebase out of the chat.


That simple? I’ll try it.

Can confirm this approach works well for us too.

See, I’m arguing for writing fewer, better tests.

I realize that it’s the norm to rely heavily on unit tests. Hundreds or thousands of examples of inputs and outputs. We still find errors in programs. “Examples prove the presence of an error, not the absence of errors,” as Dijkstra (or was it Hoare? I can’t remember) would say. So I understand how one could view having an LLM generate tests as a win for productivity in that case.

But such test suites don’t add much. And generating 20 more tests won’t tell me much more about the code. It will actually make the test suite harder to read and understand.


At least post-mortems are filled with dead carcass.

Facebook is virtual reality, whereas VRChat is inhabited by humans.


Shifting the Overton Window as a Service


Nope. AlphaZero taught itself to play games like chess, shogi, and Go through self-play, starting from random moves. It was not given any strategies or human gameplay data but was provided with the basic rules of each game to guide its learning process.


Yes, it's reinforcement learning, but you need to create a policy, and each policy is specialized for a specific task.


Google as a first class partner is a massive liability. Example: Stadia was amazing and they snuffed it in the cradle.

Samsung should license Google's app store, but retain full control over executing a product launch.


> Example: Stadia was amazing and they snuffed it in the cradle.

You can be amazing and not make money.

Google is in the business of building good products AND making money.

Stadia was a good product.

It didn't look like it would ever make money.

