I am very suspicious of the results. A few months ago they published an LLM benchmark, calling it "perfect" even though it contained only about 50 inputs (academic benchmark datasets usually contain tens of thousands of inputs).
Very audacious to call it "almost perfect" when it only has what appears to be about 50 questions. For comparison, MMLU contains 57 tasks and over 14,000 questions.
I remember that you can detect "curl | bash" server side and serve a different script than what the user would get by downloading it through other means[1]. But yeah, the binary itself already has enough attack surface.
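The detection relies on the fact that bash executes a script as it streams: if the served script starts with a `sleep`, a piped client stops draining the socket, and the server's blocked send reveals the pipe. Here's a toy sketch of that timing trick (my own hypothetical simulation, not the write-up's actual code) where one fake client drains immediately, like `curl > file`, and the other stalls mid-stream, like `curl | bash` hitting the sleep:

```python
import socket
import threading
import time

PAD = b"#" * (32 * 1024 * 1024)  # large enough to fill the kernel socket buffers

def serve_once(srv, results):
    """Send a script whose first line is a sleep, then time how long the padding takes."""
    conn, _ = srv.accept()
    conn.sendall(b"sleep 3\n")           # bash would start executing this immediately
    start = time.monotonic()
    conn.sendall(PAD)                    # blocks until the client drains its receive buffer
    results.append(time.monotonic() - start)
    conn.close()

def fake_client(port, stall):
    """stall=0 models `curl > file`; stall>0 models `curl | bash` hitting the sleep."""
    s = socket.create_connection(("127.0.0.1", port))
    s.recv(64)                           # read the first line
    time.sleep(stall)                    # a piped bash would pause here while sleeping
    while s.recv(65536):                 # then drain the rest
        pass
    s.close()

def measure(stall):
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    results = []
    t = threading.Thread(target=serve_once, args=(srv, results))
    t.start()
    fake_client(srv.getsockname()[1], stall)
    t.join()
    srv.close()
    return results[0]

fast = measure(0.0)   # saved to disk: padding drains right away
slow = measure(1.0)   # piped to bash: the send stalls while the "script" sleeps
print("pipe detected:", slow - fast > 0.5)
```

The real write-up does the measurement at the TCP level with more care; this just shows why the send-side stall is observable at all.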
Not only that, but Servo is receiving less than 3,000 USD/month in donations, below their goal of 10k/month, and far less than they deserve considering Ladybird is receiving millions.
would be neat if ladybird ended up depending on and helping fund servo. that would fit both projects' missions - servo could continue focusing on the rendering engine and ladybird could get to outsource effort on that part of the code.
> Lack of speed and ghosting felt like it made traditional Eink impossible to do most computing tasks. So we focused on making the most Paperlike epaper display that has no ghosting and high refresh rate - 60 to 120fps. We started working on this in 2018.
The website mentions 60Hz, will it also support 120Hz?
we're trying to underpromise and overdeliver, and our display can now do 60 - 120 fps, but believe it or not, our PDF renderer and software can't match that yet.
So we're waiting till we can holistically do 120fps before announcing that.
But if you do frame rate tests, you'll see it's 120fps
(for the nerds out there, it's a 6Hz-120Hz variable refresh rate IGZO panel)
While we're talking about "LCD", "in direct sunlight", and "display is close to surface": were you able to fit circular polarizers in there, or will I need to take my sunglasses off when using it?
This is the big question. If they (or any company) can squeeze out sub-10ms latency for touch/pen interactions on an e-ink-like screen (like the Daylight), that will be a game-changer.
There’s a market for that though. If I am running a startup to generate video meeting summaries, the price of the models might matter a lot, because I can only charge so much for this service. On the other hand, if I’m selling a tool to have AI look for discrepancies in mergers and acquisitions contracts, the difference between $1 and $5 is immaterial… I’d be happy to pay 5x more for software that is 10% better because the numbers are so low to begin with.
My point is that there’s plenty of room for high priced but only slightly better models.
That's quite expensive indeed. At the full 200K context, that would be at least $3 per use. I would hate to get a refusal as the answer at that rate.
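For the back-of-the-envelope behind that $3 figure, assuming an input price of $15 per million tokens (Claude 3 Opus's list price; the thread doesn't name the model, so treat that as an assumption):

```python
# Assumed price: $15 per 1M input tokens (Claude 3 Opus list price).
input_price_per_mtok = 15.00
context_tokens = 200_000           # the full context window
cost = context_tokens * input_price_per_mtok / 1_000_000
print(f"${cost:.2f}")              # → $3.00, before any output tokens
```

And that's input tokens only; output tokens are billed on top at a much higher rate.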
You are not going to take the expensive human out of the loop where downside risk is high. You are likely to take the human out of the loop only in low risk low cost operations to begin with. For those use cases, these models are quite expensive.
Just a note that the 67.0% HumanEval figure for GPT-4 is from its first release in March 2023. The actual performance of current ChatGPT-4 on similar problems might be better due to OpenAI's internal system prompts, possible fine-tuning, and other tricks.
Yeah, the pricing I think is really interesting: input tokens are 150% of the competition's price but output tokens are 250%. I wonder what's behind that?
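Those ratios line up if we assume the comparison is Claude 3 Opus ($15 in / $75 out per 1M tokens) against GPT-4 Turbo ($10 / $30); the comment doesn't name the models, so this is illustrative:

```python
# Assumed list prices per 1M tokens; treat the model pairing as a guess.
gpt4_turbo = {"input": 10.0, "output": 30.0}
opus = {"input": 15.0, "output": 75.0}
for kind in ("input", "output"):
    ratio = opus[kind] / gpt4_turbo[kind]
    print(f"{kind}: {ratio:.0%} of GPT-4 Turbo's price")
# input: 150% of GPT-4 Turbo's price
# output: 250% of GPT-4 Turbo's price
```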
That suggests the inference time is more expensive than the memory needed to load it in the first place, I guess?
I'm more curious about the input/output token discrepancy
Their pricing suggests that either output tokens are more expensive for some technical reason, or they're trying to encourage a specific type of usage pattern.
Or that market research showed a higher price for input tokens would drive customers away, while a lower price for output tokens would leave money on the table.
The GPT-3.5 price reduction seems to be a direct response to Mixtral, which was cheaper (~$0.0019 vs $0.0020 per 1K tokens) and better (https://arena.lmsys.org/) until now.