I tend to agree. I used to run Call of Cthulhu in high school, and there was a lot of preparation involved in running a good scenario. Some people are good at winging it and making stuff up as they go along. I never was. The best experiences for me and my players were always when I'd meticulously designed my own scenario, or used a store-bought scenario and read it thoroughly. Either way, hours of prep.
I would love to have an AI as an assistant Dungeon Master (or game master, or Keeper, or what have you). That is, one person in a group of players maintains the role of a master storyteller, but the AI is ready to fill in details or suggest ways to get the players back on track. This would probably be tedious if you're interacting with the LLM entirely through text, and having to manually keep it up to date with the story. But it could work well if you have a model that understands spoken language listening in on the game and generating cool images and making private suggestions to the game master.
> one person in a group of players maintains the role of a master storyteller, but the AI is ready to fill in details or suggest ways to get the players back on track
As a player and very occasional DM myself, filling in details and trying to get players back on track is where the fun and challenge are. AI could definitely be useful for handling all the paperwork (fight resolution and so on), though.
Going by the specs, this pretty much blows Tinybox out of the water.
For $40,000, a Tinybox pro is advertised as offering 1.36 petaflops processing and 192 GB VRAM.
For about $6,000 a pair of Nvidia Project Digits offer about a combined 2 petaflops processing and 256 GB VRAM.
The market segment for Tinybox always seemed to be people that were somewhat price-insensitive, but unless Nvidia completely fumbles on execution, I struggle to think of any benefits of a Tinygrad Tinybox over an Nvidia Digits. Maybe if you absolutely, positively, need to run your OS on x86.
I'd love to see if AMD or Intel has a response to these. I'm not holding my breath.
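For what it's worth, the arithmetic behind that comparison can be sketched in a few lines of Python. The prices and specs are the advertised figures quoted above; the dictionary layout and function name are just for illustration. The two petaflop numbers may not be measured at the same precision, so treat the per-petaflop figure as a rough guide only.

```python
# Advertised figures as quoted in the comment above (not independently verified).
systems = {
    "Tinybox pro": {"price_usd": 40_000, "petaflops": 1.36, "vram_gb": 192},
    "2x Project Digits": {"price_usd": 6_000, "petaflops": 2.0, "vram_gb": 256},
}


def cost_ratios(spec):
    """Return (USD per advertised petaflop, USD per GB of VRAM)."""
    return spec["price_usd"] / spec["petaflops"], spec["price_usd"] / spec["vram_gb"]


for name, spec in systems.items():
    per_pflop, per_gb = cost_ratios(spec)
    print(f"{name}: ${per_pflop:,.0f} per petaflop, ${per_gb:,.2f} per GB of VRAM")
```

This says nothing about memory bandwidth, which neither ratio captures.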
You're right. Tinybox's 1.36 petaflops is FP16 so that is a significant difference.
Also, the Tinybox's memory bandwidth is 8064 GB/s, while the Digits seems to be around 512 GB/s, according to speculation on Reddit.
Moreover, Nvidia's announced their RTX 5090s priced at $2k, which could put downward pressure on the price of Tinybox's 4090s. So the Tinybox green or pro models might get cheaper, or they might come out with a 5090-based model.
If you're the kind of person that's ready to spend $40k on a beastly ML workstation, there's still some upside to Tinybox.
You're missing the most critical part though. Memory bandwidth. It hasn't been announced yet for Digits and it probably won't be comparable to that of dedicated GPUs.
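A rough sketch of why bandwidth matters so much: when generating one token at a time, the weights are typically read from memory once per token, so bandwidth divided by model size gives a ceiling on tokens per second. The 40 GB model size below is a made-up example, and the bandwidth figures are the unconfirmed ones floating around this thread. The Tinybox number looks like an aggregate across all its GPUs, so getting near it would need the model split across them.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if every token needs one full pass over the weights."""
    return bandwidth_gb_s / model_size_gb


MODEL_SIZE_GB = 40.0  # hypothetical: roughly a 70B model at about 4.5 bits per weight

for name, bandwidth in [("Digits (rumored)", 512.0), ("Tinybox (aggregate)", 8064.0)]:
    limit = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most {limit:.1f} tokens/s")
```

Real throughput will be lower than this ceiling, since it ignores compute, cache reads, and interconnect overhead.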
The hate is for Musk. Tesla hate is just collateral damage.
People tolerated Musk's narcissism when he was largely apolitical and his entrepreneurship seemed to be benefiting the world at large. Since 2020, Musk's brain seems to have been pretty severely infected by the woke/anti-woke mind-virus, and it's far less clear that his impact on the world is an unmitigated good thing.
I tend to agree with his conclusion that it's unlikely that the Russians attempted to force the jet to crash in the Caspian Sea to cover up the event. Not because Putin's government has any moral qualms, but because they lack the level of organizational competence it would take to come up with a plan like this and carry it out on short notice.
The two nearest airports in the direction away from Russia's war of aggression are the flight's origin at Baku, and Aktau. Baku is closer, but it's on the other side of a mountain range. Aktau is slightly further, on the other side of the Caspian Sea. A water landing is at least possible. Choosing to fly a damaged aircraft the slightly longer route over a lake rather than the shorter route over a mountain range is an eminently sensible decision.
A point of anecdata: my 2010 Prius has nearly 200,000 miles on it and it's still going strong.
The Prius has a traditional lead acid battery for starting the engine that's been replaced a couple of times. The traction batteries (the ones that help push the car forward and are charged from regenerative braking) are the originals.
The Priuses have a 12v AGM lead acid battery (about the size of a small motorcycle battery) to power the computers when you hit the power button. They don't have a starter. Instead, the 200v NiMH traction battery spins up the CVT, which spins the engine's crankshaft; to start the engine, they simply apply spark to the plugs. The Prius Prime has the above batteries plus a separate Li-ion battery that can be charged overnight from an outlet at 120v or 240v.
I know two people who have had battery degradation in their Priuses, but in their case it was just 1 or 2 cells that crapped out, not the whole battery. They were able to fix it themselves, which sounded like the sort of thing that isn't too hard if you're moderately handy.
Yes, I replaced one bad cell in my 2006 Prius. I got the cell on eBay for $20 and it took a few hours to remove the traction battery, replace the cell and put the traction battery back in. Add an hour if you clean the copper battery connectors. The battery has worked fine now for 10 months. Now my biggest problem is body rust.
If those patents hadn't been granted, or had been shorter lived, or if antitrust law were more robust, we might have seen a lot more investment in optimizations on and variations of the NiMH chemistry.
I guess part of the reason is that the "M" (metal) in "NiMH" is in practice usually a lanthanum-based alloy, and lanthanum is a rare earth element which happens to be used in catalysts facilitating oil refinement.
Of course catalysts are consumed slowly, but you still don't want any disturbances in the supply of a key element.
By those living in the delusion that the US will be 100% EV production in the next few years when the charging infrastructure is woefully inadequate. No one is questioning EV drivetrains but there are very real logistical and electrical supply issues that are being ignored.
California can barely keep the lights on as it is. This doesn't even scratch the surface that most are not the privileged few with at-home charging.
Nobody thinks EVs will replace 100% of vehicles in the next few years. This is the premise I constantly hear from people trying to argue that the power infrastructure isn't ready. EV ownership, charging availability, and power infrastructure are all growing slowly and in tandem and there's no reason to expect otherwise.
Nobody does, except some people. The belief that EVs will promptly replace the entire fleet is implicit in the claim that EVs can solve short-term greenhouse gas emissions goals by 2030, 2035 or 2040. Such claims are often used to naysay pedestrian, bicycle and transit projects that have a higher likelihood of achieving those greenhouse gas emissions goals. Obviously the belief is ridiculous because even if EVs were 100% of fleet sales today, they would not be 100% of the fleet on the road for a long time. And in America they are not remotely close to 100% of sales.
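The fleet-turnover arithmetic is easy to sketch. The numbers below are invented round figures, not real statistics: a fleet of 1,000 cars where 60 are retired and replaced each year, and where every EV sold displaces one retired non-EV.

```python
def years_until_all_ev(fleet_size: int, sales_per_year: int, ev_share: float) -> int:
    """Years until no non-EVs remain, in a deliberately simple replacement model."""
    ev_sales = round(sales_per_year * ev_share)
    if ev_sales <= 0:
        raise ValueError("with no EV sales the fleet never turns over")
    non_ev = fleet_size
    years = 0
    while non_ev > 0:
        non_ev -= ev_sales  # each EV sold replaces one retired non-EV
        years += 1
    return years


print(years_until_all_ev(1000, 60, ev_share=1.0))  # every new car is an EV
print(years_until_all_ev(1000, 60, ev_share=0.1))  # one new car in ten is an EV
```

With these made-up inputs, even 100% EV sales takes 17 years to turn over the whole fleet, and a 10% share takes 167.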
Are you sure you didn’t exaggerate? Any examples of those who believe EV will be 100% sales, or that EV can solve greenhouse gas emission goals in the next 15 years?
I’ve never heard of that, and even in countries like Norway where new car sales are predominantly EVs, it will be a long time until it can replace the entire fleet.
I am sure there are loads of people in America who say we don't need to do [thing] because EVs are coming. I don't press these people on their arithmetic because they aren't arguing in good faith, they just hate [thing].
If I understand correctly, Hugging Face is exploring approaches to tuning the output quality of a given model by tuning how long to let it run.
Normally when you run an LLM, you set your prompt and whatever tunable parameters, and the LLM software (e.g. llama.cpp) spits out tokens at whatever rate it can. If you want higher quality, you run a bigger model (though you're limited by the amount of memory you have available). If you want higher speed, you run a smaller model. Hugging Face seems to be looking at ways to make this tradeoff without switching between different models.
They show Llama 3.2 1B with chain-of-thought outperforming Llama 3.1 8B, and 3.2 3B outperforming 3.1 70B. It’s less clear whether actual inference time is faster for CoT 3B with 256x the generations than for 70B, if you have enough RAM. Basically a classical RAM/compute tradeoff.
From a practical standpoint, scaling test-time compute does enable datacenter-scale performance on the edge. I cannot feasibly run 70B on my iPhone, but I can run 3B even if it takes a lot of time for it to produce a solution comparable to 70B's 0-shot.
I struggle with this idea of "run it long enough", or another description I have heard, "give the model time to think". It's not a thing; it takes as long as it takes. What I'm taking away from this is two things:
1. the reason for generalizations like 'long enough' and 'think more' is apparently that the methods are somewhat obscure
2. those methods are being explored by Hugging Face to make them less obscure
Am I getting that right? I have been struggling to see past the metaphors and understand exactly what additional computation is being done, and here I read it's something like multiple guesses being fed back in and chosen among, which means it's just multiple inferences in series that are all related to solving one problem.
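One of the simplest versions of "multiple guesses, chosen among" is best-of-N sampling: run the same prompt N times and keep whichever answer a separate scorer rates highest. The sketch below is only a toy illustration of that idea, not necessarily what Hugging Face did. `toy_generate` and `toy_score` are hypothetical stand-ins for a small model and a verifier.

```python
import random
from typing import Callable


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int) -> str:
    """Sample n candidate answers and return the one the scorer rates highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))


# Hypothetical stand-ins: a "small model" that guesses noisily around the right
# answer, and a "verifier" that prefers guesses closer to it.
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    return str(42 + rng.randint(-5, 5))


def toy_score(prompt: str, answer: str) -> float:
    return -abs(int(answer) - 42)


for n in (1, 4, 64):
    print(n, best_of_n("What is 17 + 25?", toy_generate, toy_score, n))
```

The extra compute is just `n` ordinary inference runs plus `n` scoring calls. A larger `n` gives the scorer more candidates to pick from, which is where the quality gain would come from.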
The American healthcare system is responding to the incentives that are in place, and those incentives are designed (if not intended) to drive up costs.
Doctors can be sued for malpractice if they miss something, so they're incentivized to provide as much care as possible, regardless of cost.
Hospitals and other healthcare organizations are likewise incentivized to charge as much as they can, since the bills are passed on to insurance companies.
Insurance companies have every incentive to lower the bills they pay. They negotiate volume discounts with healthcare corporations, and the billing system that gets worked out between these large organizations has little connection to the actual cost of healthcare provided to consumers.
For most consumers, insurance coverage is connected to employment, so it's very difficult to shop around. It could mean leaving one job for another, just to get different health coverage.
Inefficiencies abound. We can do better, but there is not a simple solution to this.
You could probably eliminate a lot of these inefficiencies by eschewing any sense of morality or fairness and simply letting more people die. Society isn't quite ready for this level of dystopia, but it's the direction we're headed if we don't change course. Nationalizing the US healthcare system may be the best answer, but it is guaranteed to be enormously complicated, and not guaranteed to improve outcomes.
> Inefficiencies abound. We can do better, but there is not a simple solution to this.
A single-payer system is the conceptually simple solution to this. We could get there either by expanding Medicare to cover everyone (M4A), or we could add a public option to the ACA that is priced and funded to outcompete the private options.
In practice there would be enormous complications in managing the transition, but the evidence is that it would improve both costs and outcomes unless it is being sabotaged by an austerity-focused government (like the NHS in the UK).
The History of English podcast is worth a listen. It's about the development of the English language, so it covers a lot of history and prehistory, and also linguistics. The presenter Kevin Stroud has a deep passion for the subject matter. Unfortunately, he also has a tendency to repeat himself and over-explain simple examples so the effect can be somewhat soporific.
I'd be very careful what you let others tell you about Early Christianity - almost exclusively they will explain to you how whatever belief they have now came out of it and compare the two and determine today we have truth...
Early was earlier tho - the people that twist our recent revelations about the content of the testaments to support the perverted teaching today are literally twisting the closest text in existence to Jesus himself, to "explain" (pervert) a statement Jesus says in the Gospel of Thomas - a text predating all Gospels. They may actually be perverting words truly spoken by Jesus, which is "leading astray" the truth that has come out from the millennia - as he said it would...
Not one sentence of this have I heard someone speak or have I read on someone's page.
I am not sure what you mean by your hint. Stating that the Gospel of Thomas predates all other Gospels is not a view that is shared by many scientists.
Well, it actually is commonly accepted either as the "Q Text" - or as possibly the "Q" text. The "Q" text is the single document long speculated to be the primary source for the 4 Gospels in the Bible, and that was before we had the Gospel of Thomas as we do now - nothing at all to do with a conspiracy theory that shares the letter.
Anyways, I've also read the Gospel of Thomas - it most certainly predates the other Gospels.
If it doesn't, the others have been so edited and changed over time that they've been rendered more likely to be a copy than an original work.
It's also fantastic and Jesus himself says things the Church doesn't want you to believe - hence why a book comprised exclusively of the sayings of Jesus Christ (many of which are in the other Gospels) isn't in the Bible, bc THAT Jesus preached a different Christianity than the one we have. Weird...
It's one of my favorite texts in general. Super easy read... perhaps think for yourself?
I mean fr, scientists made up dark matter and dark energy bc by their own realization, their maths didn't work - so they invent an invisible thing to make their maths work and it now turns out, the maths did work, if the universe was simply older than they accounted - they obviously didn't consider that or I would know all about how embarrassingly stupid they have been to avoid looking somewhere they have "decided" the answer already. They just pulled it out their ass - it's in textbooks, I doubt it exists at all but the people with papers on it will keep on as "a theory" until they die, solely out of pride.
That's just my one example of today. You're as capable as almost every other person - let's discuss when you've read it.
This of course depends on your budget and what you expect to do with these models. For a lot of people, the most cost-effective solution is probably to rent a GPU in the cloud.
The limiting factor for running LLMs on consumer grade hardware is generally how much memory your GPU has access to. This is VRAM that's built into the GPU. On non-Apple hardware, the GPU's bandwidth to system RAM is so constrained that you might as well run those operations on the CPU.
The cheapest PC solution is usually second-hand RTX 3090s. These can be had for around $700 and they have 24 GB of VRAM. An RTX 4090 also has 24 GB of VRAM, but they're about twice as expensive, so for that price you're probably better off getting two 3090s than a single 4090.
Llama.cpp runs on the CPU and supports GPU offloading, so you can run a model partly on CPU and partly on GPU. Running anything on the CPU will slow down performance considerably, but it does mean that you can reasonably run a model that's slightly bigger than will fit in VRAM.
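A back-of-the-envelope way to decide how many layers to offload (llama.cpp takes this as the `--n-gpu-layers` option) is to assume all layers are about the same size and see how many fit in the VRAM left after reserving some for the context cache. The helper name, the 2 GB reserve, and the example model numbers are all assumptions for illustration.

```python
def layers_that_fit(total_layers: int, model_size_gb: float,
                    vram_gb: float, reserve_gb: float = 2.0) -> int:
    """Estimate how many equally sized layers fit in VRAM after a reserve."""
    per_layer_gb = model_size_gb / total_layers
    usable_gb = max(0.0, vram_gb - reserve_gb)
    return min(total_layers, int(usable_gb // per_layer_gb))


# Hypothetical example: a 40 GB model with 80 layers on a 24 GB card.
print(layers_that_fit(total_layers=80, model_size_gb=40.0, vram_gb=24.0))
```

In practice you'd probably start from an estimate like this and then nudge the number up or down until the model loads without running out of memory.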
Quantization works by trimming the least significant digits from the models' parameters, so the model uses less memory at the cost of slight brain damage. A lightly quantized version of QwQ 32B will fit onto a single 3090. A 70B parameter model will need to be quantized down to Q3 or so to run entirely on a 3090. Or you could run a model quantized to Q4 or Q5, but expect only a few tokens per second. We'll need to see how well the quantized versions of this new model behave in practice.
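To get a feel for the sizes involved, here's a minimal sketch that estimates memory for the weights alone as parameters times bits per weight. The function name is made up, and treating "Q4" as exactly 4 bits per weight is a simplifying assumption.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for params, bits in [(32e9, 16), (32e9, 4), (70e9, 4), (70e9, 3)]:
    size = weight_memory_gb(params, bits)
    print(f"{params / 1e9:.0f}B at {bits} bits: {size:.2f} GB")
```

By this estimate a 32B model at 4 bits needs about 16 GB, which fits in 24 GB. A 70B model at 3 bits comes out around 26 GB, so fitting one entirely on a single 24 GB card may take more aggressive quantization than that, or a bit of CPU offload. Real quantized files also carry some scaling data and the context cache needs room too, so actual usage will be somewhat higher.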
Apple's M1-M4 series chips have unified memory so their GPU has access to the system RAM. If you like using a Mac and you were thinking of getting one anyway, they're not a bad choice. But you'll want to get a Mac with as much RAM as you can and they're not cheap.