
This is really exciting! They're laying out an architecture that may mean even small players with cheap GPUs can compete with the majors. The idea implies that crowd-sourcing an open AI is probably technically feasible eventually, and Chinese researchers are actively working out how to do it to a standard that competes with the monolithic models.

I was sceptical of the US sanctions, but this seems like a real win if it can be taken all the way to its logical conclusion.






Yeah, the sanctions will (not sarcastically) actually improve the world on a number of fronts: increasing diversity of compute, forcing decentralization of manufacturing, etc.

They'll also increase smuggling, theft, espionage, crime, and sabotage.

There are much better ways to increase diversity.


PRESIDENT TRUMP: "You don’t think we can. You don’t think we do that to them? We do. So we do a lot of things." https://singjupost.com/transcript-maria-bartiromo-interviews...

Nit-pick: smuggling is when you import goods into a country without informing the relevant government bodies. When it comes to GPUs, it's one country that has declared an export ban. Chinese port authorities won't care if you declare you're importing a container with 16,000 Nvidia GPUs, as that's still legal.

This is a mistaken belief. Sure, you get all those negative aspects to varying degrees, just as you get them under all other conditions (Chinese, Russian, Israeli espionage over the last ~80 years, anyone?), but you cannot actually get diversity without the isolation that permits actual diversity to emerge.

Diversity is not pouring oil into water and then using the polluted oil-water in lieu of water and also in lieu of oil. If you want actual diversity you need differences that are separated from each other. That is precisely what has been collapsing for the last 80+ years: actual, real diversity, precisely because unique, separate groups and clusters have been shattered, scattered, mixed, and polluted.

Even AI is now accelerating this collapse of what is really a form of human biodiversity (or should it be called cultural diversity?), as AI causes a conformity of thought. There are already several reports and papers on that phenomenon.

It’s absolutely ridiculous to claim that those factors will somehow increase over the prior situation simply because we increase actual, real diversity of unique things; not this fake, fraudulent, delusional diversity that has been forced on us like a toxic sludge dump, destroying human diversity as everyone increasingly consumes the same “content” slop, eats the same food slop, and shares the same cultural and musical slop.


[flagged]


Why would the Chinese self-destructing be “amazingly helpful” to the West? This sounds like spiteful vitriol.

I mean vs. them spending it on defense, which would make maintaining US military parity more difficult.

[flagged]


Jesus. What a steady consumption of American neocon propaganda can do to a human brain! It's so sad!

Do you even know what that word means? Do you think that Taiwan is gonna be just fine if the US packs up and leaves tomorrow? That things will work out great for the people living there?

You can call it whatever you want. People who have fled shitty regimes have a much better sense for propaganda than you do, evidently.


> Do you think that Taiwan is gonna be just fine if the US packs up and leaves tomorrow?

I do. The world will be just fine as the American empire fades and the US becomes just another country. Even for the American people, for the average person, it will be an improvement.


>> ... spends the normalized equivalent of America’s defense spending...

I'd be interested in seeing the numbers behind that claim broken down if you can cite them. From napkin math it seems hard to make the budgets line up, unless we're applying a very large purchasing-power-parity adjustment?
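
For concreteness, here's the napkin math I mean; every input figure below is a rough ballpark I'm assuming for illustration, not a citation:

  # Back-of-the-envelope check; all figures are rough assumptions,
  # not sourced claims -- swap in your own numbers.
  us_defense_usd  = 850e9   # ballpark annual US defense budget
  cn_security_rmb = 1.4e12  # ballpark Chinese domestic-security budget, RMB
  market_rate = 7.2         # RMB per USD, market exchange rate (rough)
  ppp_rate    = 4.2         # RMB per USD at purchasing power parity (rough)

  at_market = cn_security_rmb / market_rate
  at_ppp    = cn_security_rmb / ppp_rate
  print(f"market rates: ${at_market/1e9:.0f}B ({at_market/us_defense_usd:.0%} of US defense)")
  print(f"PPP adjusted: ${at_ppp/1e9:.0f}B ({at_ppp/us_defense_usd:.0%} of US defense)")

Even with the full PPP adjustment, that lands well under half of the US defense budget, which is why I'd like to see the source.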


Such numbers circulate mainly inside Chinese-language media and social media in the form of screenshots with no links. Screenshots are a common way of hiding the source for this type of information, because a link would disclose which outlet spread it, and a normal Chinese audience would then know how credible it is. To give an example, the Epoch Times is a common source of this type of information, and the nature of that outlet is well known to Chinese audiences.

The real equivalent of the US defense budget in terms of size is actually the infrastructure construction budget. While both budgets boost the economy, the infrastructure budget improves the lives of local people. Now that most cities in the coastal areas have run out of projects to build, the overcapacity cultivated in earlier years is being poured in other directions: rural areas, undeveloped provinces, and even overseas, especially Africa and Latin America. Having visited some rural areas myself, it's amazing how fast China changes year by year.

Ironically, this infrastructure building sounds like a Chinese MAGA to me: mind our own business and focus on improving ourselves instead of spreading values to other countries.


I'm struggling to see which is worse: using AI to police their own people, or using AI for genocide in the Middle East.

> China spends the normalized equivalent of America’s defense spending on suppressing their own citizens.

I don't believe you.

> From a western standpoint, this is amazingly helpful because it’s a form of Chinese self destruction and waste.

As a Westerner, as a human, I reject this zero-sum mentality.


The sanctions will (not sarcastically) massively harm the world, because Nvidia may no longer be a free-money cheat code. I like having an easy economic strategy for investing...

The world doesn’t have to optimize policy to increase the profits of a single American company.

Chinese stocks are pretty reasonable right now; if their market has dealt with the insider-trading mess, it might be a good time to get in. It isn't for the faint of heart, however.

Markets used to be places that made money smarter (efficient allocation of capital), but they have somehow degraded into index-fund buying that tracks the average growth of a few hot stocks that are expected to at least not go cold anytime soon.


>The idea implies that eventually crowd-sourcing an open AI is probably technically feasible

It's already technically feasible: https://www.primeintellect.ai/blog/intellect-2



DeepSeek-R1 is at the level of GPT-4.1 already; it's open-weight, open-source, and they even open-sourced their inference code.

I don't know why everyone keeps echoing this; my experience with DeepSeek-R1, from a coding perspective at least, has been underwhelming at best. I've had a much better experience with GPT-4.1 (and even better with Claude, but that's a different price category).

I'm not arguing about which model is better for your use case. I'm saying that in general it's as "powerful" as GPT-4.1 in a lot of benchmarks, and you can peek under the hood and even improve it for your specific use case.

Do you mean V3? V3 is 4.1 level or above.

A lot of software (e.g. ollama) has confusingly named DeepSeek's distills/finetunes of other base models "DeepSeek-R1" as well. See e.g. https://www.threads.com/@si.fong/post/DKSdUOHzaBB

I wonder whether you're actually running the proper DeepSeek-R1 model, or one of those lesser finetunes?
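
One quick way to check is to ask ollama what the tag actually resolves to; a sketch (the model tag here is just an example, and the report layout varies by ollama version):

  # Ask ollama to describe the locally installed "deepseek-r1" tag.
  # The printed report includes the model's architecture/family and
  # parameter count: a 7-14B qwen2/llama-family model is a distill,
  # while the real R1 is a ~671B DeepSeek-architecture MoE.
  import subprocess

  result = subprocess.run(["ollama", "show", "deepseek-r1"],
                          capture_output=True, text=True, check=True)
  print(result.stdout)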


In my experience, all reasoning models feel (vibes-wise) worse at structured output like code versus comparable non-reasoning models, but far better at knowledge-based answering.

That's everyone's experience with every model.

People sang praise from the roof for Google's Gemini 2.5 models, but in many things for me they can't even beat Deepseek V3.


What would be an example of 2.5 Pro failing against R1 (which is what you'd actually want to compare it to)?

R1 sometimes fails against V3 for me too, so it's not a specific dig against Gemini.

In terms of code and science, Gemini is way, way too verbose in its output, and because of that it ends up confusing itself and hurting quality over longer context windows.

R1 does this too, but it poisons itself in the reasoning loop. You can see it during streaming: it literally criss-crosses its thoughts and thinks itself into loops before it finally arrives at an answer.

On top of that, both R1 and Gemini Pro / Flash are mediocre at anything creative. I can accept that from R1, since it's mainly meant as more of a "hard sciences" model, but Gemini is meant to be an all-purpose model.

If you pit Gemini, Deepseek R1 and Deepseek V3 against each other in a writing contest, V3 will blow both of them out of the water.


Agreed on the last point, V3 is terrifyingly good at narrative writing. And yes, R1 talks itself out of correct answers almost as often as it talks itself into them.

But in general 2.5 Pro is an extremely strong model. It may lose out in some respects to o3-pro, but o3-pro is so much slower that its utility tends to be limited by my own attention span. I don't think either would have much to fear from V3, though, except possibly in the area of short fiction composition.


I got the impression that o3-mini or o3-mini-high were meant for coding? GPT-4.1 was meant for creative writing, not coding?

It’s good at a lot of things:

  GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.
https://openai.com/index/gpt-4-1/

They are trained on these "benchmarks"; that's why they score better.

If they were trained on those benchmarks they would score 100%

That shows how bad they are: they can't even score 100% on benchmarks they were trained on.

[flagged]



Wasn't it shown recently that the filtering layer sits on the prompt input and LLM output, not in the training set or model weights?

https://www.socialscience.international/making-deepseek-spea...


It depends on the model, probably, but there are multiple layers of censorship, one of which is the post-facto nuking these models will do online; that one goes away "for free" when you download the open-weight model.

I don't have a powerful enough system to run DeepSeek, but I've tried this with some of the Qwen3 models. They'll write answers that discuss Xi Jinping (which results in an auto-nuke of the reply from Chinese-hosted models, at least DeepSeek) or other very mildly/nominally sensitive topics.

(This is probably a coarse measure to easily ensure compliance with a recent national-security law that requires commercial providers of web services to address sensitive topics "appropriately", or something like that, given that LLMs run non-deterministically. That's why this layer of censorship often comes across as laughably extreme: it's a compliance strategy that exceeds the demands of the law for the sake of guaranteeing legal safety from an unpredictable software system.)

But the same models will altogether refuse to discuss the Tiananmen Square Massacre, even locally.

Some "decensored" versions of the Qwen3 models will discuss the Tiananmen Square Massacre, but in a very concise, formulaic, "official" way. After some chatting about it, it fell into an infinite repetition of one of its short formulaic answers (a behavior I didn't see with the original Qwen3 models with the same settings).


FWIW, I've downloaded DeepSeek's R1 model weights (DeepSeek-R1-0528, which was released after your linked article) and ran them locally. I asked it what happened in Beijing on 1989-06-04, and it basically gave me a stern statement that could have been written by the CCP propaganda department. I asked it to give alternative views besides the CCP perspective, but it simply continued to stonewall me.

So yeah, the model itself is tuned at least somewhat to refuse to talk about politically sensitive things. It's not just another filter.


A SETI@home-style peer-to-peer open GPU training network is something I'm looking into as well.

Possible, and it has been done, but it's super slow and inefficient, resulting in long training times for small models. To keep the compute occupied you need to pass gradients around very fast.


This is what piqued my interest in the first place

Yes, but could you break it up into chunks of gradient sets to compute? I know each node needs the full chunk to compute a set. Again, this is something I'm exploring, but it's ultimately no different from having the full dataset on disk and scaling out compute nodes in read-only mode. A rough sketch of the direction I mean is below.
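
The usual answer here is some flavor of "local SGD": each peer takes many optimizer steps on its own data shard, and only parameters are exchanged occasionally, trading communication for staleness. A toy sketch of that idea (everything here is illustrative; the numbers and shard layout are assumptions, not any particular project's protocol):

  # Toy "local SGD" sketch: K peers each take H local steps on their
  # own data shard, then average parameters once per round. All names
  # and numbers are illustrative assumptions.
  import numpy as np

  rng = np.random.default_rng(0)

  # Synthetic linear-regression data, sharded across peers.
  true_w = rng.normal(size=8)
  X = rng.normal(size=(4096, 8))
  y = X @ true_w + 0.01 * rng.normal(size=4096)

  K, H, lr = 4, 32, 0.01            # peers, local steps, step size
  shards = np.array_split(np.arange(4096), K)
  w = np.zeros(8)                   # shared starting point

  for _round in range(20):
      local = []
      for k in range(K):
          wk, idx = w.copy(), shards[k]
          for _ in range(H):        # H steps with NO communication
              batch = rng.choice(idx, size=64)
              grad = X[batch].T @ (X[batch] @ wk - y[batch]) / 64
              wk -= lr * grad
          local.append(wk)
      w = np.mean(local, axis=0)    # one parameter exchange per round

  print("distance to true weights:", np.linalg.norm(w - true_w))

The trade-off: one exchange per round instead of one per step, so slow links become usable, at the cost of gradient staleness that tends to hurt convergence as H grows.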

I suppose it's exciting, but whether it's a good thing depends entirely on how much you think AI technologies pose existential threats to human survival. This may sound hyperbolic, but serious people are seriously thinking about this and are seriously afraid.





