> the documentation is poorly written (all LLM vendors seem to have an internal competition in writing confusing documentation).
This is almost certainly because they're all using LLMs to write the documentation, which is still a very bad idea. The MCP spec [0] has LLM fingerprints all over it.
In fact, misusing LLMs to build a spec is much worse than misusing them to avoid writing good docs because when it comes to specifications and RFCs the process of writing the spec is half the point. You're not just trying to get a reasonable output document at the end (which they didn't get anyway—just try reading it!), you're trying to figure out all the ways your current thinking is flawed, inadequate, and incomplete. You're reading it critically and identifying edge cases and massaging the spec until it answers every question that the humans designing the spec and the community surrounding it have.
Which means in the end the biggest tell that the MCP spec is the product of LLMs isn't that it's somewhat incoherent or that it's composed entirely of bullet lists or that it has that uniquely bland style: it's that it shows every sign of having had very little human thought put into it relative to what we'd expect from a major specification.
DeepSeek's documentation has a different problem, which is that there are spelling errors and weird grammatical constructions everywhere:
"DeepSeek API does NOT constrain user's rate limit. We will try out best to serve every request. However, please note that when our servers are under high traffic pressure, your requests may take some time to receive a response from the server. During this period, your HTTP request will remain connected, and you may continuously receive contents in the following formats..."
The documentation is still mostly easy to read, so it doesn't *really* matter, but I always thought this was bizarre. I mean, I get the language barrier reading manuals from Chinese products off of Amazon or whatever, but this is a company that does nothing but work with language all day long, and even at one point had the world's leading English-speaking language model. Shouldn't they be able to produce professional-looking documentation without spelling and grammatical errors?
"does NOT constrain user's rate limit" should be "does NOT rate limit incoming requests" or similar.
"We will try out best" should be "our best".
"when our servers are under high traffic pressure" is at least grammatical, but it's awkward. Normally you'd say "when our servers are dealing with high load" or something similar.
"your requests may take some time to receive a response from the server" is again grammatical but also awkward. "Our response times may be slower" would be more natural.
The last sentence is also awkward but the whole thing would need to be restructured, which is too much for an HN comment.
Basically: everything about this screams English as a second language. Which does mean that it's unlikely to have been LLM generated, because from what I've seen DeepSeek itself does a pretty good job with English!
I'm a native English speaker, and I partially disagree with your claims of awkwardness.
"when our servers are under high traffic pressure" - this is a bit awkward I agree, but only the last three words.
If we rearrange it to "when our servers are under pressure from high traffic", I think it sounds good. It's using a metaphor, and I think that should be encouraged. It's interesting. And the phrase "high traffic" conveys some drama.
"your requests may take some time to receive a response from the server" - I think that's fine, to be honest. I like it.
I think you are conflating "awkwardness" with linguistic flair. Technical documentation English has become standardised to a large degree, which of course is useful, and efficient. But it is also a narrow usage of English, and breaking out of its straitjacket does not make language awkward.
That’s a very generous interpretation. I don’t know Mandarin but these are likely a transfer of grammar constructs from the primary language to English, in the same way the Dutch will say “make a picture” or “the house of my parents”, which can be justly classified as awkward rather than as linguistic flair.
If someone was editing my writing, it would feel a bit patronizing if they said grammar mistakes (many of which come from my mother tongue Portuguese) are “adding flair”, as they are not a stylistic choice.
I'm not claiming it was intentional on their part. My point was solely one of language, so how the sentence came to be written that way is out of scope. And given the word swap I suggested, I don't think it is awkward at all (unlike your examples from Dutch, which definitely are).
As for it being patronising, why is telling a non-native speaker their sentence is interesting unacceptable, but telling them it's awkward is ok? (Assuming both are genuinely held opinions).
I'll reiterate my point that common English usage (non-awkward?) has narrowed enormously in the last 50 years. I think that this is a bad thing.
Your point about how the social norms for English have changed over the last 50 years could be interesting, but what does it have to do with the parent's point of "these docs seem human written and not spell checked which is very different from the other ai companies AND which is weird anyway for a megacompany with ai tools that write English well"?
Which part is it that has linguistic flair? Is it "The prices listed below are in unites of per 1M tokens", or "The expense = number of tokens × price"? Or maybe "you may continuously receive contents in the following formats"?
This sentence is a good example where the native speaker's version is worse (in this case because it's just nonsense, as the parent commenter already pointed out).
Sounds like you're the kind of person who will insist to Spanish speakers that a double negative is logically incoherent. Good luck with that approach to language!
When you remark on improvements, up is generally better and down is generally worse. So saying "response times will be higher" gives an immediate sentiment of improvement. But, obviously, a moment's thinking helps you re-orient and realize it's better. This is why plots often have "lower is better" in the legend, to help readers understand.
I often use 'slower' and 'faster' as a native speaker to help reinforce the meaning of the direction.
"Response times will be higher" sounds very confusing as a way of saying we'll take less time to respond, right? So why should "response times will be lower" mean we'll take more time if the opposite construct is confusing?
Far better to just use the comparative forms that we already have for time specifically to make it perfectly clear.
If your problem is that the texts you quoted were not written by someone with English as the first language, I tell you: English is not the framework of human civilization, it sometimes uses English for data quantization and message passing.
A lot of native English speakers hold assumptions such as:
- any academic topics are universally discussed in English/Latin and so every highly educated person shall speak good English,
- language is like a thin wrapper over a to-be-converted-to-YAML common intermediate language (Universal Grammar theory),
- anything should translate into fluid English with intent completely intact,
- but WWW is >90% English anyway,
- etc.
None of these are true, and it's just not realistic for a well-educated East Asian (a common theme of East Asian languages is that they're all custom implementations with minimal sharing with neighbors, let alone English) to "just" pick up natural English. I suppose you're looking for something like the following:
"At DeepSeek, we strive to serve every request to our customers with best of our effort, and we do not impose a rate limit for our APIs. However, do note that due to finite nature of our computing resources, API responses might become delayed in cases when our backend is experiencing high load. Under such circumstances, the HTTP sessions will be kept alive, and response will be served in following formats..."
... Isn't this a $1m/yr skill on its own? Have you seen a great Far East engineer write like this - I mean, how often do you come across a Far Eastern translator that can casually do this?
Let's not pretend that foreign-language technical ghostwriting is a solved problem! You guys are asking for complete rewrites of all documentation by someone who is explicitly NOT a native Chinese speaker. At some point that's just an unreasonable ask.
A lot of HNers put blind trust in Universal Grammar theory and downplay languages as all but obsolete human output packing formats that are each no more than header differences, and that's just wrong. Languages are at least CODECs. And if you go back to the original topic from there, I don't think it will sound so unreasonable that translating between different CODECs will induce losses and artifacts.
DeepSeek API does NOT have rate limits.
However, when our servers are under high traffic,
your requests may take some time. During this period,
you will continuously receive the following responses:
I've seen documents that were applications by CCP-affiliated provincial government bodies, things like detailed studies for loan applications to international banks, etc., and trust me, the DeepSeek documentation is miles ahead of that. These are official government documents from one government agency to some international agency.
This was about Amazon products rather than government documentation, but the point is the same. I'll just quote the relevant part:
> The people who make these products have to spend millions and millions of dollars setting up factories, hiring people, putting things into production, etc. But somehow they don't have a budget for a bilingual college student intern to translate a bunch of copy to English better than "using this product will bring a great joy." Why?
> I will make a super strong claim: ChatGPT can now do nearly perfect mass translations of this stuff for free, in theory simultaneously increasing translation quality and reducing costs. Despite this, for whatever reason, I predict that the average translation quality on Amazon won't improve within the next few years.
My super strong claim has so far been correct. Just go on Amazon.com and click just about anything. For instance, here's a random blanket: https://www.amazon.com/dp/B07MR4FSPT
"OPTIMUM GIFT: All people can use this flannel fleece blanket in Coach、Office、Bed、Study, etc. Reversible softness offers all seasons warmth.
INTIMATE SERVICE: If you have any questions, please contact us. it is our pleasure to serve you."
How does a human being in this situation somehow invent the phrase "OPTIMUM GIFT?" "Optimum" is a fairly advanced English word. Maybe you'd expect, I dunno, "GREAT GIFT" or "BEST GIFT"? And "INTIMATE SERVICE?"
And once again, we now have magic English-speaking computers that can do this all for us - for free - and China has unanimously decided "nah, screw that. We'd rather go with INTIMATE SERVICE."
I live in Japan, and when you read English texts here (it doesn’t really matter if it’s a restaurant menu, a pamphlet at a touristic area, a flyer for local government services…) the same English word will often be written differently within the same document (eg for a recent one I saw: “curbside” was spelled “crubside” and “carbside”).
I always wonder how that happens, because the documents themselves often smell strongly of machine translation - but if they’re machine translated, how would those mistakes get in? My best guess is that there’s a human manually typing out a machine translation output, which kind of boggles the mind.
I think us computer nerds who are used to using computers to do work efficiently have a hard time imagining all the weird ways in which non-computer nerds actually use computers.
Sometimes they are machine translated by someone who doesn't understand the Ctrl+C shortcut if it's longer than 10 words, but equally often they're just hand kneaded. Japanese English education is effectively machine translation with human brains as machines - we're not actually taught English at all[0], we just memorize technical rulesets that yield predictable garbage. A lot of weird "Engrish" text is likely the result of that.
0: That's supposed to be drastically changing, we'll see if it does. English skills are still a resume stuffer in Japan.
I mean, it's not so easy getting a perfectly knowledgeable English speaker in China. Heck, if you see some of the interviews of the insanely viral Tony from LC Signs, his English outside of his skits is actually very Chinese-flavored. One of my friend's exes who was Chinese and studied at Oxbridge had a very similar strong Chinese twang, and often made grammatical mistakes while writing, even though the British would easily call her fluent.
[me]
> Translate to palatable startup-style English:
>> DeepSeek API 不限制用户的访问速率。我们会尽力满足每个请求。但是,请注意,当我们的服务器流量压力较大时,您的请求可能需要一些时间才能收到服务器的响应。在此期间,您的 HTTP 请求将保持连接状态,您可能会持续收到以下格式的内容……
[LLM]
> Here's a more palatable, startup-style translation of your message:
> DeepSeek API has no hard rate limits. We strive to process every request as quickly as possible. However, during peak traffic, responses may take slightly longer. Rest assured, your HTTP connection will remain active, and you may continue to receive real-time updates in the following format…
I asked it to turn it "corpospeak like":
> At DeepSeek, we prioritize accessibility and scalability—which is why we enforce no strict rate limits on API usage. Our systems are designed to handle all requests with high availability, though during peak operational loads, response times may experience nominal delays. Rest assured, your connection will remain active, and responses will continue streaming in real time with the following structure:
... This is Google Translate from GP -> DeepSeek Web. I don't think DeepSeek is a perfectly knowledgeable English speaker in China. "However, during peak traffic," is basically a word substitution on "但是,当服务器流量压力大时", if my Han Script reading is right. Parts of the corpo version like "response times may experience nominal delays." still show a Chinese accent, assuming that's the part you think must be thoroughly washed off.
What you're asking for needs an English-first bilingual human who can be trusted and has a tech background. That's quite a tall order.
Similarly strange and incorrect grammatical constructions are found in the English translations for Game Science’s hit game Black Myth Wukong. My expectations for, say, the construction manual for a bookshelf are pretty different from my expectations for a game or an AI model & service costing tens of millions of dollars in development (or more).
Heck, they could literally pay any native English speaker to take their English-ish translations and regionalize them; you don’t even need to know Chinese to fix those paragraphs. Why is this such a common problem with the English China exports? Is it cultural? Are they so disconnected from the west that they don’t realize?
A great counter-example is NetEase’s Marvel Rivals; their English translations are fantastic, and even their dev interviews with their Chinese development team are fantastically regionalized. They make a real effort to appeal to English audiences.
Sometimes I wonder if I have ADHD or if it's induced by the content, because I can spend hours soaking up interesting literature and putting my weird thoughts down onto paper but I can barely make it a few words through LLM-driven drivel.
It's crazy seeing bots posting AITA rage bait on Reddit that always follows the same pattern: some inter-personal conflict that escalates to a wider group: "I told my husband I wasn't into face-sitting and now all my colleagues are saying I should sit on his face to keep the peace."
That is one thing but using the same LLM to drive your tech specs, knowing it can say a whole lot of shit the 'author' isn't aware of, because they're illiterate and that is fucking normal... is worrying.
I let the domain "micropolisonline.com" expire, which I was using for the old OpenLaszlo/Flash Python/SWIG/C++ client/server based version of open source SimCity, and somebody took it over and replaced it with AI generated claptrap, stealing a lot of my own and others' images without any credit. It even promises the source code, but doesn't actually link to it, just has promises and placeholders.
It totally misrepresents what Micropolis is, which was based on the original SimCity classic, and confuses it with all the subsequent versions of SimCity and other made-up stuff. And it never mentions the GPL-3 license, EA's license and restrictions on the use of their SimCity trademark, or Micropolis's license to use their trademark. I have no idea what the point of it is.
>How to Access the Source Code: For those eager to explore the Micropolis Online Source Code, it is available on our dedicated GitHub repository. Visit [Link] to access the repository, where you can browse the code, contribute to ongoing projects, or initiate your own.
They could be in some legal jeopardy since they didn't mention or link to the Micropolis GPL License or the Micropolis Public Name License, which they may be violating.
They have a "Meet the Team" page that mentions nobody, just hand waves about "we" and the community. They couldn't even bother to generate generic looking fake profiles of non-existent people. Suffice it to say I never heard back from anyone after using the "Contact Us" page.
They even have a cute little Terms and Conditions page with their very own license, which doesn't allow anyone to do to them what they did to me, and is not particularly GPL-v3 compatible:
>Unless otherwise stated, Micropolis Online and/or its licensors own the intellectual property rights for all material on Micropolis Online. All intellectual property rights are reserved. You may view and/or print pages from micropolisonline.com for your own personal use subject to restrictions set in these terms and conditions.
>You must not:
>Republish material from micropolisonline.com
>Sell, rent, or sub-license material from micropolisonline.com
>Reproduce, duplicate, or copy material from micropolisonline.com
They also claim all rights to all user created content:
>By displaying Your Content, you grant Micropolis Online a non-exclusive, worldwide irrevocable, sub-licensable license to use, reproduce, adapt, publish, translate, and distribute it in any and all media.
Kind of ironic for an LLM to go around stealing people's content, then telling them that not only can't anyone copy it back, but it owns the rights to everything anyone else may contribute in the future.
In my experience AI startups are AI maximalists. They use AI for everything they can. AI meeting summarizations, AI search (Perplexity), AI to write code and contracts, AI to perform SEO, AI to recruit candidates, etc. So I 100% believe they would use AI to write specs.
I can't say whether the original spec was written with AI assistance, but having a cursory look through the commit history [0] it doesn't look like they're just blatantly auto-generating the docs. The git history indicates that they do think about the spec and manually update the docs as the spec changes.
I don't write perfect English. Far from it. But I'd prefer broken English any day over default LLM verbiage. It seems so unnatural and facetitious. I always have this in my prompts: "Be succinct and use simple English sentences".
The DeepSeek documentation seems to be better. It looks to be quickly thrown together but not bad. I’m not sure what that says about LLMs writing documentation.
It had not occurred to me that the AI coding vendors are themselves basically positively motivated to produce code that is not documented. They want code that is comprehensible to AIs but actively not comprehensible to humans. Then you need their AIs to manipulate it.
AI code as the biggest "lock you in the box" in programming history. That takes rather a lot of the luster out of it....
They'd better be right that they can get to the point that they can fully replace programmers in about two years, otherwise following this siren song will, well, demonstrate why I chose "siren song" as my metaphor. If AI code produces big piles of code that are simply incomprehensible to humans, but then the AIs can't handle it either, they'll crash out their own market by the rather disgusting mechanism of killing all their customers, precisely because the customers consumed their service.
[0] https://modelcontextprotocol.io/specification/2025-03-26