The MCP server technically doesn't support DCR. The authorization server for the MCP server does, which is a minor distinction.
Have you seen significant need for this? I've been trying to find data on things like "how many MCP clients are there really" - if it takes off to the point where everything is an MCP client and dynamically discovers what tools it needs beyond what it was originally set up for, sure.
... but why is not responding to a request for zero retention today better than not being able to respond to a future request? They're basically already saying no to customers who request this capability that they said they support, but their refusal is in the form of never responding.
And you're lucky if it's two. If they're not familiar with the rendering engine/docs system/API platform/middleware etc. (or just think they aren't), or have low confidence in a debt-ridden platform, they'll spend five minutes a day for a week or three theorizing about how it could have gone wrong, debating whether, actually, it's correct the way it is, refreshing their knowledge of deep internals, debating whether a fix could break something, and so on. Way safer to do that than risk making things worse in an uncertain system.
I agree it doesn't work for all cases. In our case, some services have complex, service-specific access control logic that's hard to express declaratively in a token, so we also have to make some checks by consulting the DB. I don't think that's usually a problem (performance-wise), because we already need to contact the DB anyway - to retrieve the entity to work with (which has, say, an OwnerID property). The access token helps reduce DB load by skipping general checks ("can the user access calendars in principle?"), and for users who can, we then consult the DB additionally, if the service requires it ("is the user actually the owner of this calendar?" or any other additional access logic). The general check ("can the user access calendars in principle?") also lets us hide menu items / return 403 in the UI immediately with zero DB or cache cost.
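A rough sketch of that split - all names here (AccessToken, the "calendars:read" scope, fetch_calendar) are made up for illustration: the coarse check comes straight from the token, and the fine-grained check piggybacks on the DB read we had to do anyway.

```python
from dataclasses import dataclass

@dataclass
class AccessToken:
    user_id: int
    scopes: set[str]          # coarse-grained claims carried in the token itself

def get_calendar(token: AccessToken, calendar_id: int, db):
    # General check straight from the token: no DB or cache hit needed, and the UI
    # can hide menu items / short-circuit with a 403 on the same basis.
    if "calendars:read" not in token.scopes:
        raise PermissionError("user cannot access calendars at all")

    # We have to load the entity anyway, so the fine-grained, service-specific
    # check rides along with that same DB round trip.
    calendar = db.fetch_calendar(calendar_id)   # hypothetical DB helper
    if calendar.owner_id != token.user_id:
        raise PermissionError("user is not the owner of this calendar")
    return calendar
```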
We don't need packages or functions, we can just write the code ourselves, every time it's needed.
We don't need compilers, the computer understands assembly already.
It's a mismatch of use - MCP isn't just about telling the LLM about the APIs. It's also about encapsulating and simplifying them so the LLM isn't manually figuring them out every single time from scratch.
OpenAPI specs provide everything the LLM needs to understand an API. And if it can be simplified for the LLM while retaining functionality, why not simplify the API altogether, instead of implementing yet another entry point into the application? Just so nobody has to sit down for longer than the duration of a TikTok and design their API properly?
The point above about enterprise glue is why this is a pull model.
In your push model, the onus is on you to go find the scan from one of five backends, traverse whatever hoops of access are needed, and actually handle the files manually.
In the pull model, each backend implements the server once, the LLM gets connected to each one once, and you have one single flow to interact with all of them.
It is interesting that the model I am proposing inverts many people's expectations of how LLMs will benefit us. In one vision, we give a data lake of information to LLMs, they tease out the relevant context and then make deductions.
In my view, we hand craft the context and then the LLM makes the deductions.
I guess it will come down to how important crafting the relevant context is for making useful deductions. In my experience writing code with LLMs, effectiveness increases when I very carefully select the context and goes down when I let the agent framework (e.g. Cursor) figure out the context. The ideal case is obviously when the entire project fits in the context window, but that won't always be possible.
What I've found is that LLMs struggle to ask the right questions. I will often ask the LLM "what other information can I provide you to help solve this problem" and I rarely get a good answer. However, if I know the information that will help it solve the problem and I provide it to the agent then it often does a good job.
> In my view, we hand craft the context and then the LLM makes the deductions.
We (as in users) provide the source material and our questions, the LLM provides the answers. The entire concept of a context is incidental complexity resulting from technical constraints, it's not anything that users should need to care about, and certainly not something they should need to craft themselves.
But it makes a radical difference to the quality of the answer. How is the LLM (or collaboration of LLMs) going to get all the useful context when it’s not told what it is?
(Maybe it’s obvious in how MCP works? I’m only at the stage of occasionally using LLMs to write half a function for me)
In short, that's the job of the software/tooling/LLM to figure out, not the job of the user to specify. The user doesn't know what the context needs to be; if they did and could specify it, then they probably don't need an LLM in the first place.
MCP servers are a step in the direction of allowing the LLM to essentially build up its own context, based on the user prompt, by querying third-party services/APIs/etc. for information that's not part of their e.g. training data.
Touching on tenancy and the "real" gaps in the spec does help push the discussion in a useful direction.
https://vulnerablemcp.info/ is a good collection of the immediately obvious issues with the MCP protocol that need to be addressed. There are a couple of low blows in there that feel a bit motivated to make MCP look worse, but it's a good starting point overall.
At one point we were looking at moving a bunch of separate domains under a single dotless domain, due to the threatened death of third-party cookies, so that cookies could be dropped directly onto the ccTLD (think "you're logged into the entire TLD"). As the owners of the ccTLD it felt like a neat use that technically could work, but ICANN and other groups are explicitly against it.
I think done well, AOL keywords are actually a good idea.
They could also cut down on the fraudulent websites out there.
Not sure how to fully implement it but given the safe browsing features already implemented in web browsers it could perhaps be part of that. Or a new TLD.
I imagine they'd have all the lovely problems of both EV certs (sure, you're legitimately PayPal Corp, in Malawi) and limited real estate price squeezes.
Curation of "good" or "real" websites has been tried before - I don't envy anyone that wants to try another go at it.
In the same way that crypto folks speedran "why we have finance regulations and standards", LLM folks are now speedrunning "how to build software paradigms".
The concept they're trying to accomplish (expose possibly remote functions to a caller in an interrogable manner) has plenty of existing examples in DLLs, gRPC, SOAP, IDL, dCOM, etc, but they don't seem to have learned from any of them, let alone be aware that they exist.
Give it more than a couple of months, though, and I think we'll see it mature some more. We just got their auth patterns to use existing rails and concepts; now we just have to eat the rest of the camel.
> Give it more than a couple months though and I think we'll see it mature some more.
Or like the early Python ecosystem, mistakes will become ossified at the bottom layers of the stack, as people rapidly build higher level tools that depend on them.
Except unlike early Python, the AI ecosystem community has no excuse, BECAUSE THERE ARE ALREADY HISTORICAL EXAMPLES OF THE EXACT MISTAKES THEY'RE MAKING.
CPython in particular exposes so much detail about its internal implementation that other implementations essentially have to choose between compatibility and performance. Contrast this with, say, JavaScript, which is implemented according to a language standard and which, despite the many issues with the language, is still implemented by three distinct groups, all reasonably performant, yet all by and large compatible.
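A couple of small examples of the kind of exposed detail in question - sys._getframe is explicitly documented as a CPython implementation detail, and sys.getrefcount surfaces the reference-counting scheme - yet plenty of real-world code leans on both, which is exactly what boxes in alternative implementations:

```python
import sys

# Direct access to interpreter frame objects (documented as a CPython
# implementation detail, widely used anyway).
frame = sys._getframe()
print(frame.f_code.co_name)        # '<module>' at top level

# Exposes the reference-counting memory management scheme.
print(sys.getrefcount([]))
```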
I don’t mind those, tbh (maybe because I came to Python from C and Lisp).
On the contrary, I dislike NumPy's decision to make reduce/accumulate methods of the operations themselves (e.g. np.add.reduce and np.multiply.accumulate) rather than higher-order functions or methods of NumPy arrays.
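Concretely, the two styles being contrasted look something like this:

```python
import operator
from functools import reduce

import numpy as np

a = np.array([1, 2, 3, 4])

# NumPy hangs reduction/accumulation off the ufunc object itself...
print(np.add.reduce(a))             # 10
print(np.multiply.accumulate(a))    # [ 1  2  6 24]

# ...whereas the stdlib expresses the same idea as a higher-order function:
print(reduce(operator.mul, [1, 2, 3, 4]))   # 24
```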
map() and filter() work on any iterable, so it's not clear which type you'd expect them to be methods on (keeping in mind that the notion of "iterable" in Python is expressed through duck typing, so there's no interface like say .NET IEnumerable where they would naturally belong).
len() guarantees that what you get back is an integer. If some broken type implements __len__() poorly by returning something else, len() will throw TypeError. You can argue that this is out of place for Python given it overall loose duck typing behavior, but it can still be convenient to rely on certain fundamental functions always behaving well. Similar reasoning goes for str(), repr(), type() and some others.
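A tiny demonstration of that guarantee (the classes are made up for the example):

```python
class ProperSized:
    def __len__(self):
        return 5

class BrokenSized:
    def __len__(self):
        return "five"              # violates the protocol: not an int

print(len(ProperSized()))          # 5 -- when len() returns at all, it's an int

try:
    len(BrokenSized())
except TypeError as exc:
    print("len() rejected the bad __len__:", exc)
```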
One can reasonably argue that Python should have something similar to C# extension methods - i.e. a way to define a function that can be called like a method, but which doesn't actually belong to any particular type - and then all those things would be such functions. But, given the facilities that Python does have, the design of those functions makes sense.
I guess there's an incentive to quickly get a first version out the door so people will start building around your products rather than your competitors.
And now you will outsource part of the thinking process. Everyone will show you examples when it doesn't work.
Hey there, expecting basic literacy or comprehension out of a sub-industry seemingly dedicated to minimising human understanding and involvement is a bridge too far.
Clearly if these things are problems, AI will simply solve them, duhhh.
You joke, but with the right prompt, I am almost certain that an LLM would've written a better spec than MCP. Like others said, there are many protocols that can be used as inspiration for what MCP tries to achieve, so LLMs should "know" how it should be done... which is definitely NOT by using SSE and a freaking separate "write" endpoint.
That's LLM in a nutshell though. A naive prompt + taking the first output = high probability of garbage. A thoughtful prompt informed by subject matter expertise + evaluating, considering, and iterating on output = better than human-alone.
But the pre-knowledge and creative curation are key components of reliable utility.
Your comment reminds me that when I first wrote about MCP, it struck me as similar to COM/DCOM - and how that was a bit of a nightmare that ended up with the infamous "DLL Hell"...
Most users don't care about the implementation. They care about the way that MCP makes it easier to Do Cool Stuff by gluing little boxes of code together with minimal effort.
So this will run ahead because it catches developer imagination and lowers cost of entry.
The implementation could certainly be improved. I'm not convinced websockets are a better option because they're notorious for firewall issues, which can be showstoppers for this kind of work.
If the docs are improved there's no reason a custom implementation in Go or Arm assembler or whatever else takes your fancy shouldn't be possible.
Don't forget you can ask an LLM to do this for you. God only knows what you'll get with the current state of the art, but we are getting to the point where this kind of information can be explored interactively with questions and AI codegen, instead of being kept in a fixed document that has to be updated manually (and usually isn't anyway) and hand coded.
It's a read/write protocol for making external data/services available to a LLM. You can write a tool/endpoint to the MCP protocol and plug it into Claude Desktop, for example. Claude Desktop has MCP support built-in and automatically queries your MCP endpoint to discover its functionality, and makes those functions available to Claude by including their descriptions in the prompt. Claude can then instruct Claude Desktop to call those functions as it sees fit. Claude Desktop will call the functions and then include the results in the prompt, allowing Claude to generate with relevant data in context.
Since Claude Desktop has MCP support built-in, you can just plug off the shelf MCP endpoints into it. Like you could plug your Gmail account, and your Discord, and your Reddit into Claude Desktop provided that MCP integrations exist for those services. So you can tell Claude "look up my recent activity on reddit and send a summary email to my friend Bob about it" or whatever, and Claude will accomplish that task using the available MCPs. There's like a proliferation of MCP tools and marketplaces being built.
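For a concrete feel of that discovery-then-call exchange, here is roughly the JSON-RPC shape involved. The method and field names follow my understanding of the spec; the tool itself and all payload values are made up:

```python
import json

# What a client sends to discover an endpoint's functionality...
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# ...a server might answer along these lines (the tool is illustrative only):
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "send_email",
            "description": "Send an email to a contact",
            "inputSchema": {
                "type": "object",
                "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
            },
        }]
    },
}

# ...and when the model decides to use the tool, the client issues:
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "send_email",
               "arguments": {"to": "bob@example.com", "body": "Here's your summary..."}},
}

print(json.dumps(call_request, indent=2))
```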
Wasn't the point of REST supposed to be runtime discoverability though? Of course REST in practice just seems to be json-rpc without the easy discoverability which seems to have been bolted on with Swagger or whatnot. But what does MCP do that (properly implemented) REST can't?
> Of course REST in practice just seems to be json-rpc
That's so wrong. REST in practice is more like HTTP with JSON payloads. If you find anything similar to json-rpc calling itself REST just please ask them politely to stop doing that.
Half the point of MCP is just making it easy for an LLM to use some language in a standard way to talk to some other tool. I mean, MCP is partly a standard schema for tools to interact with and discover each other, and partly about allowing non-webserver-based communication (like stdio piping, which is especially useful since the initial use case is running local scripts).
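A minimal sketch of what that stdio transport boils down to, assuming a placeholder server command ("my-mcp-server") and newline-delimited JSON-RPC framing:

```python
import json
import subprocess

# Spawn a local MCP server as a child process and exchange JSON-RPC over its
# stdin/stdout -- no web server involved.
proc = subprocess.Popen(
    ["my-mcp-server"],            # placeholder command
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
proc.stdin.write(json.dumps(request) + "\n")    # one JSON message per line
proc.stdin.flush()

response = json.loads(proc.stdout.readline())   # read the matching reply
print(response)
```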
It is amazing that we used to prize determinism, but now it's like determinism is slowing me down. I mean, how do you even write test cases for LLM agents? Do you have another LLM judge the results as close enough, or not close enough?
What an amazing business to convince people to use. Making people pay to use LLMs to supervise the LLMs they pay for in order to get decent results is diabolically genius.
At the risk of offending some folks it feels like the genius of the Mormon church making its "customers" pay the church AND work for it for free AND market it for free in person AND shame anyone who wants to leave. Why have cost centers if you don't have to!
It's a business model I wasn't smart or audacious enough to even come up with.
I agree they should learn from DLLs, gRPC, SOAP, IDL, dCOM, etc.
But they should also learn from how NeWS was better than X-Windows because instead of a fixed protocol, it allowed you to send executable PostScript code that runs locally next to the graphics hardware and input devices, interprets efficient custom network protocols, responds to local input events instantly, implements a responsive user interface while minimizing network traffic.
For the same reason the client-side Google Maps via AJAX of 20 years ago was better than the server-side Xerox PARC Map Viewer via http of 32 years ago.
I felt compelled to write "The X-Windows Disaster" comparing X-Windows and NeWS, and I would hate it if 37 years from now, when MCP is as old as X11, I had to write about "The MCP-Token-Windows Disaster", comparing it to a more efficient, elegant underdog solution that got worse-is-bettered. It doesn't have to be that way!
It would be "The World's Second Fully Modular Software Disaster" if we were stuck with MCP for the next 37 years, like we still are to this day with X-Windows.
And you know what they say about X-Windows:
>Even your dog won’t like it. Complex non-solutions to simple non-problems. Garbage at your fingertips. Artificial Ignorance is our most important resource. Don’t get frustrated without it. A mistake carried out to perfection. Dissatisfaction guaranteed. It could be worse, but it’ll take time. Let it get in your way. Power tools for power fools. Putting new limits on productivity. Simplicity made complex. The cutting edge of obsolescence. You’ll envy the dead. [...]
Instead, how about running and exposing sandboxed JavaScript/WASM engines on the GPU servers themselves, that can instantly submit and respond to tokens, cache and procedurally render prompts, and intelligently guide the completion in real time, and orchestrate between multiple models, with no network traffic or latency?
They're probably already doing that anyway, just not exposing Turing-complete extensibility for public consumption.
Ok, so maybe Adobe's compute farm runs PostScript by the GPU instead of JavaScript. I'd be fine with that, I love writing PostScript! ;) And there's a great WASM based Forth called WAForth, too.
It really doesn't matter how bad the language is, just look at the success and perseverance of TCL/Tk! It just needs to be extensible at runtime.
NeWS applications were much more responsive than X11 applications, since you downloaded PostScript code into the window server to handle input events locally, provide immediate feedback, and translate them into higher-level events (or even handle them completely locally) using a user interface toolkit that ran in the server, sending only high-level events over the network via optimized, application-specific protocols.
You know, just what all web browsers have been doing for decades with JavaScript and calling it AJAX?
Now it's all about rendering and responding to tokens instead of pixels and mouse clicks.
Protocols that fix the shape of interaction (like X11 or MCP) can become ossified, limiting innovation. Extensible, programmable environments allow evolution and responsiveness.
It reminds me a bit of LSP, which feels to me like a similar speed-run and a pile of assumptions baked in which were more parochial aspects of the original application... now shipped as a standard.
And yeah, sounds like it's explicitly a choice to follow that model.
This article feels like it was written by an old timer who knows WebSockets and just doesn't want to learn what SSE is. I support the decision to ditch WebSockets, because WebSockets would only add extra bloat and complexity to your server, whereas SSE is just HTTP. What I don't understand, though, is why have a "stdio" transport at all if you could just run an HTTP server locally.
I'm confused by the "old timer" comment, as SSE not only predates WebSockets, but the techniques surrounding its usage go really far back (I was doing SSE-like things--using script blocks to get incrementally-parsed data--back around 1999). If anything, I could see the opposite issue, where someone could argue that the spec was written by someone who just doesn't want to learn how WebSockets works, and is stuck in a world where SSE is the only viable means to implement this? And like, in addition to the complaints from this author, I'll note the session resume feature clearly doesn't work in all cases (as you can't get the session resume token until after you successfully get responses).
That all said, the real underlying mistake here isn't the choice of SSE... it is trying to use JSON-RPC -- a protocol which very explicitly and very proudly is supposed to be stateless -- and to then use it in a way that is stateful in a ton of ways that aren't ignorable, which in turn causes all of this other insanity. If they had correctly factored out the state and not incorrectly attempted to pretend JSON-RPC was capable of state (which might have been more obvious if they used an off-the-shelf JSON-RPC library in their initial implementation, which clearly isn't ever going to be possible with what they threw together), they wouldn't have caused any of this mess, and the question about the transport wouldn't even be an issue.
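For readers who haven't used SSE, here is a toy sketch of the shape being discussed - one long-lived GET streaming text/event-stream plus a separate POST endpoint for client-to-server messages. It uses Flask with made-up endpoint names and is not the actual MCP implementation:

```python
import json
import queue

from flask import Flask, Response, request

app = Flask(__name__)
outbox = queue.Queue()    # server -> client messages waiting to be streamed

@app.get("/events")
def events():
    # The long-lived leg: an ordinary HTTP response whose body never ends,
    # framed as text/event-stream ("SSE is just HTTP").
    def stream():
        while True:
            msg = outbox.get()
            yield f"data: {json.dumps(msg)}\n\n"
    return Response(stream(), mimetype="text/event-stream")

@app.post("/messages")
def messages():
    # The separate write endpoint: client -> server requests arrive here, and the
    # replies go back out over /events above. Tying those two legs together is
    # where the statefulness objected to above creeps in.
    outbox.put({"jsonrpc": "2.0", "id": request.json.get("id"), "result": "ok"})
    return "", 202

if __name__ == "__main__":
    app.run(port=8000)
```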
Yeah, if you want to be super technical, it's Node that does the actual command running, but in my opinion, that's as good as saying the MCP client is...
Indeed! But seemingly only for the actual object representation - it's a start, and I wonder if JSON is uniquely suited to LLMs because it's so text-first.
Is it the redundancy? Or is it because markup is a more natural way to annotate language, which obviously is what LLMs are all about?
Genuinely curious, I don’t know the answer. But intuitively JSON is nice for easy to read payloads for transport but to be able to provide rich context around specific parts of text seems right up XML’s alley?
The lack of inline context is a failing of JSON and a very useful feature of XML.
Two simple but useful examples: inline markup that tags a series of numbers as a date or a telephone number, or a particular phrase marked as being in a different language from the main document. Inline semantic tags would let LLMs better understand the context of those tokens. JSON can't really do that, while it's a native aspect of XML.
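A toy illustration of that difference, with made-up tag names - the annotated spans stay inline in the readable text, which JSON's tree-of-values shape can't express without splitting the string apart:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<note>Call me on <phone>+1 555 0100</phone> before "
    '<date format="iso">2025-06-01</date>; the phrase '
    '<phrase lang="fr">bonne chance</phrase> is French.</note>'
)

# The sentence still reads as one piece of text...
print("".join(doc.itertext()))

# ...while each annotated span keeps its semantic tag and attributes.
for child in doc:
    print(child.tag, child.attrib, child.text)
```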