One of the biggest lessons for me while riding the MCP hype was that if you're writing backend software, you don't actually need MCP. Architecturally, it doesn't make sense. At least not in Elixir, anyway. One server per API? That actually sounds crazy if you're doing backend. That's 500 different microservices for 500 APIs. After working with 20 different MCP servers, it finally dawned on me that good ol' function calling (which is what MCP is under the hood) works just fine. And each API can just be its own module instead of a server. So, no need to keep yourself updated on the latest MCP spec, nor any need to update hundreds of microservices because the spec changed. Needless complexity.
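To make that concrete, here's a rough sketch of what I mean by "each API is just its own module": plain function calling with a per-API tool registry, no MCP involved. (TypeScript just for illustration; the tool names and stubbed handlers are made up.)

    // One "module" per API: each exports its tool schemas plus handlers.
    type Tool = {
      name: string;
      description: string;
      parameters: object;                   // JSON Schema for the arguments
      run: (args: any) => Promise<string>;  // result string fed back to the model
    };

    // calendar "module" (handler stubbed for the sketch)
    const calendarTools: Tool[] = [{
      name: "list_events",
      description: "List calendar events between two ISO dates",
      parameters: {
        type: "object",
        properties: { from: { type: "string" }, to: { type: "string" } },
        required: ["from", "to"],
      },
      run: async ({ from, to }) => JSON.stringify([{ title: "standup", from, to }]),
    }];

    // youtube "module" - in practice another file entirely
    const youtubeTools: Tool[] = [{
      name: "search_videos",
      description: "Search YouTube videos by query",
      parameters: {
        type: "object",
        properties: { q: { type: "string" } },
        required: ["q"],
      },
      run: async ({ q }) => JSON.stringify([{ title: `result for ${q}` }]),
    }];

    // The monolith just flattens every module into one tool list for the LLM
    // and dispatches the model's tool calls by name.
    const registry = new Map([...calendarTools, ...youtubeTools].map(t => [t.name, t]));

    async function dispatch(name: string, args: unknown): Promise<string> {
      const tool = registry.get(name);
      if (!tool) throw new Error(`unknown tool: ${name}`);
      return tool.run(args);
    }

Adding a 101st API is adding one more module to the list, not deploying another server.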
Unless the client speaks to each microservice independently without going through some kind of gateway or BFF, I'd expect you'd just plonk your MCP server in front of it all and expose the same functionality clients already call via API (or GraphQL or RPC or whatever), so it's basically just an LLM-specific interface.
No reason why you couldn't just use tool calls with OpenAPI specs though. Either way, making all your microservices talk to each other over MCP sounds wild.
I always saw MCPs as a plug-and-play integration for enabling function calling without incurring API costs when using Claude.
If you're using the API and not in a hurry, there's no need for it.
Nothing prevents you from using a proxy server if you want it all behind one entry point. I don't know of any clients hitting hundreds of MCP servers anyway; that's a straw man.
Unlike web dev, where your client can just write more components / API callers, AI has context limits. If you try to plug in (I get it, exaggerated) 500 MCP servers, each with 2-10 tools, you will waste a ton of context in your system prompt. You can use a tool proxy (one tool which routes to others), but that will add latency from the extra processing.
It really is a standard protocol for connecting clients to models and vice versa. It’s not there to just be a container for tool calls.
Agree, one MCP server per API doesn’t scale.
With something like https://nango.dev you can get a single server that covers 400+ APIs.
Also handles auth, observability and offers other interfaces for direct tool calling.
(Full disclosure, I’m the founder)
Why do you even need to connect to 400 APIs?
In the end, MCP is just like REST APIs: there's no need for a paid service for me to connect to 400 REST APIs now, so why do I need a service to connect to 400 MCP servers?
All I need for my users is to be able to connect to one or two really useful MCPs, which I can do myself. I don't need to pay for some multi REST API server or multi MCP server.
Agentic automation is almost always about operating multiple tools and doing something with them. So you invariably need to integrate with a bunch of APIs. Sure, you can write your own MCP and implement everything in it. Or you can save yourself the trouble and use the official one provided by the integrations you need.
People want to not think and throw the kitchen sink at problems instead of thinking about what they actually need.
Nango is cool, but pricing is quite high at scale.
There are quite a few competitors in this space trying to figure out the best way to go about this. I've recently been playing with the Jentic MCP server[0], which seems to do it quite cleanly and appears to be entirely free for regular usage.
[0] https://jentic.com/
We offer volume discounts on all metrics.
Email me on robin @ <domain> and happy to find a solution for your use case
Looks pretty cool, thanks for sharing!
I'll go one further -
Forcing LLMs to output JSON is just silly. A lot of time and effort is being spent forcing models to output a format that is picky and that LLMs, quite frankly, just don't seem to like very much. A text-based DSL with more restrictions on it would've been a far better choice.
Years ago I was able to trivially teach GPT-3.5 to reliably output an English-like DSL with just a few in-prompt examples. Meanwhile, even today the latest models still ship with notes that they may occasionally ignore some parts of the JSON schemas sent down.
Square peg, round hole, please stop hammering.
MCP doesn't force models to output JSON, quite the opposite. Tool call results in MCP are text, images, audio — the things models naturally output. The whole point of MCP is to make APIs digestible to LLMs.
I think perhaps they're more referring to the tool descriptions...not sure why they said output
I'm not sure about that. Newer models are able to produce structured outputs perfectly, and in fact, if you combine them with changesets, you can build insanely useful applications, since changesets also provide type-checking.
For example, in Elixir, we have this library: https://hexdocs.pm/instructor/
It's massively useful for any structured output related work.
Any model can provide perfect JSON according to a schema if you discard non-conforming logits.
I imagine that validation as you go could slow things down though.
The technical term is constrained decoding. OpenAI has had this for almost a year now. They say it requires generating some artifacts to do efficiently, which slows down the first response but can be cached.
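Roughly, constrained decoding just masks out every token the grammar/schema doesn't allow at that position before sampling. A toy sketch of the core step (the allow-set is vastly simplified; real implementations compile the JSON Schema into a state machine that produces it):

    // Toy constrained decoding: zero out the probability of every token the
    // "grammar" forbids at this step, renormalize, then sample from what's left.
    function sampleConstrained(
      logits: number[],          // raw model scores, one per vocab token
      allowed: Set<number>,      // token ids the grammar permits right now
    ): number {
      const expd = logits.map((l, id) => (allowed.has(id) ? Math.exp(l) : 0));
      const total = expd.reduce((a, b) => a + b, 0);
      if (total === 0) throw new Error("grammar allows no token here");
      let r = Math.random() * total;
      for (let id = 0; id < expd.length; id++) {
        r -= expd[id];
        if (r <= 0) return id;
      }
      return expd.length - 1;
    }

    // e.g. if the schema says the output must start with a JSON object,
    // `allowed` would contain only the token id(s) that begin with "{".

This is why the output can't fail to parse: tokens that would break the schema simply never get sampled.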
Expect this is a problem pattern that will be seen a lot with LLMs.
Do I look at whether the data format is easily output by my target LLM?
Or do I just validate clamp/discard non-conforming output?
Always using the latter seems pretty inefficient.
> One server per API? That actually sounds crazy if you're doing backend
Not familiar with elixir, but is there anything prohibiting you from just making a monolith MCP combining multiple disparate API's/backends/microservices as you were doing previously?
Further, you won't get the various client-application integrations merely by using tool calling; those integrations, to me, are the "killer app" of MCP (as a sibling comment touches on).
(I do still have mixed feelings about MCP, but in this case MCP sorta wins for me)
> just making a monolith MCP combining multiple disparate API
This is what I ended up doing.
The reason I thought I must do it the "MCP way" was because of the tons of YouTube videos about MCP which just kept saying how much of an awesome protocol it is, that everyone should be using it, etc. Once I realized it's actually more consumer-facing than backend-facing, it made much more sense why it became so popular.
“each API can just be its own module instead of a server”
This is basically what MCP is. Before MCP, everyone was rolling their own function calling interfaces to every API. Now it’s (slowly) standardising.
If you search for MCP integrations, you will find tons of MCP "servers", which are basically entire servers for just one vendor's API (sometimes just for one of their products, e.g. YouTube). This is the go-to default right now, instead of just one server with 100 modules. The MCP protocol itself is just there to make it easier to communicate with the LLM clients that users can install and use. But if you're doing backend code, there is no need to use MCP for it.
There's also little reason not to. I have an app server in the works, and all the API endpoints will be exposed via MCP because it hardly requires writing any extra code since the app server already auto-generates the REST endpoints from a schema anyway and can do the same for MCP.
An "entire server" is also overplaying what an MCP server is - in the case where an MCP server is just wrapping a single API it can be absolutely tiny, and also can just be a binary that speaks to the MCP client over stdio - it doesn't need to be a standalone server you need to start separately. In which case the MCP server is effectively just a small module.
The problem with making it one server with 100 modules is doing that in a language agnostic way, and MCP solves that with the stdio option. You can make "one server with 100 modules" if you want, just those modules would themselves be MCP servers talking over stdio.
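To illustrate how small that can be: the "server" can literally be a few dozen lines that read newline-delimited JSON-RPC from stdin and write responses to stdout. A rough Node/TypeScript sketch (the initialize handshake and error handling are hand-waved, and field names follow my reading of the spec):

    import * as readline from "node:readline";

    // Minimal stdio "MCP server": newline-delimited JSON-RPC 2.0 on stdin/stdout.
    const tools = [{
      name: "echo",
      description: "Echo back the provided text",
      inputSchema: {
        type: "object",
        properties: { text: { type: "string" } },
        required: ["text"],
      },
    }];

    function reply(id: unknown, result: unknown) {
      process.stdout.write(JSON.stringify({ jsonrpc: "2.0", id, result }) + "\n");
    }

    const rl = readline.createInterface({ input: process.stdin });
    rl.on("line", (line) => {
      const msg = JSON.parse(line);
      if (msg.method === "tools/list") {
        reply(msg.id, { tools });
      } else if (msg.method === "tools/call" && msg.params?.name === "echo") {
        reply(msg.id, { content: [{ type: "text", text: msg.params.arguments.text }] });
      }
      // initialize / notifications / error responses omitted for brevity
    });

The MCP client spawns this as a child process, so from the client's point of view it really is closer to "a module with a standard calling convention" than to a server you operate.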
> The problem with making it one server with 100 modules is doing that in a language agnostic way
I agree with this. For my particular use-case, I'm completely into Elixir, so, for backend work, it doesn't provide much benefit for me.
> it can be absolutely tiny
Yes, but at the end of the day, it's still a server. Its size is immaterial - you still need to deal with the issues of maintaining a server: patching security vulnerabilities, making sure you don't get hacked, and making sure you don't expose anything publicly that you're not supposed to. It requires routine maintenance just like a real server. Multiply that by 100 if you have 100 MCP "servers". It's just not a scalable model.
In a monolith with 100 modules, you just do all the security patching for ONE server.
You will still have the issues of maintaining and patching your modules as well.
I think you have an overcomplicated idea of what a "server" means here. For MCP that does not mean it needs to speak HTTP. It can be just a binary that reads from stdin and writes to stdout.
>I think you have an overcomplicated idea of what a "server"
That's actually true, because I'm always thinking from a cloud deployment perspective (which is my use case). What kind of architecture do you run this on, at scale, in the cloud? You have very limited options if your monolith runs on a serverless platform and is CPU/memory bound, too. So, that's where I was coming from.
You'd run it on any architecture that can spawn a process and attach to stdin/stdout.
The overhead of spawning a process is not the problem. The overhead of a runtime that is huge and/or slow to start could be, in which case you simply wouldn't run it on a serverless system - which is ludicrously expensive at scale anyway (my dayjob is running a consultancy where the big money-earner is helping people cut their cloud costs, and moving them off serverless systems that are entirely wrong for them is often one of the big savings; it's cheap for tiny systems, but even then the hassle often isn't worth it).
.. what's the security patching you have to do for reading / writing on stdio on your local machine?
Hehe.
> each API can be just it's own module
That implies a language lock-in, which is undesirable.
It is only undesirable if you are exposing your APIs for others to use via a consumer-facing client. As a backend developer, I'm writing a backend for my app to consume, not for consumers (which is what MCP is designed for). So, this is better for me since I'm a 100% Elixir shop anyway.
I am just glad that we now have a simple path to authorized MCP servers. Massive shout-out to the MCP community and folks at Anthropic for corralling all the changes here.
What is the point of a MCP server? If you want to make an RPC from an agent, why not... just use an RPC?
Standardising tool use, I suppose.
Not sure why people treat MCP like it's much more than smashing tool descriptions together and concatenating to the prompt, but here we are.
It is nice to have a standard definition of tools that models can be trained/fine tuned for, though.
Also nice to have a standard(ish) for evolution purposes. I.e. +15 years from now.
It is easier to communicate and sell that we have this MCP server that you can just plug and play vs some custom RPC stuff.
But MCP deliberately doesn’t define endpoints, or arguments, or return types… it is the definition of custom RPC stuff.
How does it differ from providing a non MCP REST API?
The main alternative for a plug-and-play (just configure a single URL) non-MCP REST API would be to use OpenAPI definitions and ingest them accordingly.
However, as someone that has tried to use OpenAPI for that in the past (both via OpenAI's "Custom GPT"s and auto-converting OpenAPI specifications to a list of tools), in my experience almost every existing OpenAPI spec out there is insufficient as a basis for tool calling in one way or another:
- Largely insufficient documentation on the endpoints themselves
- REST is too open to interpretation, and without operationIds (which almost nobody in the wild defines), there is usually context missing on what "action" is being triggered by POST/PUT/DELETE endpoints (e.g. many APIs do a delete of a resource via a POST or PUT, and some APIs use DELETE to archive resources)
- baseUrls are often wrong/broken and assumed to be replaced by the API client
- underdocumented AuthZ/AuthN mechanisms (usually only present in the general description comment on the API, and missing on the individual endpoints)
In practice you often have to remedy that by patching the officially distributed OpenAPI specs to make them good enough for a basis of tool calling, making it not-very-plug-and-play.
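For reference, the conversion itself is the easy part - something like the sketch below is enough to flatten an OpenAPI document into tool definitions, and it's exactly where the missing operationIds and empty descriptions start to hurt. (A sketch only; `spec` is assumed to be an already-parsed OpenAPI 3.x document.)

    type ToolDef = { name: string; description: string; parameters: object };

    // One tool per OpenAPI operation. Falls back to METHOD_path when
    // operationId is missing - which, in the wild, is most of the time,
    // and those auto-generated names tend to confuse the model.
    function openApiToTools(spec: any): ToolDef[] {
      const tools: ToolDef[] = [];
      for (const [path, ops] of Object.entries<any>(spec.paths ?? {})) {
        for (const [method, op] of Object.entries<any>(ops)) {
          if (!["get", "post", "put", "patch", "delete"].includes(method)) continue;
          tools.push({
            name: op.operationId ?? `${method.toUpperCase()}_${path.replace(/\W+/g, "_")}`,
            description: op.summary ?? op.description ?? "",  // often empty in practice
            parameters: {
              type: "object",
              properties: Object.fromEntries(
                (op.parameters ?? []).map((p: any): [string, any] => [p.name, p.schema ?? {}]),
              ),
            },
          });
        }
      }
      return tools;
    }

The mechanical mapping works; what's missing from most published specs is the semantic information (what the operation actually does, what auth it needs) that the model relies on.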
I think the biggest upside that MCP brings (all "content"/"functionality" being equal) over just plain REST is that it acts as a badge that says "we had AI usage in mind when building this".
On top of that, MCP also standardizes mechanisms like e.g. elicitation that with traditional REST APIs are completely up to the client to implement.
I can’t help but notice that so many of the things mentioned are not at all due to flaws in the protocol, but developers specifying protocol endpoints incorrectly. We’re one step away from developers putting everything as a tool call, which would put us in the same situation with MCP that we’re in with OpenAPI. You can get that badge with a literal badge; for a protocol, I’d hope for something at least novel over HATEOAS.
REST for all the use cases: We have successfully agreed on what words to use! We just disagree on what they mean.
Standardization. You spin up a server that conforms to MCP, every LLM instantly knows how to use it.
MCP is an RPC protocol.
The analogy that was used a lot is that it's essentially USB-C for your data being connected to LLMs. You don't need to fight 4,532,529 standards - there is one (yes, I am familiar with the XKCD comic). As long as your client is MCP-compatible, it can work with _any_ MCP server.
The whole USB C comparison they make is eyeroll inducing, imo. All they needed to state was that it was a specification for function calling.
My gripe is that they had the opportunity to spec out tool use in models and they did not. The client->llm implementation is up to the implementor and many models differ with different tags like <|python_call|> etc.
Clearly they need to try explaining it in easy terms. The number of people that cannot or will not understand the protocol is mind-boggling.
I'm with you on an Agent -> LLM industry standard spec need. The APIs are all over the place and it's frustrating. If there was a spec for that, then agent development becomes simply focused on the business logic and the LLM and the Tools/Resource are just standardized components you plug together like Lego. I've basically done that for our internal agent development. I have a Universal LLM API that everything uses. It's helped a lot.
The comparison to USB C is valid, given the variety of unmarked support from cable to cable and port to port.
It has the physical plug, but what can it actually do?
It would be nice to see a standard aiming for better UX than USB C. (Imho they should have used colored micro dots on device and cable connector to physically declare capabilities)
Certainly valid in that just like various usb c cables supporting slightly different data rates or power capacities, MCP doesn't deal with my aforementioned issue of the glue between MCP client and model you've chosen; that exercise is left up to us still.
Not everyone can code, and not everyone who can code is allowed to write code against the resources I have.
You have to write code for an MCP server, and code to consume it too. It's just that the LLM vendors decided they were going to have the consuming side built in, which people question, as they could just as well have done the same for OpenAPI, gRPC and whatnot, instead of a completely new thing.
Oh nice! I know what to do over the weekend: I will be updating my mcp oauth proxy :D
Fascinated to see that the core spec is written in TypeScript and not, say, an OpenAPI spec or something. I suppose it makes sense, but it’s still surprising to see.
Why is it surprising? I use typescript a lot, but I would have never even thought to have this insight so I am missing some language design decisions
Because TypeScript is a language, not a language-agnostic specification format. As someone who uses TypeScript a lot, that is probably irrelevant. But if I'm using Rust, say, then the TypeScript means I need to reimplement the spec from scratch. With OpenAPI, say, I can code-generate canonically correct stubs and implement from there.
It is mostly pointless complexity, but I’m going to miss batching. It was kind of neat to be able to say ‘do all these things, then respond,’ even if the client can batch the responses itself if it wants to.
I agree. JSON-RPC batching has always been my "gee, that's neat" feature and seeing it removed from the spec is sad. But, as you said, it's mostly just adding complexity.
The introduction of the WWW-Authenticate challenge is so welcome. Now it's much clearer that the MCP server can punt the client to the resource provider's OAuth flow and sit back waiting for an `Authorization: Bearer ...`.
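A sketch of what that looks like on the server side, using plain Node (the resource_metadata URL is a placeholder, and the exact challenge parameters are my reading of the protected-resource-metadata approach the new spec leans on):

    import { createServer } from "node:http";

    // If there's no bearer token, challenge the client and point it at the
    // OAuth metadata; otherwise let the request through for validation.
    createServer((req, res) => {
      const auth = req.headers.authorization ?? "";
      if (!auth.startsWith("Bearer ")) {
        res.writeHead(401, {
          "WWW-Authenticate":
            'Bearer resource_metadata="https://mcp.example.com/.well-known/oauth-protected-resource"',
        });
        res.end();
        return;
      }
      const token = auth.slice("Bearer ".length);
      // ...validate the token against the authorization server, then handle the MCP request
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ ok: true, tokenPresent: token.length > 0 }));
    }).listen(3000);

The nice part is that the MCP server never has to run its own login UI; it just challenges and waits for a valid token.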
Elicitation is a big win. One of my favorite MCP servers is an SSH server I built; it allows me to automate basically 90% of the server tasks I need done. I handled authentication via a config file, but it's kind of a pain to manage if I want to access a new server.
There's always https://www.fabfile.org/ no need to throw an LLM into a conversation that is best kept private.
Ansible, puppet, chef, salt, cfengine... There are tons of tools that are more precise and succinct for describing sensitive tasks, such as managing a single or a fleet of remote servers. Using MCP/LLMs for this is... :O
... the future?
There are security / reliability concerns, true, but finally getting technically close to Star Trek computers and then still doing things 'the way they've always been done' doesn't seem efficient.
I don't know if you understand the role the LLM is playing here. The mechanism used to execute the command is not the relevant thing. The LLM autonomously executing commands has intelligence; it's not just a shell script. If I ask it to do a task and it runs into an issue, LLMs like Claude can recognize the problem and find a way to resolve it. Script failed because of a missing dependency? It'll install it. Need a config change? It'll do it. The SSH MCP server is just the interface for the LLM to do the work.
You can give an LLM a github repo link, a fresh VPC, and say "deploy my app using nginx" and any other details you need... and it'll get it done.
I spent the last few days playing with MCP building some wrappers for some datasets. My thoughts:
1) Probably the coolest thing to happen with LLMs. While you could do all this with tool calling and APIs, being able to send less technical friends an MCP URL for Claude and seeing them get going with it in a few clicks is amazing.
2) I'm using the C# SDK, which only has auth in a branch - so very bleeding edge. Had a lot of problems implementing that - 95% of my time was spent on auth (which is required for Claude MCP integrations if you are not building locally). I'm sure this will get easier with docs, but it's pretty involved.
3) Related to that, Claude doesn't AFAIK expose much in terms of developer logs for what they are sending via their web (not desktop) app and what is going wrong. It would be super helpful to have a developer mode where it showed you the request/response of errors. I had real problems with refresh on auth, and it turned out I was logging the wrong endpoint on my side. Operator error for sure, but I would have fixed that in a couple of minutes had they had better MCP logging somewhere in the web UI. It all worked fine with stdio in desktop and the MCP inspector.
4) My main question/issue is handling longer-running tasks. The dataset I'm exposing is effectively a load of PDF documents, as I can't get Claude to handle the PDF files itself (I am all ears if there is a way!). What I'm currently doing is sending them through Gemini to get the text, then sending that to the user via MCP. This works fine for short/simple documents, but for longer documents (which can take some time to process) I return a message saying it is processing and to retry later.
While I'm aware there is a progress API, it still requires keeping the connection to the server open (which times out after a while, with Cloudflare at least - could be wrong here though). It would be much better to be able to tell the LLM to check back in x seconds when you predict it will be done, so the LLM can do other stuff in the background (which it will do), but then sort of 'pause execution' until the timer is hit.
Right now (AFAIK!) you can either keep it waiting with an open connection w/ progress (which means it can't do anything else in the meantime), or you can return a job ID, but then it will just return a half-finished answer, which is often misleading as it doesn't have all the context yet. Don't know if this makes any sense, but I can imagine this being a real pain for tasks that take 10 mins+.
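For what it's worth, the workaround I keep circling back to is splitting it into two tools - one that kicks the job off and one the model can poll - so the model at least knows it hasn't got the full answer yet. A sketch (startExtraction / checkExtraction are made-up names and the slow extraction call is stubbed):

    import { randomUUID } from "node:crypto";

    const jobs = new Map<string, { done: boolean; text?: string }>();

    // Tool 1: kick off the slow PDF -> text extraction, return a job id immediately.
    async function startExtraction(documentUrl: string): Promise<string> {
      const jobId = randomUUID();
      jobs.set(jobId, { done: false });
      extractText(documentUrl)                 // stand-in for the slow Gemini call
        .then((text) => jobs.set(jobId, { done: true, text }));
      return JSON.stringify({ jobId, status: "processing", retryAfterSeconds: 60 });
    }

    // Tool 2: the model polls this until the job is done.
    async function checkExtraction(jobId: string): Promise<string> {
      const job = jobs.get(jobId);
      if (!job) return JSON.stringify({ error: "unknown job" });
      if (!job.done) return JSON.stringify({ status: "processing", retryAfterSeconds: 60 });
      return JSON.stringify({ status: "done", text: job.text });
    }

    async function extractText(_url: string): Promise<string> {
      return "…extracted text…";               // stand-in for the real extraction
    }

It works, but the model still has to be told explicitly not to answer until the status flips to done, which is exactly the "half finished answer" problem.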
Long-running tasks are an open topic of discussion, and I think MCP intends to address it in the future.
There are a few proposals floating around, but one issue is that you don't always know whether a task will be long-running, so having separate APIs for long-running tasks vs "regular" tool calls doesn't fully address the problem.
I've written a proposal to solve the problem in a more holistic way: https://github.com/modelcontextprotocol/modelcontextprotocol...
It'd be nice to more closely integrate MCP into something like Airflow, with hints as to expected completion time.
Real-world LLM usage is going to be built on non-negligible (and increasingly complicated) deterministic tools, so you might as well integrate with the 'for all possible workflows' use case.
I'm not familiar with Airflow, but MCP supports progress notifications: https://modelcontextprotocol.io/specification/2025-03-26/bas...
DAG workflow scheduling. https://github.com/apache/airflow
Very glad to see the MCP specification's rapid improvement. With each new release I notice something that I was missing in my MCP integrations.
It would make sense to have a 'resource update/delete' approval workflow, where an MCP server can have an OAuth link just to approve the particular step.
Yeah unfortunately doesn’t exist yet and it’s an open issue
Funny that changes to the spec require a single approval before being merged xD
It's very hard for me to understand what MCP solves aside from providing a quick and dirty way to prototype something on my laptop.
If I'm building a local program, I am going to want tighter control over the toolsets my LLM calls have access to.
E.g. an MCP server for Google Calendar. MCP is not saving me significant time - I can access the same APIs the MCP server can. I probably need to carefully instruct the LLM on when and how to use the Google Calendar calls, and I don't want to delegate that to a third party.
I also do not want to spin up a bunch of arbitrary processes in whatever runtime environment the MCP is written in. If I'm writing in Python, why do I want my users to have to set up a typescript runtime? God help me if there's a security issue in the MCP wrapper for language_foo.
On the server, things get even more difficult to justify. We have a great tool for having one machine call a process hosted on another machine without knowing its implementation details: the RPC. MCP just adds a bunch of opinionated middleware (and security holes).
What I don't get is why all the MCPs I've seen so far are commands instead of using the HTTP interface. Maybe I'm not understanding something, but with that you could spin up 1 server for your org and everyone could share an instance without messing around with different local toolchains
HTTP MCP servers are generally where the trend is moving, now that authentication/authorization in the spec is getting into shape.
That most MCP usage you'll find in repositories in the wild is focused on local toolchains is mostly due to MCP, on launch, being essentially only available via the Claude Desktop client. There they also highlighted many local single-user use cases (rather than organizational ones). The fact that SSE support was spotty in most implementations also didn't help.
If there isn't an MCP proxy yet, I'm sure there will be soon. The problem with doing that more widely so far has been that the authentication story hasn't been there yet.
It’s because auth was a pain and people were hiding that in the command. That’s getting better though.
The advantage of MCP over fixed flows controlled from backend is that the LLM does the orchestration itself, for example when doing web searches it can rephrase the query as needed and retry until it finds the desired information. When solving a bespoke problem in CLI it will use a number of tools, and call them in the order required by the task at hand. You can't do that easily with a fixed flow.
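That orchestration is really just a loop around the model: hand it the tool list, execute whatever it asks for, feed the results back, and stop when it answers in plain text. A sketch of the loop (callModel and runTool are stand-ins for whatever LLM API and tool dispatcher you actually use):

    type ToolCall = { id: string; name: string; args: unknown };
    type ModelTurn = { text?: string; toolCalls?: ToolCall[] };

    // Stand-ins for the real LLM API and tool dispatcher.
    async function callModel(messages: unknown[]): Promise<ModelTurn> {
      return { text: `(saw ${messages.length} messages) final answer` };  // canned reply for the sketch
    }
    async function runTool(name: string, args: unknown): Promise<string> {
      return JSON.stringify({ tool: name, args, result: "stubbed" });
    }

    async function agentLoop(userGoal: string): Promise<string> {
      const messages: unknown[] = [{ role: "user", content: userGoal }];
      for (let step = 0; step < 20; step++) {                 // hard cap so it can't loop forever
        const turn = await callModel(messages);
        if (!turn.toolCalls?.length) return turn.text ?? "";  // model decided it's done
        for (const call of turn.toolCalls) {
          // e.g. the model retries a web search here with a rephrased query
          const result = await runTool(call.name, call.args);
          messages.push({ role: "tool", toolCallId: call.id, content: result });
        }
      }
      return "gave up after 20 steps";
    }

The sequencing decisions live inside the model's turns rather than in a hand-written flow, which is the whole difference from a fixed pipeline.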
> It's very hard for me to understand what MCP solves
It's providing a standardized protocol to attach tools (and other stuff) to agents (in an LLM-centric world).
I agree vehemently, I'm sort of stunned how...slow...things are in practice. I quit my job 2 years ago to do LLM client stuff and I still haven't made it to Google calendar. It's handy as a user to have something to plug holes in the interim.
In the limit, I remember some old saw about how everyone had the same top 3 rows of apps on their iPhone homescreen, but the last row was all different. I bet IT will be managing, and dev teams will be making, their own bespoke MCP servers for years to come.
If I understand your point correctly - the main bottleneck for tool-calling/MCP is the models themselves being relatively terrible, until recently, at calling anything but the tools they were fine-tuned to work with. Even with the latest developments, any given MCP server has a variable chance of success just due to the nature of LLMs only learning the most common downstream tasks. Further, LLMs _still_ struggle when you give them too many tools to call. They're poor at assessing the correct tool to use when given tools with overlapping functionality or similar function names/args.
This is what people mean when they say that MCP should maybe wait for a better LLM before going all-in on this design.
Not in my opinion, works fine in general, wrote 2500 lines of tests for me over about 30 min tonight.
To your point that this isn't trivial or universal, there's a sharp gradient that you wouldn't notice if you're just opining on it as opposed to coding against it -- ex. I've spent every waking minute since mid-December on MCP-like territory, and it still bugs me out how much worse every model is than Claude at it. It sounds like you have similar experience, though, perhaps not as satisfied with Claude as I am.
A fair point I suppose. I'm not entirely inexperienced with it, but it does sound like you have more experience with it than I do.
> you wouldn't notice if you're just opining on it as opposed to coding against it
Maybe i'm being sensitive but that is perhaps not the way I would have worded that as it reads a bit like an insult. Food for thought.
It is a protocol. If I have to list a bunch of files on my system, I don't call a REST server. In the same way, MCP is not for you doing your own stuff. It is for other people to do stuff on your server by way of tools.
Did they fix evil MCP servers/prompt injection/data exfiltration yet?
No.
That (prompt injection) isn’t something you can fix until you come up with a way to split prompts.
That means new types of models; there is no variant of MCP that can solve it with existing models.
It's hilarious how LLMs are relearning the idea that it might be a good idea to carry control signals out of band.
I guess the phone phreak generation wasn't around to say 'Maybe this is a bad idea...' (because the first thing users are going to do is try to hijack control via in band overrides)
What MCP is missing is a reasonable way to do async callbacks, where you can have the MCP server query the model with a custom prompt and the results of some operation.
Isn't this what sampling is for?
https://modelcontextprotocol.io/docs/concepts/sampling
That was my thought as well.
My main disappointment with sampling right now is the very limited scope. It'd be nice to support some universal tool calling syntax or something. Otherwise a reasonably complicated MCP Server is still going to need a direct LLM connect.
Dumb question: in that case, wouldn't it not be an MCP server? It would be an LLM client with the ability to execute tool calls made by the LLM?
I don't get how MCP could create a wrapper for all possible LLM inference APIs or why it'd be desirable (that's an awful long leash for me to give out on my API key)
An MCP Server can be many things. It can be as simple as an echo server or as complex as a full-blown tool-calling agent and beyond. The MCP client sampling feature is an interesting thing that's designed to allow the primary agent, the MCP Host, to offer up some subset of the LLM models it has access to for the MCP Servers it connects with. That would allow the MCP Server to make LLM calls that are mediated (or not, YMMV) by the MCP Host. As I said above, the feature is very limited right now, but still interesting for some simpler use cases. Why would you do this? So you don't have to configure every MCP Server you use with LLM credentials, and so the particulars of exactly what model gets used stay under your control. That allows the MCP Server to worry about the business logic and not about how to talk to a specific LLM provider.
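Concretely, sampling inverts the usual direction: the server sends a JSON-RPC request to the client, and the client's model answers on its behalf. Roughly what the exchange looks like, as I read the spec (field shapes are my best understanding, not authoritative):

    // Server -> client: ask the host's model to complete something for the server.
    const samplingRequest = {
      jsonrpc: "2.0",
      id: 42,
      method: "sampling/createMessage",
      params: {
        messages: [
          { role: "user", content: { type: "text", text: "Summarize this changelog: ..." } },
        ],
        maxTokens: 500,
        // systemPrompt / model preferences are optional; the host stays in control
      },
    };

    // Client -> server: the host ran the prompt through whatever model it chose.
    const samplingResult = {
      jsonrpc: "2.0",
      id: 42,
      result: {
        role: "assistant",
        content: { type: "text", text: "…model's summary…" },
        model: "whatever-the-host-picked",
      },
    };

Note there's no tool-calling in that exchange, which is exactly the "limited scope" complaint above: anything fancier still needs a direct LLM connection from the server.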
I get the general premise but am uncertain as to if it's desirable to invest more in inverting the protocol, where the tool server becomes an LLM client. "Now you have 2 protocols", comes to mind - more concretely, it upends the security model.
Make a remote agent a remote MCP
The async callbacks are in your implementation. I wrote an MCP server so customers could use an AI model to query a databricks sql catalog. The queries were all async.
Why does MCP need to support this explicitly? Is it hard to write a small wrapper than handles async callbacks? (Serious question)
Does any popular MCP host support sampling by now?
Yes. VSCode 1.101.0 does, as well as fast-agent.
Earlier I posted about mcp-webcam (you can find it) which gives you a no-install way to try out Sampling if you like.
> structured tool output
Yeah, let's pretend it works. So far structured output from an LLM is an exercise in programmers' ability to code defensively against responses that may or may not be valid JSON, may not conform to the schema, or may just be null. There's a new cottage industry of modules that automate dealing with this crap.
No? With structured outputs you get valid JSON 100% of the time. This is a non-problem now. (If you understand how it works, it really can't be otherwise.)
https://openai.com/index/introducing-structured-outputs-in-t...
https://platform.openai.com/docs/guides/structured-outputs
From the 2nd link:
> Structured Outputs can still contain mistakes.
The guarantee promised in link 1 is not supported by the documentation in link 2. Structured Output does a _very good_ job, but still sometimes messes up. When you’re trying to parse hundreds of thousands of documents per day, you need a lot of 9s of reliability before you can earnestly say “100% guarantee” of accuracy.
Have you actually used this stuff at scale? The replies are often not valid.
Yes I have.
Whether it's a non-problem or not very much depends on how much the LLM API providers actually bother to add enforcement server-side.
Anecdotally, I've seen Azure OpenAI services hallucinate tools just last week, when I provided an empty array of tools rather than not providing the tools key at all (silly me!). Up until that point I would have assumed that there are server-side safeguards against that, but now I have to consider spending time on adding client-side checks for all kinds of bugs in that area.
Meanwhile https://dev.to/rishabdugar/crafting-structured-json-response... and https://www.boundaryml.com/blog/structured-output-from-llms
You are confusing API response payloads with Structured JSON that we expect to conform to the given Schema. It's carnage that requires defensive coding. Neither OpenAI nor Google are interested in fixing this, because some developers decide to retry until they get valid structured output which means they spend 3x-5x on the calls to the API.
Structured Output in this case refers to the output from the MCP Server Tool Call, not the LLM itself.
They keep inventing new terms for things. This time it's "Elicitation" for user input.
What would you have preferred it to be called?
> n. the process of getting or producing something, especially information or a reaction
Now, Sampling seems like an odd feature name to me, but maybe I'm missing the driver behind that name.
> What would you have preferred it to be called?
It's right there in my comment. You don't need to call user input "elicitation" just to pretend it's something it's not.
Same goes for sampling.
That one's even more egregious, coming from a field where statistics is central.
We need to work CLU and TRON in there too, then!
Maybe the Key Changes page would be a better link if we're concerned with a specific version?
https://modelcontextprotocol.io/specification/2025-06-18/cha...
OK we've changed the URL and the title to that, thanks!
Agree, thanks for the link. I was wondering what actually changed. The resource links and elicitation look like useful functionality.