Your App Should Ship an MCP Server

By Justin Poehnelt

I’m building a native desktop editor for fiction writers. It’s written in Rust on top of gpui, the UI framework from the Zed team. Under the hood, it generates a fiction-specific AST, runs ~120 prose-craft analyzers, and applies a multi-task ONNX transformer model to your manuscript in real time, surfacing things like show-don’t-tell violations, passive voice, pacing issues, and much more.

I started out using gpui-component, but its Input component wasn’t sufficient for the complex, CRDT-backed UI I wanted, so I ended up writing my own buffer-based system and building the entire text-editing UX from scratch. This is not recommended!

To manage this complexity, I embedded a full MCP server directly inside the application binary. It has since become the single most impactful architectural decision I’ve made on this project, not for users, but for how Claude and I build the product itself.

Here’s the case for why your application should do the same.

The Problem: GUI Apps Are Opaque to AI Agents

If you’re building a native desktop application in 2026, you’ve probably noticed a gap. Your AI coding assistant can read your source code, run your tests, and even propose edits. But it can’t reliably see your running application. It can’t click a button, type into a text field, verify that a diagnostic tooltip rendered correctly, or confirm that a scrollbar stopped at the right position.

For web apps, this is a solved problem with headless browsers, Playwright, Chrome MCP, etc. For native apps, especially those built on GPU-accelerated frameworks like gpui, you’re largely on your own. There’s no DOM to query. There’s no accessibility tree you can trivially script against. The rendered output is just a texture.

I spent too long in a loop that looked like this:

  1. Read the source code
  2. Make a change
  3. cargo build
  4. Manually launch the app
  5. Manually paste in test prose
  6. Squint at the screen
  7. Screenshot it myself
  8. Paste the screenshot into the AI interface
  9. Repeat

Steps 4 through 8 are the bottleneck, and no amount of faster builds fixes that. The feedback loop is human-gated.

The Solution: Make the App Speak MCP

The Model Context Protocol is essentially a standardized JSON-RPC interface that AI agents already know how to speak. If your app exposes MCP tools, any MCP-compatible client can drive your application programmatically.
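Concretely, each exchange is a single JSON object on its own line. A sketch in Python of how a client frames a tools/call request (the set_text tool name comes from my app; yours will differ):

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Frame an MCP tools/call request as one newline-delimited JSON-RPC line."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(msg) + "\n"

line = make_tool_call(1, "set_text", {"text": "It was a dark and stormy night."})
print(line, end="")
```

The app writes back a matching `{"jsonrpc": "2.0", "id": 1, "result": ...}` line, and any MCP client can speak this framing out of the box.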

My implementation has two pieces:

1. The In-App MCP Server

When launched with --mcp, the app starts a background thread that reads newline-delimited JSON-RPC from stdin and writes responses to stdout. Commands are dispatched into the gpui event loop.

This is ~200 lines of Rust. No external dependencies beyond serde_json. The protocol surface is minimal: initialize, tools/list, and tools/call. That’s it (for now).
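The shape of that loop is simple enough to sketch. Here it is in Python rather than Rust for brevity; the protocol version string is a placeholder, and call_tool stands in for the dispatch into the gpui event loop:

```python
import json
import sys

def handle_message(msg: dict, call_tool) -> dict:
    """Route the three supported JSON-RPC methods; call_tool is app-specific."""
    method = msg.get("method")
    if method == "initialize":
        result = {"protocolVersion": "2024-11-05", "capabilities": {"tools": {}}}
    elif method == "tools/list":
        result = {"tools": []}  # a real server returns its tool descriptors here
    elif method == "tools/call":
        params = msg.get("params", {})
        result = call_tool(params.get("name"), params.get("arguments", {}))
    else:
        return {"jsonrpc": "2.0", "id": msg.get("id"),
                "error": {"code": -32601, "message": f"unknown method: {method}"}}
    return {"jsonrpc": "2.0", "id": msg.get("id"), "result": result}

def serve(call_tool, stdin=sys.stdin, stdout=sys.stdout):
    """Read newline-delimited JSON-RPC from stdin; write one response per line."""
    for line in stdin:
        if not line.strip():
            continue
        response = handle_message(json.loads(line), call_tool)
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```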

2. The Lifecycle Wrapper

This is a separate binary that manages the app process. It:

  • Builds the app from source on startup
  • Launches it with --mcp
  • Proxies all JSON-RPC between the MCP client and the app
  • Intercepts a special rebuild tool call to stop the app, run cargo build, and relaunch, without dropping the MCP connection

The wrapper feels like a hack, and there is probably a cleaner solution. When the agent edits Rust source and calls rebuild, the app restarts with the new binary and the agent’s MCP session continues uninterrupted.
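The interesting part of the wrapper is deciding, line by line, whether to proxy a message through to the app or intercept it. A Python sketch of that predicate (the process management around it, stop, cargo build, relaunch, is elided):

```python
import json

def route(line: str) -> str:
    """Classify one JSON-RPC line coming from the MCP client.

    Returns "rebuild" when the line is a tools/call for the rebuild tool,
    which the wrapper handles itself (stop app, cargo build, relaunch).
    Everything else is "proxy": forward to the app's stdin unchanged.
    """
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return "proxy"
    if (msg.get("method") == "tools/call"
            and msg.get("params", {}).get("name") == "rebuild"):
        return "rebuild"
    return "proxy"
```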

The SDLC Loop

Here’s what the development loop looks like with the MCP server in place compared to without:

Before (human-gated):

  ╭─► Edit .rs             Agent
  │   cargo build          Agent
  │   Launch App           Agent
  │   Paste Prose          Human
  │   Squint               Human
  │   Screenshot           Human
  │   Describe to AI       Human
  ╰───────────╯

After (agent-driven):

  ╭─► Edit .rs             Agent
  │   rebuild              Agent
  │   set_text             Agent
  │   wait_idle            Agent
  │   screenshot           Agent
  ╰───────────╯

The “before” loop requires a human at every step past the build. The “after” loop is fully autonomous: the agent drives the entire cycle in ~10-second iterations. And because screenshots are expensive, you can expose cheaper structured tools for most checks.

What Tools Does the Server Expose?

Here’s my current tool surface. Claude can quickly iterate on the available tools as it adds features too!

  • set_text / type_text: Load prose or type at cursor
  • press_key: Simulate any keystroke (enter, backspace, Cmd+B, etc.)
  • click / double_click / triple_click: Click at pixel coordinates
  • drag_select: Click-drag selection
  • screenshot: Capture the window to PNG
  • get_state: Return cursor position, selection, text content, word count
  • get_diagnostics: Return structured analysis results (message, severity, source, byte range)
  • wait_idle: Block until both fast and semantic analysis stages complete
  • set_view_mode: Switch between Draft, Review, and Analyze modes
  • set_nav_pane: Switch sidebar panes (editor, outline, find, diagnostics, settings)
  • list_elements: Enumerate UI elements with rendered positions
  • hover_diagnostic: Programmatically hover a diagnostic card
  • format_state: Query which inline/block formats are active at cursor
  • rebuild: Stop → cargo build → relaunch (wrapper-only)

The total is around 30 tools. The marginal cost of adding a new tool is about 15 minutes: write a match arm, call an existing editor method, return JSON.
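In Rust that’s literally one new arm in the dispatcher’s match. The same shape in Python is one new entry in a dispatch table; the editor methods below are hypothetical stand-ins for whatever your app already has:

```python
def get_state(editor, args):
    # Call existing editor methods and return a JSON-serializable result.
    return {"cursor": editor.cursor(), "word_count": editor.word_count()}

def press_key(editor, args):
    editor.press(args["key"])
    return {"ok": True}

TOOLS = {
    "get_state": get_state,
    "press_key": press_key,
    # Adding a tool = one new entry here plus one small function above.
}

def dispatch(editor, name, args):
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](editor, args)
```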

I haven’t attempted to expose dynamic tools based on the current view.

What This Actually Enables

AI-Driven Iteration and Verification

The agent can now verify what it built. It edits paint.rs, calls rebuild, calls set_text with sample prose, calls wait_idle to let the analyzers finish, and calls screenshot to capture the result. It reads the PNG, evaluates whether the margin notes rendered correctly, and iterates. No human in the loop.

Structured Test Authoring

Instead of asserting against internal state (which couples tests to implementation), the agent can write behavioral tests:

set_text("She felt very sad about what happened.")
wait_idle()
diagnostics = get_diagnostics()
assert any(d.source == "show_tell" for d in diagnostics)
assert any(d.source == "redundancy" for d in diagnostics)

This tests what the user would experience. If I refactor the analyzer pipeline — change AST nodes, rename modules, swap out models — these tests still pass because they’re testing the product surface, not the implementation. Too many tests at those lower levels just add friction and churn, especially with AI coding tools.

The “Rebuild” Pattern

This is another useful pattern. The agent can:

  1. Edit a .rs file
  2. Call rebuild (wrapper stops the app, runs cargo build, relaunches)
  3. Immediately verify the new build
  4. Evaluate and iterate

Rebuilds are quick thanks to Rust’s incremental compilation, the MCP session stays connected, and the agent can run edit-verify cycles faster than I can switch windows.

The Broader Principle

There’s a deeper pattern here. We’re entering a period where the audience for your application’s API is not just other programmers — it’s AI agents. And agents don’t need the same things programmers need. They don’t need beautiful documentation, clever abstractions, or versioned REST endpoints. They need:

  1. A way to do things
  2. A way to wait for things
  3. A way to verify things

MCP in your app gives you a standardized way to expose all three. The protocol handles capability negotiation, tool discovery, and structured responses. Your job is just to wire the tools to your application’s internals.
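Tool discovery is the part agents lean on most: each tool advertises a name, a description, and a JSON Schema for its arguments in the tools/list response. A minimal descriptor, here for a wait_idle-style tool (the timeout_ms parameter is an illustrative assumption, not from my actual server):

```python
import json

# A minimal MCP tool descriptor; field names follow the MCP
# tools/list response shape (name, description, inputSchema).
wait_idle_tool = {
    "name": "wait_idle",
    "description": "Block until fast and semantic analysis stages complete.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "timeout_ms": {
                "type": "integer",
                "description": "Give up after this many milliseconds.",
            },
        },
    },
}

print(json.dumps({"tools": [wait_idle_tool]}, indent=2))
```

An agent reads this schema once and can then call the tool correctly without any documentation written for humans.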

I started the MCP server to speed up my own development loop. You might also want to expose the same MCP server to your power users!

Opinions are my own and not the views of my employer.

© 2026 by Justin Poehnelt is licensed under CC BY-SA 4.0