Your App Should Ship an MCP Server

By Justin Poehnelt

I’m building a native desktop editor for fiction writers. It’s written in Rust on top of gpui, the UI framework from the Zed team. Under the hood, it generates a fiction-specific AST, runs ~120 prose-craft analyzers, and applies a multi-task ONNX transformer model to your manuscript in real time, surfacing things like show-don’t-tell violations, passive voice, pacing issues, and much more.

I started out using gpui-component, but its Input component wasn’t sufficient for the complex, CRDT-backed UI I wanted, so I ended up writing my own buffer-based system and building the entire text-editing UX from scratch. This is not recommended!

To manage this complexity, I embedded a full MCP server directly inside the application binary. It has since become the single most impactful architectural decision I’ve made on this project, not for users, but for how Claude and I build the product itself.

Here’s the case for why your application should do the same.

The Problem: GUI Apps Are Opaque to AI Agents

If you’re building a native desktop application in 2026, you’ve probably noticed a gap. Your AI coding assistant can read your source code, run your tests, and even propose edits. But it can’t reliably see your running application. It can’t click a button, type into a text field, verify that a diagnostic tooltip rendered correctly, or confirm that a scrollbar stopped at the right position.

For web apps, this is a solved problem with headless browsers, Playwright, Chrome MCP, etc. For native apps, especially those built on GPU-accelerated frameworks like gpui, you’re largely on your own. There’s no DOM to query. There’s no accessibility tree you can trivially script against. The rendered output is just a texture.

I spent too long in a loop that looked like this:

  1. Read the source code
  2. Make a change
  3. cargo build
  4. Manually launch the app
  5. Manually paste in test prose
  6. Squint at the screen
  7. Screenshot it myself
  8. Paste the screenshot into the AI interface
  9. Repeat

Steps 4 through 8 are the bottleneck, and no amount of faster builds fixes that. The feedback loop is human-gated.

The Solution: Make the App Speak MCP

The Model Context Protocol is essentially a standardized JSON-RPC interface that AI agents already know how to speak. If your app exposes MCP tools, any MCP-compatible client can drive your application programmatically.
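Concretely, each exchange is a single JSON object on its own line. A sketch in Python of how a client frames a tools/call request (the set_text tool name comes from my app; yours will differ):

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Frame an MCP tools/call request as one newline-delimited JSON-RPC line."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(msg) + "\n"

line = make_tool_call(1, "set_text", {"text": "It was a dark and stormy night."})
print(line, end="")
```

The app writes back a matching `{"jsonrpc": "2.0", "id": 1, "result": ...}` line, and any MCP client can speak this framing out of the box.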

My implementation has two pieces:

1. The In-App MCP Server

When launched with --mcp, the app starts a background thread that reads newline-delimited JSON-RPC from stdin and writes responses to stdout. Commands are dispatched into the gpui event loop.

This is ~200 lines of Rust. No external dependencies beyond serde_json. The protocol surface is minimal: initialize, tools/list, and tools/call. That’s it (for now).
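The shape of that loop is simple enough to sketch. Here it is in Python rather than Rust for brevity; the protocol version string is a placeholder, and call_tool stands in for the dispatch into the gpui event loop:

```python
import json
import sys

def handle_message(msg: dict, call_tool) -> dict:
    """Route the three supported JSON-RPC methods; call_tool is app-specific."""
    method = msg.get("method")
    if method == "initialize":
        result = {"protocolVersion": "2024-11-05", "capabilities": {"tools": {}}}
    elif method == "tools/list":
        result = {"tools": []}  # a real server returns its tool descriptors here
    elif method == "tools/call":
        params = msg.get("params", {})
        result = call_tool(params.get("name"), params.get("arguments", {}))
    else:
        return {"jsonrpc": "2.0", "id": msg.get("id"),
                "error": {"code": -32601, "message": f"unknown method: {method}"}}
    return {"jsonrpc": "2.0", "id": msg.get("id"), "result": result}

def serve(call_tool, stdin=sys.stdin, stdout=sys.stdout):
    """Read newline-delimited JSON-RPC from stdin; write one response per line."""
    for line in stdin:
        if not line.strip():
            continue
        response = handle_message(json.loads(line), call_tool)
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```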

2. The Lifecycle Wrapper

This is a separate binary that manages the app process. It:

  • Builds the app from source on startup
  • Launches it with --mcp
  • Proxies all JSON-RPC between the MCP client and the app
  • Intercepts a special rebuild tool call to stop the app, run cargo build, and relaunch, without dropping the MCP connection

The wrapper feels like a hack, and there is probably a cleaner solution. When the agent edits Rust source and calls rebuild, the app restarts with the new binary and the agent’s MCP session continues uninterrupted.
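The interesting part of the wrapper is deciding, line by line, whether to proxy a message through to the app or intercept it. A Python sketch of that predicate (the process management around it, stop, cargo build, relaunch, is elided):

```python
import json

def route(line: str) -> str:
    """Classify one JSON-RPC line coming from the MCP client.

    Returns "rebuild" when the line is a tools/call for the rebuild tool,
    which the wrapper handles itself (stop app, cargo build, relaunch).
    Everything else is "proxy": forward to the app's stdin unchanged.
    """
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return "proxy"
    if (msg.get("method") == "tools/call"
            and msg.get("params", {}).get("name") == "rebuild"):
        return "rebuild"
    return "proxy"
```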

The SDLC Loop

Here’s what the development loop looks like with the MCP server in place compared to without:

Before (human-gated):

  ╭─► Edit .rs             Agent
  │   cargo build          Agent
  │   Launch App           Agent
  │   Paste Prose          Human
  │   Squint               Human
  │   Screenshot           Human
  │   Describe to AI       Human
  ╰───────────╯

After (agent-driven):

  ╭─► Edit .rs             Agent
  │   rebuild              Agent
  │   set_text             Agent
  │   wait_idle            Agent
  │   screenshot           Agent
  ╰───────────╯

The “before” loop requires a human at every step past the build. The “after” loop is fully autonomous: the agent drives the entire cycle in ~10-second iterations. And because screenshots are expensive, you can expose cheaper structured tools for most checks.

What Tools Does the Server Expose?

Here’s my current tool surface. Claude can quickly iterate on the available tools as it adds features too!

  • set_text / type_text: Load prose or type at cursor
  • press_key: Simulate any keystroke (enter, backspace, Cmd+B, etc.)
  • click / double_click / triple_click: Click at pixel coordinates
  • drag_select: Click-drag selection
  • screenshot: Capture the window to PNG
  • get_state: Return cursor position, selection, text content, word count
  • get_diagnostics: Return structured analysis results (message, severity, source, byte range)
  • wait_idle: Block until both fast and semantic analysis stages complete
  • set_view_mode: Switch between Draft, Review, and Analyze modes
  • set_nav_pane: Switch sidebar panes (editor, outline, find, diagnostics, settings)
  • list_elements: Enumerate UI elements with rendered positions
  • hover_diagnostic: Programmatically hover a diagnostic card
  • format_state: Query which inline/block formats are active at cursor
  • rebuild: Stop → cargo build → relaunch (wrapper-only)

The total is around 30 tools. The marginal cost of adding a new tool is about 15 minutes: write a match arm, call an existing editor method, return JSON.
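In Rust that’s literally one new arm in the dispatcher’s match. The same shape in Python is one new entry in a dispatch table; the editor methods below are hypothetical stand-ins for whatever your app already has:

```python
def get_state(editor, args):
    # Call existing editor methods and return a JSON-serializable result.
    return {"cursor": editor.cursor(), "word_count": editor.word_count()}

def press_key(editor, args):
    editor.press(args["key"])
    return {"ok": True}

TOOLS = {
    "get_state": get_state,
    "press_key": press_key,
    # Adding a tool = one new entry here plus one small function above.
}

def dispatch(editor, name, args):
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](editor, args)
```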

I haven’t attempted to expose dynamic tools based on the current view.

What This Actually Enables

AI-Driven Iteration and Verification

The agent can now verify what it built. It edits paint.rs, calls rebuild, calls set_text with sample prose, calls wait_idle to let the analyzers finish, and calls screenshot to capture the result. It reads the PNG, evaluates whether the margin notes rendered correctly, and iterates. No human in the loop.

Structured Test Authoring

Instead of asserting against internal state (which couples tests to implementation), the agent can write behavioral tests:

set_text("She felt very sad about what happened.")
wait_idle()
diagnostics = get_diagnostics()
assert any(d.source == "show_tell" for d in diagnostics)
assert any(d.source == "redundancy" for d in diagnostics)

This tests what the user would experience. If I refactor the analyzer pipeline — change AST nodes, rename modules, swap out models — these tests still pass because they’re testing the product surface, not the implementation. Too many tests at those lower levels just add friction and churn, especially with AI coding tools.

The “Rebuild” Pattern

This is another useful pattern. The agent can:

  1. Edit a .rs file
  2. Call rebuild (wrapper stops the app, runs cargo build, relaunches)
  3. Immediately verify the new build
  4. Evaluate and iterate

Rebuilds are quick thanks to Rust’s incremental compilation, the MCP session stays connected, and the agent can run edit-verify cycles faster than I can switch windows.

The Broader Principle

There’s a deeper pattern here. We’re entering a period where the audience for your application’s API is not just other programmers — it’s AI agents. And agents don’t need the same things programmers need. They don’t need beautiful documentation, clever abstractions, or versioned REST endpoints. They need:

  1. A way to do things
  2. A way to wait for things
  3. A way to verify things

MCP in your app gives you a standardized way to expose all three. The protocol handles capability negotiation, tool discovery, and structured responses. Your job is just to wire the tools to your application’s internals.
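Tool discovery is the part agents lean on most: each tool advertises a name, a description, and a JSON Schema for its arguments in the tools/list response. A minimal descriptor, here for a wait_idle-style tool (the timeout_ms parameter is an illustrative assumption, not from my actual server):

```python
import json

# A minimal MCP tool descriptor; field names follow the MCP
# tools/list response shape (name, description, inputSchema).
wait_idle_tool = {
    "name": "wait_idle",
    "description": "Block until fast and semantic analysis stages complete.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "timeout_ms": {
                "type": "integer",
                "description": "Give up after this many milliseconds.",
            },
        },
    },
}

print(json.dumps({"tools": [wait_idle_tool]}, indent=2))
```

An agent reads this schema once and can then call the tool correctly without any documentation written for humans.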

I started the MCP server to speed up my own development loop. You might also want to expose the same MCP server to your power users!

Opinions are my own and not the views of my employer.

© 2026 by Justin Poehnelt is licensed under CC BY-SA 4.0