⚙️🤖 Practical LM Studio Tools: Web Search, Routing, and Real “Assistant” Behavior

by | Feb 1, 2026 | Uncategorized | 0 comments

Goal: build useful “plugins” for LM Studio that feel practical in daily life — like a SearXNG web search tool, plus a tool router that converts natural language into reliable tool calls.

This is not hype. This is how you turn a local model into a real assistant: it can search, retrieve, summarize, and cite — instead of guessing.

Rule I use: If the model needs fresh facts, it should search. If it doesn’t search, it should say “I don’t know.”


✅ What you’re building (v1)

  • SearXNG running locally (your private metasearch engine)
  • One tool: search_web(query) returns results in JSON
  • Tool router: natural language → “use tool” vs “answer directly”
  • LM Studio integration: via Tools/Function Calling OR MCP (recommended)

The vibe: your assistant stops being “chat-only” and starts being “tool-enabled.”


🧠 Two ways to do “plugins” with LM Studio

Option A — Tools / Function Calling (fast to ship)

You define tools in the chat request. The model returns a tool call. Your app executes the tool and sends results back.

Option B — MCP servers (the clean, scalable approach)

MCP (Model Context Protocol) lets LM Studio use external tool servers in a standardized way. You can run tool servers locally, restrict which tools are allowed, and reuse them across apps.

Beginner recommendation: start with Tools/Function Calling to learn the pattern, then graduate to MCP when you want a stable “tool ecosystem.”


🔎 Step 1 — Run SearXNG locally

SearXNG is a self-hostable metasearch engine. Running it locally gives you a search endpoint your AI can call.

✅ Easiest setup: Docker Compose

Use the official SearXNG Docker stack (it includes sensible defaults). Once running, you’ll have a local endpoint like:

http://localhost:8080
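A minimal compose file is enough to get started. This sketch trims the official stack (which also ships a cache container and hardening options) down to the search container itself, so treat the volume path as an assumption:

```yaml
# Minimal sketch of the official SearXNG Docker setup.
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng   # settings.yml lives here
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080/
```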

✅ Confirm JSON search works

SearXNG can return JSON results. A typical pattern is:

GET /search?q=your+query&format=json

If you get results back as JSON, you’re ready to connect the AI. If you get a 403 Forbidden instead, add "json" to search.formats in SearXNG’s settings.yml — the default config only serves HTML.
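A quick sanity check from Python, assuming the default localhost:8080 port (stdlib only, no extra dependencies):

```python
import json
import urllib.parse
import urllib.request

def build_search_url(base: str, query: str) -> str:
    """Build the SearXNG JSON search URL for a query."""
    qs = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base}/search?{qs}"

def searx_json(base: str, query: str, timeout: float = 10.0) -> dict:
    """Fetch and parse JSON results (requires a running SearXNG)."""
    url = build_search_url(base, query)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```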


🧰 Step 2 — Create a “search_web” tool (simple contract)

Whether you use Tools/Function Calling or MCP, your tool should do one job well:

  • Take a query string
  • Call SearXNG
  • Return a clean list: title, URL, snippet

Tool contract (concept):

search_web({ query, limit=5 }) -> [{ title, url, snippet }]
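A self-contained sketch of that contract in Python, assuming a local SearXNG on port 8080 with the JSON format enabled:

```python
import json
import urllib.parse
import urllib.request

SEARXNG = "http://localhost:8080"  # assumption: default local instance

def shape_results(raw: dict, limit: int = 5) -> list[dict]:
    """Reduce a SearXNG JSON payload to the contract: [{title, url, snippet}]."""
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "snippet": r.get("content", ""),  # SearXNG names the snippet field "content"
        }
        for r in raw.get("results", [])[:limit]
    ]

def search_web(query: str, limit: int = 5) -> list[dict]:
    """The tool itself: take a query, call SearXNG, return the clean list."""
    qs = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARXNG}/search?{qs}", timeout=10) as resp:
        return shape_results(json.loads(resp.read().decode()), limit)
```

Keeping the shaping step separate from the HTTP call makes the tool easy to test without a live search engine.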

🤖 Step 3 — Tool router: natural language → tool use

You have two router styles:

Router Style 1 — Let the model call tools (cleanest)

You give the model tool definitions and a system rule like:

"If the user asks for recent facts, prices, news, or verification, call search_web first.
If the question is opinion or writing, answer directly."

The model decides. Your app executes the tool call and feeds results back.
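With LM Studio’s OpenAI-compatible endpoint, that tool definition plus system rule look like this in Python (the description wording is mine; adapt it freely):

```python
# OpenAI-style tool definition that an OpenAI-compatible
# /v1/chat/completions endpoint (like LM Studio's) accepts.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": (
            "Search the web via a local SearXNG instance. "
            "Use for recent facts, prices, news, or verification."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."},
                "limit": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}

# The system rule that tells the model when to reach for the tool.
SYSTEM_RULE = (
    "If the user asks for recent facts, prices, news, or verification, "
    "call search_web first. If the question is opinion or writing, answer directly."
)
```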

Router Style 2 — Use a cheap “router model” (saves tokens)

A tiny model makes a fast decision before involving the big model:

Router prompt (cheap model):
"Return JSON only:
{ action: 'search'|'answer', query?: string }

User request: ..."

If action = search, you call SearXNG, then pass results to the larger model for a final response. This saves time and generation cost.
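Whatever the cheap model returns, never trust it to emit perfect JSON. A defensive parser sketch (the fallback-to-answer behavior is my own design choice):

```python
import json

def parse_router_decision(raw: str) -> dict:
    """Parse the router model's JSON decision, with a safe fallback.
    Small models sometimes wrap JSON in prose, so grab the first {...} span."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        return {"action": "answer"}  # no JSON at all: answer directly
    try:
        decision = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {"action": "answer"}
    if decision.get("action") not in ("search", "answer"):
        return {"action": "answer"}  # unknown action: fail safe
    return decision
```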


🧪 A minimal Tools/Function Calling flow (concept)

1) User asks: "What changed in LM Studio recently?"
2) Model returns tool_call: search_web({ query: "LM Studio 0.4.0 new features" })
3) Your app calls SearXNG and returns results to the model
4) Model writes the final answer with citations/snippets

Important: always set a timeout and always send “working…” feedback in your UI if search takes time.
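The four steps above can be sketched as one loop. To keep it testable, the chat call and tool executor are injected; the message format follows the OpenAI-style schema:

```python
import json

def run_tool_loop(chat, execute_tool, messages, max_tool_rounds=2):
    """Drive the tool loop: ask the model, run any tool calls it makes,
    feed results back, and stop at a plain answer or the search budget.
    chat(messages) must return an OpenAI-style assistant message dict;
    execute_tool(name, args) runs the named tool and returns JSON-able data."""
    for _ in range(max_tool_rounds):
        msg = chat(messages)
        messages.append(msg)
        calls = msg.get("tool_calls")
        if not calls:
            return msg.get("content", "")  # plain answer: we're done
        for call in calls:
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(execute_tool(call["function"]["name"], args)),
            })
    # Budget spent ("at most 2 searches"): one final turn, no more tools.
    return chat(messages).get("content", "")
```

The max_tool_rounds cap doubles as the anti-looping guard discussed in the gotchas below.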


🧱 The MCP approach (recommended once you want stability)

With LM Studio’s native REST API, you can enable MCP servers in two ways:

  • Ephemeral MCP: defined per-request (great for testing)
  • mcp.json servers: configured once, reused everywhere (best for daily workflow)
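For reference, an mcp.json entry looks roughly like this (the server name and script path are placeholders for whatever tool server you build):

```json
{
  "mcpServers": {
    "searxng-search": {
      "command": "node",
      "args": ["/path/to/search-server.js"]
    }
  }
}
```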

Pro tip: restrict tool access. Give the model only the tools it needs for the current job.


⚠️ Gotchas (the ones that waste hours)

1) Tool spam / looping

Cause: the model keeps calling tools because the prompt is vague.

Fix: limit allowed tools and set a hard max: “at most 2 searches.”

2) Slow responses

Cause: search + summarization can be slower than pure chat.

Fix: stream results, show “Searching…” immediately, and keep the final answer short.

3) Bad search results

Cause: broad queries or noisy engines.

Fix: teach the router to refine queries (add year, product version, location).



If you want the “v2” of this post, I can write it as a full copy-paste build guide with:
Docker Compose for SearXNG, a Node tool server (Tools + MCP versions), and a router prompt tuned for low tokens + high reliability.