Goal: run a team of local AIs that cooperate on one task — so you get better output, faster iteration, and less “single-model tunnel vision.”
This is the practical version of “multi-agent” — not sci-fi. You’ll learn how to:
- Run multiple LM Studio servers on different PCs
- Call them over your LAN (like little AI appliances)
- Use a simple pattern that actually improves results
The rule: ⚡ Every agent must respond fast or show progress — otherwise multi-model becomes multi-waiting.
✅ Why multiple LLMs can beat one
One model is great at one style of thinking. But real projects need different roles:
- Draft model → fast, good enough, gets momentum
- Critic model → finds mistakes, missing steps, edge cases
- Finisher model → writes the clean final answer/output
- Tool/lookup model (optional) → does retrieval/search or “reads receipts”
You’re building a mini production pipeline: rough → review → final.
🧠 The simple architecture
Your task
 └─> Router / Coordinator (one script on one machine)
      ├─> Model A (fast)   - drafts
      ├─> Model B (critic) - reviews
      └─> Model C (smart)  - final output
Key idea: your router sends the same task to multiple models, then uses a “finisher” prompt to combine the best parts.
🖥 Step 1 — Set up LM Studio servers on each computer
On each PC:
- Open LM Studio
- Go to Developer / server controls
- Pick a port (the default is often 1234)
- Enable “Serve on Local Network” (so other PCs can reach it)
- (Optional) Enable authentication and generate a token
Receipt: from another machine you should be able to hit:
http://<PC-IP>:1234/v1/models
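If you'd rather check all of your machines in one go, a tiny Node script (Node 18+, so fetch is built in) can run the same probe. The IPs below are placeholders; the /v1/models response follows the usual OpenAI-compatible shape, where each entry's id is the model name you'll reference later in the router.
// check-endpoints.mjs (concept sketch — swap in your own IPs/ports)
const HOSTS = ["192.168.1.10", "192.168.1.11", "192.168.1.12"];

for (const host of HOSTS) {
  try {
    const r = await fetch(`http://${host}:1234/v1/models`);
    const j = await r.json();
    // Each entry's `id` is the model name you'll reference from the router script.
    console.log(host, "→", j.data.map((m) => m.id).join(", "));
  } catch (e) {
    console.log(host, "→ unreachable:", e.message);
  }
}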
Curtision note: link your own “LAN server” guide here:
Curtision: LM Studio guides
🧱 Step 2 — Optional: run headless (no GUI) with llmster
If one of your machines is a “server box” (always on), use LM Studio’s headless daemon instead of the GUI.
# Mac / Linux
curl -fsSL https://lmstudio.ai/install.sh | bash
# Windows (PowerShell)
irm https://lmstudio.ai/install.ps1 | iex
This is how you turn a spare PC into a permanent model endpoint.
🔌 Step 3 — The “router” script (one file, LAN-first)
This example uses Node.js and calls three OpenAI-compatible endpoints (LM Studio servers) over your local network.
What it does:
- Sends the task to Draft + Critic in parallel
- Sends both outputs to the Finisher
- Returns a clean final answer
// router.mjs (concept example)
//
// Edit IPs + model names for your machines.
// If auth is enabled, set LM_API_TOKEN to something like:
//   sk-lm-xs4Zrb2e:D9HZa8DU2hDBwgx9HJBW

const ENDPOINTS = {
  draft:  { baseUrl: "http://192.168.1.10:1234/v1", model: "your-fast-model" },
  critic: { baseUrl: "http://192.168.1.11:1234/v1", model: "your-critic-model" },
  finish: { baseUrl: "http://192.168.1.12:1234/v1", model: "your-best-model" },
};

const API_KEY = process.env.LM_API_TOKEN || "not-needed";

async function chat({ baseUrl, model, messages, temperature = 0.2, max_tokens = 600 }) {
  const r = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ model, messages, temperature, max_tokens }),
  });
  if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);
  const j = await r.json();
  return j.choices?.[0]?.message?.content?.trim() || "";
}

export async function multiModelSolve(task) {
  const draftPrompt = [
    { role: "system", content: "You are a fast drafting assistant. Produce a concise first draft." },
    { role: "user", content: task }
  ];
  const criticPrompt = [
    { role: "system", content: "You are a strict reviewer. Find mistakes, missing steps, and edge cases." },
    { role: "user", content: task }
  ];

  // 1) parallel
  const [draft, critique] = await Promise.all([
    chat({ ...ENDPOINTS.draft,  messages: draftPrompt,  temperature: 0.4, max_tokens: 600 }),
    chat({ ...ENDPOINTS.critic, messages: criticPrompt, temperature: 0.2, max_tokens: 450 }),
  ]);

  // 2) finish/synthesize
  const finishPrompt = [
    { role: "system", content: "You are the finisher. Combine the best parts, fix issues, produce the final answer." },
    { role: "user", content: `TASK:\n${task}\n\nDRAFT:\n${draft}\n\nCRITIQUE:\n${critique}\n\nNow write the improved final output.` }
  ];
  return await chat({ ...ENDPOINTS.finish, messages: finishPrompt, temperature: 0.2, max_tokens: 900 });
}
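To actually run it, here's a minimal entry point (assumes Node 18+ for the built-in fetch and top-level await; the file name is just a suggestion):
// solve.mjs — run with: node solve.mjs "your task here"
import { multiModelSolve } from "./router.mjs";

const task = process.argv.slice(2).join(" ") || "Write a checklist for backing up a home server.";
console.log(await multiModelSolve(task));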
This pattern is the “minimum effective multi-model.” It’s fast and it reliably improves results without turning into a complicated agent maze.
🧠 Multi-model patterns that work (use these, ignore the hype)
1) Draft → Critique → Final (best default)
- Fast model gets momentum
- Critic model catches errors
- Finisher writes clean output
2) Parallel brainstorm → Synthesis (good for creative / planning)
Send the task to 3–5 models, then synthesize. This is similar to “Mixture-of-Agents.”
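A minimal sketch of this pattern, assuming you add `export` in front of the chat() helper in router.mjs so it can be reused; the endpoints and model names are placeholders:
// brainstorm.mjs (concept sketch)
import { chat } from "./router.mjs"; // assumes chat() is exported from router.mjs

const BRAINSTORMERS = [
  { baseUrl: "http://192.168.1.10:1234/v1", model: "model-a" },
  { baseUrl: "http://192.168.1.11:1234/v1", model: "model-b" },
  { baseUrl: "http://192.168.1.12:1234/v1", model: "model-c" },
];
const SYNTHESIZER = { baseUrl: "http://192.168.1.12:1234/v1", model: "your-best-model" };

export async function brainstorm(task) {
  // 1) Fan the same task out to every model in parallel (higher temperature = more diverse takes).
  const ideas = await Promise.all(
    BRAINSTORMERS.map((ep) =>
      chat({
        ...ep,
        temperature: 0.8,
        messages: [
          { role: "system", content: "Propose one distinct approach to the task. Be concrete." },
          { role: "user", content: task },
        ],
      })
    )
  );

  // 2) One model synthesizes the proposals into a single answer.
  return chat({
    ...SYNTHESIZER,
    temperature: 0.2,
    messages: [
      { role: "system", content: "Synthesize these proposals into one coherent answer. Keep the best ideas, drop the weak ones." },
      { role: "user", content: `TASK:\n${task}\n\nPROPOSALS:\n${ideas.map((x, i) => `#${i + 1}:\n${x}`).join("\n\n")}` },
    ],
  });
}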
3) Supervisor + specialists (best for tool-heavy projects)
A supervisor routes the task to specialist agents (coder, researcher, debugger, writer). This is common in frameworks like LangGraph/CrewAI.
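Those frameworks add a lot on top (tools, memory, retries), but the core routing step can be sketched in a few lines. Again this assumes chat() is exported from router.mjs; the specialist labels, IPs, and model names are placeholders:
// supervisor.mjs (concept sketch — not a replacement for LangGraph/CrewAI)
import { chat } from "./router.mjs"; // assumes chat() is exported from router.mjs

const SPECIALISTS = {
  coder:      { baseUrl: "http://192.168.1.10:1234/v1", model: "your-code-model" },
  researcher: { baseUrl: "http://192.168.1.11:1234/v1", model: "your-general-model" },
  writer:     { baseUrl: "http://192.168.1.12:1234/v1", model: "your-writing-model" },
};
const SUPERVISOR = { baseUrl: "http://192.168.1.12:1234/v1", model: "your-best-model" };

export async function supervise(task) {
  // 1) The supervisor picks exactly one specialist label.
  const raw = await chat({
    ...SUPERVISOR,
    temperature: 0,
    max_tokens: 10,
    messages: [
      { role: "system", content: `Classify the task. Reply with exactly one word: ${Object.keys(SPECIALISTS).join(", ")}.` },
      { role: "user", content: task },
    ],
  });
  const label = Object.keys(SPECIALISTS).find((k) => raw.toLowerCase().includes(k)) || "writer";

  // 2) Route the task to that specialist.
  return chat({
    ...SPECIALISTS[label],
    messages: [
      { role: "system", content: `You are the ${label} specialist. Complete the task thoroughly.` },
      { role: "user", content: task },
    ],
  });
}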
⚠️ Gotchas (the ones that waste hours)
1) Firewall / ports
Symptom: the endpoint works locally but not from other PCs.
Fix: allow inbound TCP on the LM Studio port (e.g. 1234) and ensure “Serve on Local Network” is enabled.
2) VPN adapters
Symptom: LM Studio binds to the wrong interface.
Fix: temporarily disable VPN or set the correct network behavior in your server settings.
3) Multi-model = multi-cost
Reality: sending a task to 3 models costs ~3× tokens/time.
Fix: use the pattern only when it improves quality (final output, debugging, planning). Use a single fast model for chatty stuff.
🧾 The “Receipts” checklist (so you know it’s real)
- Start LM Studio server on 2–3 PCs
- Enable “Serve on Local Network”
- From the router machine: curl http://<ip>:1234/v1/models
- Run the router script and confirm you get: Draft + Critique + Final
- Measure speed (TTFT, i.e. time to first token, plus tokens/sec) and move roles to faster machines (a measurement sketch follows below)
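For that last item, here's a rough way to measure TTFT and throughput against one endpoint at a time. It assumes the OpenAI-compatible streaming mode (stream: true, SSE "data:" events) that LM Studio's server exposes, and it counts SSE events as a stand-in for tokens, so treat the numbers as approximate; the IP and model name are placeholders:
// measure.mjs (rough sketch — point it at one endpoint, compare machines)
const BASE = "http://192.168.1.10:1234/v1";
const MODEL = "your-fast-model";

const start = performance.now();
const r = await fetch(`${BASE}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: MODEL,
    stream: true,
    max_tokens: 200,
    messages: [{ role: "user", content: "Summarize what a draft/critic/finisher pipeline does." }],
  }),
});
if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);

const reader = r.body.getReader();
const decoder = new TextDecoder();
let firstChunkAt = null;
let events = 0;
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  if (firstChunkAt === null) firstChunkAt = performance.now();
  // Streamed responses arrive as SSE "data: {...}" events, roughly one per token.
  events += (decoder.decode(value, { stream: true }).match(/^data: /gm) || []).length;
}

const totalSec = (performance.now() - start) / 1000;
console.log(`TTFT: ${((firstChunkAt - start) / 1000).toFixed(2)}s`);
console.log(`~${(events / totalSec).toFixed(1)} events/sec (≈ tokens/sec) over ${totalSec.toFixed(1)}s total`);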
🔗 The links you actually need
- LM Studio server settings (“Serve on Local Network”, auth): https://lmstudio.ai/docs/developer/core/server/settings
- LM Studio authentication tokens: https://lmstudio.ai/docs/developer/core/authentication
- LM Studio headless daemon (llmster): https://lmstudio.ai/docs/developer/core/headless
- LangGraph patterns (supervisor + agents): https://docs.langchain.com/oss/python/langchain/multi-agent/subagents-personal-assistant
- CrewAI docs (multi-agent orchestration): https://docs.crewai.com/
- AutoGen (multi-agent conversations): https://www.microsoft.com/en-us/research/project/autogen/
- Mixture-of-Agents paper (why synthesis can help): https://arxiv.org/abs/2406.04692
Next upgrade: combine this with a memory layer (chunking + retrieval) so the whole “team” shares a knowledge base — without inflating prompts or slowing down the system.