Goal: run a team of local AIs that cooperate on one task — so you get better output, faster iteration, and less “single-model tunnel vision.”
This is the practical version of “multi-agent” — not sci-fi. You’ll learn how to:
- Run multiple LM Studio servers on different PCs
- Call them over your LAN (like little AI appliances)
- Use a simple pattern that actually improves results
The rule: ⚡ Every agent must respond fast or show progress — otherwise multi-model becomes multi-waiting.
✅ Why multiple LLMs can beat one
One model is great at one style of thinking. But real projects need different roles:
- Draft model → fast, good enough, gets momentum
- Critic model → finds mistakes, missing steps, edge cases
- Finisher model → writes the clean final answer/output
- Tool/lookup model (optional) → does retrieval/search or “reads receipts”
You’re building a mini production pipeline: rough → review → final.
🧠 The simple architecture
Your task
 └─> Router / Coordinator (one script on one machine)
      ├─> Model A (fast)   - drafts
      ├─> Model B (critic) - reviews
      └─> Model C (smart)  - final output
Key idea: your router sends the same task to multiple models, then uses a “finisher” prompt to combine the best parts.
🖥 Step 1 — Set up LM Studio servers on each computer
On each PC:
- Open LM Studio
- Go to Developer / server controls
- Pick a port (the default is often 1234)
- Enable “Serve on Local Network” (so other PCs can reach it)
- (Optional) Enable authentication and generate a token
Receipt: from another machine you should be able to hit:
http://<PC-IP>:1234/v1/models
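If you'd rather check all of your machines in one go, a tiny Node script (Node 18+, so fetch is built in) can run the same probe. The IPs below are placeholders; the /v1/models response follows the usual OpenAI-compatible shape, where each entry's id is the model name you'll reference later in the router.
// check-endpoints.mjs (concept sketch — swap in your own IPs/ports)
const HOSTS = ["192.168.1.10", "192.168.1.11", "192.168.1.12"];

for (const host of HOSTS) {
  try {
    const r = await fetch(`http://${host}:1234/v1/models`);
    const j = await r.json();
    // Each entry's `id` is the model name you'll reference from the router script.
    console.log(host, "→", j.data.map((m) => m.id).join(", "));
  } catch (e) {
    console.log(host, "→ unreachable:", e.message);
  }
}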
Curtision note: link your own “LAN server” guide here:
Curtision: LM Studio guides
🧱 Step 2 — Optional: run headless (no GUI) with llmster
If one of your machines is a “server box” (always on), use LM Studio’s headless daemon instead of the GUI.
# Mac / Linux
curl -fsSL https://lmstudio.ai/install.sh | bash
# Windows (PowerShell)
irm https://lmstudio.ai/install.ps1 | iex
This is how you turn a spare PC into a permanent model endpoint.
🔌 Step 3 — The “router” script (one file, LAN-first)
This example uses Node.js and calls three OpenAI-compatible endpoints (LM Studio servers) over your local network.
What it does:
- Sends the task to Draft + Critic in parallel
- Sends both outputs to the Finisher
- Returns a clean final answer
// router.mjs (concept example)
//
// Edit IPs + model names for your machines.
// If auth is enabled, set LM_API_TOKEN to something like:
//   sk-lm-xs4Zrb2e:D9HZa8DU2hDBwgx9HJBW

const ENDPOINTS = {
  draft:  { baseUrl: "http://192.168.1.10:1234/v1", model: "your-fast-model" },
  critic: { baseUrl: "http://192.168.1.11:1234/v1", model: "your-critic-model" },
  finish: { baseUrl: "http://192.168.1.12:1234/v1", model: "your-best-model" },
};

const API_KEY = process.env.LM_API_TOKEN || "not-needed";

async function chat({ baseUrl, model, messages, temperature = 0.2, max_tokens = 600 }) {
  const r = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({ model, messages, temperature, max_tokens }),
  });
  if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);
  const j = await r.json();
  return j.choices?.[0]?.message?.content?.trim() || "";
}

export async function multiModelSolve(task) {
  const draftPrompt = [
    { role: "system", content: "You are a fast drafting assistant. Produce a concise first draft." },
    { role: "user", content: task }
  ];
  const criticPrompt = [
    { role: "system", content: "You are a strict reviewer. Find mistakes, missing steps, and edge cases." },
    { role: "user", content: task }
  ];

  // 1) parallel
  const [draft, critique] = await Promise.all([
    chat({ ...ENDPOINTS.draft,  messages: draftPrompt,  temperature: 0.4, max_tokens: 600 }),
    chat({ ...ENDPOINTS.critic, messages: criticPrompt, temperature: 0.2, max_tokens: 450 }),
  ]);

  // 2) finish/synthesize
  const finishPrompt = [
    { role: "system", content: "You are the finisher. Combine the best parts, fix issues, produce the final answer." },
    { role: "user", content: `TASK:\n${task}\n\nDRAFT:\n${draft}\n\nCRITIQUE:\n${critique}\n\nNow write the improved final output.` }
  ];
  return await chat({ ...ENDPOINTS.finish, messages: finishPrompt, temperature: 0.2, max_tokens: 900 });
}
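To actually run it, here's a minimal entry point (assumes Node 18+ for the built-in fetch and top-level await; the file name is just a suggestion):
// solve.mjs — run with: node solve.mjs "your task here"
import { multiModelSolve } from "./router.mjs";

const task = process.argv.slice(2).join(" ") || "Write a checklist for backing up a home server.";
console.log(await multiModelSolve(task));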
This pattern is the “minimum effective multi-model.” It’s fast and it reliably improves results without turning into a complicated agent maze.
🧠 Multi-model patterns that work (use these, ignore the hype)
1) Draft → Critique → Final (best default)
- Fast model gets momentum
- Critic model catches errors
- Finisher writes clean output
2) Parallel brainstorm → Synthesis (good for creative / planning)
Send the task to 3–5 models, then synthesize. This is similar to “Mixture-of-Agents.”
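A minimal sketch of this pattern, assuming you add `export` in front of the chat() helper in router.mjs so it can be reused; the endpoints and model names are placeholders:
// brainstorm.mjs (concept sketch)
import { chat } from "./router.mjs"; // assumes chat() is exported from router.mjs

const BRAINSTORMERS = [
  { baseUrl: "http://192.168.1.10:1234/v1", model: "model-a" },
  { baseUrl: "http://192.168.1.11:1234/v1", model: "model-b" },
  { baseUrl: "http://192.168.1.12:1234/v1", model: "model-c" },
];
const SYNTHESIZER = { baseUrl: "http://192.168.1.12:1234/v1", model: "your-best-model" };

export async function brainstorm(task) {
  // 1) Fan the same task out to every model in parallel (higher temperature = more diverse takes).
  const ideas = await Promise.all(
    BRAINSTORMERS.map((ep) =>
      chat({
        ...ep,
        temperature: 0.8,
        messages: [
          { role: "system", content: "Propose one distinct approach to the task. Be concrete." },
          { role: "user", content: task },
        ],
      })
    )
  );

  // 2) One model synthesizes the proposals into a single answer.
  return chat({
    ...SYNTHESIZER,
    temperature: 0.2,
    messages: [
      { role: "system", content: "Synthesize these proposals into one coherent answer. Keep the best ideas, drop the weak ones." },
      { role: "user", content: `TASK:\n${task}\n\nPROPOSALS:\n${ideas.map((x, i) => `#${i + 1}:\n${x}`).join("\n\n")}` },
    ],
  });
}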
3) Supervisor + specialists (best for tool-heavy projects)
A supervisor routes the task to specialist agents (coder, researcher, debugger, writer). This is common in frameworks like LangGraph/CrewAI.
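Those frameworks add a lot on top (tools, memory, retries), but the core routing step can be sketched in a few lines. Again this assumes chat() is exported from router.mjs; the specialist labels, IPs, and model names are placeholders:
// supervisor.mjs (concept sketch — not a replacement for LangGraph/CrewAI)
import { chat } from "./router.mjs"; // assumes chat() is exported from router.mjs

const SPECIALISTS = {
  coder:      { baseUrl: "http://192.168.1.10:1234/v1", model: "your-code-model" },
  researcher: { baseUrl: "http://192.168.1.11:1234/v1", model: "your-general-model" },
  writer:     { baseUrl: "http://192.168.1.12:1234/v1", model: "your-writing-model" },
};
const SUPERVISOR = { baseUrl: "http://192.168.1.12:1234/v1", model: "your-best-model" };

export async function supervise(task) {
  // 1) The supervisor picks exactly one specialist label.
  const raw = await chat({
    ...SUPERVISOR,
    temperature: 0,
    max_tokens: 10,
    messages: [
      { role: "system", content: `Classify the task. Reply with exactly one word: ${Object.keys(SPECIALISTS).join(", ")}.` },
      { role: "user", content: task },
    ],
  });
  const label = Object.keys(SPECIALISTS).find((k) => raw.toLowerCase().includes(k)) || "writer";

  // 2) Route the task to that specialist.
  return chat({
    ...SPECIALISTS[label],
    messages: [
      { role: "system", content: `You are the ${label} specialist. Complete the task thoroughly.` },
      { role: "user", content: task },
    ],
  });
}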
⚠️ Gotchas (the ones that waste hours)
1) Firewall / ports
Symptom: the endpoint works locally but not from other PCs.
Fix: allow inbound TCP on the LM Studio port (e.g. 1234) and ensure “Serve on Local Network” is enabled.
2) VPN adapters
Symptom: LM Studio binds to the wrong interface.
Fix: temporarily disable VPN or set the correct network behavior in your server settings.
3) Multi-model = multi-cost
Reality: sending a task to 3 models costs ~3× tokens/time.
Fix: use the pattern only when it improves quality (final output, debugging, planning). Use a single fast model for chatty stuff.
🧾 The “Receipts” checklist (so you know it’s real)
- Start LM Studio server on 2–3 PCs
- Enable “Serve on Local Network”
- From the router machine: curl http://<ip>:1234/v1/models
- Run the router script and confirm you get: Draft + Critique + Final
- Measure speed (TTFT, i.e. time to first token, plus tokens/sec) and move roles to faster machines (a measurement sketch follows below)
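For that last item, here's a rough way to measure TTFT and throughput against one endpoint at a time. It assumes the OpenAI-compatible streaming mode (stream: true, SSE "data:" events) that LM Studio's server exposes, and it counts SSE events as a stand-in for tokens, so treat the numbers as approximate; the IP and model name are placeholders:
// measure.mjs (rough sketch — point it at one endpoint, compare machines)
const BASE = "http://192.168.1.10:1234/v1";
const MODEL = "your-fast-model";

const start = performance.now();
const r = await fetch(`${BASE}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: MODEL,
    stream: true,
    max_tokens: 200,
    messages: [{ role: "user", content: "Summarize what a draft/critic/finisher pipeline does." }],
  }),
});
if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);

const reader = r.body.getReader();
const decoder = new TextDecoder();
let firstChunkAt = null;
let events = 0;
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  if (firstChunkAt === null) firstChunkAt = performance.now();
  // Streamed responses arrive as SSE "data: {...}" events, roughly one per token.
  events += (decoder.decode(value, { stream: true }).match(/^data: /gm) || []).length;
}

const totalSec = (performance.now() - start) / 1000;
console.log(`TTFT: ${((firstChunkAt - start) / 1000).toFixed(2)}s`);
console.log(`~${(events / totalSec).toFixed(1)} events/sec (≈ tokens/sec) over ${totalSec.toFixed(1)}s total`);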
🔗 The links you actually need
- LM Studio server settings (“Serve on Local Network”, auth): https://lmstudio.ai/docs/developer/core/server/settings
- LM Studio authentication tokens: https://lmstudio.ai/docs/developer/core/authentication
- LM Studio headless daemon (llmster): https://lmstudio.ai/docs/developer/core/headless
- LangGraph patterns (supervisor + agents): https://docs.langchain.com/oss/python/langchain/multi-agent/subagents-personal-assistant
- CrewAI docs (multi-agent orchestration): https://docs.crewai.com/
- AutoGen (multi-agent conversations): https://www.microsoft.com/en-us/research/project/autogen/
- Mixture-of-Agents paper (why synthesis can help): https://arxiv.org/abs/2406.04692
Next upgrade: combine this with a memory layer (chunking + retrieval) so the whole “team” shares a knowledge base — without inflating prompts or slowing down the system.