Running a local LLM on your iPhone just got a first-party answer: on June 4, 2026 LM Studio shipped version 0.4.16 with Locally, a native iPhone and iPad app, paired with LM Link — a remote-access layer that connects the phone to the models already installed on your Mac. The twist is that nothing routes through a cloud relay. The connection is end-to-end encrypted, chat history stays on each device, and the only data that reaches LM Studio's servers is the device discovery list used to find your machine.

That design choice is the whole story. Most mobile-AI coverage treats the world as a binary — either you send prompts to a cloud model (ChatGPT, Claude) or you run a tiny model natively on the phone's chip. LM Link is a third path that almost nobody articulates: your Mac becomes a private inference server you actually own, and the phone is just an encrypted terminal into it.

This guide separates the two products LM Studio shipped, walks through the Tailscale-based privacy architecture, explains the hardware reason the Mac has to be the compute engine, and lays out a clean three-way comparison so you can decide whether this belongs in your stack. Where figures come from LM Studio's own materials, we label them as vendor-stated and hedge accordingly.

Key takeaways

01
Two paired products, not one.Locally is the iPhone and iPad app; LM Link is the remote-access technology it rides on. LM Studio acquired the Locally AI app on April 8, 2026 and brought its creator in to lead native mobile.
02
No cloud relay — chats stay on your devices.The phone-to-Mac link is end-to-end encrypted over custom Tailscale mesh VPNs. Chat history lives on each local device; only a device discovery list ever reaches LM Studio's backend.
03
Your Mac is the brain, the phone is the terminal.Inference runs on the desktop, not the iPhone's chip. Performance depends on your Mac's hardware, and LM Link works with any model already installed in LM Studio.
04
Free during preview — for now, on iOS only.LM Link is free during its request-gated Preview, with free and paid tiers planned at general availability. Launch is iPhone and iPad only; Android has not been announced.
05
It speaks the same OpenAI-compatible API.Because LM Link exposes LM Studio's local API, developer tools that already point at that endpoint can reach it remotely — turning your Mac into a private cloud you control.

01 — What ShippedA native iPhone app, eight weeks after an acquisition.

LM Studio released version 0.4.16 on June 4, 2026, introducing the Locally mobile app for iPhone and iPad alongside LM Link remote-access technology. It is the visible payoff of an acquisition: on April 8, 2026, LM Studio acquired the Locally AI app and brought in its creator, Adrien Grondin, to lead native mobile development. The company framed the move as "doubling down on our mission of making AI accessible and useful to you, across your devices, wherever you go."

The release lands on top of a busy spring of incremental work. Version 0.4.13 (May 13) updated the MLX engine to v1.8.1; 0.4.14 (May 22) shipped MTP Speculative Decoding as stable; and 0.4.15 (May 29) added CUDA tensor parallelism for multi-GPU loading and fixed a prompt-cache bug that had been dropping the cache on every message when used with Claude Code. Locally and LM Link are the headline of 0.4.16, but they sit on a steadily maturing inference stack.

By bringing Locally AI into the LM family, we are doubling down on our mission of making AI accessible and useful to you, across your devices, wherever you go.— LM Studio team, Locally AI Joins LM Studio (April 8, 2026)

Release snapshot

LM Studio 0.4.16 shipped June 4, 2026 with two new things: Locally, a native iPhone and iPad app, and LM Link, the remote-access technology that connects it to a desktop LM Studio instance. LM Link is free during a request-gated Preview, with free and paid tiers planned for general availability and enterprise deployment available on request. At launch it is iPhone and iPad only — Android has not been announced.

02 — Two ProductsLocally is the app; LM Link is the bridge.

The easiest way to get confused here is to treat "Locally" and "LM Link" as one product with two names. They are distinct. Locally is the iPhone and iPad app you install from the App Store. LM Link is the remote-access protocol that lets that app — or other tooling — reach an LM Studio instance running somewhere else. Locally is the most visible consumer of LM Link, but LM Link is the reusable plumbing.

That distinction matters because LM Link is not limited to the phone. LM Studio ships a headless build called llmster for server environments without a GUI, and LM Link can connect to llmster instances too. In practice the "Mac" on the other end of the link can be a headless Linux server or a cloud VM running LM Studio's inference stack — not only a desktop with a monitor attached.

Mobile app

Locally

iPhone + iPad · App Store

The native client. Originally a standalone app (Locally AI), acquired by LM Studio on April 8, 2026 and rebuilt as the first-party mobile front end. Chat history stays on the device.

iOS only at launch

Remote access

LM Link

Encrypted mesh · tsnet / Tailscale

The bridge. Built on tsnet, a userspace Go library that embeds Tailscale's WireGuard-based mesh VPN. It runs as a self-contained use of Tailscale primitives and does not interfere with an existing Tailscale setup on the same machine.

Also connects to llmster

LM Link is also not the first attempt at this problem — it is the first-party answer to it. Community apps such as "Off Grid" and "LM Mini" had already solved the iPhone-to-local-LM Studio connection for early adopters. What LM Studio adds is an official, team-maintained, end-to-end encrypted implementation with no third-party relay in the path. If you want to understand where this fits in the wider movement toward an on-device private AI stack, LM Link is the consumer-facing edge of it.

03 — Privacy ArchitectureEnd-to-end encrypted, with nothing to relay.

The privacy claim is the part worth getting precise about, because "private" is overused in AI marketing. Here is what LM Studio states: the Locally app connects an iPhone to a desktop LM Studio instance over an end-to-end encrypted connection using custom Tailscale mesh VPNs. Devices are never exposed to the public internet, and Tailscale itself cannot see the data. The only thing that reaches LM Studio's servers is the device discovery list used to establish the peer-to-peer connection.

Chat history, by design, stays on each local device. That single fact is what separates LM Link from cloud-relay apps: there is no server in the middle accumulating your conversations. The discovery list exists only so the two devices can find each other; once they do, traffic flows directly between them.

Your devices are never exposed to the public internet, because LM Link runs on top of custom Tailscale mesh VPNs.— LM Studio team, via 9to5Mac (June 4, 2026)

The cryptography underneath is not exotic, which is reassuring rather than disappointing. Tailscale's WireGuard-based mesh uses ChaCha20-Poly1305 for encryption and Curve25519 for key exchange — industry-standard modern primitives. The coordination server only exchanges small encryption keys; the actual data traffic flows peer-to-peer. LM Link layers its own self-contained Tailscale usage on top via tsnet, so it can deliver this model without asking you to manage a Tailscale account yourself.

What actually leaves your devices

On LM Link, the conversation never reaches a vendor server. Only a device discovery list touches LM Studio's backend, and the encrypted tunnel rides standard WireGuard primitives. That is a genuinely different threat model from a cloud chat app, where the prompt, the response, and the history are all server-side by default.

04 — Where Compute LivesThe Mac is the brain — and the hardware explains why.

LM Link works with any model installed on your Mac, and performance depends on the Mac's hardware — because inference runs on the desktop, not the iPhone's chip. The phone is the input/output terminal; the Mac is the compute engine. That is not an arbitrary design decision. It is the only design that makes large models usable on a phone today, and the reason is memory bandwidth.

High-end iPhones built on the A17 or A18 Pro carry roughly 8 GB of unified memory and somewhere in the range of 50–90 GB/s of memory bandwidth. A Mac Studio with an M4 Ultra reaches up to 800 GB/s. Since LLM token generation is bandwidth-bound, that gap is the difference between a model that crawls and one that feels interactive. The phone simply cannot feed a 7B-or-larger model fast enough to be pleasant to use.

Memory bandwidth · why the Mac does the inference

Source: On-Device LLMs: State of the Union, 2026 (independent analysis)

Mac Studio (M4 Ultra)Unified-memory bandwidth · up to

~800 GB/s

iPhone A18 Pro (high end)Unified-memory bandwidth · approximate

50–90 GB/s

That bandwidth ceiling is why native on-device inference, without a Mac in the loop, is limited to small models. The practical floor for usable iPhone-native inference is models under 3B parameters with 4-bit quantization. An independent 2026 analysis reports a 125M parameter model running at roughly 50 tokens per second on current iPhones, with a 1B model practical for short-context tasks and the Apple A19 Pro Neural Engine delivering around 35 TOPS. Useful for a quick autocomplete or a small summarizer — not for the 7B-to-30B-class models people actually want to chat with.

This is the analytical core of why LM Link is interesting rather than redundant. The "use your Mac because the iPhone can't run big models" line shows up everywhere, but the bandwidth and RAM numbers explain why it is true, and why a relay-less bridge to your own hardware is the only way to get desktop-class local models onto a phone right now.

05 — Three-Way ComparisonCloud, bridge, or native — what you trade.

Most coverage frames mobile AI as cloud versus on-device. There is a third option, and the cleanest way to see the trade-offs is to put all three side by side. The table below is our own framing, with cells sourced from LM Studio's materials, the cloud providers' public pricing, and independent on-device analysis. Speed figures are deliberately rough and hardware-dependent.

Approach

Cloud chat app

Privacy & where chat lives

Vendor-hosted · server-side

Best for

ChatGPT Plus or Claude Pro at about $20/month. Largest, strongest models with zero setup, but prompts and history live on the provider's servers. Roughly $240–480/year.

Approach

LM Studio LM Link

Privacy & where chat lives

End-to-end encrypted · device-local history

Best for

The Mac runs the model; the phone is an encrypted terminal over Tailscale mesh. Free during preview. Model size limited only by your Mac's hardware. Setup: install LM Studio plus the Locally app and pair the devices.

Approach

Native on-device LLM

Privacy & where chat lives

Fully offline · nothing leaves the phone

Best for

Small models (under 3B, 4-bit) run on the iPhone chip — about 50 tokens/second at 125M params. Works with no network at all, but the ~8 GB RAM and 50–90 GB/s bandwidth ceiling caps practical model size.

Approach	Privacy & where chat lives	Best for
Cloud chat app	Vendor-hosted · server-side	ChatGPT Plus or Claude Pro at about $20/month. Largest, strongest models with zero setup, but prompts and history live on the provider's servers. Roughly $240–480/year.
LM Studio LM Link	End-to-end encrypted · device-local history	The Mac runs the model; the phone is an encrypted terminal over Tailscale mesh. Free during preview. Model size limited only by your Mac's hardware. Setup: install LM Studio plus the Locally app and pair the devices.
Native on-device LLM	Fully offline · nothing leaves the phone	Small models (under 3B, 4-bit) run on the iPhone chip — about 50 tokens/second at 125M params. Works with no network at all, but the ~8 GB RAM and 50–90 GB/s bandwidth ceiling caps practical model size.

Read down the privacy column and the value of LM Link becomes obvious: it is the only row that gives you device-local chat history and access to large models. Cloud trades privacy for capability; native trades capability for total offline privacy; LM Link keeps the conversation private while letting your Mac do the heavy lifting. The one thing it cannot do is run with no machine at all — if your Mac is asleep or off the network, there is nothing to connect to.

On cost, the math is straightforward. Cloud assistants run about $20 a month, or $240–480 a year. The LM Link and Locally software is free during preview, so the only spend is the Mac hardware you likely already own — a one-time sunk cost rather than a recurring subscription. For teams already weighing a self-hosted local LLM workflow, LM Link is a low-friction way to put that infrastructure in everyone's pocket.

Your Mac's AI is now in your pocket.— Ali Cherawalla, DEV.to (March 18, 2026)

06 — For DevelopersThe same API you already point at.

The detail that makes LM Link more than a consumer toy is the API surface. LM Studio's local API is OpenAI-compatible, served on localhost:1234 by default. Any app or SDK that supports a custom OpenAI base URL can point at it — and that same API surface is what LM Link exposes remotely. So tools like Codex CLI, Claude Code, and OpenCode that already target that endpoint work over LM Link without reconfiguration.

That compatibility is the unlock. You are not adopting a proprietary protocol; you are taking the local endpoint you already use on your desktop and reaching it from elsewhere over an encrypted tunnel. For a developer, "my laptop's model, available from my phone or a second machine, on the API I already wrote against" is a much smaller change than it sounds.

Local API

OpenAI-compatible

1234port

LM Studio serves an OpenAI-compatible API on localhost:1234. Anything that accepts a custom base URL — SDKs, agent frameworks, CLIs — can talk to it, locally or over LM Link.

No new protocol

Runtimes

llama.cpp + MLX

Two inference runtimes: llama.cpp (GGUF, cross-platform) and Apple MLX (Apple Silicon only, added in v0.3.4). Models can load on either, so you tune for the hardware on the Mac side.

GGUF · MLX

Reach

Headless or desktop

Any

LM Link connects to the llmster headless build too, so the compute host can be a Linux server or cloud VM. The phone-as-terminal model is not tied to a Mac with a screen.

llmster supported

One runtime nuance is worth a careful note. On Apple Silicon, vendor and community reports suggest MLX-format models run roughly 30–50% faster than llama.cpp on Metal, attributed to unified memory removing CPU-to-GPU transfer overhead. Treat that as an approximate, community-reported range rather than a controlled benchmark — no single published test pins down that exact figure, and your real speed depends entirely on the model, quantization, and the specific Mac doing the work.

07 — Honest CaveatsWhat the launch posts don't spell out.

A few things deserve to be stated plainly, because they are easy to gloss over in launch-day enthusiasm. First, LM Studio is not open source. It is free for home and work use, and it runs on macOS, Windows, and Linux — but its license is more restrictive than free/libre/open-source software, a point the developer community has raised. If open-source licensing is a hard requirement for your organization, treat LM Studio as a free tool, not an open one, and evaluate accordingly.

Second, LM Studio is not a model provider. It does not bundle or license models; it facilitates downloading and running third-party open-weight models — Llama 3.x, Qwen3.x, DeepSeek R1, Mistral, Gemma 4, Phi-3.5 and hundreds of community fine-tunes via Hugging Face integration. The quality you get is the quality of whichever open-weight model you load, not something LM Studio itself produces.

Third, on availability: LM Link is in a request-gated Preview, free for now, with free and paid tiers planned at general availability. The Locally app is available free during this Preview period, but permanent App Store pricing has not been confirmed in primary sources — so plan around "free at launch during preview," not a guaranteed-free future. And the launch is iPhone and iPad only; Android support has not been announced.

The honest version

LM Link is a strong release, but read it precisely: it is free during preview, not free forever; it is iPhone and iPad only, with no Android timeline; and LM Studio is a free tool, not an open-source one. None of that undercuts the privacy story — it just sets the right expectations before you build a workflow on top of it.

08 — Getting AccessInstall, request, and pair.

The rollout is deliberately incremental. The June 2026 Locally and LM Link launch was batch-limited, with preview access granted via a request form rather than an instant open release — consistent with how LM Studio has staged previous major features. The practical path is three steps: run LM Studio on the Mac, request LM Link Preview access, then install the Locally app and pair the devices.

Step 1 · Desktop

Install and load a model in LM Studio

LM Studio is free for home and work and runs on macOS (Apple Silicon and Intel), Windows, and Linux. Download an open-weight model and confirm it serves on the local API at localhost:1234.

Mac as compute host

Step 2 · Access

Request LM Link Preview

LM Link is in a request-gated Preview, free for now, with enterprise deployment available on request. Submit the access request from LM Studio's LM Link page before expecting the phone pairing to work.

Free during preview

Step 3 · Mobile

Install Locally and pair

Install the Locally app on an iPhone or iPad and connect to your desktop instance over the encrypted mesh. Chat history stays on the device; inference runs on the Mac.

iOS only at launch

Decide

Cloud, bridge, or native?

Pick LM Link when you want large models with device-local privacy and own a capable Mac. Stay on cloud for zero-setup frontier capability; go native-only for fully offline small-model tasks.

Match to the workload

For most individuals, the deciding question is simple: do you already own a Mac strong enough to run the model class you want? If yes, LM Link converts that hardware into a private, always-available inference server you reach from your phone — at no recurring software cost during preview. If you are weighing it against staying on a cloud assistant, it sits naturally alongside the broader decision of switching away from cloud chat apps. For teams, the same evaluation belongs inside a deliberate stack choice — exactly the kind of comparison our AI digital transformation engagements are built to run.

09 — ConclusionThe Mac as a private cloud you actually own.

The shape of local-first mobile AI, June 2026

Local AI just stopped being a desktop-only experience.

LM Studio's Locally and LM Link release does something subtle but meaningful: it reframes local AI from a thing you do at your desk into a private cloud you carry with you. The mental model is no longer "run AI offline on a weak chip" — it is "reach the capable hardware you already own, from anywhere, over an encrypted tunnel that nobody else can read."

The architecture earns the privacy claim honestly. Chat history stays on your devices, only a discovery list touches LM Studio's servers, and the encrypted path rides standard WireGuard primitives through Tailscale. The hardware reality — an order-of-magnitude memory bandwidth gap between phone and Mac — explains why the Mac has to be the brain, and why a relay-less bridge is the only way to put desktop-class local models on a phone today.

The right posture is enthusiastic but precise. This is free during preview, not free forever; iPhone and iPad only, with no Android timeline; and built on a tool that is free rather than open source. For anyone who already runs LM Studio on a capable Mac, none of that is a reason to wait — it is simply the fine print on one of the cleaner local-first mobile releases of the year. The broader signal is the one worth remembering: the most private cloud is the one you already own.

LM Studio Ships Locally + LM Link: Local LLMs Go Mobile

01 — What ShippedA native iPhone app, eight weeks after an acquisition.

02 — Two ProductsLocally is the app; LM Link is the bridge.

Locally

LM Link

03 — Privacy ArchitectureEnd-to-end encrypted, with nothing to relay.

04 — Where Compute LivesThe Mac is the brain — and the hardware explains why.

Memory bandwidth · why the Mac does the inference

05 — Three-Way ComparisonCloud, bridge, or native — what you trade.

06 — For DevelopersThe same API you already point at.

OpenAI-compatible

llama.cpp + MLX

Headless or desktop

07 — Honest CaveatsWhat the launch posts don't spell out.

08 — Getting AccessInstall, request, and pair.

Install and load a model in LM Studio

Request LM Link Preview

Install Locally and pair

Cloud, bridge, or native?

09 — ConclusionThe Mac as a private cloud you actually own.

Local AI just stopped being a desktop-only experience.

Turn the hardware you already own into a private AI stack that's genuinely yours.

Local & hybrid AI engagements

The questions we get every week.

Continue exploring local-first AI.

Grok Build Goes Open Source: Trust Repair in 72 Hours

Gemma 4 12B: Multimodal AI That Runs on Your Laptop

The On-Device Agent Era: Local AI Goes Personal in 2026

Synthetic Data for LLM Training: Decision Guide 2026