Local AI image generation in 2026 means running models like Flux and Stable Diffusion on your own GPU through a tool such as ComfyUI — no API bill, no per-image fee, and no images leaving your machine. For a marketing team, the appeal is obvious: brand visuals at $0 per image after the hardware is paid for. The catch almost nobody spells out is licensing.

The cloud comparison is stark on paper. DALL·E (gpt-image-2) costs roughly $0.04 to $0.08 per image on the API; Midjourney runs $10 to $120 a month; Adobe Firefly sits in between. Local generation collapses that variable cost to zero. But “free to download” and “free to use for a client” are not the same thing — and getting that distinction wrong is a contract risk, not a rounding error.

This guide is the map agencies actually need: which models exist and what they cost, how much VRAM each one demands, which licenses clear them for paid client work, the tooling that has won by 2026, and an honest break-even on local hardware versus a subscription. Every figure below is sourced to a primary or benchmark reference, and the speculative ones are flagged as such.

Key takeaways

01
Local generation costs $0 per image, after the hardware. Once a capable GPU is paid for, the marginal cost of every image is zero, versus roughly $0.04–$0.08 per image on DALL·E or $10–$120/month on Midjourney. Whether that hardware pays for itself depends entirely on your monthly volume.
02
License, not capability, is the real agency risk. FLUX.1 [schnell] (Apache 2.0) and SDXL (no revenue cap) are clean for client work. FLUX.1 [dev] is freely downloadable but Non-Commercial: using it for client deliverables without a paid Black Forest Labs license is a violation.
03
VRAM is the gatekeeper. SDXL runs from ~8GB, Flux at Q4 quantization from ~6–8GB, FP8 Flux from ~12GB, and SD 3.5 Large from ~18GB at FP16. Headline VRAM numbers often exclude the text encoder, so budget more in a real pipeline.
04
ComfyUI is the 2026 production default. Node-based, with the fastest support for new models and meaningfully quicker than Automatic1111 on complex workflows. Forge covers low-VRAM cards; Fooocus is SDXL-only and effectively abandoned.
05
A brand LoRA is an asset you own. A few hours of training on an RTX 4090 produces a 50–500MB file that reproduces your brand style offline, indefinitely, with no platform terms attached. That is the strategic reason to go local, beyond the cost math.

01 — Why Local Now

The case is ownership, not just cost.

Three things changed by 2026. Open-weight models like Flux closed most of the quality gap to closed cloud tools. Consumer GPUs got enough VRAM to run them. And the tooling — ComfyUI above all — matured from hobbyist novelty into something a production team can actually depend on. The result is that a marketing studio can now generate on-brand visuals locally that, two years ago, only a cloud subscription could deliver.

The cost argument is the headline, and it is real: after the hardware, every image is free. But the more durable argument is ownership. A cloud tool can change its pricing, its content policy, or its terms of service overnight; a model and a fine-tune sitting on your own disk cannot. For agencies handling client IP, the privacy angle compounds it — nothing is uploaded, nothing is retained on a third-party server, nothing is used to train someone else’s model. That is the same logic driving teams to run language models locally, which we covered in the case for on-device AI agents and the infrastructure walkthrough in our local LLM deployment guide.

Where this is heading is worth naming. As open models keep shrinking — FLUX.2 [klein]’s 4B variant runs license-clean on modest hardware — the local option stops being a power-user niche and becomes a default for any team with steady image volume. The studios that win the next two years will treat a trained brand model as infrastructure they own, not a feature they rent.

The distinction that matters

Local image generation is not one decision — it is two. The first is economic: does the hardware beat the subscription at your volume? The second is legal: is the model you picked actually licensed for the client work you are doing? Most “run Flux locally” articles answer only the first and quietly get the second wrong.

02 — The Models

The Flux and Stable Diffusion families in 2026.

Two model families dominate local generation. Black Forest Labs ships Flux — a 12B-parameter rectified-flow transformer in its first generation, with the larger FLUX.2 line arriving in late 2025 and early 2026. Stability AI ships Stable Diffusion, now at version 3.5, alongside the still-ubiquitous SDXL. They differ in fidelity, VRAM appetite, and — critically — in how freely you can use them commercially.

Apache 2.0 · 12B

FLUX.1 [schnell]

~4 steps · ~5.45s on RTX 4090

The only fully license-clean model in the FLUX.1 line: Apache 2.0, free for client work with no agreement. A 12B rectified-flow transformer that runs from ~6–8GB VRAM at Q4 quantization.

Local · text-to-image

Non-commercial · 12B

FLUX.1 [dev]

~18s on RTX 4090 (20–50 steps)

Higher fidelity than schnell, but released under Black Forest Labs’ Non-Commercial License v2.0. Client deliverables require a paid BFL license — downloadable is not the same as free for commercial.

Local · license required

Released Jan 15, 2026

FLUX.2 [klein]

Sub-second (vendor-stated)

Fastest in the family. The 4B variant is Apache 2.0 (license-clean); the 9B is non-commercial and needs ~13GB VRAM. The sub-second figure is BFL’s own claim — no independent benchmark confirms it yet.

Local · generate + edit

Community License · 8.1B

Stable Diffusion 3.5

~18GB FP16 / ~11GB FP8 (Large)

Free for commercial use under $1M annual revenue; enterprise license above that. Large is 8.1B params, Medium 2.5B. You keep rights to everything you generate.

Local · text-to-image

Open-source · no cap

SDXL

~3.2s on RTX 4090

Fully open-source with no revenue cap — the safest commercial default if you skip Flux. Runs from ~8GB (12GB comfortable), with the most mature LoRA and ControlNet ecosystem.

Local · text-to-image

Two more Flux models matter for context, even though most teams won’t run them daily. FLUX.2 [dev] is a 32B open-weight model released November 25, 2025 — the largest locally-runnable Flux, with multi-reference support for up to 10 images and high-resolution editing. Don’t confuse it with the 12B FLUX.1 [dev]; they are different models with different release dates and parameter counts. FLUX.2 [max] is the top-tier, cloud-API-only model — no local weights, available as of mid-2026, adding grounded generation and the highest editing consistency in the family. We cover the API tier in depth in our FLUX.2 [max] guide. There is also a purpose-built editing model, FLUX.1 Kontext [dev] (June 2025), for iterative edits with character preservation.

03 — Hardware & VRAM

What GPU you actually need.

VRAM is the single hard constraint. The model has to fit in your GPU’s memory to run at full speed; quantization (FP8, or GGUF Q4) shrinks the footprint at some cost to quality. The table below maps each model to its minimum VRAM and a measured generation speed on an RTX 4090 — the de facto reference card for local Flux work.
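As a rough cross-check on how quantization shrinks the footprint, the weights alone can be estimated as parameter count times bytes per parameter. The sketch below is a weights-only lower bound under stated assumptions: it treats Q4 as exactly 4 bits per parameter and ignores the text encoder, the VAE, and activation memory, so real pipelines need more. The function and constant names are illustrative, not part of any tool.

```python
# Bytes needed to store one parameter at each precision.
# Assumption: Q4 is treated as exactly 4 bits. Real GGUF Q4 formats store
# extra per-block scale data, so this slightly underestimates.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "q4": 0.5}


def weights_gb(params_billions: float, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Excludes the text encoder, VAE, and activations.
    """
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    models = [("FLUX.1 (12B)", 12.0), ("SD 3.5 Large (8.1B)", 8.1)]
    for name, params in models:
        for precision in ("fp16", "fp8", "q4"):
            print(f"{name} {precision}: ~{weights_gb(params, precision):.1f} GB")
```

For the 12B FLUX.1 models this gives 24, 12, and 6 GB, in line with the FP16, FP8, and Q4 figures in the table. For SD 3.5 Large it gives 16.2 and 8.1 GB, below the quoted ~18GB and ~11GB, which is what a weights-only estimate should do.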

Parameter count, minimum VRAM, and RTX 4090 generation speed by local image-generation model
Model	Params	Min VRAM	RTX 4090 speed	Runs
Stable Diffusion family
SD 1.5	—	~4GB	≈1–2s ¹	Local
SDXL	—	~8GB (12GB rec.)	~3.2s	Local
SD 3.5 Medium	2.5B	~9.9GB ²	—	Local
SD 3.5 Large (FP16)	8.1B	~18GB ²	—	Local
SD 3.5 Large (FP8)	8.1B	~11GB	—	Local
FLUX family
FLUX.1 [schnell]	12B	~6–8GB (Q4) / ~12GB (FP8)	~5.45s ³	Local
FLUX.1 [dev]	12B	~12GB (FP8) / ~24GB (FP16)	~18s ³	Local
FLUX.2 [klein] 9B	9B	~13GB	sub-second ⁴	Local
FLUX.2 [dev]	32B	High (24GB+) ⁴	—	Local
FLUX.2 [max]	—	N/A	—	API only

¹ SD 1.5 speed is an approximate community figure; it varies with sampler and step count.

² SD 3.5 VRAM figures exclude the T5 text encoder — budget ~12GB+ for Medium in a real pipeline, and 15–20GB for Large once the encoder is loaded.

³ Flux speeds are community benchmarks at fixed step counts (schnell ~4 steps, dev 20–50 steps); quantization and sampler change them. An RTX 5090 trims SDXL to ~2.2s and Flux to ~9s.

⁴ FLUX.2 [klein]’s sub-second speed is Black Forest Labs’ own claim, not independently benchmarked; the 32B FLUX.2 [dev] VRAM floor is not precisely documented for consumer cards.

The practical brackets: an 8GB card runs SDXL and quantized Flux; a 12GB card is the comfortable floor for FP8 Flux and SD 3.5 Medium; a 16GB card handles most workflows; and 24GB (the RTX 4090 / 3090 tier) runs everything short of full-precision 32B FLUX.2 [dev] cleanly. One caveat worth repeating: GPU street pricing in 2026 has stayed volatile, partly because of a global DRAM and GDDR7 shortage — the same squeeze that pulled Apple’s high-memory Mac Studio configs and pushed workstation-GPU prices up. Verify current pricing before you buy. For the broader picture of what a well-equipped local AI studio looks like, see our guide to local AI hardware.
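The brackets above amount to a lookup against the minimum-VRAM column of the table. A minimal sketch of that lookup follows; the thresholds are copied from the table (using the upper end of the ~6–8GB range for Q4 Flux), the SD 3.5 entries exclude the T5 text encoder as footnote 2 notes, and the names `MIN_VRAM_GB` and `runnable_models` are hypothetical.

```python
# Minimum VRAM in GB per model/precision, taken from the table above.
# SD 3.5 figures exclude the T5 text encoder, so treat them as optimistic.
MIN_VRAM_GB = {
    "SD 1.5": 4.0,
    "SDXL": 8.0,
    "FLUX.1 [schnell] (Q4)": 8.0,   # table gives ~6-8GB; upper end used here
    "SD 3.5 Medium": 9.9,
    "SD 3.5 Large (FP8)": 11.0,
    "FLUX.1 [schnell] (FP8)": 12.0,
    "FLUX.1 [dev] (FP8)": 12.0,
    "FLUX.2 [klein] 9B": 13.0,
    "SD 3.5 Large (FP16)": 18.0,
    "FLUX.1 [dev] (FP16)": 24.0,
}


def runnable_models(vram_gb: float) -> list[str]:
    """Models whose table minimum fits in the given VRAM, smallest first."""
    fits = [(need, name) for name, need in MIN_VRAM_GB.items() if need <= vram_gb]
    return [name for need, name in sorted(fits)]


if __name__ == "__main__":
    for card_gb in (8, 12, 16, 24):
        print(f"{card_gb}GB card: {', '.join(runnable_models(card_gb))}")
```

Meeting the minimum is not the same as running comfortably; the 12GB recommendation for SDXL in the table is an example of that gap.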

04 — Licensing & Commercial Use

The licensing trap agencies fall into.

This is the section that matters most for client work, and the one most guides skip. A model being free to download tells you nothing about whether you can legally use its output in a paid deliverable. The table below combines license and commercial eligibility in one place — verify the current terms on each model’s own page before you ship, because licenses do change.

License and agency commercial-use eligibility by local image-generation model
Model	License	Agency client work?	Runs
Stable Diffusion family
SDXL	Open-source (no revenue cap)	Yes	Local
SD 3.5 (Medium & Large)	Stability AI Community License	Yes under $1M revenue; enterprise license above	Local
FLUX family
FLUX.1 [schnell]	Apache 2.0	Yes — fully license-clean	Local
FLUX.1 [dev]	BFL Non-Commercial v2.0	No — paid BFL commercial license required	Local
FLUX.2 [klein] 4B	Apache 2.0	Yes — fully license-clean	Local
FLUX.2 [klein] 9B	FLUX.2-dev Non-Commercial	No — BFL license required	Local
FLUX.2 [dev] 32B	BFL Non-Commercial (commercial option)	No — BFL license required	Local
FLUX.2 [max]	API commercial terms	Via BFL licensing tiers	API only

Licenses confirmed against Black Forest Labs and Stability AI published terms, June 2026. SD 1.5 is omitted because its commercial terms were not confirmed for this guide — verify before relying on it.

The trap, named

FLUX.1 [dev] is freely downloadable and runs locally — which is exactly why agencies get it wrong. The weights are free; the commercial rights are not. Producing a client deliverable with FLUX.1 [dev] under its Non-Commercial License is a license violation unless you have bought a Black Forest Labs commercial license. For zero-cost, zero-paperwork client work, the license-clean choices are FLUX.1 [schnell] and the FLUX.2 [klein] 4B variant (both Apache 2.0), or SDXL (no revenue cap).
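One way to keep this distinction from slipping through in a busy studio is to encode the license table as a pre-flight check. The sketch below is only that: it mirrors the table as printed here, the class and function names are hypothetical, and the published license terms remain the authority. FLUX.2 [max] is left out because it is API-only.

```python
from dataclasses import dataclass
from enum import Enum


class Rule(Enum):
    OPEN = "open"                  # no agreement needed for client work
    REVENUE_CAP = "revenue_cap"    # free below an annual revenue threshold
    PAID_LICENSE = "paid_license"  # commercial use needs a purchased license


@dataclass(frozen=True)
class ModelLicense:
    license_name: str
    rule: Rule


# Threshold stated in this guide for the Stability AI Community License.
SD35_REVENUE_CAP_USD = 1_000_000

# Entries mirror the license table above (local models only).
MODELS = {
    "SDXL": ModelLicense("Open-source (no revenue cap)", Rule.OPEN),
    "SD 3.5": ModelLicense("Stability AI Community License", Rule.REVENUE_CAP),
    "FLUX.1 [schnell]": ModelLicense("Apache 2.0", Rule.OPEN),
    "FLUX.1 [dev]": ModelLicense("BFL Non-Commercial v2.0", Rule.PAID_LICENSE),
    "FLUX.2 [klein] 4B": ModelLicense("Apache 2.0", Rule.OPEN),
    "FLUX.2 [klein] 9B": ModelLicense("FLUX.2-dev Non-Commercial", Rule.PAID_LICENSE),
    "FLUX.2 [dev] 32B": ModelLicense("BFL Non-Commercial", Rule.PAID_LICENSE),
}


def cleared_for_client_work(
    model: str, annual_revenue_usd: float, has_paid_license: bool = False
) -> bool:
    """True if the table above permits paid client deliverables."""
    entry = MODELS[model]
    if entry.rule is Rule.OPEN:
        return True
    if entry.rule is Rule.REVENUE_CAP:
        return annual_revenue_usd < SD35_REVENUE_CAP_USD or has_paid_license
    return has_paid_license


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name}: {cleared_for_client_work(name, annual_revenue_usd=400_000)}")
```

A check like this could run before a workflow is queued for a client project, so the model choice is validated at the same moment it is made.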

If you do want the higher-fidelity dev-tier models for client work, Black Forest Labs sells four commercial tiers. Builder covers 10,000 images a month on a single domain with fine-tuning and LoRA rights; Platform raises that to 100,000 a month and adds FLUX.2 [klein] 9B and dev; Professional keeps 100,000 a month but extends to three domains, aimed squarely at creative agencies producing images for named clients; Enterprise is custom volume across all models. Choosing a model is half the decision — matching the license tier to how you bill clients is the other half, and it is exactly the kind of thing we sort out inside a content-engine engagement.

05 — ComfyUI & Tooling

ComfyUI won — here is why.

ComfyUI is a node-based interface where every step of the pipeline — load checkpoint, encode the prompt, sample, decode, save — is a visible, wirable node. That sounds fiddly, and the learning curve is real, but it buys two things that matter for production: the best day-one support for new models like Flux, and meaningfully faster execution on complex workflows. In community benchmarks it generates SDXL images ~25% faster than Automatic1111 and handles ControlNet plus upscaling workflows ~60% faster.
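The load, encode, sample, decode, save chain described above can be written down as a graph in ComfyUI's API-style JSON, where each node has a class type and its inputs either hold literal values or point at another node's output. The sketch below builds such a graph as a Python dict and checks that every link resolves. Treat it as an assumption-laden illustration: the node class and input names follow ComfyUI's built-in nodes as commonly documented and should be checked against your installed version, the checkpoint filename is hypothetical, and the step count, CFG value, and sampler are placeholders. It is a Stable Diffusion-style graph with a negative prompt, not a Flux one.

```python
import json

# Hypothetical checkpoint filename; use whatever sits in your models folder.
CHECKPOINT = "sd_xl_base_1.0.safetensors"


def build_workflow(prompt_text: str, negative_text: str = "", seed: int = 0) -> dict:
    """SD/SDXL-style text-to-image graph. Links are [source_node_id, output_index]."""
    return {
        # Load checkpoint: outputs are MODEL (0), CLIP (1), VAE (2).
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CHECKPOINT}},
        # Encode the positive and negative prompts with the checkpoint's CLIP.
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt_text, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative_text, "clip": ["1", 1]}},
        # Empty latent canvas that the sampler will denoise.
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        # Sample. Steps, cfg, and sampler are illustrative values only.
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        # Decode latents to pixels with the checkpoint's VAE, then save.
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "brand_test"}},
    }


def dangling_links(workflow: dict) -> list[str]:
    """Describe every input that points at a node id missing from the graph."""
    problems = []
    for node_id, node in workflow.items():
        for input_name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2
            if is_link and value[0] not in workflow:
                problems.append(f"{node_id}.{input_name} -> {value[0]}")
    return problems


if __name__ == "__main__":
    workflow = build_workflow("studio product shot of a ceramic mug, soft daylight")
    assert dangling_links(workflow) == []
    payload = json.dumps({"prompt": workflow})
    print(f"{len(workflow)} nodes, {len(payload)} bytes of JSON")
```

A local ComfyUI server is generally driven by POSTing a payload of this shape to its `/prompt` endpoint; the sketch stops at building the JSON so it runs without a server.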

Recommended

ComfyUI

Node-based, with the fastest support for new models and the quickest complex-workflow execution. The agency default in 2026 — what you build production pipelines on.

Pick for production

Low-VRAM route

Forge (A1111 fork)

Automatic VRAM management lets SDXL and Flux run on 6GB cards via --lowvram (CPU offload, ~3–5× slower). The route for older RTX 2060 / GTX 1080 Ti hardware. Vanilla Automatic1111 does not run Flux at all; that requires the Forge fork.

Pick for 6GB GPUs

Avoid for new builds

Fooocus

The closest local analog to Midjourney’s simple, node-free UX — but SDXL-only and no longer actively maintained as of 2026. Fine for casual SDXL; not a basis for a production Flux workflow.

Skip in 2026

Don't copy SDXL workflows onto Flux

Flux pipelines differ from Stable Diffusion in two structural ways: Flux uses no negative prompt, and its CFG-scale behavior is fundamentally different from SDXL and SD 1.5. Wiring a known-good SDXL ComfyUI graph straight onto a Flux model produces broken, washed-out results. Start from a Flux-specific template, not your SDXL one.

06 — Brand LoRA

Your brand as an owned asset.

A LoRA (Low-Rank Adaptation) is a small add-on you train on top of a base model to teach it a specific style, product, or look. For an agency this is the whole game: train a LoRA on a client’s brand — their product photography, their palette, their art direction — and every generated image inherits that style consistently. The output is not a subscription feature; it is a file you keep.
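The reason the add-on stays small is in the name: instead of retraining a full weight matrix, LoRA learns two thin matrices whose product is added to it, so the trainable parameter count grows with the rank rather than with the matrix size. The sketch below works that arithmetic through. The layer width, rank, number of adapted matrices, and 2-byte storage are illustrative assumptions, not the configuration of any specific Flux or SDXL LoRA.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in one full weight matrix of shape d_out x d_in."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds for that matrix: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def lora_file_mb(n_matrices: int, d_in: int, d_out: int, rank: int,
                 bytes_per_param: int = 2) -> float:
    """Approximate file size in MB if every adapted matrix has the same shape."""
    total = n_matrices * lora_params(d_in, d_out, rank)
    return total * bytes_per_param / 1e6


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the scale of the saving.
    d, rank, n = 3072, 16, 300
    ratio = lora_params(d, d, rank) / full_params(d, d)
    print(f"LoRA params per matrix: {lora_params(d, d, rank):,}")
    print(f"Share of the full matrix: {ratio:.2%}")
    print(f"Estimated file size: {lora_file_mb(n, d, d, rank):.0f} MB")
```

With these made-up inputs the file lands in the tens of megabytes; real sizes depend on the rank chosen and on which layers the trainer adapts.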

Min VRAM

Flux LoRA training

12GB

A brand-style Flux LoRA trains from ~12GB VRAM using ComfyUI-FluxTrainer (with split_mode and ~32GB system RAM). SD 1.5 LoRAs train comfortably on 6–8GB.

SD 1.5: 6–8GB

Training time

RTX 4090 · 2,000 steps

~2.5–3hrs

Roughly 2.5–3 hours for 2,000 steps on a 30-image dataset on an RTX 4090. An RTX 3080 takes ~4–5h; 8GB cards 6h+. Highly dependent on batch size and hyperparameters.

Approximate

Asset you own

Portable LoRA file

50–500MB

The trained LoRA is a 50–500MB file that loads into any compatible ComfyUI or Forge workflow in seconds — an offline, ownable brand asset, not subject to a platform’s changing terms.

Yours to keep

Dataset size sets the quality ceiling: products want 10–20 images from multiple angles; an art style wants 20–50 representative examples. Once trained, the LoRA loads in seconds and runs offline forever. That permanence is the strategic point. A brand LoRA you own cannot be deprecated, re-priced, or have its terms rewritten — it is closer to a typeface license you hold outright than to a SaaS seat you rent.
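The training-time figure quoted above (2,000 steps in roughly 2.5–3 hours on an RTX 4090) implies a per-step cost that can be used to ballpark other runs. The sketch below does that conversion. It assumes time scales linearly with step count at fixed settings and uses a batch size of 1 for the passes-over-dataset figure; the source gives neither, so both are labeled assumptions, and as noted above batch size and hyperparameters move these numbers a lot.

```python
def seconds_per_step(total_hours: float, steps: int) -> float:
    """Average wall-clock seconds per training step."""
    return total_hours * 3600 / steps


def estimate_hours(steps: int, sec_per_step: float) -> float:
    """Projected run length, assuming linear scaling with step count."""
    return steps * sec_per_step / 3600


def passes_over_dataset(steps: int, batch_size: int, dataset_size: int) -> float:
    """How many times each image is seen on average during training."""
    return steps * batch_size / dataset_size


if __name__ == "__main__":
    fast = seconds_per_step(2.5, 2000)  # lower end of the quoted range
    slow = seconds_per_step(3.0, 2000)  # upper end of the quoted range
    print(f"Implied cost: {fast:.1f} to {slow:.1f} s/step")
    print(f"3,000 steps: {estimate_hours(3000, fast):.2f} to "
          f"{estimate_hours(3000, slow):.2f} h")
    # Batch size 1 is an assumption; the guide does not state it.
    print(f"Passes over 30 images: {passes_over_dataset(2000, 1, 30):.0f}")
```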

07 — Local vs Cloud Cost

The break-even, recomputed honestly.

Here is where most local-vs-cloud articles oversell. Yes, local generation costs $0 per image — but the hardware is real money up front, and whether it pays off depends entirely on volume. The table below is the recurring monthly cost at three realistic agency volumes; the break-even on the hardware follows.

Monthly image-generation cost by volume — cloud subscription and API versus local RTX 4090 hardware
Images / month	Midjourney Standard (cloud)	DALL·E API (cloud)	Adobe Firefly API (cloud)	Local RTX 4090 build
500	$30 *	$20	$10	$0 / image †
2,000	$30 *	$80	$40	$0 / image †
10,000	$30 *	$400	$200	$0 / image †

* Midjourney Standard is $30/month flat but includes only ~900 fast images plus unlimited slower Relax-mode generations; sustained high volume realistically moves you to Pro ($60) or Mega ($120).

† DALL·E priced at $0.04/image (gpt-image-2, 1024×1024; $0.08 at 1024×1792); Firefly API ~$0.02/image. Local assumes a ~$3,500 RTX 4090 build (one-time) plus ~$10/month electricity — marginal cost per image is $0.
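The recurring figures in the table follow directly from the per-image prices in the footnotes. A minimal sketch that recomputes them is below; the prices are the ones quoted in this guide, the local recurring cost is the ~$10/month electricity estimate, and the constant and function names are illustrative. It does not model Midjourney's fast-image cap, so the flat $30 understates sustained high-volume use as the first footnote explains.

```python
# Per-image and flat prices as quoted in this guide (June 2026 figures).
DALLE_USD_PER_IMAGE = 0.04
FIREFLY_USD_PER_IMAGE = 0.02
MIDJOURNEY_STANDARD_USD_PER_MONTH = 30.0
LOCAL_ELECTRICITY_USD_PER_MONTH = 10.0  # recurring cost only, hardware excluded


def monthly_costs(images_per_month: int) -> dict[str, float]:
    """Recurring monthly cost in USD for each option at a given volume."""
    return {
        "Midjourney Standard": MIDJOURNEY_STANDARD_USD_PER_MONTH,
        "DALL-E API": images_per_month * DALLE_USD_PER_IMAGE,
        "Adobe Firefly API": images_per_month * FIREFLY_USD_PER_IMAGE,
        "Local RTX 4090": LOCAL_ELECTRICITY_USD_PER_MONTH,
    }


if __name__ == "__main__":
    for volume in (500, 2_000, 10_000):
        row = ", ".join(f"{k}: ${v:,.0f}" for k, v in monthly_costs(volume).items())
        print(f"{volume:>6} images/month -> {row}")
```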

Recurring monthly cost at 10,000 images / month

Local excludes the one-time ~$3,500 hardware; cloud figures from vendor pricing, June 2026

DALL·E API (10,000 × $0.04 / image): $400

Adobe Firefly API (10,000 × $0.02 / image): $200

Midjourney Standard (flat rate, capacity-capped): $30

Local RTX 4090 (electricity only, after hardware): ~$10

The break-even math, recomputed

A ~$3,500 RTX 4090 build divided by your monthly cloud bill is the whole decision. Against DALL·E’s metered $0.04/image, that hardware pays for itself in roughly 175 months at 500 images/month, ~44 months at 2,000, and ~9 months at 10,000. Against Adobe Firefly’s $0.02/image it is ~88 months at 2,000 and ~18 months at 10,000. These figures ignore the ~$10/month electricity; netting it out lengthens the low-volume cases further (about 350 months at 500 images/month against DALL·E) and barely moves the high-volume ones. Below a few thousand images a month, a subscription is simply the rational choice. Local economics only dominate at high, sustained volume, or when license-clean ownership, privacy, and unlimited iteration matter more than the spreadsheet.
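The break-even figures above are the hardware cost divided by the monthly cloud bill. The sketch below reproduces them and adds an optional electricity term, since the table footnote estimates ~$10/month for running the card. The hardware price, per-image prices, and electricity figure are this guide's own numbers; the function name and structure are illustrative.

```python
import math

HARDWARE_USD = 3500.0               # one-time RTX 4090 build, per this guide
ELECTRICITY_USD_PER_MONTH = 10.0    # estimate from the table footnote


def breakeven_months(images_per_month: int, usd_per_image: float,
                     electricity_usd: float = 0.0) -> float:
    """Months until the hardware cost is recovered from avoided cloud spend.

    Returns math.inf when the cloud bill does not exceed the electricity cost.
    """
    monthly_saving = images_per_month * usd_per_image - electricity_usd
    if monthly_saving <= 0:
        return math.inf
    return HARDWARE_USD / monthly_saving


if __name__ == "__main__":
    for label, price in (("DALL-E API", 0.04), ("Adobe Firefly API", 0.02)):
        for volume in (500, 2_000, 10_000):
            simple = breakeven_months(volume, price)
            net = breakeven_months(volume, price, ELECTRICITY_USD_PER_MONTH)
            print(f"{label} at {volume:>6}/month: "
                  f"{simple:.1f} months, {net:.1f} net of electricity")
```

At 500 images a month against the Firefly price, the avoided spend only equals the electricity estimate, so the function reports no break-even at all. That is the low-volume case stated numerically.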

Two things shift that math in local’s favor in the real world. First, capex is the lever: many studios already own a capable gaming GPU, which makes the hardware a sunk cost and collapses the break-even to near-immediate. Second, the per-image cloud numbers assume you generate exactly what you need — but creative work is iterative, and at $0.04 a try, exploring 60 variations of a concept costs $2.40 every time, where locally it costs nothing. For the full cloud-side pricing picture, see our AI image API pricing comparison, and for the broader local-versus-subscription ROI argument, our local-vs-cloud ROI analysis.

08 — Agency Workflow

How a studio actually deploys this.

The pieces assemble into a clear default stack for 2026. Pick a license-clean base — FLUX.1 [schnell] for speed, SDXL for the most mature ecosystem, or a BFL commercial tier if you need dev-grade fidelity for clients. Run it in ComfyUI on a 16–24GB card. Train a brand LoRA per client and keep the file. That combination delivers on-brand images at zero marginal cost, fully offline, with a clean commercial license — and the brand model is an asset on your books, not a line item you rent.

The forward signal is iteration speed. If FLUX.2 [klein]’s sub-second generation figures hold up under independent testing — and that is still a vendor claim, not a verified one — the creative loop changes shape. Evaluating dozens of concepts a minute, rather than waiting 45–60 seconds per Midjourney render, turns image generation from a request-and-wait task into something closer to live sketching. The studios that internalize that shift will move from “generate an image” to “explore a space of images,” and the output quality follows the number of iterations you can afford — which, locally, is unlimited.

None of this removes the human. Brand judgment, art direction, and knowing which of 60 variations is actually on-strategy is the work that matters; the local stack just removes the meter from the iteration. That division of labor — senior judgment plus owned, agent-assisted capacity — is exactly how we structure visual production inside an AI transformation engagement.

09 — Conclusion

Cost is the hook; ownership is the reason.

The shape of local image generation, mid-2026

After the hardware, every image is free — but volume and licensing decide whether that matters.

Local image generation in 2026 is genuinely production-ready for marketing teams. Flux and Stable Diffusion close most of the quality gap to cloud tools, consumer GPUs have the VRAM to run them, and ComfyUI is a dependable production environment. The $0-per-image headline is true — but it is a volume argument, and the honest break-even only favors local hardware at sustained, high output or where you already own the GPU.

The decision that actually trips agencies up is licensing, not cost. FLUX.1 [schnell] and the FLUX.2 [klein] 4B variant (Apache 2.0) and SDXL (no revenue cap) are clean for client work; FLUX.1 [dev] and the larger FLUX.2 models are freely downloadable but require a paid Black Forest Labs license to use commercially. Get that right and local generation is a license-clean, zero-marginal-cost capability. Get it wrong and a “free” model becomes a contract liability.

The most durable reason to go local, though, isn’t on any spreadsheet. A trained brand LoRA sitting on your own disk is an asset you own outright — offline, permanent, and immune to a vendor’s next price change or policy update. That is the shift worth planning for: not renting a creative tool, but owning a creative capability.
