Topic
#open-weight-models
16 articles tagged open-weight-models.
Cross-cutting reads on this topic
The Fable 5 export shutdown showed single-vendor AI can halt your business overnight. A four-step second-source playbook with open-weight failover backups.
#AI vendor resilience #open-weight models +5 more
2026-06-21
DiffusionGemma is Google's first open-weight text diffusion LLM: a 26B MoE under Apache 2.0 hitting 1,100+ tokens/sec on one H100. Where it wins and loses.
#diffusiongemma #google-deepmind +5 more
2026-06-13
OpenRouter added five models in ten days, from Opus 4.8 to MiniMax M3 at $0.30/M input. June 2026 pricing, context windows, and usage rankings.
#openrouter #minimax-m3 +6 more
2026-06-04
Open-weight models now run 10-12x cheaper than frontier SaaS. A 2026 framework on the TCO crossover, vendor lock-in, and data sovereignty for agencies.
#build-vs-buy #ai-strategy +6 more
2026-06-04
MiniMax M3 lands at 5-17x lower cost, but Opus 4.8 leads SWE-bench Pro and GPT-5.5 wins Terminal-Bench. A full three-way agentic coding routing matrix.
#minimax-m3 #claude-opus-4-8 +6 more
2026-06-03
DeepSeek abandons its no-outside-capital stance in a ~$7.4B maiden round led by Tencent and CATL, valuing it near $59B and reshaping open-weight economics.
#deepseek #ai-funding +6 more
2026-06-03
Cosmos 3 is the first fully open physical-AI omnimodel: one model reasons, simulates, and predicts robot actions. Inside the two-tower design and how to run it.
#nvidia-cosmos-3 #physical-ai +6 more
2026-06-01
MiniMax M3 fuses frontier coding, a 1M-token context window, and native multimodality. Inside its Sparse Attention design, vendor benchmarks, and pricing.
#minimax-m3 #open-weight-models +6 more
2026-05-31
StepFun's Apache-2.0 Step 3.7 Flash pairs a 196B MoE backbone with a 1.8B vision encoder, activating ~11B params per token. The cost case for agentic teams.
#stepfun-step-3-7-flash #mixture-of-experts +6 more
2026-05-30
When self-hosting open-weight models beats API calls: a cost-crossover model, GPU sizing tables, and a deployment matrix for vLLM, SGLang, and Ollama.
#self-hosting-llm #open-weight-models +6 more
2026-05-27
Migrate DeepSeek V3.2 to V4 across open-weight stacks — three reasoning modes, tokenizer change, HCA/CSA attention deltas, KV-cache reduction.
#deepseek-v4 #migration-playbook +7 more
2026-05-05
GPU spend, ops headcount, latency, and break-even volume for hosting Llama, Qwen, DeepSeek, and Mistral yourself vs API. With per-token cost curves at 4 scales.
#self-hosting-llm #ai-tco +8 more
2026-04-24
Cross-model quality regression, throughput lift, and VRAM savings at GPTQ-4, AWQ-4, INT8, and FP8 — benchmark data across 6 open-weight models.
#quantization #gptq +8 more
2026-04-24
Seven serverless inference providers compared on price, latency, model availability, and throughput. 60+ data points across 12 popular models.
#ai-inference-providers #together-ai +8 more
2026-04-24
Side-by-side input, output, cached, and batch pricing for 30 frontier and open-weight models across 12 providers. Updated April 2026 with 200+ price points.
#ai-model-pricing #llm-pricing +8 more
2026-04-23
Q2 2026 gap analysis between open-weight and closed-source frontier models — capability parity, cost economics, and the agency deployment decision tree.
#open-weight-models #closed-source-models +4 more
2026-04-12