Topic
#frontier-models
28 articles tagged frontier-models.
Cross-cutting reads on this topic
Grok 4.3 landed on Amazon Bedrock on June 15, 2026, as the cheapest US-lab frontier reasoning model. Covers the pricing, Mantle endpoint gotchas, and enterprise risks.
#grok-4-3, #amazon-bedrock (+5 more)
2026-06-22
Sakana Fugu wraps a pool of frontier models behind one orchestration API. We cover the two models, vendor benchmarks, pricing, and the export-control angle.
#Sakana AI, #Sakana Fugu (+6 more)
2026-06-22
Claude Fable 5 leads the benchmarks; GPT-5.5 costs half as much and owns Codex. We compare coding, knowledge work, long context, and cost to find the fit.
#claude-fable-5, #gpt-5-5 (+6 more)
2026-06-09
AI Development · Popular
Anthropic shipped its strongest model as two products: Fable 5, generally available with safeguards, and restricted Mythos 5. Benchmarks, pricing, the catch.
#claude-fable-5, #claude-mythos-5 (+6 more)
2026-06-09
We compare Claude Opus 4.8 and GPT-5.5 on coding, agents, reasoning, and real cost — including where GPT-5.5 still wins and which model fits which job.
#claude-opus-4-8, #gpt-5-5 (+6 more)
2026-05-28
Alibaba's Qwen 3.7 Max ships with 1M context, $2.50/$7.50 pricing, and benchmarks topping Opus 4.6 on Terminal-Bench, SWE-Bench Pro, and MCP-Atlas.
#qwen-3-7-max, #alibaba-qwen (+7 more)
2026-05-25
The Q2 2026 agentic-AI quarterly — model releases, MCP adoption, enterprise deployments, funding, regulatory shifts. 12 charts on where the market moved.
#state-of-agentic-ai, #quarterly-report (+7 more)
2026-05-01
DeepSeek-V4 ships April 24, 2026 as open-weight MoE: Pro (1.6T/49B active) and Flash (284B/13B), 1M context, 27% FLOPs and 10% KV cache vs V3.2.
#deepseek-v4, #deepseek-v4-pro (+6 more)
2026-04-24
MoE choices powering 2026 frontier models compared — total vs active params, routing strategies, sparsity ratios, and the downstream cost implications.
#mixture-of-experts, #moe-architecture (+8 more)
2026-04-24
Per-query energy and water data for frontier models, training-vs-inference split, and emissions per million tokens. Methodology and trend analysis.
#ai-sustainability, #ai-energy-use (+8 more)
2026-04-24
Head-to-head: GPT-5.5 and Claude Opus 4.7 on agentic coding, computer use, 1M context, pricing, and the right model for each production workload.
#gpt-5-5, #claude-opus-4-7 (+8 more)
2026-04-23
OpenAI's GPT-5.5 ships April 23, 2026 with 1M context, Thinking and Pro variants, 82.7% on Terminal-Bench, and the same latency as GPT-5.4. Pricing inside.
#gpt-5-5, #gpt-5-5-pro (+8 more)
2026-04-23
Side-by-side input, output, cached, and batch pricing for 30 frontier and open-weight models across 12 providers. Updated April 2026 with 200+ price points.
#ai-model-pricing, #llm-pricing (+8 more)
2026-04-23
We measured low/medium/high reasoning effort across 5 frontier models on math, code, and analysis. Quality lift, latency tax, and cost-per-correct-answer data.
#reasoning-effort, #ai-benchmarks (+8 more)
2026-04-23
Cross-model hallucination rates on factual recall, citation accuracy, and code reference. 5,000 prompts tested across 5 frontier models with confidence bands.
#ai-hallucination, #ai-benchmarks (+8 more)
2026-04-23
MCP tool-call success across 12 task types — search, file ops, data, calendar, email. Pass-rate, retry-rate, and cost-to-completion for 5 frontier AI models.
#tool-use, #mcp (+8 more)
2026-04-23
Time-to-first-token and tokens-per-second across 30 model+provider pairings. P50/P95 numbers, regional spread, and how reasoning mode taxes cold latency budgets.
#ai-latency, #ttft (+8 more)
2026-04-23
Why $/token is the wrong unit and $/successful-task is the right one. Formulas, worked examples across 6 task families, and a downloadable scoring template.
#ai-evaluation, #cost-per-task (+8 more)
2026-04-23
Claude Opus 4.7 scores 64.3% on SWE-bench Pro with 2576px vision, xhigh effort, and the same pricing as Opus 4.6. Full benchmark and migration guide.
#Claude, #Anthropic (+4 more)
2026-04-16
The Frontier Model Release Velocity Index tracks new-model launch rates per provider — OpenAI, Anthropic, Google, Alibaba, Zhipu. Q2 2026 trajectory data.
#release-velocity-index, #frontier-models (+4 more)
2026-04-12
Preview of Q2 2026 AI model releases. DeepSeek V4 at ~1T parameters, GPT-5.5 Spud with pretraining done, and Grok 5 expected by mid-2026. Timeline and specs.
#deepseek-v4, #gpt-5-5 (+5 more)
2026-04-03
Frontier model comparison: Qwen 3.6 Plus vs Claude Opus 4.6 vs GPT-5.4. Benchmarks, pricing, context windows, and capabilities for 1M+ token models.
#qwen-3-6-plus, #claude-opus-4-6 (+5 more)
2026-04-02
Xiaomi's MiMo-V2-Pro has 1T+ parameters with 42B active, ranking #8 worldwide. $1/$3 per million tokens. Complete model and deployment guide.
#mimo-v2-pro, #xiaomi-ai (+5 more)
2026-03-18
Twelve AI models launched in a single week of March 2026 from OpenAI, Google, Mistral, xAI, and more. Developer guide to capabilities, pricing, and selection.
#ai-models, #march-2026 (+4 more)
2026-03-15
Google's Gemini 3.1 Flash-Lite costs $0.25 per million tokens and outperforms GPT-5 Mini on key benchmarks. Complete pricing and performance comparison guide.
#gemini-3-1-flash-lite, #google-ai (+4 more)
2026-03-09
GPT-5.4 ships three variants: Standard, Thinking, and Pro. Native computer use, 1M context, tool search, and 33% fewer factual errors. Complete guide.
#gpt-5-4, #openai (+5 more)
2026-03-06
Three-way frontier model comparison: GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro benchmarks, agentic AI capabilities, pricing, and which model wins.
#gpt-5-4, #claude-opus-4-6 (+6 more)
2026-03-05
Master Mistral 3's 10-model family, including Large 3 (675B params) and Ministral 3. The first open frontier family with multimodal and multilingual support. Apache 2.0 guide.
#Mistral 3, #Open Source AI (+5 more)
2025-12-02