Topic

#frontier-models

28 articles tagged frontier-models.

AI Development

Grok 4.3 on Amazon Bedrock: xAI Goes Enterprise 2026

Grok 4.3 landed on Amazon Bedrock on June 15, 2026 as the cheapest US-lab frontier reasoning model. The pricing, Mantle endpoint gotchas, and enterprise risks.

#grok-4-3 #amazon-bedrock +5 more

2026-06-22

AI Development

Sakana Fugu: A Multi-Agent AI Orchestration Model 2026

Sakana Fugu wraps a pool of frontier models behind one orchestration API. We cover the two models, vendor benchmarks, pricing, and the export-control angle.

#Sakana AI #Sakana Fugu +6 more

2026-06-22

AI Development

Claude Fable 5 vs GPT-5.5: Benchmarks & Cost Compared

Claude Fable 5 leads the benchmarks; GPT-5.5 costs half as much and owns Codex. We compare coding, knowledge work, long context, and cost to find the fit.

#claude-fable-5 #gpt-5-5 +6 more

2026-06-09

AI Development (Popular)

Claude Fable 5 & Mythos 5: The Frontier, Split in Two

Anthropic shipped its strongest model as two products: Fable 5, generally available with safeguards, and restricted Mythos 5. Benchmarks, pricing, the catch.

#claude-fable-5 #claude-mythos-5 +6 more

2026-06-09

AI Development

Claude Opus 4.8 vs GPT-5.5: Benchmarks & Cost Compared

We compare Claude Opus 4.8 and GPT-5.5 on coding, agents, reasoning, and real cost — including where GPT-5.5 still wins and which model fits which job.

#claude-opus-4-8 #gpt-5-5 +6 more

2026-05-28

AI Development

Qwen 3.7 Max: Alibaba's New Flagship AI Model 2026

Alibaba's Qwen 3.7 Max ships with 1M context, $2.50/$7.50 pricing, and benchmarks topping Opus 4.6 on Terminal-Bench, SWE-Bench Pro, and MCP-Atlas.

#qwen-3-7-max #alibaba-qwen +7 more

2026-05-25

AI Development

State of Agentic AI Q2 2026: The Quarterly Report

The Q2 2026 agentic-AI quarterly — model releases, MCP adoption, enterprise deployments, funding, regulatory shifts. 12 charts on where the market moved.

#state-of-agentic-ai #quarterly-report +7 more

2026-05-01

AI Development

DeepSeek V4 Launches: 1.6T MoE, 1M Context, 10% KV

DeepSeek-V4 ships April 24, 2026 as open-weight MoE: Pro (1.6T/49B active) and Flash (284B/13B), 1M context, 27% FLOPs and 10% KV cache vs V3.2.

#deepseek-v4 #deepseek-v4-pro +6 more

2026-04-24

AI Development

MoE Architecture: GPT, Claude, DeepSeek, Qwen Compared

MoE choices powering 2026 frontier models compared — total vs active params, routing strategies, sparsity ratios, and the downstream cost implications.

#mixture-of-experts #moe-architecture +8 more

2026-04-24

AI Development

AI Model Sustainability Report 2026: Energy Use Data

Per-query energy and water data for frontier models, training-vs-inference split, and emissions per million tokens. Methodology and trend analysis.

#ai-sustainability #ai-energy-use +8 more

2026-04-24

AI Development

GPT-5.5 vs Claude Opus 4.7: Benchmarks & Pricing

Head-to-head: GPT-5.5 and Claude Opus 4.7 on agentic coding, computer use, 1M context, pricing, and the right model for each production workload.

#gpt-5-5 #claude-opus-4-7 +8 more

2026-04-23

AI Development

GPT-5.5 Complete Guide: Thinking, Pro & 1M Context

OpenAI's GPT-5.5 ships April 23, 2026 with 1M context, Thinking and Pro variants, 82.7% on Terminal-Bench, and the same latency as GPT-5.4. Pricing inside.

#gpt-5-5 #gpt-5-5-pro +8 more

2026-04-23

AI Development

AI Model API Pricing Tracker Q2 2026: 200 Data Points

Side-by-side input, output, cached, and batch pricing for 30 frontier and open-weight models across 12 providers. Updated April 2026 with 200+ price points.

#ai-model-pricing #llm-pricing +8 more

2026-04-23

AI Development

Reasoning Effort: Cost vs Quality Benchmarks 2026

We measured low/medium/high reasoning effort across 5 frontier models on math, code, and analysis. Quality lift, latency tax, and cost-per-correct-answer data.

#reasoning-effort #ai-benchmarks +8 more

2026-04-23

AI Development

AI Hallucination Rate Benchmarks 2026: 5-Model Study

Cross-model hallucination rates on factual recall, citation accuracy, and code reference. 5,000 prompts tested across 5 frontier models with confidence bands.

#ai-hallucination #ai-benchmarks +8 more

2026-04-23

AI Development

Tool-Use Success Rates: 5 Frontier Models Tested

MCP tool-call success across 12 task types — search, file ops, data, calendar, email. Pass-rate, retry-rate, and cost-to-completion for 5 frontier AI models.

#tool-use #mcp +8 more

2026-04-23

AI Development

AI Model Latency Benchmarks 2026: TTFT & TPS Data

Time-to-first-token and tokens-per-second across 30 model+provider pairings. P50/P95 numbers, regional spread, and how reasoning mode taxes cold latency budgets.

#ai-latency #ttft +8 more

2026-04-23

AI Development

Cost-Per-Successful-Task: A New AI Evaluation Metric

Why $/token is the wrong unit and $/successful-task is the right one. Formulas, worked examples across 6 task families, and a downloadable scoring template.

#ai-evaluation #cost-per-task +8 more

2026-04-23

AI Development

Claude Opus 4.7: Anthropic's New Frontier Model Guide

Claude Opus 4.7 scores 64.3% on SWE-bench Pro with 2576px vision, xhigh effort, and the same pricing as Opus 4.6. Full benchmark and migration guide.

#Claude #Anthropic +4 more

2026-04-16

AI Development

Frontier Model Release Velocity Index 2026 Q2 Report

The Frontier Model Release Velocity Index tracks new-model launch rates per provider — OpenAI, Anthropic, Google, Alibaba, Zhipu. Q2 2026 trajectory data.

#release-velocity-index #frontier-models +4 more

2026-04-12

AI Development

DeepSeek V4, GPT-5.5, Grok 5: Q2 2026 AI Preview

Preview of Q2 2026 AI model releases. DeepSeek V4 at ~1T parameters, GPT-5.5 Spud with pretraining done, and Grok 5 expected by mid-2026. Timeline and specs.

#deepseek-v4 #gpt-5-5 +5 more

2026-04-03

AI Development

Qwen 3.6 Plus vs Claude Opus 4.6 vs GPT-5.4 Compared

Frontier model comparison: Qwen 3.6 Plus vs Claude Opus 4.6 vs GPT-5.4. Benchmarks, pricing, context windows, and capabilities for 1M+ token models.

#qwen-3-6-plus #claude-opus-4-6 +5 more

2026-04-02

AI Development

MiMo-V2-Pro: Xiaomi's Trillion-Parameter LLM Rivals GPT-5

Xiaomi's MiMo-V2-Pro has 1T+ parameters with 42B active, ranking #8 worldwide. $1/$3 per million tokens. Complete model and deployment guide.

#mimo-v2-pro #xiaomi-ai +5 more

2026-03-18

AI Development

12 AI Models Released in One Week: March 2026 Guide

Twelve AI models launched in a single week of March 2026 from OpenAI, Google, Mistral, xAI, and more. Developer guide to capabilities, pricing, and selection.

#ai-models #march-2026 +4 more

2026-03-15

AI Development

Gemini 3.1 Flash-Lite: Cheapest AI That Beats GPT-5 Mini

Google's Gemini 3.1 Flash-Lite costs $0.25 per million tokens and outperforms GPT-5 Mini on key benchmarks. Complete pricing and performance comparison guide.

#gemini-3-1-flash-lite #google-ai +4 more

2026-03-09

AI Development

GPT-5.4 Complete Guide: Standard, Thinking, and Pro

GPT-5.4 ships three variants: Standard, Thinking, and Pro. Native computer use, 1M context, tool search, and 33% fewer factual errors. Complete guide.

#gpt-5-4 #openai +5 more

2026-03-06

AI Development

GPT-5.4 vs Opus 4.6 vs Gemini 3.1 Pro: Best AI Model?

Three-way frontier model comparison: GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro benchmarks, agentic AI capabilities, pricing, and which model wins.

#gpt-5-4 #claude-opus-4-6 +6 more

2026-03-05

AI Development

Mistral 3: Open-Weight Frontier Model Complete Guide

Master Mistral 3's 10-model family, including Large 3 (675B params) and Ministral 3. First open frontier with multimodal and multilingual capabilities. Apache 2.0 guide.

#Mistral 3 #Open Source AI +5 more

2025-12-02
