Topic

#agentic-coding

34 articles tagged agentic-coding, listed below.

Cross-cutting reads on this topic

Anthropic's study of 400K Claude Code sessions finds domain knowledge beats coding background. What the data means for hiring, agencies, and senior judgment.
#agentic-coding#anthropic-research+5 more
2026-06-20
Cohere's first open-source model is a 30B MoE that runs on a single H100, scores 33.4 on the Coding Index, and ships under Apache 2.0. Full breakdown.
#cohere#north-mini-code+5 more
2026-06-13
Kimi K2.7-Code is Moonshot's new open-source coding model: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens, plus Kimi Code CLI plans from $19/mo.
#kimi-k2-7-code#moonshot-ai+7 more
2026-06-12
Claude Fable 5 and Mythos 5 as agentic coding models, read from the system card: the real coding benchmarks, the candid failure modes, and how to oversee them.
#claude-fable-5#claude-mythos-5+6 more
2026-06-09
Claude Fable 5 leads the benchmarks; GPT-5.5 costs half as much and owns Codex. We compare coding, knowledge work, long context, and cost to find the fit.
#claude-fable-5#gpt-5-5+6 more
2026-06-09
Anthropic shipped its strongest model as two products: Fable 5, generally available with safeguards, and restricted Mythos 5. Benchmarks, pricing, the catch.
#claude-fable-5#claude-mythos-5+6 more
2026-06-09
MiniMax M3 lands at 5-17x lower cost, but Opus 4.8 leads SWE-bench Pro and GPT-5.5 wins Terminal-Bench. A full three-way agentic coding routing matrix.
#minimax-m3#claude-opus-4-8+6 more
2026-06-03
MiniMax M3 fuses frontier coding, a 1M-token context window, and native multimodality. Inside its Sparse Attention design, vendor benchmarks, and pricing.
#minimax-m3#open-weight-models+6 more
2026-05-31
StepFun's Apache-2.0 Step 3.7 Flash pairs a 196B MoE backbone with a 1.8B vision encoder, activating ~11B params per token. The cost case for agentic teams.
#stepfun-step-3-7-flash#mixture-of-experts+6 more
2026-05-30
Claude Opus 4.8 lands May 28 with stronger coding benchmarks, a major honesty gain, new effort controls, and dynamic workflows in Claude Code.
#claude-opus-4-8#anthropic+6 more
2026-05-28
We compare Claude Opus 4.8 and GPT-5.5 on coding, agents, reasoning, and real cost — including where GPT-5.5 still wins and which model fits which job.
#claude-opus-4-8#gpt-5-5+6 more
2026-05-28
Alibaba's Qwen 3.7 Max ships with 1M context, $2.50/$7.50 pricing, and benchmarks topping Opus 4.6 on Terminal-Bench, SWE-Bench Pro, and MCP-Atlas.
#qwen-3-7-max#alibaba-qwen+7 more
2026-05-25
Agentic coding head-to-head: Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7. MCP Atlas, SWE-Bench Pro, Terminal-Bench, plus Antigravity 2.0 launch context.
#gemini-3-5-flash#gpt-5-5+8 more
2026-05-19
Hands-on deep dive into Aider — the CLI agentic coder, repo-map context, edit modes, voice coding, git integration, and the workflows that ship.
#aider#deep-dive+7 more
2026-05-10
Hands-on deep dive into Windsurf 2 — Cascade, Workflows, multi-file edits, memory, MCP integration, and what distinguishes it from Cursor and Claude Code.
#windsurf-2#deep-dive+7 more
2026-05-10
Hands-on deep dive into OpenAI Codex CLI — config.toml schema, profiles, sandbox modes, MCP integration, agent loops, and the workflows that ship.
#codex-cli#deep-dive+7 more
2026-05-10
Migrate from the legacy TypeScript Codex CLI to the current Rust CLI — config.toml schema, auth surface, profile API, sandbox modes, rollout plan.
#codex-cli#migration-playbook+7 more
2026-05-05
Wire OpenAI Codex into CI to generate Jest tests for new functions on every PR. Scripted prompts, diff parsing, and a PR-comment bot included.
#codex#test-generation+7 more
2026-05-02
Build a CLI that reads your staged diff and generates conventional-commits messages with Claude Haiku 4.5. Bash hooks, prompt design, CI wiring inside.
#git#claude-haiku+7 more
2026-05-02
Head-to-head: GPT-5.5 and Claude Opus 4.7 on agentic coding, computer use, 1M context, pricing, and the right model for each production workload.
#gpt-5-5#claude-opus-4-7+8 more
2026-04-23
OpenAI's GPT-5.5 ships April 23, 2026 with 1M context, Thinking and Pro variants, 82.7% on Terminal-Bench, and the same latency as GPT-5.4. Pricing inside.
#gpt-5-5#gpt-5-5-pro+8 more
2026-04-23
Six production-tested GPT-5.5 Pro coding workflows — refactor, review, debug, test-gen, migration, codebase Q&A — with cost, latency, and success-rate data.
#gpt-5-5-pro#openai+8 more
2026-04-23
When 1M context pays off — and when it bankrupts you. Token-spend math, prompt-cache strategy, and break-even tables for agentic Claude Opus 4.7 workloads.
#claude-opus-4-7#anthropic+8 more
2026-04-23
Cursor 3 launches with the Agents Window for parallel agent orchestration, Composer 2, seamless cloud handoff, an integrated browser, and Marketplace plugins.
#cursor#ai-coding+5 more
2026-04-03
GitHub Copilot Coding Agent starts work 50% faster with semantic code search and JetBrains GA. Complete guide to the March 2026 agentic coding improvements.
#github-copilot#coding-agent+4 more
2026-03-09
Cursor launches Automations: always-on coding agents triggered by Slack, Linear, GitHub, and webhooks. Guide to event-driven autonomous coding.
#cursor#automations+4 more
2026-03-07
Gemini 3.1 Pro vs Claude Opus 4.6 vs GPT-5.3-Codex for agentic coding. SWE-Bench, Terminal-Bench, LiveCodeBench, and pricing comparison with recommendations.
#Gemini 3.1 Pro#Claude Opus 4.6+6 more
2026-02-19
Cursor's Composer 1.5 scales reinforcement learning 20x to score 47.9% on Terminal-Bench 2.0 with adaptive thinking and self-summarization. Full analysis.
#Cursor#Composer 1.5+4 more
2026-02-12
OpenAI's GPT-5.3-Codex brings 25% faster inference and major Terminal-Bench and OSWorld gains. Full benchmarks, access details, and migration guide.
#GPT-5.3-Codex#OpenAI+3 more
2026-02-05
MiniMax M2.1 is a December 2025 coding and agentic-workflow model listed on OpenRouter, with 10B active parameters and "Digital Employee" positioning.
#MiniMax M2.1#Open-Source LLM+3 more
2025-12-24
GLM-4.7 achieves 73.8% SWE-bench and 87.4% τ²-Bench with Preserved Thinking. Complete developer guide for the $3/month Claude Code alternative.
#GLM-4.7#Z.ai+5 more
2025-12-23
GPT-5.2-Codex achieves 56.4% SWE-Bench Pro and 64% Terminal-Bench. Learn Codex CLI setup, 400K context workflows, and cybersecurity use cases.
#GPT-5.2-Codex#OpenAI+6 more
2025-12-19
Master GPT-5.1-Codex-Max with context compaction for million-token projects. Compare vs Claude Code & Cursor. Pricing, benchmarks, and best practices.
#GPT-5.1 Codex-Max#OpenAI+4 more
2025-11-19
Master Windsurf's SWE-1.5 at 950 tok/s, Cascade Hooks for SOC 2 compliance, and Codemaps visualization. Pricing comparison vs Cursor vs Copilot.
#Windsurf#SWE-1.5+6 more
2025-11-14