Grok 4.3 on Amazon Bedrock became generally available on June 15, 2026. It is the first time an xAI model has been offered through the platform, making xAI the third major independent lab there alongside Anthropic and OpenAI. The launch reads like a straightforward catalog addition. It is not.
At an on-demand rate of $1.25 input and $2.50 output per million tokens, Grok 4.3 is the cheapest US-lab frontier reasoning model on Bedrock by a wide margin. But three things sit just under that headline that most coverage skips: a Mantle endpoint that breaks standard Bedrock SDK code, a context-window pricing structure that doubles costs above 200,000 tokens, and a vendor — xAI — in the middle of a SpaceX restructuring with most of its founding team gone.
This guide covers exactly what landed on June 15, how the pricing actually works once you account for long context, the practical integration gotcha that no other coverage surfaces clearly, an honest benchmark read by enterprise domain, and the governance questions a regulated-sector team has to answer before deploying. Every number below is sourced and dated; where a figure is vendor-stated or carries a caveat, we say so.
- 01 Grok 4.3 is live on Bedrock as of June 15, 2026. xAI becomes the third independent lab on the platform, behind Anthropic and OpenAI. The Bedrock launch arrived roughly six weeks after grok-4.3 became the default on the direct xAI API on April 30, 2026.
- 02 It is the cheapest US-lab frontier reasoning model on Bedrock. On-demand pricing is $1.25 input and $2.50 output per million tokens, with cached input at $0.20, undercutting Claude Sonnet 4.6 ($3 / $15) and the Bedrock GPT tier on both sides of the bill.
- 03 The 1M context window doubles in price above 200K tokens. The headline million-token window is real, but requests over 200,000 tokens are billed at the higher context tier. For long-document workloads the effective cost can land well above the sticker rate.
- 04 Grok on Bedrock uses Mantle, not bedrock-runtime. It runs on a new inference engine via the bedrock-mantle endpoint with an OpenAI-compatible path, not the Converse or InvokeModel APIs. Existing Bedrock SDK code will not work unchanged.
- 05 Strong benchmarks, real governance questions. Grok 4.3 posts competitive agentic and domain scores, but factual-accuracy gains came alongside a non-hallucination regression, and the vendor is mid-restructuring with 9 of 11 co-founders departed.
01 — What Launched
xAI joins Bedrock, six weeks after the API.
Grok 4.3 was not new on June 15. The model was first released on the direct xAI API on April 17, 2026, with the API default switched to grok-4.3 on April 30. The Bedrock general availability announcement came roughly six weeks later, which matters: the model, its benchmark profile, and its pricing were all established before AWS added it to the catalog. The Bedrock story is about distribution and enterprise rails, not a new capability tier.
On Bedrock the model ID is xai.grok-4.3. It ships with a 1-million-token context window and a maximum output of 30,000 tokens. Input modalities are text and image; audio, speech, and video are not supported, and output is text only. Reasoning is always on and cannot be fully switched off. It is configurable through a reasoning effort parameter with four levels (none, low, medium, high), where none suppresses reasoning tokens from the output rather than stopping the internal process.
Grok 4.3 on Bedrock
Text + image input, text output. Always-on, configurable reasoning effort (none / low / medium / high). Service tiers: Standard, Priority, and Flex — Reserved is not supported.
In-Region only at launch
Oregon, N. Virginia, and Ohio at launch. Geo Cross-Region and Global Cross-Region inference are not supported, so a multi-region failover plan must account for the limited footprint.
02 — Pricing
The cheapest US-lab reasoning model on Bedrock.
The pricing is the reason most teams will look at this launch at all. On-demand Bedrock rates are $1.25 per million input tokens and $2.50 per million output tokens, with cached input at $0.20 per million. This is consistent with the rate on the direct xAI API. Against the other US-lab frontier options on Bedrock, that is the lowest sticker price on both sides of the bill — though, as always with Bedrock, verify the current numbers on the AWS pricing page before you budget against them.
Relative to its own predecessor, Grok 4.20, this is a large cut. Independent benchmark coverage puts it at roughly a 40% reduction in input cost and a 60% reduction in output cost versus the prior generation. (Note the version numbering: xAI went from 4.1 to 4.20 — there is no Grok 4.2.)
[Figure: Grok 4.3 vs Grok 4.20, per-million-token pricing; bars relative to prior-gen Grok 4.20. Source: AWS Bedrock model card; reduction figures via Artificial Analysis.]
One cost factor offsets part of the per-token saving: higher-reasoning Grok 4.3 reportedly emits roughly 44% more output tokens than Grok 4.20 on comparable tasks. At high request volumes, more output tokens at a lower per-token rate can still net out cheaper, but the saving is smaller than the headline cut suggests, and it depends on your workload mix. Run the math on your own traffic, not on the sticker price.
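To make that offset concrete, here is a minimal sketch that combines the two reported figures: a roughly 60% cut in the output rate and roughly 44% more output tokens. Both inputs are the estimates quoted above, not measurements on your traffic, and the function name is ours.

```python
def relative_output_spend(price_cut: float, token_growth: float) -> float:
    """Return new output spend as a fraction of the prior generation's spend.

    price_cut:    fractional reduction in the per-token output rate (0.60 = 60% cheaper)
    token_growth: fractional increase in output tokens emitted (0.44 = 44% more)
    """
    return (1.0 - price_cut) * (1.0 + token_growth)


# Reported figures from the text; swap in numbers measured on your own workload.
ratio = relative_output_spend(price_cut=0.60, token_growth=0.44)
print(f"Output spend vs Grok 4.20: {ratio:.3f}x")
print(f"Effective output saving:   {1.0 - ratio:.1%}")
```

On those inputs the output side of the bill lands at roughly 0.58x the prior generation, so the effective saving is closer to 42% than to the headline 60%.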
03 — The Context Cliff
The 1M window doubles in price above 200K.
This is the single most important caveat in the whole launch, and it is buried. The 1-million-token context window is genuinely available, but pricing is not flat across it. Requests that exceed 200,000 total tokens are billed at a higher context tier, where the per-token rate doubles. For the long-document workloads that a million-token window is supposed to enable — contract sets, case-law corpora, full financial filings — the effective cost can be meaningfully above the sticker rate.
Treat the $1.25 / $2.50 numbers as the price for everything under 200K tokens. The moment a request crosses that line, model your costs at the higher tier. A long-document RAG pipeline that routinely assembles 400K–800K-token prompts is not running at the headline rate; it is running at the doubled one. The marketing claim and the invoice live in two different pricing regimes.
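A rough estimator for the two pricing regimes might look like the sketch below. It makes two assumptions the text does not settle: that the 200K threshold is checked against input plus output tokens, and that the doubled rate applies to the whole request rather than only to the tokens past the threshold. It also ignores cached-input pricing. Check the AWS pricing page before relying on either assumption.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
CONTEXT_TIER_THRESHOLD = 200_000  # tokens; tier boundary quoted in the text
LONG_CONTEXT_MULTIPLIER = 2.0     # "doubles", per the text


@dataclass(frozen=True)
class Rates:
    """On-demand USD rates per million tokens (headline figures from the text)."""
    input_per_m: float = 1.25
    output_per_m: float = 2.50


def request_cost(input_tokens: int, output_tokens: int, rates: Rates = Rates()) -> float:
    """Estimate the USD cost of one request.

    ASSUMPTION: the threshold applies to input + output tokens, and the
    multiplier applies to the entire request once the threshold is crossed.
    """
    total_tokens = input_tokens + output_tokens
    multiplier = LONG_CONTEXT_MULTIPLIER if total_tokens > CONTEXT_TIER_THRESHOLD else 1.0
    base = (
        input_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    ) / TOKENS_PER_MILLION
    return base * multiplier


short_prompt = request_cost(input_tokens=150_000, output_tokens=4_000)
long_prompt = request_cost(input_tokens=600_000, output_tokens=4_000)
print(f"150K-token prompt: ${short_prompt:.4f}")
print(f"600K-token prompt: ${long_prompt:.4f}")
```

Under these assumptions the 600K-token prompt costs about $1.52 per request, against about $0.76 if the headline rate applied across the whole window.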
Context window
The full million-token window is real and available on Bedrock — useful for whole-codebase analysis, multi-document reasoning, and long financial or legal corpora.
Where the rate doubles
Requests above 200,000 total tokens are billed at the higher context tier. Long-document prompts that routinely exceed this run at roughly double the headline per-token cost.
Max output tokens
Output is capped at 30,000 tokens per request. Reasoning-heavy responses with always-on reasoning can consume meaningful output budget, so size jobs accordingly.
"Grok 4.3 is as smart as Sonnet 4.6 and 5x cheaper and faster." (Bindu Reddy, CEO of Abacus AI, on X, May 1, 2026; market sentiment, not an independent benchmark)
04 — Integration Gotcha
Mantle, not bedrock-runtime.
Here is the practical surprise that breaks copy-paste integrations. Grok 4.3 on Bedrock does not run on the standard bedrock-runtime endpoint that Claude, Amazon Titan, and other Bedrock models use. It runs on Mantle, a new inference engine inside Bedrock built for price-performance, reached through the bedrock-mantle endpoint. It does not support the Converse API or InvokeModel.
Instead, Mantle is OpenAI-compatible: it exposes an openai/v1 path, so developers already using OpenAI SDKs can point them at the Mantle endpoint and migrate with relatively small code changes. The endpoint format is https://bedrock-mantle.{region}.api.aws/openai/v1. The catch for teams standardized on Bedrock's own SDK is real: existing Converse or InvokeModel code will not call Grok unchanged.
Two further migration details matter. The compatibility is on the openai/v1/responses path rather than the classic Chat Completions path, so describe it as OpenAI-compatible, not fully interchangeable. And the Bedrock defaults differ from the OpenAI API defaults: temperature defaults to 0.7 (not 1), top_p to 0.95 (not 1), and max_completion_tokens to 131,072. Set these explicitly when porting OpenAI SDK code, or your outputs will drift from what the same code produced against OpenAI.
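As an illustration of what the separate path involves, the sketch below builds (but does not send) a request against the responses route using only the Python standard library. The endpoint format and model ID come from the text. Everything else is an assumption to verify against the Bedrock documentation: the Bearer-token authorization header, the `reasoning.effort` field (borrowed from the OpenAI Responses API shape), and the function names, which are ours. The output-token cap is left out because the exact parameter name on this path is not confirmed here.

```python
import json
import urllib.request

MODEL_ID = "xai.grok-4.3"
REASONING_EFFORTS = {"none", "low", "medium", "high"}  # four levels named in the text


def mantle_base_url(region: str) -> str:
    """Endpoint format quoted in the text."""
    return f"https://bedrock-mantle.{region}.api.aws/openai/v1"


def build_responses_request(
    region: str,
    api_key: str,
    prompt: str,
    effort: str,
    temperature: float,
    top_p: float,
) -> urllib.request.Request:
    """Build a POST to the responses path without sending it.

    Sampling parameters are required arguments on purpose, so the Bedrock
    defaults (0.7 / 0.95) never apply silently.
    ASSUMPTIONS: Bearer-token auth and the `reasoning.effort` payload field.
    """
    if effort not in REASONING_EFFORTS:
        raise ValueError(f"effort must be one of {sorted(REASONING_EFFORTS)}")
    payload = {
        "model": MODEL_ID,
        "input": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "reasoning": {"effort": effort},
    }
    return urllib.request.Request(
        url=f"{mantle_base_url(region)}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# us-west-2 is the Oregon region. temperature/top_p of 1.0 reproduce the
# OpenAI API defaults the text contrasts with Bedrock's.
request = build_responses_request(
    region="us-west-2",
    api_key="YOUR_BEDROCK_API_KEY",  # placeholder
    prompt="Summarize the indemnification clause in two sentences.",
    effort="low",
    temperature=1.0,
    top_p=1.0,
)
print(request.get_method(), request.full_url)
```

Teams already on the OpenAI Python SDK would presumably pass the same base URL through the client's `base_url` argument instead of building requests by hand; the point of the sketch is that none of this goes through Converse or InvokeModel.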
If you want a reference point for what good agentic tooling around Grok looks like outside Bedrock, xAI's own Grok CLI and parallel agent tooling is a useful companion read on how the model is meant to be driven for tool-calling and multi-agent work.
05 — Benchmarks
Strong on agents, an honest accuracy caveat.
Grok 4.3 benchmarks well on the workloads AWS is targeting — customer support, legal research, financial document analysis — and it makes a clear generational jump on agentic tasks. But the accuracy picture is mixed in a way that matters for regulated sectors, and it is worth stating plainly rather than cherry-picking.
On the Artificial Analysis Intelligence Index, the headline depends on reasoning effort: 53 at high effort and roughly 38 at low effort — never a single bare number, because the gap between the two settings is the difference between competitive and middling. At high effort the 53 places it above the score for the previous Grok 4.20 release; cite the effort level whenever you quote it.
[Figure: Grok 4.3 by enterprise domain, selected benchmarks; agentic bars indexed for display. Source: Artificial Analysis; Vals AI scores via VentureBeat (vendor-adjacent).]
The agentic story is the genuinely strong one. The GDPval-AA agentic Elo of 1,500 is a 321-point jump over Grok 4.20's 1,179, and it surpasses several named competitors on that benchmark, though it still trails GPT-5.5 (xhigh) by a wide margin, with an expected win rate well under a coin flip. So the right framing is not best-in-class; it is markedly better than its predecessor and competitive on a price-adjusted basis. For a full top-tier view of where it sits, our frontier model comparison of Claude Opus 4.8 and GPT-5.5 sets the ceiling Grok 4.3 is measured against.
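For readers who want to translate a rating gap into an expected win rate, the sketch below uses the standard logistic Elo formula with a 400-point scale. That GDPval-AA uses exactly this convention is an assumption on our part. Only the two ratings quoted above are plugged in; the GPT-5.5 rating is not given here, so that comparison is left uncomputed.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (win probability, draws counted as half) of A against B.

    Standard Elo logistic curve; the 400-point scale is the chess convention
    and is ASSUMED to match the benchmark's rating system.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


grok_4_3 = 1500   # GDPval-AA rating quoted in the text
grok_4_20 = 1179  # prior-generation rating quoted in the text

p = elo_expected_score(grok_4_3, grok_4_20)
print(f"Expected win rate, Grok 4.3 vs Grok 4.20: {p:.2f}")
```

On this formula a 321-point gap corresponds to an expected win rate of roughly 0.86 for the newer model. A "well under a coin flip" result against another model simply means the rating gap runs in the opposite direction.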
"The energy drink of frontier models: it'll keep you up, but you won't enjoy the experience and you'll regret it in the morning." (Corey Quinn, cloud cost analyst, The Register, May 29, 2026)
06 — Bedrock Lineup
Where Grok 4.3 sits in the Bedrock catalog.
The table below maps Grok 4.3 against the other frontier and efficient options a Bedrock team is likely to weigh, on the two axes that drive a model-selection decision: price and the endpoint you integrate against. The endpoint-type column is the one no published comparison includes, and it is the one that determines how much integration work a switch actually costs. Bedrock pricing moves often — verify every figure on the AWS pricing page before you commit.
| Model | Provider | Input $/M | Output $/M | Context | Endpoint |
|---|---|---|---|---|---|
| Grok 4.3 | xAI | $1.25 | $2.50 | 1M | Mantle |
| Amazon Nova Pro | Amazon | $0.80 | $3.20 | 300K | Runtime |
| DeepSeek V3.2 | DeepSeek | $0.62 | $1.85 | 128K | Runtime |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | Runtime |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 200K | Runtime |
| GPT-5.4 (via Bedrock) | OpenAI | ~$2.50 | ~$15.00 | 128K | Runtime |
Read across one row and the positioning is clear. Grok 4.3 is the only frontier-class reasoning model in this set under $1.50 input, and the only one with a million-token window — but it is also the only one on a non-standard endpoint. Amazon Nova Pro and DeepSeek V3.2 are cheaper still on input, but they sit in a different tier on reasoning. The decision is rarely price alone; it is price weighed against integration cost and the accuracy trade-off above. With AWS now hosting all three independent labs, you can also read this alongside OpenAI's GPT-5.5 and its 1M-context options when you shortlist.
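One way to turn the table into a shortlist is to price a representative month of traffic against each row. The sketch below does that for a hypothetical workload of 500M input and 100M output tokens; the workload is invented for illustration, the GPT-5.4 rates are the approximate figures from the table, and the calculation deliberately ignores the 200K context tier, caching, and differences in how many output tokens each model emits.

```python
# (input $/M, output $/M) copied from the table above.
PRICES_PER_MILLION = {
    "Grok 4.3": (1.25, 2.50),
    "Amazon Nova Pro": (0.80, 3.20),
    "DeepSeek V3.2": (0.62, 1.85),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.4 (via Bedrock)": (2.50, 15.00),  # approximate in the table
}


def workload_cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    """USD cost at flat per-million rates (no tiers, no caching)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: workload_cost(input_tokens, output_tokens, in_rate, out_rate)
        for name, (in_rate, out_rate) in PRICES_PER_MILLION.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical monthly volume: 500M input tokens, 100M output tokens.
ranking = rank_by_cost(input_tokens=500_000_000, output_tokens=100_000_000)
for name, cost in ranking:
    print(f"{name:<24} ${cost:>9,.2f}")
```

On that mix Grok 4.3 comes out at $875 for the month, behind DeepSeek V3.2 and Amazon Nova Pro and well ahead of the Claude and GPT rows, which matches the positioning described above. The ordering shifts with the input-to-output ratio, so rerun it with your own volumes.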
07 — The Governance Gap
Compliance credentials are not the same as stability.
This is the analysis that separates a model-selection decision from a vendor-risk decision. Grok 4.3 has the certifications a regulated team looks for — xAI maintains SOC 2 Type II, HIPAA eligibility, and GDPR compliance for production workloads, and on Bedrock the infrastructure layer inherits AWS's security posture. On paper, a finance or healthcare team can deploy it today on rails it already trusts.
The certifications, though, describe the model and the platform — not the organization behind the model, which is in unusual flux. xAI merged into SpaceX in February 2026 in an all-stock transaction, and by May 2026 the plan was for xAI to cease to exist as a separate company, with Grok and X folded into a SpaceX AI division. Nine of the eleven original co-founders have departed (some accounts say ten; the most-cited figure is 9 of 11). And in June 2026 a former engineer filed a whistleblower retaliation lawsuit alleging he was fired for raising safety concerns about Grok — a claim naming both xAI and SpaceX, and unproven at the time of writing.
Some industry analysts argue the Bedrock listing is less about enterprise pull than about compute deals. The pattern they point to: AWS has tended to secure large commitments to its custom Trainium silicon around the labs it adds to Bedrock, and xAI trains Grok on a very large NVIDIA GPU cluster at its Memphis Colossus site — a natural migration target for Amazon's chips. On that reading, the launch is a chip-sales story wearing a model-availability headline. Independent signals are consistent with a slow enterprise start: of more than 400 documented federal AI deployments naming a vendor, only three involve xAI or Grok. Treat that as a caution flag on regulated-sector traction, not a verdict on the model.
"Bedrock becomes little more than a sales funnel with infuriatingly bad documentation." (Corey Quinn, cloud cost analyst, The Register, May 29, 2026)
08 — Decision Matrix
Should your team deploy it?
The answer is workload-specific, not headline-specific. Grok 4.3 is a strong fit for some classes of work and a poor one for others, and the right move is to sort your pipelines into the buckets below before you touch the Mantle endpoint.
High-volume agentic workloads
At $1.25 / $2.50 with strong agentic and tool-calling scores, Grok 4.3 is the price-performance pick on Bedrock for support automation and tool-use pipelines — provided you account for the ~44% extra output tokens.
Million-token workloads
The 1M window is real, but cost doubles above 200K tokens. Model the doubled tier honestly; if your prompts routinely exceed 200K, the effective price may erase the headline advantage versus a 200K-window model.
Finance, healthcare, legal
Certifications are in place, but the non-hallucination regression and vendor-in-flux risk warrant extra diligence. For high-liability outputs, keep a frontier accuracy model in the loop and pilot before committing.
Converse / InvokeModel shops
Grok runs on Mantle, not bedrock-runtime, so it is a separate OpenAI-compatible integration path, not a config flip. Budget the engineering before promising it, and set the non-default temperature / top_p explicitly.
For most teams the pragmatic sequence is the same: shortlist Grok 4.3 on price, prototype against the Mantle endpoint on your own prompts, measure real token spend with the 200K cliff and the extra output tokens factored in, and run your highest-liability prompts through an accuracy check before you trust it in production. Deciding between Grok and closed frontier for specific pipelines is exactly the kind of comparative evaluation our AI and digital transformation engagements start with — and the kind of multi-vendor routing we help teams stand up so no single vendor's instability becomes a single point of failure.
09 — Conclusion
A genuine bargain with real fine print.
The cheapest US-lab reasoning model on Bedrock — read the fine print before you budget.
Grok 4.3 on Amazon Bedrock is a real event, not just a catalog line. xAI is now the third independent lab on the platform, and at $1.25 / $2.50 per million tokens it is the cheapest US-lab frontier reasoning model there. For high-volume agentic and tool-use workloads, that price-performance profile is genuinely compelling — and the agentic benchmark jump over the prior generation is the strongest part of the story.
The fine print is where teams will win or lose money. The 1M window doubles in price above 200K tokens, so long-document workloads do not run at the headline rate. The model lives on the Mantle endpoint, so it is a separate integration for anyone standardized on Bedrock's own SDK. And the accuracy story is mixed: more correct answers, but a measured regression on hallucination that matters most in exactly the regulated verticals AWS is targeting.
The broader signal is that frontier capability is no longer the scarce thing — distribution, price discipline, and organizational stability are. Grok 4.3 brings the first two convincingly. The third, with a vendor mid-restructuring, is the open question. The right response is not a vendor decision off a headline; it is your own eval on the prompts you care about, with the cliff, the endpoint, and the accuracy trade-off all priced in.