Mistral OCR 4 is a document-AI model, released June 23, 2026, that returns a structured representation of any enterprise document — bounding boxes, block types, and confidence scores — rather than the flat wall of text earlier OCR generations produced. For teams building document-automation pipelines, that shift is the headline: extraction stops being a parsing problem and becomes a structured-data problem.
It is Mistral’s fourth OCR generation in roughly 15 months, and it lands in a crowded field — Google Document AI, Amazon Textract, Azure Document Intelligence, ABBYY, and a wave of open-weight models. What sets OCR 4 apart is less any single benchmark number and more the combination of aggressive pricing ($4 per 1,000 pages, $2 in batch), a single-container self-hosting option, and a structured output that removes integration layers teams used to build by hand.
This guide covers what actually shipped, why the structured output changes automation economics, an honest read of the benchmark claims (several of which are vendor-stated), a recomputed cost comparison against the hyperscalers, and how to put OCR 4 to work without overpromising on the numbers.
- 01 Structured representation, not just text. OCR 4 returns bounding boxes, block types (title, table, equation, signature), and page- and word-level confidence scores as first-class model outputs: the raw material for traceable, auto-approving pipelines.
- 02 Pricing is the durable advantage. $4 per 1,000 pages standard, $2 in batch, $5 for schema-driven Document AI. At batch pricing it undercuts Azure Document Intelligence's custom tier by up to 15x (about 7.5x at the standard rate).
- 03 Read the benchmarks with care. The 85.20 OlmOCRBench, 93.07 OmniDocBench, and ~72% win-rate figures are vendor-stated. On the public OlmOCRBench leaderboard (last updated May 21, 2026) OCR 4 would rank roughly third, not first.
- 04 Self-hosted, but not open-source. A single-container deployment keeps sensitive documents inside your own jurisdiction, which is useful as the EU AI Act's high-risk obligations approach on August 2, 2026. Self-hosting a commercial model is not the same as open weights.
- 05 Mistral is buying the document layer. Targeting €1B revenue in 2026 (up from ~€200M) and reportedly in early talks to raise ~€3B at roughly €20B, Mistral is pricing OCR 4 to win the enterprise ingestion layer for RAG and search.
01 — What Shipped
A fourth OCR generation that returns structure, not text.
Mistral describes the leap plainly. Where earlier generations focused on converting a page into clean text and tables, OCR 4 returns a structured representation of the document. In practice that means every extracted element arrives with its location on the page, a type label, and a confidence score — three things downstream automation historically had to reconstruct or guess at.
The model accepts PDF, DOC, PPT, and OpenDocument files directly, so the full spread of back-office documents flows in without a pre-conversion step. Mistral reports coverage across 170 languages in 10 language groups, with gains on rare and low-resource languages where competing systems tend to degrade — a meaningful detail for multinational document pipelines.
Fourth OCR generation
Mistral's fourth OCR model in roughly 15 months, released June 23, 2026. Available through the Mistral API and Studio, Amazon SageMaker, and Microsoft Foundry at launch, with Snowflake Parse Document integration announced as coming soon.
PDF · DOC · PPT · ODF
Accepts PDF, DOC, PPT, and OpenDocument files directly — the full spread of back-office documents, with no separate pre-conversion step before extraction.
170 languages, 10 language groups
Reported coverage across 170 languages, with gains on rare and low-resource languages where competing OCR systems tend to lose accuracy first.
02 — Structured Output
Why bounding boxes change the pipeline.
Bounding boxes were OCR 4’s most-requested feature, and the reason is structural. Without location data, a downstream RAG or compliance pipeline cannot trace an extracted fact back to the page it came from — the traceability gap that makes audit-ready extraction genuinely hard. With coordinates attached to every element, an extracted number can point back to the exact cell it was read from.
Block classification does similar work one layer up. OCR 4 assigns every element a type — title, table, equation, signature, and others — as a first-class model output rather than a separate post-processing stage. That removes an integration layer enterprise teams used to build in-house just to tell a heading from a table. Confidence scores complete the set: they operate at both page and word level, which is what makes confidence-gated automation possible.
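To make the traceability point concrete, here is a minimal sketch of how location, type, and confidence might be carried together and used to point an extracted value back at its source region. The `Block` class, its field names, and the coordinate convention are illustrative assumptions, not Mistral's actual response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One extracted element. Field names are hypothetical, not Mistral's schema."""
    page: int          # 1-based page number
    bbox: tuple        # (x0, y0, x1, y1) in page coordinates (assumed convention)
    block_type: str    # e.g. "title", "table", "equation", "signature"
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def find_source(blocks, value):
    """Return every block whose text contains the extracted value."""
    return [b for b in blocks if value in b.text]


def cite(block):
    """Format a source pointer a reviewer or auditor can follow back to the page."""
    x0, y0, x1, y1 = block.bbox
    return f"page {block.page}, {block.block_type} at ({x0:g}, {y0:g})-({x1:g}, {y1:g})"


# Made-up sample data for illustration only.
blocks = [
    Block(1, (72, 60, 540, 90), "title", "Invoice 2291", 0.99),
    Block(1, (72, 400, 540, 520), "table", "Subtotal 1,180.00 Total 1,250.00", 0.93),
]
hits = find_source(blocks, "1,250.00")
print([cite(b) for b in hits])
```

A substring match is the simplest possible lookup; a real pipeline would more likely keep the block reference alongside each field at extraction time rather than search for it afterwards.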
Bounding boxes
The most-requested OCR 4 feature. Every extracted element carries its position on the page, so a fact can be traced back to its source region — the prerequisite for audit-ready RAG and compliance workflows.
Block classification
Each element is typed as a first-class model output, not a bolted-on post-processing step. That removes an integration layer teams previously built by hand to distinguish headings, tables, and signatures.
Confidence scores
Inline confidence at both page and word granularity. Auto-approve high-confidence regions and route only low-confidence ones to a human reviewer — no need to read every page.
There is a second, related product worth separating clearly. OCR 4 is the extraction model. Document AI is a Studio product, priced at $5 per 1,000 pages, that wraps OCR 4 with a second-pass mistral-small-2603 call to reshape the output into custom JSON schemas. If your automation needs fields in a fixed shape rather than a generic structured document, Document AI is the mode to evaluate — and its dependence on the smaller model is one reason the Mistral Small model family matters to the document stack.
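Whichever product produces the fixed-shape JSON, it is worth checking that shape before it reaches downstream systems. The sketch below assumes a hypothetical three-field invoice schema; `InvoiceRecord` and its fields are placeholders and do not reflect Document AI's real output format.

```python
from dataclasses import dataclass, fields


@dataclass
class InvoiceRecord:
    """Hypothetical target schema; field names are placeholders."""
    invoice_number: str
    total: float
    currency: str


def validate(payload: dict) -> InvoiceRecord:
    """Reject payloads with missing, extra, or wrongly typed fields."""
    expected = {f.name: f.type for f in fields(InvoiceRecord)}
    missing = sorted(expected.keys() - payload.keys())
    extra = sorted(payload.keys() - expected.keys())
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    for name, typ in expected.items():
        if not isinstance(payload[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
    return InvoiceRecord(**payload)


record = validate({"invoice_number": "2291", "total": 1250.0, "currency": "EUR"})
print(record)
```

A strict check like this turns a silently malformed record into an explicit failure that can be routed to review.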
03 — The Cost Case
The number that actually moves a decision.
Strip away the benchmarks and the durable advantage is price. OCR 4 is $4 per 1,000 pages on the standard API and $2 per 1,000 in batch mode — a 50% discount for non-interactive workloads. Schema-driven Document AI is $5 per 1,000. The table below annualizes published list pricing across three volumes; every Mistral, Azure, and Google cell is the per-1,000-page rate multiplied out by hand.
| Offering | Per 1,000 pages | 10K pages / yr | 100K pages / yr | 1M pages / yr |
|---|---|---|---|---|
| Mistral OCR 4 | ||||
| OCR 4 — Batch API | $2.00 | $20 | $200 | $2,000 |
| OCR 4 — Standard API | $4.00 | $40 | $400 | $4,000 |
| Document AI (schema JSON) | $5.00 | $50 | $500 | $5,000 |
| Hyperscaler document AI | ||||
| Azure Doc Intelligence — Read | $1.50 | $15 | $150 | $1,500 |
| Azure Doc Intelligence — Custom | $30.00 | $300 | $3,000 | $30,000 |
| Google Form Parser | ~$30.00 | ~$300 | ~$3,000 | ~$30,000 |
| Open-weight, self-hosted | ||||
| Baidu Unlimited-OCR | No per-page fee* | Infra only* | Infra only* | Infra only* |
The 100K-page row is the one to sit with. A firm processing 100,000 pages a year pays $200 in OCR 4 batch mode versus $3,000 on Azure Document Intelligence’s custom extraction tier, a 15x gap that holds at every volume because both prices scale linearly with page count. At the standard $4 API rate the gap is 7.5x. Azure’s Read tier is cheaper still at $1.50 per 1,000, below OCR 4’s $2 batch rate, but it returns text without the bounding boxes and block types that make OCR 4’s output automation-ready.
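The table's arithmetic is simple enough to reproduce, which is useful if you want to plug in your own volume. This sketch uses only the per-1,000-page list rates quoted above; the dictionary keys are shorthand labels chosen here, and the Google figure is approximate in the source table.

```python
# Per-1,000-page list rates as quoted in the table above (USD).
RATES_PER_1000 = {
    "OCR 4 batch": 2.00,
    "OCR 4 standard": 4.00,
    "Document AI": 5.00,
    "Azure Read": 1.50,
    "Azure Custom": 30.00,
    "Google Form Parser": 30.00,  # approximate in the source table
}


def annual_cost(rate_per_1000: float, pages_per_year: int) -> float:
    """Annual spend for a given volume at a flat per-1,000-page rate."""
    return rate_per_1000 * pages_per_year / 1000


for name, rate in RATES_PER_1000.items():
    row = [annual_cost(rate, pages) for pages in (10_000, 100_000, 1_000_000)]
    print(f"{name:20s}", " ".join(f"${cost:>9,.0f}" for cost in row))

# Price ratios against Azure's custom tier. They do not depend on volume
# because every rate here is flat.
gap_batch = RATES_PER_1000["Azure Custom"] / RATES_PER_1000["OCR 4 batch"]
gap_standard = RATES_PER_1000["Azure Custom"] / RATES_PER_1000["OCR 4 standard"]
print(gap_batch, gap_standard)
```

Flat rates are an assumption carried over from the table; volume discounts or free tiers on any provider would change the ratios.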
A caveat on the bottom row. Baidu’s open-weight Unlimited-OCR has no per-page license fee, but “free” is not zero: you pay for GPU compute, deployment, and operations. A precise per-page figure depends on your hardware, utilization, and throughput, so the table marks those cells as an infrastructure estimate rather than a quoted rate. The honest comparison is a managed per-page price against an amortized infrastructure cost you have to model for your own load.
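One way to model that amortized figure is sketched below. It assumes a single always-on GPU instance that can keep up with your volume, and every input (hourly GPU price, annual operations cost) is a placeholder to replace with your own numbers, not a quoted rate from any provider.

```python
HOURS_PER_YEAR = 8760  # assumes one instance running all year


def self_host_rate_per_1000(pages_per_year: int, gpu_usd_per_hour: float,
                            ops_usd_per_year: float) -> float:
    """Amortized infrastructure cost per 1,000 pages for one always-on instance."""
    annual_usd = gpu_usd_per_hour * HOURS_PER_YEAR + ops_usd_per_year
    return annual_usd / pages_per_year * 1000


def break_even_pages(managed_rate_per_1000: float, gpu_usd_per_hour: float,
                     ops_usd_per_year: float) -> float:
    """Annual volume above which self-hosting costs less than the managed rate."""
    annual_usd = gpu_usd_per_hour * HOURS_PER_YEAR + ops_usd_per_year
    return annual_usd / managed_rate_per_1000 * 1000


# Placeholder inputs: $1.00/hour of GPU time and $5,000/year of operations effort.
print(self_host_rate_per_1000(1_000_000, 1.00, 5_000))
print(break_even_pages(2.00, 1.00, 5_000))
```

The model ignores throughput limits, so it understates cost whenever one instance cannot process the full volume; scaling to multiple instances or using on-demand capacity would need a different formula.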
04 — Benchmarks
The scores, read honestly.
Mistral reports a top-line OlmOCRBench score of 85.20 and calls it the “top overall score.” That claim deserves a caveat. The public OlmOCRBench leaderboard — last updated May 21, 2026, before OCR 4’s release — places Infinity-Parser2-Pro at 87.6 and Chandra-2 at 85.9 above it, and VentureBeat independently notes OCR 4 would rank roughly third on the current public board. OCR 4’s 85.20 is a vendor-submitted figure that does not yet appear on the independently reproduced leaderboard.
Figure: OlmOCRBench, vendor-stated OCR 4 vs the independent leaderboard. Source: Mistral (OCR 4, vendor-stated); OlmOCRBench public leaderboard via CodeSOTA, last updated May 21, 2026.
The rest of the benchmark story is similarly vendor-framed, and worth keeping in that frame. OlmOCRBench itself, built by the Allen Institute for AI, runs 7,010 unit tests across 1,403 PDFs in seven categories, with per-score uncertainty of roughly a point either way, so small gaps between models are inside the noise. The figures below are Mistral’s own; treat them as directional.
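A quick way to see what "inside the noise" means for the three scores cited above is to ask whether their uncertainty bands overlap. The sketch assumes a symmetric band of one point per score, taken from the "roughly a point either way" description; interval overlap is a crude rule of thumb, not a formal statistical test.

```python
# Scores as cited in the text; the OCR 4 figure is vendor-stated.
SCORES = {
    "Infinity-Parser2-Pro": 87.6,
    "Chandra-2": 85.9,
    "Mistral OCR 4 (vendor-stated)": 85.20,
}
UNCERTAINTY = 1.0  # assumed symmetric band, "roughly a point either way"


def distinguishable(a: float, b: float, u: float = UNCERTAINTY) -> bool:
    """True when the bands [a-u, a+u] and [b-u, b+u] do not overlap."""
    return abs(a - b) > 2 * u


ranking = sorted(SCORES, key=SCORES.get, reverse=True)
ocr4 = SCORES["Mistral OCR 4 (vendor-stated)"]
for name in ranking[:2]:
    print(name, "clearly ahead of OCR 4:", distinguishable(SCORES[name], ocr4))
```

Under this rule the 0.7-point gap to Chandra-2 is not a meaningful difference, while the 2.4-point gap to Infinity-Parser2-Pro is.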
93.07 OmniDocBench (vendor-stated)
Mistral's reported OmniDocBench score. For context, PaddleOCR-VL-1.6 self-reports 96.33, though that result has not been independently reproduced on the public leaderboard either.
~72% average win rate
Average head-to-head win rate against leading competitors across 600+ real-world documents in 12+ languages, judged by independent annotators Mistral commissioned. The annotators were independent; the study was vendor-run.
Crawl Multilingual
Mistral's internal multilingual evaluation, reported as leading across all eight language groups. This is an internal benchmark and cannot be independently verified.
What does the 85.20 actually measure? The table below maps OlmOCRBench’s seven categories to the back-office failure each one predicts — the difference between an abstract leaderboard and a decision about whether to trust extraction on your own documents.
| Category | What it checks | What failure looks like in your workflow |
|---|---|---|
| arXiv Math | Equation fidelity in LaTeX | A formula in a research report or actuarial model transcribes wrong, silently changing a result. |
| Tables | Row/column structure recovery | An invoice or financial statement loses its grid, so totals land in the wrong fields downstream. |
| Headers / Footers | Boilerplate vs body separation | Page numbers, disclaimers, or letterhead bleed into the extracted body text of a contract. |
| Multi-Column | Reading order across columns | A two-column policy or terms document gets interleaved, scrambling clauses out of sequence. |
| Old Scans | Degraded-image legibility | An archived deed, claim file, or shipping record returns garbled text the pipeline cannot trust. |
| Old Scans Math | Formulas on degraded scans | Both failure modes stack — a faint historical engineering or finance document loses its numbers. |
| Long / Tiny Text | Dense or small-font passages | Fine-print footnotes and dense appendices — exactly where the binding terms hide — drop out. |
05 — Deployment
API, marketplace, or your own jurisdiction.
OCR 4 can be consumed three ways, and the third is the strategic one. Beyond the managed API and the cloud marketplaces, Mistral supports a single-container self-hosted deployment, which lets a regulated enterprise process sensitive documents entirely inside its own infrastructure, with no routing to an external U.S.-jurisdiction cloud API. For organizations weighing the broader tradeoffs, our self-hosted deployment decision guide covers the infrastructure side in depth.
Mistral API / Studio
Lowest-friction path. Per-page billing, batch discount for non-interactive jobs, and Document AI schema extraction in the same place. Best when data residency is not a hard constraint and you want to move fast.
SageMaker · Microsoft Foundry
Run OCR 4 inside an existing AWS or Azure footprint, billed through accounts you already govern. Snowflake Parse Document integration is announced as coming soon. Best when you have committed cloud spend and procurement rails.
Single-container self-host
Documents never leave your infrastructure — the answer to data-residency and sovereignty requirements as the EU AI Act's high-risk obligations approach on August 2, 2026. Self-hosting a commercial model is not the same as open weights.
"At some point, you need to be able to turn it off or turn it on, and you don't want to leave it to another country."— Arthur Mensch, CEO, Mistral AI, on AI sovereignty (London Tech Week, June 2025)
06 — The Field
Who OCR 4 is actually competing with.
OCR 4 arrives against established hyperscaler document AI and a fast wave of open-weight models. The most direct counterpoint shipped one day earlier: Baidu’s Unlimited-OCR, a 3-billion-parameter, MIT-licensed model that parses entire PDFs in a single forward pass and gathered roughly 1,800 GitHub stars in its first 24 hours. It is free and self-hosted — and it has no managed API and no enterprise SLA, which is exactly the gap OCR 4’s paid tier fills. Mistral’s own open-weight model lineage is part of why a self-hosting story is even credible from this vendor.
Azure Document Intelligence
The incumbent comparison. The Read tier at $1.50 per 1,000 pages slightly undercuts OCR 4's $2 batch rate but returns text without bounding boxes; the custom extraction tier runs $30 per 1,000, the source of the 15x gap at the top of this post.
Google & Amazon
Google's Form Parser runs ~$30 per 1,000 pages; Amazon Textract is the established AWS option. Deep ecosystem integration, but priced well above OCR 4's per-page economics for structured extraction.
Baidu Unlimited-OCR
Free, MIT-licensed, self-hosted, 3B params, single-pass PDF parsing. No managed API and no enterprise SLA — you own the deployment and the operations. The DIY counterpoint to a paid managed model.
ABBYY · Textract incumbents
Mature intelligent-document-processing suites with template libraries and human-in-the-loop tooling built in. Strong on entrenched workflows; the question is per-page cost and how much of the new structured output you'd be re-buying.
07 — Market & Momentum
A land-grab for the ingestion layer.
The pricing makes more sense as strategy than as a margin play. The global intelligent document processing market was about $2.30B in 2024 and is projected to reach $12.35B by 2030 at a 33.1% CAGR, with BFSI the largest segment. OCR 4 feeds directly into Mistral’s Search Toolkit as the ingestion layer for RAG and enterprise search — so winning document extraction is really about owning the front door to every downstream AI workflow.
The financial backdrop fits that ambition. Mistral is targeting €1 billion in revenue for 2026, up from roughly €200 million in 2025, and is reportedly in early discussions to raise about €3 billion at a valuation near €20 billion, nearly double the €11.7 billion valuation set by its Series C in September 2025. No round has been announced as of late June. Pricing OCR 4 to undercut the hyperscalers' custom extraction tiers by roughly an order of magnitude is how you buy share in a market growing at 33% a year, and it pairs naturally with Mistral’s broader enterprise AI stack.
IDP market forecast
Projected to reach $12.35B by 2030, up from $2.30B in 2024 at a 33.1% CAGR, per Grand View Research. North America holds 32%+ of the 2024 market and BFSI is the largest end-use segment: the buyers OCR 4's sovereignty story targets.
2026 revenue target
€1B targeted for 2026, up from roughly €200M in 2025: a 5x jump (Le Monde, via VentureBeat). OCR 4 and its document-AI pipeline are central to that trajectory, which is why the per-page price is set to win share.
Reported funding talks
Mistral is reportedly in early discussions to raise ~€3B at roughly €20B, nearly double the €11.7B valuation of its Series C (Sep 2025), when ASML took an 11% stake. No deal has been announced as of June 25.
08 — Putting It to Work
From a structured output to a pipeline that approves itself.
The practical payoff of confidence scores is a pipeline that does not ask a human to read every page. Set a threshold; auto-approve regions above it; route the rest to review. Bounding boxes give the reviewer the exact spot to look, and block types let you apply different rules to tables, signatures, and free text. That is the difference between an OCR tool and a document-automation system — and it is where the customer testimonials, hedged appropriately, point.
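The gating logic described above can be sketched in a few lines. The `Region` class, the threshold values, and the route names are all hypothetical choices made for illustration; the right thresholds depend on your documents and on how the model's confidence scores are calibrated, which you would need to measure yourself.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A scored region of a page. Field names are illustrative, not an API schema."""
    block_type: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    page: int
    bbox: tuple        # (x0, y0, x1, y1), shown to the reviewer


# Hypothetical per-type thresholds: stricter where an error is more costly.
THRESHOLDS = {"table": 0.95, "signature": 0.98}
DEFAULT_THRESHOLD = 0.90


def route(region: Region) -> str:
    """Auto-approve a region at or above its type's threshold, else send to review."""
    threshold = THRESHOLDS.get(region.block_type, DEFAULT_THRESHOLD)
    return "auto_approve" if region.confidence >= threshold else "human_review"


def review_queue(regions):
    """Regions needing review, lowest confidence first."""
    pending = [r for r in regions if route(r) == "human_review"]
    return sorted(pending, key=lambda r: r.confidence)


# Made-up sample regions.
regions = [
    Region("text", 0.92, 1, (72, 100, 540, 300)),
    Region("table", 0.92, 1, (72, 400, 540, 520)),
    Region("signature", 0.71, 2, (300, 650, 540, 700)),
]
for r in review_queue(regions):
    print(r.page, r.block_type, r.confidence, r.bbox)
```

Note that the same 0.92 score passes for free text but not for a table, which is the block-type-specific rule the paragraph describes.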
Two of those testimonials are worth quoting with that hedge in mind. Rogo, a financial-AI firm, reported reaching equivalent accuracy at roughly 8x lower cost and 17x lower latency versus leading agentic document parsers; Anaqua, an IP-management firm, reported OCR 4 is roughly 4x faster per page than its incumbent. Both are customer statements on single, undisclosed datasets — directional evidence, not reproduced benchmarks. The right move is to run OCR 4 against your own documents before you commit a forecast to it.
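A trial on your own documents can start very small: hand-label the fields you care about on a sample, then score the extraction against them. The sketch below uses exact string matching and made-up sample records; the field names are hypothetical, and a real evaluation would probably normalize formatting (whitespace, number formats) before comparing.

```python
def field_accuracy(predicted: list, truth: list) -> float:
    """Share of hand-labeled fields that the extraction reproduced exactly."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same documents")
    total = correct = 0
    for pred, gold in zip(predicted, truth):
        for field, expected in gold.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0


# Made-up labels and extractions for two documents.
truth = [
    {"invoice_number": "2291", "total": "1,250.00"},
    {"invoice_number": "2292", "total": "980.00"},
]
predicted = [
    {"invoice_number": "2291", "total": "1,250.00"},
    {"invoice_number": "2292", "total": "930.00"},  # one misread digit
]
print(field_accuracy(predicted, truth))
```

Running the same scoring against each candidate model on the same sample gives a like-for-like comparison that no vendor benchmark can.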
If you are mapping document automation onto a real back-office process — invoice capture, claims intake, contract review, CRM data entry — the structured output is the input, but the value is in the workflow around it. That scoping work is exactly what our AI digital transformation engagements start with: a confidence-gating design and an honest per-page cost model before any vendor commitment.
Invoice & form capture
Batch mode at $2 per 1,000 pages plus confidence-gated review is the strongest fit. The 100K-page row in the cost table is this use case — $200 a year in extraction versus thousands on a custom hyperscaler tier.
Structured JSON pipelines
When you need fields in a fixed shape, Document AI at $5 per 1,000 reshapes OCR output into custom schemas via a second-pass model — worth the premium only if the schema step earns it.
Sovereignty-bound workloads
Single-container self-hosting keeps documents in your jurisdiction as the EU AI Act's high-risk obligations approach. Model the amortized infrastructure cost against the managed per-page price for your real volume.
No commercial dependency
If open weights are non-negotiable, evaluate Baidu Unlimited-OCR instead and budget for the GPU and ops you'll own. OCR 4 in a container is sovereign, but it is still a commercial license.
09 — Conclusion
A pricing move dressed as a model release.
Document automation just became a cost question, not a capability one.
Mistral OCR 4 is best understood less as a benchmark winner than as a pricing and packaging move. The structured output — bounding boxes, block types, confidence scores — is genuinely useful and removes integration work teams used to do by hand. The per-page price, at $2 in batch, is what makes large-scale digitization economically boring in the best sense: a 100,000-page archive for $200 stops being a budget conversation.
Keep the benchmark claims in their box. The 85.20 OlmOCRBench, 93.07 OmniDocBench, and ~72% win-rate figures are vendor-stated, and on the independent public leaderboard OCR 4 would sit around third, not first. Mistral’s own “directional rather than definitive” framing is the right posture to borrow — and the reason to run the model against your own documents rather than trust the headline.
The forward read is straightforward. With Baidu shipping a free open-weight parser the day before and Mistral pricing a managed model at an order-of-magnitude discount to the hyperscalers, the margin in raw extraction is compressing fast. The value is migrating to the workflow above it — confidence-gating, schema design, and the sovereignty wrapper — and to whoever owns the ingestion layer feeding every downstream RAG and search pipeline. That, not a leaderboard row, is what OCR 4 is really competing for.