The GA4 BigQuery export is the single biggest unlock in Google Analytics 4, and most teams never turn it on. It streams every raw, event-level row out of GA4 into a Google Cloud data warehouse where you can query it with SQL — unsampled, uncapped by the 14-month UI retention limit, and free to enable for standard properties.
What stops people is not access. It is the schema. GA4 data lands in BigQuery as deeply nested, repeated records — arrays of structs that no other Google product makes you reckon with directly. There are four different places traffic source can live, no native session table, and timestamps in microseconds. The gap between flipping the export on and getting trustworthy answers is wider than most setup guides admit.
This reference closes that gap. It covers why the export beats the UI for serious analysis, how to enable it correctly, how to read the nested schema, the four competing traffic-source structures and which one to trust, how to rebuild sessions from raw events, what it actually costs, and the sharded-to-partitioned migration every data team eventually runs. For the dimension-by-dimension mapping back to the GA4 interface, pair this with our GA4 dimensions and metrics reference.
- 01The export is free to enable and unsampled.Every standard GA4 property can export to BigQuery at no platform charge — only Google Cloud usage beyond the free tier (1 TiB query / 10 GiB storage per month) costs anything. You get 100% of collected events with no Explorations-style sampling.
- 02The schema is nested and repeated, not flat.event_params is a REPEATED RECORD — an array of key/value structs with four typed value columns. Pulling any custom parameter means an UNNEST subquery. This is the learning curve that stops most teams.
- 03Four traffic-source structures exist; only some align with the UI.traffic_source (user-scoped), collected_traffic_source (raw event-scoped), session_traffic_source_last_click (session-scoped, attribution applied), and event_params keys all describe acquisition differently. Picking the wrong one diverges from GA4 UI reports.
- 04There is no native session table.GA4 stores ga_session_id inside event_params, not as a top-level field. A session key is built by concatenating user_pseudo_id with the extracted ga_session_id — counting session_start events directly gives wrong answers.
- 05Migrate sharded tables to partitioned to control cost.GA4 exports one events_YYYYMMDD table per day. Consolidating into a single partitioned, clustered table lets partition pruning cut bytes scanned dramatically on date-filtered queries — vendor benchmarks cite up to 80-95% savings.
01 — Why ExportWhat the UI cannot give you.
The GA4 interface is built for predefined reports and exploration. It is not built for arbitrary questions, joins against your own business data, or guaranteed completeness. Three hard limits push serious analysis off the UI and into the warehouse.
Sampling. GA4 Explorations apply data sampling once a dataset crosses roughly ten million events or includes high-cardinality dimensions. BigQuery export gives you 100% of collected events with no sampling, regardless of property size. (Treat that figure as approximate — Google does not publish an official sampling threshold, and it varies by report type and date range.)
Retention. The GA4 UI caps event-level data retention at a maximum of 14 months. Data exported to BigQuery is not subject to that limit — it persists as long as the table exists, giving you indefinite raw history. The catch: the export only captures data collected after you enable it. There is no historical backfill, so turning it on early is the cheapest insurance you can buy.
SQL, not screenshots. Raw event rows mean you can join GA4 data to CRM records, compute custom attribution windows, reconstruct funnels exactly how your business defines them, and feed clean tables to downstream tools. This is the foundation underneath every serious GA4 AI analytics dashboard — the dashboard is only as good as the model beneath it.
One honest caveat before you build dashboards on top of BigQuery: the export removes sampling, but it does not produce perfect parity with the GA4 UI. Legitimate differences remain — privacy-based data thresholding, timezone handling, and session-definition edge cases all cause numbers to diverge. The right claim is "no sampling," not "identical to the UI."
02 — Enabling ExportLinking GA4 to a Cloud project.
You link a GA4 property to a Google Cloud project from Admin → Product links → BigQuery links. Two permission levels are required: Editor or higher on the GA4 property, and Owner on the destination Cloud project. Once linked, GA4 creates a single dataset named analytics_<property_id> and begins populating it. (Firebase projects use the same schema and an analytics_<project_id> dataset.)
One setting is permanent and easy to get wrong: the data location you choose at link time cannot be changed. Switching it later means deleting and recreating the link, which opens a gap in your data. Pick the region deliberately the first time.
Daily vs streaming export
You choose one or both export types, and the difference matters for both cost and correctness:
Daily export
Free to enable. Creates one full-day table per day, complete with first-user traffic_source. Analytics back-fills late-arriving events for up to 3 days, so yesterday's table can be incomplete if queried immediately — wait 48-72 hours before treating a daily table as final.
Streaming export
Near-real-time intraday tables, populated continuously then deleted at day-end. Carries a streaming-insert surcharge (vendor-reported around $0.05 per ~600K events). Crucially, first-user traffic_source does not populate for new users in intraday tables — do not use them for acquisition attribution.
First-user traffic_source fields are absent from intraday (streaming) tables. If you build acquisition reporting on intraday data, new-user source, medium, and campaign come back null or misleading. Use the daily events_YYYYMMDD tables for any acquisition attribution, and reserve intraday for genuine real-time needs.
03 — The SchemaReading nested and repeated records.
The reason GA4 data feels hard in BigQuery is the shape, not the volume. Each row is one event, and the most useful detail lives inside nested, repeated fields rather than flat columns. Three structural facts account for almost every beginner mistake.
event_params is an array of structs
event_params is a REPEATED RECORD — an array where each entry holds a key and four typed value columns: string_value, int_value, float_value, and double_value. Only one value column is populated per parameter. To read a single parameter you UNNEST the array and filter on the key — there is no event_params.page_location shortcut. The ecommerce and items records, by contrast, sit at the top level, not inside event_params.
Timestamps are microseconds, UTC
event_timestamp is stored in microseconds, not milliseconds. Convert with TIMESTAMP_MICROS(event_timestamp) before applying DATE() — applying DATE() straight to the raw integer is one of the most common errors, and it silently returns garbage rather than failing.
Parameters carry hard limits
Custom parameters are capped at the collection layer and carry through to the export: 25 custom parameters per event, 40 characters for a parameter name, and 100 characters for a parameter value. Automatically collected fields have their own caps (for example, page_location allows up to 1,000 characters). Design event schemas with these ceilings in mind — truncated values are not recoverable after the fact.
Tame UNNEST with a temp function
Repeating an UNNEST subquery for every parameter you extract makes queries long and error-prone. A reusable temporary function wraps the UNNEST logic once — a pattern like CREATE TEMP FUNCTION GetParamString(event_params ANY TYPE, name STRING) lets the rest of the query read like plain column access. It is the single highest-leverage habit for teams of analysts sharing one codebase.
"BigQuery arranges data in nested and repeated fields. UNNEST means that the nested structure is expanded so that each item within can be joined with the rest."— Simo Ahava, Analytics Engineer
One performance fork is worth internalizing early. A subquery UNNEST — extracting one specific parameter per SELECT expression — is the right tool when you want a handful of named parameters as columns. A CROSS JOIN UNNEST that expands every parameter into rows is appropriate when you are pivoting or auditing the full parameter set, but it multiplies row count and cost. Most guides treat the two as interchangeable; they are not. Pick by intent, and you avoid a class of surprise bills.
04 — Traffic SourcesFour structures, one that matches the UI.
This is where most GA4 BigQuery analyses quietly go wrong. The schema carries four distinct traffic-source structures, each with a different scope, a different (or absent) attribution model, and different availability. Choose the wrong one and your acquisition numbers diverge from the GA4 UI with no error to warn you.
The newest structure, session_traffic_source_last_click (STSLC), was added to the export schema on July 17, 2024, and a cross_channel_campaign sub-record carrying default and primary channel groupings followed on October 9, 2024. STSLC is the record to use when you need session acquisition that lines up with GA4 UI session reports, because it is the only one that applies last-non-direct-click attribution at session scope.
| Structure | Scope | Attribution | Daily / Intraday | Added | Use when |
|---|---|---|---|---|---|
| traffic_source | User (first acquisition) | None applied | Daily only | Original | First-touch / user acquisition |
| collected_traffic_source | Event | None (raw) | Both | May 2023 | Raw per-event values, debugging |
| session_traffic_source_last_click | Session | Last non-direct click | Daily | Jul 17, 2024 | Matching GA4 UI session acquisition |
| event_params (source/medium) | Event | None (legacy) | Both | Original | Legacy queries · higher query cost |
The practical rule: use session_traffic_source_last_click when you want session acquisition that reconciles with the GA4 UI, use traffic_source for first-touch user acquisition, and reach for collected_traffic_source only when you specifically need raw, unattributed per-event values (for example, debugging tagging). The legacy event_params keys still work but cost more to query because they force an UNNEST — prefer the dedicated records where they exist. This is the kind of plumbing our analytics and measurement engagements standardize so reporting stays consistent across a team.
05 — SessionizationRebuilding sessions from scratch.
GA4 has no native session table in BigQuery. There is no session_id top-level column to group by. Instead, ga_session_id lives inside event_params as an integer (it is the Unix timestamp of session start), and a unique session is identified by combining it with the user.
The standard approach is to compute a session key by concatenating user_pseudo_id with the extracted ga_session_id. Counting session_start events directly is the tempting shortcut, and it gives wrong answers — sessions can span the daily-table boundary and the event does not fire in every reconstruction scenario. Build the composite key and count distinct keys instead.
What GA4 counts as engagement
GA4 redefined sessions around engagement, and you can rebuild that definition from raw events. GA4 treats a session as engagedwhen it lasts 10+ seconds, contains a conversion event, or registers 2+ page views. Engagement Rate, Engagement Time, and Conversions are the metrics that replaced Universal Analytics' session-and-bounce-rate model — and all of them are reconstructable in SQL from the event stream once you have a reliable session key.
user_pseudo_id + ga_session_id
No native session_id exists. Concatenate the user_pseudo_id with the ga_session_id extracted from event_params to get a unique, stable session identifier you can COUNT(DISTINCT ...) on.
Or conversion, or 2+ views
GA4 marks a session engaged at 10+ seconds, a conversion event, or 2+ page views. All three conditions are reconstructable from raw events, letting you align BigQuery engagement metrics with the GA4 UI definition.
Default session window
GA4's default session timeout is 30 minutes of inactivity. When you reconstruct sessions manually rather than relying on ga_session_id, this is the inactivity gap that should close one session and open the next.
06 — CostWhat it actually costs to run.
For most marketing sites, the GA4 BigQuery export is effectively free, but "basically free" is not a budget. Here are the numbers that matter, drawn from the Google Cloud pricing page (verify current rates before you commit — Google adjusts them periodically).
Query (on-demand): $6.25 per TiB scanned, with the first 1 TiB per month free per billing account. Storage: 10 GiB per month free, then $0.02/GiB/month for active storage and $0.01/GiB/month for long-term storage (tables a partition has not been modified in 90+ days drop to that rate automatically — a 50% discount). A small site collecting on the order of tens of thousands of events a month can sit comfortably inside the free tier, with storage cost measured in cents per year.
BigQuery on-demand pricing for GA4 export · 2026
Source: Google Cloud BigQuery pricing (verify current rates)Standard GA4 properties cap the daily (batch) export at 1 million events per day. If a property consistently exceeds that ceiling, Google may pause daily exports outright — the overflow is lost, not queued. (GA4 360 raises the daily limit dramatically, but that is enterprise pricing.) High-volume sites should monitor event counts and plan for streaming or a 360 upgrade before they hit the wall.
The cost picture scales predictably with property size and query discipline. The table below is a directional budget sanity check — the event-volume bands map to the table strategy you should pick and whether the free tier still covers you.
| Site size | Events / month | Free tier covers it? | Daily export risk | Table strategy |
|---|---|---|---|---|
| Micro | ~10K | Yes, comfortably | None | Sharded is fine |
| Small | ~100K | Yes, with care | None | Sharded; add date filters |
| Medium | ~1M | Usually, if queries are scoped | Low | Migrate to partitioned |
| Large | ~10M | Query cost is real; storage minor | Low | Partitioned + clustered |
| Enterprise | ~100M+ | No — budget query spend | Approaching 1M/day cap | Partitioned + clustered; consider 360 |
Event-volume bands are illustrative and meant for budgeting, not invoicing. Actual query cost depends almost entirely on how disciplined your WHERE clauses are — which is the subject of the next section.
07 — OptimizationFrom sharded to partitioned.
GA4 exports date-sharded tables — a separate events_YYYYMMDD table for every day. To query a range you use the events_* wildcard and the _TABLE_SUFFIX pseudo-column: WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'. Forget that filter and BigQuery scans every matching table, which is the most common way teams accidentally blow through the free query tier in an afternoon.
The durable fix is to consolidate the shards into a single native partitioned table partitioned on event_date. Partition pruning then skips the days a query does not touch automatically, without the metadata overhead of maintaining thousands of individual daily tables. Vendor benchmarks cite query-cost reductions of up to 80-95% for typical analytics workloads — treat that as an upper-bound estimate that depends heavily on your query patterns and whether date filters are consistently applied, not a guaranteed outcome.
Then add clustering within partitions. For marketing queries that filter by event type and user, the recommended order is event_name first, user_pseudo_id second. Clustering further trims the bytes scanned inside each partition. The usual production pattern is a partitioned, clustered destination table plus a scheduled query that appends each finalized daily shard — and because daily tables back-fill for up to three days, schedule that append to run on data that is 48-72 hours old so you never copy an incomplete day.
Date-sharded events_YYYYMMDD
What GA4 gives you out of the box. Fine for micro and small sites. Query with the events_* wildcard and always include a _TABLE_SUFFIX date filter — omitting it scans every table and bills accordingly.
Partitioned + clustered
Consolidate shards into one table partitioned on event_date, clustered by event_name then user_pseudo_id. Partition pruning plus clustering cuts bytes scanned sharply on date-and-event-filtered queries. Backfill once, then append finalized days via scheduled query.
Always filter the date range
Whatever table strategy you run, the single biggest cost lever is the WHERE clause. A query without a date bound on a multi-year dataset is the fastest path to an unexpected bill. Make date filtering a non-negotiable review item.
"Without a data model, your GA4 export doesn't unlock value — it hides it."— OWOX analytics team
08 — Get Hands-OnWhere to practice before you link.
You do not need a live property or a billing account to learn the schema. Google publishes an obfuscated sample dataset from the Google Merchandise Store that is the canonical practice ground for every official query example.
bigquery-public-data.ga4_obfuscated_sample_ecommercesupport.google.com — Export schemadevelopers.google.com — Basic queries| Resource | What it is | How to use it |
|---|---|---|
bigquery-public-data.ga4_obfuscated_sample_ecommerce | 3 months of obfuscated GMS data | Three months of real GA4 ecommerce events (Nov 1, 2020 – Jan 31, 2021), accessible in free BigQuery Sandbox mode with no billing required. The dataset behind every official GA4 query example. |
support.google.com — Export schema | Official field reference | The source of truth for every field name, type, and nesting level. Bookmark it — Google adds fields without always updating older help articles, so re-check before relying on a field. |
developers.google.com — Basic queries | Official SQL examples | Google's own starter queries, including the canonical UNNEST patterns and _TABLE_SUFFIX date filtering. Run them against the sample dataset to see the schema behave before touching production data. |
A sensible learning path: open the sample dataset in Sandbox mode, run Google's basic queries, then rewrite each one using a temp function for parameter extraction. Once the nested-and-repeated model clicks, the jump to your own property is small. If you want the export wired into a production reporting stack — partitioned tables, scheduled appends, reconciliation against the GA4 UI — that is exactly the work our analytics engineering team does, and it pairs naturally with server-side tracking when you want server-collected events writing custom data alongside the GA4 stream.
09 — ConclusionA checkbox that changes what analytics can answer.
Raw event-level data turns analytics from a report into a question engine.
The GA4 BigQuery export is the line between using analytics and interrogating it. The UI answers the questions Google anticipated; the export answers the questions your business actually has — custom attribution, joins to CRM data, funnels defined your way, all unsampled and retained indefinitely. And for standard properties it is free to turn on.
The cost of entry is learning the schema, and that is real: nested and repeated records, four competing traffic-source structures, no native session table, microsecond timestamps. None of it is hard once you have seen it, but all of it is unforgiving if you guess. Use session_traffic_source_last_click for UI-aligned acquisition, build a composite session key, convert timestamps before you date them, and never run a query without a date filter.
Looking forward, the trajectory is clear: as AI-assisted analytics and agentic reporting mature, the teams that win will be the ones sitting on clean, complete, queryable raw data — not screenshots of dashboards. The export is the cheapest strategic move in the GA4 stack. Turn it on today, even if no one queries it for a year, because the one thing you cannot buy back later is the history you did not capture.