An event taxonomy — a structured system for naming, organizing, and categorizing the user actions and properties your product captures — is the foundation every dashboard, funnel, and cohort sits on. When it is sound, analysis is boring and fast. When it rots, every downstream number quietly becomes a guess, and nobody can say which ones.
Taxonomies rarely fail loudly. They erode. One engineer logs Song Played; another, six weeks later and unaware, logs song played. Amplitude treats those as two entirely separate events — and the metric that was supposed to count one behavior now splits across two irreconcilable streams. No malice, no bug report, just entropy. The plan on the wiki still looks correct; the data underneath it no longer matches.
This guide covers the pathology and the cure: why event data decays, the naming convention that has become the industry default, the client-versus-server capture decision, the cross-platform identity trap that silently corrupts every cohort, how the major analytics platforms enforce governance, and — the part most teams skip — how to turn a tracking plan from a document into a gate that runs in your CI pipeline.
- 01. Event data rots from entropy, not malice. Locally rational naming choices made by individual engineers under deadline pressure, without coordination, accumulate into a globally irrational namespace. Capitalization drift alone (Song Played vs song played) fragments a single metric into two.
- 02. Object-action naming is the default convention. It describes the action that already happened, makes events discoverable, speeds naming decisions, and reduces duplicate names. Avo's framework covers four axes: casing, format, tense, and a controlled vocabulary. Amplitude recommends 10–200 events, up to 20 properties each.
- 03. Server-side capture is the reliable path for core events. Ad blockers reportedly affect a meaningful share of client-side traffic; vendor estimates put it at 25–40%, though that range is not independently audited. The recommended model is hybrid: autocapture for baseline behavior, server-side instrumentation for 10–20 core KPIs.
- 04. Identity stitching is the under-covered decay vector. When distinct_id is assigned inconsistently across anonymous and identified sessions, every downstream cohort is silently wrong. Call identify() at first load and again after login; use set vs set-once correctly to preserve original acquisition context.
- 05. A tracking plan is worthless without a gate. The most common failure is organizational, not technical: the plan exists but is not enforced. Treat tracking like code: validate schema in CI, require review for tracking changes, populate an owner field. Enforce before events ship, not after the data is already broken.
01 — The Pathology
Why event taxonomies rot — a tragedy of the commons.
The decay is structural. Tracking is implemented at the edges of a codebase, by many people, over a long stretch of time, usually under deadline pressure. Each engineer makes a defensible local choice — checkout_complete versus Checkout Completed versus purchase — at the exact moment a feature ships. None of those choices is wrong in isolation. The damage is in the aggregate: an uncoordinated set of locally rational decisions produces a shared namespace that nobody designed and everybody has to live with.
Atticus Li frames this precisely as a tragedy of the commons, and the framing is worth quoting in full because it names the mechanism rather than the symptom.
"Each engineer makes a locally rational naming choice at the moment of implementation, but the cumulative effect of many locally rational choices, made without coordination, produces a globally irrational system where the shared namespace degradation imposes costs on everyone who needs to use the data downstream." (Atticus Li, Event Tracking Architecture)
Naming drift is the visible symptom, but it is not the only decay vector — it is just the easiest to see. Property types mutate when an amount field arrives as a number in one event and a string in another. Autocaptured events shift when a button is renamed or a page restructured, breaking the dashboards built on top of them. And the subtlest killer, covered in Section 05, is identity: anonymous-to-identified stitching that fails silently at login, poisoning every cohort downstream without throwing a single error.
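Property type mutation is easy to check for mechanically. The following is a minimal sketch, assuming events are available as plain dictionaries with an `event` name and a `properties` map; that shape and the `find_type_mutations` helper are illustrative assumptions, not any vendor's export format.

```python
from collections import defaultdict

def json_type(value):
    """Map a Python value to a coarse JSON-style type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__

def find_type_mutations(events):
    """Return {property: sorted type names} for properties seen with more than one type."""
    seen = defaultdict(set)
    for event in events:
        for prop, value in event.get("properties", {}).items():
            seen[prop].add(json_type(value))
    return {prop: sorted(types) for prop, types in seen.items() if len(types) > 1}

# Hypothetical sample: the same amount arrives as a number, then as a string.
sample = [
    {"event": "Checkout Completed", "properties": {"amount": 49.99, "currency": "USD"}},
    {"event": "Order Refunded", "properties": {"amount": "49.99", "currency": "USD"}},
]
mutations = find_type_mutations(sample)  # {"amount": ["number", "string"]}
```

Run against a sample of recent events, a check like this turns a silent coercion into an explicit list of properties to fix.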
The corrosive part is the false confidence. A plan that has drifted from reality is, in a real sense, worse than no plan at all — it invites teams to trust numbers that no longer mean what the documentation says they mean. The question for any non-trivial product is not whether tracking will degrade, but how quickly, and whether anything stands in the way.
02 — The Convention
The object-action convention, and the four axes underneath it.
The object-action naming convention has become the industry standard. The reasoning is functional, not aesthetic: it clearly describes the action that has already taken place, makes events more discoverable when someone is searching the catalog, speeds up the naming decision at implementation time, and — critically — reduces duplicate event names. Checkout Completed reads as a fact about a thing that happened, and the next engineer reaches for the same shape without thinking.
Avo's naming framework decomposes the decision into four independent axes, and the value is in fixing each one once, as a team, rather than re-litigating it at every implementation:
- Casing: pick one and never mix. snake_case, camelCase, Title Case, or lowercase. The choice matters less than the consistency: Amplitude treats Song Played and song played as two separate events, so casing drift alone fragments a metric.
- Format: [object] [action]. Either [object] [action] or [context] [object] [action]. The object-first shape is what makes a catalog browsable, because events for the same object cluster together alphabetically.
- Tense: past, consistently. Past simple ("game started") or present simple ("game start"); choose once. Past tense reads naturally as a record of something that already occurred, which is what an analytics event is.
- Vocabulary: a controlled term list. An approved list of verbs and nouns. PostHog's restricted verb list (click, submit, create, view, add, invite, update, delete, remove, start, end, cancel, fail, generate, send) is a strong default; deviating from it is explicitly discouraged.
PostHog pushes the convention one notch further with a category:object_action pattern — for example account_settings:forgot_password_button_click — which adds a namespace prefix so events from the same product surface stay grouped. Property names get their own discipline: object_adjective format (user_id, item_price, member_count), boolean properties prefixed with is_ or has_ (is_subscribed, has_seen_upsell), and date or timestamp properties suffixed with _date or _timestamp.
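The pattern above is regular enough to check with a few lines of code. This is a rough sketch, assuming the team has adopted the lowercase category:object_action shape and the verb list quoted earlier; the function names and the exact regular expression are illustrative choices, not part of any PostHog tooling.

```python
import re

# Verb list as quoted in this guide.
APPROVED_VERBS = {
    "click", "submit", "create", "view", "add", "invite", "update", "delete",
    "remove", "start", "end", "cancel", "fail", "generate", "send",
}
# Assumed shape: lowercase snake_case on both sides of a single colon.
EVENT_NAME = re.compile(r"[a-z][a-z0-9_]*:[a-z][a-z0-9_]*")

def event_name_problems(name):
    """Return a list of convention violations; an empty list means the name passes."""
    if not EVENT_NAME.fullmatch(name):
        return ["expected lowercase category:object_action"]
    object_action = name.split(":", 1)[1]
    obj, _, verb = object_action.rpartition("_")  # the verb is the last segment
    problems = []
    if not obj:
        problems.append("missing object before the verb")
    if verb not in APPROVED_VERBS:
        problems.append(f"verb '{verb}' is not in the approved list")
    return problems

def property_name_problems(name, value):
    """Check the boolean-prefix rule for a single property."""
    if isinstance(value, bool) and not name.startswith(("is_", "has_")):
        return ["boolean properties should start with is_ or has_"]
    return []
```

For example, `event_name_problems("account_settings:forgot_password_button_click")` returns an empty list, while `"billing:invoice_download"` is rejected because `download` is not an approved verb.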
The single most important rule across every platform is this: event names and property names must be fixed strings in your code, never generated dynamically. Variable data belongs in property values, never in event names — the moment you interpolate a user ID or a product name into an event name, you have created an unbounded namespace that no plan can govern. On sizing, Amplitude’s guidance is concrete: keep a tracking plan to between 10 and 200 event types (fewer obscures funnel analysis; more becomes a dictionary nobody can navigate), and no more than 20 properties per event. For the full vocabulary of events, properties, and identity, our analytics glossary of event, property, and identity terminology is a useful companion reference.
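One way to make the fixed-string rule hard to break is to keep event names in a single enumeration and refuse anything else. The sketch below assumes a hypothetical `build_event` helper and two example events; it illustrates the idea rather than any platform's SDK.

```python
from enum import Enum

class Event(str, Enum):
    """The only place event names are defined; every value is a fixed string."""
    SONG_PLAYED = "Song Played"
    CHECKOUT_COMPLETED = "Checkout Completed"

MAX_PROPERTIES = 20  # Amplitude's per-event guideline cited above

def build_event(event, properties):
    """Build an event payload, rejecting free-form names and oversized property sets."""
    if not isinstance(event, Event):
        raise TypeError("event name must be a member of Event, not a free-form string")
    if len(properties) > MAX_PROPERTIES:
        raise ValueError(f"{event.value} has more than {MAX_PROPERTIES} properties")
    return {"event": event.value, "properties": dict(properties)}

# Variable data goes in property values, never in the name.
payload = build_event(Event.SONG_PLAYED, {"song_id": "s_123"})
# build_event(f"Song Played {song_id}", {}) would raise TypeError.
```

Because an interpolated string is not an `Event` member, the unbounded-namespace mistake fails at the call site instead of in the catalog.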
03 — The Failure Modes
A taxonomy of the things that break a taxonomy.
Naming drift gets the press, but it is one of several distinct decay vectors — each with a different root cause, a different detection signal, and a very different recovery cost. The table below maps them. It is, deliberately, a taxonomy of the things that break a taxonomy. Read your row, find the prevention mechanism, and you have most of a governance program.
| Decay vector | Root cause | Detection signal | Recovery cost | Prevention mechanism |
|---|---|---|---|---|
| Naming drift | Uncoordinated casing / verb choices across engineers | Near-duplicate events in the catalog | Medium | Controlled vocabulary + naming gate in CI |
| Property type mutation | Same property sent as number, then string | Aggregations break or silently coerce | Medium | JSON Schema validation on ingestion |
| Owner departure | Event author leaves; no accountable steward | Nobody can explain what an event means | High | Required owner field; data-steward role |
| Autocapture schema shift | UI element renamed or page restructured | Dashboards drop to zero after a deploy | Medium | Manually instrument core KPIs, not autocapture |
| Anonymous ID fragmentation | distinct_id inconsistent at login | Cohorts silently wrong; no error thrown | High | identify() at first load and after login |
| Over-tracking noise | Plan grows past a navigable size | Consumers cannot find the right event | Low | Hold to 10–200 events; deprecate aggressively |
| Silent blocking (device-mode) | Segment omission is cloud-mode-only | Unplanned props pass through device-mode | Medium | Validate at the source; know the mode caveat |
The pattern in the recovery-cost column is the actionable insight: the two highest-cost failures — owner departure and identity fragmentation — are precisely the two that throw no error and surface no obvious signal. They are expensive to recover from because they are silent. That is the argument for prevention over cleanup: the failures that hurt most are the ones you will not notice until a quarter of cohort analysis is already built on bad foundations.
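The detection signal for naming drift, near-duplicate events in the catalog, can be approximated by normalizing names before comparing them. A minimal sketch, assuming the catalog is available as a list of event-name strings; the normalization rule here is an illustrative choice and only catches casing and separator drift, not tense or synonym drift.

```python
import re
from collections import defaultdict

def normalize(name):
    """Collapse casing and separators: 'Song Played', 'songPlayed', 'song_played' all match."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)  # split camelCase boundaries
    return "_".join(re.findall(r"[a-z0-9]+", spaced.lower()))

def near_duplicates(names):
    """Group distinct event names that normalize to the same key."""
    groups = defaultdict(list)
    for name in set(names):
        groups[normalize(name)].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

catalog = ["Song Played", "song played", "songPlayed", "Checkout Completed"]
duplicates = near_duplicates(catalog)
```

Note that `Checkout Completed` and `checkout_complete` would not be grouped by this rule, since they differ in tense; catching that kind of drift needs the controlled vocabulary rather than string normalization.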
04 — Where To Capture
Client versus server, and why hybrid wins.
Client-side tracking is convenient and lossy. Server-side tracking is reliable and more work. The honest answer for almost every product is to use both, deliberately — and the reason comes down to what you can and cannot control on a user's device.
"Where possible, it's a good idea to log events on your server instead of your client. Backend analytics are more reliable than frontend analytics because many users have tracking disabled or blocked on their browsers, frontend analytics often rely on JavaScript execution which can be interrupted by various factors like network issues and CORS, and you have complete control of your backend implementation." (PostHog, Product Analytics Best Practices)
How big is the client-side loss? Tracking vendors put the figure at roughly 25–40% of client-side traffic affected by ad blockers, depending on industry, with Apple's Intelligent Tracking Prevention further limiting cookie lifespan and blocking third-party trackers. Treat that range as directional, not gospel — it is vendor-stated and not independently audited, and the vendors quoting it sell server-side tracking, so the incentive runs toward emphasizing the problem. The qualitative point survives the caveat regardless: a non-trivial, hard-to-measure slice of client-side events never arrives, and you cannot tell in advance which users it affects. For anything mission-critical — revenue, sign-ups, activation — that uncertainty is unacceptable, and server-side capture is the only way to remove it.
The recommended approach is a hybrid capture model: use client-side autocapture for baseline behavioral data, and manually instrument your 10–20 core KPIs server-side with structured properties. When precision is critical — sign-up events are the canonical example — Amplitude recommends instrumenting the same core event on both client and server, but with different names to avoid double-counting. PostHog makes the same demand: backend and frontend events tracking the same action must have distinct names, and the distinct_id must always be supplied for server-side events, typically from the user's database ID or session cookie. Our deeper treatment of this lives in the guide to server-side event capture.
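The two rules in that paragraph, distinct names for client and server events and a mandatory distinct_id on the server, can be encoded in a thin wrapper. This is a sketch under stated assumptions: the event names, the `build_server_event` helper, and the payload shape are all hypothetical, and the actual send to an analytics SDK is deliberately left out.

```python
# Hypothetical registries: the same sign-up action, named differently per side.
CLIENT_EVENTS = {"signup_form:signup_button_click"}
SERVER_EVENTS = {"signup:account_create"}

def assert_disjoint(client_events, server_events):
    """Fail fast if one name is used on both sides, which would double-count."""
    overlap = client_events & server_events
    if overlap:
        raise ValueError(f"names used on both client and server: {sorted(overlap)}")

def build_server_event(distinct_id, event, properties=None):
    """Build a server-side payload; the caller must supply the user's distinct_id."""
    if not distinct_id:
        raise ValueError("server-side events need an explicit distinct_id")
    if event not in SERVER_EVENTS:
        raise ValueError(f"{event!r} is not a registered server-side event")
    return {"distinct_id": str(distinct_id), "event": event, "properties": dict(properties or {})}

assert_disjoint(CLIENT_EVENTS, SERVER_EVENTS)
signup = build_server_event(42, "signup:account_create", {"plan_tier": "free"})
```

Passing the database ID as `distinct_id` here mirrors the guidance above; the resulting dictionary would then be handed to whichever server-side client the stack uses.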
There is a hidden cost to leaning hard on autocapture, and it is its own decay vector. Autocaptured events are tied to the structure of your UI — so when a button is renamed or a page is restructured, the captured events shift in ways that quietly break the dashboards and analyses built on them. Schema drift from autocapture is subtler than naming drift, but just as destructive, which is exactly why the core, business-critical events should be instrumented by hand rather than inferred from the DOM.
05 — Identity
The silent killer: identity stitching that fails at login.
Naming drift and untyped properties get the attention. The vector that does the most damage with the least noise is identity. When a user is anonymous, your analytics tool assigns a distinct_id. When they log in, that anonymous identity has to be stitched to their known identity — and if that stitch is done inconsistently, or at the wrong moment, the result is not an error. It is a quietly fractured user graph where one human shows up as several, and every funnel, retention curve, and cohort built on top is wrong in ways no alert will catch.
The implementation rule is concrete: call identify() at first load (to bind the anonymous session) and again after login (to merge it into the known user). Skip the first call and you lose pre-login behavior; skip the second and your logged-in users fragment from their anonymous selves.
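To see why a missed stitch is silent, it helps to model the user graph directly. The toy model below is an assumption-laden illustration (a small union-find over IDs, with a hypothetical `IdentityGraph` class), not how any analytics vendor actually resolves identities.

```python
class IdentityGraph:
    """Toy user graph: each distinct_id points at the person it belongs to."""

    def __init__(self):
        self._parent = {}

    def _find(self, distinct_id):
        self._parent.setdefault(distinct_id, distinct_id)
        while self._parent[distinct_id] != distinct_id:
            distinct_id = self._parent[distinct_id]
        return distinct_id

    def see(self, distinct_id):
        """Record that an event arrived under this ID."""
        self._find(distinct_id)

    def identify(self, anonymous_id, user_id):
        """Stitch the anonymous ID into the known user."""
        self._parent[self._find(anonymous_id)] = self._find(user_id)

    def person_count(self):
        return len({self._find(d) for d in list(self._parent)})

# Stitch skipped: one human is counted as two people, and nothing errors.
fragmented = IdentityGraph()
fragmented.see("anon-1")
fragmented.see("user-42")

# Stitch performed at login: the anonymous history joins the known user.
stitched = IdentityGraph()
stitched.see("anon-1")
stitched.identify("anon-1", "user-42")
```

Both graphs accept every event without complaint; only the person count differs, which is exactly why the failure shows up in cohorts rather than in logs.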
User properties carry their own subtle rule. PostHog distinguishes two setters, and using the wrong one corrupts your acquisition analysis:
- $set (overwrites): Use for attributes that legitimately change over a user's lifetime, such as current location, current plan tier, or current email. Each call overwrites the prior value, which is exactly what you want for a 'where are they now' attribute.
- $set_once (preserves): Use for data that must never change, such as referral source, original signup date, or first-touch campaign. Set-once preserves the original acquisition context across the user's entire lifetime, which is what makes attribution honest.
Get this distinction wrong and the damage is permanent in a specific way: if you use $set for referral source, every later session overwrites the original acquisition channel, and you lose the ability to ever say where a user truly came from. The original context is gone — there is no recovering it from the data, because the data was overwritten the moment the user came back through a different door. This is why identity discipline belongs in the same governed, reviewed layer as event naming.
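The difference between the two setters is small enough to express in a few lines. This sketch models only the semantics described above, with a hypothetical `apply_person_update` function operating on a plain dictionary; it is not PostHog's implementation.

```python
def apply_person_update(person, set_props=None, set_once_props=None):
    """Return a new person record after applying set-once, then set, semantics."""
    updated = dict(person)
    for key, value in (set_once_props or {}).items():
        updated.setdefault(key, value)  # set-once: keep the first value ever written
    updated.update(set_props or {})     # set: always overwrite
    return updated

# First visit arrives via the newsletter on the free plan.
person = apply_person_update(
    {}, set_props={"plan_tier": "free"}, set_once_props={"referral_source": "newsletter"}
)
# A later session arrives via paid search after an upgrade.
person = apply_person_update(
    person, set_props={"plan_tier": "pro"}, set_once_props={"referral_source": "paid_search"}
)

# The misuse: writing referral source with plain set semantics.
misused = apply_person_update({"referral_source": "newsletter"},
                              set_props={"referral_source": "paid_search"})
```

After both sessions `person` still records `newsletter` as the referral source while the plan tier has moved to `pro`; in `misused` the original channel has been overwritten and cannot be reconstructed from the record.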
06 — Governance Tooling
How the major platforms enforce governance.
Every major analytics platform now ships some form of governance tooling, but they differ sharply in how they enforce, where in the pipeline enforcement happens, and what plan tier it requires. The comparison below maps the five most relevant platforms across the governance dimensions that actually determine whether your taxonomy holds. Every cell is drawn from primary vendor documentation.
| Platform | Schema validation | CI / codegen | Event approval | Tier note |
|---|---|---|---|---|
| Amplitude | Tracking plan as single source of truth | Codegen via Amplitude Data (formerly Iteratively) | Four roles: requester, steward, owner, implementer | Version labels: proposed / active / deprecated / removed |
| Mixpanel (Data Standards) | Automated compliance checks; flags in Lexicon | Not codegen-first; proactive convention enforcement | Required metadata: descriptions, owners, media | Enterprise tier; launched Apr 16, 2025 |
| Segment Protocols | JSON Schema validation on ingestion | Tracking plan as CSV; regex value checks | Governance council recommended before enforcement | Business + Add-on tiers only (not Free / Team) |
| PostHog (Schema Management) | Compile-time validation from typed definitions | Typed defs (TS / Go / Python) via posthog-cli | posthog.json + types committed to version control | Restricted verb list; fixed-string naming rule |
| Avo (Codegen) | Type-safe code; identical names across platforms | Codegen in 12 languages; events skip Avo servers | Four-axis naming framework (casing/format/tense/vocab) | Destination-agnostic; routes direct to analytics |
Two structural choices separate these tools. The first is where validation runs: PostHog and Avo push it left to compile time via generated, type-safe code (Avo Codegen produces type-safe code in 12 languages, with events and properties named identically across every platform and codebase), while Segment Protocols validates at ingestion using JSON Schema. The second is what happens on a violation — covered next, because Segment’s answer is the most consequential and the most misunderstood.
On Track calls, Segment Protocols handles unplanned events three ways: Allow (log a violation, pass through), Omit Properties (strip undeclared properties, keep the event), and Block Event (permanently discard it). The critical caveats: a blocked event not forwarded to a separate Source is permanently discarded and cannot be recovered, and property omission plus schema-violation blocking are cloud-mode-only — unplanned properties are not stripped when sending to device-mode destinations. Plan against that gap rather than assuming enforcement is universal.
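The three outcomes and the device-mode caveat are easier to reason about as a small decision function. The following is a simplified model of the behavior as described in this guide, with a hypothetical `enforce` function and a single `cloud_mode` flag; it is not Segment's code and glosses over how Protocols is actually configured.

```python
from enum import Enum

class Mode(Enum):
    ALLOW = "allow"
    OMIT_PROPERTIES = "omit_properties"
    BLOCK_EVENT = "block_event"

def enforce(event, declared_props, mode, cloud_mode=True):
    """Return (event to forward or None, list of undeclared property names)."""
    undeclared = sorted(set(event["properties"]) - set(declared_props))
    # Assumption from the text above: omission and blocking apply only in cloud-mode.
    if not undeclared or mode is Mode.ALLOW or not cloud_mode:
        return event, undeclared
    if mode is Mode.OMIT_PROPERTIES:
        kept = {k: v for k, v in event["properties"].items() if k in declared_props}
        return {**event, "properties": kept}, undeclared
    return None, undeclared  # BLOCK_EVENT: the event is discarded

event = {"event": "Checkout Completed", "properties": {"amount": 49.99, "coupon": "SPRING"}}
declared = {"amount"}
omitted, violations = enforce(event, declared, Mode.OMIT_PROPERTIES)
device_mode, _ = enforce(event, declared, Mode.OMIT_PROPERTIES, cloud_mode=False)
```

In this model `omitted` carries only `amount`, while `device_mode` still carries the undeclared `coupon` property, which is the gap the paragraph above warns about.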
07 — The Gate
The plan is not the solution — the gate is.
Here is the insight most content on this topic misses. The most common tracking-plan failure mode is not technical — it is organizational. The plan exists. It is just not enforced. Engineers implement tracking under deadline pressure without consulting it, the plan and the reality diverge, and you arrive at the state where the plan is worse than no plan at all because it provides false confidence. A document cannot prevent this. Only a gate can.
The governance shift that works is to treat tracking like code: integrate event testing into your existing CI/CD and unit-testing workflows, require code review for any tracking change, and run regular audits of event volume and property completeness. Without governance, entropy wins — the question was never whether tracking will degrade, only how fast. PostHog operationalizes this concretely: generate typed definitions (TypeScript, Go, Python) from your property groups via posthog-cli exp schema pull, then commit the posthog.json schema file and the generated types to version control so the whole team stays in sync and every tracking call is checked at compile time.
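For teams without generated types, even a small static check in CI can enforce the fixed-string rule. The sketch below uses Python's standard `ast` module to flag tracking calls whose first argument is not a string literal; the wrapper names `track` and `capture`, and the assumption that the event name is the first positional argument, are hypothetical and would need adjusting to the codebase.

```python
import ast

TRACKING_FUNCTIONS = {"track", "capture"}  # hypothetical wrapper names

def dynamic_event_name_lines(source):
    """Return line numbers of tracking calls whose event name is not a string literal."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        called = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if called not in TRACKING_FUNCTIONS:
            continue
        first = node.args[0]
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            lines.append(node.lineno)
    return sorted(lines)

SAMPLE = "\n".join([
    'track("Song Played", {"song_id": song_id})',  # fixed string: passes
    'track(f"Song Played {song_id}")',             # interpolated name: flagged
    'analytics.capture("viewed_" + page)',         # concatenated name: flagged
])
flagged = dynamic_event_name_lines(SAMPLE)
```

A CI job could run this over changed files and fail the build when `flagged` is non-empty. As written it also flags names passed through variables, so a real gate would probably allow references to an approved constants module.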
The practical artifact is a pull-request review checklist that turns the abstract plan into a pass/fail gate on every change that touches tracking:
- Name passes the convention. Event name matches the agreed casing, object-action format, tense, and controlled vocabulary. It is a fixed string, not dynamically generated. No near-duplicate already exists in the catalog.
- Properties present and typed. Every required property is present, each property has a declared type that matches its prior usage, and the event stays within the 20-property guideline. Type mutations are rejected at review, not discovered in a broken aggregation.
- Ownership field populated. The event has a named owner or data steward accountable for its definition. An event with no owner is the owner-departure decay vector waiting to happen, one of the two high-recovery-cost rows in the decay table.
- Deprecations replaced, not renamed. A breaking change ships with a migration plan before the old event stops collecting. Deprecated events move through proposed / active / deprecated / removed states deliberately, never abandoned in place under a new name.
That last check is the one teams skip most. Renaming an event without a migration plan does not fix a naming problem — it creates two broken time series where there was one. The disciplined move is to ship the new event alongside the old, run both until the dashboards are migrated, then formally remove the deprecated event. This is the difference between an organization that has a tracking plan and one whose plan still describes reality a year later. If you want this gate built into your stack rather than maintained by hand, our analytics engineering engagements start by wiring schema validation into CI before the first event ships.
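The review checklist translates fairly directly into a pass/fail function. This is a minimal sketch, assuming event definitions are stored as dictionaries with `name`, `owner`, `status`, `properties`, and an optional `replaced_by` field; that schema and the `review_event` helper are illustrative assumptions rather than any platform's format.

```python
LIFECYCLE = ("proposed", "active", "deprecated", "removed")
MAX_PROPERTIES = 20

def _key(name):
    return name.lower().replace("_", " ")

def review_event(definition, catalog):
    """Return the failed checklist items; an empty list means the change passes."""
    failures = []
    name = definition.get("name", "")
    props = definition.get("properties", {})
    clashes = [n for n in catalog if n != name and _key(n) == _key(name)]
    if clashes:
        failures.append(f"near-duplicate of {clashes[0]!r}")
    if not definition.get("owner"):
        failures.append("owner field is empty")
    if len(props) > MAX_PROPERTIES:
        failures.append(f"more than {MAX_PROPERTIES} properties")
    prior = catalog.get(name, {}).get("properties", {})
    for prop, type_name in props.items():
        if prop in prior and prior[prop] != type_name:
            failures.append(f"{prop}: type changed from {prior[prop]} to {type_name}")
    status = definition.get("status")
    if status not in LIFECYCLE:
        failures.append("unknown lifecycle status")
    if status == "deprecated" and not definition.get("replaced_by"):
        failures.append("deprecated without a replacement event")
    return failures

catalog = {"Checkout Completed": {"name": "Checkout Completed", "owner": "payments-team",
                                  "status": "active", "properties": {"amount": "number"}}}
drifted = {"name": "checkout_completed", "owner": "", "status": "deprecated",
           "properties": {"amount": "string"}}
failures = review_event(drifted, catalog)
```

Here `failures` lists three problems: the near-duplicate name, the empty owner, and the deprecation with no replacement. Wiring a function like this into the pull-request pipeline is what turns the checklist from advice into a gate.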
08 — Platform Limits
The GA4 hard limits your taxonomy must respect.
If GA4 is anywhere in your stack, its limits are not guidelines — they are hard constraints the platform enforces, and a taxonomy that ignores them silently loses data. These are the ones that bite most often.
[Table: GA4 hard limits relevant to event taxonomy design. Source: Google Analytics Help, event collection & custom-dimension limits]
A few rules within those limits trip teams up repeatedly. GA4 event names are case-sensitive (my_event and My_Event are distinct, the same fragmentation trap as Amplitude), must start with a letter, may contain only letters, numbers, and underscores, may not begin with ga_, firebase_, google_, or gtag., and may not contain spaces. Worth noting on the distinct-name limit: web streams are effectively unlimited, but app streams cap distinct event names at 500 per app user, so always qualify which stream you mean before claiming GA4 has no event-name ceiling.
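The naming rules in that paragraph can be checked before an event ever reaches GA4. A small sketch, covering only the rules stated above; the `ga4_name_problems` helper is hypothetical, and restricting "letters" to ASCII is an assumption made for simplicity.

```python
import re

RESERVED_PREFIXES = ("ga_", "firebase_", "google_", "gtag.")
# Assumption: "letters" is treated as ASCII letters here.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def ga4_name_problems(name):
    """Return the naming rules a GA4 event name breaks; an empty list means none."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only letters, numbers, and underscores")
    if name.startswith(RESERVED_PREFIXES):
        problems.append("uses a reserved prefix")
    return problems
```

Both `my_event` and `My_Event` pass this check, which is the point: each is valid on its own, and GA4 will record them as two different events, so case consistency still has to come from the naming convention.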
One operational gotcha sits in custom dimensions: deleting one does not free its slot immediately. The dimension enters an archived state, and the slot only becomes available again after 48 hours. If you are juggling near the 50 event-scoped limit, plan that delay into any reshuffle. For the full GA4 picture — including how these event limits and schema constraints flow into warehouse-grade analysis — see our reference on GA4's event limits and schema constraints.
09 — Conclusion
A taxonomy is a process, not a document.
The tracking plan that survives is the one that runs in CI.
Event taxonomies do not fail because someone made a bad decision. They fail because many people made locally reasonable decisions without coordination, and entropy did the rest. Naming drift, type mutation, owner departure, autocapture shift, identity fragmentation: each is a distinct decay vector, and the two that cost the most to recover from, owner departure and identity fragmentation, are precisely the two that throw no error.
The convention is settled: object-action naming, a controlled vocabulary, fixed-string event names, 10–200 events with up to 20 properties each. The capture model is settled too: a hybrid of autocapture for baseline behavior and hand-instrumented server-side events for the 10–20 KPIs you report to the business — because a non-trivial, unmeasurable share of client-side events never arrives. None of that is the hard part.
The hard part is enforcement. A plan in a spreadsheet drifts; a plan wired into CI holds. Validate schema before events ship, require review on every tracking change, put a name in the owner field, and replace deprecated events with a migration plan rather than a rename. Treat tracking like code and the taxonomy becomes durable infrastructure. Treat it like documentation and you are simply choosing how quickly it will rot.