Event Stores and Tags: A Misunderstood Optimization

A Critique of Tagging in Event Stores


Tags in event stores sound helpful, even obvious. They promise easier correlation, cleaner queries, and clearer intent. Instead of digging into payloads, you just filter by labels. The event type tells you what happened; the tag tells you to whom.

That is the argument made in this article on DCB (Dynamic Consistency Boundary), which presents tagging as a pragmatic enhancement to event-sourced systems.

I used to see it the same way. My stance was: you can use tags, but you don't have to.
Ralf Westphal challenged that. Repeatedly. And to be honest, I did not fully appreciate the depth of his point, until now.

Tags add nothing to the concept of an event store. Tags make it harder to understand and more difficult to implement.
Tags for Event Stores? | Ralf Westphal
Tags for Event Stores? I understand the motivation, but I still disagree. This sounds very reasonable:

1. “Tags help correlate events with specific instances in the domain”
2. “While an event type tells us what happened, tags tell us to whom or to what it happened.”

But then... I cannot shed the feeling it’s excessive. It’s more than needed. It’s diluting the conceptual beauty of Event Sourcing.

As for 1: What are “instances in the domain”? Something is referred to without explaining it. I would have understood if there were a need to tie events together, to relate events to each other. But what’s this ominous stuff in the domain? (Of course I am playing dumb here. I know what’s meant. But to me the phrasing is a petitio principii.) To me it’s the other way around: “instances” of whatever can be constructed at any time from events, as needed, as an act of creation. It’s a matter of need and imagination. Making the event payload opaque would hinder that. Giving preference to something like tags would hinder that.

As for 2: “to whom or what it happened” again refers to something else. An “instance in the domain”, probably. But do you see that in nature? Do events come with a label? Or do we construct labels from events? I believe the latter is the case. That’s what the brain is doing all day long: processing raw events, correlating them (in time and space), looking for patterns, and then constructing “regions of stability” from them. Objects are abstractions. We’d rob the users of Event Stores of that ability by making the payload opaque and limiting them to correlation by preconceived tags.

We are talking about Event Stores. Events are more than blobs of data. Otherwise we’d be talking about blob stores.😉 That means to me it’s conceptually sound to separate event types (why) and associated partial application state, aka data aka payload (what). Opacity of the payload (black box) to me is not part of this concept (black). Defining event types or payload structures up front (like a database schema), on the other hand, would be in opposition to the concept (white). So I don’t think a faithful implementation should be black or white, but grey.

That means: the Event Store knows the meta-schema of payloads, for example that it’s always JSON. With this knowledge an Event Store supports (re)construction of whatever consumers of events see fit. It allows selection of events according to arbitrary patterns. That, to me, is perfectly in line with the original concept. This also easily supports all sorts of “ES fads”😉:

  • opaque payloads: { dataUrl: “https://...” }
  • streams: { streamID: ”...”, data: { ... } }
  • tags: { data: { ... }, tags: [ {customer: “alice-smith”} ] }

Grey payloads are highly flexible, conceptually parsimonious, and fast enough until proven (!) otherwise for concrete use cases. Tags add nothing to the concept of an Event Store. Tags make it harder to understand and more difficult to implement.

He was right.

What tags introduce is premature structure. They assume that correlation must be declared at write time, that events must carry external identifiers describing their "subject". Although it sounds efficient, it quietly undermines the very basis of event sourcing.

Event Sourcing Is About Deferring Meaning

The strength of event sourcing is that events do not require immediate interpretation.
They simply record what happened, allowing meaning to emerge later depending on context and intent.

Events do not declare what they mean; they become meaningful through interpretation. Event logs are not spreadsheets. They are streams of recorded facts, waiting to be understood from different perspectives, at different times, by different consumers.

This article is not an attack on tags as metadata. It's a critique of tagging as a core feature of event stores because that crosses a line:
It introduces a black-and-white mindset into a system that thrives in shades of grey.

The Tag Temptation

Tags are seductive.

They promise clarity and convenience. Add a "customer: alice-smith" or "order: 12345" to your event, and correlation becomes simple. It feels like good housekeeping, like documenting what the event is about.

But it's a shortcut with a cost.

In the DCB article, tags are said to correlate events with “instances in the domain.” That wording carries an assumption: that such instances already exist, independently of the events. That they are real, stable things the system should recognize and tag.

But that's not how event-sourced systems work.

"Instances" (customers, deliveries, projects) are not predefined. They are retrospective constructs, shaped by event interpretation. They are derived from patterns and context, not from tags.

By tagging events, we are no longer just annotating. We are declaring. We are baking one interpretation into the storage layer, saying: this event happened to Alice, even though “Alice” may not exist yet, or that meaning might change later.

Worse, this shifts focus away from what happened toward metadata about what we think it meant.

That's a shift from event-first to structure-first thinking. It's reversing causality.

And once tags become the primary mechanism for querying or grouping events, they stop being metadata, and start being structural. At that point:

We are no longer sourcing events. We are indexing snapshots. That's database thinking.

Black, White, Grey

When it comes to event modeling, two extremes are easy to recognize:

  • Black box: Events are opaque blobs. They're appendable and replayable, but you can't inspect or query them.
  • White box: Events follow rigid schemas. They're tightly versioned, fully structured, and easily queryable, but inflexible.

Both are wrong.

The sweet spot is grey.

Events that are introspectable, typically JSON, but without fixed contracts. No enforced schemas, no premature structure. Just enough form to support interpretation, but never to prescribe it.
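To make "grey" concrete, here is a minimal sketch (all class and field names are illustrative, not from any real library): an append-only log whose payloads are introspectable JSON-like data, selectable by arbitrary patterns rather than predeclared tags.

```python
class GreyEventStore:
    """Append-only log; payloads are introspectable but schema-free."""

    def __init__(self):
        self._log = []  # events kept in append order

    def append(self, event_type, payload):
        self._log.append({"type": event_type, "payload": payload})

    def select(self, predicate):
        """Select events by an arbitrary pattern over type and payload."""
        return [e for e in self._log if predicate(e)]


store = GreyEventStore()
store.append("OrderPlaced", {"orderId": 123, "customer": "alice-smith"})
store.append("OrderShipped", {"orderId": 123})
store.append("OrderPlaced", {"orderId": 456, "customer": "bob-jones"})

# Correlate at read time, from the facts themselves -- no tags required:
alices_events = store.select(
    lambda e: e["payload"].get("customer") == "alice-smith"
)
```

The pattern lives entirely with the consumer: a different consumer could just as well correlate by `orderId`, by event type, or by a pattern nobody anticipated at write time.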

As Ralf puts it:

"Grey payloads are highly flexible, conceptually parsimonious, fast enough until proven (!) otherwise."

Tags pull you out of grey.

They impose a fixed structure for correlation. They encode assumptions about identity, ownership, and domain semantics into every write. They hardcode a worldview that might not hold tomorrow.

Tags Are Not Free

At first glance, tags seem harmless. Just a few fields to help with querying. But here's what they really do.

Tags must be attached at write time. That means the writer needs to know the correct subject of an event in advance. Which identity? Which domain concept? According to which rule?

You're committing to an interpretation before you've even seen what happens next. That's not just brittle; it's epistemologically backwards.

Tags are informal contracts. A tag like order_id = 123 or customer = alice-smith assumes those concepts are universal and stable. Over time, they become required conventions. Queries start to rely on them. If they’re missing or change format, things break.

It's schema drift, just without migrations.

To make tags useful, you need to index them. That means query paths, maintenance overhead, consistency rules. All of this, just to query facts that are already in the payload.

Why add a second layer of structure, when the payload already contains the truth?
Why trust a label, when you can inspect the fact?
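A small hypothetical illustration of that fragility: once the tag convention is missed on a single write, a tag-based query silently loses a fact that the payload still plainly states.

```python
# Two events recording the same kind of fact; one writer forgot the tag.
events = [
    {"type": "OrderPlaced",
     "payload": {"orderId": 123, "customer": "alice-smith"},
     "tags": {"customer": "alice-smith"}},
    {"type": "OrderPlaced",
     "payload": {"orderId": 456, "customer": "alice-smith"},
     "tags": {}},  # the informal tag contract was silently not honored
]

# Querying by tag misses the second event entirely:
by_tag = [e for e in events
          if e["tags"].get("customer") == "alice-smith"]

# Inspecting the payload finds both, because the fact was there all along:
by_fact = [e for e in events
           if e["payload"].get("customer") == "alice-smith"]
```

Nothing fails loudly here; the tag-based result is simply wrong, which is exactly how convention-based structure tends to break.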

A Note on Performance

Performance matters deeply. But it must be approached with precision, not assumption.

Premature optimizations like tagging for correlation often solve unproven problems while introducing hidden complexity.

If your system needs indexing, prove it with real workloads and real bottlenecks.

Until then, favor clarity over convenience.

In most systems, a JSON-based event store with introspectable payloads and projection-driven queries is fast enough until it's measurably not.

That's when you optimize. Not before.
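As a sketch of what projection-driven querying looks like (event and field names are made up for illustration): a projection is just a fold over the log that builds a read model when, and only when, a consumer needs it.

```python
# A small event log; order matters, structure does not.
log = [
    {"type": "OrderPlaced",  "payload": {"orderId": 1, "customer": "alice-smith"}},
    {"type": "OrderPlaced",  "payload": {"orderId": 2, "customer": "bob-jones"}},
    {"type": "OrderShipped", "payload": {"orderId": 1}},
]

def orders_per_customer(events):
    """Derive 'customer instances' from the facts -- a retrospective construct."""
    counts = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            customer = e["payload"]["customer"]
            counts[customer] = counts.get(customer, 0) + 1
    return counts

view = orders_per_customer(log)
```

If this projection ever becomes a measured bottleneck, it can be cached or materialized downstream, without the store itself ever having known about "customers".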

Want Tags? Model Them

Let's be clear: correlation is important. But the right place to model correlation is not in a metadata side-channel. It's in the stream itself.

If tagging matters in your domain, make it explicit. Treat it as behavior. Record it as an event.

Instead of tagging "customer: alice-smith", record:
CustomerWasTagged { customerId: "alice-smith", tag: "vip" }.

Now that information is part of the log: versioned, auditable, replayable. It reflects a real action, not an implicit assumption. You can track when a correlation was made, by whom, and in response to what.
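A minimal sketch of this approach (the event names beyond CustomerWasTagged are hypothetical): replaying the log reconstructs a customer's current tags, and every change of correlation remains a recorded, auditable fact.

```python
# Tagging modeled as behavior: each correlation is itself a recorded event.
log = [
    {"type": "CustomerRegistered",
     "payload": {"customerId": "alice-smith"}},
    {"type": "CustomerWasTagged",
     "payload": {"customerId": "alice-smith", "tag": "vip"}},
    {"type": "CustomerWasUntagged",
     "payload": {"customerId": "alice-smith", "tag": "vip"}},
    {"type": "CustomerWasTagged",
     "payload": {"customerId": "alice-smith", "tag": "loyal"}},
]

def current_tags(events, customer_id):
    """Replay the log to reconstruct a customer's tags at this point in time."""
    tags = set()
    for e in events:
        p = e["payload"]
        if p.get("customerId") != customer_id:
            continue
        if e["type"] == "CustomerWasTagged":
            tags.add(p["tag"])
        elif e["type"] == "CustomerWasUntagged":
            tags.discard(p["tag"])
    return tags

tags = current_tags(log, "alice-smith")
```

Because the history is in the log, you can also answer questions a write-time tag never could, such as "who was a VIP last quarter?"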

This approach keeps the event store honest. It avoids structural assumptions. It respects causality.

And it aligns with a foundational truth of event-sourced systems:

Principle of the Event Producer

The producer of events does not care about — nor take care of — event consumers.

Its sole responsibility is to faithfully record what happened. In that, the producer must be ego-less. It must not encode interpretation into the event.

That’s difficult because decisions must still be made: when to record, what to record, at what granularity.

But the goal is always the same: capture what happened, without collapsing it into what we think it means.

When producers start embedding their own interpretations (whether through tags, stream IDs, or inferred identities) they close off future possibilities. They decide too early what matters, and in doing so, they let potential information fall through the cracks.

That's why event sourcing works best when producers stay impartial and let meaning emerge later, when it's actually needed.

Let Event Stores Be Event Stores

An event store is not a document store. Not a blob store. Not a read-optimized table.

It is a log of what happened in the order it happened with no declared meaning beyond the facts it records.

The moment you start injecting tags, identities, and labels, you are no longer modeling behavior. You are modeling structure.

You are turning facts into opinions. You are asking the store to behave like a relational index, and that's not its job.

Let the event store be the source of truth and not the source of structure. Let correlation happen downstream. Let meaning emerge. Let consumers decide.

That’s the whole point: The job of the event store is simple: capture what happened. Faithfully. Durably. Transparently.

Everything else is interpretation.

Cheers!
