Why does marketing attribution fail even with sophisticated models?

Attribution models process whatever signal exists in the underlying data. If that data has gaps (missing touch points, broken cross-device identity, inconsistent event taxonomy across channels), the model amplifies those gaps into its output. A sophisticated model applied to fragmented data produces a more precisely wrong answer, not a more accurate one.

What is data unification in the context of attribution?

Data unification for attribution means: a consistent event taxonomy across all channels and platforms; a single canonical identity graph that resolves the same user across sessions, devices, and channels; complete channel coverage; agreed attribution windows applied consistently across all sources; and a data warehouse that is the single source of truth rather than an aggregation layer on top of platform-reported numbers.

What is identity resolution and why does it matter for attribution?

Identity resolution is the process of connecting the same customer's interactions across different sessions, devices, and channels into a single identity. Without it, a customer who encounters a brand on mobile, returns on desktop, and converts on a laptop appears as three separate users in the data. The converting session gets all the attribution weight; the earlier touch points are invisible. This systematically over-credits last-touch channels and under-credits channels that operate earlier in the journey.

Should you invest in a better attribution model or better data infrastructure first?

Data infrastructure first. Attribution model improvements compound on the data layer beneath them. A better model applied to fragmented data returns a more sophisticated description of the same underlying problem. A standard multi-touch model applied to clean, unified data produces outputs that are meaningfully more reliable than a data-driven model applied to incomplete identity resolution and inconsistent event tracking.

How does the deprecation of third-party cookies affect attribution?

Third-party cookies provided a cross-site identity mechanism that was imperfect but functional. Their deprecation pushes identity resolution towards first-party mechanisms: authenticated sessions, email matching, server-side tracking. Each of these is more reliable within its scope but covers less of the full customer journey. The practical result is that identity gaps widen for organisations that have not built a first-party data strategy, and attributing cross-channel journeys becomes structurally harder without that investment.

Why Marketing Attribution ROI Starts with Unified Data

Summary

The marketing attribution debate is usually framed as a question of which model to use. The more consequential question is whether the data those models run on is reliable enough to produce actionable outputs. In most organisations it is not, because identity resolution is broken, event taxonomy is inconsistent across channels, and no single source of truth exists. Until the data layer is unified, choosing between last-click and data-driven attribution is a debate about which lens to apply to an unreliable picture. The data layer investment has a higher return than the modelling layer investment, and it comes first.

The marketing attribution debate has been running for a long time. Last-click attribution is too simple: it credits the final interaction and ignores everything that preceded it. First-click misses the conversion context entirely. Multi-touch attribution is more honest about the journey but depends on assumptions about how credit should be divided. Data-driven attribution uses observed behaviour rather than assumed weights, which sounds better, until you ask what observed behaviour it is actually working with. Marketing mix modelling captures offline channels and long attribution windows that digital tracking cannot measure. Each position has genuine merit.

What the debate consistently skips is the question upstream of all of it: is the data these models are processing reliable enough to produce outputs you can act on? In every organisation I have worked with, across every sector and maturity level, the honest answer has been: partially. Often considerably less than partially.

Attribution models do not generate signal. They process signal that already exists in the data layer. When that layer has gaps — missing touch points, broken cross-device identity, inconsistent event definitions across platforms — the model processes those gaps alongside legitimate data and produces an output that looks like analysis. The problem is not that the output is obviously wrong. It is that it is structured, well-presented, and wrong in ways that the output itself cannot flag.

The Channel That Always Looks Good

The most consistent symptom of fragmented attribution data is the systematic over-crediting of last-touch channels. The mechanism is straightforward once you see it.

A customer encounters a brand for the first time through a display ad. They engage with a social post three days later. They see a retargeting ad at the weekend. They search by name and convert. The converting click is tracked. The prior touch points — depending on cross-device identity resolution, cookie consent state, and whether those channels use consistent user identifiers — may not be connected to the converting user in the data at all.

The result is that paid search, direct, and email — the channels most likely to capture an identifiable user at the end of the journey — look disproportionately effective. The channels that operated earlier in the journey, often the channels that created the demand being captured, look correspondingly less effective. The attribution model faithfully represents the data it received. The data was incomplete before the model ever saw it.

The channel that closes the deal looks better than it is. Every channel that opened the deal looks worse than it is.
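To make the mechanism concrete, here is a minimal sketch. The journey, the device IDs, the identity map, and the `linear_credit` helper are all hypothetical, and equal-split credit is just one simple multi-touch rule chosen for illustration. The same four touch points are scored twice: once with every device treated as its own user, and once after they are resolved to a single customer.

```python
from collections import defaultdict

# Hypothetical journey: one person, four touch points, three device IDs.
TOUCHES = [
    {"device_id": "phone-1", "channel": "display", "converted": False},
    {"device_id": "phone-1", "channel": "social", "converted": False},
    {"device_id": "tablet-7", "channel": "retargeting", "converted": False},
    {"device_id": "laptop-3", "channel": "paid_search", "converted": True},
]

# Hypothetical identity map that links all three devices to one customer.
RESOLVED = {"phone-1": "cust-42", "tablet-7": "cust-42", "laptop-3": "cust-42"}


def linear_credit(touches, identity_of):
    """Split each conversion equally across the converting user's touches."""
    journeys = defaultdict(list)
    for touch in touches:
        journeys[identity_of(touch["device_id"])].append(touch)

    credit = defaultdict(float)
    for journey in journeys.values():
        # Only journeys that contain a conversion hand out any credit.
        if any(t["converted"] for t in journey):
            share = 1.0 / len(journey)
            for t in journey:
                credit[t["channel"]] += share
    return dict(credit)


# Without identity resolution each device ID is treated as a separate user.
fragmented = linear_credit(TOUCHES, identity_of=lambda device_id: device_id)
# With resolution all four touches belong to the same journey.
unified = linear_credit(TOUCHES, identity_of=RESOLVED.get)

print(fragmented)  # {'paid_search': 1.0}
print(unified)     # each of the four channels receives 0.25
```

The model and the weighting rule are identical in both runs. Only the identity input changes, and that alone moves paid search from all of the credit to a quarter of it.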

This is not a new problem. It has been understood since the earliest days of digital attribution. It persists because the fix requires infrastructure investment that sits upstream of any attribution model, in territory that is less visible and less celebrated than a new attribution tool: data collection architecture, event standardisation, and identity resolution.

What Identity Resolution Actually Requires

The identity problem is harder than most discussions acknowledge, and the difficulty is structural: it comes from how journeys are spread across devices and sessions, not from a missing tool setting.

A customer who encounters your brand on a mobile browser, returns on a desktop browser, and converts using a laptop is, without identity resolution, three separate users in your data. Their customer journey is three separate single-session visits. The one that converted gets all the attribution credit. The other two are invisible. Their contribution to the outcome disappears.

Solving this requires connecting those three sessions into a single identity. Before third-party cookie deprecation, cross-site cookies provided a mechanism for this — imperfect, contested on privacy grounds, but functional across much of the open web. Their removal has pushed identity resolution towards first-party mechanisms: authenticated sessions, email-based matching, server-side event tracking that sends user identifiers alongside conversion events. Each of these is more reliable than third-party cookies within its scope. None of them covers the full journey as broadly.

Identity resolution is not a feature you configure. It is an infrastructure problem you solve incrementally, with imperfect coverage that improves as first-party data accumulates.

First-party identity requires that users authenticate, or at minimum provide an identifier such as an email address, at a point that can be connected back to their earlier anonymous sessions. That requires product design choices — where authentication is prompted, what value is offered in exchange — as well as data engineering choices: how identifiers are stored, how sessions are stitched, how the identity graph is maintained over time as email addresses change and devices turn over.
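One common way to stitch sessions is to treat every observed pairing of identifiers (for example, a cookie ID seen alongside an email at login) as an edge and group connected identifiers together. The sketch below does this with a small union-find. The `build_identity_graph` function, the identifier formats, and the sample links are assumptions made for illustration, not a description of any particular platform.

```python
def build_identity_graph(links):
    """Group identifiers that were observed together into one identity.

    `links` is an iterable of (identifier_a, identifier_b) pairs.
    Returns a dict mapping every identifier to a representative identifier.
    """
    parent = {}

    def find(identifier):
        parent.setdefault(identifier, identifier)
        while parent[identifier] != identifier:
            # Path halving keeps lookups short as the graph grows.
            parent[identifier] = parent[parent[identifier]]
            identifier = parent[identifier]
        return identifier

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    return {identifier: find(identifier) for identifier in list(parent)}


# Hypothetical login events: each pairs an anonymous cookie with an email.
LINKS = [
    ("cookie:abc", "email:ana@example.com"),  # login on mobile
    ("cookie:def", "email:ana@example.com"),  # login on desktop
    ("cookie:xyz", "email:ben@example.com"),  # a different customer
]

graph = build_identity_graph(LINKS)
same_person = graph["cookie:abc"] == graph["cookie:def"]
different_person = graph["cookie:abc"] != graph["cookie:xyz"]
print(same_person, different_person)  # True True
```

A cookie that never appears next to a first-party identifier never enters the graph at all, which is the coverage limit described above: the graph only improves as users authenticate.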

Organisations that have not made these investments are running attribution models on a partial identity graph. The model does not know it is working with partial data. It produces outputs calibrated to what it can see, and what it cannot see is structurally correlated with the channels you most need to understand.

The Unification Problem Is an Engineering Problem

Data unification for attribution is not a configuration task inside an analytics platform. It is a data engineering task, and its scope is usually larger than marketing teams estimate because the work happens in infrastructure rather than in dashboards.

Unified attribution data requires several things to be true simultaneously. A consistent event taxonomy: the same event names, the same parameter structures, the same conversion definitions applied across every channel and every platform. Without this, you cannot aggregate touch points across sources without introducing definitional inconsistency into the join. A single canonical identity graph: one record per customer, resolved across sessions, devices, and channels, maintained as the authoritative source rather than reconstructed at query time. Complete channel coverage: paid, owned, and earned touch points tracked in a way that allows them to be assembled into a coherent journey. Agreed attribution windows, applied consistently across all sources rather than each platform defaulting to its own definition. And a data warehouse that is the single source of truth — not an aggregation layer on top of platform-reported figures, but the layer where all events land and where identity resolution and taxonomy normalisation are applied.
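As a rough illustration of what a consistent event taxonomy means in practice, the sketch below maps source-specific event names onto one canonical name before any join happens. The mapping table, the source labels, and the canonical names are hypothetical; a real taxonomy would also normalise parameters and conversion definitions.

```python
# Hypothetical mapping from (source, raw event name) to a canonical name.
CANONICAL_EVENTS = {
    ("ga4", "purchase"): "order_completed",
    ("crm", "Deal Won"): "order_completed",
    ("ads_platform", "conversion_purchase"): "order_completed",
    ("ga4", "generate_lead"): "lead_submitted",
    ("crm", "Lead Created"): "lead_submitted",
}


def normalise(events):
    """Rename events to the canonical taxonomy and surface anything unmapped."""
    mapped, unmapped = [], []
    for event in events:
        key = (event["source"], event["name"])
        if key in CANONICAL_EVENTS:
            mapped.append({**event, "name": CANONICAL_EVENTS[key]})
        else:
            # Unmapped events are returned rather than dropped silently,
            # so taxonomy gaps stay visible.
            unmapped.append(event)
    return mapped, unmapped


RAW_EVENTS = [
    {"source": "ga4", "name": "purchase", "user": "cust-1"},
    {"source": "crm", "name": "Deal Won", "user": "cust-1"},
    {"source": "ads_platform", "name": "signup_v2", "user": "cust-2"},
]

mapped, unmapped = normalise(RAW_EVENTS)
print([e["name"] for e in mapped])    # ['order_completed', 'order_completed']
print([e["name"] for e in unmapped])  # ['signup_v2']
```

Returning the unmapped events separately is a deliberate choice in this sketch: an event that does not fit the taxonomy is a finding for the tracking audit, not something to discard.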

Most organisations have none of these fully in place. They have GA4 tracking some events, a CRM tracking others, a CDP that was deployed for email segmentation holding a third view of the customer, and paid channels reporting their own attributed conversions using their own attribution windows. An attribution model built on this patchwork is measuring the patchwork, not the customer journey.
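The attribution window point can be shown with a few lines. In this sketch the 30-day window, the timestamps, and the `touches_in_window` helper are assumptions for illustration. The same journey is filtered with one agreed window and then with a shorter one, standing in for a platform that applies its own default.

```python
from datetime import datetime, timedelta

AGREED_WINDOW = timedelta(days=30)  # assumption: the organisation's agreed lookback

# Hypothetical touch points for one resolved customer.
TOUCHES = [
    {"channel": "display", "at": datetime(2024, 2, 20)},
    {"channel": "social", "at": datetime(2024, 3, 10)},
    {"channel": "paid_search", "at": datetime(2024, 3, 30)},
]
CONVERTED_AT = datetime(2024, 3, 31)


def touches_in_window(touches, converted_at, window=AGREED_WINDOW):
    """Keep touches that happened before the conversion and inside the window."""
    return [
        touch
        for touch in touches
        if timedelta(0) <= converted_at - touch["at"] <= window
    ]


agreed = [t["channel"] for t in touches_in_window(TOUCHES, CONVERTED_AT)]
short = [
    t["channel"]
    for t in touches_in_window(TOUCHES, CONVERTED_AT, window=timedelta(days=7))
]
print(agreed)  # ['social', 'paid_search']
print(short)   # ['paid_search']
```

Two sources reporting on the same conversion with different windows will disagree about which channels took part, before any model is applied.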

Multi-Touch, MMM, and the Question They Cannot Answer Alone

Marketing mix modelling has seen significant renewed interest as a response to cookie deprecation and the limits of user-level digital attribution. The attraction is real: MMM works at an aggregate level, does not depend on individual user tracking, and can incorporate offline media and long attribution windows that digital touch-point models cannot capture. For organisations with significant offline spend or long consideration cycles, it is often the more appropriate modelling frame.

But MMM is not a substitute for unified digital data. It is a complement. MMM works by correlating spend patterns with outcome patterns at an aggregate level. The digital data it consumes needs to be consistent over time and across channels for the model to identify clean signal. Organisations that have fragmented digital tracking — where event definitions change, where GA4 migration introduced discontinuities, where channel measurement is inconsistent across quarters — feed those inconsistencies into the MMM as noise. The model cannot reliably separate the effect of a channel from the artefact of how that channel was measured.

A better attribution model applied to fragmented data produces a more sophisticated description of the same underlying problem.

The more accurate framing is that MMM and multi-touch attribution answer different questions and require different data inputs. MMM answers questions about channel-level investment efficiency over time. MTA answers questions about individual journey dynamics. Both are more valuable when the underlying data is unified and consistent. Neither substitutes for the data work.

What Unified Data Actually Unlocks

The practical implication of unified data is not just that attribution numbers become more accurate, though they do. It is that a set of analytical capabilities that were previously structurally impossible become viable.

Incrementality testing — the most defensible way to measure the true contribution of a channel rather than its correlation with conversion — requires a clean baseline. You need to know what conversion rate looks like in the absence of a channel before you can credibly measure what it looks like with it. That baseline is uncertain when the data layer is fragmented. You cannot run a reliable holdout test when the measurement is unreliable inside the holdout.
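The arithmetic behind a holdout test is simple, which is why the quality of the baseline matters so much. The sketch below uses invented numbers and a hypothetical `incremental_lift` helper. It computes lift once with a fully measured holdout and once assuming a quarter of the holdout's conversions went untracked.

```python
def incremental_lift(treated_conversions, treated_users,
                     holdout_conversions, holdout_users):
    """Compare conversion rates between an exposed group and a holdout group."""
    treated_rate = treated_conversions / treated_users
    baseline_rate = holdout_conversions / holdout_users
    return {
        "baseline_rate": baseline_rate,
        "treated_rate": treated_rate,
        "absolute_lift": treated_rate - baseline_rate,
        "relative_lift": (treated_rate - baseline_rate) / baseline_rate,
    }


# Invented numbers: 10,000 users in each group.
clean = incremental_lift(300, 10_000, 200, 10_000)

# Same test, but assume 50 of the 200 holdout conversions were never
# connected to a holdout user because of tracking or identity gaps.
under_measured = incremental_lift(300, 10_000, 150, 10_000)

print(round(clean["relative_lift"], 2))
print(round(under_measured["relative_lift"], 2))
```

With these invented figures the measured relative lift roughly doubles, from about 50% to about 100%, purely because the baseline was under-counted. The channel did nothing differently between the two calculations.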

Budget optimisation modelling requires outputs you can act on with confidence. Attribution outputs are used to make allocation decisions: more budget here, less there. If the outputs are calibrated to a partial identity graph, the optimisation compounds the identity gap into the allocation decision. You are optimising for what you can see, and what you cannot see is not randomly distributed.

Customer lifetime value modelling, retention analysis, cohort comparisons — all of these depend on being able to connect the acquisition channel to the subsequent customer behaviour. Without identity resolution, the acquisition-to-behaviour link breaks. The data to answer "which acquisition channel produces the highest lifetime value customers?" does not exist in fragmented form.
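A lifetime-value-by-channel calculation is, at its core, a join between acquisition records and later revenue on a shared customer identifier. The sketch below shows that join with invented data. The `ltv_by_channel` function and the `customer_id` key are hypothetical, and the calculation is a plain average of revenue per acquired customer rather than a full LTV model.

```python
from collections import defaultdict


def ltv_by_channel(acquisition, orders):
    """Average revenue per acquired customer, grouped by acquisition channel.

    `acquisition` maps customer_id -> acquisition channel.
    `orders` is a list of {"customer_id": ..., "amount": ...} records.
    """
    revenue = defaultdict(float)
    for order in orders:
        revenue[order["customer_id"]] += order["amount"]

    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, channel in acquisition.items():
        # Customers with no matching orders contribute zero revenue.
        totals[channel] += revenue.get(customer_id, 0.0)
        counts[channel] += 1

    return {channel: totals[channel] / counts[channel] for channel in totals}


# Invented data keyed on a resolved customer identifier.
ACQUISITION = {"cust-1": "display", "cust-2": "paid_search", "cust-3": "paid_search"}
ORDERS = [
    {"customer_id": "cust-1", "amount": 120.0},
    {"customer_id": "cust-1", "amount": 80.0},
    {"customer_id": "cust-2", "amount": 50.0},
    {"customer_id": "cust-3", "amount": 30.0},
]

result = ltv_by_channel(ACQUISITION, ORDERS)
print(result)  # {'display': 200.0, 'paid_search': 40.0}
```

Everything here depends on `customer_id` meaning the same person in both inputs. If the acquisition touch and the later orders sit under different identifiers, the join returns zero revenue for that customer and the channel's value is understated.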

Sequencing the Investment

The practical sequence for improving attribution ROI is: data layer first, modelling layer second. This runs against the usual instinct, which is to adopt a new attribution platform because it is visible, it produces outputs quickly, and it creates the impression of progress. A new modelling tool applied to fragmented data produces new-looking outputs with the same underlying reliability problem.

The data layer investment has three components. The first is a tracking audit: map every touch point in the customer journey, identify every gap in event coverage, assess where identity resolution currently breaks down and what user volume is affected. This produces a prioritised list of the gaps whose closure would most change the attribution picture. The second is server-side tracking for high-value conversion events. This reduces the platform dependency that creates both data gaps and attribution window inconsistency, and establishes a first-party event record that survives browser privacy restrictions. The third is a data warehouse as the single source of truth: not an aggregation layer, but a warehouse where all events land, where identity resolution is applied, and where the attribution model runs against a unified, normalised data set.
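The output of the tracking audit can be as simple as a scored list. The sketch below ranks gaps by affected user volume multiplied by a relevance weight for the decisions currently being made. The gap names, volumes, weights, and the scoring rule itself are assumptions for illustration; any real audit would choose its own criteria.

```python
# Hypothetical audit findings. `decision_relevance` is a 0-1 judgement of how
# much the gap affects the budget decisions currently on the table.
GAPS = [
    {"gap": "inconsistent lead event name", "affected_users": 5_000, "decision_relevance": 0.7},
    {"gap": "missing social click IDs", "affected_users": 60_000, "decision_relevance": 0.4},
    {"gap": "cross-device login stitch", "affected_users": 40_000, "decision_relevance": 0.9},
]


def prioritise(gaps):
    """Order gaps by affected volume weighted by decision relevance."""
    return sorted(
        gaps,
        key=lambda gap: gap["affected_users"] * gap["decision_relevance"],
        reverse=True,
    )


ranked = [gap["gap"] for gap in prioritise(GAPS)]
print(ranked)
# ['cross-device login stitch', 'missing social click IDs', 'inconsistent lead event name']
```

Note that the largest gap by volume does not come first here. Weighting by decision relevance is what keeps the fix list tied to the allocation questions the organisation is actually trying to answer.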

This work does not produce a dashboard on day one. It produces a foundation. The attribution model that runs on that foundation is not just more accurate. It is interpretable in a way that fragmented data cannot support. When it says a channel is over-invested, the number means something. That is what attribution ROI actually requires — not a more elegant model, but reliable inputs.

Key Takeaways

Attribution models process the data they receive. They cannot compensate for gaps in identity resolution or inconsistencies in event taxonomy — they incorporate those gaps into their outputs.
The systematic over-crediting of last-touch channels is a data problem, not a modelling problem. It occurs because earlier touch points are not connected to the converting user in the identity graph.
Identity resolution requires first-party mechanisms post-cookie: authenticated sessions, email-based matching, server-side tracking. Each covers less of the journey than third-party cookies did, requiring deliberate product and engineering choices to extend coverage.
Marketing mix modelling does not substitute for unified digital data. It complements it. MMM run on inconsistent digital data absorbs measurement artefacts as if they were channel effects.
Incrementality testing and LTV-by-channel analysis are structurally dependent on unified data. They are not just more accurate with better data — they are not meaningful without it.
The investment sequence is: tracking audit, server-side conversion tracking, warehouse as single source of truth. Data layer before modelling layer. The return on the data investment is higher because model improvements compound on the data below them.

FAQ

We already use a data-driven attribution model in GA4. Isn’t that enough?
GA4’s data-driven attribution model is better than rules-based alternatives within the GA4 data set. The limitation is the data set itself: it covers only the sessions and conversions that GA4 can track within its consent and browser constraints, using Google’s own identity mechanisms. It does not resolve identity across devices outside the Google ecosystem, does not incorporate channels that do not report to GA4, and uses Google’s own attribution windows. It is a good tool for what it can see. Understanding what it cannot see is the more useful investment.

How do we prioritise which data gaps to close first?
Start with the gaps that most affect the decisions you are actually making. If paid search and brand spend allocation is the active budget question, prioritise closing the identity and attribution window inconsistencies that affect those channels. If the question is upper-funnel channel efficiency, prioritise the touch points that operate early in the journey where identity resolution is most likely to break down. A tracking audit that maps gap volume to decision relevance produces a prioritised fix list more efficiently than addressing everything equally.

Does server-side tracking solve the identity problem?
Server-side tracking solves one part of it: the loss of conversion signal due to browser privacy restrictions, ad blockers, and cookie consent declines. It does not automatically resolve cross-device or cross-session identity. For that, you still need a first-party identifier, typically an email address or authenticated user ID, that can connect server-side conversion events to the prior anonymous sessions. Server-side tracking is a prerequisite for complete conversion measurement; identity resolution requires it plus a first-party data strategy.

Is this argument against adopting a new attribution platform?
Not categorically. Attribution platforms differ significantly in their identity resolution capabilities, their ability to ingest first-party data, and their attribution modelling approaches. The argument is that the data layer work is not optional and cannot be deferred until after the platform decision. Evaluate platforms partly on how well they support first-party identity resolution, whether they can consume a warehouse-level identity graph, and whether they produce auditable outputs you can trace back to the underlying data.

Theo Valmis

© Theo Valmis