35%
reduction in ad spend wastage
Multi-Touch AI Attribution Engine Cutting Ad Wastage by 35% for Pixis
Pixis
35%
Ad wastage reduction
5M+
Events processed daily
6wks
Full platform delivery
3
Ad platforms integrated
Context
The business context
Attribution is one of marketing's oldest unsolved problems. Every platform claims credit. Last-click models give all credit to whoever closes the conversion. First-click models over-reward awareness channels. Multi-touch models exist in theory but most implementations are correlation machines - they tell you what happened, not why. Pixis was building AI marketing products for enterprise clients and needed an attribution engine that was both accurate and explainable - because marketing directors need to defend budget decisions in board meetings, not just in dashboards.
The problem
5 specific problems that needed solving
Last-click attribution gave 100% conversion credit to retargeting and branded search - channels that were intercepting organic conversions rather than creating them
Marketing teams were cutting upper-funnel spend based on attribution data that made those channels look like wasted budget
No existing model could disentangle channels that were genuinely driving incremental conversions from those cannibalising organic
Attribution reports were black boxes - clients trusted the numbers but couldn't explain the methodology to internal stakeholders
Each ad platform's native attribution tool used different windows, different counting logic, and different attribution models - making cross-platform comparison meaningless
Our approach
Causal inference, not just correlation.
Standard multi-touch attribution assigns fractional credit based on touchpoint position or frequency - but position and frequency aren't causation. A user who clicks a branded search ad two minutes before purchasing wasn't converted by that ad; they were already going to buy. We used Shapley values - a method from cooperative game theory for fairly distributing credit among contributing players - to model what each touchpoint genuinely contributed to the probability of conversion, controlling for what would have happened without it. This gave Pixis a model they could explain mathematically to clients, not just show as a black-box score.
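For reference, this is the standard Shapley value definition the bullets below apply to attribution - where N is the set of touchpoints in a journey and v(S) is read as the modelled conversion probability given only the touchpoints in coalition S (that reading of v is our assumption about how the value function is defined here):

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \bigl( v(S \cup \{i\}) - v(S) \bigr)
```

Each channel's credit is its marginal lift, v(S ∪ {i}) − v(S), averaged over every order in which the other touchpoints could have arrived.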
Shapley value attribution: each channel's credit is its marginal contribution to conversion probability, averaged across all possible orderings of the customer journey
Holdout experiment framework: regular geo-split tests validate the model's incremental contribution estimates against real-world holdout groups
Per-client model training: each Pixis client gets a model fine-tuned on their own conversion data, not a generic industry model
Weekly retraining pipeline: campaign creative, audience, and seasonal effects shift constantly - the model retrains on a rolling 90-day window to stay calibrated
What we built
A real-time attribution pipeline processing millions of events per day
The platform consists of an event ingestion layer, an attribution computation engine, and a Next.js reporting dashboard. The event ingestion layer connects to the Google Ads API, Meta Marketing API, and TikTok Ads API via a custom ETL pipeline built on Apache Spark and Airflow. Raw conversion events are unified into a single event schema (solving cross-platform discrepancies in event naming and timing). The attribution engine applies the Shapley-value model to each completed conversion journey, producing per-channel credit scores that feed into the reporting dashboard. Every model version is tracked in MLflow, with automatic retraining triggered every seven days.
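As an illustration of the unified event schema, here is a minimal sketch of what the canonical record could look like - the field names and types are our assumptions, not Pixis's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class CanonicalEvent:
    """One ad interaction or conversion, unified across source platforms."""
    event_id: str
    user_id: str                        # resolved cross-platform identity
    platform: str                       # "google_ads" | "meta" | "tiktok"
    channel: str                        # normalised channel label, e.g. "paid_search"
    event_type: str                     # "impression" | "click" | "view_through" | "conversion"
    occurred_at: datetime               # normalised to UTC
    conversion_value: Optional[float] = None
    attribution_window_days: int = 30   # harmonised window applied downstream
```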
Event unification layer
Custom ETL normalises events from Google, Meta, and TikTok into a single canonical event schema - resolving discrepancies in click attribution windows, view-through counting logic, and conversion event naming across platforms. This alone eliminated the 'attribution gap' that made cross-platform reporting meaningless.
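A sketch of what one platform adapter in that layer might look like, mapping a raw Meta record onto the CanonicalEvent sketch above - the raw field names and the action-type mapping are illustrative assumptions, not the real Meta payload contract:

```python
from datetime import datetime, timezone

# Reuses the CanonicalEvent dataclass from the schema sketch above.

_EVENT_TYPE_MAP = {
    "offsite_conversion.fb_pixel_purchase": "conversion",   # illustrative action types
    "link_click": "click",
    "view_content": "view_through",
}

def normalise_meta_event(raw: dict) -> CanonicalEvent:
    """Map one raw Meta Marketing API record onto the canonical schema.

    The raw field names below are assumptions; the real payload shape depends
    on the reporting endpoint and action breakdowns requested.
    """
    return CanonicalEvent(
        event_id=raw["id"],
        user_id=raw["resolved_user_id"],
        platform="meta",
        channel="paid_social",
        event_type=_EVENT_TYPE_MAP.get(raw["action_type"], "impression"),
        occurred_at=datetime.fromisoformat(raw["event_time"]).astimezone(timezone.utc),
        conversion_value=raw.get("value"),
    )
```

Equivalent adapters for Google and TikTok would map onto the same schema, which is what makes cross-platform credit comparable downstream.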
Shapley attribution engine
For each completed conversion journey, the engine computes each touchpoint's contribution using Shapley values - a mathematically fair credit distribution from cooperative game theory. Runs on Apache Spark to process 5M+ events per day within a 4-hour computation window.
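A simplified, single-journey sketch of that computation, assuming the trained model exposes a conversion-probability function over sets of channels (conversion_prob is a hypothetical callable; exhaustive enumeration is only viable because the number of distinct channels in one journey is small):

```python
from collections import defaultdict
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence

def shapley_credit(
    channels: Sequence[str],
    conversion_prob: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Average each channel's marginal lift in conversion probability
    over every possible ordering of the journey's touchpoints."""
    credit: Dict[str, float] = defaultdict(float)
    orderings = list(permutations(channels))
    for ordering in orderings:
        seen = set()
        prev = conversion_prob(frozenset())      # baseline: no paid touchpoints
        for channel in ordering:
            seen.add(channel)
            current = conversion_prob(frozenset(seen))
            credit[channel] += current - prev    # marginal contribution in this ordering
            prev = current
    return {channel: total / len(orderings) for channel, total in credit.items()}
```

In production this sits inside the Spark job over completed journeys; Monte Carlo sampling of orderings is the usual fallback if the channel count grows beyond what exhaustive enumeration can handle.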
Incremental validation framework
Monthly geo-split holdout tests measure the actual incremental impact of each major channel. Results are fed back into the model as calibration signals, ensuring the Shapley estimates track real-world incrementality rather than drifting over time.
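A minimal sketch of the lift estimate a geo-split test produces, which can then be compared against the Shapley credit the model assigned to the same channel over the same window - the column names are assumptions:

```python
import pandas as pd

def geo_split_lift(geo_results: pd.DataFrame) -> float:
    """Estimate a channel's incremental conversion rate from a geo-split holdout test.

    Expects one row per geo with (hypothetical) columns:
      group        -- "exposed" (channel live) or "holdout" (channel paused)
      conversions  -- conversions observed during the test window
      population   -- addressable audience size in that geo
    """
    exposed = geo_results[geo_results["group"] == "exposed"]
    holdout = geo_results[geo_results["group"] == "holdout"]
    exposed_rate = exposed["conversions"].sum() / exposed["population"].sum()
    holdout_rate = holdout["conversions"].sum() / holdout["population"].sum()
    # The channel's incremental contribution over the organic baseline.
    return float(exposed_rate - holdout_rate)
```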
Explainability layer
Every attribution report includes a methodology note, channel-level confidence intervals, and a 'what changed' comparison to the previous model version - so marketing directors can defend the numbers to CFOs and media agencies.
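A sketch of the per-channel payload that could back such a report - the field names are illustrative, not the actual dashboard contract:

```python
from dataclasses import dataclass

@dataclass
class ChannelAttributionReport:
    """One channel's entry in an explainable attribution report."""
    channel: str
    credit_share: float            # Shapley credit as a share of attributed conversions
    ci_low: float                  # lower bound of the confidence interval
    ci_high: float                 # upper bound of the confidence interval
    previous_credit_share: float   # same figure from the prior model version
    model_version: str             # model run that produced this estimate
    methodology_note: str          # plain-language explanation for stakeholders
```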
Automated retraining pipeline
An Airflow DAG triggers weekly model retraining on a rolling 90-day conversion window. MLflow tracks every model version, enabling rollback if a retrain degrades accuracy on the validation dataset.
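A sketch of how that trigger could be wired up, assuming Airflow 2.x - the DAG id, experiment name, and the body of the retraining callable are assumptions; only the rolling 90-day window and weekly schedule come from the build described above:

```python
from datetime import datetime, timedelta

import mlflow
from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_attribution_model(**context):
    """Retrain on a rolling 90-day conversion window and log the run to MLflow."""
    window_end = context["logical_date"]
    window_start = window_end - timedelta(days=90)
    mlflow.set_experiment("attribution-shapley")   # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_param("window_start", window_start.isoformat())
        mlflow.log_param("window_end", window_end.isoformat())
        # ... load unified events for the window, refit the conversion-probability
        # model, evaluate on the held-out validation set, and log the artefacts ...

with DAG(
    dag_id="attribution_weekly_retrain",           # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="retrain_model",
        python_callable=retrain_attribution_model,
    )
```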
Impact
What changed in production
The 35% wastage reduction was measured against actual client spend reallocation following the first 90 days of model output - not a theoretical estimate.
35% reduction in ad spend wastage in 90 days. Platform now processes 5M+ events per day. Used as a core differentiator in Pixis's client-facing product.
“We finally have an attribution model we can defend in a budget meeting. It changed how our clients think about performance marketing entirely - and it's now a core part of our product differentiation.”
Head of Data Science - Pixis
Learnings
What we took away from this project
Explainability is a commercial requirement, not an engineering nice-to-have
Every technically accurate attribution model we'd seen in the market was a black box. Pixis's clients needed to walk into budget reviews and explain why they were reallocating spend away from channels that appeared to be 'working' under last-click. The Shapley value framing gave them a mathematical narrative: 'This channel's marginal contribution, when we account for what would have happened without it, is X.' That made the model commercially viable in a way that a higher-accuracy black box would not have been.
Cross-platform event unification is underestimated
We scoped the event unification layer as two weeks of work. It took four. Every platform uses different attribution windows, different conversion event schemas, and different logic for view-through vs click-through. Building a truly unified event model - one where a conversion in Google and a conversion in Meta mean the same thing - required significantly more mapping work than expected. This is the unsexy part that makes everything else work.
Weekly retraining is necessary, but needs guardrails
Campaign creative, audiences, and seasonality shift constantly. A model trained in January will be miscalibrated by March. But automated retraining without quality gates can also introduce degradation silently. We built an automated validation gate that compares new model performance on a held-out test set against the previous version - and blocks deployment if accuracy declines, alerting the team for manual review instead.
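A sketch of what that gate could reduce to in code - the metric, threshold, and the promote/alert helpers are assumptions for illustration:

```python
def should_deploy(candidate_auc: float, production_auc: float,
                  tolerance: float = 0.005) -> bool:
    """Block deployment if the retrained model is meaningfully worse
    than the current production model on the held-out test set."""
    return candidate_auc >= production_auc - tolerance

# Usage in the retraining pipeline (illustrative helpers):
# if should_deploy(new_metrics["holdout_auc"], prod_metrics["holdout_auc"]):
#     promote(new_model_version)
# else:
#     alert_team("Retrain blocked: holdout accuracy regressed", new_metrics)
```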
Build something similar?
We've solved this category of problem before. Let's scope yours.
Start a conversation · View related service