
Does Google Demand Gen Work for E-Commerce? A Forensic Study and a Diagnostic Framework

Original research by ADWService — a Google Premier Partner PPC agency for e-commerce (top-30 in Ukraine). Author: Yana Lyashenko, Google Ads AI Architect.

Google reps often pitch Demand Gen “for growth” and say it needs at least ~$100/day to work. We spent 18 months analyzing whether it actually adds incremental sales (extra sales that wouldn’t happen without it) on a real low-margin store. Short answer: not the way it’s usually run. Below is a simple way to tell whether it will work for you.

Who this is for: owners and marketers of low-margin e-commerce being pitched Demand Gen. Who it’s not for: premium brands with a long buying cycle — Demand Gen behaves differently there.

Research snapshot


Question	Did Demand Gen create incremental sales and improve Performance Max?
Business type	Low-margin e-commerce with a short path to purchase (Profile A)
Data	~18 months (late 2024 – mid 2026); account, campaign, and product-group level; 7 sources (campaign performance, conversions by action, product feed, auction insights, GA4 paths, GA4 attribution models, Google Trends)
Design	Observational case study (one account), component-level decomposition
Main result	In this account, no statistically significant effect of Demand Gen on conversion rate, sales, or lower-campaign attribution value
What transfers to other accounts	The diagnostic method — not the final verdict by default

How to read the evidence tags. Each key claim is tagged: [Google docs] — confirmed by Google documentation; [Case] — found in this account; [ADW heuristic] — an ADWService working rule, pre-experiment; [Hypothesis] — needs a future test. This deliberately separates fact, interpretation, and forecast.

TL;DR

In an ADWService forensic analysis of one low-margin, low-consideration e-commerce account (~18 months), we found no statistically significant effect of Demand Gen on conversion rate, sales, or lower-campaign attribution value. [Case] The apparent “lift” came from market growth, tighter bids, and feed changes — credited to Demand Gen by mistake. [Case] From this we built a diagnostic framework (Consideration / Margin / Measurement gates) that needs validation on other accounts. [ADW heuristic] No effect in one observational case does not prove Demand Gen is useless everywhere.

Key numbers (ADWService analysis, one account, ~18 months):

Demand Gen direct ROAS was 1.35 vs a break-even ROAS of 5.0–6.7 (it returned $1.35 in revenue per $1 of spend when it needed at least $5 just to break even)
95% of purchases happened within 1 day of the last interaction; 0.42 days on average
Demand Gen’s effect on Performance Max conversion rate was statistically ≈0 (p ≈ 0.68)
Switching last-click → data-driven attribution shifted credit by 0.00% account-wide
High-value orders were 7% of orders by count but ~40% of revenue
Demand Gen’s own buyers had a 22% lower average order value than Shopping buyers

Glossary

Demand Gen — an upper-funnel Google Ads format on YouTube/Discover/Gmail that uses lookalike audiences to create demand. [Google docs]
Consideration — the thinking phase between first contact and purchase; measured by days-to-conversion and number of touchpoints.
Single-touch conversion — a conversion with only one touchpoint in the path (no assists).
Contribution margin — revenue left after variable costs; sets how much you can spend on marketing.
Break-even ROAS — 1 / margin; the ROAS below which ads lose money.
Holdout — a control group (regions/audience) without the channel, used to measure incrementality.
Incrementality — sales that happened because of the channel and would not have happened without it.
Mix effect — a change in an aggregate number caused by shifting weight between segments, not by change within them.
Simpson’s paradox — when the aggregate moves one way while every component moves the other (because of weight shifts).
Customer seed — the customer list a lookalike audience is built from.
High-value customer — a buyer with an order above a set threshold (here, the top price tier).

ADWService frameworks (definitions)

Margin Profiles A/B/C — an ADWService method that sets Demand Gen budget by contribution margin: A (<25%) — organic; B (30–40%) — 12–18%; C (50%+) — 20–26% of paid budget.
The Consideration-Profile Gate — an ADWService diagnostic that uses 5 path metrics (days to conversion, touchpoints, % single-touch, brand search, attribution-model shift) to decide whether Demand Gen earns a paid budget on a given account.
The Measurement-Readiness Gate — an ADWService check: does the account have a holdout and new-vs-returning tracking, so Demand Gen’s effect can even be measured before launch?
The Demand Gen Misattribution Trap — an ADWService term for crediting Demand Gen with growth actually caused by the market, bids, or feed changes that happened at the same time.
The Holdout Rule — an ADWService rule: Demand Gen’s incrementality is proven only by a holdout experiment, never by ROAS or attribution models.

Part I. What the forensic analysis showed

This describes what happened in one account — not a rule for yours.

Why it first looked like Demand Gen worked

After Demand Gen launched, account conversions grew ~5–6× year-over-year, and conversion rate seemed to rise — so “Demand Gen works” looked obvious. [Case] This is how the Misattribution Trap happens: change ten levers at once, watch results climb, and the brain credits the channel you believe in most. [ADW heuristic] Decomposition then broke that story apart.

What decomposition showed (Simpson’s paradox)

The apparent conversion-rate “lift” was a mix effect, not a real gain inside campaigns. [Case] Individual campaigns’ rates barely moved; the aggregate rose only because spend shifted toward higher-converting products. This is the weight-shift mechanism behind Simpson’s paradox: the aggregate tells a different story from its components. At the account level, the Demand Gen coefficient on conversion rate, after controlling for spend and seasonality, was statistically indistinguishable from zero (p ≈ 0.68). [Case]
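The decomposition can be sketched as a standard shift-share split. This is a minimal illustration, assuming per-segment clicks and conversions for a before and an after period, with every segment present in both; the `Segment` class, the `decompose_rate_change` function, and the two-segment numbers are hypothetical, not the account’s data or ADWService’s exact method. It splits the change in aggregate conversion rate into a within part (rates changing at fixed weights) and a mix part (weights changing at fixed rates).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    clicks: float
    conversions: float

    @property
    def rate(self) -> float:
        return self.conversions / self.clicks


def decompose_rate_change(before: dict[str, Segment],
                          after: dict[str, Segment]) -> dict[str, float]:
    """Split the aggregate conversion-rate change into within and mix effects.

    Identity: CR1 - CR0 = sum(w1 * (r1 - r0)) + sum((w1 - w0) * r0),
    where w is a segment's share of clicks and r its conversion rate.
    Assumes `before` and `after` contain the same segment names.
    """
    total0 = sum(s.clicks for s in before.values())
    total1 = sum(s.clicks for s in after.values())
    within = mix = 0.0
    for name in before:
        w0 = before[name].clicks / total0
        w1 = after[name].clicks / total1
        r0, r1 = before[name].rate, after[name].rate
        within += w1 * (r1 - r0)     # rate change inside the segment
        mix += (w1 - w0) * r0        # weight shift between segments
    agg0 = sum(s.conversions for s in before.values()) / total0
    agg1 = sum(s.conversions for s in after.values()) / total1
    return {"aggregate_change": agg1 - agg0, "within": within, "mix": mix}


# Hypothetical example: both segments keep their own rates (5% and 1%),
# but clicks move toward the high-converting segment.
before = {"high_cr": Segment(1000, 50), "low_cr": Segment(3000, 30)}
after = {"high_cr": Segment(3000, 150), "low_cr": Segment(1000, 10)}
result = decompose_rate_change(before, after)
print(result)
```

With both segment rates flat, `within` is zero and `mix` carries the whole two-point rise in the aggregate rate, which is the pattern the case describes.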

Image takeaway: Demand Gen acts at the top (consideration); the sale closes and gets credited lower down (PMax / Shopping / brand). That’s why its direct ROAS is always low and is not a measure of its value.

What actually explained the growth

The real efficiency jump happened months before Demand Gen launched, and it tracked a feed and structure overhaul, not the new channel. [Case] Two more facts: impression share held at ~76–81% while spend scaled, so the account grew with an expanding auction pool rather than by saturating it; and a later efficiency drop came from cannibalization caused by over-segmentation (the account was split into ~30 campaigns, and the new campaigns’ share of conversions rose from 18% to 73% while total conversions stayed flat). [Case] None of this was Demand Gen.

Why attribution didn’t confirm a Demand Gen contribution

Switching from last-click to a data-driven model shifted credit by 0.00% account-wide, and gave Demand Gen just +0.5 conversions. [Case] When a model that is built to reward assists moves nothing, assists — including Demand Gen’s — carry no re-attributable value on this single-touch account. [Case] In GA4, Demand Gen sits under Cross-network (with Performance Max), not under Display — so it is invisible at the channel level and must be analyzed by campaign. [Google docs]
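The source does not say exactly how the 0.00% attribution-model shift is computed. One plausible definition, assumed here, is the share of total conversion credit that lands on a different channel when you switch models; the function name and the channel labels below are hypothetical.

```python
def attribution_shift(model_a: dict, model_b: dict) -> float:
    """Share of total conversion credit that changes channel between two models.

    Assumes both models distribute the same total number of conversions.
    Each moved conversion is counted once as lost and once as gained,
    hence the division by 2.
    """
    channels = set(model_a) | set(model_b)
    moved = sum(abs(model_b.get(c, 0.0) - model_a.get(c, 0.0)) for c in channels) / 2
    return moved / sum(model_a.values())


# Hypothetical credit per channel under two attribution models.
last_click = {"Channel A": 100.0, "Channel B": 50.0}
data_driven = {"Channel A": 90.0, "Channel B": 60.0}
print(f"{attribution_shift(last_click, data_driven):.2%}")
```

On a single-touch account both models see the same one-step paths, so the credit per channel is identical and this measure comes out at zero.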

Part II. How to decide whether to run Demand Gen on your account

This is the decision framework — the part that transfers.

Image takeaway: before you run Demand Gen, get a “yes” on three questions — margin above 25%, a path to purchase longer than a day with several touchpoints, and a way to measure the effect. Any “no” and Demand Gen waits.

Does Demand Gen improve Performance Max?

In an ADWService analysis of one e-commerce account, launching Demand Gen had no statistically significant link to Performance Max conversion rate after controlling for spend, seasonality, and traffic-mix changes (~18 months of data). [Case] The visible rise in aggregate conversion rate came from budget shifting to higher-converting products, not from a better rate inside campaigns. This result is specific to one low-consideration account and is not universal proof that Demand Gen fails for everyone.
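A regression of this kind can be sketched with NumPy least squares. This is a minimal illustration, assuming weekly rows and a simple sine/cosine term for annual seasonality; the function name, the choice of controls, and the synthetic series are hypothetical, and the source does not specify the exact model behind the p ≈ 0.68 result. The sketch reports a coefficient and its standard error; a p-value would come from a t distribution (for example via SciPy), which is left out to keep the example dependency-light.

```python
import numpy as np


def dg_effect(conv_rate, dg_on, spend, week_of_year):
    """OLS of conversion rate on a Demand Gen on/off dummy plus controls.

    Controls (illustrative): spend and a sine/cosine annual seasonality.
    Returns (coefficient on the Demand Gen dummy, its standard error).
    """
    y = np.asarray(conv_rate, dtype=float)
    angle = 2 * np.pi * np.asarray(week_of_year, dtype=float) / 52.0
    X = np.column_stack([
        np.ones_like(y),                 # intercept
        np.asarray(dg_on, dtype=float),  # 1 while Demand Gen is live, else 0
        np.asarray(spend, dtype=float),  # spend control
        np.sin(angle),                   # annual seasonality, sine term
        np.cos(angle),                   # annual seasonality, cosine term
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classic OLS covariance matrix
    return float(beta[1]), float(np.sqrt(cov[1, 1]))


# Hypothetical 78-week series with no true Demand Gen effect, plus noise.
rng = np.random.default_rng(0)
weeks = np.arange(78)
week_of_year = weeks % 52
dg_on = (weeks >= 40).astype(float)
spend = 1000.0 + 20.0 * weeks
rate = (0.02 + 1e-6 * spend
        + 0.003 * np.sin(2 * np.pi * week_of_year / 52)
        + rng.normal(0.0, 0.002, weeks.size))
coef, se = dg_effect(rate, dg_on, spend, week_of_year)
print(f"Demand Gen coefficient {coef:.5f}, SE {se:.5f}, t = {coef / se:.2f}")
```

In this synthetic series, spend and the Demand Gen flag both rise over time, as they usually do in real accounts. That overlap is one reason an observational coefficient is weaker evidence than a holdout.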

When not to run Demand Gen — the Consideration-Profile Gate

Demand Gen works where there is something to warm — a long path to purchase, several touchpoints, and existing brand demand. It is wasted on instant, single-touch, brand-less purchases. [ADW heuristic] We call this diagnostic the Consideration-Profile Gate. The signals come from a GA4 conversion-paths export.
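The first three signals can be computed from an exported path table. A minimal sketch, assuming each exported row carries days to conversion, number of touchpoints, and a conversion count; the `PathRow` field names are hypothetical, so map them to whatever columns your export actually contains. Averages are weighted by conversions.

```python
from dataclasses import dataclass


@dataclass
class PathRow:
    days_to_conversion: float   # hypothetical field names
    touchpoints: int
    conversions: float


def path_metrics(rows: list) -> dict:
    """Conversion-weighted path signals for the Consideration-Profile Gate."""
    total = sum(r.conversions for r in rows)
    return {
        "avg_days": sum(r.days_to_conversion * r.conversions for r in rows) / total,
        "avg_touchpoints": sum(r.touchpoints * r.conversions for r in rows) / total,
        "single_touch_share": sum(r.conversions for r in rows if r.touchpoints == 1) / total,
        "within_1_day_share": sum(r.conversions for r in rows if r.days_to_conversion <= 1) / total,
    }


# Hypothetical export: 60 same-day single-touch conversions, 30 next-day, 10 slower.
rows = [PathRow(0, 1, 60), PathRow(1, 2, 30), PathRow(5, 4, 10)]
print(path_metrics(rows))
```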

Image takeaway: the studied account sat in the “wasted” zone on four of five signals (touchpoints, at 2.16, fell between the two thresholds) and scored 8/100, so Demand Gen is likely wasted on this profile.

Signal	DG worth testing	DG likely wasted	Studied account
Avg days to conversion	≥ 2 days	< 0.7 days	0.42
Avg touchpoints per path	≥ 3	≤ 1.8	2.16
Single-touch conversions	≤ 40%	≥ 65%	69%
Brand search (Share of Search)	present	≈ 0	≈ 0
Attribution-model shift	≥ 5%	≈ 0%	0.00%

Thresholds are an [ADW heuristic] — preliminary diagnostic values, to be calibrated; not a Google benchmark. Account metrics are [Case], ~18-month period.
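The table can be applied mechanically, signal by signal. A minimal sketch using the thresholds above; the function names are hypothetical, brand search is simplified to a yes/no flag, and `near_zero_shift` (0.5%) is an assumed cut-off for “≈ 0%”, since the table gives none. Values that fall between the two columns are reported as a grey zone. The 8/100 score is not reproduced, because its formula is not given.

```python
WORTH, WASTED, GREY = "worth testing", "likely wasted", "grey zone"


def band(worth: bool, wasted: bool) -> str:
    """Map the two threshold checks to a zone; values between them are grey."""
    if worth:
        return WORTH
    if wasted:
        return WASTED
    return GREY


def consideration_gate(avg_days, avg_touchpoints, single_touch_share,
                       has_brand_search, attribution_shift,
                       near_zero_shift=0.005):
    """Classify each Consideration-Profile signal using the table's thresholds.

    near_zero_shift is an assumed cut-off for "≈ 0%" (the table gives none).
    """
    return {
        "avg_days": band(avg_days >= 2, avg_days < 0.7),
        "avg_touchpoints": band(avg_touchpoints >= 3, avg_touchpoints <= 1.8),
        "single_touch_share": band(single_touch_share <= 0.40,
                                   single_touch_share >= 0.65),
        "brand_search": WORTH if has_brand_search else WASTED,
        "attribution_shift": band(attribution_shift >= 0.05,
                                  attribution_shift < near_zero_shift),
    }


# The studied account's values from the table above.
studied = consideration_gate(0.42, 2.16, 0.69, False, 0.0)
for signal, zone in studied.items():
    print(f"{signal:20s} {zone}")
```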

What margin you need, and how much budget — the Margin Gate

Demand Gen budget should scale with contribution margin, because a thin margin cannot fund a channel measured in months, not clicks. [ADW heuristic]

Image takeaway: below 25% margin, Demand Gen stays organic; at 30–40%, 12–18% of paid budget; at 50%+, 20–26%.

Profile	Contribution margin	Break-even ROAS	Demand Gen budget
A	< 25%	> 4.0 (5.0 – 6.7 at a 15 – 20% margin)	Organic only, not a paid line
B	30 – 40%	2.5 – 3.3	12 – 18% of paid budget
C	50%+	≤ 2.0	20 – 26% of paid budget

Where these numbers come from. These ranges are an [ADW heuristic] derived from contribution-margin economics and acceptable risk: not multi-account statistics, not an official Google recommendation, and not an industry benchmark. They are ADWService working guides for pre-experiment diagnosis.
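The Margin Gate arithmetic fits in a few lines. A minimal sketch using the table’s own boundaries; the function names are hypothetical. It also shows why the case’s direct ROAS was a loss: at the 20% margin implied by a break-even ROAS of 5.0, each $1 of Demand Gen spend returned 1.35 × 0.20 − 1 = −$0.73 of contribution. Margins the table does not cover (25–30% and 40–50%) are returned as unclassified rather than guessed.

```python
def break_even_roas(margin: float) -> float:
    """Break-even ROAS = 1 / contribution margin."""
    return 1 / margin


def contribution_per_dollar(roas: float, margin: float) -> float:
    """Contribution left per $1 of ad spend after paying for that $1."""
    return roas * margin - 1


def margin_profile(margin: float):
    """Return (profile, paid-budget share range) per the Margin Gate table.

    Profile A has no paid Demand Gen line (range None). Margins the table
    does not cover (25-30%, 40-50%) are returned as unclassified.
    """
    if margin < 0.25:
        return "A", None
    if 0.30 <= margin <= 0.40:
        return "B", (0.12, 0.18)
    if margin >= 0.50:
        return "C", (0.20, 0.26)
    return "unclassified", None


print(break_even_roas(0.20))                # 5.0
print(contribution_per_dollar(1.35, 0.20))  # about -0.73
print(margin_profile(0.35))
```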

Can you even measure Demand Gen — the Measurement-Readiness Gate

Before launching Demand Gen, check whether you can measure its effect at all: is new-vs-returning tracking on, and is a holdout possible? [ADW heuristic] If not, setting up measurement is the first task — not launching the channel. Without measurement you will either miss the effect or credit Demand Gen with someone else’s result (back to the Misattribution Trap).

Decision tree

Margin below 25%?  (Margin Gate)
├─ Yes → don't run paid DG without a separate high-value segment
└─ No
   Path to purchase mostly single-touch?  (Consideration Gate)
   ├─ Yes → test the high-value segment (hypothesis)
   └─ No
      Holdout and measurement ready?  (Measurement Gate)
      ├─ No → set up measurement first
      └─ Yes → launch a limited test
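The tree translates directly into ordered checks. A minimal sketch; the function and parameter names are hypothetical. `mostly_single_touch` is left as a boolean; one option, an assumption rather than part of the tree, is to set it from the Consideration-Profile Gate’s single-touch threshold of 65%.

```python
def demand_gen_decision(margin: float, mostly_single_touch: bool,
                        measurement_ready: bool) -> str:
    """Walk the three gates in the order of the decision tree."""
    if margin < 0.25:              # Margin Gate
        return "don't run paid DG without a separate high-value segment"
    if mostly_single_touch:        # Consideration Gate
        return "test the high-value segment (hypothesis)"
    if not measurement_ready:      # Measurement Gate
        return "set up measurement first"
    return "launch a limited test"


print(demand_gen_decision(0.18, True, False))
```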

What to do today (3 steps)

Check your average path to purchase in the GA4 conversion-paths report (under Advertising → Attribution): how many days and touchpoints come before a conversion. Under ~1 day and mostly 1 touch means Demand Gen has nothing to warm.
Calculate your contribution margin. Below ~25% — keep Demand Gen organic, not a paid line.
Check whether you can measure Demand Gen: is new-vs-returning tracking on, and is a holdout possible? If not, that is the first task — not the launch.

Any “no” (instant path, thin margin, or no measurement) → a paid Demand Gen budget waits. All three pass → a limited test with a holdout.

How to measure incrementality — the Holdout Rule

You cannot prove Demand Gen’s incremental value from spend logs or attribution reports — only a holdout experiment can. [ADW heuristic] Demand Gen’s job is to bring in new lookalike customers, so its contribution shows up as net-new buyers, not as a better ratio. [Google docs] The protocol: (1) turn on new-vs-returning tracking before you start; (2) run a geo-holdout or Google’s built-in conversion-lift test, keep the core Performance Max budget stable, and avoid seasonal peaks; (3) pass criterion — incremental new customers in the test vs the holdout, converted to revenue, must beat Demand Gen spend × break-even ROAS.
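Step (3) can be written as a single check. A minimal sketch, assuming you already have new-customer counts for the test and holdout groups; the parameter names, the `holdout_scale` normalization, and the first-order-revenue simplification are hypothetical choices, not part of the stated protocol. Comparing incremental revenue with spend × break-even ROAS is the same as requiring incremental contribution (revenue × margin) to cover the spend.

```python
def holdout_passes(new_customers_test: float, new_customers_holdout: float,
                   holdout_scale: float, avg_order_value: float,
                   dg_spend: float, margin: float):
    """Pass criterion: incremental revenue must beat spend x break-even ROAS.

    holdout_scale (assumed) rescales the holdout to the test group's size,
    e.g. 1.5 if the test regions have 1.5x the holdout's baseline.
    Revenue counts first orders only (no lifetime value), another assumption.
    """
    incremental_customers = new_customers_test - new_customers_holdout * holdout_scale
    incremental_revenue = incremental_customers * avg_order_value
    required_revenue = dg_spend / margin   # same as spend x (1 / margin)
    return incremental_revenue >= required_revenue, incremental_revenue, required_revenue


# Hypothetical test: 100 incremental new customers at $50 vs $800 spend, 20% margin.
ok, inc_rev, needed = holdout_passes(500, 400, 1.0, 50.0, 800.0, 0.20)
print(ok, inc_rev, needed)
```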

The exception — the High-Value Seed

The one place Demand Gen may still earn its budget on a low-consideration account is the high-value customer segment, which behaves differently from the cheap majority. [Hypothesis] In the case, orders above the high-value threshold were just 7% of orders by count but ~40% of revenue [Case], and they took a longer path to purchase than cheap orders (which converted almost instantly in a single touch).

Image takeaway: high-value orders = 7% of orders / ~40% of revenue (~18-month period). A single all-orders lookalike is weighted by the cheap majority, so it pulls cheap buyers.

Image takeaway: cheap orders convert in ~0.2 days in one touch; high-value orders take ~0.5 days with more touchpoints (last ~30 days of the path sample).

Demand Gen’s own AOV in the case was 22% lower than Shopping buyers’ AOV [Case], which is consistent with the cheap-skewed seed pulling cheap buyers. Hypothesis: segment the Customer Match seed by order value and build a separate lookalike on high-value buyers, who have the longer path where an early touch can matter. [Hypothesis] The data supports the premise and sizes the prize, but only a live test can prove the effect.
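Segmenting the seed by order value is mostly a filtering step. A minimal sketch, assuming orders arrive as (customer ID, order value) pairs and that a customer enters the high-value seed if any of their orders is above the threshold; the function name, data shape, and threshold are hypothetical. Uploading the resulting list as a Customer Match audience is a separate step in Google Ads and is not shown.

```python
def high_value_split(orders, threshold):
    """Share of orders and revenue above `threshold`, plus the seed customer IDs.

    orders: iterable of (customer_id, order_value) pairs (hypothetical shape).
    """
    orders = list(orders)
    high = [(cid, value) for cid, value in orders if value > threshold]
    total_revenue = sum(value for _, value in orders)
    order_share = len(high) / len(orders)
    revenue_share = sum(value for _, value in high) / total_revenue
    seed_ids = sorted({cid for cid, _ in high})   # deduplicated seed list
    return order_share, revenue_share, seed_ids


# Hypothetical orders: one high-value order among five.
orders = [("c1", 20.0), ("c2", 25.0), ("c3", 30.0), ("c4", 25.0), ("c5", 300.0)]
print(high_value_split(orders, threshold=100.0))
```

The same two shares (orders vs revenue above the threshold) are the ones the case reports as 7% and ~40%, so running this on your own order export sizes the segment before any test.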

When Demand Gen could work, but you won’t see it

No statistically significant effect in an observational analysis does not prove zero effect — it means the effect could not be separated from noise and other changes in the available data. A Demand Gen effect could exist but stay invisible if:

the effect is too small relative to noise;
the test was too short;
the budget was too low to exit learning;
the seed was poor (like the cheap-skewed seed in the case);
the campaign optimized for the wrong event;
Performance Max changed in parallel;
brand demand was not measured;
the attribution window does not match the real consideration cycle.
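The first three conditions (small effect, short test, low budget) come down to sample size. A rough sketch using the standard normal-approximation formula for comparing two conversion rates, assuming a user-level split such as a conversion-lift study; geo holdouts have far fewer units and usually need different methods. The function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def users_per_group(base_rate: float, relative_lift: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in each group to detect `relative_lift` on `base_rate`.

    Two-sided test on two proportions, normal approximation.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for lift in (0.05, 0.10, 0.20):
    print(f"{lift:.0%} lift on a 2% rate: {users_per_group(0.02, lift):,} users per group")
```

Halving the detectable lift roughly quadruples the required sample, which is why a short, low-budget test can miss a real but small effect.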

That’s why the final call — “Demand Gen is not incremental” on a specific account — should rest on a holdout, not on observational logs alone.


About this research

The framework — Margin Profiles A/B/C, the Consideration-Profile Gate, the Measurement-Readiness Gate, the Misattribution Trap, and the Holdout Rule — was developed by ADWService, a Google Premier Partner PPC agency for e-commerce (Google Shopping and Performance Max), managing 300+ active client accounts across Ukraine, the USA, the UK, the EU, Australia, and Canada. The case findings come from a forensic, component-level analysis of an 18-month single-account dataset, deliberately stress-tested against the Misattribution Trap: every aggregate signal was decomposed to the campaign and product level before any conclusion was drawn.