Does Google Demand Gen Work for E-Commerce? A Forensic Study and a Diagnostic Framework

Original research by ADWService — a Google Premier Partner PPC agency for e-commerce (top-30 in Ukraine). Author: Yana Lyashenko, Google Ads AI Architect.

Google reps often pitch Demand Gen “for growth,” and say it needs at least ~$100/day to work. We spent 18 months testing whether it actually adds incremental sales (extra sales that wouldn’t happen without it) on a real low-margin store. Short answer: not the way it’s usually run — and below is a simple way to tell if it will work for you.

Who this is for: owners and marketers of low-margin e-commerce being pitched Demand Gen. Who it’s not for: premium brands with a long buying cycle — Demand Gen behaves differently there.

Research snapshot

  
Question: Did Demand Gen create incremental sales and improve Performance Max?
Business type: Low-margin e-commerce with a short path to purchase (Profile A)
Data: ~18 months (late 2024 – mid 2026); account, campaign, and product-group level; 7 sources (campaign performance, conversions by action, product feed, auction insights, GA4 paths, GA4 attribution models, Google Trends)
Design: Observational case study (one account), component-level decomposition
Main result: In this account, no statistically significant effect of Demand Gen on conversion rate, sales, or lower-campaign attribution value
What transfers to other accounts: The diagnostic method, not the final verdict by default

How to read the evidence tags. Each key claim is tagged: [Google docs] — confirmed by Google documentation; [Case] — found in this account; [ADW heuristic] — an ADWService working rule, pre-experiment; [Hypothesis] — needs a future test. This deliberately separates fact, interpretation, and forecast.

TL;DR

In an ADWService forensic analysis of one low-margin, low-consideration e-commerce account (~18 months), we found no statistically significant effect of Demand Gen on conversion rate, sales, or lower-campaign attribution value. [Case] The apparent “lift” came from market growth, tighter bids, and feed changes — credited to Demand Gen by mistake. [Case] From this we built a diagnostic framework (Consideration / Margin / Measurement gates) that needs validation on other accounts. [ADW heuristic] No effect in one observational case does not prove Demand Gen is useless everywhere.

Key numbers (ADWService analysis, one account, ~18 months):

  • Demand Gen direct ROAS was 1.35 vs a break-even ROAS of 5.0–6.7 (it returned $1.35 per $1 when it needed ~$5+ just to break even)
  • 95% of purchases happened within 1 day of the last interaction; 0.42 days on average
  • Demand Gen’s effect on Performance Max conversion rate was statistically ≈0 (p ≈ 0.68)
  • Switching last-click → data-driven attribution shifted credit by 0.00% account-wide
  • High-value orders were 7% of orders by count but ~40% of revenue
  • Demand Gen’s own buyers had a 22% lower average order value than Shopping buyers

Glossary

  • Demand Gen — an upper-funnel Google Ads format on YouTube/Discover/Gmail that uses lookalike audiences to create demand. [Google docs]
  • Consideration — the thinking phase between first contact and purchase; measured by days-to-conversion and number of touchpoints.
  • Single-touch conversion — a conversion with only one touchpoint in the path (no assists).
  • Contribution margin — revenue left after variable costs; sets how much you can spend on marketing.
  • Break-even ROAS — 1 / margin; the ROAS below which ads lose money.
  • Holdout — a control group (regions/audience) without the channel, used to measure incrementality.
  • Incrementality — sales that happened because of the channel and would not have happened without it.
  • Mix effect — a change in an aggregate number caused by shifting weight between segments, not by change within them.
  • Simpson’s paradox — when the aggregate moves one way while every component moves the other (because of weight shifts).
  • Customer seed — the customer list a lookalike audience is built from.
  • High-value customer — a buyer with an order above a set threshold (here, the top price tier).

ADWService frameworks (definitions)

  • Margin Profiles A/B/C — an ADWService method that sets Demand Gen budget by contribution margin: A (<25%) — organic; B (30–40%) — 12–18%; C (50%+) — 20–26% of paid budget.
  • The Consideration-Profile Gate — an ADWService diagnostic that uses 5 path metrics (days to conversion, touchpoints, % single-touch, brand search, attribution-model shift) to decide whether Demand Gen earns a paid budget on a given account.
  • The Measurement-Readiness Gate — an ADWService check: does the account have a holdout and new-vs-returning tracking, so Demand Gen’s effect can even be measured before launch?
  • The Demand Gen Misattribution Trap — an ADWService term for crediting Demand Gen with growth actually caused by the market, bids, or feed changes that happened at the same time.
  • The Holdout Rule — an ADWService rule: Demand Gen’s incrementality is proven only by a holdout experiment, never by ROAS or attribution models.

Part I. What the forensic analysis showed

This describes what happened in one account — not a rule for yours.

Why it first looked like Demand Gen worked

After Demand Gen launched, account conversions grew ~5–6× year-over-year, and conversion rate seemed to rise — so “Demand Gen works” looked obvious. [Case] This is how the Misattribution Trap happens: change ten levers at once, watch results climb, and the brain credits the channel you believe in most. [ADW heuristic] Decomposition then broke that story apart.

What decomposition showed (Simpson’s paradox)

The apparent conversion-rate “lift” was a mix effect, not a real gain inside campaigns. [Case] Individual campaigns’ rates barely moved; the aggregate rose only because spend shifted toward higher-converting products — a textbook Simpson’s paradox. At the account level, the Demand Gen coefficient on conversion rate, after controlling for spend and seasonality, was statistically indistinguishable from zero (p ≈ 0.68). [Case]
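The mix effect described above can be sketched in a few lines. The numbers are made up for illustration and are not the account's data; the segment names and the `decompose` helper are assumptions. Both product groups keep the same conversion rate in both periods, yet the blended rate rises because clicks shift toward the higher-converting group.

```python
# Illustrative numbers only; they are NOT taken from the studied account.
# Each segment maps to (clicks, conversion rate). Rates are identical in both periods.
before = {"low_cr_products": (8000, 0.010), "high_cr_products": (2000, 0.040)}
after = {"low_cr_products": (4000, 0.010), "high_cr_products": (6000, 0.040)}


def blended_rate(segments):
    """Aggregate conversion rate = total conversions / total clicks."""
    clicks = sum(c for c, _ in segments.values())
    conversions = sum(c * r for c, r in segments.values())
    return conversions / clicks


def decompose(before, after):
    """Split the change in blended rate into a within-segment part and a mix part."""
    total_b = sum(c for c, _ in before.values())
    total_a = sum(c for c, _ in after.values())
    within = 0.0
    mix = 0.0
    for name in before:
        clicks_b, rate_b = before[name]
        clicks_a, rate_a = after[name]
        share_b = clicks_b / total_b
        share_a = clicks_a / total_a
        within += share_a * (rate_a - rate_b)  # rate change, at the new weights
        mix += (share_a - share_b) * rate_b    # weight change, at the old rates
    return within, mix


within, mix = decompose(before, after)
print(f"blended rate: {blended_rate(before):.3f} -> {blended_rate(after):.3f}")
print(f"within-segment change: {within:.3f}, mix change: {mix:.3f}")
```

In this toy case the whole change sits in the mix term and the within-segment term is zero, which is the pattern the decomposition is meant to expose.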

Why ROAS can't measure Demand Gen: it influences the consideration stage, but the sale is closed and credited at Performance Max / Shopping / branded search

Image takeaway: Demand Gen acts at the top (consideration); the sale closes and gets credited lower down (PMax / Shopping / brand). That’s why its direct ROAS is expected to be low and is not, on its own, a measure of its value.

What actually explained the growth

The real efficiency jump happened months before Demand Gen launched, and it tracked a feed and structure overhaul, not the new channel. [Case] Two more facts point the same way. First, impression share held at ~76–81% while spend scaled, so the account grew because the auction pool itself was growing, not because it was saturating a fixed one. Second, a later efficiency drop was caused by cannibalization from over-segmentation: the account was split into ~30 campaigns, and the new ones went from 18% to 73% of conversions while total conversions stayed flat. [Case] None of this was Demand Gen.

Why attribution didn’t confirm a Demand Gen contribution

Switching from last-click to a data-driven model shifted credit by 0.00% account-wide, and gave Demand Gen just +0.5 conversions. [Case] When a model that is built to reward assists moves nothing, assists — including Demand Gen’s — carry no re-attributable value on this single-touch account. [Case] In GA4, Demand Gen sits under Cross-network (with Performance Max), not under Display — so it is invisible at the channel level and must be analyzed by campaign. [Google docs]

Part II. How to decide whether to run Demand Gen on your account

This is the decision framework — the part that transfers.

3 questions before you run Demand Gen: margin, path to purchase, ability to measure

Image takeaway: before you run Demand Gen, get a “yes” on three questions — margin above 25%, a path to purchase longer than a day with several touchpoints, and a way to measure the effect. Any “no” and Demand Gen waits.

Does Demand Gen improve Performance Max?

In an ADWService analysis of one e-commerce account, launching Demand Gen had no statistically significant link to Performance Max conversion rate after controlling for spend, seasonality, and traffic-mix changes (~18 months of data). [Case] The visible rise in aggregate conversion rate came from budget shifting to higher-converting products, not from a better rate inside campaigns. This result is specific to one low-consideration account and is not universal proof that Demand Gen fails for everyone.

When not to run Demand Gen — the Consideration-Profile Gate

Demand Gen works where there is something to warm — a long path to purchase, several touchpoints, and existing brand demand. It is wasted on instant, single-touch, brand-less purchases. [ADW heuristic] We call this diagnostic the Consideration-Profile Gate. The signals come from a GA4 conversion-paths export.

The Consideration-Profile Gate: the signals that decide whether Demand Gen earns a paid budget, and where the studied account falls

Image takeaway: the studied account sat in the “wasted” zone on four of the five signals (average touchpoints, at 2.16, fell between the two thresholds) and scored 8/100. Demand Gen is wasted on this profile.

Signal                         | DG worth testing | DG likely wasted | Studied account
Avg days to conversion         | ≥ 2 days         | < 0.7 days       | 0.42
Avg touchpoints per path       | ≥ 3              | ≤ 1.8            | 2.16
Single-touch conversions       | ≤ 40%            | ≥ 65%            | 69%
Brand search (Share of Search) | present          | ≈ 0              | ≈ 0
Attribution-model shift        | ≥ 5%             | ≈ 0%             | 0.00%

Thresholds are an [ADW heuristic] — preliminary diagnostic values, to be calibrated; not a Google benchmark. Account metrics are [Case], ~18-month period.
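As a rough sketch, the table's thresholds can be applied signal by signal. Everything named here (`PathSignals`, `band`, `classify`, the zone labels) is hypothetical, and the tolerance for "≈ 0%" is an assumption, since the table does not state one. The 8/100 score is not reproduced because the scoring formula is not given.

```python
from dataclasses import dataclass

# Assumption: the table writes "≈ 0%" without a tolerance; 0.5% is a placeholder.
NEAR_ZERO_SHIFT = 0.005


@dataclass
class PathSignals:
    avg_days_to_conversion: float
    avg_touchpoints: float
    single_touch_share: float  # 0.69 means 69%
    has_brand_search: bool
    attribution_shift: float   # 0.05 means 5%


def band(worth_testing: bool, likely_wasted: bool) -> str:
    """Map the two threshold checks for one signal to a zone label."""
    if worth_testing:
        return "worth_testing"
    if likely_wasted:
        return "likely_wasted"
    return "in_between"


def classify(s: PathSignals) -> dict:
    """Apply the table's thresholds to each signal independently."""
    return {
        "days_to_conversion": band(s.avg_days_to_conversion >= 2,
                                   s.avg_days_to_conversion < 0.7),
        "touchpoints": band(s.avg_touchpoints >= 3, s.avg_touchpoints <= 1.8),
        "single_touch": band(s.single_touch_share <= 0.40,
                             s.single_touch_share >= 0.65),
        "brand_search": band(s.has_brand_search, not s.has_brand_search),
        "attribution_shift": band(s.attribution_shift >= 0.05,
                                  s.attribution_shift <= NEAR_ZERO_SHIFT),
    }


# Values from the "Studied account" column of the table.
studied_account = PathSignals(0.42, 2.16, 0.69, False, 0.0)
zones = classify(studied_account)
for signal, zone in zones.items():
    print(f"{signal}: {zone}")
```

On the table's own figures, four signals land in the "likely wasted" zone, and average touchpoints (2.16) falls between the two thresholds.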

What margin you need, and how much budget — the Margin Gate

Demand Gen budget should scale with contribution margin, because a thin margin cannot fund a channel measured in months, not clicks. [ADW heuristic]

Demand Gen budget by Margin Profile A/B/C

Image takeaway: below 25% margin, Demand Gen stays organic; at 30–40%, 12–18% of paid budget; at 50%+, 20–26%.

Profile | Contribution margin | Break-even ROAS | Demand Gen budget
A       | < 25%               | 5.0 – 6.7       | Organic only, not a paid line
B       | 30 – 40%            | 2.5 – 3.3       | 12 – 18% of paid budget
C       | 50%+                | ≈ 2.0           | 20 – 26% of paid budget

Where these numbers come from. These ranges are an [ADW heuristic] derived from contribution-margin economics and acceptable risk. They are ADWService working guides for pre-experiment diagnosis, not multi-account statistics, not an official Google recommendation, and not an industry benchmark.
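The break-even arithmetic behind the table follows from the glossary definition (break-even ROAS = 1 / margin). The function names below are hypothetical. Note that a break-even ROAS of 5.0–6.7 corresponds to margins of 20% and 15%; at exactly 25% margin the formula gives 4.0. The profile mapping returns nothing for margins in the gaps the table leaves open (25–30% and 40–50%).

```python
def break_even_roas(margin: float) -> float:
    """Break-even ROAS = 1 / contribution margin (margin as a fraction, e.g. 0.20)."""
    if not 0 < margin <= 1:
        raise ValueError("margin must be a fraction in (0, 1]")
    return 1 / margin


def margin_profile(margin: float):
    """Map a margin to Profile A/B/C; None where the table defines no profile."""
    if margin < 0.25:
        return "A"
    if 0.30 <= margin <= 0.40:
        return "B"
    if margin >= 0.50:
        return "C"
    return None


def net_contribution_per_dollar(roas: float, margin: float) -> float:
    """Contribution earned per $1 of ad spend, after paying for that $1."""
    return roas * margin - 1


for m in (0.15, 0.20, 0.25, 0.30, 0.40, 0.50):
    print(f"margin {m:.0%}: break-even ROAS {break_even_roas(m):.2f}, "
          f"profile {margin_profile(m)}")

# Direct Demand Gen ROAS from the case (1.35) at a 20% margin:
print(f"net per $1 spent: {net_contribution_per_dollar(1.35, 0.20):.2f}")
```

At a 20% margin, a direct ROAS of 1.35 leaves about $0.27 of contribution per dollar spent, so each dollar loses roughly $0.73 if direct ROAS is the only return counted.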

Can you even measure Demand Gen — the Measurement-Readiness Gate

Before launching Demand Gen, check whether you can measure its effect at all: is new-vs-returning tracking on, and is a holdout possible? [ADW heuristic] If not, setting up measurement is the first task — not launching the channel. Without measurement you will either miss the effect or credit Demand Gen with someone else’s result (back to the Misattribution Trap).

Decision tree

Decision tree: Margin Gate → Consideration Gate → Measurement-Readiness Gate

Margin below 25%?  (Margin Gate)
├─ Yes → don't run paid DG without a separate high-value segment
└─ No
   Path to purchase mostly single-touch?  (Consideration Gate)
   ├─ Yes → test the high-value segment (hypothesis)
   └─ No
      Holdout and measurement ready?  (Measurement Gate)
      ├─ No → set up measurement first
      └─ Yes → launch a limited test
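
The tree above can also be written as a short function. This is a sketch under the assumption that each gate reduces to a single yes/no input; the function and parameter names are hypothetical.

```python
def demand_gen_decision(margin: float,
                        mostly_single_touch: bool,
                        measurement_ready: bool) -> str:
    """Walk the three gates in order: Margin, Consideration, Measurement."""
    if margin < 0.25:  # Margin Gate
        return "don't run paid DG without a separate high-value segment"
    if mostly_single_touch:  # Consideration Gate
        return "test the high-value segment (hypothesis)"
    if not measurement_ready:  # Measurement-Readiness Gate
        return "set up measurement first"
    return "launch a limited test"


print(demand_gen_decision(0.18, True, False))
print(demand_gen_decision(0.35, False, True))
```

The gates are checked in the same order as in the tree, so a thin margin stops the evaluation before the path or measurement questions are asked.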

What to do today (3 steps)

  1. Check your average path to purchase in GA4 (Advertising → Attribution → Path metrics): how many days and touchpoints before a conversion. Under ~1 day and mostly 1 touch — Demand Gen has nothing to warm.
  2. Calculate your contribution margin. Below ~25% — keep Demand Gen organic, not a paid line.
  3. Check whether you can measure Demand Gen: is new-vs-returning tracking on, and is a holdout possible? If not, that is the first task — not the launch.
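
For step 1, the path signals can be summarized from an export with a few lines of code. This sketch assumes one row per converting path with two hypothetical fields, `days_to_conversion` and `touchpoints`; a real GA4 export may use different column names or aggregate several conversions per row, in which case the averages would need weighting.

```python
from statistics import mean

# Hypothetical sample rows; real data would come from the exported report.
paths = [
    {"days_to_conversion": 0.0, "touchpoints": 1},
    {"days_to_conversion": 0.2, "touchpoints": 1},
    {"days_to_conversion": 0.5, "touchpoints": 1},
    {"days_to_conversion": 1.3, "touchpoints": 5},
]


def path_summary(rows):
    """Average days, average touchpoints, and share of single-touch paths."""
    single_touch = sum(1 for r in rows if r["touchpoints"] == 1)
    return {
        "avg_days_to_conversion": mean(r["days_to_conversion"] for r in rows),
        "avg_touchpoints": mean(r["touchpoints"] for r in rows),
        "single_touch_share": single_touch / len(rows),
    }


summary = path_summary(paths)
print(summary)
```

The sample also shows why the average alone can mislead: one long path lifts average touchpoints to 2 even though three of the four paths are single-touch.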

A “no” on any of the three (instant path, thin margin, or no measurement) → a paid Demand Gen budget waits. Otherwise, run a limited test with a holdout.

How to measure incrementality — the Holdout Rule

You cannot prove Demand Gen’s incremental value from spend logs or attribution reports; only a holdout experiment can. [ADW heuristic] Demand Gen’s job is to bring in new lookalike customers, so its contribution shows up as net-new buyers, not as a better ratio. [Google docs] The protocol:

  1. Turn on new-vs-returning tracking before you start.
  2. Run a geo-holdout or Google’s built-in conversion-lift test, keep the core Performance Max budget stable, and avoid seasonal peaks.
  3. Apply the pass criterion: incremental new customers in the test vs the holdout, converted to revenue, must beat Demand Gen spend × break-even ROAS.
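
The pass criterion can be sketched as a single comparison. All inputs below are hypothetical, and two simplifications are assumptions rather than part of the protocol: the holdout is scaled to the test group's size with one factor (`holdout_scale`), and new customers are converted to revenue with one average value. The sketch checks the economics only; it does not test whether the difference is statistically significant.

```python
def holdout_check(new_customers_test: float,
                  new_customers_holdout: float,
                  holdout_scale: float,
                  revenue_per_new_customer: float,
                  dg_spend: float,
                  break_even_roas: float) -> dict:
    """Compare incremental revenue with Demand Gen spend x break-even ROAS."""
    # New customers the test group would be expected to get without Demand Gen.
    expected_without_dg = new_customers_holdout * holdout_scale
    incremental_customers = new_customers_test - expected_without_dg
    incremental_revenue = incremental_customers * revenue_per_new_customer
    required_revenue = dg_spend * break_even_roas
    return {
        "incremental_customers": incremental_customers,
        "incremental_revenue": incremental_revenue,
        "required_revenue": required_revenue,
        "passes": incremental_revenue > required_revenue,
    }


# Hypothetical example: test regions are twice the size of the holdout regions.
result = holdout_check(
    new_customers_test=1200,
    new_customers_holdout=500,
    holdout_scale=2.0,
    revenue_per_new_customer=60.0,
    dg_spend=3000.0,
    break_even_roas=5.0,
)
print(result)
```

In this made-up case the test produces 200 incremental new customers worth $12,000, short of the $15,000 needed to cover $3,000 of spend at a break-even ROAS of 5.0, so the channel would not pass.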

The exception — the High-Value Seed

The one place Demand Gen may still earn its budget on a low-consideration account is the high-value customer segment, which behaves differently from the cheap majority. [Hypothesis] In the case, orders above the high-value threshold were just 7% of orders by count but ~40% of revenue [Case], and they took a longer path to purchase than cheap orders (which converted almost instantly in a single touch).

High-value orders are only 7% of orders by count but ~40% of revenue — the segment a single all-orders lookalike ignores

Image takeaway: high-value orders = 7% of orders / 40% of revenue (18-month period). A single all-orders lookalike is weighted by the cheap majority, so it pulls cheap buyers.

Path length by order value: cheap orders convert almost instantly in one touch; high-value orders take longer and involve more touchpoints

Image takeaway: cheap orders convert in ~0.2 days in one touch; high-value orders take ~0.5 days with more touchpoints (last ~30 days of the path sample).

Demand Gen’s own AOV in the case was 22% lower than Shopping buyers’ AOV [Case], which is consistent with a cheap-skewed seed pulling cheap buyers. Hypothesis: segment the customer-match seed by order value and build a separate lookalike on high-value buyers, who have the longer path where an early touch can matter. [Hypothesis] The data supports the premise and sizes the prize, but only a live test can prove the effect.
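
A minimal sketch of the two calculations behind this hypothesis: how concentrated revenue is above a threshold, and how a seed list might be split by order value. The order values, the threshold, and the field names (`email_hash`, `max_order_value`) are all made up for illustration and are not the account's data.

```python
def high_value_shares(order_values, threshold):
    """Share of orders by count and by revenue at or above the threshold."""
    high = [v for v in order_values if v >= threshold]
    return len(high) / len(order_values), sum(high) / sum(order_values)


def split_seed(customers, threshold):
    """Split a customer list into a high-value seed and the remainder."""
    high = [c for c in customers if c["max_order_value"] >= threshold]
    rest = [c for c in customers if c["max_order_value"] < threshold]
    return high, rest


# Illustrative order book: many cheap orders, a few expensive ones.
orders = [20.0] * 93 + [180.0] * 7
count_share, revenue_share = high_value_shares(orders, threshold=100.0)
print(f"high-value: {count_share:.0%} of orders, {revenue_share:.0%} of revenue")

# Hypothetical customer records for the seed split.
customers = [
    {"email_hash": "a1", "max_order_value": 25.0},
    {"email_hash": "b2", "max_order_value": 190.0},
    {"email_hash": "c3", "max_order_value": 18.0},
]
high_seed, rest_seed = split_seed(customers, threshold=100.0)
print(len(high_seed), len(rest_seed))
```

A seed built from the full list would be dominated by the cheap majority by count, which is the skew the separate high-value seed is meant to avoid.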

When Demand Gen could work, but you won’t see it

No statistically significant effect in an observational analysis does not prove zero effect — it means the effect could not be separated from noise and other changes in the available data. A Demand Gen effect could exist but stay invisible if:

  • the effect is too small relative to noise;
  • the test was too short;
  • the budget was too low to exit learning;
  • the seed was poor (like the cheap-skewed seed in the case);
  • the campaign optimized for the wrong event;
  • Performance Max changed in parallel;
  • brand demand was not measured;
  • the attribution window does not match the real consideration cycle.

That’s why the final call — “Demand Gen is not incremental” on a specific account — should rest on a holdout, not on observational logs alone.
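
The first two reasons (effect too small, test too short) are a sample-size problem. As a rough guide, the standard normal-approximation formula for comparing two proportions shows how many users each group needs before a given lift becomes detectable. This is a sketch for a user-level randomized comparison; the baseline rate and lift are hypothetical, and a geo-holdout would need a different analysis.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(base_rate: float,
                          relative_lift: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Users needed per group to detect a relative lift in conversion rate
    (two-sided test, normal approximation, equal group sizes)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical baseline: 2% conversion rate.
for lift in (0.20, 0.10, 0.05):
    n = sample_size_per_group(0.02, lift)
    print(f"relative lift {lift:.0%}: ~{n:,} users per group")
```

Halving the lift you want to detect roughly quadruples the required sample, which is why a small real effect can sit below the noise in a short or low-budget test.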

FAQ

  • Does Demand Gen work for e-commerce? It depends on margin and consideration profile. For high-margin products with a real research-and-compare journey, it’s worth testing. For low-margin, impulse, single-touch purchases, it’s usually a waste. [ADW heuristic]

  • Why is Demand Gen’s direct ROAS low? Demand Gen is upper-funnel; its conversions are mostly closed and credited elsewhere (PMax, Shopping, brand). A low direct ROAS is expected and, on its own, proves neither failure nor success. [Google docs]

  • Does Demand Gen improve Performance Max? In a forensic study of one low-consideration account, no: the effect was ≈0 after controlling for spend and seasonality. [Case] On a high-consideration account the answer may differ, and it must be proven with a holdout.

  • How do you measure Demand Gen’s incrementality? With a holdout experiment (geo-split or conversion lift) plus new-vs-returning tracking. Attribution models can’t do it. [ADW heuristic]

  • How much budget should Demand Gen get? ADW guide: below ~25% margin, organic only; at 30–40%, 12–18% of paid budget; at 50%+, 20–26%. This is a heuristic, not a Google benchmark. [ADW heuristic]

  • Can attribution models show Demand Gen’s contribution? Only partly. On a single-touch account, data-driven and last-click agree almost exactly, so the model choice says nothing about Demand Gen’s assist value. [Case]

  • Where does Demand Gen appear in GA4? Under Cross-network in the default channel grouping (with Performance Max), not under Display. [Google docs]

  • How is Demand Gen different from Performance Max? Performance Max captures existing demand at the bottom of the funnel (it closes the sale); Demand Gen creates demand at the top via lookalike audiences. In GA4 both fall under Cross-network, but Demand Gen rarely “closes” a conversion; PMax does. So comparing them on direct ROAS is wrong. [Google docs]

  • What is the minimum budget for Demand Gen? Google reps usually cite ~$100/day and a few weeks of stable budget to exit the learning phase. An underfunded Demand Gen can show no effect simply from lack of budget, so check that the budget is adequate before concluding it doesn’t work. [Google docs]

  • How should the lookalike seed be built? A hypothesis from this research: segment the seed by order value, so the lookalike doesn’t skew toward cheap buyers. Needs a test to confirm. [Hypothesis]

     

About this research

The framework — Margin Profiles A/B/C, the Consideration-Profile Gate, the Measurement-Readiness Gate, the Misattribution Trap, and the Holdout Rule — was developed by ADWService, a Google Premier Partner PPC agency for e-commerce (Google Shopping and Performance Max), managing 300+ active client accounts across Ukraine, the USA, the UK, the EU, Australia, and Canada. The case findings come from a forensic, component-level analysis of an 18-month single-account dataset, deliberately stress-tested against the Misattribution Trap: every aggregate signal was decomposed to the campaign and product level before any conclusion was drawn.
