Original research by ADWService — a Google Premier Partner PPC agency for e-commerce (top-30 in Ukraine). Author: Yana Lyashenko, Google Ads AI Architect.
Google reps often pitch Demand Gen “for growth,” and say it needs at least ~$100/day to work. We spent 18 months testing whether it actually adds incremental sales (extra sales that wouldn’t happen without it) on a real low-margin store. Short answer: not the way it’s usually run — and below is a simple way to tell if it will work for you.
Who this is for: owners and marketers of low-margin e-commerce being pitched Demand Gen. Who it’s not for: premium brands with a long buying cycle — Demand Gen behaves differently there.
Research snapshot
| Item | Detail |
|---|---|
| Question | Did Demand Gen create incremental sales and improve Performance Max? |
| Business type | Low-margin e-commerce with a short path to purchase (Profile A) |
| Data | ~18 months (late 2024 – mid 2026); account, campaign, and product-group level; 7 sources (campaign performance, conversions by action, product feed, auction insights, GA4 paths, GA4 attribution models, Google Trends) |
| Design | Observational case study (one account), component-level decomposition |
| Main result | In this account, no statistically significant effect of Demand Gen on conversion rate, sales, or lower-campaign attribution value |
| What transfers to other accounts | The diagnostic method, not the final verdict by default |
How to read the evidence tags. Each key claim is tagged:
[Google docs]: confirmed by Google documentation; [Case]: found in this account; [ADW heuristic]: an ADWService working rule, pre-experiment; [Hypothesis]: needs a future test. This deliberately separates fact, interpretation, and forecast.
TL;DR
In an ADWService forensic analysis of one low-margin, low-consideration e-commerce account (~18 months), we found no statistically significant effect of Demand Gen on conversion rate, sales, or lower-campaign attribution value. [Case] The apparent “lift” came from market growth, tighter bids, and feed changes — credited to Demand Gen by mistake. [Case] From this we built a diagnostic framework (Consideration / Margin / Measurement gates) that needs validation on other accounts. [ADW heuristic] No effect in one observational case does not prove Demand Gen is useless everywhere.
Key numbers (ADWService analysis, one account, ~18 months):
- Demand Gen direct ROAS was 1.35 vs a break-even ROAS of 5.0–6.7 (it returned $1.35 per $1 when it needed ~$5+ just to break even)
- 95% of purchases happened within 1 day of the last interaction; 0.42 days on average
- Demand Gen’s effect on Performance Max conversion rate was statistically ≈0 (p ≈ 0.68)
- Switching last-click → data-driven attribution shifted credit by 0.00% account-wide
- High-value orders were 7% of orders by count but ~40% of revenue
- Demand Gen’s own buyers had a 22% lower average order value than Shopping buyers
Glossary
- Demand Gen: an upper-funnel Google Ads format on YouTube/Discover/Gmail that uses lookalike audiences to create demand. [Google docs]
- Consideration: the thinking phase between first contact and purchase; measured by days-to-conversion and number of touchpoints.
- Single-touch conversion: a conversion with only one touchpoint in the path (no assists).
- Contribution margin: revenue left after variable costs; sets how much you can spend on marketing.
- Break-even ROAS: 1 / margin; the ROAS below which ads lose money.
- Holdout: a control group (regions/audience) without the channel, used to measure incrementality.
- Incrementality: sales that happened because of the channel and would not have happened without it.
- Mix effect: a change in an aggregate number caused by shifting weight between segments, not by change within them.
- Simpson’s paradox: when the aggregate moves one way while every component moves the other (because of weight shifts).
- Customer seed: the customer list a lookalike audience is built from.
- High-value customer: a buyer with an order above a set threshold (here, the top price tier).
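The break-even ROAS definition in the glossary can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the margin is given as a fraction (0.20 for 20%). The function names are illustrative and not part of any Google Ads API, and the 15% and 20% margins are assumptions implied by the formula rather than figures stated in the study.

```python
def break_even_roas(contribution_margin: float) -> float:
    """Return the ROAS at which ad spend exactly breaks even (1 / margin)."""
    if not 0 < contribution_margin <= 1:
        raise ValueError("contribution_margin must be a fraction in (0, 1]")
    return 1 / contribution_margin


def is_profitable(direct_roas: float, contribution_margin: float) -> bool:
    """True when the measured ROAS is at or above break-even."""
    return direct_roas >= break_even_roas(contribution_margin)


# Assumed margins of 15% and 20% correspond to the 5.0-6.7 break-even range.
for margin in (0.15, 0.20):
    print(f"margin {margin:.0%}: break-even ROAS {break_even_roas(margin):.1f}")

# The case's direct Demand Gen ROAS (1.35) compared against a 20% margin.
print(is_profitable(1.35, 0.20))
```

A direct ROAS below break-even only says the channel loses money on its own credited sales; it says nothing about incrementality.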
ADWService frameworks (definitions)
- Margin Profiles A/B/C — an ADWService method that sets Demand Gen budget by contribution margin: A (<25%) — organic; B (30–40%) — 12–18%; C (50%+) — 20–26% of paid budget.
- The Consideration-Profile Gate — an ADWService diagnostic that uses 5 path metrics (days to conversion, touchpoints, % single-touch, brand search, attribution-model shift) to decide whether Demand Gen earns a paid budget on a given account.
- The Measurement-Readiness Gate — an ADWService check: does the account have a holdout and new-vs-returning tracking, so Demand Gen’s effect can even be measured before launch?
- The Demand Gen Misattribution Trap — an ADWService term for crediting Demand Gen with growth actually caused by the market, bids, or feed changes that happened at the same time.
- The Holdout Rule — an ADWService rule: Demand Gen’s incrementality is proven only by a holdout experiment, never by ROAS or attribution models.
Part I. What the forensic analysis showed
This describes what happened in one account — not a rule for yours.
Why it first looked like Demand Gen worked
After Demand Gen launched, account conversions grew ~5–6× year-over-year, and conversion rate seemed to rise — so “Demand Gen works” looked obvious. [Case] This is how the Misattribution Trap happens: change ten levers at once, watch results climb, and the brain credits the channel you believe in most. [ADW heuristic] Decomposition then broke that story apart.
What decomposition showed (Simpson’s paradox)
The apparent conversion-rate “lift” was a mix effect, not a real gain inside campaigns. [Case] Individual campaigns’ rates barely moved; the aggregate rose only because spend shifted toward higher-converting products, the same weight-shift mechanism that drives Simpson’s paradox. At the account level, the Demand Gen coefficient on conversion rate, after controlling for spend and seasonality, was statistically indistinguishable from zero (p ≈ 0.68). [Case]
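To make the mix effect concrete, the change in an aggregate conversion rate can be split into a "within" part (rates changing inside segments) and a "mix" part (traffic shifting between segments). The sketch below is one standard way to do that split, not necessarily the method used in the study, and the segment names and numbers are hypothetical.

```python
from typing import Dict, Tuple

Segment = Tuple[int, int]  # (clicks, conversions)


def decompose_rate_change(before: Dict[str, Segment],
                          after: Dict[str, Segment]) -> Tuple[float, float]:
    """Split the change in aggregate conversion rate into (within, mix)."""
    if before.keys() != after.keys():
        raise ValueError("both periods must contain the same segments")
    clicks_before = sum(clicks for clicks, _ in before.values())
    clicks_after = sum(clicks for clicks, _ in after.values())
    within = mix = 0.0
    for name in before:
        c0, v0 = before[name]
        c1, v1 = after[name]
        w0, w1 = c0 / clicks_before, c1 / clicks_after  # traffic weights
        r0 = v0 / c0 if c0 else 0.0                     # segment rates
        r1 = v1 / c1 if c1 else 0.0
        within += w1 * (r1 - r0)  # rate moved inside the segment
        mix += (w1 - w0) * r0     # weight moved between segments
    return within, mix


# Hypothetical data: per-segment rates are flat (1% and 5%), traffic shifts.
before = {"cheap": (8000, 80), "premium": (2000, 100)}
after = {"cheap": (5000, 50), "premium": (5000, 250)}

within, mix = decompose_rate_change(before, after)
print(f"within: {within:+.4f}, mix: {mix:+.4f}")
```

The two parts sum exactly to the change in the aggregate rate, so a rise that sits entirely in the mix term means no segment converted any better than before.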

Image takeaway: Demand Gen acts at the top (consideration); the sale closes and gets credited lower down (PMax / Shopping / brand). That’s why its direct ROAS is usually low and, on its own, is not a measure of its value.
What actually explained the growth
The real efficiency jump happened months before Demand Gen launched, and it tracked a feed and structure overhaul, not the new channel. [Case] Two more facts: impression share held at ~76–81% while spend scaled, so the account grew along with an expanding auction pool rather than by saturating a fixed one; and a later efficiency drop was caused by cannibalization from over-segmentation (the account was split into ~30 campaigns, and the new ones went from 18% to 73% of conversions while total conversions stayed flat). [Case] None of this was Demand Gen.
Why attribution didn’t confirm a Demand Gen contribution
Switching from last-click to a data-driven model shifted credit by 0.00% account-wide, and gave Demand Gen just +0.5 conversions. [Case] When a model that is built to reward assists moves nothing, assists — including Demand Gen’s — carry no re-attributable value on this single-touch account. [Case] In GA4, Demand Gen sits under Cross-network (with Performance Max), not under Display — so it is invisible at the channel level and must be analyzed by campaign. [Google docs]
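One way to quantify an attribution-model shift is the share of conversion credit that moves between campaigns when the model changes. The article does not define how its 0.00% figure was computed, so the formula below is an assumption, and the per-campaign totals are hypothetical.

```python
from typing import Dict


def attribution_shift(last_click: Dict[str, float],
                      data_driven: Dict[str, float]) -> float:
    """Fraction of total conversion credit that moves between campaigns."""
    total = sum(last_click.values())
    if total <= 0:
        raise ValueError("last_click must contain a positive conversion total")
    campaigns = last_click.keys() | data_driven.keys()
    moved = sum(abs(data_driven.get(c, 0.0) - last_click.get(c, 0.0))
                for c in campaigns)
    # Assumes both models distribute the same total: each reassigned
    # conversion is counted once where it is lost and once where it is
    # gained, so the sum of absolute differences is halved.
    return moved / (2 * total)


# Hypothetical credit per campaign type under each model.
last_click = {"Performance Max": 900.0, "Shopping": 95.0, "Demand Gen": 5.0}
data_driven = {"Performance Max": 899.7, "Shopping": 94.8, "Demand Gen": 5.5}

print(f"{attribution_shift(last_click, data_driven):.2%}")
```

A value near zero means the two models agree, which is what a mostly single-touch account should produce.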
Part II. How to decide whether to run Demand Gen on your account
This is the decision framework — the part that transfers.

Image takeaway: before you run Demand Gen, get a “yes” on three questions — margin above 25%, a path to purchase longer than a day with several touchpoints, and a way to measure the effect. Any “no” and Demand Gen waits.
Does Demand Gen improve Performance Max?
In an ADWService analysis of one e-commerce account, launching Demand Gen had no statistically significant link to Performance Max conversion rate after controlling for spend, seasonality, and traffic-mix changes (~18 months of data). [Case] The visible rise in aggregate conversion rate came from budget shifting to higher-converting products, not from a better rate inside campaigns. This result is specific to one low-consideration account and is not universal proof that Demand Gen fails for everyone.
When not to run Demand Gen — the Consideration-Profile Gate
Demand Gen works where there is something to warm — a long path to purchase, several touchpoints, and existing brand demand. It is wasted on instant, single-touch, brand-less purchases. [ADW heuristic] We call this diagnostic the Consideration-Profile Gate. The signals come from a GA4 conversion-paths export.

Image takeaway: the studied account sat in the “wasted” zone on four of five signals (average touchpoints, at 2.16, fell between the two thresholds) and scored 8/100. Demand Gen is wasted on this profile.
| Signal | DG worth testing | DG likely wasted | Studied account |
|---|---|---|---|
| Avg days to conversion | ≥ 2 days | < 0.7 days | 0.42 |
| Avg touchpoints per path | ≥ 3 | ≤ 1.8 | 2.16 |
| Single-touch conversions | ≤ 40% | ≥ 65% | 69% |
| Brand search (Share of Search) | present | ≈ 0 | ≈ 0 |
| Attribution-model shift | ≥ 5% | ≈ 0% | 0.00% |
Thresholds are an [ADW heuristic] — preliminary diagnostic values, to be calibrated; not a Google benchmark. Account metrics are [Case], ~18-month period.
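The table's thresholds can be applied mechanically to a GA4 path export. The sketch below labels each signal using the cut-offs from the table. The class and field names are hypothetical, brand search is reduced to a yes/no flag, and "≈ 0%" attribution shift is interpreted as under 0.5%; both simplifications are assumptions. It does not reproduce the 8/100 score, because the scoring formula is not given.

```python
from dataclasses import dataclass
from typing import Dict

NEAR_ZERO_SHIFT = 0.005  # assumption: "≈ 0%" means below 0.5%


@dataclass
class PathMetrics:
    avg_days_to_conversion: float
    avg_touchpoints: float
    single_touch_share: float  # fraction of conversions with one touchpoint
    has_brand_search: bool
    attribution_shift: float   # fraction of credit moved by the model switch


def classify_signals(m: PathMetrics) -> Dict[str, str]:
    """Label each signal 'worth testing', 'likely wasted', or 'in between'."""
    def label(worth: bool, wasted: bool) -> str:
        if worth:
            return "worth testing"
        return "likely wasted" if wasted else "in between"

    return {
        "days_to_conversion": label(m.avg_days_to_conversion >= 2,
                                    m.avg_days_to_conversion < 0.7),
        "touchpoints": label(m.avg_touchpoints >= 3, m.avg_touchpoints <= 1.8),
        "single_touch": label(m.single_touch_share <= 0.40,
                              m.single_touch_share >= 0.65),
        "brand_search": label(m.has_brand_search, not m.has_brand_search),
        "attribution_shift": label(m.attribution_shift >= 0.05,
                                   m.attribution_shift < NEAR_ZERO_SHIFT),
    }


# Values from the "Studied account" column of the table.
studied = PathMetrics(0.42, 2.16, 0.69, False, 0.0)
for signal, verdict in classify_signals(studied).items():
    print(f"{signal}: {verdict}")
```

With the studied account's values, four signals are labeled "likely wasted" and average touchpoints (2.16) lands between the two thresholds.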
What margin you need, and how much budget — the Margin Gate
Demand Gen budget should scale with contribution margin, because a thin margin cannot fund a channel measured in months, not clicks. [ADW heuristic]

Image takeaway: below 25% margin, Demand Gen stays organic; at 30–40%, 12–18% of paid budget; at 50%+, 20–26%.
| Profile | Contribution margin | Break-even ROAS | Demand Gen budget |
|---|---|---|---|
| A | < 25% | 5.0 – 6.7 | Organic only — not a paid line |
| B | 30 – 40% | 2.5 – 3.3 | 12 – 18% of paid budget |
| C | 50%+ | ≈ 2.0 | 20 – 26% of paid budget |
Where these numbers come from. These ranges are an [ADW heuristic] derived from contribution-margin economics and acceptable risk, not from multi-account statistics or an official Google recommendation. They are ADWService working guides for pre-experiment diagnosis, not an industry benchmark.
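The Margin Profile table translates directly into a lookup. This is a minimal sketch with hypothetical function names. Margins between the stated bands (25–30% and 40–50%) are returned as unclassified, because the table does not cover them, and "organic only" is represented as a zero paid share.

```python
from typing import Optional, Tuple

BudgetShare = Tuple[float, float]  # (low, high) as fractions of paid budget


def margin_profile(margin: float) -> Tuple[Optional[str], Optional[BudgetShare]]:
    """Map a contribution margin (fraction) to a profile and budget share."""
    if margin < 0.25:
        return "A", (0.0, 0.0)    # organic only, no paid Demand Gen line
    if 0.30 <= margin <= 0.40:
        return "B", (0.12, 0.18)
    if margin >= 0.50:
        return "C", (0.20, 0.26)
    return None, None             # band not covered by the table


def demand_gen_budget_range(paid_budget: float,
                            margin: float) -> Optional[Tuple[float, float]]:
    """Return the (low, high) Demand Gen budget, or None if unclassified."""
    _, share = margin_profile(margin)
    if share is None:
        return None
    return paid_budget * share[0], paid_budget * share[1]


# Hypothetical account: $10,000 monthly paid budget at a 35% margin.
print(margin_profile(0.35))
print(demand_gen_budget_range(10_000, 0.35))
```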
Can you even measure Demand Gen — the Measurement-Readiness Gate
Before launching Demand Gen, check whether you can measure its effect at all: is new-vs-returning tracking on, and is a holdout possible? [ADW heuristic] If not, setting up measurement is the first task — not launching the channel. Without measurement you will either miss the effect or credit Demand Gen with someone else’s result (back to the Misattribution Trap).
Decision tree

Margin below 25%? (Margin Gate)
├─ Yes → don't run paid DG without a separate high-value segment
└─ No
   Path to purchase mostly single-touch? (Consideration Gate)
   ├─ Yes → test the high-value segment (hypothesis)
   └─ No
      Holdout and measurement ready? (Measurement Gate)
      ├─ No → set up measurement first
      └─ Yes → launch a limited test
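The decision tree above can be written as a small function. This is a sketch only: the function and parameter names are hypothetical, and "mostly single-touch" is interpreted as more than 50% of conversions, which is an assumption rather than a threshold given in the text.

```python
def demand_gen_decision(contribution_margin: float,
                        single_touch_share: float,
                        measurement_ready: bool,
                        single_touch_cutoff: float = 0.5) -> str:
    """Walk the Margin, Consideration, and Measurement gates in order."""
    # Margin Gate
    if contribution_margin < 0.25:
        return "don't run paid DG without a separate high-value segment"
    # Consideration Gate (cutoff is an assumed reading of "mostly")
    if single_touch_share > single_touch_cutoff:
        return "test the high-value segment (hypothesis)"
    # Measurement Gate
    if not measurement_ready:
        return "set up measurement first"
    return "launch a limited test"


# Hypothetical accounts passing through each branch.
print(demand_gen_decision(0.20, 0.69, False))
print(demand_gen_decision(0.35, 0.69, True))
print(demand_gen_decision(0.35, 0.30, False))
print(demand_gen_decision(0.35, 0.30, True))
```

The gates are checked in the same order as the tree, so an account that fails the Margin Gate never reaches the later questions.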
What to do today (3 steps)
- Check your average path to purchase in GA4 (Advertising → Attribution → Path metrics): how many days and touchpoints before a conversion. Under ~1 day and mostly 1 touch — Demand Gen has nothing to warm.
- Calculate your contribution margin. Below ~25% — keep Demand Gen organic, not a paid line.
- Check whether you can measure Demand Gen: is new-vs-returning tracking on, and is a holdout possible? If not, that is the first task — not the launch.
Any “no” (instant path, thin margin, or no measurement) and a paid Demand Gen budget waits. Otherwise, run a limited test with a holdout.
How to measure incrementality — the Holdout Rule
You cannot prove Demand Gen’s incremental value from spend logs or attribution reports — only a holdout experiment can. [ADW heuristic] Demand Gen’s job is to bring in new lookalike customers, so its contribution shows up as net-new buyers, not as a better ratio. [Google docs] The protocol: (1) turn on new-vs-returning tracking before you start; (2) run a geo-holdout or Google’s built-in conversion-lift test, keep the core Performance Max budget stable, and avoid seasonal peaks; (3) pass criterion — incremental new customers in the test vs the holdout, converted to revenue, must beat Demand Gen spend × break-even ROAS.
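The pass criterion in step (3) is simple arithmetic once the holdout has run. The sketch below is a point-estimate check only: it does not test statistical significance, and all names and numbers are hypothetical. `holdout_scale` is an assumed factor that rescales the holdout's new-customer count to the size of the test group.

```python
from dataclasses import dataclass


@dataclass
class HoldoutResult:
    incremental_customers: float
    incremental_revenue: float
    required_revenue: float
    passed: bool


def evaluate_holdout(test_new_customers: float,
                     holdout_new_customers: float,
                     holdout_scale: float,
                     revenue_per_new_customer: float,
                     demand_gen_spend: float,
                     break_even_roas: float) -> HoldoutResult:
    """Compare incremental new-customer revenue with spend x break-even ROAS."""
    # New customers the test group would be expected to get without Demand Gen.
    expected_without_dg = holdout_new_customers * holdout_scale
    incremental_customers = test_new_customers - expected_without_dg
    incremental_revenue = incremental_customers * revenue_per_new_customer
    required_revenue = demand_gen_spend * break_even_roas
    return HoldoutResult(incremental_customers, incremental_revenue,
                         required_revenue,
                         incremental_revenue > required_revenue)


# Hypothetical test: the holdout is half the size of the test group.
result = evaluate_holdout(test_new_customers=1200, holdout_new_customers=500,
                          holdout_scale=2.0, revenue_per_new_customer=60.0,
                          demand_gen_spend=3000.0, break_even_roas=5.0)
print(result)
```

In this hypothetical run the 200 incremental customers bring $12,000 against a required $15,000, so the test would not pass.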
The exception — the High-Value Seed
The one place Demand Gen may still earn its budget on a low-consideration account is the high-value customer segment, which behaves differently from the cheap majority. [Hypothesis] In the case, orders above the high-value threshold were just 7% of orders by count but ~40% of revenue [Case], and they took a longer path to purchase than cheap orders (which converted almost instantly in a single touch).

Image takeaway: high-value orders = 7% of orders / 40% of revenue (18-month period). A single all-orders lookalike is weighted by the cheap majority, so it pulls cheap buyers.

Image takeaway: cheap orders convert in ~0.2 days in one touch; high-value orders take ~0.5 days with more touchpoints (last ~30 days of the path sample).
Demand Gen’s own AOV in the case was 22% lower than Shopping buyers’ AOV [Case] — confirming the cheap-skewed seed pulls cheap buyers. Hypothesis: segment the customer-match seed by order value and build a separate lookalike on high-value buyers, who have the longer path where an early touch can matter. [Hypothesis] The data confirms the premise and sizes the prize, but only a live test can prove the effect.
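Splitting a seed by order value is straightforward once orders are exported. The sketch below separates high-value buyers and reports their share of orders and of revenue. The function name, the order format, and the threshold are hypothetical, and the sample data is invented for illustration.

```python
from typing import Iterable, Set, Tuple

Order = Tuple[str, float]  # (customer_id, order_value)


def split_seed(orders: Iterable[Order],
               high_value_threshold: float) -> Tuple[Set[str], float, float]:
    """Return (high-value customer ids, share of orders, share of revenue)."""
    all_orders = list(orders)
    total_revenue = sum(value for _, value in all_orders)
    if not all_orders or total_revenue <= 0:
        raise ValueError("orders must be non-empty with positive revenue")
    # "Above a set threshold": strictly greater than the cut-off.
    high = [(cid, value) for cid, value in all_orders
            if value > high_value_threshold]
    share_of_orders = len(high) / len(all_orders)
    share_of_revenue = sum(value for _, value in high) / total_revenue
    return {cid for cid, _ in high}, share_of_orders, share_of_revenue


# Hypothetical export: nine $50 orders and one $300 order.
orders = [(f"c{i}", 50.0) for i in range(9)] + [("c9", 300.0)]
seed, order_share, revenue_share = split_seed(orders, high_value_threshold=200.0)
print(seed, f"{order_share:.0%} of orders", f"{revenue_share:.0%} of revenue")
```

The returned customer IDs would form the separate high-value seed; whether a lookalike built on it performs better remains the hypothesis to test.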
When Demand Gen could work, but you won’t see it
No statistically significant effect in an observational analysis does not prove zero effect — it means the effect could not be separated from noise and other changes in the available data. A Demand Gen effect could exist but stay invisible if:
- the effect is too small relative to noise;
- the test was too short;
- the budget was too low to exit learning;
- the seed was poor (like the cheap-skewed seed in the case);
- the campaign optimized for the wrong event;
- Performance Max changed in parallel;
- brand demand was not measured;
- the attribution window does not match the real consideration cycle.
That’s why the final call — “Demand Gen is not incremental” on a specific account — should rest on a holdout, not on observational logs alone.
FAQ
- It depends on margin and consideration profile. For high-margin products with a real research-and-compare journey, it’s worth testing. For low-margin, impulse, single-touch purchases, it’s usually a waste. [ADW heuristic]
- Demand Gen is upper-funnel; its conversions are mostly closed and credited elsewhere (PMax, Shopping, brand). A low direct ROAS is expected and, on its own, proves neither failure nor success. [Google docs]
- In a forensic study of one low-consideration account, no: the effect was ≈0 after controlling for spend and seasonality. [Case] On a high-consideration account the answer may differ, and it must be proven with a holdout.
- A holdout experiment (geo-split or conversion lift) plus new-vs-returning tracking. Attribution models can’t do it. [ADW heuristic]
- ADW guide: below ~25% margin, organic; at 30–40%, 12–18% of paid budget; at 50%+, 20–26%. This is a heuristic, not a Google benchmark. [ADW heuristic]
- Only partly. On a single-touch account, data-driven and last-click agree almost exactly, so the model choice says nothing about Demand Gen’s assist value. [Case]
- Under Cross-network in the default channel grouping (with Performance Max), not under Display. [Google docs]
- Performance Max captures existing demand at the bottom of the funnel (it closes the sale); Demand Gen creates demand at the top via lookalike audiences. In GA4 both fall under Cross-network, but Demand Gen rarely “closes” a conversion; PMax does. So comparing them on direct ROAS is wrong. [Google docs]
- Google reps usually cite ~$100/day and a few weeks of stable budget to exit the learning phase. An underfunded Demand Gen can show no effect simply from lack of budget, so check that the budget is adequate before concluding it doesn’t work. [Google docs]
- A hypothesis from this research: segment the seed by order value, so the lookalike doesn’t skew toward cheap buyers. Needs a test to confirm. [Hypothesis]
About this research
The framework — Margin Profiles A/B/C, the Consideration-Profile Gate, the Measurement-Readiness Gate, the Misattribution Trap, and the Holdout Rule — was developed by ADWService, a Google Premier Partner PPC agency for e-commerce (Google Shopping and Performance Max), managing 300+ active client accounts across Ukraine, the USA, the UK, the EU, Australia, and Canada. The case findings come from a forensic, component-level analysis of an 18-month single-account dataset, deliberately stress-tested against the Misattribution Trap: every aggregate signal was decomposed to the campaign and product level before any conclusion was drawn.










