“The case for the hybrid stack is not the deck. It’s the trailing-12-month numbers across the book, and the numbers are the only thing that matters.”
This is the closing post in the Hybrid Stack series. The prior nine posts argued the shape. One frame post. Three black-hat archetypes that fail predictably. One stack-reference architecture. Four workflow deep-dives on catalog, ads, creative, and inventory.
This post is the evidence. Twelve months of Amazon hybrid agency results across the active client book. Measured against the eight metrics we anchor on internally. Where a metric has a comparison number from the prior-agency state, we cite it. Where it does not, we cite the median and the distribution.
The brands are not named directly. Generic descriptors only. “A $5M housewares brand.” “A pet supplement client we onboarded in Q2.” That is the anonymization rule we hold to across the series.
The setup, what we measured, over what window
The window is the trailing 12 months ending Q1 2026. The book is the active client roster. Roughly 35 brands across Amazon, Walmart, and parallel channels.
Where the brand was onboarded mid-window, we measure from the onboarding date forward and annotate the partial window. The comparison baseline is the prior-agency state where the brand came to us from the offshore VA shop, the AI listing generator, or the fractional consultant archetypes documented earlier in the series.
What follows is eight metrics, in the order they tend to show up in a conversation with a brand operator about why our Amazon hybrid agency results differ from what they have been getting.
Metric 1, TACoS delta
Median TACoS reduction on inbound brands at 90 days post-onboarding: 180 to 240 basis points. The distribution is wider at the tails.
Brands coming off the offshore VA + ChatGPT archetype tend to show the largest deltas (300+ basis points). The inherited account state is the most degraded. Brands coming off the AI-first consultant archetype show smaller deltas (80 to 140 basis points). Their ad accounts were less actively damaged. They were just untouched.
The mechanism is the search-term harvest we wrote about in the ads workflow post. The first 60 days on a new account are dominated by removing wasted search-term spend that has accumulated under the prior agency. The TACoS recovery from that work alone, before any new positive strategy lands, is usually 100 to 180 basis points. This is the first and most visible piece of the Amazon hybrid agency results pattern.
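For readers who want the arithmetic behind a basis-point delta, here is a minimal sketch. It assumes the common definition of TACoS as ad spend divided by total (ad plus organic) revenue. The function names and the dollar figures are hypothetical illustrations, not numbers from the client book.

```python
def tacos(ad_spend: float, total_revenue: float) -> float:
    """TACoS as a fraction: ad spend divided by total (ad + organic) revenue."""
    if total_revenue <= 0:
        raise ValueError("total_revenue must be positive")
    return ad_spend / total_revenue


def tacos_delta_bps(before: float, after: float) -> float:
    """Reduction in TACoS in basis points (1 bp = 0.01 percentage points)."""
    return (before - after) * 10_000


# Hypothetical account: revenue flat at $100k a month, ad spend falls from $14k to $12k.
before = tacos(14_000, 100_000)  # 14% TACoS
after = tacos(12_000, 100_000)   # 12% TACoS
print(round(tacos_delta_bps(before, after)))  # 200
```

A 200 basis-point reduction in this sketch is two percentage points of total revenue no longer spent on ads.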
Metric 2, ad-attributed revenue growth
Ad-attributed revenue growth at 12 months post-onboarding, median across the book: +38%. The distribution is left-skewed.
Most brands cluster in the +25% to +55% range. A tail of smaller gains (under +15%) on brands that were already well-managed before we onboarded them. A tail of larger gains (+80% or more) on brands that came off a black-hat archetype with significant unused-keyword inventory.
This is the metric the brand owner actually feels in their P&L. TACoS deltas are leading indicators. Ad-attributed revenue is the dollar number that compounds.
Metric 3, conversion rate lift
Conversion rate lift on rebuilt detail pages at 30 days post-launch: median +11%. With a 90-day tail to +18% as the listing fully indexes and the long-tail traffic stabilizes.
The lift is highest on brands where the prior A+ content was AI-generated by an earlier agency (median +19% at 30 days). Lower on brands where the prior A+ was already professionally produced (median +6%).
The creative workflow post goes through the eight-step rebuild process that produces these numbers. The compounding effect of the conversion lift against the ad-attributed revenue growth is the multiplier the brand sees in the quarterly P&L review. It is the second compounding piece of the Amazon hybrid agency results stack.
Metric 4, top-10 organic ranking
Median Tier 1 keywords ranked in top 10 organic positions, 90 days post-listing-rebuild: 72% of the targeted set. At 180 days the number rises to 84%. The remaining 16% is the long-tail set the brand had not been targeting before. Ranking momentum builds over the following two quarters.
The mechanism here is the catalog workflow’s eight-step process. Cerebro keyword pull. SmartScout rank context. Rule-engine tier assignment. Human title and bullet copywriting. Parent-child architecture. AI-compiled back-end fields. Indexing verification. Remediation.
The compound result is that roughly 12 of every 14 priority keywords rank in top-10 organic positions by the six-month mark.
Metric 5, forecast accuracy (MAPE)
Mean Absolute Percentage Error on inventory forecasts, trailing 12 months across the book:
- Top-5 revenue SKUs: 11 to 15%
- Top-15 revenue SKUs: 14 to 19%
- Long-tail SKUs: 18 to 24%
These numbers are net of strategist PO-sizing adjustments. Meaning the forecast accuracy is measured against the unit demand actually realized. Not against the model’s pre-adjustment output.
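The MAPE calculation itself is short. The sketch below scores the final, strategist-adjusted forecast against realized units, which is the measurement described above. The function name, the zero-demand handling, and the sample numbers are assumptions for illustration.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean Absolute Percentage Error, returned as a percentage.

    Periods with zero actual demand are skipped, because the percentage
    error is undefined there (an assumption of this sketch).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must be the same length")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("no periods with non-zero actual demand")
    return 100 * sum(errors) / len(errors)


# Hypothetical weekly units for one SKU.
actual_units = [100, 200, 400]
final_forecast = [90, 220, 400]  # after the strategist's PO-sizing adjustment
print(mape(actual_units, final_forecast))
```

Running the same function on the model's pre-adjustment output would give the pure-AI baseline the adjusted forecast is compared against.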
The inventory workflow post walks through the eight-step process. The strategist’s PO-sizing adjustment is responsible for roughly 30% of the accuracy improvement over a pure-AI forecast baseline. The human seasonality annotation contributes the other 70%.
Metric 6, time-to-first-action on Buy Box loss, stockout, and suppression events
Median time from event detection to operator action shipped to Seller Central:
- Buy Box loss on a hero ASIN: under 90 minutes during business hours. Under 6 hours overnight.
- Stockout signal (weeks-of-cover dropping below 4 on a top-5 SKU): under 4 hours. Replenishment PO drafted same business day.
- Suppression event on a top-15 ASIN: under 2 hours for triage. Median 11 hours for fix-shipped.
These are the metrics the brand operator notices most directly week-to-week. The AI-only Amazon agency model has no equivalent. There is no script watching for the event. No human routed when it fires. No operating-layer process to ship the fix.
Brands coming off the offshore VA archetype typically report median time-to-first-action measured in weeks, not hours, for the same event classes.
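A median time-to-first-action can be computed from any event log that records both timestamps. This is a minimal sketch. The log layout, the event-class labels, and the sample timestamps are hypothetical, not our internal schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical event log: (event_class, detected_at, action_shipped_at).
events = [
    ("buy_box_loss", datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 11, 0)),
    ("buy_box_loss", datetime(2026, 1, 9, 14, 0), datetime(2026, 1, 9, 15, 30)),
    ("buy_box_loss", datetime(2026, 2, 2, 9, 0), datetime(2026, 2, 2, 9, 40)),
    ("suppression", datetime(2026, 1, 7, 8, 0), datetime(2026, 1, 7, 19, 0)),
]


def median_minutes_to_action(log, event_class):
    """Median minutes from detection to shipped action for one event class."""
    durations = [
        (shipped - detected).total_seconds() / 60
        for cls, detected, shipped in log
        if cls == event_class
    ]
    if not durations:
        raise ValueError(f"no events of class {event_class!r}")
    return median(durations)


print(median_minutes_to_action(events, "buy_box_loss"))  # 60.0
```

The median is used rather than the mean so that one slow overnight event does not dominate the number.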
Metric 7, percentage of AI recommendations the human edits before shipping
This is the metric we anchor most heavily on internally. It is the proof that the AI layer is doing useful work without owning the decision. Trailing-12-month edit rates by workflow:
- Search-term clustering (ads workflow Step 3): ~30% edit rate
- Search Term Field + back-end attribute compile (catalog workflow Step 6): ~38% edit rate
- Campaign restructure proposals (ads workflow Step 5): ~55% edit rate
- A+ content first-draft module copy (creative workflow Step 3): ~80% edit rate
- Inventory forecast assumptions (inventory workflow Step 4): ~25% edit rate on the math itself. ~35% on the strategist-adjusted PO sizing.
The 80% creative edit rate is the highest and the most informative. Brand voice is the binding constraint on the work. The LLM cannot produce voice-aligned copy without significant human redirect. Campaign restructure proposals sit in the middle at ~55%. The 25% to 38% range across search-term clustering, the back-end compile, and the inventory forecast is the band where AI is doing genuinely useful synthesis. Low enough to prove it is producing usable output. High enough to prove the strategist is still owning the decision.
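The edit rate is a simple ratio: recommendations a human changed before shipping, divided by all recommendations shipped. A minimal sketch, assuming a review log with one row per AI recommendation. The workflow labels and counts are hypothetical.

```python
from collections import Counter

# Hypothetical review log: one (workflow, edited_before_shipping) pair per AI recommendation.
review_log = (
    [("search_term_clustering", True)] * 3
    + [("search_term_clustering", False)] * 7
    + [("creative_first_draft", True)] * 4
    + [("creative_first_draft", False)] * 1
)


def edit_rates(log):
    """Share of AI recommendations per workflow that a human edited before shipping."""
    totals = Counter(workflow for workflow, _ in log)
    edited = Counter(workflow for workflow, was_edited in log if was_edited)
    return {workflow: edited[workflow] / totals[workflow] for workflow in totals}


print(edit_rates(review_log))
# {'search_term_clustering': 0.3, 'creative_first_draft': 0.8}
```

Tracking the ratio per workflow, not as one blended number, is what makes the creative outlier visible.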
Metric 8, net-new converting search terms per account per month
Net-new converting search terms surfaced per account per month: 22 to 60. Depending on category and catalog size.
The number is highest on brands with broad catalogs in mature categories (housewares, CPG, pet). Lowest on brands with focused catalogs in narrow categories (oral care, specialty supplements).
The mechanism is the weekly human-driven search-term harvest from the Search Term Report combined with the SmartScout competitor search-term spy. AI clusters the harvested terms by intent. The harvest itself is human work. The LTV-versus-ACOS classification we wrote about cannot be made by the algorithm.
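The counting side of this metric is a set difference: terms that converted this period, minus terms the account already tracks. The sketch below shows only that bookkeeping, not the human harvest or the LTV classification. The field names `search_term` and `orders` are assumptions about a report export, and the sample rows are invented.

```python
def net_new_converting_terms(report_rows, known_terms):
    """Search terms with at least one order that are not already tracked."""
    def normalize(term):
        # Lowercase and collapse whitespace so near-duplicates match.
        return " ".join(term.lower().split())

    known = {normalize(t) for t in known_terms}
    converting = {
        normalize(row["search_term"]) for row in report_rows if row["orders"] > 0
    }
    return sorted(converting - known)


rows = [
    {"search_term": "bamboo toothbrush", "orders": 3},
    {"search_term": "Charcoal  Floss", "orders": 1},
    {"search_term": "kids toothpaste", "orders": 0},
]
already_tracked = ["bamboo toothbrush"]
print(net_new_converting_terms(rows, already_tracked))  # ['charcoal floss']
```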
The headline case, the oral-care brand audit
The single case that anchors the Amazon hybrid agency results narrative is the oral-care brand we audited and onboarded last year. It is the case study referenced in the frame post for the series.
The brand had been paying $695 a month to an AI-only Amazon agency. Their ad spend at the prior agency averaged $2,500 a month with ROAS in the 1.5x to 2.5x range. Within three months of onboarding to the hybrid stack:
- Ad spend dropped from $2,500 a month average to $1,750 a month average, with a $2,000 ceiling on heavy weeks
- ROAS moved from 1.5–2.5x to 4.3–6.8x. Roughly a 2.7x to 2.9x improvement at either end of the range.
- Sales increased 36.8% over the same window
- The increased ad efficiency more than paid for the higher agency fee within the same quarter
The case is not unusual. It is the median outcome for brands we onboard off the offshore VA + ChatGPT archetype. The 36.8% sales lift sits squarely inside the +25% to +55% ad-attributed-revenue band we cited above. The ROAS lift maps to the 180 to 240 basis-point TACoS reduction band. The whole pattern is the Amazon hybrid agency results story compressed into one client.
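The ROAS arithmetic in the oral-care case can be checked directly from the quoted figures. The sketch assumes ROAS means ad-attributed revenue divided by ad spend, and it multiplies the average monthly spend by each end of the ROAS range. The implied revenue numbers are derived here for illustration and were not reported separately.

```python
# Figures quoted in the case above.
spend_before, spend_after = 2_500, 1_750  # average monthly ad spend, USD
roas_before = (1.5, 2.5)                  # low and high ends of the range
roas_after = (4.3, 6.8)

# ROAS improvement at each end of the range.
improvement = tuple(round(a / b, 2) for a, b in zip(roas_after, roas_before))

# Implied monthly ad-attributed revenue = spend x ROAS (derived, not reported).
revenue_before = tuple(round(spend_before * r) for r in roas_before)
revenue_after = tuple(round(spend_after * r) for r in roas_after)

print(improvement)     # (2.87, 2.72)
print(revenue_before)  # (3750, 6250)
print(revenue_after)   # (7525, 11900)
```

Under those assumptions, lower spend at a higher ROAS still implies more ad-attributed revenue at both ends of the range.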
What the numbers actually show
The numbers track the architecture. Every metric in this post traces back to a specific step in one of the four workflow posts. The search-term harvest from the ads workflow. The eight-step rebuild from the catalog workflow. The seasonality annotation from the inventory workflow. The voice fingerprint constraint from the creative workflow.
The Amazon hybrid agency results are not magic. They are the predictable consequence of a workflow shape where AI sits downstream of human judgment, the rule engine catches drift before it compounds, and every recommendation passes through a strategist edit before reaching the brand.
The AI-only Amazon agency model produces the inverse results. TACoS that drifts up. Ad-attributed revenue that erodes. Conversion that decays. Ranking that softens. Forecasts that overcommit Q4. Time-to-first-action measured in weeks. Zero human edit on any AI output.
We have written deep dives on each of the three black-hat archetypes that produce these failure patterns. The offshore VA + ChatGPT shop. The AI listing-builder SaaS. The AI-first consultant who never logs into Seller Central.
The 10-post series closes here. The case for the hybrid stack is the trailing-12-month numbers across the active book. The architecture in the stack-reference post. The four workflow deep-dives that show how the architecture runs in practice.
If you are operating a brand and the patterns we have described feel familiar, the audit work is the same shape regardless of which archetype the prior agency was running. The fix is also the same shape. Replace the missing operating layer. Keep the AI downstream of the humans. Let the Amazon hybrid agency results compound from there.
Reviewed by the Customer Service Team and the Ads & Marketing Team.
The Hybrid Stack, 10-post series. You are reading post 10 of 10.
Black-hat track, three archetypes of the AI-only Amazon agency:
- The three black-hat shapes of the 2026 Amazon agency
- Why offshore VA + ChatGPT shops are the most expensive cheap option
- Why AI listing-builder SaaS can’t get a 7-figure brand to actually index
- Why AI-first consultants who never log into Seller Central miss the work that moves money
White-hat track, the ClearSight hybrid stack:
- The ClearSight intelligence layer, stack reference
- Catalog AI/human workflow
- Ads AI/human workflow
- Creative AI/human workflow
- Inventory AI/human workflow
- 12 months of the hybrid stack, results recap (you are here)
