How To Measure Email & SMS Incrementality With Klaviyo

Incrementality is one of the most misunderstood concepts in ecommerce marketing.

Nearly every brand using Klaviyo can tell you how much revenue email or SMS “drove” last month. Far fewer can tell you how much of that revenue would not have happened if those messages never existed.

That distinction matters. Especially as lifecycle programs mature, budgets come under scrutiny, and leadership starts asking harder questions about ROI.

This article walks through how to measure email and SMS incrementality with Klaviyo in a way that’s practical, defensible, and grounded in real-world execution — not theory.

Klaviyo doesn’t offer this kind of reporting out of the box, but there is a repeatable process that gets you close enough to make confident decisions.

What “Incrementality” Actually Means in Ecommerce (And Why Attribution Isn’t Enough)

Incrementality answers a simple question:

What happened because this message existed — that would not have happened otherwise?

Attribution answers a different one:

What message did we give credit to when a purchase occurred?

Most ecommerce teams conflate the two.

Last-click attribution will almost always overstate the impact of email and SMS, especially for brands with strong brand demand, repeat customers, and short consideration cycles. In those cases, email is often capturing demand rather than creating it.

That doesn’t make email or SMS unimportant. It just means attribution alone can’t tell you how much incremental value those channels are actually adding.

Incrementality isn’t about discrediting lifecycle marketing. It’s about understanding its true contribution so you can make better decisions around cadence, investment, and risk.

What Klaviyo Can — and Can’t — Tell You About Incrementality

Klaviyo is very good at what it’s designed to do: event tracking, segmentation, automation logic, and attributed revenue reporting.

What it does not provide is a native incrementality model. There is no built-in “incremental revenue” metric.

Incrementality requires comparing exposed and unexposed audiences under the same conditions. Klaviyo gives you the tools to design those comparisons, but the analysis and judgment live outside the platform.

Think of Klaviyo as the execution layer. Incrementality comes from how you use it.

Before You Start: Prerequisites for Meaningful Incrementality Testing

Incrementality testing fails most often because teams skip this step.

First, be explicit about scope. Are you measuring email only, SMS only, or both together? Are you testing campaigns, flows, or a specific program? Trying to measure “all email” at once almost always leads to confusion.

Second, confirm your tracking fundamentals. Your primary conversion event should be consistent (usually Placed Order). Your Shopify integration should be healthy. You should not be mixing multiple revenue definitions across tools.

Third, choose metrics that don’t lie to you. Raw revenue totals are misleading because scale skews everything. Revenue per profile (or per recipient) is usually the most honest primary metric, with conversion rate and repeat purchase rate as supporting context.

Finally, volume matters. If your eligible audience is tiny, expect noise. That doesn’t mean you can’t learn anything, but it does mean you should be cautious about precision.
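To put “expect noise” into rough numbers before you commit to a test, a quick back-of-the-envelope check helps. The sketch below uses a standard two-proportion approximation (5% two-sided significance, 80% power); the group sizes and baseline conversion rate are illustrative, not figures from this article.

```python
import math

def minimum_detectable_lift(n_exposed, n_holdout, baseline_rate,
                            z_alpha=1.96, z_power=0.84):
    # Smallest absolute change in conversion rate the test can reliably
    # separate from noise (5% two-sided significance, 80% power).
    se = math.sqrt(baseline_rate * (1 - baseline_rate)
                   * (1 / n_exposed + 1 / n_holdout))
    return (z_alpha + z_power) * se

# Illustrative: 45,000 exposed profiles, 5,000 holdout, 3% baseline conversion
print(minimum_detectable_lift(45_000, 5_000, 0.03))  # ~0.007, i.e. ~0.7 points
```

In that illustrative case, a 5,000-profile holdout against a 3% baseline conversion rate can only reliably detect a lift of about 0.7 percentage points, which is why small lists often produce results that are impossible to read with confidence.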

Incrementality Requires Controlled Absence

At its core, incrementality requires one thing: a control group.

You need to compare people who were exposed to messaging against people who were not, at the same time, under the same conditions.

If you are not doing that, you are not measuring incrementality. You are looking at correlation.

With Klaviyo, there are two legitimate ways to create that control. Both are valid. They just answer slightly different questions.

Option 1: True Holdout Groups (The Cleanest Approach)

Holdout groups are the most straightforward way to measure incrementality.

You intentionally do not message a small, randomized subset of people who would otherwise be eligible, then compare their behavior to the exposed group over the same period of time. The difference between those two groups is your incremental signal.

This approach works best when you have sufficient volume and can tolerate a controlled reduction in messaging.

You start by choosing one program to test. That might be a high-impact flow like Abandoned Checkout, or a recurring campaign cadence. Do not test multiple programs at once. You want one clear variable.

Next, define eligibility clearly. This should mirror who would normally receive the program. For example: profiles who would enter a specific flow, or engaged subscribers from the last 60 days.

From that eligible audience, you create a random split — often 90/10 or 80/20 depending on risk tolerance. Smaller holdouts minimize revenue risk but introduce more noise. Larger holdouts produce cleaner data, but require more confidence in the test.

How you implement the holdout depends on what you’re testing.

If you’re testing a flow, you add a conditional split as the first step after the trigger. Profiles flagged as part of the holdout exit the flow immediately. Everyone else continues normally.

If you’re testing campaigns, you use the same holdout group and explicitly exclude it from every campaign send during the test window.

The test needs to run long enough to capture real behavior. Two to four weeks is usually the minimum, and longer for lower-frequency purchase categories.

Once the test period ends, you compare revenue per profile between the exposed and holdout groups. The difference between those two numbers, multiplied by the size of the exposed audience, gives you an estimate of incremental revenue.
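As a purely illustrative example: if exposed profiles averaged $4.20 in revenue over the window and holdout profiles averaged $3.60, the lift is $0.60 per profile; across 50,000 exposed profiles, that works out to roughly $30,000 in estimated incremental revenue.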

That number will never be perfect. That’s fine. The goal isn’t mathematical purity — it’s decision confidence.

Holdout testing fails when teams end tests early, change offers mid-stream, allow holdout profiles to receive other messaging, or over-interpret short-term spikes.

How to implement a holdout in Klaviyo (explicit walkthrough)

Start by creating a holdout segment in Klaviyo that represents a random subset of the audience that would normally qualify for the program you’re testing.

Practically, this means building your standard eligibility segment first, then layering in a random sample condition (for example, 10% of profiles). Once created, treat this holdout segment as fixed for the duration of the test.
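If you would rather have an auditable split you can reproduce outside Klaviyo, one alternative is to assign the holdout deterministically from a hash of the email address and sync the flag back as a custom profile property. The sketch below is a minimal example under assumptions: the CSV file names, the “email” column, the “holdout_group” property, and the 10% threshold are illustrative choices, not Klaviyo features.

```python
import csv
import hashlib

HOLDOUT_SHARE = 0.10  # illustrative 90/10 split

def in_holdout(email: str) -> bool:
    # Hash the normalized email so the same profile always lands in the
    # same group, no matter how many times the script is run.
    digest = hashlib.sha256(email.strip().lower().encode()).hexdigest()
    return int(digest[:8], 16) / 16**8 < HOLDOUT_SHARE

# "eligible_profiles.csv" is an export of the eligibility segment;
# the "email" column and "holdout_group" property name are illustrative.
with open("eligible_profiles.csv", newline="") as src, \
        open("holdout_flags.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["email", "holdout_group"])
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "email": row["email"],
            "holdout_group": in_holdout(row["email"]),
        })
```

Either approach works; what matters is that the assignment is random with respect to behavior and stays fixed for the duration of the test.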

Next, open the flow you’re testing and insert a conditional split immediately after the trigger.

The logic should be simple:

  • If the profile is in the Holdout Segment, end the flow.
  • Otherwise, allow the profile to continue as normal.

This ensures holdout profiles experience the same conditions as exposed profiles, with one deliberate exception: they do not receive the messages being tested.

For campaign-based testing, the setup is even simpler. Use the same holdout segment and explicitly exclude it from every campaign send during the test window.

Once the test begins, do not change the segment definition, message cadence, or offers until the test ends. Any changes mid-stream compromise the comparison and invalidate the result.

How to actually measure a holdout test using Klaviyo + Shopify reporting

Once your test window ends, measurement happens in two places:

  • Klaviyo (to define the groups)
  • Shopify (to validate revenue behavior)

You do not need fancy tooling to do this correctly.

Klaviyo defines the holdout and exposed groups. Shopify is the source of truth for revenue. To measure results, we export both groups from Klaviyo, export Shopify orders for the same test window, join by email, and sum total Shopify revenue for each group.

At no point in this process are we using Klaviyo attribution or channel crediting — we are only using Shopify order revenue grouped by holdout vs exposed profiles.

Step 1: Export the two groups from Klaviyo

  • Segment 1: Exposed
  • Segment 2: Holdout

Export a CSV for each segment that includes, at minimum, the email address (or whichever identifier you plan to join on).

Step 2: Export Shopify orders for the test window

From Shopify Admin:

  • Go to Orders
  • Filter by Order date = your test window
  • Export orders (CSV) – Make sure the export includes:
    • Customer email
    • Order ID
    • Order created at
    • Total sales (or total price / net sales depending on how you want to define revenue)
    • Refunds (if you want net)

Step 3: Join in a spreadsheet

In Google Sheets / Excel:

  • Map each Shopify order to either “Exposed” or “Holdout” based on customer email
  • Sum revenue for each group
  • Count orders and unique customers for each group

Step 4: Calculate your incrementality number

Now you’re using Shopify dollars only.

  • Revenue per profile = Shopify revenue in group ÷ # of profiles in group
  • Incremental lift per profile = Exposed RPP − Holdout RPP
  • Estimated incremental revenue = Lift per profile × # of exposed profiles
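If you would rather script Steps 3 and 4 than work in a spreadsheet, here is a minimal pandas sketch. The file names and column headers (“email” in the Klaviyo exports; “Name”, “Email”, and “Total” in the Shopify export) are assumptions; check them against your actual files before running it.

```python
import pandas as pd

# Klaviyo exports: one row per profile, with an "email" column (assumed name).
exposed = pd.read_csv("klaviyo_exposed.csv")
holdout = pd.read_csv("klaviyo_holdout.csv")

# Shopify order export: "Name" (order number), "Email", "Total" (assumed names).
orders = pd.read_csv("shopify_orders.csv")

# Shopify exports often contain one row per line item; keep one row per order
# so order totals aren't double-counted.
orders = orders.drop_duplicates(subset=["Name"])

def normalize(series):
    # Lowercase and trim emails so the join isn't broken by formatting.
    return series.astype(str).str.strip().str.lower()

orders["join_email"] = normalize(orders["Email"])
groups = {
    "exposed": set(normalize(exposed["email"])),
    "holdout": set(normalize(holdout["email"])),
}

results = {}
for name, emails in groups.items():
    group_orders = orders[orders["join_email"].isin(emails)]
    revenue = group_orders["Total"].sum()
    results[name] = {
        "profiles": len(emails),
        "orders": len(group_orders),
        "revenue": revenue,
        # Divide by the full group size, not just buyers.
        "revenue_per_profile": revenue / len(emails),
    }

lift_per_profile = (results["exposed"]["revenue_per_profile"]
                    - results["holdout"]["revenue_per_profile"])
incremental_revenue = lift_per_profile * results["exposed"]["profiles"]

print(results)
print(f"Lift per profile: {lift_per_profile:.2f}")
print(f"Estimated incremental revenue: {incremental_revenue:.2f}")
```

Note the denominator: revenue per profile divides by the full group size from Klaviyo, not just the customers who happened to order, which is what keeps the comparison honest.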

Important gotchas (don’t skip)

  • If you have a lot of guest checkout where email is missing or inconsistent, your join will drop rows.
  • If you use Shop Pay / Apple Hide My Email, email can vary (rare, but it happens).
  • Decide up front if you’re using gross sales, total sales, or net of refunds.

This method is simple and defensible because it uses Shopify orders as truth.

Option 2: Program-Level Suppression

Suppression testing follows the same principle as holdouts. You are still comparing exposed versus unexposed audiences at the same time.

The difference is scope.

Instead of suppressing all messaging to a group, you suppress a specific program while keeping everything else constant. This allows you to answer more targeted questions.

For example: Are campaigns incremental on top of flows? Is SMS incremental on top of email? Is this particular flow still doing anything incremental, or is it just intercepting demand created elsewhere?

To run a suppression test, start by choosing exactly what you want to suppress. Be specific. Vague suppression leads to vague answers.

Next, define a consistent, engaged audience — usually subscribers active in the last 60 to 90 days. Split that audience into a primary group and a suppression group.

Enforce suppression cleanly. Campaign suppression means explicitly excluding the suppression segment from sends. Flow suppression usually means adding a conditional exit based on segment membership. Be careful to prevent one-off or ad hoc sends from bypassing your rules.

Run the test long enough to observe behavior, then compare revenue per profile and conversion rates between the suppressed and non-suppressed groups.

Suppression testing is especially useful for identifying cannibalization. If suppressing a program barely impacts revenue, it may be capturing demand rather than creating it. That doesn’t mean the program is useless. It means its role may need to change.

What matters is that suppression is treated as a controlled test, not a blunt instrument. When scoped correctly, it is just as legitimate as a holdout.

What Not to Do: Time-Based Comparisons Are Not Incrementality

Comparing performance before and after a change is not incrementality. It’s correlation.

Time-based comparisons are influenced by seasonality, promotions, paid traffic shifts, inventory changes, press, and countless other variables that have nothing to do with email or SMS.

These comparisons can be useful for trend analysis. They are not valid for incrementality claims.

If you are not comparing exposed and unexposed groups at the same time, you are not measuring incremental lift.

If You Can’t Run a Control, Don’t Pretend You Did

Sometimes running a control isn’t feasible. Volume is too low. Risk tolerance is too tight. Internal buy-in isn’t there.

That’s okay.

What’s not okay is presenting directional analysis as incrementality.

It’s better to say “we don’t know yet” than to anchor decisions to a number that looks precise but isn’t. Decision-making should reflect confidence level. Incrementality is a tool, not a talking point.

How Incrementality Is Actually Used in Practice

Incrementality isn’t about proving email or SMS “works.” It’s about understanding how much they contribute so you can decide how aggressive to be.

High incremental lift often justifies increased investment, tighter segmentation, and more automation coverage. Modest lift may suggest dialing back volume and focusing on relevance. Near-zero lift is usually a signal to fix fundamentals before scaling.

The number itself matters less than what you do with it.

Common Ways Incrementality Testing Goes Wrong

Most failures come from running tests too short, changing variables mid-test, including unengaged subscribers, ignoring deliverability, or treating attribution dashboards as incrementality.

The mistake isn’t testing. It’s testing without discipline.

When Incrementality Testing Is a Waste of Time

Incrementality testing is rarely useful when volume is extremely low, data foundations are broken, or teams are unwilling to act on results.

FAQs

How long should an incrementality test run?
Usually at least two to four weeks, longer for low-frequency purchase cycles.

What size holdout should I use?
Ten to twenty percent is common, depending on risk tolerance.

Can I measure SMS incrementality separately from email?
Yes. Suppression testing works particularly well for this.

What metric matters most?
Revenue per profile is usually the most honest starting point.

Final Thoughts

Incrementality isn’t about winning an argument or defending a budget line. It’s about measuring what email and SMS actually contribute, well enough to act on it.

It’s about making better decisions with less guesswork.

Klaviyo gives you everything you need to do that — as long as you’re willing to be disciplined about how you use it.

This article originally appeared on Zettler Digital and is available here for further discovery.