
Frequently Asked Questions

Everything you need to know about testing your email content with Liftstack, explained in plain language.

Getting Started

What is Liftstack?
Liftstack is an A/B testing platform for email marketers. It lets you test different versions of email content against each other, then uses statistical analysis to tell you which version actually performs best and by how much. It works with Klaviyo, Customer.io, and Iterable.
What can I test?

You can test any piece of email content that you'd swap between recipients. In Liftstack, these are called "snippets." The most powerful feature is the ability to test custom HTML code blocks, which lets you experiment with virtually any element of your emails:

  • Custom HTML blocks. This is where Liftstack really shines. You can test entire sections of email markup: different layouts, content structures, visual treatments, or any HTML that your ESP supports. Examples include product recommendation grids, social proof sections, header layouts, footer designs, countdown timers, loyalty callouts, trust badges, dynamic content cards, shipping/returns callout blocks, and cross-sell/upsell module formats.
  • Subject lines. "Don't miss out!" vs "Your exclusive offer inside"
  • Hero blocks. Different images or headline/subheadline combinations
  • CTAs. "Shop Now" vs "Browse the Collection" vs "Claim Your Discount"
  • Copy blocks. Different tone, length, or messaging strategy
  • Discount framing. "20% off" vs "Save £10" vs no discount

Because Liftstack works at the HTML snippet level, you are not limited to testing simple text swaps. Any section of your email that you can express as an HTML block can become a testable snippet with multiple variants.

What is a "variant"?
A variant is one version of a snippet. If you're testing three different subject lines, each subject line is a variant. You need at least two variants to run a test.
Can I create a variant with blank content?

It depends on the snippet type:

  • Subject lines: content is always required. ESPs reject blank subject lines, and a blank subject line would corrupt your test results.
  • Copy and HTML blocks: content is always required. A blank variant would produce inflated uplift numbers for competing variants (since no one can click or convert on empty content) and poison Thompson Sampling posteriors for future campaigns.
  • Image snippets: text content (alt text) is optional, but you must provide either an uploaded image or an image URL.

If you want to test "no content" vs "some content" for a slot, use a minimal placeholder (e.g., a single space or a neutral message) as your control variant instead.

What is a "control" variant?

The control is the version you'd send if you weren't testing. It represents your current standard or "safe" option. Marking a variant as the control lets Liftstack measure uplift: how much better the winning variant performed compared to what you would have done anyway.

You don't have to designate a control, but it's highly recommended. Without one, Liftstack can still find a winner, but the uplift numbers will be less precise.

What is a "slot"?
A slot is a position in your campaign where a snippet is being tested. If you're testing both a subject line and a hero image in the same campaign, that's two slots. Each slot is analysed independently, so you'll get separate results for each.
How does Liftstack assign variants to recipients?

Before your campaign sends, Liftstack randomly assigns each recipient a variant for each slot. These assignments are written to your CRM profiles as a property called lf_assignments. Your email template then uses conditional logic to show each person the content they were assigned.

This is important: the assignment happens before anyone sees anything. This is what makes it a proper experiment, because we know who was shown what before we see the results.
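
For the technically curious, here is the shape of that pre-send assignment step as a minimal Python sketch. Only the lf_assignments property name comes from Liftstack; the slot names, variant names, and function are illustrative, not the production implementation.

```python
import random

# Illustrative only: every recipient gets one variant per slot *before*
# the campaign is sent. Slot and variant names here are hypothetical;
# only the lf_assignments property name comes from the FAQ above.
slots = {
    "subject_line": ["variant_a", "variant_b"],
    "hero_block": ["variant_x", "variant_y", "variant_z"],
}

def assign(recipient_id: str) -> dict:
    """Randomly pick one variant per slot for a single recipient."""
    return {slot: random.choice(variants) for slot, variants in slots.items()}

# This dict is what would be written to the CRM profile as the
# lf_assignments property; the email template then branches on it.
profile_update = {"lf_assignments": assign("recipient_123")}
print(profile_update)
```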

Why Liftstack?

My ESP already has A/B testing built in. Why would I pay for this?

Native ESP testing and Liftstack solve different problems.

What native ESP A/B testing does:

  • Splits your audience into two groups and sends each group a completely different email (or subject line)
  • Picks a winner based on opens or clicks over a short window (typically 1 to 4 hours)
  • Sends the winning version to the remaining audience

What Liftstack does differently:

  • Tests individual content blocks inside a single email, not whole emails against each other. You can test just the hero image, just the CTA, or just the product grid layout while keeping everything else identical.
  • Runs multiple tests simultaneously in the same campaign. Test a subject line AND a hero block AND a CTA in one send, with independent results for each slot.
  • Uses Bayesian statistics that let you check results at any time without inflating error rates.
  • Carries learning across campaigns. Smart Allocation uses historical performance to send more traffic to better-performing variants automatically.
  • Provides revenue attribution, not just click counting.
  • Detects guardrail violations like unsubscribe spikes, bounce rate increases, and spam complaints.
  • Works across ESPs. If you use Klaviyo for lifecycle and Customer.io for transactional, your testing insights live in one place.
Can Liftstack do things my ESP cannot?

Yes. The core capability gap is in-template content testing. Native ESP tools treat the email as a single unit: you either send Email A or Email B. Liftstack injects conditional logic into your template so that different recipients see different content blocks within the same email.

Other things Liftstack does that native tools typically don't:

  • Multi-slot testing in a single send (subject line + hero + CTA, analysed independently)
  • Bayesian analysis with continuous monitoring (no fixed test duration needed)
  • Automatic bot filtering so inflated opens and security-scanner clicks don't corrupt your results
  • Revenue-per-exposure modelling that captures both conversion probability and order value
  • Cross-campaign learning via Thompson Sampling and content insights
  • Safety guardrails (unsubscribe, bounce, complaint) that block winners which damage list health or sender reputation

How Testing Works

How long does a test take?

It depends on your audience size and how different the variants are. As a rough guide:

  • Large audiences (50,000+) with meaningful content differences: often conclusive within a few days
  • Medium audiences (5,000 to 50,000): typically 3 to 7 days
  • Small audiences (under 5,000): may take multiple campaign sends

Liftstack will show you a progress estimate when your test is still collecting data.

Can I check results while the test is running?

Yes. The campaign report updates in real time while your campaign is in tracking mode. You'll see live charts, preliminary numbers, and a confidence progression chart showing how close the test is to reaching a conclusion.

However, during the early data collection period, results will be labelled as preliminary. Liftstack enforces a minimum data threshold before declaring any verdict, which prevents premature conclusions from small, noisy samples.

What's the minimum audience size?

There's no hard minimum, but smaller audiences need larger differences between variants to reach a conclusion. As a planning guide:

Baseline conversion rate | Min. difference to detect | Audience per variant
1%                       | 0.5 percentage points     | ~6,300
2%                       | 1.0 percentage point      | ~3,100
3%                       | 1.0 percentage point      | ~4,700
5%                       | 2.0 percentage points     | ~1,900

If your audience is too small to detect realistic differences, Liftstack will tell you the test needs more data rather than making a premature call.

When you set up a campaign, Liftstack automatically shows a sample size guidance card after your audience is synced. This tells you whether your audience is large enough for the number of variants you're testing, based on a 3% baseline conversion rate and a 0.5 percentage point minimum detectable effect. If your audience is insufficient, you'll see a warning with specific guidance.
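
If you want to sanity-check these numbers yourself, a classical two-proportion sample size formula gets you into the same ballpark. The sketch below assumes 95% confidence and 80% power; Liftstack's own guidance may use different settings, so treat it as a planning aid rather than a reproduction of its calculation.

```python
from scipy.stats import norm

def sample_size_per_variant(baseline: float, mde_pp: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Classical two-sided two-proportion sample size, per variant.

    baseline: baseline conversion rate, e.g. 0.03 for 3%
    mde_pp:   minimum detectable effect in percentage points, e.g. 0.5
    """
    p1 = baseline
    p2 = baseline + mde_pp / 100.0
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return round((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Same order of magnitude as the planning table above.
for baseline, mde in [(0.01, 0.5), (0.02, 1.0), (0.03, 1.0), (0.05, 2.0)]:
    n = sample_size_per_variant(baseline, mde)
    print(f"{baseline:.0%} baseline, {mde}pp difference -> ~{n:,} per variant")
```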

What is a "primary metric"?

The primary metric is the single measure you're optimising for. You choose it when setting up your campaign, and it cannot be changed once the campaign starts sending. This is deliberate: it prevents cherry-picking whichever metric happens to look best after the fact.

Your options are:

  • Conversion rate (default): what percentage of recipients took the desired action (purchase, sign-up, etc.)
  • Click rate: what percentage of recipients clicked a link in the email
  • Open rate: what percentage of recipients opened the email
  • Revenue per exposure: average revenue generated per recipient

All other metrics are still tracked and shown in your report as secondary/diagnostic metrics, but only the primary metric determines the winner.

Why can't I change the primary metric after sending?
This is a critical safeguard called pre-registration. If you could change the metric after seeing results, you might (even unconsciously) switch to whichever metric makes a particular variant look best. This would inflate your false positive rate, causing you to "find" winners that aren't real winners. Pre-registering the metric keeps the test honest.
What is the attribution window?

The attribution window is the time period after your campaign sends during which engagement events (clicks, conversions, purchases) are credited to the test. The default is 7 days.

A click that happens 3 days after the send counts. A purchase 10 days later does not (by default). This prevents distant events, which are influenced by many other factors, from muddying your test results.

If a significant number of conversions are arriving after your window closes, Liftstack will suggest extending it for future campaigns.
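
The windowing rule itself is simple. Here it is in a few lines of Python; the dates are made up, and the 7-day default is the one quoted above.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=7)  # the default window described above

def is_attributable(send_time: datetime, event_time: datetime,
                    window: timedelta = ATTRIBUTION_WINDOW) -> bool:
    """True if an engagement event falls inside the attribution window."""
    return send_time <= event_time <= send_time + window

send = datetime(2024, 6, 1, 9, 0)
print(is_attributable(send, send + timedelta(days=3)))   # click 3 days later: counted
print(is_attributable(send, send + timedelta(days=10)))  # purchase 10 days later: not counted
```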

Integration & Setup

How does Liftstack connect to my ESP?

Liftstack connects via your ESP's API using credentials that you provide. The setup process is:

  1. Go to Integrations in Liftstack and select your platform (Klaviyo, Customer.io, or Iterable)
  2. Enter your credentials. What's required depends on the platform:
    • Klaviyo: a private API key
    • Customer.io: a Site ID, a Tracking API key, and an App API key
    • Iterable: a standard API key
  3. Liftstack validates the connection and confirms access

Your credentials are encrypted at rest using Fernet symmetric encryption. Liftstack never stores them in plain text, and they are only decrypted when making API calls on your behalf.

No developer is required. If you can find your API credentials in your ESP's settings, you can complete setup in under five minutes.

What API permissions does Liftstack need?

Liftstack needs permission to:

  • Read segments/lists (to sync your audience)
  • Read and write profile properties (to write lf_assignments for variant targeting)
  • Create and update templates (to push the conditional template logic)
  • Read engagement events (clicks, opens, conversions) for attribution

For Klaviyo, this means a private API key with full read/write scope. For Customer.io, an App API key with tracking and API access. For Iterable, a standard API key.

Does writing assignments burn through my ESP's API limits?

Liftstack uses batch endpoints wherever available and includes built-in rate limiting that respects each platform's published limits. For a 500,000-person audience:

  • Klaviyo: uses bulk profile import endpoints; typically completes in 10 to 20 minutes
  • Customer.io: uses individual profile identify calls (Customer.io does not offer a bulk endpoint); typically completes in 15 to 30 minutes for large audiences
  • Iterable: uses bulk user update endpoints; typically completes in 10 to 20 minutes

These API calls count toward your ESP's rate limits, but the built-in throttling means Liftstack won't spike your usage or trigger overage charges.

How long do I need to wait between assigning and sending?

The campaign wizard handles this in sequence: it syncs the audience, runs assignment, writes properties to profiles, and pushes the template. You'll see a progress indicator for each step. Once all steps show complete, you can send immediately. There is no additional waiting period.

For large audiences (100,000+), the profile writeback step is the longest part and can take 15 to 30 minutes.

What happens if the API fails halfway through assigning?

Liftstack writes profile properties in batches with automatic retry. If a batch fails (network timeout, API error), the system retries with exponential backoff. If it hits a 429 (rate limit) response, it reads the Retry-After header and waits before continuing.

If some batches fail despite retries, the progress indicator will report how many profiles succeeded and how many failed. You can re-trigger the writeback step from the campaign wizard, and since the operation is idempotent (writing the same property value twice is harmless), it will safely re-process all profiles from the beginning.
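
As an illustration of the retry behaviour described above, here is a generic sketch in Python. The endpoint URL, payload, and headers are placeholders for whatever your ESP's batch API expects; this is the pattern, not Liftstack's code.

```python
import time
import requests

def write_batch_with_retry(url: str, payload: dict, headers: dict,
                           max_retries: int = 5) -> requests.Response:
    """Generic pattern: exponential backoff on server errors, and a
    Retry-After-aware wait on 429 rate-limit responses."""
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers, timeout=30)
        if response.status_code == 429:
            # Respect the platform's requested wait before retrying.
            time.sleep(float(response.headers.get("Retry-After", delay)))
        elif response.status_code >= 500:
            time.sleep(delay)
            delay *= 2  # exponential backoff
        else:
            return response  # success (or a client error the caller should inspect)
    raise RuntimeError("Batch write failed after retries")
```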

Does Liftstack slow down my campaign sending?
No. Liftstack's work happens before you send. The variant assignments are written to CRM profiles as a property, and the conditional template is pushed to your ESP. When you actually hit send in your ESP, the email renders using the pre-written profile property. There is zero additional latency at send time.
Can I connect multiple ESPs to the same workspace?
Yes. Each plan tier allows a set number of platform connections (Starter: 1, Growth: 2, Scale: 3). You might connect Klaviyo for your lifecycle campaigns and Customer.io for transactional, and run tests on both from the same workspace with shared snippet libraries.

Understanding Your Results

What does "X% probability of being best" mean?

This is the single most important number in your report. It answers: "What is the probability that this variant truly has the highest conversion rate?"

For example, "93% probability of being best" means: given all the data we've collected, there's a 93% chance this variant genuinely outperforms all the others. There's a 7% chance one of the other variants is actually better and this one just got lucky in this particular test.

Where is the p-value?

Liftstack uses Bayesian statistics instead of the traditional frequentist approach you might be familiar with from other tools. This means you won't see p-values, and that's a good thing.

P-values answer a confusing question: "If there were NO real difference between variants, what's the probability of seeing data this extreme?" That's hard to interpret and easy to misuse.

Probability of being best answers a direct question: "Given the data I have, what's the probability this variant is actually the best?" That's what you really want to know.

Think of it this way:

  • A p-value of 0.03 does NOT mean "there's a 97% chance variant A is better." (This is the most common misinterpretation of p-values.)
  • A "probability of being best" of 97% DOES mean "there's a 97% chance variant A is better." It's exactly what it says.
What about confidence intervals? I'm used to seeing those.

Liftstack shows credible intervals (displayed as "range" in the report), which look similar to confidence intervals but are easier to interpret:

  • A traditional 95% confidence interval means: "If we repeated this experiment many times, 95% of the resulting intervals would contain the true value." (Confusing, right?)
  • A 95% credible interval means: "There's a 95% probability the true value falls within this range." (Much more intuitive.)

You'll see these ranges throughout the report: for conversion rates, uplift estimates, and revenue figures. A narrow range means we're quite certain; a wide range means there's still meaningful uncertainty.

What does "expected loss" mean?

Expected loss answers: "If I pick this variant and it turns out not to be the best, how much conversion rate am I leaving on the table?"

For example, an expected loss of 0.05% means: if you go with this variant and it's not actually the winner, you'd lose about 0.05 percentage points of conversion rate on average. That's tiny, well within the "not worth worrying about" range.

Liftstack uses expected loss as part of its decision criteria. A variant isn't declared a winner just because it's probably best. It also needs to have a very low expected loss, ensuring that even in the unlikely scenario it's wrong, the cost is negligible.
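
In the same spirit as the previous sketch, expected loss can be computed from the same kind of posterior draws: in each simulated scenario, how far short of the best variant does this one fall, on average? Again illustrative only, with made-up numbers.

```python
import numpy as np

def expected_loss(conversions, exposures, n_samples=50_000, seed=0):
    """Average conversion-rate shortfall vs the best variant, per variant,
    taken over Monte Carlo draws from Beta posteriors."""
    rng = np.random.default_rng(seed)
    draws = np.column_stack([
        rng.beta(1 + c, 1 + (n - c), size=n_samples)
        for c, n in zip(conversions, exposures)
    ])
    best = draws.max(axis=1, keepdims=True)
    return (best - draws).mean(axis=0)  # in rate units; multiply by 100 for pp

losses = expected_loss([300, 330], [10_000, 10_000])
print([f"{loss * 100:.3f} pp" for loss in losses])
```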

What does "practical equivalence" mean?

Sometimes variants are so close in performance that the difference doesn't matter in practice. If variant A converts at 3.02% and variant B converts at 3.05%, that 0.03 percentage point difference is real but meaningless for your business.

Liftstack checks whether variants fall within a Region of Practical Equivalence (ROPE): a range around zero (default: 0.5 percentage points) where differences are too small to care about. If all variants fall within this range with high probability, the verdict is EQUIVALENT, and you're told to pick whichever version you prefer. There's no statistical reason to favour one over another.
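
The equivalence check can be expressed with the same posterior-sampling idea: what is the probability that every pairwise difference between variants sits inside the ±0.5 percentage point ROPE? A hedged sketch, not the production analysis:

```python
import numpy as np

def probability_of_equivalence(conversions, exposures, rope_pp=0.5,
                               n_samples=50_000, seed=0):
    """P(all variants lie within rope_pp of each other), estimated from
    Monte Carlo draws of each variant's Beta posterior."""
    rng = np.random.default_rng(seed)
    draws = np.column_stack([
        rng.beta(1 + c, 1 + (n - c), size=n_samples)
        for c, n in zip(conversions, exposures)
    ])
    spread = draws.max(axis=1) - draws.min(axis=1)  # widest pairwise gap per draw
    return (spread < rope_pp / 100.0).mean()

# Two variants converting at 3.02% and 3.05%: a high probability of equivalence.
print(probability_of_equivalence([302, 305], [10_000, 10_000]))
```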

Reading the Campaign Report

What is the verdict card?

The verdict card is the hero element at the top of each slot's results. It gives you the bottom line in plain language. There are four possible verdicts:

  • Winner (green, trophy icon). A clear winner has been identified. The card shows the conversion rates compared, the uplift, the confidence level, and the revenue range.
  • Equivalent (grey, equals icon). All variants performed within a negligible range of each other. Pick whichever fits your brand best.
  • Insufficient Data (amber, hourglass icon). No conclusion yet. One variant is leading but not decisively. Shows an estimate of how many more exposures are needed.
  • Guardrail Violation (red, warning icon). A variant triggered a safety guardrail, typically because it caused a meaningful increase in unsubscribe rates compared to the control.
What are the confidence levels?
Probability of Being Best | Confidence Level | What It Means
95% or higher             | Very High        | Extremely likely this is the true best variant. Declare a winner.
85% to 95%                | High             | Very probably the best, but a small chance you're wrong.
70% to 85%                | Moderate         | Leading, but there's meaningful uncertainty. Likely needs more data.
Below 70%                 | Low              | Too early to tell. Keep testing.
What is the uplift callout?

The uplift callout is the key value statement of your test. It answers: "How much more did I get by using the winning variant instead of the control?"

It shows two numbers:

  • Additional conversions: how many extra people converted because of the winning content
  • Additional revenue: the estimated revenue those extra conversions generated

These numbers come with a range (e.g., "+£8,200 to +£16,800") so you know the realistic best and worst case.

What is the metrics table?

Below each slot's charts, there's an expandable metrics table showing the raw numbers for every variant. This includes exposures, opens, open rate, clicks, CTR, conversions, conversion rate, unsubscribes, bounces, complaints, revenue, and revenue per exposure.

This table is collapsed by default because the verdict card, charts, and uplift callout already tell you everything you need to make a decision.

Understanding the Charts

What is the Variant Comparison Chart (Raincloud Plot)?

A visual comparison of all variants' estimated true conversion rates, shown in the campaign report below the verdict card. Each variant gets a horizontal row with three visual layers:

  • The cloud (top half). A smooth density curve showing the range of likely conversion rates. Where the curve is tall, that rate is more likely. A tight, narrow cloud means more certainty.
  • The line and dot (middle). A horizontal line showing the 95% credible interval, with a dot at the estimated conversion rate.
  • The rain (bottom half). A scatter of small dots representing possible conversion rates drawn from the statistical model.

If the leading variant's cloud is clearly separated from the others (no overlap), it's a strong winner. If clouds overlap substantially, you may need more data.

What is the Chance of Winning chart?

A horizontal bar chart showing each variant's probability of being the best performer. A vertical dashed line marks the decision threshold (default: 90%). A variant needs to cross this line to be declared a winner.

The percentages always add up to 100% across all variants. If one bar dominates and crosses the threshold, you have a clear winner. If bars are close, more data is needed.

What is the Expected Improvement chart?

A density plot of the difference between the winning variant and the control, shown only when a winner has been declared. The area to the right of zero (shaded green) represents scenarios where the winner truly is better. The area to the left (shaded amber) represents scenarios where it's actually worse (unlikely, but possible).

The annotation below the chart (e.g., "92.4% chance of real improvement") tells you exactly how much of the curve is on the positive side.

What is the Confidence Progression chart?

A line chart tracking how the leading variant's probability of being best has evolved over time since the campaign was sent. A horizontal dashed line marks the decision threshold (default: 90%).

Watch for the leading variant's line climbing toward the threshold. A line that's climbing steadily suggests the test is heading toward a conclusion. A line that's flat or bouncing suggests the variants are very close. During live tracking, this chart auto-refreshes every 60 seconds.

What is the Cumulative Revenue Uplift chart?

Shown on the analytics dashboard, this is a running total of the additional revenue generated by all your winning variants across all campaigns over time. A shaded band around the line shows the confidence range.

This line should only go up (each new winner adds to the total). This is the single best chart for demonstrating ROI from your testing programme.

What is the Conversion Rate Sparkline?
Found on the snippet performance page, this small line chart shows how a specific variant's conversion rate has changed across every campaign it's appeared in. A flat line means consistent performance. An upward trend might indicate a primacy effect. A downward trend might indicate a novelty effect.

Verdicts & Decisions

How does Liftstack decide on a winner?

A variant is declared the winner when both of these conditions are met:

  1. Probability of being best is at least 90% (configurable). We're highly confident this variant truly has the highest conversion rate.
  2. Expected loss is at most 0.1% (configurable). Even if we're wrong, the cost of choosing this variant over the true best is negligible.

Both conditions must hold simultaneously. A variant with 92% probability of being best but an expected loss of 0.3% won't be declared a winner yet because the potential downside is still too large.
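
The decision rule itself is just the conjunction of those two thresholds. Sketched in Python, with the default thresholds quoted above and the inputs computed as in the earlier examples:

```python
def is_winner(prob_best: float, expected_loss_pp: float,
              prob_threshold: float = 0.90, loss_threshold_pp: float = 0.1) -> bool:
    """Both conditions must hold at once for a variant to be declared the winner."""
    return prob_best >= prob_threshold and expected_loss_pp <= loss_threshold_pp

print(is_winner(prob_best=0.92, expected_loss_pp=0.3))   # False: expected loss too large
print(is_winner(prob_best=0.94, expected_loss_pp=0.05))  # True: confident and low-risk
```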

How does Liftstack decide variants are equivalent?

Variants are declared equivalent when Liftstack is highly confident (90%+ probability) that the difference between all variants falls within 0.5 percentage points (configurable). At that point, the differences are real but too small to matter for your business.

When there are many variants (4+), Liftstack can detect partial equivalence. For example: "Variant A is the clear winner. Among the remaining variants, B, C, and D are practically equivalent to each other."

What is a guardrail violation?

Guardrail metrics are safety checks that protect your audience. Even if a variant has a great conversion rate, it won't be declared a winner if it's damaging other important metrics:

  • Unsubscribe rate. If the variant causes unsubscribes to increase by more than 0.1 percentage points vs the control.
  • Spam complaint rate. If complaints increase by more than 0.05 percentage points vs the control.
  • Bounce rate. If the variant causes bounces to increase by more than 0.5 percentage points vs the control.

A variant that drives clicks but burns your subscriber list is destroying long-term value. The guardrail catches this and warns you.
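
Conceptually, the guardrails are simple threshold checks on the variant-versus-control difference. The sketch below applies the thresholds quoted above to raw rates; the real system evaluates them probabilistically, so treat this as an illustration of the rule rather than the implementation.

```python
# Percentage-point limits quoted in the FAQ above.
GUARDRAIL_LIMITS_PP = {
    "unsubscribe_rate": 0.1,
    "complaint_rate": 0.05,
    "bounce_rate": 0.5,
}

def guardrail_violations(variant_rates: dict, control_rates: dict) -> list:
    """Return the guardrail metrics where the variant exceeds the control
    by more than the allowed number of percentage points."""
    return [
        metric
        for metric, limit_pp in GUARDRAIL_LIMITS_PP.items()
        if (variant_rates[metric] - control_rates[metric]) * 100 > limit_pp
    ]

control = {"unsubscribe_rate": 0.002, "complaint_rate": 0.0002, "bounce_rate": 0.004}
variant = {"unsubscribe_rate": 0.004, "complaint_rate": 0.0003, "bounce_rate": 0.005}
print(guardrail_violations(variant, control))  # ['unsubscribe_rate']
```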

What does "insufficient data" mean?

This means no conclusion can be reached yet. One variant is probably leading, but there isn't enough data to be confident. Common reasons:

  • The audience is small
  • The variants perform very similarly (requiring more data to distinguish them)
  • The campaign is still early in its tracking period

The report will show an estimate of how many more recipients need to be exposed before a conclusion can be reached.

Can I override the verdict?

The verdict is the system's statistical recommendation. You're free to take a different action, such as continuing to test a variant even after it's been declared equivalent, or choosing a variant other than the winner based on brand considerations.

What you can't do is change the primary metric after seeing results, or retroactively adjust the analysis to favour a particular outcome. These safeguards keep the testing process honest.

Metrics & What They Mean

What are the primary metrics?
Metric               | What It Measures                                          | Best For
Conversion rate      | Percentage of recipients who completed the desired action | Most campaigns (the default)
Click rate           | Percentage of recipients who clicked any link             | Quick-signal tests, smaller audiences
Open rate            | Percentage of recipients who opened the email             | Subject line and preview text testing
Revenue per exposure | Average revenue generated per recipient                   | When variants might influence order size
What are secondary/diagnostic metrics?
All metrics not selected as primary become diagnostics. They're shown in the metrics table for context. For example, you might optimise for conversion rate but still want to see the click rate and revenue per variant. Diagnostic metrics are never used to determine the winner.
Why is open rate marked with a warning?

Open tracking is unreliable because of Apple Mail Privacy Protection (MPP) and email client pre-fetching. These technologies automatically trigger "opens" for every email, whether or not the recipient actually looked at it.

The good news: this noise affects all variants equally (since recipients are randomly assigned), so relative comparisons remain valid. The bad news: absolute open rates are inflated, and tests using open rate as the primary metric need more data to reach a conclusion.

What is "revenue per exposure"?

Revenue per exposure (RPE) measures the average revenue each recipient generates. It captures two effects:

  1. Conversion probability. Does this variant make people more likely to buy?
  2. Order value. When people do buy, do they spend more?

A variant could win on RPE even if it doesn't have the highest conversion rate, because it might encourage larger orders. Liftstack uses a specialised compound model for RPE that analyses these two components separately and then combines them.
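
To make the two-component idea concrete, here is one way such a compound model could be sketched: a Beta posterior for the chance of buying, multiplied by a bootstrap of observed order values for the basket size. The combination shown is an assumption for illustration; Liftstack's actual revenue model is more sophisticated.

```python
import numpy as np

def rpe_posterior_samples(conversions, exposures, order_values,
                          n_samples=50_000, seed=0):
    """Illustrative compound model: draw a conversion rate from a Beta
    posterior and a mean order value from a bootstrap of observed orders,
    then multiply to get revenue per exposure for each simulated scenario."""
    rng = np.random.default_rng(seed)
    conv_rate = rng.beta(1 + conversions, 1 + exposures - conversions, n_samples)
    orders = np.asarray(order_values, dtype=float)
    idx = rng.integers(0, len(orders), size=(n_samples, len(orders)))
    mean_order = orders[idx].mean(axis=1)  # bootstrap uncertainty in basket size
    return conv_rate * mean_order

# Made-up data: 120 conversions out of 5,000 recipients, with their order values.
samples = rpe_posterior_samples(120, 5_000, [42.0, 55.5, 61.0, 38.0, 70.0] * 24)
print(f"Estimated revenue per exposure: £{samples.mean():.2f}")
```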

What are the safety guardrails?

Even if a variant drives conversions, it might be doing so in a way that damages your list health. Liftstack monitors three guardrail metrics automatically:

  • Unsubscribe guardrail. If the winning variant's unsubscribe rate is meaningfully higher than the control's (more than 0.1 percentage points), the system blocks the winner declaration.
  • Complaint guardrail. If the variant triggers a meaningful increase in spam complaints (more than 0.05 percentage points vs the control), the winner is blocked.
  • Bounce guardrail. If the variant causes a meaningful increase in email bounces (more than 0.5 percentage points vs the control), the winner is blocked. High bounce rates can damage your sender reputation.

When any guardrail fires, Liftstack shows a red warning and prevents the variant from being declared a winner.

Smart Allocation (Thompson Sampling)

What is "Smart Allocation"?
When you've tested the same snippet variants across multiple campaigns, Liftstack can use historical performance data to send more traffic to the variants that have been performing well, while still sending some traffic to underperforming variants to make sure we aren't missing something. This is called Thompson Sampling.
How is it different from an equal split?

With a standard A/B test (equal split), each variant gets the same number of recipients, say 33% each for three variants. This is fair but wasteful: you're sending just as much traffic to a clearly underperforming variant as to the front-runner.

With Smart Allocation, Liftstack might split traffic 60/25/15 based on past performance. The likely winner gets more traffic (fewer wasted exposures), while alternatives still get enough to confirm whether they've improved or the leader has slipped.

Does this bias the test?
No. The system still tracks performance for every variant and runs the full statistical analysis. The unequal allocation actually makes the test more efficient. You reach conclusions faster because more recipients are exposed to the likely best variant, so uplift is captured sooner.
Can I override the smart allocation?
Yes. When Liftstack recommends an allocation, you'll see a transparency panel showing the recommended traffic split and why. You have three options: Accept, Adjust Manually (drag sliders), or Use Equal Split.
What is the "Smart Allocation Uplift"?
When a campaign uses Thompson Sampling, the report shows the additional conversions captured by the smart allocation compared to what an equal split would have produced. This isolates the value of the allocation strategy from the value of testing itself.
How does the system handle a brand-new variant with no history?
New variants (those that have never appeared in a completed campaign) receive a guaranteed minimum of 20% of traffic on their first campaign, regardless of what Thompson Sampling would recommend. This prevents established variants from starving newcomers of exposure.
Does historical data expire?
Yes. Liftstack applies a recency decay to historical data: performance from campaigns 60 days ago counts half as much as recent campaigns, and very old data fades away almost entirely. This ensures the allocation reflects current audience preferences, not stale data.
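
Putting the pieces of this section together, here is a compact sketch of how a decay-weighted Thompson Sampling split could be computed. The 60-day half-life mirrors the recency decay described above; the uninformative starting point and everything else (including the omission of the 20% new-variant floor) is illustrative, not Liftstack's actual engine.

```python
import numpy as np

def smart_allocation(history, n_samples=50_000, half_life_days=60, seed=0):
    """history: per variant, a list of (conversions, exposures, age_days)
    tuples from past campaigns. Returns a traffic split proportional to
    each variant's chance of being best under decay-weighted Beta posteriors."""
    rng = np.random.default_rng(seed)
    draws = []
    for campaigns in history:
        a, b = 1.0, 1.0  # uninformative starting point (an assumption)
        for conversions, exposures, age_days in campaigns:
            weight = 0.5 ** (age_days / half_life_days)  # older data counts less
            a += weight * conversions
            b += weight * (exposures - conversions)
        draws.append(rng.beta(a, b, n_samples))
    draws = np.column_stack(draws)
    # Allocate traffic in proportion to each variant's chance of being best.
    return np.bincount(draws.argmax(axis=1), minlength=len(history)) / n_samples

history = [
    [(330, 10_000, 10), (300, 9_000, 70)],  # variant A: strong recent + older data
    [(290, 10_000, 10)],                    # variant B: weaker recent performance
    [(310, 10_000, 70)],                    # variant C: decent, but only older data
]
print(smart_allocation(history).round(2))  # a split weighted toward the front-runner
```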

Operational Workflow

Can I fix a typo in a variant after the test starts?

It depends on how far the campaign has progressed:

  • Before sending (DRAFT through TEMPLATE_PUSHED): Yes. You can edit variant content in the snippet editor at any time before you confirm the send. If the template has already been pushed, Liftstack will re-push it with the updated content.
  • After sending (SENT, TRACKING, COMPLETED): No. Once the campaign is sent, the content that recipients saw is fixed. Editing the variant in Liftstack would update it for future campaigns, but it won't change what was already delivered.

If you spot a serious error after sending (like a broken link), the right approach is to fix it in your ESP's template directly. The Liftstack test results for that variant will be affected, and the report will reflect that.

Can I add a variant to a test that is already running?

No. Adding a variant mid-test would mean that variant has a different exposure period and audience size, which makes statistical comparison invalid. If you want to test an additional variant, create a new campaign with all the variants you want to compare (including the new one).

This is a deliberate constraint. Mixed-exposure tests produce unreliable results, and Liftstack prioritises correct conclusions over flexibility.

Can I stop or pause a single variant without killing the whole campaign?
Not currently. The campaign operates as a single unit: it's either tracking or completed. If a variant has a serious problem (offensive content, broken rendering), your best option is to fix the issue in the ESP template directly so recipients no longer see the problematic content. The statistical results for that variant will be affected, but the test continues for the remaining variants.
Can I duplicate a campaign setup?
Not yet, but this is a planned feature. For now, when setting up a new campaign you can select the same snippets and variants from your library, which preserves most of the configuration. If you're using Smart Allocation, historical performance from previous campaigns carries over automatically.
What happens if I delete a snippet that's active in a campaign?
You can't. Snippets that are referenced by campaign slots are protected at the database level. If you attempt to delete one, the operation will fail. You would need to remove the snippet from all campaign slots first. This prevents accidentally orphaning a running test.
Can I re-run the same test on a different audience?
Yes. Create a new campaign, select the same snippets and variants, and point it at a different segment. Liftstack treats each campaign as an independent experiment with fresh assignments. If Smart Allocation is enabled, the new campaign will benefit from the performance data gathered in the original test.

Segmentation & Audience

Does Liftstack work with my existing ESP segments?
Yes. When you set up a campaign, you select a segment (or list) from your ESP. Liftstack syncs the audience from that segment via the API. Whatever targeting, filtering, or segmentation logic you've built in your ESP applies as normal. Liftstack doesn't bypass or override your segmentation; it tests content within the audience you've already defined.
Can I see results broken down by segment?

The standard campaign report shows results for the full audience. Liftstack does not currently break down results by sub-segments within a single campaign.

However, you can achieve segment-level insights in two ways:

  1. Run separate campaigns per segment. Send the same snippets to your VIP segment and your non-VIP segment as separate campaigns. Each gets its own independent analysis.
  2. Stratified Thompson Sampling (Scale plan). Liftstack maintains separate performance estimates per segment. The allocation engine uses per-segment data, which means variants that work better for specific segments get more traffic within those segments.
Is there a way to have a global holdout (control) group?

Yes. When creating a campaign, you can set a holdout percentage (up to 20% of the audience). Holdout recipients are randomly selected and assigned only the control variant across all slots. They receive the email with your default content and serve as a baseline.

This is different from having a control variant in a slot. The holdout group isolates the effect of personalised content assignment itself, answering: "Does running any test at all produce better outcomes than sending everyone the default?"

Requirements: At least one slot must have a variant marked as control. The holdout percentage cannot be changed after assignments are made.

Can I run a test targeting only mobile users or only desktop users?
Not directly within Liftstack. However, you can achieve this by creating a segment in your ESP that filters by device type (most ESPs support this), and then running your Liftstack campaign against that segment.
Can I see if Variant A won for one demographic but Variant B won for another?

Not as a built-in report split. Liftstack analyses each campaign as a single audience. If you suspect a variant performs differently across demographics, the recommended approach is to run separate campaigns against demographic-specific segments.

The Content Insights feature (Growth and Scale plans) does detect patterns across campaigns, which can surface observations like "urgency messaging tends to outperform for your promotional segments." These are observational hints, not segment-level A/B test results, but they can guide your testing strategy.

Dashboard & Insights

What do the dashboard stat cards show?

The four cards at the top of the dashboard give you a monthly snapshot:

Card                 | What It Shows
Campaigns This Month | How many campaigns you've sent with Liftstack
Snippets Tested      | How many unique content variants were tested
Clear Winners        | Percentage of tested slots where a clear winner was found
Est. Revenue Uplift  | Total estimated additional revenue from choosing winning variants
What are Content Insights?

Content Insights are patterns the system detects across your historical campaigns. For example: "Urgency tone tends to outperform your average by approximately 1.2%." These are surfaced with confidence levels:

  • High confidence. Pattern supported by substantial data (10,000+ exposures across many campaigns).
  • Moderate confidence. Suggestive pattern worth investigating, but based on less data.

Important: Insights are observational, not causal. A pattern like "urgency outperforms" is a correlation. It could be influenced by the specific copy, audience, timing, or other factors. The insight is a hypothesis to test deliberately, not a guaranteed rule.

Why don't I see any insights?

Insights require a meaningful history to detect patterns. They won't appear until:

  • You've completed at least 5 campaigns with the same snippet attributes
  • At least 3 variants share the attribute being analysed
  • The pattern passes a statistical threshold (adjusted for the number of attributes being tested simultaneously)

Snippet Performance

What is the Snippet Performance page?
This page aggregates how each variant has performed across all the campaigns it's appeared in. Instead of looking at one campaign at a time, you can see the big picture: which variants consistently win, which are reliable, and which are inconsistent.
What do the performance verdicts mean?
Verdict          | Criteria                                   | What It Means
Strong performer | Won 60%+ of campaigns, across 4+ campaigns | Reliably outperforms. Consider making it your default.
Consistent       | Won 40%+ of campaigns with low variability | Reliable middle-of-the-road performer
Variable         | High variability across campaigns          | Sensitive to audience or timing. Unpredictable.
Needs more data  | Fewer than 3 campaigns                     | Too early to judge. Keep testing.
What does the sparkline show?
The sparkline chart on each variant's detail page shows its conversion rate across every campaign. A flat line is good (consistent performer). A downward trend suggests novelty effects wore off. An upward trend suggests the audience is warming to it.
What is a temporal trend warning?
If a variant's performance is clearly trending up or down across campaigns, Liftstack surfaces a warning. This helps you catch novelty effects (temporary boost from new content) and primacy effects (initial resistance to change that fades over time).

Data Quality & Warnings

What is a Sample Ratio Mismatch (SRM)?

An SRM means the actual traffic split between variants doesn't match what was intended. For example, you set up a 50/50 split but actually got 53/47. This is a serious issue because it suggests something went wrong in the delivery pipeline.

Common causes: partial failures when writing assignments to your CRM, recipients unsubscribing between assignment and send, template rendering errors for one variant, or platform-side content filtering.

When SRM is detected, Liftstack blocks the verdict and shows a red warning. You should investigate the root cause before trusting any results.
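
A standard way to detect an SRM is a chi-square goodness-of-fit test of the observed split against the intended split, flagged at a very strict significance level. The sketch below shows that convention; Liftstack's exact threshold isn't specified here.

```python
from scipy.stats import chisquare

def srm_detected(observed_counts, intended_ratios, alpha=0.001):
    """Chi-square goodness-of-fit test of the observed split vs the intended
    split. The strict alpha is a common convention for SRM checks."""
    total = sum(observed_counts)
    expected = [total * ratio for ratio in intended_ratios]
    _, p_value = chisquare(observed_counts, f_exp=expected)
    return p_value < alpha, p_value

# Intended 50/50, observed 53/47 on 20,000 recipients: clearly flagged.
print(srm_detected([10_600, 9_400], [0.5, 0.5]))
```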

What are data quality checks?

Before running any analysis, Liftstack automatically checks:

  • Assignment completeness. Were all audience members actually assigned a variant?
  • Sample ratio mismatch. Does the actual split match the intended split?
  • Zero-event variants. Does any variant have zero engagement events despite having recipients?
  • Minimum data threshold. Has each variant accumulated enough data for meaningful analysis?

Issues are flagged directly on the campaign report with severity levels (critical warnings block analysis; minor warnings are informational).

What about bot traffic?

Email engagement metrics are polluted by bots. Liftstack automatically filters these out during event ingestion by detecting:

  • Known bot user agents (Googlebot, link scanners, headless browsers, etc.)
  • Known email security scanners (Barracuda, Proofpoint, Mimecast, etc.)
  • Impossibly fast clicks (within 1 second of delivery)

The campaign report shows what percentage of traffic was classified as bot activity and excluded. Typical campaigns see 5 to 15% bot traffic.
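
A simplified version of that filter looks like the sketch below. The user-agent fragments are examples only, not Liftstack's actual blocklist, and the 1-second rule is the one quoted above.

```python
from datetime import datetime, timedelta

# Example fragments only; a real blocklist is far longer and maintained over time.
BOT_UA_FRAGMENTS = ("googlebot", "barracuda", "proofpoint", "mimecast", "headless")

def is_bot_event(user_agent: str, delivered_at: datetime, clicked_at: datetime) -> bool:
    """Flag an event as bot traffic if the user agent matches a known
    scanner/bot fragment, or the click arrived impossibly fast after delivery."""
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BOT_UA_FRAGMENTS):
        return True
    return clicked_at - delivered_at < timedelta(seconds=1)

delivered = datetime(2024, 6, 1, 9, 0, 0)
print(is_bot_event("Mozilla/5.0 (compatible; Googlebot/2.1)", delivered,
                   delivered + timedelta(minutes=5)))          # True: known bot UA
print(is_bot_event("Mozilla/5.0 (iPhone)", delivered,
                   delivered + timedelta(milliseconds=400)))   # True: impossibly fast click
```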

What does "interaction detected" mean?

When your campaign tests multiple slots (e.g., subject line AND hero image), Liftstack checks whether the combination matters. An interaction means: Variant A in the subject line slot performs differently when paired with Variant X vs Variant Y in the hero slot.

Interactions are flagged with cautious language: "We detected a possible interaction... This may warrant investigation but could also be coincidental." The per-slot results remain valid. The interaction is additional context, not a change to the verdict.

Common Questions About the Statistics

Is Bayesian analysis as rigorous as traditional statistics?

Yes, and arguably more so for this use case. The Bayesian approach used in Liftstack:

  • Produces the same quality of conclusions as frequentist methods (p-values, confidence intervals)
  • Provides answers that are easier to interpret correctly ("93% probability this is the best" vs "p < 0.05")
  • Handles continuous monitoring naturally, so you can check results at any time without inflating error rates
  • Does not require pre-determined sample sizes
  • Includes built-in protection against the winner's curse (extreme results are naturally pulled toward realistic values)
Why 50,000 Monte Carlo samples?
Behind the scenes, Liftstack uses a simulation technique called Monte Carlo sampling: it draws 50,000 random scenarios from the statistical model to estimate probabilities. This is more than sufficient for stable, reproducible results. Increasing beyond 50,000 wouldn't meaningfully change any number you see in the report.
What is the "prior" and does it affect my results?

In Bayesian statistics, the prior represents your starting assumption before seeing any data. Liftstack defaults to an uninformative prior, meaning it starts with no assumptions about what the conversion rate should be. This is conservative and lets the data speak for itself.

After you've completed 5+ campaigns, Liftstack can automatically switch to an adaptive prior that encodes your workspace's typical conversion rate range (e.g., "our campaigns usually convert between 1% and 4%"). This helps small tests converge faster without biasing toward any particular variant, because it applies the same prior to all variants equally.

You can also manually set the prior if you have specific domain knowledge, but most users never need to touch this.

Won't the prior bias my results?

No, for two important reasons:

  1. The same prior is applied to every variant in the test. It shifts all estimates equally and doesn't favour one variant over another.
  2. The prior's influence shrinks rapidly as data arrives. After a few hundred exposures per variant, the data overwhelms the prior entirely.

The prior mainly matters in the early stages of a test (under 300 exposures per variant), where it prevents extreme estimates from tiny samples.
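
You can see the second point with simple arithmetic: the posterior mean of a Beta model is (prior successes + observed conversions) / (prior total + observed exposures), so the prior terms become negligible once real exposures accumulate. The prior values below are illustrative, not Liftstack's defaults.

```python
# Posterior mean of a Beta(prior_a, prior_b) model after observing data.
def posterior_mean(conversions, exposures, prior_a=2.0, prior_b=60.0):
    return (prior_a + conversions) / (prior_a + prior_b + exposures)

observed_rate = 0.05  # the rate the data itself shows
for exposures in (20, 100, 300, 1_000, 10_000):
    conversions = round(observed_rate * exposures)
    print(f"{exposures:>6} exposures -> posterior mean {posterior_mean(conversions, exposures):.4f}")
```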

What is ROPE and why does it matter?

ROPE (Region of Practical Equivalence) is how Liftstack determines whether a difference is too small to care about. The default ROPE width is 0.5 percentage points, meaning if two variants are within half a percentage point of each other, they're treated as functionally equivalent.

This prevents the system from declaring a "winner" that only beats the control by 0.02 percentage points. Technically better, but practically meaningless.

How does Liftstack handle multiple comparisons?

When you test many variants across many slots, the chance of finding a false positive increases. Liftstack handles this differently for each metric tier:

  • Primary metric. The Bayesian framework already accounts for all variants simultaneously. Probability of being best is computed jointly, so no additional correction is needed within a slot.
  • Guardrail metrics. Each guardrail is checked independently against a 90% posterior probability threshold. No multiplicity correction is applied because the guardrails are intentionally conservative (high threshold, narrow tolerance).
  • Diagnostic metrics. No correction. They're explicitly labelled as exploratory context, not decision-drivers.
  • Cross-slot uplift. When summing uplift across multiple slots, the confidence intervals are widened to maintain accuracy.
Is the uplift number real? Can I trust it?

The uplift estimate ("+X additional conversions, +£Y additional revenue") is the system's best estimate based on the data, with several safeguards against overestimation:

  1. It uses the posterior mean (not the raw observed difference), which naturally shrinks extreme estimates toward realistic values
  2. It always includes a credible interval (range) so you can see the best and worst case
  3. It reports the probability this is a real improvement (e.g., "94% chance of real improvement")

That said, all estimates have uncertainty. The true uplift could be at the high end of the range, the low end, or anywhere in between. The headline number is the most likely value, and the range gives you the realistic spread.

What is the winner's curse?

When you test many variants and declare the best-performing one the "winner", its observed performance tends to be slightly inflated by luck. The variant that happened to get favourable randomness in this particular test looks better than it truly is.

Liftstack mitigates this automatically through the Bayesian model (which shrinks extreme estimates) and by always reporting credible intervals alongside point estimates. You should interpret the range, not just the headline number.

Commercial, Privacy & Administration

How is Liftstack priced?

Liftstack offers three paid tiers, billed monthly or annually (with a discount for annual billing):

                      | Starter | Growth  | Scale
Audience profiles     | 50,000  | 150,000 | 500,000
Campaigns/month       | 10      | 30      | Unlimited
Slots per campaign    | 2       | 4       | Unlimited
Variants per slot     | 3       | 5       | 5
Platform connections  | 1       | 2       | 3
Team members          | 3       | 10      | 25
Smart Allocation      | No      | Yes     | Yes
Revenue modelling     | No      | Yes     | Yes
Content Insights      | No      | Yes     | Yes
Stratified TS         | No      | No      | Yes
Interaction detection | No      | No      | Yes
Adaptive priors       | No      | No      | Yes

There is also a 14-day free trial with Growth-tier features and 1 campaign, so you can run a real test before committing.

Can I invite my agency or team members to my workspace?

Yes. Every plan includes multiple workspace seats. You invite team members by email. Liftstack supports three roles:

  • Owner: full access, including billing and workspace settings
  • Admin: full access to campaigns, snippets, integrations, and workspace settings
  • Member: can create and manage campaigns and snippets; cannot modify integrations or workspace settings
Is Liftstack GDPR compliant?

Liftstack is designed with data minimisation in mind:

  • What Liftstack stores: Platform profile IDs, email addresses (for audience sync), and engagement events with their metadata.
  • What Liftstack does NOT store: Payment information (handled by Stripe), email content rendered to recipients (stays in your ESP), or any personal data beyond what's listed above.
  • Encryption at rest: API credentials, email addresses, audience profile properties, and event payloads are all encrypted using Fernet symmetric encryption with per-workspace derived keys. All data in transit uses TLS.
  • Data processing: Liftstack acts as a data processor on your behalf. You remain the data controller for your subscriber data.

If your organisation requires a Data Processing Agreement (DPA), contact support.

Does Liftstack store Personally Identifiable Information (PII)?

Liftstack stores the minimum PII necessary to run tests: platform profile IDs and email addresses from your audience sync. These are used to match assignments to engagement events for attribution.

Email addresses and audience profile properties are encrypted at rest using per-workspace Fernet keys. Platform profile IDs are stored unencrypted because they are required for database lookups and attribution joins.

What happens to my data if I cancel?

When you cancel your subscription:

  • Your workspace and all its data remain accessible in read-only mode through the end of your current billing period.
  • After the billing period ends, your workspace enters a grace period. You can reactivate your subscription during this time to restore full access.
  • If you want your data deleted, contact support and we will permanently remove your workspace and all associated data.

Historical campaign results are yours. You can export CSV reports from any campaign before your access expires.

Who can see my test results?
Only members of your workspace. Liftstack is multi-tenant with strict workspace isolation. Users in one workspace cannot see campaigns, snippets, integrations, or results belonging to another workspace.
Does Liftstack have access to my ESP account?

Liftstack uses the API key you provide to make specific API calls: syncing audiences, writing profile properties, pushing templates, and fetching engagement events. It does not have access to your ESP dashboard, billing, or any data outside the scope of those API calls.

You can revoke access at any time by deleting the API key in your ESP's settings. Liftstack will immediately lose the ability to make any calls.

Troubleshooting

My test has been running for days but still says "Insufficient Data"

This usually means one of:

  • The variants perform very similarly. If the true difference is tiny, you need a very large audience to detect it. Consider whether the content differences are meaningful enough.
  • Small audience. Check whether your audience meets the minimum size guidance for the effect size you're trying to detect.
  • Low event volume. If conversions are rare (e.g., under 1%), you need substantially more recipients per variant.

The report will show an estimate of how many more exposures are needed. If that number is impractically large, the variants may simply be too similar to distinguish. That is a valid result; consider declaring them equivalent and moving on.

Why does one variant show zero events?

A variant with recipients but zero engagement events may indicate a tracking issue:

  • Check that the template conditional logic is rendering correctly for that variant
  • Verify that the tracking links contain the correct lf_cid parameter
  • Confirm that your webhook or event polling is functioning

Liftstack flags this as a data quality warning on the campaign report.

Why was my winner blocked by a guardrail?

The variant with the best primary metric performance also triggered a safety threshold. The three guardrails that can block a winner are:

  • Unsubscribe rate: the variant caused a meaningful increase in unsubscribes vs the control
  • Complaint rate: the variant caused a meaningful increase in spam complaints vs the control
  • Bounce rate: the variant caused a meaningful increase in bounces vs the control (which can damage sender reputation)

Consider:

  • Reviewing the variant's content for overly aggressive messaging
  • Looking at which audience segments are unsubscribing or complaining
  • Weighing whether the increase is acceptable given the conversion gains (you can acknowledge the guardrail and proceed if you've investigated)
  • For bounce rate violations, checking whether the variant contains content that might trigger spam filters
The report shows an SRM warning. What do I do?

An SRM (Sample Ratio Mismatch) means the traffic split doesn't match what was configured. Steps to investigate:

  1. Check for partial failures in the CRM profile write step (look for error logs during the writeback)
  2. Check whether audience members were suppressed or unsubscribed between assignment and send
  3. Verify that the template renders correctly for all variants (a broken conditional could funnel everyone to a default)
  4. Check for platform-side filtering (spam filters catching one variant's content)

Until the root cause is identified, the statistical results for this slot should not be trusted.

Can I re-run a test?
Yes. Create a new campaign with the same snippet and variants. Liftstack will use the historical data from previous campaigns to inform the new test (especially with Smart Allocation enabled). Each campaign is a fresh experiment with fresh assignments.
How do I export my data?
Click the "Export CSV" button on any campaign report to download the full metrics table. This includes all variants, all metrics, and the verdict information.

Start compounding revenue from the emails you already send

14-day free trial on the Growth tier. No credit card required.