Frequently Asked Questions
Everything you need to know about testing your email content with Liftstack, explained in plain language.
Getting Started
What is Liftstack?
What can I test?
You can test any piece of email content that you'd swap between recipients. In Liftstack, these are called "snippets." The most powerful feature is the ability to test custom HTML code blocks, which lets you experiment with virtually any element of your emails:
- Custom HTML blocks. This is where Liftstack really shines. You can test entire sections of email markup: different layouts, content structures, visual treatments, or any HTML that your ESP supports. Examples include product recommendation grids, social proof sections, header layouts, footer designs, countdown timers, loyalty callouts, trust badges, dynamic content cards, shipping/returns callout blocks, and cross-sell/upsell module formats.
- Subject lines. "Don't miss out!" vs "Your exclusive offer inside"
- Hero blocks. Different images or headline/subheadline combinations
- CTAs. "Shop Now" vs "Browse the Collection" vs "Claim Your Discount"
- Copy blocks. Different tone, length, or messaging strategy
- Discount framing. "20% off" vs "Save £10" vs no discount
Because Liftstack works at the HTML snippet level, you are not limited to testing simple text swaps. Any section of your email that you can express as an HTML block can become a testable snippet with multiple variants.
What is a "variant"?
Can I create a variant with blank content?
It depends on the snippet type:
- Subject lines: content is always required. ESPs reject blank subject lines, and a blank subject line would corrupt your test results.
- Copy and HTML blocks: content is always required. A blank variant would produce inflated uplift numbers for competing variants (since no one can click or convert on empty content) and poison Thompson Sampling posteriors for future campaigns.
- Image snippets: text content (alt text) is optional, but you must provide either an uploaded image or an image URL.
If you want to test "no content" vs "some content" for a slot, use a minimal placeholder (e.g., a single space or a neutral message) as your control variant instead.
What is a "control" variant?
The control is the version you'd send if you weren't testing. It represents your current standard or "safe" option. Marking a variant as the control lets Liftstack measure uplift: how much better the winning variant performed compared to what you would have done anyway.
You don't have to designate a control, but it's highly recommended. Without one, Liftstack can still find a winner, but the uplift numbers will be less precise.
What is a "slot"?
How does Liftstack assign variants to recipients?
Before your campaign sends, Liftstack randomly assigns each recipient a variant for each slot. These assignments are written to your CRM profiles as a property called lf_assignments. Your email template then uses conditional logic to show each person the content they were assigned.
This is important: assignment happens before anyone sees anything. That is what makes it a proper randomised experiment: who sees which variant is decided before any outcomes are observed.
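As an illustration, here is a minimal Python sketch of that pre-send assignment step. The `lf_assignments` property name comes from the description above; the function, slot names, and variant IDs are purely illustrative, not Liftstack's actual implementation.

```python
import random

def assign_variants(recipient_ids, slots, seed=42):
    """Randomly assign each recipient one variant per slot, before sending.

    `slots` maps a slot name to its variant IDs, e.g.
    {"hero": ["control", "lifestyle"], "cta": ["control", "urgency"]}.
    Returns {recipient_id: {slot: variant_id, ...}}.
    """
    rng = random.Random(seed)  # fixed seed so a re-run reproduces the same split
    return {
        rid: {slot: rng.choice(variants) for slot, variants in slots.items()}
        for rid in recipient_ids
    }

# This mapping is what gets written to each CRM profile as the
# `lf_assignments` property; the email template then branches on it.
assignments = assign_variants(
    recipient_ids=["p_001", "p_002", "p_003"],
    slots={"subject_line": ["control", "urgency"], "hero": ["control", "lifestyle", "product"]},
)
print(assignments["p_001"])  # e.g. {'subject_line': 'urgency', 'hero': 'control'}
```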
Why Liftstack?
My ESP already has A/B testing built in. Why would I pay for this?
Native ESP testing and Liftstack solve different problems.
What native ESP A/B testing does:
- Splits your audience into two groups and sends each group a completely different email (or subject line)
- Picks a winner based on opens or clicks over a short window (typically 1 to 4 hours)
- Sends the winning version to the remaining audience
What Liftstack does differently:
- Tests individual content blocks inside a single email, not whole emails against each other. You can test just the hero image, just the CTA, or just the product grid layout while keeping everything else identical.
- Runs multiple tests simultaneously in the same campaign. Test a subject line AND a hero block AND a CTA in one send, with independent results for each slot.
- Uses Bayesian statistics that let you check results at any time without inflating error rates.
- Carries learning across campaigns. Smart Allocation uses historical performance to send more traffic to better-performing variants automatically.
- Provides revenue attribution, not just click counting.
- Detects guardrail violations like unsubscribe spikes, bounce rate increases, and spam complaints.
- Works across ESPs. If you use Klaviyo for lifecycle and Customer.io for transactional, your testing insights live in one place.
Can Liftstack do things my ESP cannot?
Yes. The core capability gap is in-template content testing. Native ESP tools treat the email as a single unit: you either send Email A or Email B. Liftstack injects conditional logic into your template so that different recipients see different content blocks within the same email.
Other things Liftstack does that native tools typically don't:
- Multi-slot testing in a single send (subject line + hero + CTA, analysed independently)
- Bayesian analysis with continuous monitoring (no fixed test duration needed)
- Automatic bot filtering so inflated opens and security-scanner clicks don't corrupt your results
- Revenue-per-exposure modelling that captures both conversion probability and order value
- Cross-campaign learning via Thompson Sampling and content insights
- Safety guardrails (unsubscribe, bounce, complaint) that block winners which damage list health or sender reputation
How Testing Works
How long does a test take?
It depends on your audience size and how different the variants are. As a rough guide:
- Large audiences (50,000+) with meaningful content differences: often conclusive within a few days
- Medium audiences (5,000 to 50,000): typically 3 to 7 days
- Small audiences (under 5,000): may take multiple campaign sends
Liftstack will show you a progress estimate while your test is still collecting data.
Can I check results while the test is running?
Yes. The campaign report updates in real time while your campaign is in tracking mode. You'll see live charts, preliminary numbers, and a confidence progression chart showing how close the test is to reaching a conclusion.
However, during the early data collection period, results will be labelled as preliminary. Liftstack enforces a minimum data threshold before declaring any verdict, which prevents premature conclusions from small, noisy samples.
What's the minimum audience size?
There's no hard minimum, but smaller audiences need larger differences between variants to reach a conclusion. As a planning guide:
| Baseline conversion rate | Min. difference to detect | Audience per variant |
|---|---|---|
| 1% | 0.5 percentage points | ~6,300 |
| 2% | 1.0 percentage point | ~3,100 |
| 3% | 1.0 percentage point | ~4,700 |
| 5% | 2.0 percentage points | ~1,900 |
If your audience is too small to detect realistic differences, Liftstack will tell you the test needs more data rather than making a premature call.
When you set up a campaign, Liftstack automatically shows a sample size guidance card after your audience is synced. This tells you whether your audience is large enough for the number of variants you're testing, based on a 3% baseline conversion rate and a 0.5 percentage point minimum detectable effect. If your audience is insufficient, you'll see a warning with specific guidance.
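If you want to sanity-check these numbers yourself, a classical two-proportion approximation is a reasonable planning heuristic. Liftstack's actual analysis is Bayesian, so the figures below won't exactly match the table above (they depend on the significance and power assumptions you plug in); treat this as a rough planning sketch only.

```python
from math import ceil

def recipients_per_variant(baseline, min_detectable_diff, alpha=0.05, power=0.80):
    """Classical two-proportion sample size approximation (planning heuristic only).

    baseline            -- expected control conversion rate, e.g. 0.03 for 3%
    min_detectable_diff -- smallest absolute difference worth detecting, e.g. 0.005
    """
    z_alpha = {0.05: 1.96, 0.10: 1.645}[alpha]   # two-sided significance quantile
    z_power = {0.80: 0.842, 0.90: 1.282}[power]  # power quantile
    p1, p2 = baseline, baseline + min_detectable_diff
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / min_detectable_diff ** 2)

print(recipients_per_variant(0.03, 0.005))  # roughly 19,700 per variant
print(recipients_per_variant(0.03, 0.010))  # roughly 5,300 per variant
```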
What is a "primary metric"?
The primary metric is the single measure you're optimising for. You choose it when setting up your campaign, and it cannot be changed once the campaign starts sending. This is deliberate: it prevents cherry-picking whichever metric happens to look best after the fact.
Your options are:
- Conversion rate (default): what percentage of recipients took the desired action (purchase, sign-up, etc.)
- Click rate: what percentage of recipients clicked a link in the email
- Open rate: what percentage of recipients opened the email
- Revenue per exposure: average revenue generated per recipient
All other metrics are still tracked and shown in your report as secondary/diagnostic metrics, but only the primary metric determines the winner.
Why can't I change the primary metric after sending?
What is the attribution window?
The attribution window is the time period after your campaign sends during which engagement events (clicks, conversions, purchases) are credited to the test. The default is 7 days.
A click that happens 3 days after the send counts. A purchase 10 days later does not (by default). This prevents distant events, which are influenced by many other factors, from muddying your test results.
If a significant number of conversions are arriving after your window closes, Liftstack will suggest extending it for future campaigns.
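As a simple illustration of the inclusion rule, here is a sketch (the dates and function name are hypothetical):

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=7)  # Liftstack's default window

def attributable(event_time: datetime, send_time: datetime) -> bool:
    """True if an engagement event falls inside the attribution window."""
    return send_time <= event_time <= send_time + ATTRIBUTION_WINDOW

send = datetime(2024, 6, 1, 9, 0)
print(attributable(datetime(2024, 6, 4, 12, 0), send))   # click 3 days later: True
print(attributable(datetime(2024, 6, 11, 12, 0), send))  # purchase 10 days later: False
```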
Integration & Setup
How does Liftstack connect to my ESP?
Liftstack connects via your ESP's API using credentials that you provide. The setup process is:
- Go to Integrations in Liftstack and select your platform (Klaviyo, Customer.io, or Iterable)
- Enter your credentials. What's required depends on the platform:
- Klaviyo: a private API key
- Customer.io: a Site ID, a Tracking API key, and an App API key
- Iterable: a standard API key
- Liftstack validates the connection and confirms access
Your credentials are encrypted at rest using Fernet symmetric encryption. Liftstack never stores them in plain text, and they are only decrypted when making API calls on your behalf.
No developer is required. If you can find your API credentials in your ESP's settings, you can complete setup in under five minutes.
What API permissions does Liftstack need?
Liftstack needs permission to:
- Read segments/lists (to sync your audience)
- Read and write profile properties (to write lf_assignments for variant targeting)
- Create and update templates (to push the conditional template logic)
- Read engagement events (clicks, opens, conversions) for attribution
For Klaviyo, this means a private API key with full read/write scope. For Customer.io, an App API key with tracking and API access. For Iterable, a standard API key.
Does writing assignments burn through my ESP's API limits?
Liftstack uses batch endpoints wherever available and includes built-in rate limiting that respects each platform's published limits. For a 500,000-person audience:
- Klaviyo: uses bulk profile import endpoints; typically completes in 10 to 20 minutes
- Customer.io: uses individual profile identify calls (Customer.io does not offer a bulk endpoint); typically completes in 15 to 30 minutes for large audiences
- Iterable: uses bulk user update endpoints; typically completes in 10 to 20 minutes
These API calls count toward your ESP's rate limits, but the built-in throttling means Liftstack won't spike your usage or trigger overage charges.
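For a rough picture of what batched, throttled writeback looks like, here is an illustrative sketch. The batch size, rate limit, and `write_batch` callable are placeholders rather than the real endpoints or limits of any particular ESP.

```python
import time

def write_assignments_in_batches(profiles, write_batch, batch_size=1000, max_batches_per_minute=60):
    """Write profile updates in fixed-size batches, paced to stay under a rate limit."""
    min_interval = 60.0 / max_batches_per_minute       # seconds between batch calls
    for start in range(0, len(profiles), batch_size):
        batch = profiles[start:start + batch_size]
        began = time.monotonic()
        write_batch(batch)                              # one bulk API request (placeholder)
        elapsed = time.monotonic() - began
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)          # throttle so usage never spikes
```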
How long do I need to wait between assigning and sending?
The campaign wizard handles this in sequence: it syncs the audience, runs assignment, writes properties to profiles, and pushes the template. You'll see a progress indicator for each step. Once all steps show complete, you can send immediately. There is no additional waiting period.
For large audiences (100,000+), the profile writeback step is the longest part and can take 15 to 30 minutes.
What happens if the API fails halfway through assigning?
Liftstack writes profile properties in batches with automatic retry. If a batch fails (network timeout, API error), the system retries with exponential backoff. If it hits a 429 (rate limit) response, it reads the Retry-After header and waits before continuing.
If some batches fail despite retries, the progress indicator will report how many profiles succeeded and how many failed. You can re-trigger the writeback step from the campaign wizard, and since the operation is idempotent (writing the same property value twice is harmless), it will safely re-process all profiles from the beginning.
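The retry behaviour described above follows a standard pattern: exponential backoff for transient errors, plus honouring the `Retry-After` header on rate-limit responses. Here is a sketch of that pattern; `send_batch` and the response shape are assumptions, not Liftstack's actual client code.

```python
import time

def write_batch_with_retry(send_batch, batch, max_attempts=5):
    """Retry a single batch write with exponential backoff and Retry-After handling.

    `send_batch(batch)` is a placeholder for the real bulk API call; it is assumed
    to return an object with `.status_code` and `.headers`.
    """
    delay = 1.0
    for attempt in range(max_attempts):
        response = send_batch(batch)
        if response.status_code == 429:                          # rate limited
            time.sleep(float(response.headers.get("Retry-After", delay)))
        elif response.status_code >= 500:                        # transient server error
            time.sleep(delay)
            delay *= 2                                           # exponential backoff
        else:
            return response                                      # success or non-retryable error
    raise RuntimeError(f"Batch still failing after {max_attempts} attempts")
```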
Does Liftstack slow down my campaign sending?
Can I connect multiple ESPs to the same workspace?
Understanding Your Results
What does "X% probability of being best" mean?
This is the single most important number in your report. It answers: "What is the probability that this variant truly has the highest conversion rate?"
For example, "93% probability of being best" means: given all the data we've collected, there's a 93% chance this variant genuinely outperforms all the others. There's a 7% chance one of the other variants is actually better and this one just got lucky in this particular test.
Where is the p-value?
Liftstack uses Bayesian statistics instead of the traditional frequentist approach you might be familiar with from other tools. This means you won't see p-values, and that's a good thing.
P-values answer a confusing question: "If there were NO real difference between variants, what's the probability of seeing data this extreme?" That's hard to interpret and easy to misuse.
Probability of being best answers a direct question: "Given the data I have, what's the probability this variant is actually the best?" That's what you really want to know.
Think of it this way:
- A p-value of 0.03 does NOT mean "there's a 97% chance variant A is better." (This is the most common misinterpretation of p-values.)
- A "probability of being best" of 97% DOES mean "there's a 97% chance variant A is better." It's exactly what it says.
What about confidence intervals? I'm used to seeing those.
Liftstack shows credible intervals (displayed as "range" in the report), which look similar to confidence intervals but are easier to interpret:
- A traditional 95% confidence interval means: "If we repeated this experiment many times, 95% of the resulting intervals would contain the true value." (Confusing, right?)
- A 95% credible interval means: "There's a 95% probability the true value falls within this range." (Much more intuitive.)
You'll see these ranges throughout the report: for conversion rates, uplift estimates, and revenue figures. A narrow range means we're quite certain; a wide range means there's still meaningful uncertainty.
What does "expected loss" mean?
Expected loss answers: "If I pick this variant and it turns out not to be the best, how much conversion rate am I leaving on the table?"
For example, an expected loss of 0.05% means: if you go with this variant and it's not actually the winner, you'd lose about 0.05 percentage points of conversion rate on average. That's tiny, well within the "not worth worrying about" range.
Liftstack uses expected loss as part of its decision criteria. A variant isn't declared a winner just because it's probably best. It also needs to have a very low expected loss, ensuring that even in the unlikely scenario it's wrong, the cost is negligible.
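Expected loss can be computed from the same kind of posterior draws: in each simulated world, measure how far a variant falls short of that world's best variant, then average across worlds. A minimal sketch, again assuming Beta-Binomial posteriors:

```python
import numpy as np

def expected_loss(conversions, exposures, samples=50_000, seed=0):
    """Expected conversion-rate loss (in percentage points) from picking each variant."""
    rng = np.random.default_rng(seed)
    draws = np.column_stack([
        rng.beta(1 + c, 1 + n - c, size=samples)
        for c, n in zip(conversions, exposures)
    ])
    shortfall = draws.max(axis=1, keepdims=True) - draws   # zero whenever the variant is best
    return shortfall.mean(axis=0) * 100                     # convert to percentage points

# 2.5% vs 3.1% observed: sticking with the first variant risks roughly 0.6pp,
# picking the second risks almost nothing.
print(expected_loss(conversions=[125, 155], exposures=[5000, 5000]))
```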
What does "practical equivalence" mean?
Sometimes variants are so close in performance that the difference doesn't matter in practice. If variant A converts at 3.02% and variant B converts at 3.05%, that 0.03 percentage point difference is real but meaningless for your business.
Liftstack checks whether variants fall within a Region of Practical Equivalence (ROPE): a range around zero (default: 0.5 percentage points) where differences are too small to care about. If all variants fall within this range with high probability, the verdict is EQUIVALENT, and you're told to pick whichever version you prefer. There's no statistical reason to favour one over another.
Reading the Campaign Report
What is the verdict card?
The verdict card is the hero element at the top of each slot's results. It gives you the bottom line in plain language. There are four possible verdicts:
- Winner (green, trophy icon). A clear winner has been identified. The card shows the conversion rates compared, the uplift, the confidence level, and the revenue range.
- Equivalent (grey, equals icon). All variants performed within a negligible range of each other. Pick whichever fits your brand best.
- Insufficient Data (amber, hourglass icon). No conclusion yet. One variant is leading but not decisively. Shows an estimate of how many more exposures are needed.
- Guardrail Violation (red, warning icon). A variant triggered a safety guardrail, typically because it caused a meaningful increase in unsubscribe rates compared to the control.
What are the confidence levels?
| Probability of Being Best | Confidence Level | What It Means |
|---|---|---|
| 95% or higher | Very High | Extremely likely this is the true best variant. Declare a winner. |
| 85% to 95% | High | Very probably the best, but a small chance you're wrong. |
| 70% to 85% | Moderate | Leading, but there's meaningful uncertainty. Likely needs more data. |
| Below 70% | Low | Too early to tell. Keep testing. |
What is the uplift callout?
The uplift callout is the key value statement of your test. It answers: "How much more did I get by using the winning variant instead of the control?"
It shows two numbers:
- Additional conversions: how many extra people converted because of the winning content
- Additional revenue: the estimated revenue those extra conversions generated
These numbers come with a range (e.g., "+£8,200 to +£16,800") so you know the realistic best and worst case.
What is the metrics table?
Below each slot's charts, there's an expandable metrics table showing the raw numbers for every variant. This includes exposures, opens, open rate, clicks, CTR, conversions, conversion rate, unsubscribes, bounces, complaints, revenue, and revenue per exposure.
This table is collapsed by default because the verdict card, charts, and uplift callout already tell you everything you need to make a decision.
Understanding the Charts
What is the Variant Comparison Chart (Raincloud Plot)?
A visual comparison of all variants' estimated true conversion rates, shown in the campaign report below the verdict card. Each variant gets a horizontal row with three visual layers:
- The cloud (top half). A smooth density curve showing the range of likely conversion rates. Where the curve is tall, that rate is more likely. A tight, narrow cloud means more certainty.
- The line and dot (middle). A horizontal line showing the 95% credible interval, with a dot at the estimated conversion rate.
- The rain (bottom half). A scatter of small dots representing possible conversion rates drawn from the statistical model.
If the leading variant's cloud is clearly separated from the others (no overlap), it's a strong winner. If clouds overlap substantially, you may need more data.
What is the Chance of Winning chart?
A horizontal bar chart showing each variant's probability of being the best performer. A vertical dashed line marks the decision threshold (default: 90%). A variant needs to cross this line to be declared a winner.
The percentages always add up to 100% across all variants. If one bar dominates and crosses the threshold, you have a clear winner. If bars are close, more data is needed.
What is the Expected Improvement chart?
A density plot of the difference between the winning variant and the control, shown only when a winner has been declared. The area to the right of zero (shaded green) represents scenarios where the winner truly is better. The area to the left (shaded amber) represents scenarios where it's actually worse (unlikely, but possible).
The annotation below the chart (e.g., "92.4% chance of real improvement") tells you exactly how much of the curve is on the positive side.
What is the Confidence Progression chart?
A line chart tracking how the leading variant's probability of being best has evolved over time since the campaign was sent. A horizontal dashed line marks the decision threshold (default: 90%).
Watch for the leading variant's line climbing toward the threshold. A line that's climbing steadily suggests the test is heading toward a conclusion. A line that's flat or bouncing suggests the variants are very close. During live tracking, this chart auto-refreshes every 60 seconds.
What is the Cumulative Revenue Uplift chart?
Shown on the analytics dashboard, this is a running total of the additional revenue generated by all your winning variants across all campaigns over time. A shaded band around the line shows the confidence range.
This line should only go up (each new winner adds to the total). This is the single best chart for demonstrating ROI from your testing programme.
What is the Conversion Rate Sparkline?
Verdicts & Decisions
How does Liftstack decide on a winner?
A variant is declared the winner when both of these conditions are met:
- Probability of being best is at least 90% (configurable). We're highly confident this variant truly has the highest conversion rate.
- Expected loss is at most 0.1% (configurable). Even if we're wrong, the cost of choosing this variant over the true best is negligible.
Both conditions must hold simultaneously. A variant with 92% probability of being best but an expected loss of 0.3% won't be declared a winner yet because the potential downside is still too large.
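Expressed as code, the decision rule is simply the conjunction of those two checks (thresholds shown at their defaults):

```python
def is_winner(prob_best, expected_loss_pp, prob_threshold=0.90, loss_threshold_pp=0.1):
    """Both conditions must hold: confident it is best AND cheap to be wrong."""
    return prob_best >= prob_threshold and expected_loss_pp <= loss_threshold_pp

print(is_winner(prob_best=0.92, expected_loss_pp=0.3))   # False: the downside is still too large
print(is_winner(prob_best=0.92, expected_loss_pp=0.05))  # True
```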
How does Liftstack decide variants are equivalent?
Variants are declared equivalent when Liftstack is highly confident (90%+ probability) that the difference between all variants falls within 0.5 percentage points (configurable). At that point, the differences are real but too small to matter for your business.
When there are many variants (4+), Liftstack can detect partial equivalence. For example: "Variant A is the clear winner. Among the remaining variants, B, C, and D are practically equivalent to each other."
What is a guardrail violation?
Guardrail metrics are safety checks that protect your audience. Even if a variant has a great conversion rate, it won't be declared a winner if it's damaging other important metrics:
- Unsubscribe rate. If the variant causes unsubscribes to increase by more than 0.1 percentage points vs the control.
- Spam complaint rate. If complaints increase by more than 0.05 percentage points vs the control.
- Bounce rate. If the variant causes bounces to increase by more than 0.5 percentage points vs the control.
A variant that drives clicks but burns your subscriber list is destroying long-term value. The guardrail catches this and warns you.
What does "insufficient data" mean?
This means no conclusion can be reached yet. One variant is probably leading, but there isn't enough data to be confident. Common reasons:
- The audience is small
- The variants perform very similarly (requiring more data to distinguish them)
- The campaign is still early in its tracking period
The report will show an estimate of how many more recipients need to be exposed before a conclusion can be reached.
Can I override the verdict?
The verdict is the system's statistical recommendation. You're free to take a different action, such as continuing to test a variant even after it's been declared equivalent, or choosing a variant other than the winner based on brand considerations.
What you can't do is change the primary metric after seeing results, or retroactively adjust the analysis to favour a particular outcome. These safeguards keep the testing process honest.
Metrics & What They Mean
What are the primary metrics?
| Metric | What It Measures | Best For |
|---|---|---|
| Conversion rate | Percentage of recipients who completed the desired action | Most campaigns (the default) |
| Click rate | Percentage of recipients who clicked any link | Quick-signal tests, smaller audiences |
| Open rate | Percentage of recipients who opened the email | Subject line and preview text testing |
| Revenue per exposure | Average revenue generated per recipient | When variants might influence order size |
What are secondary/diagnostic metrics?
Why is open rate marked with a warning?
Open tracking is unreliable because of Apple Mail Privacy Protection (MPP) and email client pre-fetching. These technologies automatically trigger "opens" for every email, whether or not the recipient actually looked at it.
The good news: this noise affects all variants equally (since recipients are randomly assigned), so relative comparisons remain valid. The bad news: absolute open rates are inflated, and tests using open rate as the primary metric need more data to reach a conclusion.
What is "revenue per exposure"?
Revenue per exposure (RPE) measures the average revenue each recipient generates. It captures two effects:
- Conversion probability. Does this variant make people more likely to buy?
- Order value. When people do buy, do they spend more?
A variant could win on RPE even if it doesn't have the highest conversion rate, because it might encourage larger orders. Liftstack uses a specialised compound model for RPE that analyses these two components separately and then combines them.
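A minimal sketch of that two-part idea, assuming a Beta posterior for conversion probability and a normal approximation for the posterior of the mean order value (the actual compound model Liftstack uses is not documented here):

```python
import numpy as np

def revenue_per_exposure_draws(conversions, exposures, order_values, samples=50_000, seed=0):
    """Posterior draws of revenue per exposure = P(convert) x E[order value | convert]."""
    rng = np.random.default_rng(seed)
    p_convert = rng.beta(1 + conversions, 1 + exposures - conversions, size=samples)
    values = np.asarray(order_values, dtype=float)
    mean_value = rng.normal(values.mean(),
                            values.std(ddof=1) / np.sqrt(len(values)),
                            size=samples)
    return p_convert * mean_value

draws = revenue_per_exposure_draws(conversions=155, exposures=5000,
                                   order_values=[42.0, 55.0, 61.5, 38.0, 70.0])
print(draws.mean(), np.percentile(draws, [2.5, 97.5]))  # point estimate and 95% credible range
```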
What are the safety guardrails?
Even if a variant drives conversions, it might be doing so in a way that damages your list health. Liftstack monitors three guardrail metrics automatically:
- Unsubscribe guardrail. If the winning variant's unsubscribe rate is meaningfully higher than the control's (more than 0.1 percentage points), the system blocks the winner declaration.
- Complaint guardrail. If the variant triggers a meaningful increase in spam complaints (more than 0.05 percentage points vs the control), the winner is blocked.
- Bounce guardrail. If the variant causes a meaningful increase in email bounces (more than 0.5 percentage points vs the control), the winner is blocked. High bounce rates can damage your sender reputation.
When any guardrail fires, Liftstack shows a red warning and prevents the variant from being declared a winner.
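A simplified sketch of the threshold comparison, using the default tolerances above. Note the real check is probabilistic (each guardrail is tested against a posterior probability threshold, as described later in this FAQ); this version compares point estimates only, for illustration.

```python
def guardrail_violations(variant, control):
    """Return the guardrail metrics where the variant exceeds the control by more than the tolerance."""
    tolerances = {"unsubscribe_rate": 0.001,   # 0.1 percentage points
                  "complaint_rate": 0.0005,    # 0.05 percentage points
                  "bounce_rate": 0.005}        # 0.5 percentage points
    return [name for name, limit in tolerances.items()
            if variant[name] - control[name] > limit]

print(guardrail_violations(
    variant={"unsubscribe_rate": 0.004, "complaint_rate": 0.0003, "bounce_rate": 0.006},
    control={"unsubscribe_rate": 0.002, "complaint_rate": 0.0002, "bounce_rate": 0.005},
))  # ['unsubscribe_rate']: a 0.2pp increase exceeds the 0.1pp tolerance
```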
Smart Allocation (Thompson Sampling)
What is "Smart Allocation"?
How is it different from an equal split?
With a standard A/B test (equal split), each variant gets the same number of recipients, say 33% each for three variants. This is fair but wasteful: you're sending just as much traffic to a clearly underperforming variant as to the front-runner.
With Smart Allocation, Liftstack might split traffic 60/25/15 based on past performance. The likely winner gets more traffic (fewer wasted exposures), while alternatives still get enough to confirm whether they've improved or the leader has slipped.
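The allocation logic is essentially Thompson Sampling: sample each variant's conversion rate from its posterior and give each variant traffic in proportion to how often it wins those samples. A minimal sketch, assuming Beta-Binomial posteriors and ignoring any floors or constraints the real engine may apply:

```python
import numpy as np

def thompson_allocation(conversions, exposures, samples=50_000, seed=0):
    """Traffic share per variant = probability it is best under the current posteriors."""
    rng = np.random.default_rng(seed)
    draws = np.column_stack([
        rng.beta(1 + c, 1 + n - c, size=samples)
        for c, n in zip(conversions, exposures)
    ])
    wins = np.bincount(draws.argmax(axis=1), minlength=len(conversions))
    return wins / samples

# Historical results (3.0%, 2.85%, 2.7% on 10,000 exposures each) feed the next split:
# roughly 0.68 / 0.25 / 0.07, so the leader gets most traffic while the others keep learning.
print(thompson_allocation(conversions=[300, 285, 270], exposures=[10_000, 10_000, 10_000]))
```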
Does this bias the test?
Can I override the smart allocation?
What is the "Smart Allocation Uplift"?
How does the system handle a brand-new variant with no history?
Does historical data expire?
Operational Workflow
Can I fix a typo in a variant after the test starts?
It depends on how far the campaign has progressed:
- Before sending (DRAFT through TEMPLATE_PUSHED): Yes. You can edit variant content in the snippet editor at any time before you confirm the send. If the template has already been pushed, Liftstack will re-push it with the updated content.
- After sending (SENT, TRACKING, COMPLETED): No. Once the campaign is sent, the content that recipients saw is fixed. Editing the variant in Liftstack would update it for future campaigns, but it won't change what was already delivered.
If you spot a serious error after sending (like a broken link), the right approach is to fix it in your ESP's template directly. The Liftstack test results for that variant will be affected, and the report will reflect that.
Can I add a variant to a test that is already running?
No. Adding a variant mid-test would mean that variant has a different exposure period and audience size, which makes statistical comparison invalid. If you want to test an additional variant, create a new campaign with all the variants you want to compare (including the new one).
This is a deliberate constraint. Mixed-exposure tests produce unreliable results, and Liftstack prioritises correct conclusions over flexibility.
Can I stop or pause a single variant without killing the whole campaign?
Can I duplicate a campaign setup?
What happens if I delete a snippet that's active in a campaign?
Can I re-run the same test on a different audience?
Segmentation & Audience
Does Liftstack work with my existing ESP segments?
Can I see results broken down by segment?
The standard campaign report shows results for the full audience. Liftstack does not currently break down results by sub-segments within a single campaign.
However, you can achieve segment-level insights in two ways:
- Run separate campaigns per segment. Send the same snippets to your VIP segment and your non-VIP segment as separate campaigns. Each gets its own independent analysis.
- Stratified Thompson Sampling (Scale plan). Liftstack maintains separate performance estimates per segment. The allocation engine uses per-segment data, which means variants that work better for specific segments get more traffic within those segments.
Is there a way to have a global holdout (control) group?
Yes. When creating a campaign, you can set a holdout percentage (up to 20% of the audience). Holdout recipients are randomly selected and assigned only the control variant across all slots. They receive the email with your default content and serve as a baseline.
This is different from having a control variant in a slot. The holdout group isolates the effect of personalised content assignment itself, answering: "Does running any test at all produce better outcomes than sending everyone the default?"
Requirements: At least one slot must have a variant marked as control. The holdout percentage cannot be changed after assignments are made.
Can I run a test targeting only mobile users or only desktop users?
Can I see if Variant A won for one demographic but Variant B won for another?
Not as a built-in report split. Liftstack analyses each campaign as a single audience. If you suspect a variant performs differently across demographics, the recommended approach is to run separate campaigns against demographic-specific segments.
The Content Insights feature (Growth and Scale plans) does detect patterns across campaigns, which can surface observations like "urgency messaging tends to outperform for your promotional segments." These are observational hints, not segment-level A/B test results, but they can guide your testing strategy.
Dashboard & Insights
What do the dashboard stat cards show?
The four cards at the top of the dashboard give you a monthly snapshot:
| Card | What It Shows |
|---|---|
| Campaigns This Month | How many campaigns you've sent with Liftstack |
| Snippets Tested | How many unique content variants were tested |
| Clear Winners | Percentage of tested slots where a clear winner was found |
| Est. Revenue Uplift | Total estimated additional revenue from choosing winning variants |
What are Content Insights?
Content Insights are patterns the system detects across your historical campaigns. For example: "Urgency tone tends to outperform your average by approximately 1.2%." These are surfaced with confidence levels:
- High confidence. Pattern supported by substantial data (10,000+ exposures across many campaigns).
- Moderate confidence. Suggestive pattern worth investigating, but based on less data.
Important: Insights are observational, not causal. A pattern like "urgency outperforms" is a correlation. It could be influenced by the specific copy, audience, timing, or other factors. The insight is a hypothesis to test deliberately, not a guaranteed rule.
Why don't I see any insights?
Insights require a meaningful history to detect patterns. They won't appear until:
- You've completed at least 5 campaigns with the same snippet attributes
- At least 3 variants share the attribute being analysed
- The pattern passes a statistical threshold (adjusted for the number of attributes being tested simultaneously)
Snippet Performance
What is the Snippet Performance page?
What do the performance verdicts mean?
| Verdict | Criteria | What It Means |
|---|---|---|
| Strong performer | Won 60%+ of its campaigns, across 4+ campaigns | Reliably outperforms. Consider making it your default. |
| Consistent | Won 40%+ with low variability | Reliable middle-of-the-road performer |
| Variable | High variability across campaigns | Sensitive to audience or timing. Unpredictable. |
| Needs more data | Fewer than 3 campaigns | Too early to judge. Keep testing. |
What does the sparkline show?
What is a temporal trend warning?
Data Quality & Warnings
What is a Sample Ratio Mismatch (SRM)?
An SRM means the actual traffic split between variants doesn't match what was intended. For example, you set up a 50/50 split but actually got 53/47. This is a serious issue because it suggests something went wrong in the delivery pipeline.
Common causes: partial failures when writing assignments to your CRM, recipients unsubscribing between assignment and send, template rendering errors for one variant, or platform-side content filtering.
When SRM is detected, Liftstack blocks the verdict and shows a red warning. You should investigate the root cause before trusting any results.
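A common way to test for SRM is a chi-square goodness-of-fit test against the intended split. Whether Liftstack runs exactly this test is not stated here, so treat the SciPy-based sketch below as an illustration.

```python
from scipy.stats import chisquare

def srm_check(observed_counts, intended_split, alpha=0.001):
    """Chi-square goodness-of-fit test of observed traffic against the intended split."""
    total = sum(observed_counts)
    expected = [total * share for share in intended_split]
    stat, p_value = chisquare(observed_counts, f_exp=expected)
    return {"p_value": p_value, "srm_detected": p_value < alpha}

# A 53/47 actual split on a 50/50 test of 100,000 recipients is wildly improbable by chance.
print(srm_check([53_000, 47_000], [0.5, 0.5]))  # p-value effectively 0 -> SRM detected
```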
What are data quality checks?
Before running any analysis, Liftstack automatically checks:
- Assignment completeness. Were all audience members actually assigned a variant?
- Sample ratio mismatch. Does the actual split match the intended split?
- Zero-event variants. Does any variant have zero engagement events despite having recipients?
- Minimum data threshold. Has each variant accumulated enough data for meaningful analysis?
Issues are flagged directly on the campaign report with severity levels (critical warnings block analysis; minor warnings are informational).
What about bot traffic?
Email engagement metrics are polluted by bots. Liftstack automatically filters these out during event ingestion by detecting:
- Known bot user agents (Googlebot, link scanners, headless browsers, etc.)
- Known email security scanners (Barracuda, Proofpoint, Mimecast, etc.)
- Impossibly fast clicks (within 1 second of delivery)
The campaign report shows what percentage of traffic was classified as bot activity and excluded. Typical campaigns see 5 to 15% bot traffic.
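A simplified sketch of those heuristics is shown below. The user-agent markers are illustrative placeholders; real scanner detection is considerably more involved.

```python
from datetime import datetime

BOT_UA_MARKERS = ("googlebot", "barracuda", "proofpoint", "mimecast", "headlesschrome")

def looks_like_bot(user_agent: str, event_time: datetime, delivered_time: datetime) -> bool:
    """Heuristic bot check: known scanner user agents, or an impossibly fast click."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        return True
    return (event_time - delivered_time).total_seconds() < 1.0   # clicked within 1 second

delivered = datetime(2024, 6, 1, 9, 0, 0)
print(looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1)", datetime(2024, 6, 1, 9, 5), delivered))  # True
print(looks_like_bot("Mozilla/5.0 (iPhone)", datetime(2024, 6, 1, 9, 0, 0, 400_000), delivered))         # True (0.4s click)
print(looks_like_bot("Mozilla/5.0 (iPhone)", datetime(2024, 6, 1, 9, 4), delivered))                     # False
```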
What does "interaction detected" mean?
When your campaign tests multiple slots (e.g., subject line AND hero image), Liftstack checks whether the combination matters. An interaction means: Variant A in the subject line slot performs differently when paired with Variant X vs Variant Y in the hero slot.
Interactions are flagged with cautious language: "We detected a possible interaction... This may warrant investigation but could also be coincidental." The per-slot results remain valid. The interaction is additional context, not a change to the verdict.
Common Questions About the Statistics
Is Bayesian analysis as rigorous as traditional statistics?
Yes, and arguably more so for this use case. The Bayesian approach used in Liftstack:
- Produces the same quality of conclusions as frequentist methods (p-values, confidence intervals)
- Provides answers that are easier to interpret correctly ("93% probability this is the best" vs "p < 0.05")
- Handles continuous monitoring naturally, so you can check results at any time without inflating error rates
- Does not require pre-determined sample sizes
- Includes built-in protection against the winner's curse (extreme results are naturally pulled toward realistic values)
Why 50,000 Monte Carlo samples?
What is the "prior" and does it affect my results?
In Bayesian statistics, the prior represents your starting assumption before seeing any data. Liftstack defaults to an uninformative prior, meaning it starts with no assumptions about what the conversion rate should be. This is conservative and lets the data speak for itself.
After you've completed 5+ campaigns, Liftstack can automatically switch to an adaptive prior that encodes your workspace's typical conversion rate range (e.g., "our campaigns usually convert between 1% and 4%"). This helps small tests converge faster without biasing toward any particular variant, because it applies the same prior to all variants equally.
You can also manually set the prior if you have specific domain knowledge, but most users never need to touch this.
Won't the prior bias my results?
No, for two important reasons:
- The same prior is applied to every variant in the test. It shifts all estimates equally and doesn't favour one variant over another.
- The prior's influence shrinks rapidly as data arrives. After a few hundred exposures per variant, the data overwhelms the prior entirely.
The prior mainly matters in the early stages of a test (under 300 exposures per variant), where it prevents extreme estimates from tiny samples.
What is ROPE and why does it matter?
ROPE (Region of Practical Equivalence) is how Liftstack determines whether a difference is too small to care about. The default ROPE width is 0.5 percentage points, meaning if two variants are within half a percentage point of each other, they're treated as functionally equivalent.
This prevents the system from declaring a "winner" that only beats the control by 0.02 percentage points. Technically better, but practically meaningless.
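In posterior terms, the ROPE check asks how much of the probability mass of the difference between two variants falls inside the +/-0.5 percentage point band. A minimal sketch under the same Beta-Binomial assumption used earlier:

```python
import numpy as np

def probability_within_rope(conv_a, n_a, conv_b, n_b, rope_pp=0.5, samples=50_000, seed=0):
    """Probability that the true difference between two variants lies inside the ROPE."""
    rng = np.random.default_rng(seed)
    a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=samples)
    b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=samples)
    diff_pp = np.abs(a - b) * 100                 # absolute difference in percentage points
    return float((diff_pp < rope_pp).mean())

# 3.02% vs 3.05% on 20,000 recipients each: nearly all the posterior mass sits inside +/-0.5pp.
print(probability_within_rope(conv_a=604, n_a=20_000, conv_b=610, n_b=20_000))
```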
How does Liftstack handle multiple comparisons?
When you test many variants across many slots, the chance of finding a false positive increases. Liftstack handles this differently for each metric tier:
- Primary metric. The Bayesian framework already accounts for all variants simultaneously. Probability of being best is computed jointly, so no additional correction is needed within a slot.
- Guardrail metrics. Each guardrail is checked independently against a 90% posterior probability threshold. No multiplicity correction is applied because the guardrails are intentionally conservative (high threshold, narrow tolerance).
- Diagnostic metrics. No correction. They're explicitly labelled as exploratory context, not decision-drivers.
- Cross-slot uplift. When summing uplift across multiple slots, the confidence intervals are widened to maintain accuracy.
Is the uplift number real? Can I trust it?
The uplift estimate ("+X additional conversions, +£Y additional revenue") is the system's best estimate based on the data, with several safeguards against overestimation:
- It uses the posterior mean (not the raw observed difference), which naturally shrinks extreme estimates toward realistic values
- It always includes a credible interval (range) so you can see the best and worst case
- It reports the probability this is a real improvement (e.g., "94% chance of real improvement")
That said, all estimates have uncertainty. The true uplift could be at the high end of the range, the low end, or anywhere in between. The headline number is the most likely value, and the range gives you the realistic spread.
What is the winner's curse?
When you test many variants and declare the best-performing one the "winner", its observed performance tends to be slightly inflated by luck. The variant that happened to get favourable randomness in this particular test looks better than it truly is.
Liftstack mitigates this automatically through the Bayesian model (which shrinks extreme estimates) and by always reporting credible intervals alongside point estimates. You should interpret the range, not just the headline number.
Commercial, Privacy & Administration
How is Liftstack priced?
Liftstack offers three paid tiers, billed monthly or annually (with a discount for annual billing):
| | Starter | Growth | Scale |
|---|---|---|---|
| Audience profiles | 50,000 | 150,000 | 500,000 |
| Campaigns/month | 10 | 30 | Unlimited |
| Slots per campaign | 2 | 4 | Unlimited |
| Variants per slot | 3 | 5 | 5 |
| Platform connections | 1 | 2 | 3 |
| Team members | 3 | 10 | 25 |
| Smart Allocation | No | Yes | Yes |
| Revenue modelling | No | Yes | Yes |
| Content Insights | No | Yes | Yes |
| Stratified TS | No | No | Yes |
| Interaction detection | No | No | Yes |
| Adaptive priors | No | No | Yes |
There is also a 14-day free trial with Growth-tier features and 1 campaign, so you can run a real test before committing.
Can I invite my agency or team members to my workspace?
Yes. Every plan includes multiple workspace seats. You invite team members by email. Liftstack supports three roles:
- Owner: full access, including billing and workspace settings
- Admin: full access to campaigns, snippets, integrations, and workspace settings
- Member: can create and manage campaigns and snippets; cannot modify integrations or workspace settings
Is Liftstack GDPR compliant?
Liftstack is designed with data minimisation in mind:
- What Liftstack stores: Platform profile IDs, email addresses (for audience sync), and engagement events with their metadata.
- What Liftstack does NOT store: Payment information (handled by Stripe), email content rendered to recipients (stays in your ESP), or any personal data beyond what's listed above.
- Encryption at rest: API credentials, email addresses, audience profile properties, and event payloads are all encrypted using Fernet symmetric encryption with per-workspace derived keys. All data in transit uses TLS.
- Data processing: Liftstack acts as a data processor on your behalf. You remain the data controller for your subscriber data.
If your organisation requires a Data Processing Agreement (DPA), contact support.
Does Liftstack store Personally Identifiable Information (PII)?
Liftstack stores the minimum PII necessary to run tests: platform profile IDs and email addresses from your audience sync. These are used to match assignments to engagement events for attribution.
Email addresses and audience profile properties are encrypted at rest using per-workspace Fernet keys. Platform profile IDs are stored unencrypted because they are required for database lookups and attribution joins.
What happens to my data if I cancel?
When you cancel your subscription:
- Your workspace and all its data remain accessible in read-only mode through the end of your current billing period.
- After the billing period ends, your workspace enters a grace period. You can reactivate your subscription during this time to restore full access.
- If you want your data deleted, contact support and we will permanently remove your workspace and all associated data.
Historical campaign results are yours. You can export CSV reports from any campaign before your access expires.
Who can see my test results?
Does Liftstack have access to my ESP account?
Liftstack uses the API key you provide to make specific API calls: syncing audiences, writing profile properties, pushing templates, and fetching engagement events. It does not have access to your ESP dashboard, billing, or any data outside the scope of those API calls.
You can revoke access at any time by deleting the API key in your ESP's settings. Liftstack will immediately lose the ability to make any calls.
Troubleshooting
My test has been running for days but still says "Insufficient Data"
This usually means one of:
- The variants perform very similarly. If the true difference is tiny, you need a very large audience to detect it. Consider whether the content differences are meaningful enough.
- Small audience. Check whether your audience meets the minimum size guidance for the effect size you're trying to detect.
- Low event volume. If conversions are rare (e.g., under 1%), you need substantially more recipients per variant.
The report will show an estimate of how many more exposures are needed. If that number is impractically large, the variants may simply be too similar to distinguish. That is a valid result; consider declaring them equivalent and moving on.
Why does one variant show zero events?
A variant with recipients but zero engagement events may indicate a tracking issue:
- Check that the template conditional logic is rendering correctly for that variant
- Verify that the tracking links contain the correct lf_cid parameter
- Confirm that your webhook or event polling is functioning
Liftstack flags this as a data quality warning on the campaign report.
Why was my winner blocked by a guardrail?
The variant with the best primary metric performance also triggered a safety threshold. The three guardrails that can block a winner are:
- Unsubscribe rate: the variant caused a meaningful increase in unsubscribes vs the control
- Complaint rate: the variant caused a meaningful increase in spam complaints vs the control
- Bounce rate: the variant caused a meaningful increase in bounces vs the control (which can damage sender reputation)
Consider:
- Reviewing the variant's content for overly aggressive messaging
- Looking at which audience segments are unsubscribing or complaining
- Deciding whether the increase is acceptable given the conversion gains (you can acknowledge the guardrail and proceed if you've investigated)
- For bounce rate violations, checking whether the variant contains content that might trigger spam filters
The report shows an SRM warning. What do I do?
An SRM (Sample Ratio Mismatch) means the traffic split doesn't match what was configured. Steps to investigate:
- Check for partial failures in the CRM profile write step (look for error logs during the writeback)
- Check whether audience members were suppressed or unsubscribed between assignment and send
- Verify that the template renders correctly for all variants (a broken conditional could funnel everyone to a default)
- Check for platform-side filtering (spam filters catching one variant's content)
Until the root cause is identified, the statistical results for this slot should not be trusted.
Can I re-run a test?
How do I export my data?
Start compounding revenue from the emails you already send
14-day free trial on the Growth tier. No credit card required.