Glossary: Synthetic Market Research

Plain-English definitions for common terms in the field.

Draft v0.1

Core concepts

Synthetic market research
Running research workflows (concept tests, message tests, surveys, scenario analysis) using simulated participants or panels that are designed to represent a target population. Done well, it accelerates hypothesis testing; done poorly, it can produce confident-sounding but ungrounded output.
Traditional market research
Research that measures responses from real humans (surveys, interviews, experiments, panels). Often slower and more expensive, but provides direct empirical evidence.
Synthetic data
Artificially generated data that aims to preserve the properties that matter in a real dataset (distributions, correlations, conditional relationships), without containing rows that correspond to real individuals.
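A minimal sketch of one simple approach (production systems more often use copulas or deep generative models): fit the means and covariance of a numeric dataset, then sample fresh rows. The columns and figures below are hypothetical.

```python
import numpy as np

# Toy "real" dataset with hypothetical columns: age, income, weekly_spend.
rng = np.random.default_rng(0)
real = rng.multivariate_normal(
    mean=[40, 52_000, 85],
    cov=[[100, 30_000, 40],
         [30_000, 9e7, 12_000],
         [40, 12_000, 400]],
    size=1_000,
)

# Fit means and covariance to the real data, then sample fresh rows.
# No synthetic row corresponds to a real individual, but marginal means
# and pairwise correlations are approximately preserved.
mu = real.mean(axis=0)
sigma = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, sigma, size=1_000)

print(np.corrcoef(real, rowvar=False).round(2))       # correlations: real
print(np.corrcoef(synthetic, rowvar=False).round(2))  # correlations: synthetic
```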
Simulation vs measurement
Simulation creates plausible outcomes under assumptions; measurement observes what actually happened. Synthetic market research is a simulation method and should be interpreted and validated accordingly.
Decision-grade vs exploratory
Exploratory work is hypothesis-generating and directional. Decision-grade work is backed by stability checks, benchmarks, disclosure, and (when needed) targeted human validation.

Panels, personas, twins, and respondents

Synthetic persona
A structured, simulated profile representing a segment or archetype (demographics, constraints, motivations, behaviours) used to generate responses to stimuli or questions. It is more than a “character”; it should be engineered for consistency and evaluation.
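One possible shape for such a profile, sketched as a Python dataclass; the field names are illustrative, not a standard.

```python
from dataclasses import dataclass

# Illustrative schema only; fields and values are hypothetical.
@dataclass
class SyntheticPersona:
    persona_id: str
    segment: str          # e.g. "price-sensitive urban renter"
    demographics: dict    # age band, region, income band, ...
    constraints: list     # budget caps, channel access, time limits
    motivations: list     # goals, jobs-to-be-done
    behaviours: dict      # known category behaviours used for grounding
    version: str = "0.1"  # versioning supports drift checks and audits

p = SyntheticPersona(
    persona_id="uk-gz-001",
    segment="Gen Z UK shoppers",
    demographics={"age_band": "18-24", "region": "UK", "income_band": "low"},
    constraints=["monthly category budget <= £80"],
    motivations=["value for money", "peer approval"],
    behaviours={"buys_online": 0.9},
)
print(p.segment)
```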
Traditional persona
A narrative or semi-researched archetype used for marketing/product thinking. Helpful for alignment, but not designed to behave like a measurable research instrument.
Synthetic panel
A collection of synthetic personas intended to represent a target population, enabling breakdowns by segment, region, income band, or other variables.
Synthetic audience
Another term for a synthetic panel, often emphasising cohorts and group-level outputs (e.g., “Gen Z UK shoppers” or “B2B IT buyers”).
Synthetic respondent
A simulated participant that answers survey-style questions. Useful for rapid iteration, but easy to misuse if outputs are treated as equivalent to a fresh human sample without validation.
Synthetic user
A simulated participant used in UX/product research tasks (e.g., critiquing a flow, interpreting copy). Often best for hypothesis generation and identifying edge cases, not as proof of usability.
Digital twin
A model intended to represent a specific real-world entity. In consumer contexts, “digital twin” is sometimes used for recreating an individual based on data traces; this can raise ethical concerns and may not reflect the person’s real behaviour beyond their digital footprint.
SPL (Synthetic Persona Levels) ladder
A practical scale describing “how real” a synthetic persona system is: from prompt-only personas to systems with memory, context feeds, internal state, and (at the highest levels) social interaction and multi-agent dynamics.

Grounding, calibration, and representativeness

Grounding
Anchoring a model to external constraints or facts so it doesn’t drift into wishful invention. Grounding can include structured data, rules, price points, documented claims, or other controlled inputs.
Calibration
Adjusting a synthetic system so outputs match known benchmarks (e.g., census distributions, known category behaviours, historical rates).
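A minimal calibration check, with made-up shares: compare a panel's simulated age mix to a census-style benchmark and quantify the gap.

```python
# All numbers below are hypothetical.
benchmark = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}  # known target shares
simulated = {"18-34": 0.45, "35-54": 0.35, "55+": 0.20}  # panel output shares

# Total variation distance: 0 = perfect match, 1 = completely disjoint.
tvd = 0.5 * sum(abs(benchmark[k] - simulated[k]) for k in benchmark)
print(f"TVD vs benchmark: {tvd:.2f}")  # 0.18 here; recalibrate until within target
```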
Population frame
A precise statement of who the results represent (geography, time, age range, segment definitions, exclusions). Without a population frame, “representative” claims are meaningless.
Representativeness
The degree to which a sample (synthetic or human) matches the population of interest. Synthetic systems can amplify bias or reduce it, depending on what they are grounded and calibrated against.
Bias
Systematic error that skews results. Bias can originate from source data, modelling choices, prompts, or evaluation methods.
Weighting & quotas
Techniques used to shape a panel so it matches a target population. Quotas control composition up front; weighting adjusts results afterward.
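A small post-stratification sketch, with hypothetical cells and shares: quotas would control the composition below up front; the weights correct it afterward.

```python
from collections import Counter

# Hypothetical achieved panel vs target population shares.
respondents = ["18-34"] * 45 + ["35-54"] * 35 + ["55+"] * 20
target = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}

counts = Counter(respondents)
n = len(respondents)

# Post-stratification weight per cell = target share / achieved share.
for cell in target:
    achieved = counts[cell] / n
    print(f"{cell}: achieved {achieved:.2f}, weight {target[cell] / achieved:.2f}")
```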
Priors
The assumptions and background information a synthetic system starts with (e.g., known distributions, known behaviours, constraints). Strong priors can improve realism; wrong priors can “lock in” errors.

Validation, reliability, and benchmarking

Validation
Checking whether synthetic outputs track reality for a specific use case. This can include parallel human studies, back-testing against known outcomes, external benchmarks, and sensitivity tests.
Benchmark
A reference point used to judge accuracy or realism (e.g., known statistics, historical outcomes, a small real sample, or well-established survey results).
Test–retest stability
Whether repeating the same study conditions yields similar results. Low stability is a red flag for decision-grade usage.
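A toy test–retest check; run_study is a hypothetical stand-in for a real synthetic pipeline, with scores modelled as a fixed signal plus run-to-run noise.

```python
import random

# Stand-in for a real pipeline: same protocol, different run seeds.
def run_study(seed):
    rng = random.Random(seed)
    true_score = {"A": 0.8, "B": 0.6, "C": 0.5, "D": 0.3, "E": 0.2}
    return {c: s + rng.gauss(0, 0.05) for c, s in true_score.items()}

run1, run2 = run_study(1), run_study(2)
rank1 = sorted(run1, key=run1.get, reverse=True)
rank2 = sorted(run2, key=run2.get, reverse=True)

print("run 1 order:", rank1)
print("run 2 order:", rank2)
print("stable ranking:", rank1 == rank2)  # frequent flips are a red flag
```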
Sensitivity analysis
Testing how much outputs change when you vary inputs slightly (prompt framing, context, ordering, model parameters). Helps reveal brittle conclusions.
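A toy sensitivity sweep; estimated_purchase_intent is a hypothetical stand-in for a real pipeline, and the framing shift stands in for prompt or ordering changes.

```python
# Toy model: intent falls with price; question framing shifts the curve.
def estimated_purchase_intent(price, framing_shift):
    return max(0.0, min(1.0, 0.9 - 0.04 * price + framing_shift))

for shift in (-0.10, -0.05, 0.0, 0.05, 0.10):
    intent = estimated_purchase_intent(price=10, framing_shift=shift)
    print(f"framing shift {shift:+.2f} -> intent {intent:.2f}")
# If small framing shifts cross a go/no-go threshold, the conclusion is brittle.
```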
Back-testing
Validating a system by asking it to “predict” outcomes that are already known (historical launches, past sentiment shifts), then comparing outputs to reality.
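A back-testing sketch with made-up figures, scoring "predictions" against outcomes that are already known.

```python
# Hypothetical numbers: actual uptake of past launches vs system predictions.
known = {"launch_a": 0.12, "launch_b": 0.31, "launch_c": 0.05}
predicted = {"launch_a": 0.15, "launch_b": 0.24, "launch_c": 0.09}

mae = sum(abs(known[k] - predicted[k]) for k in known) / len(known)
same_order = sorted(known, key=known.get) == sorted(predicted, key=predicted.get)

print(f"mean absolute error: {mae:.3f}")
print(f"same rank order as reality: {same_order}")
```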
Reliability vs accuracy
Reliability is consistency across runs; accuracy is closeness to truth. A system can be reliably wrong, so both matter.
Disclosure label / “nutrition label”
A standardised summary of how a study was produced (population frame, panel design, grounding inputs, validation checks, limitations, reproducibility notes). Improves comparability and reduces hype-driven misuse.
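One possible shape for such a label, with illustrative field names and values; no published standard is implied.

```python
import json

disclosure = {
    "study": "Concept test v3",
    "population_frame": "UK adults 18-65, online shoppers, 2024",
    "panel_design": {"n_personas": 400, "quota_vars": ["age", "region"]},
    "grounding_inputs": ["census margins", "category price list"],
    "validation": ["test-retest x3", "back-test vs past launch"],
    "limitations": ["pricing results directional only"],
    "reproducibility": {"seed": 42, "protocol_version": "1.2"},
}
print(json.dumps(disclosure, indent=2))
```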

Study types and outputs

Concept testing
Evaluating multiple product/service concepts to identify which resonates, what confuses, and what objections arise. Synthetic studies often excel at fast iteration and ranking.
Message testing
Comparing claims, taglines, value propositions, or creative directions for comprehension, credibility, differentiation, and tone.
Pricing exploration
Exploring price thresholds and the “why” behind price sensitivity. Typically best treated as directional unless benchmarked against real signals.
Segmentation exploration
Using synthetic panels to propose segment hypotheses, motivations, and language. Fieldwork is often needed to confirm segment sizes and incidence.
Scenario testing
“What if” analysis: price changes, competitor moves, supply shocks, regulatory shifts. Synthetic systems are well suited to running many structured variations quickly.
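A sketch of enumerating a structured scenario grid; run_scenario is a hypothetical stand-in for whatever actually executes each variation.

```python
from itertools import product

# Hypothetical scenario dimensions.
prices = [9.99, 12.99, 14.99]
competitor_moves = ["none", "price_cut", "new_entrant"]
supply = ["normal", "constrained"]

def run_scenario(price, competitor, supply_state):
    # Stand-in: a real system would run the panel under these conditions.
    return {"price": price, "competitor": competitor, "supply": supply_state}

scenarios = [run_scenario(p, c, s)
             for p, c, s in product(prices, competitor_moves, supply)]
print(f"{len(scenarios)} structured variations")  # 3 x 3 x 2 = 18
```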
Quant-style output vs qual-style output
Quant-style outputs look like distributions, rankings, and segment splits. Qual-style outputs are explanations, objections, narratives, and language suggestions. Many synthetic workflows produce both.

Operational risks and governance

Drift
Unintended change in behaviour over time or across runs (e.g., personas “evolving” in inconsistent ways). Drift must be monitored and bounded.
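A minimal drift monitor, assuming a tracked summary statistic and a tolerance band; all numbers are made up.

```python
baseline_intent = 0.52  # established on validated runs
tolerance = 0.05        # acceptable run-to-run band

recent_runs = [0.53, 0.51, 0.55, 0.61, 0.64]  # re-runs of the same study

for i, value in enumerate(recent_runs, 1):
    flag = "DRIFT" if abs(value - baseline_intent) > tolerance else "ok"
    print(f"run {i}: {value:.2f} [{flag}]")
# Runs 4-5 breach the band: investigate before trusting new outputs.
```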
Leakage / knowledge boundary failures
When a synthetic respondent “knows” things it shouldn’t (e.g., future facts, private information, implausible expertise). This undermines credibility and can mislead decisions.
Auditability
The ability to inspect why outputs happened (inputs used, what mattered, which assumptions were activated). Higher-fidelity persona systems aim to make mediators visible and testable.
Reproducibility
Whether another team can run the same method and obtain comparable results. Requires versioning, protocol discipline, and transparent reporting.
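A sketch of a reproducibility manifest; the field names and pinned versions are illustrative.

```python
import hashlib
import json

# Pin everything another team would need to re-run the study.
manifest = {
    "model": "model-name@2024-06-01",  # hypothetical pinned model version
    "seed": 42,
    "panel_version": "uk-shoppers-1.3",
    "protocol": "concept-test-protocol-2.0",
}

# Hash the protocol text so silent edits are detectable.
protocol_text = "Ask each persona to rank concepts A-E and explain why."
manifest["protocol_sha256"] = hashlib.sha256(protocol_text.encode()).hexdigest()

print(json.dumps(manifest, indent=2))
```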
Human-in-the-loop
Research processes where humans design protocols, review outputs, validate with benchmarks, and decide when to escalate to real fieldwork.
Hybrid studies
Studies that use synthetic work to iterate quickly, then validate key points with a smaller human sample (synthetic-first, human-confirm).