Minimum requirements (baseline)
If a study cannot meet these, it should be labelled as exploratory and hypothesis-generating only.
1) Population framing
Define who the results represent
Specify the population (country/region, age range, segment definition, exclusions) and what the panel is intended to model.
2) Method disclosure
Explain how results are produced
Provide a transparent description of panel construction, prompting approach, controls, and output aggregation.
3) Validation
Report basic reliability checks
Include test-retest stability and at least one external benchmark or known-truth check where feasible.
4) Limitations
State what the study cannot conclude
Be explicit about uncertainty, out-of-scope topics, and conditions under which results may change.
5) Reproducibility
Enable a comparable re-run
Provide enough detail (workflow, seeds if applicable, versions, parameters) that another team can reproduce the method.
6) Responsible use
Prevent misuse and overclaiming
Do not represent simulated outputs as measured reality; include safeguards for sensitive or high-stakes uses.
Labelling rule (simple)
If you cannot disclose the population frame, panel construction, and validation checks, label the output as Exploratory and treat it as hypothesis generation only.
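A minimal sketch of this labelling gate, assuming a study's disclosures are captured in a simple dict; the field names are illustrative, not a fixed schema.

```python
# Hedged sketch of the labelling rule: a study earns a non-exploratory label
# only if the three core disclosures are present. Field names are illustrative.

REQUIRED_FIELDS = ("population_frame", "panel_construction", "validation_checks")

def study_label(disclosure: dict) -> str:
    """Return 'Decision-support' only when all required fields are disclosed."""
    if all(disclosure.get(field) for field in REQUIRED_FIELDS):
        return "Decision-support"
    return "Exploratory (hypothesis generation only)"

print(study_label({
    "population_frame": "UK adults 18-65",              # hypothetical example
    "panel_construction": "quota panel, n=500",
    "validation_checks": ["test-retest", "external benchmark"],
}))  # -> Decision-support
```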
Disclosure standard
What should be disclosed in every synthetic research study.
A) Population & scope
- Target population definition (who results represent)
- Geography, language, and time frame
- Segment definitions and exclusions
- Intended use (exploratory vs decision-grade)
B) Panel design
- Panel size and composition (segments/quotas)
- Weighting approach, if any (see the weighting sketch after this list)
- Stability assumptions (what stays fixed over time)
- Refresh cadence (if personas change over time)
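The weighting sketch referenced above is a minimal post-stratification example, assuming segment quotas are known; the segment names, counts, and quota shares are hypothetical.

```python
# Post-stratification sketch: reweight synthetic panel segments so their
# effective share matches stated population quotas. All values are illustrative.

panel_counts = {"18-34": 220, "35-54": 180, "55+": 100}         # personas per segment
population_quota = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # target shares

total = sum(panel_counts.values())
weights = {seg: population_quota[seg] / (count / total)
           for seg, count in panel_counts.items()}

for seg, w in weights.items():
    print(f"{seg}: weight {w:.2f}")
```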
C) Model & grounding
- Model family and versioning (at least at a high level)
- Grounding inputs (public statistics, curated corpora, first-party data, etc.)
- Guardrails and constraints used to reduce drift and bias
- Known limitations of the approach
D) Workflow & aggregation
- Research method (survey, interview, experiment, simulation)
- Prompting protocol and controls (method-level description)
- How responses are scored, aggregated, and interpreted
- How many runs are performed and how variance is handled (see the aggregation sketch below)
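A minimal aggregation sketch under those points: repeat the identical study several times and report a spread rather than a single number. run_study() is a hypothetical stand-in for a full panel workflow.

```python
# Aggregation sketch: run the identical study N times and report mean and
# standard deviation of the headline metric. run_study() is a placeholder.

import random
import statistics

def run_study(seed: int) -> float:
    """Placeholder: returns the share of panelists choosing the top answer."""
    random.seed(seed)
    return random.gauss(0.42, 0.03)  # simulated run-to-run noise

scores = [run_study(seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
```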
E) Validation & reliability
- Test-retest stability (repeatability over time/runs)
- Sensitivity tests (prompt/context variation)
- Known-truth tasks (where feasible)
- External benchmarks (fieldwork, published stats, historical outcomes)
F) Limitations & uncertainty
- What the method can and cannot infer
- Where results are directional vs quantitative
- Uncertainty and confidence qualifiers
- Known failure modes and mitigation steps
G) Governance & ethics
- Privacy posture and data provenance statement
- Bias assessment and mitigation approach
- Intended use policy and misuse safeguards
- Conflicts of interest disclosures
H) Reproducibility
- Workflow versioning and parameter disclosure
- Run conditions (e.g., sampling settings, seeds if applicable)
- Audit logs / traceability (method-level)
- How to replicate a comparable run (a run-logging sketch follows this list)
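A minimal sketch of method-level run logging, assuming records are appended to a JSONL file; every key and value here (workflow version, model family, sampling settings) is illustrative rather than a fixed schema.

```python
# Run-logging sketch: capture versions, parameters, and seeds alongside each
# study so another team can attempt a comparable re-run. Keys are illustrative.

import datetime
import json

run_record = {
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "workflow_version": "concept-test-v0.1",      # hypothetical
    "model_family": "example-llm-2025-01",        # hypothetical
    "sampling": {"temperature": 0.7, "seed": 1234},
    "panel": {"size": 500, "segments": ["18-34", "35-54", "55+"]},
    "runs": 10,
}

with open("run_log.jsonl", "a") as f:
    f.write(json.dumps(run_record) + "\n")
```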
Study disclosure label (v0.1)
A compact “nutrition label” for making studies comparable; a structured-data sketch follows the field list.
- Study type: concept test / message test / scenario simulation
- Intended use: exploratory / decision-support
- Population frame: who this represents (and who it does not)
- Geography & language: country/region, language, time period
- Panel design: panel size, segments, quotas, weighting
- Grounding inputs: stats, corpora, first-party data (if any)
- Model & constraints: model family/version + guardrails (high level)
- Workflow: prompt protocol + aggregation method
- Validation: test-retest + benchmark(s) + sensitivity checks
- Limitations: known failure modes + uncertainty
- Privacy posture: provenance, safeguards, retention policy
- Reproducibility: versioning + parameters + audit logs
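As noted above, the label can also be captured as structured data so studies are machine-comparable. A minimal sketch, with entirely hypothetical field values:

```python
# Structured-data sketch of the v0.1 study label. Values are hypothetical.

import json

study_label_v01 = {
    "study_type": "concept test",
    "intended_use": "exploratory",
    "population_frame": "UK online grocery shoppers, 18-65",
    "geography_language": {"region": "UK", "language": "en", "period": "2025-Q1"},
    "panel_design": {"size": 500, "segments": 3, "weighting": "post-stratified"},
    "grounding_inputs": ["public population statistics", "first-party survey data"],
    "model_constraints": {"family": "example-llm", "guardrails": "topic allowlist"},
    "workflow": {"protocol": "fixed prompt battery", "aggregation": "mean of 10 runs"},
    "validation": ["test-retest", "external benchmark", "prompt sensitivity"],
    "limitations": "directional only; no incidence estimates",
    "privacy_posture": "public statistics only; no personal data",
    "reproducibility": {"workflow_version": "0.1", "seeds": [1, 2, 3]},
}

print(json.dumps(study_label_v01, indent=2))
```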
Tip: Use consistent wording for Population frame and Intended use; that makes it far easier to compare studies.
Validation expectations
Synthetic research should be evaluated like a measurement system. Prefer repeatable protocols over impressive anecdotes.
Test-retest stability
Does the same method yield similar results?
Re-run the same study conditions and report variance. If results swing wildly, treat outputs as exploratory.
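A minimal two-wave retest sketch, assuming per-item shares from two identical runs; the concept names and values are illustrative (statistics.correlation requires Python 3.10+).

```python
# Test-retest sketch: compare per-item results from two identical waves and
# report the between-wave correlation plus the largest single-item shift.

from statistics import correlation  # Python 3.10+

wave1 = {"concept_a": 0.54, "concept_b": 0.31, "concept_c": 0.15}  # illustrative
wave2 = {"concept_a": 0.51, "concept_b": 0.34, "concept_c": 0.15}

items = sorted(wave1)
r = correlation([wave1[i] for i in items], [wave2[i] for i in items])
max_shift = max(abs(wave1[i] - wave2[i]) for i in items)
print(f"retest r={r:.2f}  max item shift={max_shift:.2f}")
```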
Sensitivity to context
Do small prompt changes create big swings?
Compare outcomes across controlled prompt variants and document how robust conclusions are.
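A minimal prompt-sensitivity sketch, assuming a hypothetical ask_panel() workflow; the variants and returned shares are illustrative.

```python
# Sensitivity sketch: run semantically equivalent prompt variants and report
# how much the headline share moves. ask_panel() is a placeholder.

variants = [
    "Would you buy this product at £4.99?",
    "At a price of £4.99, would you purchase this product?",
    "If this product cost £4.99, would you buy it?",
]

def ask_panel(prompt: str) -> float:
    """Placeholder: returns the share of panelists answering 'yes'."""
    return {0: 0.41, 1: 0.44, 2: 0.39}[variants.index(prompt)]  # illustrative

results = [ask_panel(v) for v in variants]
print(f"spread across variants: {max(results) - min(results):.2f}")
# A spread larger than your decision threshold means conclusions are
# prompt-dependent and should be reported as exploratory.
```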
External benchmarks
Can you tie results to known reference points?
Use fieldwork, public statistics, or historical outcomes as checks, even if only for partial tasks.
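A minimal benchmark-comparison sketch; the reference figures below are placeholders, not real published statistics.

```python
# Benchmark sketch: compare synthetic estimates to external reference values
# and report per-metric error plus mean absolute error. Figures are illustrative.

benchmarks = {"smartphone_ownership": 0.93, "weekly_online_shopping": 0.48}
synthetic = {"smartphone_ownership": 0.90, "weekly_online_shopping": 0.55}

for metric, truth in benchmarks.items():
    err = synthetic[metric] - truth
    print(f"{metric}: synthetic={synthetic[metric]:.2f}  "
          f"benchmark={truth:.2f}  error={err:+.2f}")

mae = sum(abs(synthetic[m] - benchmarks[m]) for m in benchmarks) / len(benchmarks)
print(f"mean absolute error: {mae:.3f}")
```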
Decision-grade rule of thumb
If a study will influence spend, pricing, positioning, or major product decisions, require at least: stability reporting + one external benchmark + explicit limitations and disclosure.
Privacy & data provenance
Synthetic research should make its privacy posture explicit, not implied.
What to disclose
- Data provenance: what sources were used (public stats, first-party, third-party, etc.)
- Whether any personal data about identifiable individuals is used
- Retention and deletion policy
- Security controls and access policy
- How sensitive topics are handled
Minimum safeguards
- Do not claim outputs describe specific real individuals
- Prevent re-identification attempts and sensitive inference misuse
- Provide clear user guidance on appropriate interpretation
- Maintain audit logs for high-stakes usage
- Respect applicable laws and professional codes
Risk & misuse safeguards
The biggest operational risk is mistaking simulation for measurement. Safeguards should be practical and explicit.
Overclaiming
Treating simulated results as “fact”
Require uncertainty language, disclose limitations, and tie high-stakes decisions to external benchmarks.
Prompt / operator bias
Leading setups that predetermine outcomes
Use standardised protocols, blinded comparisons where possible, and sensitivity testing.
Domain leakage
Assuming knowledge the panel should not have
Restrict context injection, log grounding sources, and run “knowledge boundary” tests.
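A minimal knowledge-boundary probe sketch, assuming a hypothetical query_panel() interface; the probe questions and the refusal-detection heuristic are illustrative only.

```python
# Knowledge-boundary sketch: ask about facts the panel should not know given
# its grounding cut-off, and flag confident answers. query_panel() is a placeholder.

probes = [
    "What was announced at last week's competitor launch event?",
    "What is the current price of our unreleased product?",
]

def query_panel(question: str) -> str:
    """Placeholder: returns the panel's aggregated answer text."""
    return "I don't have information about that."  # desired refusal behaviour

for q in probes:
    answer = query_panel(q).lower()
    refused = "don't have" in answer or "no information" in answer
    print(f"{'ok' if refused else 'LEAK?'}  {q}")
```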
High-stakes uses
For elections, medical decisions, credit decisions, or safety-critical contexts, synthetic research should not be used as a standalone evidence source. If used at all, it must be paired with domain governance, external validation, and clear policy constraints.
Code of conduct (members)
A minimal, enforceable baseline for member organisations and contributors.
Member commitments
- Honesty: Do not misrepresent simulated outputs as measured reality.
- Disclosure: Provide study-level disclosure using the association template.
- Validation: Maintain and publish baseline reliability checks for core workflows.
- Privacy: Disclose data provenance and maintain appropriate safeguards.
- Fairness: Assess and mitigate bias; avoid discriminatory use cases.
- Accountability: Provide an escalation path for concerns and corrections.
Prohibited behaviours
- Claiming the system describes specific real individuals
- Hiding data provenance or making unverifiable performance claims
- Publishing fabricated “benchmarks” or cherry-picked results
- Encouraging deception, manipulation, or targeted harassment
- Using synthetic research to justify discriminatory decisions
- Misleading marketing that implies official endorsement
Vendor evaluation checklist
Practical questions for procurement and research teams.
Population & grounding
What does your panel represent?
Ask: population frame, segment definitions, grounding sources, and what is held constant vs learned on the fly.
Validation
How do you prove reliability?
Ask for: test-retest results, sensitivity testing, external benchmarks, and failure mode documentation.
Disclosure & governance
What do you disclose, always?
Request example study labels, audit logs, privacy posture, and a clear statement of limitations.
Ten questions (starter set)
- What population does this panel represent, and how do you justify that claim?
- What grounding sources do you use (public stats, first-party, third-party), and what is the provenance?
- How do you prevent “plausible but ungrounded” outputs?
- What is your test-retest stability on a standard study?
- How sensitive are results to prompt or context changes?
- What external benchmarks have you run (and can we replicate them)?
- How do you handle bias and segment fairness checks?
- What disclosure label fields do you provide in every study?
- What are your known failure modes, and how do you communicate uncertainty?
- What auditability and reproducibility tooling do you provide?
FAQ
Are these requirements binding today?
Today, they are a draft baseline. The association's goal is to move from guidelines to adoption over time, beginning with voluntary disclosure and working group review. Buyers can treat these as procurement requirements immediately.
What must a study disclose at minimum?
At minimum: population frame, panel design, grounding approach, the research workflow (method-level), validation checks performed, and limitations. If those cannot be disclosed, label the output as exploratory.
Can synthetic research replace traditional fieldwork?
Not in general. Synthetic research is best used to iterate quickly, explore scenarios, and narrow hypotheses. For high-stakes decisions, regulated contexts, or questions requiring direct measurement of incidence, fieldwork remains essential.
What does it mean for a study to be “validated”?
It means the method has been tested for repeatability (test-retest), robustness (sensitivity to context), and alignment to external benchmarks or known-truth tasks, with the results documented and reproducible.
Want to help shape v0.2? Join the standards working group.