Grounded in peer-reviewed research
How we validate
Synthetic survey research is a young field. The academic literature since 2023 shows both genuine promise and real limitations. We take both seriously.
The match rate comes from Maier et al. (2025), tested across 57 real product surveys with 9,300 human respondents. The replication rate comes from Park et al. (2024), which tested 1,052 AI agents against real survey data.
What the evidence shows
The research is clear: synthetic respondents are good at capturing the big picture — which product people prefer, which message resonates more, which concept ranks highest. They're less reliable for fine-grained demographic breakdowns or predicting individual behavior.
We build vcrowd around these findings. We use the methods with the strongest published results, and we're upfront about where the technology has limits.
The chart shows what researchers have found: where synthetic and real human respondents agree closely, and where gaps remain.
Argyle et al. 2023, Brand et al. 2025 — willingness-to-pay estimates within $0.13 of real consumers
Maier et al. 2025 — 85%+ distribution match across 57 product surveys
Li et al. 2024 — 75–85% agreement; weaker for niche or regional brands
Bisbee et al. 2024 — about half of demographic patterns diverge from real data
Synthetic opinions are ~3x narrower than real human opinions
How we measure quality
We test synthetic responses the same way researchers do. No single test tells the whole story — so we use five.
Distribution Match
Do synthetic answer patterns look like the real ones? We compare full response curves, not just averages.
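A minimal sketch of how such a comparison can run, assuming you have answer counts per option for the same question from both crowds. The Jensen-Shannon distance and the 0-to-1 scoring are illustrative choices, not necessarily the exact metric behind our score:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def distribution_match(human_counts, synthetic_counts):
    """Compare full answer distributions (e.g., counts per point on a
    1-5 scale), not just their means. Returns a 0-1 score, 1 = identical."""
    p = np.asarray(human_counts, dtype=float)
    q = np.asarray(synthetic_counts, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    # Jensen-Shannon distance is 0 for identical distributions, 1 for disjoint
    return 1.0 - jensenshannon(p, q, base=2)

# Humans vs. synthetic respondents answering the same 5-point question
print(distribution_match([12, 30, 25, 20, 13], [10, 28, 30, 22, 10]))
```

Because the score uses the whole curve, two samples with identical averages but different shapes still come apart.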
Question Consistency
When humans link two topics, do synthetic respondents link them too? We check these relationships hold.
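One way to test this, sketched under the assumption that answers arrive as numeric respondent-by-question matrices; summarizing with the mean absolute difference of correlations is our illustrative choice:

```python
import numpy as np

def consistency_gap(human_matrix, synthetic_matrix):
    """Compare how questions correlate with each other in the two datasets.
    Inputs are (respondents x questions) arrays of numeric answers."""
    human_corr = np.corrcoef(human_matrix, rowvar=False)
    synth_corr = np.corrcoef(synthetic_matrix, rowvar=False)
    # Mean absolute difference between off-diagonal correlations;
    # 0 means every human linkage between topics is reproduced exactly
    off_diag = ~np.eye(human_corr.shape[0], dtype=bool)
    return float(np.abs(human_corr - synth_corr)[off_diag].mean())
```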
Direction Check
Does the synthetic crowd pick the same top answer as real people? We measure how often they agree.
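A sketch of the agreement count, assuming answers are grouped by question; the dictionary format is a placeholder for however the data is stored:

```python
from collections import Counter

def direction_agreement(human_answers, synthetic_answers):
    """Fraction of questions where both crowds pick the same top answer.
    Each argument maps a question id to the list of answers it received."""
    hits = 0
    for qid, answers in human_answers.items():
        human_top = Counter(answers).most_common(1)[0][0]
        synth_top = Counter(synthetic_answers[qid]).most_common(1)[0][0]
        hits += human_top == synth_top
    return hits / len(human_answers)
```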
Bias Scan
We check for political lean, over-polite answers, and whether the spread of opinions is too narrow.
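Two of these checks are easy to sketch: spread narrowness (the roughly 3x homogenization noted in the chart above) and extreme avoidance. The political-lean scan needs labeled probe questions, so it is omitted here; the function names and scale bounds are illustrative:

```python
import numpy as np

def spread_ratio(human_ratings, synthetic_ratings):
    """Ratio of human to synthetic standard deviation. Values well above 1
    signal the homogenization the literature reports (roughly 3x)."""
    return float(np.std(human_ratings) / np.std(synthetic_ratings))

def extreme_share(ratings, low=1, high=5):
    """Share of answers at the scale endpoints; synthetic crowds tend to
    under-use the extremes relative to real respondents."""
    r = np.asarray(ratings)
    return float(np.mean((r == low) | (r == high)))
```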
Demographic Fit
Do age, income, and gender groups respond differently in the same way real groups do?
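A sketch of one such check, assuming ratings are already grouped by demographic label; correlating the centered group means is our illustrative choice:

```python
import numpy as np

def subgroup_alignment(human_by_group, synth_by_group):
    """Do demographic groups differ in the same pattern in both datasets?
    Each argument maps a group label (say, an age band) to its ratings."""
    groups = sorted(human_by_group)
    human_means = np.array([np.mean(human_by_group[g]) for g in groups])
    synth_means = np.array([np.mean(synth_by_group[g]) for g in groups])
    # Center each profile so we compare between-group gaps, not overall level
    human_gaps = human_means - human_means.mean()
    synth_gaps = synth_means - synth_means.mean()
    # Correlation of gap profiles: 1.0 = identical group-level pattern
    return float(np.corrcoef(human_gaps, synth_gaps)[0, 1])
```

Centering first matters: it separates "the groups differ in the same pattern" from "the overall averages happen to match."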
What we're honest about
Opinions cluster to the middle
Synthetic responses tend to avoid extremes. Strong supporters and strong critics are underrepresented.
Western & liberal skew
AI models reflect their training data, which over-represents Western, English-speaking, and liberal perspectives.
Too polite on sensitive topics
On controversial questions, synthetic respondents give more socially acceptable answers than real people do.
Groups, not individuals
Synthetic data captures crowd-level patterns well, but can't reliably predict what any single person would say.
How we address them
These aren't problems marketing language can wave away. They're active areas of research, documented across dozens of peer-reviewed papers, and we address them through methodology:
- We ask the AI to explain its reasoning in words first, then convert that explanation to ratings, the method with the best published accuracy (Maier et al.); a sketch follows this list
- Every report includes a Data Confidence Score so you can assess the strength of the signal
- We monitor for known bias patterns and flag responses where synthetic data is least reliable
- We recommend synthetic research for early-stage screening and directional signal — not as a replacement for human panels in high-stakes decisions
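For the reasoning-first method in the first bullet, here is a minimal sketch. The `call_model` function, persona wording, prompts, and 1-to-5 scale are all placeholders, not our production pipeline:

```python
def elicit_rating(question, persona, call_model):
    """Two-step elicitation: free-text reasoning first, numeric rating second.
    `call_model(prompt) -> str` stands in for whatever LLM client you use;
    it is a placeholder, not a real library call."""
    reasoning = call_model(
        f"You are {persona}. In two or three sentences, explain how you "
        f"feel about the following and why: {question}"
    )
    rating_text = call_model(
        "Based only on the explanation below, answer with a single integer "
        "from 1 (strongly negative) to 5 (strongly positive).\n\n"
        f"Explanation: {reasoning}"
    )
    # Naive parse; a production pipeline would validate and retry
    return int(rating_text.strip())
```

The point of the two steps is to make the model commit to a rationale before it commits to a number, rather than letting a bare rating default to the scale's agreeable middle.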
The research foundation
Our approach draws on validation work from research groups at Harvard, Stanford, Columbia, PyMC Labs, Anthropic, and others. The field is moving fast — new benchmarks and methods are published regularly, and we update our pipeline accordingly.
We validate against OpinionQA and SubPOP, the two largest public survey benchmarks. Full reference list →
Try it and compare
Run a study on a topic where you already have human survey data. See how the results compare. That's the best validation.
No credit card required